Duplicate data plagues every database and mailing list. Duplication
is inevitable, keeps growing and steadily erodes the
quality of your data. You can slow its progress, but it’s virtually
impossible to stop duplication of data completely.
In databases, duplication means that data mining totals, aggregates
and the key business decisions based on them will be inaccurate and therefore
misleading. With mailing lists, the most obvious reason to remove duplicates
is to save the money that would otherwise be wasted on printing, labeling and
mailing two or more pieces to the same address; a fruitless repetition
of effort. But mailing duplicate pieces has subtler and more damaging
effects than simply wasting your money.
Topics:
deduplication service
de-duplication
Deduping Data
Data Scrubbing
Mailing List Cleaning
Mailing List Cleanup
Deduplication Prevents Customer Relations Disaster
Your customers feel offended if they get more than one piece
per mailing; after all, don’t you? The best that a customer will
think about getting several of the same mailing pieces from you
is that you're not very competent at keeping track of them (...or
of your own records, for that matter).
With donor appeals, the results can be even more disastrous.
Besides the normal drawbacks in customer perception, there is
also a chance that you will alienate a benefactor and lose
their future donations as a result.
Several different types of duplicate records can occur in your mailing list:
Exact Duplicates:
Definition: In these records, names and addresses are spelled
exactly the same way, including all spaces and any punctuation. These
are relatively easy to find. One of the most common ways is to
build a special match field, calculated just for this purpose, and
compare records on it.
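The match field described above can be sketched in a few lines of Python. This is only an illustration, not any particular product's method: the field names (`first`, `last`, `address`), the helper names and the sample addresses are all hypothetical. The key is built from the raw values, so a difference in a single space or punctuation mark keeps two records apart, which is what makes a duplicate "exact".

```python
from collections import defaultdict

# Hypothetical field names; a real list will have its own columns.
KEY_FIELDS = ("first", "last", "address")

def match_key(record):
    """Join the raw field values into one calculated match field."""
    return "|".join(record[field] for field in KEY_FIELDS)

def find_exact_duplicates(records):
    """Return lists of row indexes whose match keys are identical."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[match_key(record)].append(index)
    return [rows for rows in groups.values() if len(rows) > 1]

# Made-up sample rows for illustration only.
records = [
    {"first": "Robert", "last": "Harmon", "address": "12 Elm St."},
    {"first": "Mary", "last": "Harmon", "address": "12 Elm St."},
    {"first": "Robert", "last": "Harmon", "address": "12 Elm St."},
    {"first": "Robert", "last": "Harmon", "address": "12 Elm St"},  # missing period
]

exact_groups = find_exact_duplicates(records)
print(exact_groups)
```

Rows 0 and 2 share a key and are reported together. Row 3 differs only by a missing period, so it is not an exact duplicate and falls into the harder categories described below.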
Cause: Order taking, sales lead collection and mailing list
management are almost never consistent enough to produce
a large number of exact duplicate records on their own.
A high count of exact duplicates therefore indicates that the same
records have been appended to your table or file more than
once. Your data entry is probably not the problem; look
to your IT department or Database Administrator (DBA)
for the answer.
Near Duplicates:
Definition: These records refer to the same people and destinations,
but the spelling varies or the data contains typos. They are much more
difficult to cull out of your database; they don't strictly match,
but they are True Duplicates (see below) nonetheless. It takes
very sophisticated software to find these (see table below).
Cause: This happens primarily for two reasons: incomplete
or undecipherable data submitted for data entry (e.g. handwritten forms),
or bad data entry. (Lists gathered from Web sites where each person
enters their own data cause this and many other problems.) Avoiding
this problem is all but impossible; however, training your data entry
people, or using professional data entry personnel to enter sales and
response data, will help a lot.
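To make the idea of a near match concrete, here is a minimal sketch that scores spelling variations with Python's standard `difflib.SequenceMatcher`. It is a simple stand-in for illustration, not the sophisticated software referred to above. The 0.8 threshold and the helper names are assumptions; the first three surnames come from the sample table in this article, and "Jones" is added only as a contrast.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, drop punctuation and collapse repeated whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def similarity(a, b):
    """Score two strings from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def near_duplicate_pairs(values, threshold=0.8):
    """Compare every pair and keep those at or above the threshold."""
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(values[i], values[j]) >= threshold:
                pairs.append((values[i], values[j]))
    return pairs

# "Jones" is an invented contrast value; the rest are from the sample table.
last_names = ["Harmon", "Harman", "Harmonn", "Jones"]
pairs = near_duplicate_pairs(last_names)
for left, right in pairs:
    print(left, "~", right, round(similarity(left, right), 3))
```

Comparing every pair is fine for a small list but grows quadratically, so larger lists are usually first split into smaller blocks (for example by ZIP code) before scoring. The threshold is a judgment call: set it too low and different people get merged, too high and typos slip through.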
True Duplicates:
Definition: These are also the same people and addresses,
but mail house de-duping software normally does not identify
them because the records look too different to the computer.
They can be an exaggerated form of Near Duplicate
that is much more difficult to find. No matter what form they're in,
any set of records that will send more than one piece of mail
to the same people at the same location are True Duplicates.
There are special software applications on the market that can ferret these
anomalies out of your database, but they all take some
skill to use properly for a fully cleaned list. Here at Northwest
Database Services, we use specially written de-duplication
software that can easily root out the kinds of Near and True
Duplicates seen in the table below.
Cause: Near duplicate records are difficult to find lurking
in your mailing list, and yet they are constantly being added during
data entry. Clean data and the assignment of a data management person
to review all new records before they are incorporated into your database
will go a long way toward solving this common problem. A set of these in a typical
database might look like the table below.
Data Pre-Deduplication
| Prefix | First Name | Last Name |
|---|---|---|
| Dr | B J | Harmon |
| Dr | Robert | Harmon MD |
| Mrs | Mary | Harmon |
|  | MaryBeth | Harman |
|  | Robert & Mary Beth | Harmon |
| Dr & Mrs | Robert J | Harmon |
|  | R J & Mary | Harmonn |
|  | Bob | Harman MD |
|  | Rob | Harman |
|  | Dr | Harmon |
|  | Mrs | Harmon |
|  | Dr & Mrs | Harmon |
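Rows like those in the table only collapse into a single mailing piece when they are compared at the household level, ignoring how the first name and prefix were typed. The sketch below shows one way that could work, under stated assumptions: the addresses are invented, the suffix and abbreviation lists are small hypothetical samples, and the 0.8 similarity threshold is arbitrary. It is not the specially written software mentioned above.

```python
from difflib import SequenceMatcher

SUFFIXES = {"md", "phd", "jr", "sr"}  # assumed, incomplete list
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}  # assumed

def clean_tokens(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    return [token.strip(".,").lower() for token in text.split()]

def surname_core(last_name):
    """Drop credentials such as 'MD' so 'Harman MD' compares as 'harman'."""
    return " ".join(t for t in clean_tokens(last_name) if t not in SUFFIXES)

def normalize_address(address):
    """Expand a few common abbreviations so 'St.' and 'Street' agree."""
    return " ".join(ABBREVIATIONS.get(t, t) for t in clean_tokens(address))

def group_households(records, threshold=0.8):
    """Group rows that share an address and a similar surname."""
    households = []
    for row, record in enumerate(records):
        address = normalize_address(record["address"])
        surname = surname_core(record["last"])
        for household in households:
            score = SequenceMatcher(None, household["surname"], surname).ratio()
            if household["address"] == address and score >= threshold:
                household["rows"].append(row)
                break
        else:
            # No existing household matched, so start a new one.
            households.append({"address": address, "surname": surname, "rows": [row]})
    return households

# Names are taken from the table; the addresses are invented for illustration.
records = [
    {"first": "B J", "last": "Harmon", "address": "12 Elm St."},
    {"first": "Bob", "last": "Harman MD", "address": "12 Elm Street"},
    {"first": "R J & Mary", "last": "Harmonn", "address": "12 elm st"},
    {"first": "Mary", "last": "Harmon", "address": "40 Oak Ave"},
]

households = group_households(records)
for household in households:
    print(household["address"], household["rows"])
```

The first three rows land in one household even though no two of them match exactly, while the fourth stays separate because its address differs. Each surname is compared only against the first one seen for a household, which keeps the sketch short but makes the result depend on row order; a production tool would need something more robust.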