Data services, including deduplication and data scrubbing, from Northwest Database Services

Deduplication Services Prevent Database & Mailing List Disasters

Data Duplication Wastes Your Money

Duplication of data plagues every database and mailing list. Duplication is inevitable, grows constantly and significantly erodes the quality of your data. You can slow its progress, but it's virtually impossible to stop duplication completely.

In databases, duplication means that data mining totals and aggregates, and the key business decisions based on them, will be inaccurate and therefore misleading. With mailing lists, the most obvious reason to deduplicate is to save the money that would otherwise be wasted on printing, labeling and mailing two or more pieces to the same address, a fruitless repetition of effort. But mailing duplicate pieces has even subtler and more deadly effects than simply wasting your money.

Deduplication Prevents Customer Relations Disaster

Your customers feel offended if they get more than one piece per mailing; after all, wouldn't you? The kindest conclusion a customer will draw from receiving several copies of the same mailing piece is that you're not very competent at keeping track of them (...or of your own records, for that matter).

With donor appeals, the results can be even more disastrous. Besides the usual damage to how you are perceived, you risk alienating a benefactor and losing their future donations as a result. Three types of duplicate records occur in mailing lists:

Exact Duplicates

Definition: In these records, names and addresses are spelled exactly the same way, including all spaces and punctuation. They are relatively easy to find. One of the most common methods is to build a special field (often called a match code or match key) calculated for just this purpose, and then match records on that field.
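The match-field approach can be sketched in Python. This is a minimal illustration, not a description of any particular product: the field names in `KEY_FIELDS` are hypothetical, and the `|` separator assumes that character never appears in the data. Because exact duplicates agree character for character, the key is built from the raw field values with no trimming or case folding.

```python
from collections import defaultdict

# Hypothetical field names; substitute the columns your list actually uses.
KEY_FIELDS = ("name", "address", "city", "state", "zip")

def match_key(record):
    # Join the fields verbatim: no trimming or case folding, because an exact
    # duplicate must match character for character, spaces and punctuation included.
    # The "|" separator assumes that character never appears in the data.
    return "|".join(record[field] for field in KEY_FIELDS)

def find_exact_duplicates(records):
    """Return lists of record positions that share an identical match key."""
    groups = defaultdict(list)
    for position, record in enumerate(records):
        groups[match_key(record)].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]

records = [
    {"name": "Jane Doe", "address": "12 Oak St.", "city": "Salem", "state": "OR", "zip": "97301"},
    {"name": "Jane Doe", "address": "12 Oak St", "city": "Salem", "state": "OR", "zip": "97301"},
    {"name": "Jane Doe", "address": "12 Oak St.", "city": "Salem", "state": "OR", "zip": "97301"},
]
print(find_exact_duplicates(records))  # [[0, 2]]
```

The second record is not reported: the missing period after "St" makes it a Near Duplicate, which an exact match key cannot see.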

Cause: Order taking, sales lead collection and mailing list management are almost never consistent enough in the business world to produce a large number of exact duplicates by accident. A high count of exact duplicates therefore indicates that the same records have been appended to your table or file more than once. Your data entry is probably not the problem; look to your IT department or Database Administrator (DBA) for the answer.
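A quick diagnostic for a double append is to measure what share of a table consists of surplus exact copies. The Python sketch below is illustrative only; it assumes each row is a hashable tuple of field values, and what counts as a "high" rate is left to your own judgment.

```python
from collections import Counter

def exact_duplicate_rate(rows):
    """Return the fraction of rows that are surplus exact copies of another row."""
    if not rows:
        return 0.0
    counts = Counter(rows)  # rows must be hashable, e.g. tuples of field values
    # Every copy beyond the first occurrence of a row counts as surplus.
    surplus = sum(count - 1 for count in counts.values())
    return surplus / len(rows)

# A batch appended twice leaves every record in it duplicated.
batch = [("Jane Doe", "12 Oak St."), ("Ann Lee", "9 Elm Rd")]
table = batch + [("Tom Ray", "4 Pine Ave")] + batch
print(exact_duplicate_rate(table))  # 0.4
```

When a whole import batch has been loaded twice, the surplus tends to be large and to cover every record in that batch, which is a very different pattern from the scattered, inconsistent duplicates that data entry produces.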

Near Duplicates

Definition: Here the people and addresses in your list are the same, but the spelling varies or the data contains typos. These are much more difficult to cull from your database: the records don't match exactly, yet they are True Duplicates (see below) nonetheless. It takes very sophisticated software to find these (see table below).
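To show why near duplicates need more than an exact comparison, here is a minimal Python sketch of one common approach: normalize the text, then score each pair of records with a string-similarity measure. It uses the standard library's `difflib.SequenceMatcher` as a stand-in; it is not the software referred to in this article, and the field names (`name`, `address`) and the 0.85 threshold are hypothetical choices.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def normalize(text):
    # Lower-case, turn punctuation into spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def near_duplicate_pairs(records, threshold=0.85):
    """Return (i, j, score) for record pairs whose normalized text is similar."""
    keys = [normalize(f"{r['name']} {r['address']}") for r in records]
    pairs = []
    for i, j in combinations(range(len(keys)), 2):
        score = SequenceMatcher(None, keys[i], keys[j]).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs

records = [
    {"name": "Jonathan Smith", "address": "415 Maple Ave."},
    {"name": "Jonathon Smith", "address": "415 Maple Ave"},
    {"name": "Mary Jones", "address": "88 Birch Rd"},
]
print(near_duplicate_pairs(records))  # expect a single pair: records 0 and 1
```

Comparing every pair grows quadratically with list size, so large lists are usually split into smaller groups first (for example, by ZIP code) and compared only within each group.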

Cause: Near duplicates arise mainly from two causes: incomplete or illegible source data submitted for data entry (e.g., handwritten forms), and careless data entry. (Lists gathered from Web sites where each person enters their own data cause this and many other problems.) Avoiding the problem entirely is all but impossible; however, training your data entry people, or using professional data entry personnel to enter sales and response data, will help a lot.

True Duplicates

Definition: These are also the same people and addresses, but mail house de-duping software normally fails to identify them because, to the computer, the records look too different. A True Duplicate can be an exaggerated Near Duplicate, but one that is much more difficult to find. Whatever form they take, any records that would send more than one piece of mail to the same people at the same location are True Duplicates. Special software applications on the market can ferret these anomalies out of your database, but they all require some skill to use properly if the list is to be fully cleaned. Here at Northwest Database Services, we use specially written de-duplication software that can easily root out the kinds of Near and True Duplicates seen in the table below:

Cause: Like Near Duplicates, True Duplicates are difficult to find lurking in your mailing list, and yet new ones are constantly being introduced during data entry. Starting from clean data and assigning a data management person to review all new records before they are incorporated into your database will go a long way toward solving this common problem.
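True Duplicates can look too different for a simple comparison, but standardizing names and addresses before building a key often brings them back together. The Python sketch below illustrates that idea only; it is not the software described above. The field names (`first`, `last`, `address`) are hypothetical, and the two lookup tables are deliberately tiny stand-ins for full postal abbreviation lists and nickname dictionaries.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny lookup tables; real standardization relies on
# complete postal abbreviation lists and large nickname dictionaries.
STREET_WORDS = {"street": "st", "avenue": "ave", "road": "rd", "north": "n",
                "south": "s", "east": "e", "west": "w", "apartment": "apt"}
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william",
             "will": "william", "kate": "katherine", "kathy": "katherine"}

def tokens(text):
    # Lower-case and split into letter/digit runs, dropping all punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())

def person_key(first, last, address):
    first_tokens = tokens(first)
    first_name = first_tokens[0] if first_tokens else ""  # drops middle initials
    first_name = NICKNAMES.get(first_name, first_name)
    street = " ".join(STREET_WORDS.get(t, t) for t in tokens(address))
    return (first_name, " ".join(tokens(last)), street)

def true_duplicate_groups(records):
    """Group record positions whose standardized person-and-address keys agree."""
    groups = defaultdict(list)
    for position, r in enumerate(records):
        groups[person_key(r["first"], r["last"], r["address"])].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]

records = [
    {"first": "Robert", "last": "Smith", "address": "123 North Main Street"},
    {"first": "Bob", "last": "Smith", "address": "123 N. Main St."},
    {"first": "Roberta", "last": "Smith", "address": "123 North Main Street"},
]
print(true_duplicate_groups(records))  # [[0, 1]]
```

"Robert" and "Bob" share a key once the nickname and the street abbreviations are standardized, while "Roberta" at the same address stays separate. Deciding how aggressive such rules should be is exactly the kind of skill the paragraph above refers to.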