
Which comes first: Enrichment or De-duplication?

Former Member
0 Kudos

Hi Friends,

Which should we handle first for data quality: enrichment or de-duplication?

Thank you

Shankar

Accepted Solutions (0)

Answers (5)


Former Member
0 Kudos

Hi Shankar,

Adding to the useful inputs above: I think that before any de-duplication you need a data study that tells you the shape of the data, the level of duplication, the overall fill rate, and the fill rate of the important attributes needed in the de-duplication procedure. Such a DQ report will come in handy when planning the way ahead.

If enrichment can be done in-house, it should precede the de-duplication process so that de-duplication gives the best results. If enrichment is a paid service, one can go for phased de-duplication after enrichment.
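To make the idea of a DQ report concrete, here is a minimal Python sketch of two of the measures mentioned above: fill rate per attribute and a rough duplicate rate. The field names (`name`, `city`, `duns`) and the sample records are hypothetical, and the duplicate check is only an exact match after trimming and lowercasing, so treat it as a starting point rather than a real matching strategy.

```python
from collections import Counter


def fill_rate(records, field):
    """Share of records where `field` is present and not blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


def duplicate_rate(records, key_fields):
    """Share of records that repeat an earlier record on `key_fields`.

    Only exact matches after trimming and lowercasing are counted.
    """
    if not records:
        return 0.0
    keys = Counter(
        tuple(str(r.get(f) or "").strip().lower() for f in key_fields)
        for r in records
    )
    extras = sum(count - 1 for count in keys.values())
    return extras / len(records)


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "Acme Corp", "city": "Berlin", "duns": ""},
    {"name": "acme corp", "city": "Berlin", "duns": "123456789"},
    {"name": "Globex", "city": "", "duns": None},
    {"name": "Initech", "city": "Austin", "duns": "987654321"},
]

duns_fill = fill_rate(sample, "duns")
city_fill = fill_rate(sample, "city")
dup_share = duplicate_rate(sample, ["name", "city"])
```

A low fill rate on an attribute you plan to match on is a hint that enrichment (or at least cleanup) should come before de-duplication on that attribute.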

Thanks,

Ravi

0 Kudos

The steps should be as follows:

1) Eliminate obvious duplicates by performing initial screening of the data within the systems

2) Enrich the data (for example, with D&B)

3) Identify duplicates across the systems

4) Eliminate duplicates
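The four steps above could be sketched roughly as follows in Python. This is only an illustration under several assumptions: the `reference` dictionary is a hypothetical stand-in for an enrichment service such as D&B, `org_id` is a made-up identifier field, and "obvious duplicates" are taken to mean records with the same name and city after normalization.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(value or "").lower().split())


def dedupe(records, key_func):
    """Keep the first record seen for each key."""
    seen, kept = set(), []
    for record in records:
        key = key_func(record)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def name_city_key(record):
    return ("name_city", normalize(record.get("name")), normalize(record.get("city")))


def cross_system_key(record):
    # Prefer the enriched identifier; fall back to name and city if it is missing.
    if record.get("org_id"):
        return ("org_id", record["org_id"])
    return name_city_key(record)


def enrich(records, reference):
    """Add attributes from a hypothetical reference table keyed by normalized name."""
    return [{**r, **reference.get(normalize(r.get("name")), {})} for r in records]


# Hypothetical data from two source systems.
crm = [{"name": "Acme Corp", "city": "Berlin"}, {"name": "ACME  corp", "city": "Berlin"}]
erp = [{"name": "Acme Corporation", "city": "Berlin"}, {"name": "Globex", "city": "Paris"}]
reference = {
    "acme corp": {"org_id": "ORG-1"},
    "acme corporation": {"org_id": "ORG-1"},
}

# 1) Eliminate obvious duplicates within each system.
crm_clean = dedupe(crm, name_city_key)
erp_clean = dedupe(erp, name_city_key)
# 2) Enrich the remaining records.
enriched = enrich(crm_clean + erp_clean, reference)
# 3) and 4) Identify and eliminate duplicates across the systems.
golden = dedupe(enriched, cross_system_key)
```

Here "Acme Corp" and "Acme Corporation" only collapse into one record because the enrichment step gave both the same identifier, which is the reason for placing step 2 before step 3.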

Former Member
0 Kudos

Hi all,

Enrichment is a misleading word: sometimes it also includes address validation, while others use it to mean "adding information that was not there originally".

Hence, if three different DQ functionalities are considered for CDI data (address validation, enrichment and de-duplication), address validation should be performed first, because it improves the quality of both de-duplication and enrichment (D&B, for example). Enrichment comes next because it also improves de-duplication quality, and de-duplication comes last.

If the cost of enrichment is a consideration, verify whether you pay per request or per successful response. If the charge is per request, the data should be as accurate as possible beforehand, so that you pay for fewer requests that fail to enrich bad data.
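As a small illustration of why address cleanup helps the later steps, the Python sketch below standardizes two spellings of the same address so that they compare equal. It is not real address validation (which would check against postal reference data); the abbreviation table is a tiny hypothetical example.

```python
import re

# Hypothetical, deliberately tiny abbreviation table.
# Note: "st" can also mean "saint"; a real validator handles such cases.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def standardize_address(address):
    """Lowercase, drop dots and commas, and expand known abbreviations."""
    tokens = re.sub(r"[.,]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


raw_a = "12 Main St."
raw_b = "12  MAIN Street"

matches_before = raw_a == raw_b
matches_after = standardize_address(raw_a) == standardize_address(raw_b)
```

The two raw strings do not match, but their standardized forms do, so a de-duplication or enrichment lookup keyed on the address has a better chance of succeeding.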
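The difference between the two billing models can be shown with a little arithmetic. In the Python sketch below all figures (10,000 requests, 7,000 successful responses, 50 cents per unit) are hypothetical, and the billing model names are made up for illustration.

```python
def enrichment_cost_cents(requests, successes, price_cents, billing):
    """Total charge in cents under a hypothetical billing model."""
    if successes > requests:
        raise ValueError("successes cannot exceed requests")
    if billing == "per_request":
        return requests * price_cents
    if billing == "per_success":
        return successes * price_cents
    raise ValueError(f"unknown billing model: {billing}")


# Hypothetical figures: 10,000 lookups, 7,000 of which return a match.
paid_per_request = enrichment_cost_cents(10_000, 7_000, 50, "per_request")
paid_per_success = enrichment_cost_cents(10_000, 7_000, 50, "per_success")

# Under per-request billing, this is what the failed lookups cost.
wasted_cents = paid_per_request - paid_per_success
```

With per-request billing, every record that is too poor to be matched still costs money, which is the argument for cleaning the data first.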

Edna

Former Member
0 Kudos

Hi Shankar,

Which process we should take first depends on various factors, such as the number of records, the charges of the web services used to enrich the data (like D&B), server capabilities, etc.

It is right to enrich the data first, but if the data set is too large, enriching it will be very costly, and maintaining it will also be a tedious task. So I think you should first de-duplicate the data and then go for the enrichment job.

Brad

Former Member
0 Kudos

Hi Shankar,

First we have to enrich the data.

Only after enriching the data will we be able to run de-duplication with precision. Otherwise the de-duplication will be running on low-quality data and won't be that effective.

Hope this clarifies,

+ An

Former Member
0 Kudos

Hi,

That's a very good question. It depends on how you are enriching the data.

Say you are going to use paid web services like D&B, which charge on a per-record basis: then you don't want to enrich your duplicate records and spend money on them! In this case, it makes sense to de-duplicate and then enrich.

But if you are using a packaged tool or free web services then, as An said, perform enrichment first and then de-duplicate for better results.
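A rough way to estimate what de-duplicating first would save with a per-record service is sketched below in Python. The names and the 50-cent price are hypothetical, and duplicates are detected only by exact match on the trimmed, lowercased name.

```python
def billable_records(names, dedupe_first):
    """Number of records sent to the paid service."""
    if dedupe_first:
        return len({name.strip().lower() for name in names})
    return len(names)


def spend_cents(names, price_cents, dedupe_first):
    """Enrichment spend in cents for a per-record charge."""
    return billable_records(names, dedupe_first) * price_cents


# Hypothetical customer names containing two obvious duplicates.
names = ["Acme Corp", "acme corp", "Globex", "Initech", "INITECH "]

enrich_all = spend_cents(names, 50, dedupe_first=False)
dedupe_then_enrich = spend_cents(names, 50, dedupe_first=True)
saved_cents = enrich_all - dedupe_then_enrich
```

If the enrichment is free or runs in a packaged tool, this saving is zero, and the better match quality from enriching first becomes the deciding factor.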

rgds,

Ketan

Do reward points if the reply was helpful

Former Member
0 Kudos

Thanks, Ketan, for adding value.

+ An