What is the complete procedure for finding duplicates in Excel data?

Former Member
0 Kudos

What is the complete procedure for finding duplicates in Excel data? Please give me a complete description, from start to end.

Answers (1)

Former Member
0 Kudos

Hello Chiru,

Unfortunately there's no "recipe" of the kind you are asking for. You must first assess the quality of your data.

The normal flow for finding duplicates in Excel (or in any source, for that matter) would be as follows:

1) Create the repository using MDM Console

2) In MDM Import Manager, open your source data and map it field by field into the MDM Repository

3) Choose one matching field or a combination of matching fields, that is, a criterion by which you know for sure that two records are the same. For instance, if you are talking about Employees, you can choose "Name, LastName, Date of Birth" as Matching Fields. In this example, if two records have the same Name, LastName and Date of Birth, you know for sure they are the same and MDM Import will deduplicate them.

4) Choose the Import actions, that is, what will happen in case of duplicate records, multiple matches, new record creation, etc.

5) Perform the Import

6) In MDM Data Manager, go into Matching mode

7) Create Transformations for the specific fields you identify as needing cleanup. In the Employee example, you should create a transformation for the Name field to cover cases such as someone writing "Alejandro." instead of "Alejandro" (without the period). Remove noise, transform, and try to standardize the relevant fields.

8) Create Rules that will be fed with your transformations. Once you have transformed the "Name" field, choose how many "points" to assign if a match is found after the transformation. For instance, if two records have the same Name, you can assign 10 points (because it's pretty common for two people to have the same name), but if the birth date is the same, assign 40 points, because it is less common for two people to share a birth date.

9) Create Strategies based on the Rules. A Strategy contains several Rules (one for Names, one for Birthdates and so on)

10) Select the records in the MDM Repository

11) Run the recently created Strategy

12) MDM will point out which records are most likely to be the same, based on the points awarded by your Rules.

13) Choose the records with enough points to be considered deduplication candidates and Merge them, choosing which characteristics will "survive"

14) You have now found and consolidated the duplicates
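
To make steps 7 to 12 more concrete, here is a minimal Python sketch of the idea behind transformations, rules and a strategy. It is only an illustration of the scoring logic, not the MDM API and not how MDM computes its scores internally. The sample records, the field names, the helper functions and the threshold of 50 are all assumptions made up for this example; only the 10 and 40 point values come from step 8.

```python
from itertools import combinations

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "Alejandro.", "last_name": "Lopez", "birth_date": "1980-05-17"},
    {"id": 2, "name": "alejandro", "last_name": "Lopez", "birth_date": "1980-05-17"},
    {"id": 3, "name": "Alejandro", "last_name": "Garcia", "birth_date": "1975-01-02"},
]


def transform_name(value):
    """Transformation: trim whitespace, drop periods, lowercase."""
    return value.strip().replace(".", "").lower()


# Rules: (field, transformation, points awarded when the transformed values match).
RULES = [
    ("name", transform_name, 10),
    ("birth_date", str.strip, 40),
]


def score_pair(a, b, rules=RULES):
    """Strategy: add up the points of every rule whose transformed values match."""
    return sum(
        points
        for field, transform, points in rules
        if transform(a[field]) == transform(b[field])
    )


def find_candidates(rows, threshold):
    """Compare every pair of records and keep those scoring at or above the threshold."""
    results = []
    for a, b in combinations(rows, 2):
        score = score_pair(a, b)
        if score >= threshold:
            results.append((a["id"], b["id"], score))
    # Highest scores first, as the most likely duplicates.
    return sorted(results, key=lambda r: r[2], reverse=True)


# The threshold is an assumed cut-off for "enough points".
candidates = find_candidates(records, threshold=50)
print(candidates)
```

Records 1 and 2 match on both the transformed name and the birth date, so they reach the threshold, while record 3 only shares the name and stays below it. In MDM you would still review the candidates and decide how to merge them, as in step 13.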

As you can see, you still have to create the rules that will tell you which records are duplicates. These come out of your data quality assessment and depend on your scenario (Employees, Materials, Products and so on).

You can find a detailed explanation of these steps in the Import Manager Reference Guide and the Data Manager Reference Guide.
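
As a rough idea of what such an assessment could look like before you touch MDM at all, the following Python sketch profiles a candidate set of matching fields: how many rows have all of those fields filled in, and how many of them share their key with at least one other row. The function name `profile_key`, the sample rows and the field names are hypothetical, and the normalization (trim and lowercase) is just one possible choice.

```python
from collections import Counter


def profile_key(rows, fields):
    """Return (rows with a complete key, rows whose key is shared with another row)."""
    keys = [
        tuple((row.get(f) or "").strip().lower() for f in fields)
        for row in rows
    ]
    # Ignore rows where any of the key fields is empty.
    complete = [k for k in keys if all(k)]
    counts = Counter(complete)
    duplicated = sum(n for n in counts.values() if n > 1)
    return len(complete), duplicated


# Hypothetical rows, as they might be read from an Excel sheet.
rows = [
    {"name": "Ana", "last_name": "Ruiz", "birth_date": "1990-03-01"},
    {"name": "ana", "last_name": "Ruiz", "birth_date": "1990-03-01"},
    {"name": "Ana", "last_name": "Ruiz", "birth_date": ""},
    {"name": "Luis", "last_name": "Mora", "birth_date": "1985-11-23"},
]

print(profile_key(rows, ["name", "last_name", "birth_date"]))
print(profile_key(rows, ["name"]))
```

Comparing the two results shows the trade-off: a key made of Name alone flags more rows as possible duplicates, while the full key is stricter but skips rows with a missing birth date. That kind of observation is what feeds the choice of matching fields and rule points.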

I hope this helps

Alejandro