on 12-07-2005 3:40 PM
Hello:
I've been working with MDM, and clients usually find the Import Manager features for de-duplication on a specific field useful. However, a big problem is how to find duplicate records that are not that evident, such as those that differ slightly in addresses, names, etc.
I usually suggest ideas such as sorting the data, free-form keyword search, etc. However, I was wondering if anyone has other ideas for data de-duplication in scenarios with, let's say, 10,000,000 records, where sorting and searching do not seem that appealing.
Thanks
Alejandro
Hi
How do the capabilities of MDM5.5 SP4 matching compare to something like a dedicated 3rd party dedup tool such as Trillium?
Do the features such as fuzzy matching or phonetic matching in some of the 3rd party tools also exist in the native MDM matching?
When would you need to consider one over the other?
Thanks
Lawrence
At the risk of delivering a shameless plug, there are third-party certified connectors for data quality and deduplication. My company, Trillium Software, makes one of them.
The tools can be used either as an interim step when migrating data, to cleanse data in place in SAP applications, or in real-time as users are entering names and address data.
http://www.trilliumsoftware.com/site/content/products/sap-data-migrations.asp
I should check back more often. Yes, Trillium does offer CRM 5.0 integration.
Most companies use it (metaphorically) to clean the pond first, then the river. So, a batch process can be used to clean source systems during instance consolidation or legacy migration to CRM 5. Then, you'd keep the rivers of data clean with the real-time integration.
Trillium has address validation/standardization AND fuzzy matching. Both complete in under a second. For real-time environments, we've developed it to be highly scalable, so if you have a big call center with many concurrent transactions, you can add servers to keep it fast.
Hope that helps.
Hello,
We developed a matching strategy using multiple iterations of the Import Manager. We have successfully found duplicates and have even created a small program that automatically merges records whose match rate is over a certain threshold.
The system works by normalizing the data while importing it into MDM and also tokenizing fields if required.
We run this as a batch process and create groups of similar items.
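To illustrate the normalize, tokenize, and group steps described above, here is a minimal Python sketch. It is not the actual program or the Import Manager logic; the function names (`normalize`, `tokenize`, `group_similar`) and the specific cleaning rules (lowercasing, stripping accents and punctuation, ignoring word order) are assumptions chosen for the example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace.

    These rules are illustrative assumptions, not the rules used in MDM.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", without_accents.lower())
    return " ".join(cleaned.split())


def tokenize(text: str) -> frozenset:
    """Split a normalized value into an order-independent set of tokens."""
    return frozenset(normalize(text).split())


def group_similar(records: dict) -> list:
    """Batch step: group record IDs whose values share the same token set."""
    groups = defaultdict(list)
    for record_id, value in records.items():
        groups[tokenize(value)].append(record_id)
    # Only groups with more than one member are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = {1: "Müller, Hans", 2: "hans muller", 3: "Hans  Meier"}
    print(group_similar(sample))
```

Grouping on an exact token set only catches records that become identical after cleaning; near matches (typos, abbreviations) would still need a similarity score on top of this.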
Contact me for more information if required.
Stanley Levin.
Hi Nicolas and Alejandro,
We are currently completing and documenting the process. I would like to wait about two weeks and then hold a complete session where I can show you the process and share some documentation so that you can run it on your data.
I will update you as soon as we are ready.
Regards,
Stanley.
Hi Stanley,
I am new to MDM. We are also looking for a process to identify the duplicates and determine the data quality.
Would you share the process that you have developed and your experiences regarding data cleansing? How can I contact you?
My email is abhay_mhatre@colpal.com
Thanks and Regards,
Abhay
Hi Alejandro,
I guess the best way to find duplicates is to work with a strategy that calculates scores regarding the similarity of records.
You could then define an upper threshold, so that the system automatically merges records whose score is higher than that threshold. If the score is below a defined lower threshold, the compared records are not duplicates. For scores between the lower and the upper threshold, user interaction is necessary to decide. That, of course, could be a lot of work.
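As a rough illustration of the two-threshold idea, here is a small Python sketch. It is not the MDM 3.00 matching strategy; the threshold values, the function names, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; real values would be tuned on your own data.
UPPER_THRESHOLD = 0.90
LOWER_THRESHOLD = 0.60


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical when case is ignored."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(score: float,
             lower: float = LOWER_THRESHOLD,
             upper: float = UPPER_THRESHOLD) -> str:
    """Map a similarity score to one of three outcomes."""
    if score > upper:
        return "merge"      # confident duplicate: merge automatically
    if score < lower:
        return "distinct"   # confident non-duplicate
    return "review"         # in between: a user has to decide


if __name__ == "__main__":
    pairs = [
        ("12 Main Street", "12 main street"),
        ("12 Main Street", "12 Main St."),
        ("12 Main Street", "4 Oak Avenue"),
    ]
    for left, right in pairs:
        score = similarity(left, right)
        print(f"{left!r} vs {right!r}: {score:.2f} -> {classify(score)}")
```

The width of the band between the two thresholds controls how much manual review is needed: a narrow band means less work but more risk of wrong automatic decisions.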
The matching strategies that were delivered for MDM 3.00 give a good example on how such a strategy could work.
There is some excellent documentation available for these matching strategies in the Service Marketplace:
https://websmp204.sap-ag.de/instguides -> SAP Netweaver ->
Release 04 -> Operations -> Component SAP MDM -> MDM 3.00 - Operations Guides
Br
Lars
Hi Lars,
We implemented exactly this scenario with MDM 3.0.
Now the customer wants the same functionality in MDM 5.5.
Since there is no Content Integrator to define any duplication logic, and since the de-duplication part promised for SP03 was not delivered, we have to find a workaround to mirror the old functionality.
Do you have any advice on how to define a threshold with standard MDM 5.5?
Regards
Nico