on 05-07-2009 7:48 AM
Hi Experts,
My client may want to use MDM Matching mode for data cleansing. I have several questions:
1> Does the Token check box on the 'Transformation' tab have anything to do with the 'Token Equal' rule? For example,
For a Token Equal rule on the 'Account Name' field, I've added a) Account Name and b) a transformation from 'Corp' to 'Company'. The matching results are the same whether I check or uncheck the transformation's Token check box. Why?
2> Under the same conditions, if record A's account name = Wonderland Corp and record B's account name = Wonderland Company, and I apply the above rule to them, should the result be the full success score? I get a partial success score even with the transformation from Corp to Company. Why?
3> Performance concern: if we need to run a matching strategy daily for dozens of records against about a million existing records, roughly how long will it take with the simplest possible rule, a single-field Token Equal rule?
Thank you and best regards!
Angela
Hi Angela,
Yes. You have to create matching rules (on other fields) for the fields that you feel are likely to carry duplicates. If you define a transformation such as "Corp" to "Company" on the Transformation tab and check the Token check box, only the whole token Corp is changed to Company.
If you leave it unchecked, every occurrence of the string is replaced. Suppose your Account Name field contains the value
AppleCorporation Corp
This changes to AppleCompanyoration Company (every "Corp" becomes "Company", even inside a longer word), which might not be desirable. So check the Token check box.
Then, for the Account Name field, choose Token Equals in the matching rules.
Also, Token Equals performs a fuzzy (token-based) match and hence can give a partial success score.
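To make the difference between the two settings concrete, here is a minimal Python sketch. It is not MDM code: `apply_transformation` is a hypothetical helper, and splitting on whitespace is only an assumption about what counts as a token.

```python
def apply_transformation(value, source, target, token=True):
    """Illustrative only: mimic a 'Corp' -> 'Company' transformation.

    token=True  replaces only whole tokens equal to `source`
                (assumed here to be whitespace-separated words).
    token=False replaces every occurrence of `source`, even inside a word.
    """
    if token:
        return " ".join(target if tok == source else tok
                        for tok in value.split())
    return value.replace(source, target)


name = "AppleCorporation Corp"
checked = apply_transformation(name, "Corp", "Company", token=True)
unchecked = apply_transformation(name, "Corp", "Company", token=False)
print(checked)    # AppleCorporation Company
print(unchecked)  # AppleCompanyoration Company
```

For a value like Wonderland Corp, where Corp only appears as a separate word, both settings produce the same output, which would be consistent with seeing no difference in the matching results for such data.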
Here is an extract from Data Manager reference guide:
Equals is faster than Token Equals, which must perform more steps
- Uses the keyword parsing, which enables matching to use existing keyword indexes
- Score = Success * Number of Unique Matching Tokens / Total Number of Unique Tokens
NOTE ►► To greatly improve matching speeds, enable sort indexing on fields used for Equals searches and enable keyword indexing on fields used for Token Equals operations
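As a rough illustration of the score formula in the extract, here is a small Python sketch. The assumptions are mine, not the guide's: tokens are lowercased whitespace-separated words, "total number of unique tokens" is taken as the union of both values' tokens, and the success score of 10 is a made-up value.

```python
def tokens(value):
    # Assumption: a token is a lowercased, whitespace-separated word.
    return set(value.lower().split())


def token_equal_score(a, b, success):
    """Score = Success * unique matching tokens / total unique tokens."""
    ta, tb = tokens(a), tokens(b)
    total = ta | tb          # assumed: unique tokens across both values
    if not total:
        return 0.0
    return success * len(ta & tb) / len(total)


SUCCESS = 10  # hypothetical success score configured on the rule

# Values compared as-is: only "wonderland" matches out of 3 unique tokens.
raw = token_equal_score("Wonderland Corp", "Wonderland Company", SUCCESS)

# Values compared after Corp -> Company has been applied to both sides.
normalized = token_equal_score("Wonderland Company", "Wonderland Company", SUCCESS)

print(raw, normalized)
```

Under these assumptions, the untransformed pair gets one third of the success score and the transformed pair gets the full score. So if a partial score still shows up, one thing to check is whether the rule is really comparing the transformed values rather than the raw field.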
Also, if you have selected multiple rules in your strategy, the matching result takes all of the rules into account.
Hope it helps.
Thanks and Regards
Nitin Jain