on 06-09-2016 11:04 AM
Hi All
I have a few questions about the results displayed by Uniqueness Profiling. I have a table holding customer-related information. If I select
Address, City, Combined_Names, Company_Name and Country, the output is displayed as 100% unique:
1) Why are the fields Email, Phone1, Phone2, State and Web highlighted in black? In my original data, the state codes are repeated within the "State" column, the country codes of the phone numbers are repeated in Phone1 and Phone2, the "http://www" prefix is repeated in Web, and so on.
2) For comparison, is the complete field considered, or only part of the field, when certain repetitions are present:
The above shows a part of my input file. Can the records ending in "st" and "Ave" under "Address", the records ending in "ESQ" under "Company_Name", and the records containing "Mr." or "Mrs." under "Combined_Names" be considered non-unique, because that part of the value is repeated?
3) Why is the result displayed as unique when all of the columns (Address, City, Combined_Names, Company_Name, Country) are selected, whereas selecting only City and Country also produces non-unique results?
1/. From the documentation, SAP Information Steward User Guide, section 2.9.6 View Uniqueness profile results:
"
For both the unique and non-unique data in the bottom pane, the columns that are part of the uniqueness criteria are highlighted.
"
This might be a bug, because the wrong columns are in bold. I get the same behaviour, but never noticed it.
2/. Uniqueness is based on full columns only.
3/. Uniqueness is based on the combination of columns you specify. In your case, that means every combination of Address, City, Combined_Names, Company_Name and Country is unique, but there are records with the same combination of City and Country (companies in the same city, obviously).
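To illustrate points 2/. and 3/., here is a minimal Python sketch of uniqueness over a combination of full column values. It is not Information Steward code: the sample rows, the helper name `percent_unique`, and the assumption that a row counts as unique when its combination occurs exactly once are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows; not taken from the original poster's table.
rows = [
    {"Address": "12 Main St", "City": "Fairbanks", "Company_Name": "Acme, ESQ", "Country": "US"},
    {"Address": "98 Oak Ave", "City": "Fairbanks", "Company_Name": "Borealis, ESQ", "Country": "US"},
    {"Address": "5 Pine St", "City": "Anchorage", "Company_Name": "Cook Inlet Co", "Country": "US"},
]

def percent_unique(rows, columns):
    """Percentage of rows whose full-value combination over `columns` occurs exactly once."""
    # Whole column values are compared; a shared suffix such as "ESQ" or "St" does not matter.
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    unique_rows = sum(1 for key in keys if counts[key] == 1)
    return 100.0 * unique_rows / len(rows)

all_cols = percent_unique(rows, ["Address", "City", "Company_Name", "Country"])
city_country = percent_unique(rows, ["City", "Country"])
print(all_cols, round(city_country, 2))
```

With these sample rows the wider key is fully unique, while City plus Country is not, because two companies share the same city and country.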
Hi Dirk
I have a few further questions. Attached below are the screenshots of the outputs for both cases:
In this case some of the data matches exactly, whereas in a few records part of one record (or an entire record) is contained in another record, like:
Fairbanks | Fairbanks North Star |
How can we visualize such data? Are they repetitive or unique? Apart from these, the data sets highlighted in orange are actually unique, yet they have been displayed. Only duplicate records should have been displayed, right?
Regards
Moumita
1/. This cannot be done by profiling. Use Rules for identifying those records.
Note: don't confuse rows or records (horizontal) with columns (vertical).
2/. Are you sure this is the duplicate record output? If so, that means there are at least 2 companies in each location. IS always shows one record only.
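As a rough illustration of the kind of check a rule would have to perform for point 1/., the following Python sketch classifies two values as an exact match, a partial (contained) match, or different. This is generic logic only, not Information Steward rule syntax; the function name `classify` and the sample values are hypothetical.

```python
# Hypothetical value pairs; only the first one appears in this thread.
pairs = [
    ("Fairbanks", "Fairbanks North Star"),
    ("Anchorage", "Anchorage"),
    ("Juneau", "Sitka"),
]

def classify(a, b):
    """Return 'exact', 'partial' (one value contains the other) or 'different'."""
    a_n, b_n = a.strip().lower(), b.strip().lower()
    if a_n == b_n:
        return "exact"
    # Guard against empty strings, which are trivially contained in any string.
    if a_n and b_n and (a_n in b_n or b_n in a_n):
        return "partial"
    return "different"

labels = [classify(a, b) for a, b in pairs]
print(labels)
```

Uniqueness profiling only performs the "exact" comparison on whole values, which is why the partial case needs a separate rule.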
Hi Dirk
Kindly excuse me, there has been a silly misunderstanding of the results on my end.
I did not notice the "Duplicate Count", which displays how many times the records are repeated in this case, and my doubts are clear! Information Steward looks for combinations of values that repeat, e.g.:
Fairbanks | Fairbanks North Star |
i.e. the occurrences of the above combination are counted across the selected columns and the frequency of repetition is recorded. Here the above combination is repeated twice across the entire table.
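A minimal Python sketch of that counting, assuming two hypothetical columns (City and a second column holding the longer name) and made-up rows; it only mirrors the idea of a duplicate count and is not how Information Steward is implemented:

```python
from collections import Counter

# Hypothetical (City, second column) value pairs, one tuple per table row.
rows = [
    ("Fairbanks", "Fairbanks North Star"),
    ("Anchorage", "Anchorage"),
    ("Fairbanks", "Fairbanks North Star"),
    ("Juneau", "Juneau"),
]

# Count how often each full combination occurs in the table.
counts = Counter(rows)

# Keep one entry per repeated combination, together with its duplicate count.
duplicates = {combo: n for combo, n in counts.items() if n > 1}
print(duplicates)
```

Each repeated combination appears once in the result, with the number of rows that share it.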
Thanks a lot for clarifying this.
Regards
Moumita