
Questions about Uniqueness Profiling in SAP BOIS

former_member188628
Participant

Hi all,

I have a few questions about the Uniqueness Profiling results being displayed. I have a table holding customer-related information. If I select Address, City, Combined_Names, Company_Name and Country, the output is displayed as 100% unique.

1) Why are the fields Email, Phone1, Phone2, State and Web highlighted in black? In my original data, the state codes are repeated within the same column, "State"; the country codes of the phone numbers are repeated in Phone1 and Phone2; and there is repetition in Web ("http://www"), etc.


2) For the comparison, is the complete field considered, or is part of a field also considered when certain parts repeat?

The above shows part of my input file. Under the "Address" field, can the records ending with "St" and "Ave", the records ending with "ESQ" under "Company_Name", and the records containing "Mr." and "Mrs." under "Combined_Names" be considered non-unique (because those parts repeat across records)?

3) Why is the result unique if we take all the columns (Address, City, Combined_Names, Company_Name, Country), whereas if we take only City and Country the output shows non-unique records as well?


Accepted Solutions (1)

former_member187605
Active Contributor

1/. From the documentation (SAP Information Steward User Guide, section 2.9.6, View Uniqueness profile results):

"For both the unique and non-unique data in the bottom pane, the columns that are part of the uniqueness criteria are highlighted."

This might be a bug, because the wrong columns are in bold. I get the same behaviour but had never noticed it.

2/. Uniqueness is based on full column values only; parts of a value are not compared.

3/. Uniqueness is based on the combination of columns you specify. For you, that means every combination of Address, City, Combined_Names, Company_Name and Country in your table is unique, but there are records that share the same combination of City and Country (companies in the same city, obviously).
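Points 2/ and 3/ can be illustrated with a minimal Python sketch. It assumes a row counts as unique when its combination of whole column values occurs exactly once; Information Steward's exact percentage formula may differ. The rows and the `uniqueness` helper are hypothetical, only the column names come from the thread:

```python
from collections import Counter

# Hypothetical customer rows: column names follow the thread, values are made up.
rows = [
    {"Address": "12 Main St", "Company_Name": "Acme Ltd", "City": "Fairbanks", "Country": "USA"},
    {"Address": "40 Oak St", "Company_Name": "Birch Co", "City": "Fairbanks", "Country": "USA"},
    {"Address": "7 Pine Ave", "Company_Name": "Cedar Inc", "City": "Anchorage", "Country": "USA"},
]


def uniqueness(rows, columns):
    """Fraction of rows whose combination of whole values in `columns` occurs exactly once."""
    # Each key is a tuple of complete column values; substrings such as a
    # trailing "St" play no role in the comparison.
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


print(uniqueness(rows, ["Address"]))  # 1.0 (shared "St" suffix is ignored)
print(uniqueness(rows, ["Address", "Company_Name", "City", "Country"]))  # 1.0
print(uniqueness(rows, ["City", "Country"]))  # 0.3333333333333333
```

Adding columns to the combination can only split groups apart, never merge them, which is why the five-column selection can be 100% unique while City and Country alone are not.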

former_member188628
Participant

Hi Dirk,

I have a few further questions. Below are the screenshots of the outputs for both cases:

In this case some data matches exactly, whereas in a few records part of one record (or the entire record) is contained in another record, for example:

Fairbanks | Fairbanks North Star

How can we visualize such data? Are these values repetitive or unique?

Apart from these, the data sets highlighted in orange are actually unique, yet they have been displayed. Only duplicate records should have been displayed, right?

Regards

Moumita

former_member187605
Active Contributor

1/. This cannot be done by profiling. Use rules to identify those records.

Note: don't confuse rows or records (horizontal) with columns (vertical).

2/. Are you sure this is the duplicate record output? If so, there are at least 2 companies in each location. IS always shows only one record for each duplicate combination.
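Point 1/ refers to Information Steward rules, whose own expression syntax is not reproduced here. As an illustration of the logic only, the Python sketch below flags rows where one column's value is contained, partly or entirely, in another column of the same row. The column names `City` and `Region` and the `contained_in` helper are hypothetical:

```python
# Hypothetical rows: "City" and "Region" are illustrative column names, not from the thread.
rows = [
    {"City": "Fairbanks", "Region": "Fairbanks North Star"},
    {"City": "Anchorage", "Region": "Anchorage"},
    {"City": "Juneau", "Region": "Southeast"},
]


def contained_in(row, inner_col, outer_col):
    """Return True if the value of inner_col appears inside the value of outer_col.

    The check is case-insensitive and also matches when both values are equal.
    Empty inner values are never reported as contained.
    """
    inner = row[inner_col].strip().lower()
    outer = row[outer_col].strip().lower()
    return bool(inner) and inner in outer


flagged = [row["City"] for row in rows if contained_in(row, "City", "Region")]
print(flagged)  # ['Fairbanks', 'Anchorage']
```

This is a row-level check across two columns, which is exactly what a uniqueness profile (a column-level count of repeated values) does not do.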

former_member188628
Participant

Hi Dirk,

Kindly excuse me, there was a silly misunderstanding of the results on my end.

I had not noticed the "Duplicate Count", which displays how many times the records are repeated in this case, and now my doubts are clear! Information Steward looks for combinations of values that repeat, for example:

Fairbanks | Fairbanks North Star

That is, each combination of values across the selected columns is searched for throughout the table, and its frequency of repetition is recorded. Here, the above combination is repeated twice across the entire table.

Thanks a lot for clarifying this.

Regards

Moumita
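The duplicate count described in this thread can be sketched in Python, assuming it is simply the number of rows that share the same combination of values in the selected columns; how IS computes and displays it internally is not confirmed here. Each tuple stands in for the selected column values of one row; only the Fairbanks pair comes from the thread:

```python
from collections import Counter

# Hypothetical rows as tuples of the selected column values. Only the
# Fairbanks pair comes from the thread; the other rows are made up.
rows = [
    ("Fairbanks", "Fairbanks North Star"),
    ("Anchorage", "Anchorage"),
    ("Fairbanks", "Fairbanks North Star"),
    ("Juneau", "Juneau"),
]

# Count how often each complete combination occurs in the whole table.
counts = Counter(rows)

# Keep one entry per repeated combination with its count, so each duplicate
# group is listed once rather than once per row.
duplicate_report = {combo: n for combo, n in counts.items() if n > 1}

for combo, n in duplicate_report.items():
    print(" | ".join(combo), "-> duplicate count:", n)
```

Run as written, it prints a single line for the Fairbanks pair with a duplicate count of 2, matching the one-record-per-combination view Dirk describes.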
