
Input additional records are not parsed by Cleansing Package Builder

venkataramana_paidi
Contributor

Hi,

I created a cleansing package in Cleansing Package Builder using the sample data below.

large mushrooms sausage pepperoni stuffed crust

medium deluxe thin crust

small vegetarian handtossed

personal hamburger bacon cheese pan

large cheese thin crust

After creating the cleansing package, I generated the ATL file and imported it into Data Services.

I then added one more record, so my input is now as shown below:

large mushrooms sausage pepperoni stuffed crust

medium deluxe thin crust

small vegetarian handtossed

personal hamburger bacon cheese pan

large cheese thin crust

handtossed sausage large

After execution, the first 5 records were parsed correctly, but the last record was not parsed. Instead, the whole record was treated as extra and generated the quality codes R001 and I901.

In the last record, 'handtossed' already appears in the 3rd record and 'large' in the 1st record, so I expected only 'sausage' to be reported as extra/additional.

Why is the last record not parsed by the transform?

How can I fix this issue?

Thanks & Regards,

Ramana.

Accepted Solutions (1)



Hi Ramana,

What are the rules in this custom cleansing package? Since the record is not parsing, I am guessing you do not have a rule that recognizes the pattern of the new record you added. To parse that record, you will need a rule that recognizes the pattern Crust - Topping - Size. Adding an appropriate rule should allow the record to parse. Thanks.
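The diagnosis above can be illustrated with a minimal plain-Python sketch (not SAP Data Services code): every word of the new record can be known to the dictionary, yet the record still fails to parse because its sequence of attributes matches no rule. The dictionary, the rule set, and the function names are hypothetical and only borrow words from the sample data.

```python
# Minimal sketch, not SAP Data Services code: a record parses only when
# the sequence of attributes of its words matches one of the rules.

# Hypothetical dictionary entries taken from the pizza sample data.
DICTIONARY = {
    "large": "SIZE",
    "handtossed": "CRUST",
    "pan": "CRUST",
    "sausage": "TOPPING",
    "cheese": "TOPPING",
}


def attribute_pattern(record):
    """Classify each word; words missing from the dictionary become None."""
    return tuple(DICTIONARY.get(word) for word in record.lower().split())


def parses(record, rules):
    """True only when the whole attribute pattern matches a known rule."""
    return attribute_pattern(record) in rules


# Hypothetical rule set that only knows the size-topping-crust order.
rules = {("SIZE", "TOPPING", "CRUST")}
record = "handtossed sausage large"

print(attribute_pattern(record))  # ('CRUST', 'TOPPING', 'SIZE')
print(parses(record, rules))      # False: every word is known, but no rule fits

rules.add(("CRUST", "TOPPING", "SIZE"))  # add a rule for the new order
print(parses(record, rules))      # True
```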

Doug

venkataramana_paidi
Contributor

Hi Doug,

How can I add a new rule to the cleansing package? I tried Advanced Mode, which has a Create Custom Rule option, but it only lets me create rules using user-defined patterns.

How can I create a custom rule using existing attributes?

Please advise.

Thanks & Regards,

Ramana.


Hi Ramana,

If you are using Advanced Mode and click the category context menu and choose Add Rule, that should add a new blank rule to the package. You will need to manually add the rule definition and rule action information to parse the data you want. For example, something like the following:

Rule Definition:

CRUST+
TOPPING+
SIZE;

Rule Action:

action = PIZZA;
PIZZA = 1 : CRUST : 1;
PIZZA = 1 : TOPPING : 2;
PIZZA = 1 : SIZE : 3;
format = PIZZA : TOPPING : 2;
format = PIZZA : CRUST : 1;
format = PIZZA : SIZE : 3;
format = PIZZA : PIZZA : 2 + " " + 3 + " " + 1;
end_action

This would parse 'handtossed sausage large' assuming those three words are in the cleansing package with the appropriate attribute.
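To make the rule action easier to follow, here is a plain-Python sketch of what it appears to produce for 'handtossed sausage large'. It rests on an assumption that the source does not confirm: the trailing numbers are read as the rule's components in order (1 = CRUST, 2 = TOPPING, 3 = SIZE), so the PIZZA format joins components 2, 3, and 1 with spaces. The function name is hypothetical and this is not how Data Services executes rules internally.

```python
# Sketch of the rule action above in plain Python. ASSUMPTION: the trailing
# numbers are read as the rule's components in order (1 = CRUST,
# 2 = TOPPING, 3 = SIZE); this is an interpretation, not SAP documentation.


def apply_pizza_action(components):
    """components: matched text in rule order, i.e. crust, topping, size."""
    if len(components) != 3:
        raise ValueError("expected exactly crust, topping and size")
    # Number the components 1..3 the way the rule action refers to them.
    part = dict(enumerate(components, start=1))
    return {
        "CRUST": part[1],    # format = PIZZA : CRUST : 1;
        "TOPPING": part[2],  # format = PIZZA : TOPPING : 2;
        "SIZE": part[3],     # format = PIZZA : SIZE : 3;
        # format = PIZZA : PIZZA : 2 + " " + 3 + " " + 1;
        "PIZZA": part[2] + " " + part[3] + " " + part[1],
    }


print(apply_pizza_action(["handtossed", "sausage", "large"]))
# {'CRUST': 'handtossed', 'TOPPING': 'sausage', 'SIZE': 'large', 'PIZZA': 'sausage large handtossed'}
```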

You could also use Design mode and import a new sample file that has the example records you want to parse. You can drag new entries to the appropriate attribute and rules will be autogenerated. I hope that helps.

Doug

venkataramana_paidi
Contributor

Hi Doug,

Thanks for your support. It is working fine now.

Thanks & Regards,

Ramana.

Answers (2)


Former Member

Hi Ramana,

I am really thankful for such a clear example and explanation.

The first step in parsing is to understand the data, its delimiters (data separators), and its patterns, so that we can write the rules accordingly and extract accurate data.

As I understand it, if the source contains 3 kinds of patterns, we need to maintain 3 rules in order to extract the data.
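One way to estimate how many rules a source needs is to list the distinct attribute patterns it contains. The sketch below assumes each record has already been classified word by word (the tags are written by hand from the pizza sample, and the names are hypothetical) and collapses runs of the same attribute, so several toppings count as one slot. Under that assumption the six sample records reduce to two patterns.

```python
from itertools import groupby

# ASSUMPTION: attribute tags were assigned by hand from the pizza sample;
# a real cleansing package would classify the words itself.
CLASSIFIED = [
    ["SIZE", "TOPPING", "TOPPING", "TOPPING", "CRUST"],  # large mushrooms sausage pepperoni stuffed crust
    ["SIZE", "TOPPING", "CRUST"],                        # medium deluxe thin crust
    ["SIZE", "TOPPING", "CRUST"],                        # small vegetarian handtossed
    ["SIZE", "TOPPING", "TOPPING", "TOPPING", "CRUST"],  # personal hamburger bacon cheese pan
    ["SIZE", "TOPPING", "CRUST"],                        # large cheese thin crust
    ["CRUST", "TOPPING", "SIZE"],                        # handtossed sausage large
]


def pattern(tags):
    """Collapse runs of the same attribute into one slot (several toppings = one slot)."""
    return tuple(tag for tag, _ in groupby(tags))


patterns = {pattern(tags) for tags in CLASSIFIED}
print(len(patterns))  # 2
for p in sorted(patterns):
    print(" ".join(p))
# CRUST TOPPING SIZE
# SIZE TOPPING CRUST
```

Whether two orders need two separate rules, or one rule can cover both, depends on the rule syntax of the cleansing package; the count here is only a starting point.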

Thanks!!

venkataramana_paidi
Contributor

Yes, Lavanya.

The number of rule files depends on the number of patterns in the input data.

Thanks & Regards,

Ramana.

Former Member

Hi Ramana/Doug,

I am new to Data Quality. Can you please explain what parsing is and why we use it?

Thanks,

Lavanya

venkataramana_paidi
Contributor

Hi Lavanya,

Parsing is part of address cleansing, data cleansing, and text data processing. Parsing works like a lookup.

If you have one source field, parsing extracts particular values from that source field and populates them into the target fields.

You can use my example above to understand parsing.

Let me explain how it works using the same example.

This is my input data with pizza information:

large mushrooms sausage pepperoni stuffed crust

medium deluxe thin crust

small vegetarian handtossed

personal hamburger bacon cheese pan

large cheese thin crust

handtossed sausage large

Each line is a description held in a single field.

Here, large, medium, small, and personal are pizza sizes.

Thin crust, handtossed, pan, and stuffed crust are pizza crusts.

Mushrooms, sausage, pepperoni, deluxe, vegetarian, hamburger, bacon, and cheese are toppings.

We want to extract these values into separate size, crust, and topping fields.

In normal data integration, we could prepare lookup values with two fields, like below:

TYPE,VALUE
SIZE,large
SIZE,medium
SIZE,small
CRUST,thin crust
CRUST,handtossed
TOPPING,cheese
...
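The two-field lookup can be sketched as a Python dict. This is a minimal illustration, assuming only the entries listed above and a word-by-word lookup after splitting on spaces (the function name is hypothetical). It shows the weakness that a plain lookup has with multi-word values such as 'thin crust'.

```python
# Sketch of the TYPE,VALUE lookup above as a dict (VALUE -> TYPE).
# ASSUMPTION: only the entries listed above; matching is case-insensitive.
LOOKUP = {
    "large": "SIZE",
    "medium": "SIZE",
    "small": "SIZE",
    "thin crust": "CRUST",
    "handtossed": "CRUST",
    "cheese": "TOPPING",
}


def naive_lookup(description):
    """Split on spaces and look up each word on its own."""
    return [(word, LOOKUP.get(word.lower())) for word in description.split()]


print(naive_lookup("large cheese thin crust"))
# [('large', 'SIZE'), ('cheese', 'TOPPING'), ('thin', None), ('crust', None)]
```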

First, the description combines all of the values, so we have to do element analysis of the source, meaning we have to break the description into its individual values. Even breaking it up is difficult, because the splitting criteria are hard to maintain. If we treat space as the delimiter, how can we extract values that contain spaces, such as thin crust and stuffed crust? Sometimes we also have multiple values for a single field: in the first description, mushrooms, pepperoni, and sausage are three different toppings.
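A common way around the space-delimiter problem is longest-match tokenization: at each position, try the longest known phrase first, then shorter ones. The sketch below is a swapped-in illustration of that general technique, not a description of how the Data Quality transform works internally; the dictionary and function name are hypothetical and built from the pizza sample.

```python
# Sketch: greedy longest-match tokenization so multi-word values survive.
# ASSUMPTION: hypothetical dictionary built from the pizza sample.
DICTIONARY = {
    "large": "SIZE", "medium": "SIZE", "small": "SIZE", "personal": "SIZE",
    "thin crust": "CRUST", "stuffed crust": "CRUST",
    "handtossed": "CRUST", "pan": "CRUST",
    "mushrooms": "TOPPING", "sausage": "TOPPING", "pepperoni": "TOPPING",
    "deluxe": "TOPPING", "vegetarian": "TOPPING", "hamburger": "TOPPING",
    "bacon": "TOPPING", "cheese": "TOPPING",
}
# Longest phrase in the dictionary, measured in words.
MAX_WORDS = max(len(key.split()) for key in DICTIONARY)


def tokenize(description):
    words = description.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in DICTIONARY:
                tokens.append((phrase, DICTIONARY[phrase]))
                i += n
                break
        else:
            tokens.append((words[i], None))  # unknown word
            i += 1
    return tokens


print(tokenize("large mushrooms sausage pepperoni stuffed crust"))
# [('large', 'SIZE'), ('mushrooms', 'TOPPING'), ('sausage', 'TOPPING'),
#  ('pepperoni', 'TOPPING'), ('stuffed crust', 'CRUST')]
```

The three toppings in the first description come out as three separate TOPPING tokens, which is exactly the "multiple values for a single field" case that the rules then have to handle.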

Now consider Data Quality.

In this case we use Data Quality. Here too we need to parse the values based on some criteria, or rules.

The first 5 descriptions all have the same order: size, topping, and crust. The last description has the order crust, topping, and size.

So we have to build two rules to parse the values, because the order differs. Multiple values (such as several toppings in one description) must also be handled in the rule file. The Data Quality transform parses the input based on the rule file.
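The "one rule per order, with room for multiple toppings" idea can be sketched in plain Python with one regular expression per word order, matched against a string of attribute codes. Here the regex '+' means "one or more toppings"; that is a Python regex convention and not necessarily what '+' means in the cleansing package rule syntax. The single-word dictionary, the codes, and the EXTRA output are hypothetical (multi-word crusts such as 'thin crust' are left out to keep it short).

```python
import re

# ASSUMPTION: hypothetical single-word dictionary from the pizza sample.
DICTIONARY = {
    "small": "SIZE", "personal": "SIZE", "large": "SIZE",
    "handtossed": "CRUST", "pan": "CRUST",
    "vegetarian": "TOPPING", "hamburger": "TOPPING", "bacon": "TOPPING",
    "cheese": "TOPPING", "sausage": "TOPPING",
}
CODES = {"SIZE": "S", "TOPPING": "T", "CRUST": "C"}

# Hypothetical rules, one per word order; T+ allows one or more toppings.
RULES = [re.compile("ST+C"), re.compile("CT+S")]


def parse(description):
    words = description.lower().split()
    tags = [DICTIONARY.get(word) for word in words]
    code = "".join(CODES.get(tag, "?") for tag in tags)  # '?' marks an unknown word
    if not any(rule.fullmatch(code) for rule in RULES):
        return {"EXTRA": description}  # no rule fits, so nothing is parsed
    fields = {}
    for word, tag in zip(words, tags):
        fields.setdefault(tag, []).append(word)
    return {tag: " ".join(values) for tag, values in fields.items()}


print(parse("personal hamburger bacon cheese pan"))
# {'SIZE': 'personal', 'TOPPING': 'hamburger bacon cheese', 'CRUST': 'pan'}
print(parse("handtossed sausage large"))
# {'CRUST': 'handtossed', 'TOPPING': 'sausage', 'SIZE': 'large'}
```

With only the first rule in RULES, 'handtossed sausage large' would come back as EXTRA, which mirrors the behavior described at the start of this thread.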

Address cleansing and text data processing work on the same principle.

I hope this helps you understand parsing.

Thanks & Regards,

Ramana.