cancel
Showing results for 
Search instead for 
Did you mean: 

Cleaning Up Dataset Using Sort?

Former Member
0 Kudos

Problem:

Source Message contains all employee records. An employee is unique by the combination of EMPLOYEEGUID__C and EMPLOYEENUMBER. This is because an employee who started as a contractor will have both an FTE number and Contractor number, however, only one EMPLOYEEGUID__C can have HRACTIVE__C = TRUE.

I have been asked to provide an output that contains Active employee records and Inactive employee records. The target system primary key is only the EMPLOYEEGUID__C field, so I need to remove source EmployeeInformation elements that contain an EMPLOYEEGUID__C for an inactive EMPLOYEENUMBER which is also the same for an Active EMPLOYEENUMBER.

Example Scenario (Note, target field names used for example)::

<EmployeeInformation>

<EMPLOYEEGUID__C>1000</EMPLOYEEGUID__C>

<EMPLOYEENUMBER>90010</EMPLOYEENUMBER>

<HRACTIVE__C>FALSE</HRACTIVE__C>

</EmployeeInformation>

<EmployeeInformation>

<EMPLOYEEGUID__C>1001</EMPLOYEEGUID__C>

<EMPLOYEENUMBER>22</EMPLOYEENUMBER>

<HRACTIVE__C>TRUE</HRACTIVE__C>

</EmployeeInformation>

<EmployeeInformation>

<EMPLOYEEGUID__C>1000</EMPLOYEEGUID__C>

<EMPLOYEENUMBER>28</EMPLOYEENUMBER>

<HRACTIVE__C>TRUE</HRACTIVE__C>

</EmployeeInformation>

So in this example the result should only have two records: the EMPLOYEEGUID__C = 1000 has two employee numbers, the first one being inactive (90010) and the second is active (28).

The source records are not sorted by any of these three fields. So the active record for a GUID may not be first or last. Also a single GUID may have more than 2 employee numbers, only 1 active.

[Example of Mapping Structure|http://i53.tinypic.com/2lnvdk5.jpg]

I tried using Sort and SortByKey, but am struggling to figure out how to get all fields of an element sorted together. Also, not sure how to sort by two fields, EMPLOYEEGUID__C and Then by HRACTIVE__C. Can this be done with graphical mapping? Or do I need to use java or multiple mappings? Any help is appreciated.

Accepted Solutions (1)

Accepted Solutions (1)

stefan_grube
Active Contributor
0 Kudos

I recommend using XSLT as pre-mapping step to sort the fields. In graphical mapping you have to sort each field individually with sortbykey function.which makes the mapping very complex.

If you want to sort by two fields, you can do this with concat function.

Check my blog for sort and sortbykey:

/people/stefan.grube/blog/2005/12/29/new-functions-in-the-graphical-mapping-tool-xi-30-sp13

Former Member
0 Kudos

Stefan, I did see your blog prior to posting my question and was not sure if I had to use SortByKey for each field, and you confirmed that I would need that. I will try to put together a XSLT mapping as I agree doing this graphically might be much more work.

Former Member
0 Kudos

I implemented the XSLT solution to provide the sorting and filtering I require, which works but is very slow. My data file is over 30MB and it takes about 28 minutes on my SAP system to complete processing. I am not an XSL expert, and might have a design issue, but from review of XSLT by O'Rielly publishing, I came up with the following solution to provide my sorting. I do not know if the performance issue is simply XSL mapping, or the nested sort operation that is within the XSL. Any recommendations on performance improvement? (Note: My simple graphical mapping, which did not sort properly, completed in just under 5 min.) Source data set has over 17,000 EmployeeInformation elements.

	<xsl:template match="n:EmployeeDataset">
		<xsl:for-each select="n:EmployeeInformation">
			<xsl:sort select="n:EmployeeGUID"/>
			<!--<xsl:sort select="n:EmployeeKnownAsName"/>-->
			<xsl:variable name="uniquekey" select="n:EmployeeGUID"/>

			<xsl:if test="not(preceding-sibling::n:EmployeeInformation/n:EmployeeGUID=$uniquekey)">
				<xsl:for-each select="//n:EmployeeInformation[n:EmployeeGUID=$uniquekey]">
					<xsl:sort select="n:ActionType"/>
					<xsl:variable name="count" select="position()"/>
					<xsl:if test="$count=1">
						<USER_INFORMATION>
							<LASTNAME><xsl:value-of select="n:EmployeeLastName"/></LASTNAME>
							<FIRSTNAME><xsl:value-of select="n:EmployeeKnownAsName"/></FIRSTNAME>
							<DIVISION><xsl:value-of select="n:DepartmentName"/></DIVISION>
							<EMPLOYEESTATUS__C><xsl:value-of select="n:EmployeeStatus"/></EMPLOYEESTATUS__C>
							<HRACTIVE__C>
								<xsl:choose>
									<xsl:when test="n:LOAStatus = 'Active'">TRUE</xsl:when>
									<xsl:otherwise>FALSE</xsl:otherwise>
								</xsl:choose>
							</HRACTIVE__C>
						</USER_INFORMATION>
					</xsl:if>
				</xsl:for-each>
			</xsl:if>
		</xsl:for-each>
	</xsl:template>

stefan_grube
Active Contributor
0 Kudos

An XSLT mapping for 3ß MB file is indead very slow. You should mention such important things in SDN forum questions.

When the data come from ABAP system, couldn't you sort data before you send?

Former Member
0 Kudos

No, the data is source XML (from a third party system). In my case, neither end of the process involves an SAP system. The only middleware system I have available is XI, so trying to complete the XML to XML transformation with XI. Would re-doing the mapping in Java make it run faster? I do not know ABAP.

stefan_grube
Active Contributor
0 Kudos

You need not sort it. All you have to figure out is if there are redundant entries.

So you can write a UDF, find duplicates and give true or false as output.

inside the UDF you fisrt give all values into a hash table, then you easily find duplicates.

There are some examples in SDN forum that I used for similar stuff.

stefan_grube
Active Contributor
0 Kudos

Here a scetch of the Java routines:

First create the hashtable for all active employees:

// use hashtable for storing lookup data

Hashtable<String, String> hashtable = new Hashtable<String, String>();
      for(int i =0;i<a.length;i++){
    	 if (!(a<i>.equals(ResultList.CC)||a<i>.equals(ResultList.SUPPRESS))){ 
                                            if (b<i>.equals("TRUE"))
    		 hashtable.put(a<i>,c<i>);
                      }
     }

Then check hashtable for all inactive employees:

  for(int i =0;i<a.length;i++){
    	 if (!(a<i>.equals(ResultList.CC)||a<i>.equals(ResultList.SUPPRESS))){ 
                                            if (b<i>.equals("TRUE"))  // employee is active
                                                       result.addValue("true");
                                            else{
   			hashkey = a<i>;
    			hashvalue = hashtable.get(hashkey);
                                                                if (hashvalue == null){ // no active employee found
                                                                    result.addValue("true");
                                                                    // add this inactive employee also to hashtable, first come first serve
                                                                    hashtable.put(a<i>,c<i>);
                                                                }
                                                                else // active or ather inactive employee exists
                                                                    result.addValue("false");

                                           }
                      }
     }

Former Member
0 Kudos

Thank you Stefan for your solution. The performance is great! Your help is very much appreciated.

Answers (0)