Looking for UDF in mapping for splitting street

stefan_grube
Active Contributor
0 Kudos

Hi experts,

I need to split a string into street, house number and supplement, for example:

"Hauptstr. 15 b" -> Street = Hauptstr, House number = 15, supplement = b

Has anyone implemented this already and would be willing to share the Java code for the UDF?

Thanks in advance

Stefan

Accepted Solutions (1)

former_member184720
Active Contributor
0 Kudos

var1 is your input - Type String

var2 is the position - Type Int (0 for street, 1 for house number, 2 for supplement; the array index is zero-based)

String[] splited = var1.split("\\s+");
if (splited.length > var2) {
  return splited[var2];
} else {
  return "";
}

stefan_grube
Active Contributor
0 Kudos

Thank you Hareesh, I think this helps me a lot.

Addresses do not always have a supplement, and sometimes the street name consists of two parts:

Schlierseer Str. 3

Esterfelder Stiege 60 a

Nikolausstr. 15

Forsthausstr. 1-3 b

How can I identify which part contains the number?

former_member184720
Active Contributor
0 Kudos

>>Addresses not always have supplement

This should be fine as long as there are no additional spaces in the input.

>>>But sometimes the street comes with two parts; How can I identify, which part includes the number?

Hmm, that's difficult to handle without knowing the pattern.

Does the street name at least always end with "."? Like below:

Schlierseer Str. 3

Nikolausstr. 15

Forsthausstr. 1-3 b

stefan_grube
Active Contributor
0 Kudos

I only have some test data, so I cannot be sure that the real data always looks like this.

I think I will check for numbers, as the street name and the supplement should not contain any digits.

former_member184720
Active Contributor
0 Kudos

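If that's true, then the code below should work (it needs java.util.ArrayList as an import in the UDF):

String[] splited = var1.split("\\s+");
ArrayList<String> list = new ArrayList<String>();
list.add(""); // index 0 collects the street name
int flag = 0; // set to 1 once a part containing a digit has been seen
for (int i = 0; i < splited.length; i++) {
  if (!splited[i].matches(".*\\d+.*")) {
    if (flag == 0) {
      // No number seen yet: this part still belongs to the street name.
      // trim() avoids a leading space in front of the first part.
      list.set(0, (list.get(0) + " " + splited[i]).trim());
    } else {
      // After the number: supplement
      list.add(splited[i]);
    }
  } else {
    // Part contains a digit: house number
    flag = 1;
    list.add(splited[i]);
  }
}
if (list.size() > var2) {
  return list.get(var2);
} else {
  return "";
}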
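
To try the digit check outside the mapping runtime, a minimal standalone sketch could look like the one below. The class name AddressTokens and the method splitByFirstNumber are made up for illustration. It assumes Java 8 or later (for String.join) and, as discussed above, that neither the street name nor the supplement contains a digit.

```java
import java.util.Arrays;

public class AddressTokens {

    // Hypothetical helper: returns {street, houseNumber, supplement}.
    public static String[] splitByFirstNumber(String address) {
        String[] tokens = address.trim().split("\\s+");

        // Find the first whitespace-separated part that contains a digit.
        int numberIndex = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].matches(".*\\d.*")) {
                numberIndex = i;
                break;
            }
        }
        if (numberIndex < 0) {
            // No digit anywhere: treat the whole input as the street name.
            return new String[] { address.trim(), "", "" };
        }

        // Everything before the number is the street, everything after it the supplement.
        String street = String.join(" ", Arrays.copyOfRange(tokens, 0, numberIndex));
        String house = tokens[numberIndex];
        String supplement = String.join(" ",
                Arrays.copyOfRange(tokens, numberIndex + 1, tokens.length));
        return new String[] { street, house, supplement };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Schlierseer Str. 3", "Esterfelder Stiege 60 a",
            "Nikolausstr. 15", "Forsthausstr. 1-3 b"
        };
        for (String sample : samples) {
            System.out.println(Arrays.toString(splitByFirstNumber(sample)));
        }
    }
}
```

Returning all three parts at once instead of selecting one by position is only a convenience for testing; inside a UDF the position parameter works just as well.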

engswee
Active Contributor
0 Kudos

Hi Stefan

Here is another option you can try out. It uses the capturing groups feature of regex.

I'm not sure what your source structure is, but the UDF below assumes the possibility that you might have multiple records having the street field. So it is written for execution type = "All values of a context", with single input and 3 output fields. Since it's all values of a context, it does not handle context change or suppress values.

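You can change the delimiter to any character that you don't expect in an address field. I used the equals sign here; an exclamation mark would work as well. Keep in mind that split() interprets its argument as a regular expression, so characters such as pipe (|) or question mark only work if the delimiter is quoted, for example with Pattern.quote().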

public void splitAddress(String[] address, ResultList street, ResultList house, ResultList supp, Container container) throws StreamTransformationException {
  String delimiter = "=";

  for (int i = 0; i < address.length; i++) {
    String extract = null;
    // Extract using regex capturing groups
    if (address[i].matches(".*\\d$")) {
      // Input ends with a digit: street and house number only
      extract = address[i].replaceAll("(\\D+)\\W+(\\d+[-]*\\d*)", "$1" + delimiter + "$2");
    } else {
      // Otherwise expect street, house number and supplement
      extract = address[i].replaceAll("(\\D+)\\W+(\\d+[-]*\\d*)\\W+(\\D+)", "$1" + delimiter + "$2" + delimiter + "$3");
    }
    // Split extracted content to output fields.
    // The delimiter is quoted because split() takes a regular expression.
    String[] lines = extract.split(java.util.regex.Pattern.quote(delimiter));
    street.addValue(lines[0]);
    // If the input had no house number, nothing was replaced and lines has only one entry
    house.addValue(lines.length > 1 ? lines[1] : "");
    if (lines.length > 2) {
      supp.addValue(lines[2]);
    } else {
      supp.addValue(""); // Optional, can be removed if null context is preferred
    }
  }
}

I've tested this on Eclipse on the 5 values you provided and it manages to split it for all 5 cases. It also handles the case where the number field has a dash in the middle.
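
As a variation on the capturing-group idea, the groups can also be read directly with Matcher.group(), which avoids the intermediate delimiter altogether. This is only a sketch under assumptions: AddressRegex and split are hypothetical names, the pattern differs from the one in the UDF above, and it assumes whitespace between street and house number and no digits in street or supplement. Inputs that do not fit are returned unchanged as the street.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressRegex {

    // Group 1: street (non-digits, matched lazily)
    // Group 2: house number, optionally a range such as "1-3"
    // Group 3: supplement (non-digits, may be empty)
    private static final Pattern ADDRESS = Pattern.compile(
            "^\\s*(\\D+?)\\s+(\\d+(?:\\s*-\\s*\\d+)?)\\s*(\\D*?)\\s*$");

    // Hypothetical helper: returns {street, houseNumber, supplement}.
    public static String[] split(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            // Pattern does not fit: keep the whole input as the street.
            return new String[] { address.trim(), "", "" };
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hauptstr. 15 b", "Schlierseer Str. 3", "Esterfelder Stiege 60 a",
            "Nikolausstr. 15", "Forsthausstr. 1-3 b"
        };
        for (String sample : samples) {
            String[] parts = split(sample);
            System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
        }
    }
}
```

In a UDF, the three array entries would be passed to the three ResultList outputs in place of the values obtained from split().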

Rgds

Eng Swee

stefan_grube
Active Contributor
0 Kudos

This works perfectly. Thank you very much.

Answers (0)