on 04-03-2015 6:09 PM
Hi experts,
I need to split a string into street, house number and supplement, for example:
"Hauptstr. 15 b" -> Street = Hauptstr, House number = 15, supplement = b
Has anyone implemented this already and would be willing to share the Java code for a UDF?
Thanks in advance
Stefan
var1 is your input - type String
var2 is the position - type int (1 for street, 2 for house number, 3 for supplement)
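// Split the input on runs of whitespace
String[] splited = var1.trim().split("\\s+");
// var2 is 1-based (1 = street), array indices are 0-based
if (var2 >= 1 && splited.length >= var2)
{
    return splited[var2 - 1];
}
else
{
    // Requested part is not present in the input
    return "";
}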
>>Addresses do not always have a supplement
This should be fine as long as there are no additional spaces in the input.
>>>But sometimes the street comes in two parts. How can I identify which part includes the number?
Hmm, that's difficult to handle without knowing the pattern.
Does the street name at least always end with "."? Like below:
Schlierseer Str. 3
Nikolausstr. 15
Forsthausstr. 1-3 b
If that's true, then the code below should work:
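// Needs java.util.ArrayList in the UDF imports
String[] splited = var1.trim().split("\\s+");
ArrayList<String> list = new ArrayList<String>();
list.add(""); // index 0 collects the street name
int flag = 0; // set to 1 once the first token containing a digit is seen
for (int i = 0; i < splited.length; i++) {
    if (!splited[i].matches(".*\\d+.*")) {
        if (flag == 0) {
            // Token before the house number: append to the street, without a leading space
            list.set(0, (list.get(0) + " " + splited[i]).trim());
        }
        else {
            // Token after the house number: supplement
            list.add(splited[i]);
        }
    }
    else {
        // Token contains a digit: house number
        flag = 1;
        list.add(splited[i]);
    }
}
// var2 is 1-based (1 = street), list indices are 0-based
if (var2 >= 1 && list.size() >= var2) {
    return list.get(var2 - 1);
}
else {
    return "";
}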
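To try the token-based idea outside of a mapping, here is a minimal standalone sketch. The class name TokenAddressSplitter and its split method are made up for illustration and are not part of any PI API. It assumes Java 8 or later (for String.join) and that the first token containing a digit is the house number. Unlike the UDF above, it joins everything after the house number into a single supplement value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, only meant for testing the logic outside a mapping.
class TokenAddressSplitter {

    // Returns {street, houseNumber, supplement}; missing parts are empty strings.
    static String[] split(String address) {
        StringBuilder street = new StringBuilder();
        List<String> rest = new ArrayList<>();
        boolean numberSeen = false;

        for (String token : address.trim().split("\\s+")) {
            if (!numberSeen && !token.matches(".*\\d.*")) {
                // Tokens before the first digit-bearing token belong to the street name.
                if (street.length() > 0) {
                    street.append(' ');
                }
                street.append(token);
            } else {
                // The first digit-bearing token is the house number; later tokens follow it.
                numberSeen = true;
                rest.add(token);
            }
        }

        String house = rest.isEmpty() ? "" : rest.get(0);
        // Assumption: everything after the house number is one supplement value.
        String supplement = rest.size() > 1
                ? String.join(" ", rest.subList(1, rest.size()))
                : "";
        return new String[] { street.toString(), house, supplement };
    }
}
```

For example, TokenAddressSplitter.split("Forsthausstr. 1-3 b") gives "Forsthausstr.", "1-3" and "b", while "Schlierseer Str. 3" gives "Schlierseer Str.", "3" and an empty supplement.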
Hi Stefan
Here is another option you can try out. It uses the capturing groups feature of regex.
I'm not sure what your source structure is, but the UDF below allows for multiple records having the street field. So it is written for execution type = "All values of a context", with a single input and 3 output fields. Since it processes all values of a context, it does not handle context changes or suppressed values.
You can change the delimiter to something that you don't expect in an address field. I just used the equals sign here, but it can be swapped for a pipe (|), exclamation mark, question mark or some other character. Keep in mind that String.split() takes a regex, so metacharacters such as | or ? have to be quoted there, for example with Pattern.quote().
public void splitAddress(String[] address, ResultList street, ResultList house, ResultList supp, Container container) throws StreamTransformationException {
    // The delimiter must not occur in the address data itself
    String delimiter = "=";
    for (int i = 0; i < address.length; i++) {
        String extract = null;
        // Extract using regex capturing groups
        if (address[i].matches(".*\\d$")) {
            // Input ends with a digit: street and house number only
            extract = address[i].replaceAll("(\\D+)\\W+(\\d+[-]*\\d*)", "$1" + delimiter + "$2");
        } else {
            // Otherwise: street, house number and supplement
            extract = address[i].replaceAll("(\\D+)\\W+(\\d+[-]*\\d*)\\W+(\\D+)", "$1" + delimiter + "$2" + delimiter + "$3");
        }
        // Split extracted content to output fields.
        // split() takes a regex, so the delimiter is quoted to keep characters like | or ? literal.
        String[] lines = extract.split(java.util.regex.Pattern.quote(delimiter));
        if (lines.length > 0) {
            street.addValue(lines[0]);
            // Inputs without a recognisable house number yield an empty value
            house.addValue(lines.length > 1 ? lines[1] : "");
            if (lines.length > 2) {
                supp.addValue(lines[2]);
            } else {
                supp.addValue(""); // Optional, can be removed if null context is preferred
            }
        }
    }
}
I've tested this in Eclipse on the 5 values you provided and it manages to split all 5 cases. It also handles the case where the number field has a dash in the middle.
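As a variation on the capturing-group idea, the groups can also be read directly from a java.util.regex.Matcher, which avoids the intermediate delimiter altogether. This is only a sketch under assumptions: the class name RegexAddressSplitter and the pattern are made up for illustration, the house number is assumed to be the first run of digits (optionally a range such as 1-3), and street names that themselves contain digits are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the PI API.
class RegexAddressSplitter {

    // Group 1: street (shortest text before the number)
    // Group 2: house number, optionally a range such as "1-3"
    // Group 3: optional supplement, starting with a non-digit
    private static final Pattern ADDRESS = Pattern.compile(
            "^\\s*(.+?)\\s+(\\d+(?:\\s*-\\s*\\d+)?)\\s*(\\D.*?)?\\s*$");

    // Returns {street, houseNumber, supplement}; missing parts are empty strings.
    static String[] split(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            // No house number found: treat the whole input as the street.
            return new String[] { address.trim(), "", "" };
        }
        String supplement = m.group(3) == null ? "" : m.group(3);
        return new String[] { m.group(1), m.group(2), supplement };
    }
}
```

With this pattern, "Hauptstr. 15 b" is split into "Hauptstr.", "15" and "b", and "Forsthausstr. 1-3 b" into "Forsthausstr.", "1-3" and "b". Inside a UDF the three array entries could be passed to the street, house and supp result lists in the same way as above.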
Rgds
Eng Swee