on 02-25-2009 4:45 PM
Hi,
I'm using a SAX-based Java mapping in one scenario. The problem is that when the data contains Croatian characters such as Đ or Š, the output XML is not valid: XML Spy complains, IE complains, and so on. The customer is sure that the data (an XML document stored in a CLOB field in an Oracle DB) is UTF-8. What could be the problem?
What I'm doing is reading the entire XML into a string with a BufferedReader, doing some manipulation, and then writing the String into a byte array with:
byte[] bytes = file.toString().getBytes("UTF-8");
saxParser.parse(new ByteArrayInputStream(bytes), handler);
and then, of course, parsing the XML. The data is read with readLine(), and the problematic character is "Đ" (UTF-8 bytes 0xC4 0x90).
XML Spy and IE don't complain about this character in the source data. After the conversion, however, it looks like "Ä?" (bytes 0xC4 0x3F), which is no longer correct. Why?
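One plausible explanation for the "Ä" is that the UTF-8 bytes were decoded with a single-byte charset, which is what new InputStreamReader(in) does when the platform default encoding is not UTF-8. The sketch below (class and method names are illustrative) decodes the same two bytes once as UTF-8 and once as ISO-8859-1. ISO-8859-1 is only a stand-in because it maps every byte deterministically; the actual default charset on the XI server is not known from the thread. It also shows that re-encoding the mis-decoded "Ä" as UTF-8 yields the byte 0xC3, which Java prints as -61.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {

    // Reads the whole stream as text, decoding the bytes with the given charset.
    static String readAll(InputStream in, Charset cs) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "\u0110".getBytes(StandardCharsets.UTF_8); // "Đ" -> 0xC4 0x90

        String right = readAll(new ByteArrayInputStream(payload), StandardCharsets.UTF_8);
        // ISO-8859-1 stands in for "some single-byte default charset"
        String wrong = readAll(new ByteArrayInputStream(payload), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("\u0110"));       // true
        System.out.println(wrong.equals("\u00C4\u0090")); // true: 'Ä' plus control char U+0090
        // Encoding the mis-decoded text again doubles it: C3 84 C2 90
        System.out.println(Arrays.toString(wrong.getBytes(StandardCharsets.UTF_8)));
        // [-61, -124, -62, -112]
    }
}
```

The point of the design is that the same bytes give different Strings depending only on the charset handed to the Reader, so the fix belongs at the decoding step, not at getBytes().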
What is file?
You could simply use:
saxParser.parse(in, handler);
where in is the InputStream parameter of the method
public void execute(InputStream in, OutputStream out)
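Parsing the InputStream directly works because the SAX parser reads the raw bytes itself and determines the encoding from the byte order mark or the XML declaration (UTF-8 when neither is present), so no String round trip is involved. A minimal, self-contained sketch; the class name, method name, and payload are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class DirectParseDemo {

    // Parses the raw byte stream and returns all character data the handler receives.
    static String parseText(InputStream in) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // No Reader in between: the parser decodes the bytes itself,
            // using the encoding from the BOM / XML declaration (UTF-8 by default)
            parser.parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] payload = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><r>\u0110\u0160</r>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(new ByteArrayInputStream(payload)).equals("\u0110\u0160")); // true
    }
}
```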
Stefan, here is my entire code:
public void execute(InputStream in, OutputStream out)
        throws com.sap.aii.mapping.api.StreamTransformationException {
    DefaultHandler handler = this;
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        SAXParser saxParser = factory.newSAXParser();
        fStreamOut = out;
        encoding = "UTF-8";
        // no charset argument: the bytes are decoded with the platform default encoding
        InputStreamReader is = new InputStreamReader(in); // , "UTF-8");
        BufferedReader reader = new BufferedReader(is);
        StringBuffer file = new StringBuffer();
        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                file.append(line).append('\n'); // readLine() strips the line terminator
                // file = file + line;
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        // remove the duplicate XML declaration, otherwise the SAX parser fails
        file = replaceREGEX(
                "<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "",
                file);
I'm doing this because the customer stores an entire XML document in a CLOB field in the DB, so when XI picks it up, the message has a structure like this:
<?xml version="1.0" encoding="utf-8" ?>
<msgTyp_El_Invoice_String>
<row>
<DOCUMENT><?xml version="1.0" encoding="UTF-8"?>...</DOCUMENT>
</row>
</msgTyp_El_Invoice_String>
When I tried to parse a document like this, the SAX parser complained about the two occurrences of
<?xml version="1.0" encoding="UTF-8"?>, so I remove the second one. Here is the rest of the code:
        file.insert(0, "<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        byte[] bytes = file.toString().getBytes(encoding);
        saxParser.parse(new ByteArrayInputStream(bytes), handler);
    } catch (Throwable t) {
        if (mappingTrace != null) {
            mappingTrace.addInfo(t.toString());
        }
        t.printStackTrace();
    }
}
What do you think now?
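The regular expression passed to replaceREGEX only matches the exact text <?xml version="1.0" encoding="UTF-8"?>, so a declaration written differently, for example encoding="utf-8" ?> in lowercase with a space like the outer one shown above, is left in place. A more tolerant sketch (names are illustrative, not the poster's helper) keeps a declaration only at offset 0 and removes every other one, case-insensitively:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlDeclStripper {

    // Any XML declaration, case-insensitive; the "\\s" after "xml" avoids
    // matching other processing instructions such as <?xml-stylesheet ...?>
    private static final Pattern XML_DECL =
            Pattern.compile("<\\?xml\\s[^>]*\\?>", Pattern.CASE_INSENSITIVE);

    // Keeps a declaration only if it starts at offset 0 and removes all others.
    static String stripInnerDeclarations(String xml) {
        Matcher m = XML_DECL.matcher(xml);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            String keep = (m.start() == 0) ? Matcher.quoteReplacement(m.group()) : "";
            m.appendReplacement(result, keep);
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String in = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><row><DOCUMENT>"
                + "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/></DOCUMENT></row>";
        System.out.println(stripInnerDeclarations(in));
        // <?xml version="1.0" encoding="utf-8" ?><row><DOCUMENT><a/></DOCUMENT></row>
    }
}
```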
OK, but I can't figure out where it is happening. What I tried (with your help) was this:
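String encoding = "UTF-8";
// read the whole stream: available() is only an estimate, and one read() may return fewer bytes
ByteArrayOutputStream buf = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
while ((n = in.read(chunk)) != -1) {
    buf.write(chunk, 0, n);
}
String file = new String(buf.toByteArray(), encoding)
        .replaceAll("<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "");
file = file.replaceAll("XX", "YY");
byte[] bytes = file.getBytes(encoding);
saxParser.parse(new ByteArrayInputStream(bytes), handler);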
Where do you think the problem is?
I figured out what the problem is. When I convert the String to a byte array, characters such as Đ and Š (from the Latin Extended-A block)
are encoded in UTF-8 as two bytes, each with a value above 127, so they show up as negative values such as -61, and that is not good. How can I pass the String to the parser as an InputStream without going through a byte[]?
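One way to hand a String to SAX without any byte[] is to wrap it in a java.io.StringReader inside an org.xml.sax.InputSource; SAXParser has a parse(InputSource, DefaultHandler) overload. With a character stream, the parser reads characters directly and disregards the encoding declaration. The sketch also shows that a negative value like -61 is simply how Java prints a byte above 127, because its byte type is signed; on its own it does not mean the data is corrupted. Class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StringParseDemo {

    // Parses XML held in a String and returns all character data.
    static String parseText(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // A Reader supplies characters, not bytes, so no charset conversion happens here
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><r>\u0110\u0160</r>";
        System.out.println(parseText(xml).equals("\u0110\u0160")); // true

        // Java bytes are signed: 0xC3 prints as -61, masking with 0xFF gives 195
        byte b = (byte) 0xC3;
        System.out.println(b + " " + (b & 0xFF)); // -61 195
    }
}
```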
Hi Stefan,
I've finally done it. The code is as follows:
public void execute(InputStream in, OutputStream out)
        throws com.sap.aii.mapping.api.StreamTransformationException {
    DefaultHandler handler = this;
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        SAXParser saxParser = factory.newSAXParser();
        // all output goes through a Writer that encodes explicitly as UTF-8
        fStreamOut = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
        encoding = "UTF-8";
        if (map != null) {
            mappingTrace = (MappingTrace) map
                    .get(StreamTransformationConstants.MAPPING_TRACE);
        }
        // decode the input explicitly as UTF-8 instead of the platform default
        InputStreamReader is = new InputStreamReader(in, "UTF-8");
        BufferedReader reader = new BufferedReader(is);
        StringBuffer file = new StringBuffer();
        String line;
        try {
            while ((line = reader.readLine()) != null) {
                file.append(line).append('\n'); // readLine() strips the line terminator
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        // remove the inner XML declaration
        file = replaceREGEX(
                "<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "",
                file);
        // write the cleaned document to a temporary UTF-8 file and parse that file
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd_HHmmss_SSS");
        String fName = df.format(new Date()) + "_El_Invoice.xml";
        File outFile = new File(fName);
        Writer out1 = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(outFile), "UTF-8"));
        try {
            out1.write(file.toString());
        } finally {
            out1.close();
        }
        try {
            saxParser.parse(outFile, handler);
        } finally {
            outFile.delete(); // clean up even if parsing fails
        }
    } catch (Throwable t) {
        if (mappingTrace != null) {
            mappingTrace.addInfo(t.toString());
        }
        t.printStackTrace();
    }
}
The problem was also in the method that writes to the output stream, so I changed it too:
private void printOutPut(String sOP) {
    try {
        // fStreamOut.write(sOP.getBytes()); // getBytes() uses the platform default encoding
        fStreamOut.write(sOP); // the Writer encodes the text as UTF-8
    } catch (IOException e) {
        e.printStackTrace();
    }
}
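A sketch of the output side, assuming fStreamOut is a Writer as in the final code: the OutputStreamWriter performs the UTF-8 encoding, so String.getBytes() and its platform-default charset are never involved. Because the Writer is buffered, it must also be flushed once at the end, or the last part of the output never reaches the stream. The thread doesn't show where the mapping does that, so the finish() method here (hypothetical, e.g. called from endDocument()) is an assumption, as is the class name.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterOutputDemo {

    private final Writer fStreamOut;

    WriterOutputDemo(OutputStream out) {
        // The Writer, not String.getBytes(), turns characters into bytes
        fStreamOut = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    void printOutPut(String sOP) {
        try {
            fStreamOut.write(sOP);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical end-of-document hook: pushes buffered text to the stream.
    void finish() {
        try {
            fStreamOut.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text through the Writer and returns the resulting bytes.
    static byte[] render(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        WriterOutputDemo w = new WriterOutputDemo(bytes);
        w.printOutPut(s);
        w.finish();
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(render("\u0110"))); // [-60, -112], i.e. 0xC4 0x90
    }
}
```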
Try getBytes() without a charset,
and check the encoding of the source file; it could be that the characters are already UTF-8.
Or try getBytes("ISO 8859-2");
getBytes() without encoding : Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).
getBytes("ISO 8859-2") :java.io.UnsupportedEncodingException: ISO 8859-2
getBytes("ISO-8859-2") : Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).
doesn't work ...:-(
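The "Malformed UTF-8 char" errors are consistent with single-byte data reaching a parser that decodes it as UTF-8 (the document declares UTF-8). In ISO-8859-2, for example, Đ is the single byte 0xD0, which is not a valid UTF-8 sequence when an ASCII byte follows it. The UnsupportedEncodingException for "ISO 8859-2" comes from the space: Java charset names cannot contain spaces, and the supported name is "ISO-8859-2". A small sketch (class and method names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin2VsUtf8Demo {

    // Returns true if the bytes form a valid UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u0110ak"; // "Đak"
        // Latin-2 stores Đ as the single byte 0xD0
        byte[] latin2 = text.getBytes(Charset.forName("ISO-8859-2"));
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(latin2)); // [-48, 97, 107]
        System.out.println(isValidUtf8(latin2));     // false
        System.out.println(isValidUtf8(utf8));       // true
    }
}
```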
It's a result of the characters being read as CDATA.
Change the parser to DOM.
Regards
Ravi Raman