Problem in SAX Java mapping

Former Member

Hi,

I'm using a SAX Java mapping in one scenario. The problem is that when I get some Croatian characters, like Đ or Š, the output XML is not valid. XML Spy complains, IE complains, and so on. The customer is sure that the data (XML in a CLOB field in an Oracle DB) is UTF-8. What could the problem be?

What I'm doing is reading the entire XML into a String with the help of a BufferedReader, doing some manipulation, and then writing the String into a byte array with:

			byte[] bytes = file.toString().getBytes("UTF-8");
			saxParser.parse(new ByteArrayInputStream(bytes), handler);

and then, of course, parse the XML. The readLine method reads the data, and the problematic character is "Ä�" (bytes 0xC4 0x90, the UTF-8 encoding of Đ).

XML Spy and IE don't complain about this character. After the conversion, however, it looks like "Ä?" (0xC4 0x3F), and this is no longer valid. Why?
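A likely mechanism, sketched under assumptions: a Reader created without an explicit charset decodes with the platform default, so each byte of a multi-byte UTF-8 character becomes a separate char, and re-encoding those chars as UTF-8 produces different bytes. The sketch below uses ISO-8859-1 only as a stand-in for a non-UTF-8 default; the exact garbage depends on the real default charset, so it will not necessarily reproduce the "Ä?" seen above. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Decodes bytes with the given charset, then re-encodes the String as UTF-8.
    static byte[] decodeThenEncodeUtf8(byte[] input, Charset decodeWith) {
        String text = new String(input, decodeWith);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as unsigned hex, e.g. "C4 90".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-8 encoding of 'Đ' (U+0110), as it arrives from the database
        byte[] source = {(byte) 0xC4, (byte) 0x90};

        // Same charset on both sides: the bytes survive unchanged
        System.out.println(hex(decodeThenEncodeUtf8(source, StandardCharsets.UTF_8)));     // C4 90

        // Wrong decoding charset (ISO-8859-1 as a stand-in for the platform default):
        // each byte becomes its own char ("Ä" and U+0090), and UTF-8 re-encoding doubles them
        System.out.println(hex(decodeThenEncodeUtf8(source, StandardCharsets.ISO_8859_1))); // C3 84 C2 90
    }
}
```

Note that 0xC3 is -61 as a signed Java byte, which matches the value reported later in the thread.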

Accepted Solutions (1)

stefan_grube
Active Contributor

What is file?

You could simply use:

saxParser.parse(in, handler);

where in is the InputStream parameter of the method

public void execute(InputStream in, OutputStream out)
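A minimal standalone sketch of this suggestion, assuming plain JAXP outside of XI: the raw bytes go straight to the parser, which then takes the encoding from the XML declaration (or a BOM) instead of a platform default. A ByteArrayInputStream stands in for the in parameter of execute(); the class names are illustrative, and the SAP mapping API is not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseStreamDirectly {

    // Collects all character data reported by the parser.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the byte stream as-is; no Reader and no String conversion in between.
    static String parseText(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector handler = new TextCollector();
            parser.parse(in, handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><row>\u0110\u0160</row>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseText(in).equals("\u0110\u0160")); // true
    }
}
```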

Former Member

Stefan, that is exactly what I'm doing.

stefan_grube
Active Contributor

> Stefan, that is exactly what I'm doing.

No, you are doing: "What I'm doing is reading entire XML into string with help of BufferedReader, then do some manipulation and write String into byte array with:..."

The issue might be in that code, which you have not provided.

Former Member

Stefan, here is my entire code:

	public void execute(InputStream in, OutputStream out)
			throws com.sap.aii.mapping.api.StreamTransformationException {
		DefaultHandler handler = this;
		SAXParserFactory factory = SAXParserFactory.newInstance();
		try {
			SAXParser saxParser = factory.newSAXParser();
			fStreamOut = out;
			encoding = "UTF-8";
			// no charset is given, so the platform default encoding is used here
			InputStreamReader is = new InputStreamReader(in); // , "UTF-8");
			BufferedReader reader = new BufferedReader(is);
			StringBuffer file = new StringBuffer();
			String line = null;
			try {
				while ((line = reader.readLine()) != null) {
					file.append(line);
					// file = file + line;
				}
			} catch (IOException e) {
				e.printStackTrace();
			} finally {
				try {
					in.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
			// remove the duplicate XML declaration, because otherwise the SAX parser fails
			file = replaceREGEX(
					"<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "",
					file);

I'm doing this because the customer has an entire XML document in a CLOB field in the DB, and when XI picks up the XML, it has a structure like this:

<?xml version="1.0" encoding="utf-8" ?> 
<msgTyp_El_Invoice_String>
<row>
  <DOCUMENT><?xml version="1.0" encoding="UTF-8"?>...</DOCUMENT>
</row>
</msgTyp_El_Invoice_String>

When I tried to parse XML like this, the SAX parser complained because of the two occurrences of <?xml version="1.0" encoding="UTF-8"?>, so I remove the second one. Here is the rest of the code:

			file.insert(0, "<?xml version=\"1.0\" encoding=\"utf-8\"?>");
			byte[] bytes = file.toString().getBytes(encoding);
			saxParser.parse(new ByteArrayInputStream(bytes), handler);
		} catch (Throwable t) {
			if (mappingTrace != null) {
				mappingTrace.addInfo(t.toString());
			}
			t.printStackTrace();
		}
	}

What do you think now?
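The regex in the code above only matches the exact spelling <?xml version="1.0" encoding="UTF-8"?>, while the outer declaration in the sample is written as encoding="utf-8" with a space before ?>. A hedged alternative, assuming every XML declaration in the string should go and a single one should be put back at the start (as the code above does with file.insert(0, ...)), is a looser pattern. The class name, method name, and the <Invoice/> content are hypothetical placeholders.

```java
import java.util.regex.Pattern;

public class XmlDeclarationCleaner {

    // Matches an XML declaration such as <?xml version="1.0" encoding="utf-8" ?>,
    // regardless of attribute case or spacing, but not <?xml-stylesheet ...?>.
    private static final Pattern XML_DECLARATION = Pattern.compile("<\\?xml\\s[^?]*\\?>");

    // Removes every XML declaration and puts a single one back at the very start.
    static String normalize(String xml) {
        String body = XML_DECLARATION.matcher(xml).replaceAll("");
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body.trim();
    }

    public static void main(String[] args) {
        // <Invoice/> is a hypothetical stand-in for the real CLOB content
        String input = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n"
                + "<msgTyp_El_Invoice_String><row><DOCUMENT>"
                + "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Invoice/>"
                + "</DOCUMENT></row></msgTyp_El_Invoice_String>";
        System.out.println(normalize(input));
    }
}
```

This only edits chars in a String; the encoding still matters only when the result is turned into bytes.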

stefan_grube
Active Contributor

Use this code:

byte[] bbuf = new byte[in.available()];
in.read(bbuf);
String file = new String(bbuf,"UTF-8").replaceAll("<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>","");

That works for me.

Regards

Stefan
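Stefan's snippet sizes the buffer with in.available(), which the InputStream Javadoc describes only as an estimate of what can be read without blocking, and a single read() may return fewer bytes than requested. A sketch that reads until end of stream instead, assuming Java 8 (on Java 9+ InputStream.readAllBytes() does the same). The names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadWholeStream {

    // Reads until read() returns -1, independent of what available() reports.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "<a>\u0110</a>".getBytes(StandardCharsets.UTF_8);
        // Decode with the same charset the bytes were written in
        String text = new String(readAll(new ByteArrayInputStream(utf8)), StandardCharsets.UTF_8);
        System.out.println(text.equals("<a>\u0110</a>")); // true
    }
}
```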

Former Member

Hm, how can I perform more than one replaceAll? This creates a String, so if I want to do another (second) replaceAll, I'll have the same problem... correct?

stefan_grube
Active Contributor

The replaceAll is not the issue.

The encoding plays a role when converting byte[] / Stream -> String and back.
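To illustrate the point: replaceAll works on chars, so any number of chained calls leave Đ and Š intact; only the final String to byte[] step uses a charset. A minimal sketch, assuming the same two replacements used in this thread ("XX" -> "YY" is just the placeholder from the next post). The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class ReplaceThenEncode {

    // Chained replaceAll calls operate on chars and cannot damage non-ASCII letters;
    // the charset only comes into play in getBytes().
    static byte[] transform(String xml) {
        String result = xml
                .replaceAll("<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "")
                .replaceAll("XX", "YY");
        return result.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] out = transform("<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>XX \u0110\u0160</a>");
        // Decoding with the same charset gives back the expected text
        System.out.println(new String(out, StandardCharsets.UTF_8).equals("<a>YY \u0110\u0160</a>")); // true
    }
}
```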

Former Member

OK, but I can't figure out where it is happening. What I tried (with your help) was this:


String encoding = "UTF-8";
byte[] bbuf = new byte[in.available()];
in.read(bbuf);
String file = new String(bbuf,"UTF-8").replaceAll("<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>","");
file = file.replaceAll("XX", "YY");
byte[] bytes = file.toString().getBytes(encoding);
saxParser.parse(new ByteArrayInputStream(bytes), handler);

Where do you think the problem is?

stefan_grube
Active Contributor

Why don't you just use:

saxParser.parse(file, handler);

Former Member

I figured out what the problem is. When I convert from String to byte array, characters like Đ, Š...

take 2 bytes in UTF-8 (Latin Extended-A), and each byte has a decimal value above 127, so it is converted to -61..., and that is not good. How can I convert a String to an InputStream without using a byte[]?

stefan_grube
Active Contributor

When you convert a byte[] to a String, you can use:

new String(byte[],encoding);

and back:

string.getBytes(encoding);

In your example you don't need to convert the String, as the SAX parser also works with a String.
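On the negative numbers mentioned above: Java's byte type is signed, so every byte of a multi-byte UTF-8 character prints as a negative value. That by itself is expected; the text is only damaged when the decoding charset differs from the encoding charset. A small sketch (names illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SignedBytesDemo {

    // Encodes with UTF-8 and decodes with UTF-8 again.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "\u0110".getBytes(StandardCharsets.UTF_8); // 'Đ' = 0xC4 0x90
        // Signed view of the two bytes
        System.out.println(Arrays.toString(bytes));                        // [-60, -112]
        // Masking with 0xFF shows the unsigned values
        System.out.println((bytes[0] & 0xFF) + " " + (bytes[1] & 0xFF));  // 196 144
        // The same charset in both directions restores the original text
        System.out.println(roundTrip("\u0110\u0160").equals("\u0110\u0160")); // true
    }
}
```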

Former Member


I don't think this is true, because when I try it, the parser searches for a file

<workspace_path>/<?xml version="1.0" encoding="utf-8"?

?!

stefan_grube
Active Contributor

You are right.

parse(String, DefaultHandler) treats the String as a URI.

I cannot see why your code does not work, as it works fine for me.

The only thing I notice is file.toString(), which is unnecessary because file is already a String.

Regards

Stefan
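Since parse(String, DefaultHandler) expects a URI, one way to parse XML that is already held in a String, without a byte[] or a temporary file, is to wrap it in an org.xml.sax.InputSource over a StringReader. The parser then receives chars directly, so the encoding attribute in the declaration is not used for decoding. A minimal sketch; the class names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseFromString {

    // Collects all character data reported by the parser.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Feeds the String to the parser through a Reader: no byte conversion, no URI lookup.
    static String parseText(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector handler = new TextCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><row>\u0110\u0160</row>";
        System.out.println(parseText(xml).equals("\u0110\u0160")); // true
    }
}
```

In the mapping this would replace the ByteArrayInputStream (or temporary file) step, provided the String was read from the input with the correct charset in the first place.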

Former Member

Hi Stefan,

I've finally done it. The code is as follows:

	public void execute(InputStream in, OutputStream out)
			throws com.sap.aii.mapping.api.StreamTransformationException {
		DefaultHandler handler = this;
		SAXParserFactory factory = SAXParserFactory.newInstance();
		try {
			SAXParser saxParser = factory.newSAXParser();
			fStreamOut = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
			encoding = "UTF-8";
			if (map != null) {
				mappingTrace = (MappingTrace) map
						.get(StreamTransformationConstants.MAPPING_TRACE);
			}
			InputStreamReader is = new InputStreamReader(in, "UTF8");
			BufferedReader reader = new BufferedReader(is);
			StringBuffer file = new StringBuffer();
			String line = new String();
			try {
				while ((line = reader.readLine()) != null) {
					file.append(line);
				}
			} catch (IOException e) {
				e.printStackTrace();
			} finally {
				try {
					in.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
			file = replaceREGEX(
					"<\\?xml version=\"1\\.0\" encoding=\"UTF-8\"\\?>", "",
					file);
			Date filedat = new Date();
			SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd_HHmmss_SSS");
			String fName = df.format(filedat) + "_El_Invoice.xml";
			// write the cleaned XML to a temporary file, explicitly as UTF-8
			Writer out1 = new BufferedWriter(new OutputStreamWriter(
					new FileOutputStream(fName), "UTF8"));
			try {
				out1.write(file.toString());
			} finally {
				out1.close();
			}

			// parse the temporary file (the String is used as a URI), then delete it
			saxParser.parse(fName, handler);
			File outFile = new File(fName);
			outFile.delete();
		} catch (Throwable t) {
			if (mappingTrace != null) {
				mappingTrace.addInfo(t.toString());
			}
			t.printStackTrace();
		}
	}

The problem was also in the method that writes to the output stream, so I changed it:

	private void printOutPut(String sOP) {
		try {
			// old version, which encoded with the platform default charset:
			// fStreamOut.write(sOP.getBytes());
			fStreamOut.write(sOP);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
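With fStreamOut now being a BufferedWriter over an OutputStreamWriter(out, "UTF-8"), the characters are encoded as UTF-8 on the way out, but the buffer has to be flushed (for example at the end of the document) before the output is complete. A standalone sketch of that pattern, assuming the caller owns the OutputStream and should close it; the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8OutputWriter {

    // Writes text to a raw OutputStream as UTF-8 instead of using getBytes()
    // with the platform default charset.
    static void writeUtf8(OutputStream out, String... parts) {
        try {
            Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            for (String part : parts) {
                writer.write(part);
            }
            // Without flush() the buffered chars may never reach 'out';
            // the writer is not closed so that the caller's stream stays open.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(out, "<a>", "\u0110", "</a>");
        System.out.println(out.size()); // 9: three bytes for "<a>", two for 'Đ', four for "</a>"
    }
}
```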

Answers (2)

Former Member

Try getBytes() without a charset.

And check the encoding of the source file; it could be that the characters are already UTF-8.

Or try getBytes("ISO 8859-2");

Former Member

getBytes() without encoding : Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).

getBytes("ISO 8859-2") :java.io.UnsupportedEncodingException: ISO 8859-2

getBytes("ISO-8859-2") : Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low).

doesn't work ...:-(
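The two error types above have a simple explanation. "ISO 8859-2" (with a space) is not a legal charset name, hence the UnsupportedEncodingException; "ISO-8859-2" is valid, but it encodes Đ as the single byte 0xD0, and a parser told that the data is UTF-8 sees an incomplete two-byte sequence and reports a malformed UTF-8 char. A sketch, assuming a standard JDK that ships ISO-8859-2 (not guaranteed by the Java spec); the names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class CharsetNames {

    // Returns true if 'name' is a legal charset name that this JVM supports.
    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // e.g. a name containing a space
        }
    }

    public static void main(String[] args) {
        System.out.println(supported("ISO 8859-2")); // false: spaces are not allowed
        System.out.println(supported("ISO-8859-2")); // true on a standard JDK

        // 'Đ' is one byte (0xD0) in ISO-8859-2 but two bytes (0xC4 0x90) in UTF-8
        byte[] latin2 = "\u0110".getBytes(Charset.forName("ISO-8859-2"));
        byte[] utf8 = "\u0110".getBytes(StandardCharsets.UTF_8);
        System.out.println(latin2.length + " vs " + utf8.length); // 1 vs 2
    }
}
```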

ravi_raman2
Active Contributor

It's a result of the characters being read as CDATA...

Change the parser to DOM...

Regards

Ravi Raman

Former Member

I would rather not change SAX to DOM because of performance. Is there any other way?