character set conversion UTF-8 --> ISO-8859-1 generates question mark (?)

Former Member
0 Kudos

I'm trying to convert an XML-file in UTF-8 format to another file with character set ISO-8859-1.

My problem is that the ISO-8859-1 output file starts with a question mark (?) that is not in the input file.

Is there a way to do the conversion without getting the question mark?

My code looks as follows:

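import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

public class ConvertEncoding {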

	public static void main(String[] args) {
		
		String from = "UTF-8", to = "ISO-8859-1";
		String infile = "C:\\temp\\infile.xml", outfile = "C:\\temp\\outfile.xml";
		
		try {
			convert(infile, outfile, from, to);
			
		} catch (Exception e) {
			System.out.println(e.getMessage());
			System.exit(1);
		}
			
	}

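	/**
	 * Copies infile to outfile, decoding it with the charset "from"
	 * and encoding it with the charset "to".
	 */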
	private static void convert(String infile, String outfile,
								String from, String to)
					throws IOException, UnsupportedEncodingException 
	{
		//Set up byte streams
		
		InputStream in = null;
		OutputStream out = null;
		
		if(infile != null) {
			in = new FileInputStream(infile);
		}
		if(outfile != null) {
			out = new FileOutputStream(outfile);
		}
		
		//Set up character streams
		Reader r = new BufferedReader(new InputStreamReader(in, from));
		Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
		
		/* Copy characters from input to output.
		 * The InputStreamReader decodes the input bytes to Unicode characters,
		 * and the OutputStreamWriter encodes them in the output encoding.
		 * Characters that cannot be represented in
		 * the output encoding are written as '?'.
		 */
		char[] buffer = new char[4096];
		int len;
		while((len = r.read(buffer))!= -1) { // Read a block of input
			w.write(buffer, 0, len);
		}
		r.close();
		w.flush();
		w.close();
		
		
	}
}

Accepted Solutions (1)

Former Member
0 Kudos

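I suppose that your XML starts with the bytes 0xEF 0xBB 0xBF (the UTF-8 byte order mark, BOM). Java decodes them as the character U+FEFF, which has no representation in ISO-8859-1, so it is written as ? after conversion.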

see http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing-no-ext-info
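
To see the effect in isolation, here is a minimal sketch. The class name BomDemo and the sample document <a/> are made up for illustration; the decode and encode steps are meant to mirror what the Reader/Writer pair in the question does.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {

    /** Decodes UTF-8 bytes and re-encodes them as ISO-8859-1, returning the result as text. */
    static String roundTrip(byte[] utf8Bytes) {
        // Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF.
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // U+FEFF cannot be mapped to ISO-8859-1, so the encoder substitutes '?'.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical input: a UTF-8 BOM followed by "<a/>"
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.println(roundTrip(withBom)); // prints "?<a/>"
    }
}
```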

So you can either check for these bytes and remove them if they exist, or parse the XML with a parser and then save it in the new encoding.
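
The first option could look roughly like this. It is only a sketch: the class name ConvertWithoutBom and the helper convertBytes are made up for illustration, and the charsets are fixed to UTF-8 and ISO-8859-1 rather than passed in as in the question's code. It checks the first decoded character and drops it if it is U+FEFF.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ConvertWithoutBom {

    /** Copies UTF-8 input to ISO-8859-1 output, skipping a leading BOM if present. */
    static void convert(InputStream in, OutputStream out) throws IOException {
        try (BufferedReader r = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8));
             Writer w = new BufferedWriter(
                     new OutputStreamWriter(out, StandardCharsets.ISO_8859_1))) {
            // Peek at the first character; rewind unless it is the BOM (U+FEFF).
            r.mark(1);
            int first = r.read();
            if (first != -1 && first != 0xFEFF) {
                r.reset();
            }
            char[] buffer = new char[4096];
            int len;
            while ((len = r.read(buffer)) != -1) {
                w.write(buffer, 0, len);
            }
        }
    }

    /** Convenience wrapper for in-memory data (hypothetical helper, not in the original code). */
    static byte[] convertBytes(byte[] utf8Bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            convert(new ByteArrayInputStream(utf8Bytes), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

For files, convert can be called with a FileInputStream and a FileOutputStream. Note that this copies the XML declaration unchanged, so if it says encoding="UTF-8" it still has to be updated separately; the parser option takes care of that when it writes the document out again.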

Answers (0)