on 10-03-2006 9:41 AM
I'm trying to convert an XML file in UTF-8 format to another file with the character set ISO-8859-1.
My problem is that the resulting ISO-8859-1 file starts with a question mark (?) that is not in the original.
Is there a way to do the conversion without getting the question mark?
My code looks as follows:
import java.io.*;

public class ConvertEncoding {
    public static void main(String[] args) {
        String from = "UTF-8", to = "ISO-8859-1";
        String infile = "C:\\temp\\infile.xml", outfile = "C:\\temp\\outfile.xml";
        try {
            convert(infile, outfile, from, to);
        } catch (Exception e) {
            System.out.println(e.getMessage());
            System.exit(1);
        }
    }

    /**
     * Copies infile to outfile, decoding the input with the charset "from"
     * and encoding the output with the charset "to".
     */
    private static void convert(String infile, String outfile,
                                String from, String to)
            throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams
        InputStream in = null;
        OutputStream out = null;
        if (infile != null) {
            in = new FileInputStream(infile);
        }
        if (outfile != null) {
            out = new FileOutputStream(outfile);
        }
        // Set up character streams
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
        /* Copy characters from input to output.
         * The InputStreamReader converts from the input encoding
         * to Unicode, and the OutputStreamWriter converts
         * from Unicode to the output encoding.
         * Characters that cannot be represented in
         * the output encoding are output as '?'
         */
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1) { // Read a block of input
            w.write(buffer, 0, len);
        }
        r.close();
        w.flush();
        w.close();
    }
}
I suppose that your XML starts with the bytes 0xEF 0xBB 0xBF (the UTF-8 byte order mark). The reader decodes them as the character U+FEFF, which has no representation in ISO-8859-1, so it is written as ? after conversion.
See http://www.w3.org/TR/2004/REC-xml-20040204/#sec-guessing-no-ext-info
So you can either check for these bytes and remove them if they exist, or parse the XML with a parser and then save it in the new encoding.
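The first option could look roughly like the sketch below. It is not the original poster's code: the class name BomStripper and its convert method are hypothetical, and it assumes the input really is UTF-8 and the target is ISO-8859-1. It reads the first decoded character and pushes it back unless it is U+FEFF, then copies the rest unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PushbackReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class BomStripper {

    /**
     * Copies UTF-8 input to ISO-8859-1 output and drops a leading
     * U+FEFF (the decoded byte order mark) if one is present.
     */
    static void convert(InputStream in, OutputStream out) throws IOException {
        PushbackReader r = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1);

        // Look at the first character; put it back unless it is the BOM.
        int first = r.read();
        if (first != -1 && first != 0xFEFF) {
            r.unread(first);
        }

        // Copy the remaining characters block by block.
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1) {
            w.write(buffer, 0, len);
        }
        w.flush();
    }
}
```

It could be called as BomStripper.convert(new FileInputStream(infile), new FileOutputStream(outfile)), with the caller closing both streams afterwards. Note that this only removes the byte order mark: if the file has an XML declaration with encoding="UTF-8", that text is copied as-is and no longer matches the new encoding, which is one reason to prefer the parser-based approach.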