09-18-2015 1:27 PM
Hello abap2xlsx team,
For years we have been using abap2xlsx to export data from and import data into our CRM system. Recently, however, we ran into the following error when opening a generated .xlsx file:
Removed Part: /xl/sharedStrings.xml part with XML error. (Strings) Illegal xml character.
I started investigating with the help of Firefox, which clearly marks where an XML file has errors, and discovered that the file contains the character "Thumbs up" (Unicode character inspector: ). Tracking the problem down further, I found that it is not the fault of abap2xlsx. It appears to be a problem with how the iXML class performs the encoding conversion when UTF-8 is specified as the encoding: when I set the encoding to UTF-16LE instead, the character is embedded in the XML file correctly and the file is valid.
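To make the byte-level picture concrete, here is a minimal ABAP sketch (the report name is hypothetical; it assumes a Unicode system on 7.40 or later because of the inline declarations). It decodes the four UTF-16LE bytes of U+1F44D with code page 4103, as the demo report below does, and re-encodes the result with cl_abap_conv_out_ce, the converter used by the report's manual path. Inside the ABAP string the character occupies two positions (the surrogate pair D83D/DC4D), whereas correct UTF-8 is the single 4-byte sequence F0 9F 91 8D, the same bytes that appear percent-encoded in the graphemica URL in the report. Because the manual path reportedly produces valid XML, this converter is expected to emit those four bytes.

```abap
REPORT zdemo_thumbs_up_bytes. " hypothetical report name

DATA: lv_utf16le TYPE xstring,
      lv_string  TYPE string,
      lv_len     TYPE i,
      lv_utf8    TYPE xstring,
      lv_msg     TYPE string.

START-OF-SELECTION.
  " U+1F44D (Thumbs up) in UTF-16LE: high surrogate D83D, low surrogate DC4D
  lv_utf16le = '3DD84DDC'.
  TRY.
      " Decode with SAP code page 4103 (UTF-16LE), as in the demo report
      DATA(lo_conv_in) = cl_abap_conv_in_ce=>create( encoding = '4103'
                                                    input    = lv_utf16le ).
      lo_conv_in->read( IMPORTING data = lv_string ).
      " ABAP strings store 16-bit code units, so the surrogate pair counts as 2
      lv_len = strlen( lv_string ).
      " Re-encode with the kernel converter used by the report's manual path
      DATA(lo_conv_out) = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
      lo_conv_out->write( data = lv_string ).
      lv_utf8 = lo_conv_out->get_buffer( ).
    CATCH cx_root INTO DATA(lx_error).
      lv_msg = lx_error->get_text( ).
      WRITE: / lv_msg.
  ENDTRY.
  WRITE: / 'Length in ABAP:', lv_len. " expected: 2
  WRITE: / 'UTF-8 bytes:', lv_utf8.   " expected: F09F918D
```

The length of 2 is the key point: any component that treats each 16-bit unit as a complete character sees two "characters" here rather than one.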
I then searched for SAP Notes in application area BC-ABA-XML, which is maintained for package SIXML (the package to which class CL_IXML is assigned), and found:
1559677 - XML renderer generates invalid XML
1750204 - iXML: Treatment of special and invalid characters
To check whether the character counts as an invalid character, I implemented the code provided in note 1559677 in the following demo report. The report creates the XML both with iXML and manually. When the report is executed with encoding UTF-8, the XML generated by iXML is invalid; when it is executed with UTF-16, the iXML-generated XML is valid as well.
REPORT zdemo_excel1_unicode_xml.
DATA: lv_string   TYPE string,
      lv_xml      TYPE string,
      lv_xml_x    TYPE xstring,
      lv_encoding TYPE abap_encoding,
      lv_xsting   TYPE xstring,
      lv_content  TYPE xstring.
PARAMETERS: p_chars TYPE string DEFAULT 'UTF-16'. " Export is OK with UTF-16; try UTF-8 to reproduce the error
" Class from note 1559677 - XML renderer generates invalid XML
CLASS lcl_replace_chars DEFINITION FINAL.
  PUBLIC SECTION.
    CLASS-METHODS replace_invalid_xml_chars
      CHANGING c_string TYPE string.
    CLASS-METHODS class_constructor.
  PRIVATE SECTION.
    CLASS-DATA ctrls TYPE string.
    CLASS-DATA replc TYPE string VALUE ` `.
ENDCLASS.
CLASS lcl_replace_chars IMPLEMENTATION.
  METHOD replace_invalid_xml_chars.
    " Replace each control character collected in ctrls by a blank
    TRANSLATE c_string USING ctrls.
  ENDMETHOD.
  METHOD class_constructor.
    " Build the TRANSLATE pattern as pairs (control character U+0000..U+001F, blank),
    " skipping TAB (U+0009), LF (U+000A) and CR (U+000D), which are legal in XML
    DO 32 TIMES.
      CHECK NOT ( sy-index = 10 OR sy-index = 11 OR sy-index = 14 ).
      ctrls = |{ ctrls }{ cl_abap_conv_in_ce=>uccpi( sy-index - 1 ) }| &
              |{ replc }|.
    ENDDO.
  ENDMETHOD.
ENDCLASS.
START-OF-SELECTION.
  " The following hex sequence is the UTF-16LE representation of the text
  " "Thumbs up " + U+1F44D, see http://graphemica.com/%F0%9F%91%8D
  lv_xsting = '5400680075006D006200730020007500700020003DD84DDC'.
  " Convert the hex sequence into a string
  TRY.
      lv_encoding = '4103'. " SAP code page 4103 = UTF-16LE
      cl_abap_conv_in_ce=>create(
        EXPORTING
          encoding = lv_encoding   " Input character format
          input    = lv_xsting
        RECEIVING
          conv     = DATA(lr_conv) " New converter instance
      ).
      lr_conv->read(
        IMPORTING
          data = lv_string " Data object to be read
      ).
    CATCH cx_sy_conversion_codepage.
    CATCH cx_sy_codepage_converter_init.
    CATCH cx_parameter_invalid_type.
    CATCH cx_parameter_invalid_range.
  ENDTRY.
  " Create XML using iXML
  DATA(lo_ixml) = cl_ixml=>create( ).
  DATA(lo_encoding) = lo_ixml->create_encoding(
    byte_order    = if_ixml_encoding=>co_platform_endian
    character_set = p_chars
  ).
  DATA(lo_document) = lo_ixml->create_document( ).
  lo_document->set_encoding( lo_encoding ).
  lo_document->set_standalone( abap_true ).
  DATA(lo_element_root) = lo_document->create_simple_element(
    name   = 'demo'
    parent = lo_document
  ).
  lo_element_root->set_value( value = lv_string ).
  " Render the document into an xstring stream
  DATA(lo_streamfactory) = lo_ixml->create_stream_factory( ).
  DATA(lo_ostream) = lo_streamfactory->create_ostream_xstring( string = lv_content ).
  DATA(lo_renderer) = lo_ixml->create_renderer(
    ostream  = lo_ostream
    document = lo_document
  ).
  lo_renderer->render( ).
  " Set a breakpoint here and use the "XML-Browser" view to display the content of lv_content
  WRITE: / lv_content.
  " Create XML by hand and convert it to UTF-8
  CONCATENATE '<demo>' lv_string '</demo>' INTO lv_xml.
  CONCATENATE
    '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
    lv_xml INTO lv_xml
    SEPARATED BY cl_abap_char_utilities=>cr_lf.
  DATA(lr_conv_out) = cl_abap_conv_out_ce=>create(
    EXPORTING
      encoding = 'UTF-8' " Output character format
  ).
  lr_conv_out->write( EXPORTING data = lv_xml ).
  lv_xml_x = lr_conv_out->get_buffer( ).
  " Set a breakpoint here and use the "XML-Browser" view to display the content of lv_xml_x
  WRITE: / lv_xml_x.
  " Test method from note 1559677 - XML renderer generates invalid XML,
  " which removes invalid characters
  WRITE: / 'Before:', lv_string.
  lcl_replace_chars=>replace_invalid_xml_chars(
    CHANGING c_string = lv_string ).
  WRITE: / 'After:', lv_string.
Could you cross-check my investigation and the report? If you also think this is a bug that SAP should fix, I will raise an incident.
Best regards
Gregor
09-20-2015 10:45 AM
Hello Gregor,
in my system the "Before" and "After" output looks the same, and in the debugger both variables look fine in the XML viewer (see screenshot).
System: 7.40, EHP6
09-20-2015 12:00 PM
Hi Stefan,
have you tried running the report with UTF-8 entered on the selection screen?
Best regards
Gregor
09-20-2015 12:07 PM
It seems other abap2xlsx users have run into this problem as well: Corrupted XML file when working with a particular ideogram.
09-20-2015 6:52 PM
09-21-2015 8:46 AM
09-24-2015 7:42 AM
I have already received a response from SAP: it is an issue in the iXML class, which works internally with UCS-2. SAP is working on a kernel patch so that iXML uses UTF-16 internally. The fix is expected in about a month.
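The UCS-2 explanation accounts for why only UTF-8 output breaks. Outside the Basic Multilingual Plane, UTF-16 stores a character as two 16-bit surrogate code units; UCS-2 has no such concept and treats each unit as a character of its own. The following ABAP sketch is an illustration under an assumption, not SAP's actual iXML code: assuming a renderer encodes each 16-bit unit on its own, it applies the standard 3-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) to D83D and DC4D. The result, ED A0 BD ED B1 8D, encodes the surrogate code points U+D83D and U+DC4D, which are not permitted in well-formed UTF-8 and are excluded from the XML 1.0 Char production, so a parser rejects them; the correct sequence is F0 9F 91 8D. With UTF-16LE output the pair can be copied through unchanged, which is consistent with that encoding working. The report, class and method names are hypothetical.

```abap
REPORT zdemo_ucs2_surrogates. " hypothetical report name

CLASS lcl_ucs2_demo DEFINITION FINAL.
  PUBLIC SECTION.
    " Encode one 16-bit unit (0x0800..0xFFFF) with the 3-byte UTF-8 layout
    CLASS-METHODS encode_unit
      IMPORTING iv_unit         TYPE i
      RETURNING VALUE(rv_bytes) TYPE xstring.
ENDCLASS.

CLASS lcl_ucs2_demo IMPLEMENTATION.
  METHOD encode_unit.
    DATA: lv_int TYPE i,
          lv_b1  TYPE x LENGTH 1,
          lv_b2  TYPE x LENGTH 1,
          lv_b3  TYPE x LENGTH 1.
    lv_int = 224 + iv_unit DIV 4096.          " 1110xxxx: top 4 bits
    lv_b1  = lv_int.
    lv_int = 128 + ( iv_unit DIV 64 ) MOD 64. " 10xxxxxx: middle 6 bits
    lv_b2  = lv_int.
    lv_int = 128 + iv_unit MOD 64.            " 10xxxxxx: low 6 bits
    lv_b3  = lv_int.
    CONCATENATE lv_b1 lv_b2 lv_b3 INTO rv_bytes IN BYTE MODE.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA: lv_per_unit TYPE xstring,
        lv_utf8     TYPE xstring.
  " Surrogate pair of U+1F44D: D83D = 55357, DC4D = 56397
  DATA(lv_high) = lcl_ucs2_demo=>encode_unit( 55357 ).
  DATA(lv_low)  = lcl_ucs2_demo=>encode_unit( 56397 ).
  CONCATENATE lv_high lv_low INTO lv_per_unit IN BYTE MODE.
  " Correct UTF-8 for U+1F44D (see the graphemica URL in the demo report)
  lv_utf8 = 'F09F918D'.
  WRITE: / 'Per 16-bit unit:', lv_per_unit. " expected: EDA0BDEDB18D
  WRITE: / 'Correct UTF-8:', lv_utf8.
```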
10-01-2015 12:36 PM
SAP is providing a solution via a kernel patch that should be available next week. For details, see note:
2220720 - iXML: Missing support in UTF-8 for Unicode characters longer than two bytes
Currently only available in German. Thanks to for providing the fix.