
[abap2xlsx] iXML writes illegal XML character in UTF-8 encoding

gregorw
Active Contributor

Hello abap2xlsx team,

We have been using abap2xlsx for years to export data from and import data into our CRM system. Now we have run into the following error when opening a generated .xlsx:


Removed Part: /xl/sharedStrings.xml part with XML error.  (Strings) Illegal xml character.

I started investigating and, with the help of Firefox, which clearly marks where an XML file has errors, I discovered that the file contains the "Thumbs up" character (U+1F44D). I tracked the problem down further and found that it is not the fault of abap2xlsx. It seems to be a problem with how the iXML class does the encoding conversion when UTF-8 is specified: when I set the encoding to UTF-16LE, the U+1F44D character is embedded in the XML file correctly and the file is valid.
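To make the byte-level difference concrete, here is a small sketch of how U+1F44D looks in both encodings. It is written in Python rather than ABAP so that it can be run outside an SAP system; the variable names are illustrative only.

```python
# U+1F44D (THUMBS UP SIGN) lies outside the Basic Multilingual Plane.
thumbs_up = "\U0001F44D"

# UTF-16LE needs a surrogate pair: two 16-bit code units (D83D, DC4D),
# each stored low byte first.
utf16le_hex = thumbs_up.encode("utf-16-le").hex().upper()

# Well-formed UTF-8 encodes the code point as a single 4-byte sequence.
utf8_hex = thumbs_up.encode("utf-8").hex().upper()

print(utf16le_hex)  # 3DD84DDC
print(utf8_hex)     # F09F918D
```

A correct UTF-8 rendering of the shared string therefore has to contain the bytes F0 9F 91 8D for this character; anything else at that position will be rejected by a strict XML parser.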

I then searched for SAP Notes in the application area BC-ABA-XML, which is maintained for package SIXML, to which class CL_IXML is assigned. I found:

1559677 - XML renderer generates invalid XML

1750204 - iXML: Treatment of special and invalid characters

To check whether the U+1F44D character is an invalid character, I implemented the code provided in note 1559677 in the following demo report. The report creates the XML both with iXML and manually. When the report is executed with encoding UTF-8, the XML generated by iXML is invalid. When you execute it with UTF-16, the iXML-generated XML is valid as well.


REPORT zdemo_excel1_unicode_xml.

DATA: lv_string   TYPE string,
      lv_xml      TYPE string,
      lv_xml_x    TYPE xstring,
      lv_encoding TYPE abap_encoding,
      lv_xsting   TYPE xstring,
      lv_content  TYPE xstring.

PARAMETERS: p_chars TYPE string DEFAULT 'UTF-16'. " Export is OK for UTF-16, but try with UTF-8

" Class from Note 1559677 - XML renderer generates invalid XML
CLASS lcl_replace_chars DEFINITION FINAL.
  PUBLIC SECTION.
    CLASS-METHODS replace_invalid_xml_chars
      CHANGING c_string TYPE string.
    CLASS-METHODS class_constructor.
  PRIVATE SECTION.
    CLASS-DATA ctrls TYPE string.
    CLASS-DATA replc TYPE string VALUE ` `.
ENDCLASS.

CLASS lcl_replace_chars IMPLEMENTATION.
  METHOD replace_invalid_xml_chars.
    TRANSLATE c_string USING ctrls.
  ENDMETHOD.

  METHOD class_constructor.
    " Build the TRANSLATE pattern: every control character U+0000..U+001F
    " except tab (9), LF (10) and CR (13) is paired with the replacement
    DO 32 TIMES.
      CHECK NOT ( sy-index = 10 OR sy-index = 11 OR sy-index = 14 ).
      ctrls = |{ ctrls }{ cl_abap_conv_in_ce=>uccpi( sy-index - 1 ) }| &
              |{ replc }|.
    ENDDO.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.

  " the following HEX sequence is the UTF-16LE representation for the text
  " Thumbs up + U+1F44D http://graphemica.com/%F0%9F%91%8D
  lv_xsting = '5400680075006D006200730020007500700020003DD84DDC'.

  " Convert the HEX sequence into a string
  TRY.
      lv_encoding = '4103'. " utf-16le
      cl_abap_conv_in_ce=>create(
        EXPORTING
          encoding                      = lv_encoding    " Input Character Format
          input                         = lv_xsting
        RECEIVING
          conv                          = DATA(lr_conv)    " New Converter Instance
      ).
      lr_conv->read(
        IMPORTING
          data                          = lv_string    " Data Object To Be Read
      ).
    CATCH cx_sy_conversion_codepage.
    CATCH cx_sy_codepage_converter_init.
    CATCH cx_parameter_invalid_type.
    CATCH cx_parameter_invalid_range.
  ENDTRY.

  " Create XML using IXML
  DATA(lo_ixml) = cl_ixml=>create( ).
  DATA(lo_encoding) = lo_ixml->create_encoding(
    byte_order = if_ixml_encoding=>co_platform_endian
    character_set = p_chars
  ).
  DATA(lo_document) = lo_ixml->create_document( ).
  lo_document->set_encoding( lo_encoding ).
  lo_document->set_standalone( abap_true ).
  DATA(lo_element_root)  = lo_document->create_simple_element( name   = 'demo'
                                                         parent = lo_document ).
  lo_element_root->set_value( value = lv_string ).

  " Create xstring stream
  DATA(lo_streamfactory) = lo_ixml->create_stream_factory( ).
  DATA(lo_ostream) = lo_streamfactory->create_ostream_xstring( string = lv_content ).
  DATA(lo_renderer) = lo_ixml->create_renderer(
                        ostream  = lo_ostream
                        document = lo_document
                      ).
  lo_renderer->render( ).

  " set a breakpoint here and use the View "XML-Browser" to display the content of lv_content
  WRITE: / lv_content.

  " Create XML by hand and convert to UTF-8
  CONCATENATE '<demo>' lv_string '</demo>' INTO lv_xml.
  CONCATENATE
    '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
    lv_xml INTO lv_xml
    SEPARATED BY cl_abap_char_utilities=>cr_lf.
  DATA(lr_conv_out) = cl_abap_conv_out_ce=>create(
                        EXPORTING
                          encoding = 'UTF-8'    " Output Character Format
                      ).
  lr_conv_out->write( EXPORTING data = lv_xml ).
  lv_xml_x = lr_conv_out->get_buffer( ).

  " set a breakpoint here and use the View "XML-Browser" to display the content of lv_xml_x
  WRITE: lv_xml_x.

  " Test Method from Note 1559677 - XML renderer generates invalid XML
  " to remove invalid characters
  WRITE: / 'Before:', lv_string.
  lcl_replace_chars=>replace_invalid_xml_chars(
                CHANGING c_string = lv_string ).
  WRITE: / 'After:', lv_string.
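The helper from note 1559677 only replaces control characters below U+0020 (keeping tab, LF and CR), so it should leave U+1F44D untouched, since that is a legal XML 1.0 character. A rough Python sketch of the same logic follows. It assumes the `Char` production of the XML 1.0 specification, and the function names are hypothetical stand-ins rather than anything taken from the note.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Counterpart of the report's TRANSLATE pattern: map every C0 control
# character except tab, LF and CR to a single space.
CTRL_TABLE = {cp: " " for cp in range(0x20) if cp not in (0x9, 0xA, 0xD)}


def replace_invalid_xml_chars(text: str) -> str:
    """Replace the control characters that XML 1.0 forbids with spaces."""
    return text.translate(CTRL_TABLE)


sample = "Thumbs up \U0001F44D\x01"
cleaned = replace_invalid_xml_chars(sample)

print(is_xml10_char("\U0001F44D"))         # True
print(is_xml10_char("\x01"))               # False
print(cleaned == "Thumbs up \U0001F44D ")  # True
```

In other words, "Before" and "After" are expected to be identical for this test string, which points away from the character itself and towards the encoding step.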

Can you cross-check my investigation and the report? If you also think it's a bug that SAP should fix, I will raise an incident.

Best regards

Gregor

7 REPLIES

stefan_schmcker
Explorer
0 Kudos

Hello Gregor,

In my system the "before" and "after" outputs look the same, and in the debugger both variables look fine in the XML viewer (see screenshot).

System: 7.40, EHP6

0 Kudos

Hi Stefan,

Have you tried running it with UTF-8 entered on the selection screen?

Best regards

Gregor

0 Kudos

It seems others using abap2xlsx have also run into this problem: "Corrupted XML file when working with a particular ideogram".

0 Kudos

Did it now, and I see the same problem here: the XML view is incorrect.

0 Kudos

I've filed incident 707322/2015.

0 Kudos

I've already got a response from SAP. It's an issue in the iXML class: it works internally with UCS-2. SAP is working on a kernel patch to use UTF-16 internally. The ETA is about a month.
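As an illustration of why a UCS-2-based converter can trip over this character: UCS-2 has no notion of surrogate pairs, so each 16-bit unit is treated as a character of its own. The Python sketch below assumes that such a converter encodes each unit to UTF-8 separately. That is one plausible failure mode consistent with the error message, not a confirmed description of what the kernel actually did.

```python
# The two UTF-16 code units of U+1F44D, seen as two separate "characters",
# the way a UCS-2-only converter would see them.
ucs2_units = "\ud83d\udc4d"

# Encoding each surrogate on its own yields two 3-byte sequences.
# 'surrogatepass' is required because Python refuses to do this by default.
broken = ucs2_units.encode("utf-8", "surrogatepass")

# A strict UTF-8 decoder, such as the one in an XML parser, rejects them.
try:
    broken.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Combining the surrogate pair first gives the single valid 4-byte sequence.
paired = ucs2_units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
fixed = paired.encode("utf-8")

print(broken.hex().upper())  # EDA0BDEDB18D
print(is_valid_utf8)         # False
print(fixed.hex().upper())   # F09F918D
```

Under this assumption, a kernel that handles the string as UTF-16 internally would combine the pair before encoding and emit the valid 4-byte form.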

0 Kudos

SAP is providing a solution via a kernel patch that should be available next week. For details, see note:

2220720 - iXML: Missing UTF-8 support for Unicode characters longer than two bytes (original title: "iXML: Fehlende Unterstützung in UTF-8 von Unicode-Zeichen die länger als zwei Bytes sind")

Currently only available in German. Thanks to for providing the fix.