09-11-2009 2:21 PM
Hi
I'm currently investigating how to convert special characters for a Unicode conversion project.
e.g. I know that the type x value '09' is the same as CL_ABAP_CHAR_UTILITIES=>HORIZONTAL_TAB, but I have no idea what 58 and FFFFFF are.
Is there a definitive list of what all these values represent anywhere?
Many Thanks
Chris
09-14-2009 11:08 AM
Hi Chris,
1) 7-bit ASCII characters:
- Non-Unicode:
Description of control characters with ASCII hex codes:
http://en.wikipedia.org/wiki/Control_character
- Unicode:
In general, characters with ASCII codes 00 to 7F (including the control characters) are equivalent to the first 128 Unicode code points, U+0000 to U+007F.
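The equivalence can be checked mechanically. The following is a minimal Python sketch (not ABAP; the function name is hypothetical) that loops over all 128 values 00 to 7F and confirms that each encodes to one identical byte in ASCII and UTF-8, equal to its code point:

```python
def ascii_matches_unicode() -> bool:
    """Check that every 7-bit value 0x00-0x7F encodes to the same single byte
    in ASCII and UTF-8, and that this byte equals the Unicode code point."""
    for cp in range(0x80):  # 128 values: U+0000 .. U+007F
        ch = chr(cp)
        if ch.encode("ascii") != bytes([cp]):
            return False
        if ch.encode("utf-8") != bytes([cp]):
            return False
    return True

print(ascii_matches_unicode())  # True

# U+0080 is outside 7-bit ASCII: the ascii codec rejects it,
# and UTF-8 needs two bytes for it.
print(chr(0x80).encode("utf-8").hex())  # c280
```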
However, there are different encoding schemes for Unicode; for SAP systems, UTF-16 big endian, UTF-16 little endian and UTF-8 are relevant.
Keep in mind that ABAP Unicode systems use UTF-16 internally. The endianness depends on the hardware platform (please refer to SAP note 552464).
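As a rough analogy only, the Python sketch below picks the UTF-16 byte order matching the machine it runs on. It reflects the Python host's hardware, not an SAP system; which UTF-16 variant a given SAP system uses should be taken from SAP note 552464. The helper name is hypothetical.

```python
import sys

def native_utf16_codec() -> str:
    """Return the Python UTF-16 codec that matches this machine's byte order."""
    return "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

codec = native_utf16_codec()
print(sys.byteorder, codec)
print("\t".encode(codec).hex())  # '0900' on little-endian, '0009' on big-endian
```

Python's plain "utf-16" codec also uses the native byte order, but prefixes a byte order mark, which is why the explicit -le/-be codecs are used here.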
For the US7ASCII characters mentioned above (including control characters), this means the following hex codes:
Unicode code point U+00nn:
ASCII --> nn
UTF-8 --> nn
UTF-16 LE --> /x nn00
UTF-16 BE --> /x 00nn
Example:
Horizontal Tab:
Unicode code point: U+0009
ASCII --> 09
UTF-8 --> 09
UTF-16 LE --> /x 0900
UTF-16 BE --> /x 0009
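The mapping for U+00nn can be reproduced with Python's standard codecs. This is a sketch for checking hex values outside the SAP system, not an SAP API; the helper name hex_forms is hypothetical and deliberately limited to U+0000..U+007F, where the ASCII row applies.

```python
def hex_forms(cp: int) -> dict[str, str]:
    """Return the hex bytes of a 7-bit code point U+00nn in each encoding."""
    if not 0 <= cp <= 0x7F:
        raise ValueError("this table only covers U+0000..U+007F")
    ch = chr(cp)
    return {
        "ASCII": ch.encode("ascii").hex(),
        "UTF-8": ch.encode("utf-8").hex(),
        "UTF-16 LE": ch.encode("utf-16-le").hex(),
        "UTF-16 BE": ch.encode("utf-16-be").hex(),
    }

for name, value in hex_forms(0x09).items():  # horizontal tab, U+0009
    print(f"{name:10} --> {value}")
```

For the horizontal tab this prints 09, 09, 0900 and 0009, matching the example above.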
2) For other (non-control) characters, please have a look at e.g.:
a) Non-Unicode (ASCII):
http://service.sap.com/~form/sapnet?_SHORTKEY=01100035870000380759&_OBJECT=011000358700000456542007E
--> page 5 and following pages
b) Unicode: General Unicode chart (C0 Controls and Basic Latin):
http://www.unicode.org/charts/PDF/U0000.pdf
c) UTF-8 char table
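As a complement to the charts, Python's standard unicodedata module can name the character at a given code point. This is a sketch under the assumption that a value such as the 58 from the question is a single-byte ASCII value (0x58 is 'X'); the helper name describe is hypothetical. Control characters have no name in the Unicode database, so the sketch falls back to their general category, and values above U+10FFFF (such as FFFFFF read as one number) are not code points at all. What FFFFFF means in the original program depends on its context and is not answered here.

```python
import unicodedata

def describe(cp: int) -> str:
    """Describe a Unicode code point, e.g. 0x58 -> 'U+0058 LATIN CAPITAL LETTER X'."""
    if not 0 <= cp <= 0x10FFFF:
        return f"{cp:X} is not a Unicode code point"
    ch = chr(cp)
    name = unicodedata.name(ch, None)  # None for unnamed characters such as controls
    if name is None:
        # Fall back to the general category, e.g. Cc = control character.
        return f"U+{cp:04X} <{unicodedata.category(ch)}>"
    return f"U+{cp:04X} {name}"

print(describe(0x58))      # U+0058 LATIN CAPITAL LETTER X
print(describe(0x09))      # U+0009 <Cc>
print(describe(0xFFFFFF))  # FFFFFF is not a Unicode code point
```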
In addition, please have a look at:
Unicode Enabling Guide:
--> Requirements of ABAP Programs in Unicode systems:
--> Page 45
Best regards,
Nils Buerckel
SAP AG
09-16-2009 11:11 AM