Identifying the Code Page of Files

former_member187668 · 01-17-2009

Hi,

We upgraded our 4.6C MDMP (4 code pages) system to ERP 6.0 Unicode. Before the upgrade we had certain files on the application server which we intend to read after the upgrade as well.

For some files, reading with OPEN DATASET gives me conversion errors and I lose data in the conversion. How can I open the old files without any loss of data?

If I knew the code page (CP) I could open the file using OPEN DATASET ... CODE PAGE CP, but I don't know the CP of the file. I know there are methods to find out whether a file is UTF-8 or not, but is there any method to identify the code page of a file?

Regards,

Ravikanth

markus_doehr2 · 01-19-2009

How do you open them now if you don't know what code page they are?

Markus

former_member187668 · 01-19-2009

Hi Markus,

That was my question: how do I find out the code page? If I know the CP I can open the file using

OPEN DATASET ... IN LEGACY TEXT MODE CODE PAGE CP.

But the problem is that I don't know the CP of the file. As I had 4 CPs in my original system, it can be any one of them. One bad solution is to open the file with each of the 4 CPs; whichever CP opens the file without conversion errors would be the correct CP of that file.
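The trial approach described above can be sketched outside ABAP as well. The following is a minimal Python illustration, not an SAP API: the function name and the candidate list are hypothetical, and the Python codec names only stand in for whatever four code pages the original MDMP system used. Note that a strict decode only narrows the choice; single-byte code pages accept almost any byte sequence, so several candidates can succeed for the same file.

```python
from typing import Iterable, List


def candidate_encodings(data: bytes, encodings: Iterable[str]) -> List[str]:
    """Return every encoding under which `data` decodes without an error."""
    matches = []
    for name in encodings:
        try:
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the byte sequence
        matches.append(name)
    return matches


# Hypothetical candidate list (an assumption, not taken from the thread):
# replace it with the codecs matching the code pages of the old system.
CANDIDATES = ["shift_jis", "gb2312", "iso8859_5", "latin_1"]

sample = "Привет".encode("iso8859_5")
print(candidate_encodings(sample, CANDIDATES))
```

If more than one candidate survives, the file still needs a second criterion (for example who created it) before it can be read safely.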

Any better method to identify the CP of the file?

Regards,

Ravikanth

markus_doehr2 · 01-19-2009

No, I meant: how did you open it before, under 4.6C?

What if your program ran, e.g., in a Chinese user context and the file was Russian? How did you deal with that before?

Markus

former_member187668 · 01-24-2009

Hi Markus,

Luckily it didn't happen like that before.

Our scenario is that files are created by a user team (and they can log on in their preferred languages). Generally these files are processed by the same user after some business approvals. Users can take 2 or 3 days.

I see your point. I can use the OPEN DATASET ... ENCODING NON-UNICODE variant. If a user created a file in a Chinese context and the same user processes it after the upgrade, the system will use the Chinese code page to read it and there will be no loss of data.

But if there are any leftover files (i.e. files which have not been processed) immediately after the upgrade, a different team will finish them in one day.

So we have a situation where a different user may process the file, hence the need to identify the code page.

Any solutions to it?

Thanks for your effort.

Regards,

Ravikanth

markus_doehr2 · 01-25-2009

Check the following notes:

Note 1066952 - Working with legacy data files in LSMW (Unicode, BOM, FTP)

Note 752859 - sapiconv - a tool for converting the encoding of files

The first note explains the implications; the second note describes a tool that comes with the SAP kernel to convert files from one encoding to another. You may run that program (e.g. as an external command) in your own programs before you actually import the file.
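To illustrate the kind of conversion step meant here, below is a minimal Python sketch of re-encoding a file to UTF-8 once its source encoding is known. It is not sapiconv and does not reproduce its options; the function name and parameters are made up for this example.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode `src` from `source_encoding` to UTF-8 and write it to `dst`.

    Raises UnicodeDecodeError if the bytes are not valid in the given
    source encoding, so nothing is silently replaced or dropped.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

As with any such converter, the source encoding has to be supplied by the caller; the sketch does not detect it.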

Markus

former_member187668 · 01-26-2009

Hi Markus,

Quite impressive notes with good details. The second note describes converting the code page of a file. But it needs the source code page to convert, which is exactly what I don't know.

One way could be to find out which user created the file, guess the code page accordingly, and convert the file to Unicode. Then I can continue using the OPEN DATASET ... ENCODING DEFAULT variant.

Any other ways you can think of?

Regards,

Ravikanth

markus_doehr2 · 01-26-2009

If this is a Unix system you could do a

file <filename>

as external command. This should give you the correct codepage.
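As a rough sketch of calling it from a program, the Python wrapper below asks `file` for its encoding guess. The function name is hypothetical, and the `--brief` and `--mime-encoding` options are assumed to be supported by the installed version of `file`. The tool usually reports broad classes such as `us-ascii`, `utf-8`, `iso-8859-1` or `unknown-8bit`, so it may not separate two legacy code pages from each other.

```python
import shutil
import subprocess
from typing import Optional


def guess_encoding_with_file(path: str) -> Optional[str]:
    """Return the encoding guess of the Unix `file` utility, or None.

    None is returned when `file` is not installed or the call fails.
    """
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```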

If that does not work for your OS you can get the newest "file" program and /etc/magic from

http://www.darwinsys.com/file/

An easier approach could be to simply create subdirectories for each language/codepage and tell the users to copy the files there (do it in an organizational manner rather than trying to circumvent possible errors).
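The subdirectory idea needs very little logic on the reading side. Here is a minimal Python sketch, where the directory names, the codec names and the function name are all assumptions made for illustration:

```python
from pathlib import Path

# Hypothetical layout: one subdirectory per language, each mapped to the
# codec its files are expected to use. Adjust both sides to the real system.
ENCODING_BY_DIR = {
    "ja": "shift_jis",
    "zh": "gb2312",
    "ru": "iso8859_5",
    "en": "latin_1",
}


def read_by_directory(path: Path) -> str:
    """Decode a file using the encoding assigned to its parent directory.

    Raises KeyError for a directory that has no assigned encoding, and
    UnicodeDecodeError if the file does not match the assigned encoding.
    """
    encoding = ENCODING_BY_DIR[path.parent.name]
    return path.read_bytes().decode(encoding)
```

A file copied into the wrong directory may still decode without an error, so this relies on the users sorting the files correctly.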

Markus

former_member187668 · 01-29-2009

Hi Markus,

We resolved the issue by asking the business to process all the files before the upgrade itself. They will not leave any files to be processed after the upgrade.

Thanks a lot, Markus, for your help. I learned a few more things about code pages.

Regards,

Ravikanth

Edited by: Ravikanth Tunuguntla on Jan 29, 2009 3:40 PM