Solved: Removing Line-Feed characters while reading a dataset

former_member209728 · ‎12-19-2011

Hi Experts,

I am facing a problem while reading a .txt file that was exported from an Excel file and contains some box-like characters (which I think come from Alt+Enter, i.e. a line feed). Below is sample data from the file:

RH LTMHI_FG1 1600 "HORIZONTAL [] FLANGE" 1 4

1 ABCD 1200 "HORIZONTAL [] FLANGE" 1 6

Please read '[]' as a box symbol (hex 0D0A). I cannot paste the symbol itself here because the editor treats it as a line feed.

My problem is that the below code is reading data only up to the box symbol and is not reading the rest of the data.


OPEN DATASET v_file_name FOR INPUT IN TEXT MODE ENCODING DEFAULT.
CLEAR lv_record.
READ DATASET v_file_name INTO lv_record.

I have tried all the line-feed options and the encoding options, but without success. Please suggest how I can either remove these box symbols or read the entire line. I have no control over how this .txt file is generated.

Thanks in advance.

Best Regards,

Kush Kashyap

SuhaSaha · ‎12-19-2011

Hello Kush,

The hex characters 0D0A represent Carriage Return + Line Feed (CRLF). Search the forum for the attribute CL_ABAP_CHAR_UTILITIES=>CR_LF and you'll get the idea.
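
As a minimal sketch of that idea (the report and variable names are illustrative and not from the original post), the constant can be used directly in string statements. CR_LF holds the two-character sequence 0D0A. The replacement is written as a string literal in backticks so that the single blank is kept:

```abap
REPORT z_crlf_constants_demo.  " hypothetical report name

DATA: lv_text TYPE string,
      lv_cnt  TYPE i.

" Build a sample value with one CRLF pair in the middle
CONCATENATE 'HORIZONTAL' cl_abap_char_utilities=>cr_lf 'FLANGE'
            INTO lv_text.

" Count the CRLF pairs, then replace each pair with a single blank
FIND ALL OCCURRENCES OF cl_abap_char_utilities=>cr_lf
     IN lv_text MATCH COUNT lv_cnt.
REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>cr_lf
        IN lv_text WITH ` `.

WRITE: / lv_cnt, / lv_text.
```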

BR,

Suhas

Former Member · ‎12-19-2011

If I understand this correctly, you're getting:

RH LTMHI_FG1 1600 "HORIZONTAL

FLANGE" 1 4

1 ABCD 1200 "HORIZONTAL

FLANGE" 1 6

when you read this into an internal table? This is going to be virtually impossible to handle. Even if you read this into a binary string, as in:

OPEN DATASET p_file FOR INPUT IN BINARY MODE.

READ DATASET p_file INTO p_content. "declared as xstring

CLOSE DATASET p_file.

you still have a problem in that you'd have to remove the 0D0A between the double quotes (and probably the double quotes). Simply writing a routine to remove 0D0A will remove the CRLF at the end of the records, and you then lose the end of record marker, and cannot properly parse the data.
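
For reference, a hedged sketch of the binary read followed by a conversion to a character string, so that the content can be searched as text. The report name, the variable names, and the 'UTF-8' encoding are assumptions; the real file may use a different code page. The conversion uses CL_ABAP_CONV_IN_CE, which newer releases may flag as obsolete:

```abap
REPORT z_read_binary_demo.  " hypothetical report name

PARAMETERS p_file(128) TYPE c LOWER CASE.

DATA: lv_xcontent TYPE xstring,
      lv_text     TYPE string,
      lo_conv     TYPE REF TO cl_abap_conv_in_ce.

OPEN DATASET p_file FOR INPUT IN BINARY MODE.
IF sy-subrc = 0.
  " In binary mode, reading into an xstring takes the whole file
  READ DATASET p_file INTO lv_xcontent.
  CLOSE DATASET p_file.

  TRY.
      " 'UTF-8' is an assumption about how the file was written
      lo_conv = cl_abap_conv_in_ce=>create( encoding = 'UTF-8'
                                            input    = lv_xcontent ).
      lo_conv->read( IMPORTING data = lv_text ).
    CATCH cx_root.
      WRITE / 'Conversion failed'.
  ENDTRY.
ELSE.
  WRITE / 'File could not be opened'.
ENDIF.
```

After this, lv_text holds the complete file content, including every CRLF, in a single string.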

If your records were fixed length you might be able to remove the 0D0A, and then utilize FM ISM_TRANSFORM_XSTRING_TO_TAB to put into a table, or perhaps one of the SCMS function modules to convert the xstring to something else. But, from your post it appears that you're not dealing with fixed length records in your input file.

Personally, I think you have to take the view that the data has to be corrected before it can be imported into SAP.

Former Member · ‎12-19-2011

If the CRLFs are always inside the quoted string, you could use a regular expression to pull them out by searching only for the CRLF inside the quoted string and replacing it with a space or something similar. You would have to read the data into a binary string so that the data is not spread across multiple table entries. You can then clean the data with the regex and parse it into an internal table, since the CRLF at the end of each line would still be intact. I am not a regex expert, but there is plenty of help on using regex out there.
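
A minimal sketch of this approach, under some assumptions: the file content is already in a string (built here from sample data), every double quote is part of a balanced pair, and quoted fields contain no escaped quotes. The report and variable names are hypothetical. The regex only locates each quoted field; the CRLF inside each match is then replaced with a plain REPLACE:

```abap
REPORT z_clean_quoted_crlf_demo.  " hypothetical report name

DATA: lv_text    TYPE string,
      lv_segment TYPE string,
      lt_matches TYPE match_result_tab,
      ls_match   TYPE match_result,
      lt_lines   TYPE TABLE OF string,
      lv_idx     TYPE i,
      lv_off     TYPE i,
      lv_len     TYPE i.

" Sample content: two records, each with a CRLF inside the quoted field
CONCATENATE 'RH LTMHI_FG1 1600 "HORIZONTAL' cl_abap_char_utilities=>cr_lf
            'FLANGE" 1 4'                   cl_abap_char_utilities=>cr_lf
            '1 ABCD 1200 "HORIZONTAL'       cl_abap_char_utilities=>cr_lf
            'FLANGE" 1 6'                   cl_abap_char_utilities=>cr_lf
            INTO lv_text.

" Find every double-quoted field, including embedded CR and LF
FIND ALL OCCURRENCES OF REGEX '"[^"]*"' IN lv_text RESULTS lt_matches.

" Work from the last match to the first so earlier offsets stay valid
lv_idx = lines( lt_matches ).
WHILE lv_idx > 0.
  READ TABLE lt_matches INTO ls_match INDEX lv_idx.
  lv_off = ls_match-offset.
  lv_len = ls_match-length.
  lv_segment = lv_text+lv_off(lv_len).
  REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>cr_lf
          IN lv_segment WITH ` `.
  REPLACE SECTION OFFSET lv_off LENGTH lv_len
          OF lv_text WITH lv_segment.
  lv_idx = lv_idx - 1.
ENDWHILE.

" Only record-ending CRLFs remain, so split into one row per record
SPLIT lv_text AT cl_abap_char_utilities=>cr_lf INTO TABLE lt_lines.
DELETE lt_lines WHERE table_line IS INITIAL.

LOOP AT lt_lines INTO lv_segment.
  WRITE / lv_segment.
ENDLOOP.
```

If a quoted field can contain an unbalanced or escaped double quote, the pairing breaks down and the data would need to be corrected at the source instead.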

Former Member · ‎12-19-2011

UNIX uses a line feed, LF (hex 0A), as the record separator. Windows uses a carriage return followed by a line feed, CRLF (hex 0D0A), as the record separator.

When a text file created on UNIX is transferred to Windows as it is, the whole file appears as a single string in Notepad, because Windows does not interpret a lone line feed as a record separator. Similarly, when a text file created on Windows is transferred to UNIX as it is, the records break at the line feed character, but the carriage return shows up at the end of each record.

Windows format file on UNIX

This is line 1<CR>
This is line 2<CR>
This is line 3<CR>

UNIX format file on Windows

This is line 1<LF>This is line 2<LF>This is line 3<LF>

Now READ DATASET can only read one record at a time when you OPEN DATASET in TEXT MODE (or you can read into a binary string with OPEN DATASET in BINARY MODE). So if your application server is UNIX and you are reading a Windows-formatted file, you will see a CR at the end of each record that you read. The fact that READ DATASET is reading only up to the CR character in your case means that the record ends there. If you want to read more than one record, you need to read in a DO loop like the one below:

OPEN DATASET v_file_name FOR INPUT IN TEXT MODE ENCODING DEFAULT.
IF sy-subrc = 0.
  DO.
    READ DATASET v_file_name INTO lv_record.
    IF sy-subrc <> 0.
      EXIT. "end of file reached
    ELSE.
      APPEND lv_record TO itab.
    ENDIF.
  ENDDO.
  CLOSE DATASET v_file_name.
ENDIF.

Once you get the data into the internal table, you can eliminate the line feed (NEWLINE) or the carriage return + line feed pair (CR_LF) from each record by using

REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>newline
IN lc_string WITH space.

OR

REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>cr_lf
IN lc_string WITH space.
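
If each record only carries a trailing carriage return, as in the Windows format file on UNIX case above, a sketch along these lines removes just that character. The report name and the sample value are illustrative. CR_LF(1) addresses the first character of the constant, which is the carriage return:

```abap
REPORT z_strip_cr_demo.  " hypothetical report name

DATA lv_record TYPE string.

" Sample record as it might arrive on UNIX from a Windows format file
CONCATENATE 'This is line 1' cl_abap_char_utilities=>cr_lf(1)
            INTO lv_record.

" Remove the carriage return (first character of CR_LF)
REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>cr_lf(1)
        IN lv_record WITH ``.

WRITE / lv_record.
```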
