cancel
Showing results for 
Search instead for 
Did you mean: 

Parsing Error when loading csv files that contain double quotes

Former Member
0 Kudos

I am loading a data file using Hana Studio. The data file is four fields, all of them NVARCHAR. The data comes from a Teradata database.

Some of the records'  field values have double quotes. There are also commas, and for this reason the data file uses a pipe ('|') as the record delimiter.

When I import using either a control file or using a SQL import  command from Hana Studio, I specify that the record delimiter is '|'.

Some of the records load, and some fail. The failed records show up in the error log as 'Parsing error after  'xxx' column. The error record indicates that the problem is in the space right after the double quote mark.

Other details:

  • The data file is created using Teradata SQL Assistant against a Teradata database.
  • The records are delimited by a newline character, but no carriage return.
  • The fields are delimited by a pipe ('|').
  • The file format option in SQL Assistant is 'UTF-8' and written to a *.txt. file on a Windows system.
  • The file is FTP'd to the Hana Linux box.
  • File is imported using options:
    • record delimited by '\n'
    • field delimited by '|'
    • optionally enclosed by '"'

  • There are some records with a control character in them- Hex value of '8D' - Don't know where this comes from yet.
    • Records with a field that have two double quotes do not load.
    • Records with a field that have two double quotes and also have the control character ( Hex value '8D') preceding the first double quote will load.
  • Example records
    • Column Names-Prod_ID,Prod_Desc, SKU, SKU_DESC
      • 12345   |Product xyz - "100" Daily|0A1234ZZ|"100" Daily
      • 12346   |Product abc - "200" Daily|0B1234|"200 Daily
    • The first record will not load
    • The second record will load. It has the control character (Hex value '8D') in the position between the hyphen and the first quote.
    • The error record from the .err file has this message:

               Parsing error: incorrect delimiter for the next column of PRODUCT_DESC field: "100"

               12345   |Product xyz - "100" Daily|0A1234ZZ|"100" Daily

                                                            ^

I have tried loading with and without the 'optionally enclosed by '"' '. Same result.

What am I doing wrong?

Thanks

Accepted Solutions (1)

Accepted Solutions (1)

former_member182302
Active Contributor
0 Kudos

Adding to what jake said, try using sed to command to replace the unwanted characters.

And try using this:

sed -e 's/"/\\"/g

In the CTL file:

optionally enclosed by '\"'

And check if it works.

Regards,

Krishna Tangudu

Former Member
0 Kudos

Krishna,

Thank you. This solves the problem. I tried to find this solution in the Hana documentation but couldn't. Is there a reference for this syntax that you can point me to for future use?

Thanks again,

Harry

former_member182302
Active Contributor
0 Kudos

Hi Harry,

Glad it solved your problem .

Well i don't know if there is any standard documentation for the syntax , but may be that calls for a separate blog which i would love to post in the future

Regards,

Krishna Tangudu

Answers (2)

Answers (2)

Former Member
0 Kudos

Hello Krishna,

i saw your solution about the parsing error when loading csv files that contain double quotes.

I want to know how and where i have to use the command :

sed -e 's/"/\\"/g


Thank you very much

Regards

Pierre

Former Member
0 Kudos

Hi Harry,

I've also had issues loading a TeraData file. In my case, I had to use "sed" to remove spaces and "?" that were in the CSV export. Sed is a good tool to modify the file to get it to import properly.

Check out John Appleby's blog about loading flat files.

Hope it helps.

-Jake