Server-side import not committing rows beyond even batch multiple

Former Member

Hi All, I'm importing large text files from the local HANA server file system, and it's working well except that the remaining records beyond an even multiple of the batch size (total_records mod batch_size) are not committed. For example, my test table has 809,159 rows. If I specify 1,000 as the batch size I get 809,000 rows committed; 100,000 gives me 800,000; and so on. If I specify exactly 809,159 as the batch size, all 809,159 rows are committed. Here is my code with an example batch size of 100,000:

IMPORT FROM CSV FILE '/dropbox/my-utf-8-data.csv' INTO MYSCHEMA.MYTABLE WITH
   RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   BATCH 100000;

my-utf-8-data.csv contains 809,159 rows, but the above command commits only 800,000; the remaining 9,159 rows (809,159 mod 100,000) are silently dropped.
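For what it's worth, the shortfall is easy to confirm with a plain row count against the target table after each run:

SELECT COUNT(*) FROM MYSCHEMA.MYTABLE;
-- returns 800,000 after the IMPORT above, not 809,159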

Other points:

  1. The table is column store, the CSV fields match the table columns, and there is no problem with the data that does get committed
  2. The CSV files are UTF-8 encoded
  3. Omitting the BATCH option gives the default of 1,000, with the same issue
  4. I'm using HANA 1.00.28.361821 on SUSE Linux Enterprise 11.1 (dev instance on AWS)
  5. Studio 1.0.32 (but as mentioned, the import is server-side)
  6. Imports from the client file system are too slow for the volume of data we have (we compress the files and load them to the server)
  7. A batch size of 1 works, but is too slow for the volume of data we have
  8. This is the reference we're using: http://help.sap.com/hana/html/sql_import_from.html

I'm sorry if I'm missing something obvious. Thanks in advance for any assistance you may be able to provide.

Accepted Solutions (0)

Answers (1)

Former Member

I can't imagine the behaviour above is 'by design', but just in case anybody else is having trouble with this, I have a couple of shortcuts. Basically, we count the number of lines in the text file and set that (or a multiple of it) as the batch size, so the whole file fits within a single batch and there is no remainder left uncommitted (see the sketch at the end of this post). Painful, yes. Error-prone, yes. But nothing beats the performance of server-side import for bulk loads.

To get the number of lines in a file (you need to be logged on to the Linux server), where the file name is my-utf-8-data.csv:

# wc -l my-utf-8-data.csv
809154 my-utf-8-data.csv

Obviously you need to be very careful about header rows and trailing empty lines. Here is a quick way to peek at the last 10 lines of a large text file (this sed one-liner emulates tail):

# sed -e :a -e '$q;N;11,$D;ba' my-utf-8-data.csv
27137323,14639241,1,281586,0,2912,NULL
27137325,14639243,2,,2,2250,A
...

Then meticulously check the number of rows that were actually committed!
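Putting the pieces together, here is a rough sketch of how this could be scripted. It assumes hdbsql is available on the server; the instance number, user, and password below are placeholders, and the file is assumed to have no header row:

#!/bin/sh
# Count the data rows and use that count as the batch size,
# so the whole file fits in a single batch and no remainder
# is left uncommitted.
FILE=/dropbox/my-utf-8-data.csv
ROWS=$(wc -l < "$FILE")
# If the file has one header row, subtract it from the count
# (the header row itself still needs handling separately):
# ROWS=$(( $(wc -l < "$FILE") - 1 ))
echo "Importing $FILE with BATCH $ROWS"
hdbsql -i 00 -u MYUSER -p MyPassword \
  "IMPORT FROM CSV FILE '$FILE' INTO MYSCHEMA.MYTABLE WITH
   RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   BATCH $ROWS"

Wrap this with whatever error handling you need; the point is simply that the batch size is derived from the file rather than guessed.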

Good luck.

-Leuis