on 10-23-2013 9:14 AM
I have to import bulk data into SAP HANA without using Data Services. Are there any methods by which bulk data can be loaded?
After digging into every possible method, I found that the CTL method is good for a one-time data load.
If we use shell scripts, we can drive the CTL method to load the data.
I have implemented shell scripts over the CTL files to load the data whenever a CTL file is present.
I find it fairly easy: if we have fully packaged data that only needs to be loaded into HANA, this method is simple and works well.
Scheduling the files is still a hurdle for me.
I have used cron, and it works well. But can we not run multiple files under the cron scheduler?
But now I am stuck on the implementation:
I have to write a shell script that imports into HANA only those CTL files whose CSVs are present in a particular folder.
For example, I have a folder named A containing updated CSVs, which I refresh on a daily basis. Only the CSVs that have been updated arrive in folder A. So how can I load data only into those tables whose CSV is present in folder A? If I have 2 tables on the first day, only 2 CTL files should run; if I have 3 tables in folder A the next day, then 3 files should run.
This logic is hard for me to implement.
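A minimal sketch of that logic as a shell script, assuming one `.ctl` file per table named after its CSV (e.g. `orders.csv` pairs with `orders.ctl`). All paths, the host, and the `hdbsql` credentials are placeholders, and the actual import call is commented out:

```shell
#!/bin/bash
# Sketch: run only those CTL files whose CSV is present in folder A today.
# Assumes orders.csv pairs with orders.ctl, etc. Paths, host, and user below
# are placeholders -- adjust to your landscape.

run_pending_loads() {
  local csv_dir="$1" ctl_dir="$2"
  for csv in "$csv_dir"/*.csv; do
    [ -e "$csv" ] || continue                # folder A empty today: nothing to do
    local table
    table=$(basename "$csv" .csv)
    local ctl="$ctl_dir/$table.ctl"
    if [ -f "$ctl" ]; then
      echo "loading $table from $ctl"
      # hdbsql -n hanahost:30015 -u LOADER -p '***' -I "$ctl"
      # mv "$csv" "$csv_dir/done/"           # optional: archive after a load
    else
      echo "no control file for $table, skipping" >&2
    fi
  done
}

# Schedule daily via cron, e.g.:  0 2 * * * /opt/load/run_pending_loads.sh
run_pending_loads "${1:-/data/A}" "${2:-/data/ctl}"
```

Because the loop iterates over the CSVs (not the CTLs), only tables whose file actually arrived in folder A are loaded on any given day.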
Shweta:
For small volumes and demo table files, you can use the import wizard if you are on SP4 or higher.
However, for high-volume bulk loads of flat files, please follow John's step-by-step guidelines to import bulk data:
http://www.saphana.com/community/blogs/blog/2013/04/07/best-practices-for-sap-hana-data-loads
John is a HANA guru and veteran; I am a big fan of his blogs and have followed his inputs and tips in my projects. John, good to see you back in the blogging area. You have been missing in action for some time, sir! Welcome back!
Hope this helps.
Rama
You're too kind - I've been busy with customers.
I do have a Java loader which I'm happy to share - creating a GitHub repository now for HANA code snippets. Empty for now, keep checking back.
Feel free to join and add your own code.
John
Thanks, John. As you mentioned in the very first post, you write shell scripts for reloading flat files into the HANA database, and to schedule them we have to use an XS wrapper. Can you provide some examples of an XS wrapper? I am not getting the idea of it, as a beginner in HANA. :)
Rama: thanks for the info, and for the pointer to John's blog.
I pretty much always use the IMPORT command with control files. It is limited in file format (requires CSV) and so I often find myself writing Linux shell scripts to reformat files.
It is easy and cheap and very fast - I see up to 7m rows/second depending on the type of file.
http://help.sap.com/hana/html/sql_import_from.html
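For illustration, a control file is just an IMPORT statement saved to disk. A sketch of generating and running one, where the schema, table, file path, and `hdbsql` connection details are all made-up placeholders (the import call itself is commented out):

```shell
# Write a minimal control file for one table. The IMPORT FROM CSV FILE syntax
# is documented in the SAP HANA SQL reference; schema, table, and paths here
# are examples only.
cat > /tmp/orders.ctl <<'EOF'
IMPORT FROM CSV FILE '/data/A/orders.csv' INTO "MYSCHEMA"."ORDERS"
WITH RECORD DELIMITED BY '\n'
     FIELD DELIMITED BY ','
     THREADS 4
     BATCH 100000;
EOF

# Then execute it against HANA, e.g.:
# hdbsql -n hanahost:30015 -u LOADER -p '***' -I /tmp/orders.ctl
```

The THREADS and BATCH options are where most of the load speed comes from; tune them for your table widths and hardware.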
John
CTL files are ideal for one-time bulk loads.
You didn't specify that you wanted to schedule them. You can do this with CTL files using an XS wrapper, and there is a scheduling engine on the roadmap.
But if you want to do periodic loads from flat files into HANA, then you are better using Data Services for sure. It provides good error handling, monitoring and notification, all of which may be important to you.
If you don't want to use Data Services then you could write a loader into XS using Server Side JavaScript, or I have even written them using JDBC. Alternatively, Informatica is now supported.
John
You can do it as you would for any database - here's an example for MSSQL; you could rewrite it using the HANA JDBC driver.
The key is that you addBatch() and executeBatch() to ensure you do bulk inserts rather than individual inserts. This will dramatically improve load performance. I advise batches of 50-200k rows depending on your data - play with the number.
Note that this will only use a single thread (actually two) and you will max out at about 100-200k rows/sec. If you want faster performance, you will have to write a multi-threaded loader which can push multiple batches at once into multiple connections. You will find you will get an improvement in performance up to the number of cores/2.
Typically you should expect 1m rows/sec of inserts on a typical 10-column table. For very wide tables (100-200 columns) this may drop to about 250k rows/sec.
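The batching pattern above can be sketched as follows. This is not John's loader, just an illustrative single-threaded skeleton: the connection URL, credentials, table, and column list are made up, and the batch arithmetic is pulled into a small helper so the flushing logic is easy to check.

```java
// Sketch of a JDBC bulk insert using addBatch()/executeBatch().
// Connection URL, credentials, table, and columns are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchLoader {
    static final int BATCH_SIZE = 100_000; // tune between 50k and 200k rows

    // How many executeBatch() round trips a load of `rows` rows needs.
    static long batchesNeeded(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }

    static void load(String url, String user, String pass,
                     String[][] rows) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, pass);
             PreparedStatement stmt = conn.prepareStatement(
                     "INSERT INTO MYSCHEMA.ORDERS (ID, NAME) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            int pending = 0;
            for (String[] row : rows) {
                stmt.setString(1, row[0]);
                stmt.setString(2, row[1]);
                stmt.addBatch();                   // accumulate, don't send yet
                if (++pending == BATCH_SIZE) {
                    stmt.executeBatch();           // one round trip per batch
                    pending = 0;
                }
            }
            if (pending > 0) stmt.executeBatch();  // flush the final partial batch
            conn.commit();
        }
    }
}
```

A multi-threaded version would run several copies of this loop, each with its own connection and its own slice of the input, per the cores/2 guideline above.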
Good Luck!
John