on 12-16-2013 8:31 AM
Dear all,
I have created one stored procedure in HANA which will take article id, store id and current datetime as input. The business logic is quite complex, which is to process the records at hourly basis, on which I am using CURSORS to process each record at a time. I want to run this stored procedure for 300+ stores and 200 items per store once in a hour.
Please let me know how i can use HANA capability to run the stored procedure parallelly ? I will be using hdbsql to call the stored procedure in UNIX Cron job for running at each hour.
Regards,
Yogesh
Hi John
Could you please help me in giving some example for work around for cursor inside stored procedure?
I am also using Cursor inside my procedure to get column name dynamically from Table A in my Select < column list > from Table B.
Thanks
Kumar
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Swamy,
Database stored procedures don't run in parallel. Not in the sense that a multi-threaded application runs in parallel. So you cannot instruct HANA to start 10 invocations of a single stored procedure. You could connect 10 sessions and CALL() it ten times; as long as you logically partition your data so that there is no overlapping of processing. If you range partition your one (or more) of your primary tables, then this will happen automatically as long as the result set from each query only hits one partition (pass in the partition key as a parameter to the procedure). How to call it becomes the next issue; you need a process architecture that can manage child threads - start, monitor, end, check status etc. Desktop HANA clients (Studio, hdbsql) cannot do this - they are single threaded. You need a language (python, C) that can start and manage threads. All in all - its hard. Few applications go to so much trouble (though it is common for investment banking systems).
Recommendation - tune the SQL so that a single invocation of the procedure invokes as much automatic parallelism as possible, thus minimizing execution time. The usual techniques: range partitioning, matched- fact/dimension partitioning, avoidance of operations that serialize (CASE etc) - that sort of stuff. SQL parallelism is easier to use than thread parallelism.
mark teehan
singapore
Mark - there are some nuances here. SQL statements do indeed run in parallel if they are not dependent, the procedure is read-only, there are no scalar variables declared and there are no imperative functions.
Whats more, individual SQL queries will parallelize in most cases, plus the compiler will rewrite SQL in many cases to be more efficient.
All this is discussed in detail in the SQLScript reference guide.
The short version is: don't use cursors. They will never run in parallel. You can rewrite your stored procedure to avoid them in most cases.
If you truly believe you can't avoid cursors then you can do the following:
- Select all stores into a local table variable with M values
- Unnest it into an array
- Create N arrays where N is the degree of parallelism
- Loop through array M and alternately insert into arrays N
- Call a subprocedure that does your logic, N times
This will cause your job to run with a degree of parallelism of N. But, if you end up doing this way, you could probably have rewritten the stored procedure to avoid the use of cursors in the first place.
John
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Appleby,
What we created is a generic procedure to load data into hana.
So to make it generic, we kept the Table names as input parameter. And since SELECT * FROM TABLE is not good for the performance we wanted to give SELECT <Column List> FROM TABLE.
To get this <Column list> we were using cursors to read and frame the SQL statement dynamically.
So is there any way to eliminate the use of cursors like this?
Did you get the context of my doubt .. of what am trying to explain here?
Regards,
Krishna Tangudu
You're misunderstanding a few things about HANA.
SELECT * is considered poor for performance because you need to scan each column and this is inefficient if you only need a few. If you actually want each column then using SELECT * is fine.
If you want to load data then why wouldn't you use the LOAD command?
Can you provide some sample code if you want more assistance?
John
Hi Appleby,
I am loading data from Stage Table to Final Table. Since stage table has some extra columns which are not present in the Final (Target) table am using cursor to dynamically populate the Select <columnlist> as per the table (given by user in input parameter).
With Load statement i can load the column table data into memory right? is it something like Loading data into table
i.e. instead of writing insert into Final Table ( Select <column list> from Stage table)
Can i use load? Can you please help me understand Load statement in a better way?
Can you share your email id. I may not be able to share the code in the forum.. or i will check if i can make the code a bit generic and post here.
Regards,
Krishna Tangudu
Hi John,
If possible, could you please help in detail or example on how to achieve the below logic in stored procedure to run in parallel. I could not able to get the view which you have described below.
- Select all stores into a local table variable with M values
- Unnest it into an array
- Create N arrays where N is the degree of parallelism
- Loop through array M and alternately insert into arrays N
- Call a subprocedure that does your logic, N times
Regards,
Yogesh
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
85 | |
10 | |
10 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.