cancel
Showing results for 
Search instead for 
Did you mean: 

How to run HANA Stored procedure in parallel

0 Kudos


Dear all,

I have created one stored procedure in HANA which will take article id, store id and current datetime as input. The business logic is quite complex, which is to process the records at hourly basis, on which I am using CURSORS to process each record at a time. I want to run this stored procedure for 300+ stores and 200 items per store once in a hour.

Please let me know how i can use HANA capability to run the stored procedure parallelly ?  I will be using hdbsql to call the stored procedure in UNIX Cron job for running at each hour.

Regards,

Yogesh

Accepted Solutions (0)

Answers (3)

Answers (3)

Former Member
0 Kudos

Hi John

Could you please help me in giving some example for work around for cursor inside stored procedure?

I am also using Cursor inside my procedure to get column name dynamically from Table A in my Select < column list > from Table B.

Thanks

Kumar

Former Member
0 Kudos

Hi Swamy,

Database stored procedures don't run in parallel. Not in the sense that a multi-threaded application runs in parallel. So you cannot instruct HANA to start 10 invocations of a single stored procedure. You could connect 10 sessions and CALL() it ten times; as long as you logically partition your data so that there is no overlapping of processing. If you range partition your one (or more) of your primary tables, then this will happen automatically as long as the result set from each query only hits one partition (pass in the partition key as a parameter to the procedure). How to call it becomes the next issue; you need a process architecture that can manage child threads - start, monitor, end, check status etc. Desktop HANA clients (Studio, hdbsql) cannot do this - they are single threaded. You need a language (python, C) that can start and manage threads. All in all - its hard. Few applications go to so much trouble (though it is common for investment banking systems).

Recommendation - tune the SQL so that a single invocation of the procedure invokes as much automatic parallelism as possible, thus minimizing execution time. The usual techniques: range partitioning, matched- fact/dimension partitioning, avoidance of operations that serialize (CASE etc) - that sort of stuff. SQL parallelism is easier to use than thread parallelism.

mark teehan

singapore

Former Member
0 Kudos

Mark - there are some nuances here. SQL statements do indeed run in parallel if they are not dependent, the procedure is read-only, there are no scalar variables declared and there are no imperative functions.

Whats more, individual SQL queries will parallelize in most cases, plus the compiler will rewrite SQL in many cases to be more efficient.

All this is discussed in detail in the SQLScript reference guide.

Former Member
0 Kudos

The short version is: don't use cursors. They will never run in parallel. You can rewrite your stored procedure to avoid them in most cases.

If you truly believe you can't avoid cursors then you can do the following:

- Select all stores into a local table variable with M values

- Unnest it into an array

- Create N arrays where N is the degree of parallelism

- Loop through array M and alternately insert into arrays N

- Call a subprocedure that does your logic, N times

This will cause your job to run with a degree of parallelism of N. But, if you end up doing this way, you could probably have rewritten the stored procedure to avoid the use of cursors in the first place.

John

former_member182302
Active Contributor
0 Kudos

Hi John,

In a generic procedure we wrote for loading via HDBSQL, we were using cursors to dynamically fetch the column list of the table to be loaded.

Is there a work around to avoid this cursor as well ? to maximize the performance by increasing parellelism?

Regards,

Krishna Tangudu

Former Member
0 Kudos

I'm not sure I understand what you are trying to do. Can you explain in more detail and paste your code.

former_member182302
Active Contributor
0 Kudos

Hi Appleby,

What we created is a generic procedure to load data into hana.

So to make it generic, we kept the Table names as input parameter. And since SELECT * FROM TABLE is not good for the performance we wanted to give SELECT <Column List>  FROM TABLE.

To get this <Column list> we were using cursors to read and frame the SQL statement dynamically.

So is there any way to eliminate the use of cursors like this?

Did you get the context of my doubt .. of what am trying to explain here?

Regards,

Krishna Tangudu

Former Member
0 Kudos

You're misunderstanding a few things about HANA.

SELECT * is considered poor for performance because you need to scan each column and this is inefficient if you only need a few. If you actually want each column then using SELECT * is fine.

If you want to load data then why wouldn't you use the LOAD command?

Can you provide some sample code if you want more assistance?

John

former_member182302
Active Contributor
0 Kudos

Hi Appleby,

I am loading data from Stage Table to Final Table. Since stage table has some extra columns which are not present in the Final (Target) table am using cursor to dynamically populate the Select <columnlist> as per the table (given by user in input parameter).

With Load statement i can load the column table data into memory right? is it something like Loading data into table

i.e. instead of writing insert into Final Table ( Select <column list> from Stage table)

Can i use load? Can you please help me understand Load statement in a better way?

Can you share your email id. I may not be able to share the code in the forum.. or i will check if i can make the code a bit generic and post here.

Regards,

Krishna Tangudu

0 Kudos

Hi John,

If possible, could you please help in detail or example on how to achieve the below logic in stored procedure to run in parallel. I could not able to get the view which you have described below.

- Select all stores into a local table variable with M values

- Unnest it into an array

- Create N arrays where N is the degree of parallelism

- Loop through array M and alternately insert into arrays N

- Call a subprocedure that does your logic, N times

Regards,

Yogesh

Former Member
0 Kudos