cancel
Showing results for 
Search instead for 
Did you mean: 

How to avoid cursor and make more efficient code

Former Member
0 Kudos

Hello HANA experts,

We have a problem with a preprocessing procedure. It prepares some data to be processed with R. A simpler version of what we need to do can be summarized as:

We have two tables: A and B. We need to do an aggregation from table B for each row in table A, using some table A columns to limit the range of table B rows to be aggregated.

As a sample, table A has information of some buildings (construction start, construction end and where it is located), and table B has weather information for the last years for many locations. What we want to do is to aggregate how much it rained during the construction of each building, and add that information to table A.

Currently we've developed it using a SQLSCRIPT procedure with a cursor, but it takes too long to preprocess the hole table due to the fact that cursors are not parallelized. Thus resulting in a sequential execution of as many aggregation queries as the number of rows in table A.

Do you have any advise regarding how to make more efficient our preprocessing stage? There is any way to run the aggregation queries in parallel, avoiding the use of the cursor?

Thanks in advance!

Juan

Accepted Solutions (1)

Accepted Solutions (1)

suresh_devarajan
Explorer
0 Kudos

I think this can be done using a query like the below. The inner queries are to locate the station closest to the building. Outer query joins the building and weather tables using the found closest station ID.

SELECT bloc.id_building, SUM(w1.prec) prec

  FROM (

    /* Select the station with the least distance */

  SELECT e.*

   FROM (

    /* Rank the distance */

  SELECT d.*, RANK() OVER (PARTITION BY d.id_building ORDER BY d.dist ASC) rnk

   FROM

  (

  /* Distance between the buildings & stations */

  SELECT b.*, w.id_estation, b.location.ST_DISTANCE(w.obs_location) dist

    FROM buildings b

   CROSS JOIN (SELECT DISTINCT id_estation, obs_location FROM weather) w

  ) d

  ) e

  WHERE e.rnk = 1

  ) bloc

LEFT OUTER JOIN weather w1

   ON bloc.id_estation = w1.id_estation

  AND w1.obs_date BETWEEN bloc.cons_start and bloc.cons_end

GROUP BY bloc.id_building

ORDER BY bloc.id_building

  ;

Former Member
0 Kudos

Thank you Suresh, that's exactly what I wanted to do.

Answers (2)

Answers (2)

former_member185511
Active Participant
0 Kudos

i think this document may help you

after finding the closest distance you can aggregate PREC value. But what i didn't understand is, what is the threshold value for you defining the closest distance ? If it is 1km distance to building A, and 500meter distance to building B, can we accept it rained to both buildings?

Former Member
0 Kudos

Hi Bilen,

Thanks for your reply. I read that document couple of weeks ago when started using GIS functionalities in HANA.  My problem is not with the GIS part (that was already solved).

My problem is how to do the query for each row in table A without having to loop over table A (without using a cursor).

Regarding your threshold question, I'm just interested in the closest one. I mean, if one building is 500 meter away from station 1 and 800 meter away from station 2, I'm only  interested in querying station 1. But as I said, that was already solved, my problem comes with the cursor thing ; )

Thanks!

BR

Juan

lucas_oliveira
Advisor
Advisor
0 Kudos

Hi Juan,

I trust it might be possible do what you're doing with plain SQL or WITH but we need to understand what exactly you want to do.

Provide the table create statements, data samples and desired results for the data samples.

BRs,
Lucas de Oliveira

Former Member
0 Kudos

Hi Lucas

A sample can be:

DROP TABLE BUILDINGS;

CREATE COLUMN TABLE BUILDINGS(

    ID_BUILDING INTEGER,

    CONS_START DATE,

    CONS_END DATE,

    LOCATION ST_GEOMETRY(4326)

);

DROP TABLE WEATHER;

CREATE COLUMN TABLE WEATHER(

    ID_ESTATION INTEGER,

    OBS_LOCATION ST_GEOMETRY(4326),

    OBS_DATE DATE,

    PREC REAL

);

IMPORT FROM CSV FILE '/home/datadm/util/scripts_hana/weather.txt'

INTO WEATHER

WITH

  record delimited by '\n'

  field delimited by ','

  skip first 1 row;


IMPORT FROM CSV FILE '/home/datadm/util/scripts_hana/buildings.txt'

INTO BUILDINGS

WITH

  record delimited by '\n'

  field delimited by ','

  skip first 1 row;



We want to have an ouput as follows:

   

CREATE COLUMN TABLE OUTPUT_TABLE (

    ID_BUILDING INTEGER,

    PREC_ACUM REAL

);


INSERT INTO OUTPUT_TABLE VALUES(1,43.38);

INSERT INTO OUTPUT_TABLE VALUES(2,75.92);

INSERT INTO OUTPUT_TABLE VALUES(3,66.64);


That belongs to the following queries (note that id_estation is calculated as the closest one to the location of the building, and the dates used in the between come from cons_start and cons_end):

select 1 as id_building, sum(prec) from weather where id_estation=99021 and obs_date between TO_DATE('01-01-2015','DD-MM-YYYY') and TO_DATE('12-02-2015','DD-MM-YYYY');

select 2 as id_building, sum(prec) from weather where id_estation=99012 and obs_date between TO_DATE('12-01-2015','DD-MM-YYYY') and TO_DATE('13-03-2015','DD-MM-YYYY');

select 3 as id_building, sum(prec) from weather where id_estation=99008 and obs_date between TO_DATE('01-02-2015','DD-MM-YYYY') and TO_DATE('29-03-2015','DD-MM-YYYY');


I've ommited the part in which I've calculated the closest id_estation for each building, you can do it too if you want, using the ones specified in the queries.



Thanks for your help



Regards


Juan