
Fetch the first record with 24 hr interval based on first record timestamp

Former Member
0 Kudos

Hello Folks,

I need your help to build the logic for the requirement below.

We need to fetch, for each order, the record posted after every 24 hours, measured from the first record's timestamp.

Details of the requirement:

1) Take the first record for each order_number and its corresponding timestamp.

2) Take the record posted for the same order number after 24 hours.

3) Take the record posted after a further 24 hours, based on the timestamp of the 2nd record.

Below is the sample data set.

RECORD_NUMBER  CREATED_DATE
567546         2015-07-28 10:45:24
567546         2015-07-28 14:13:45
567546         2015-07-28 22:53:19
567546         2015-07-29 08:34:39
567546         2015-07-29 17:13:26
567597         2015-07-26 11:34:29
567597         2015-07-27 10:45:32
567597         2015-07-28 13:12:43
567597         2015-07-29 18:39:51

The result should be like below.

RECORD_NUMBER  CREATED_DATE
567546         2015-07-28 10:45:24
567546         2015-07-29 17:13:26
567597         2015-07-26 11:34:29
567597         2015-07-28 13:12:43
567597         2015-07-29 18:39:51

The records should be grouped based on record_number.
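As a reading aid (not part of the original post): the selection rule can be sketched in plain Python (helper name hypothetical) to show how the expected result follows from the sample data:

```python
from datetime import datetime, timedelta

def pick_every_24h(rows):
    """Greedy selection: keep the first row of each record_number, then keep
    the next row posted at least 24 hours after the last kept row's timestamp."""
    kept, last = [], {}
    for rec, ts in sorted(rows):
        if rec not in last or ts >= last[rec] + timedelta(hours=24):
            kept.append((rec, ts))
            last[rec] = ts
    return kept

sample = [
    (567546, datetime(2015, 7, 28, 10, 45, 24)),
    (567546, datetime(2015, 7, 28, 14, 13, 45)),
    (567546, datetime(2015, 7, 28, 22, 53, 19)),
    (567546, datetime(2015, 7, 29, 8, 34, 39)),
    (567546, datetime(2015, 7, 29, 17, 13, 26)),
    (567597, datetime(2015, 7, 26, 11, 34, 29)),
    (567597, datetime(2015, 7, 27, 10, 45, 32)),
    (567597, datetime(2015, 7, 28, 13, 12, 43)),
    (567597, datetime(2015, 7, 29, 18, 39, 51)),
]
```

Running `pick_every_24h(sample)` reproduces the five expected rows above.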

Regards,

Venkat N.

Accepted Solutions (1)


henrique_pinto
Active Contributor
0 Kudos

Window functions are your friends. 🙂


SELECT RECORD_NUMBER, MIN(CREATE_DATE) CREATE_DATE FROM (
  SELECT *,
         MIN(TO_DATE) OVER (PARTITION BY RECORD_NUMBER ORDER BY CREATE_DATE DESC) TO_DATE2
  FROM (
    SELECT *,
           CASE WHEN DIFF > 24 THEN CREATE_DATE
                WHEN NEXT_DATE IS NULL THEN '2100-01-01'
                ELSE NULL END TO_DATE
    FROM (
      SELECT *,
             SUM(SECONDS_BETWEEN(CREATE_DATE, NEXT_DATE)/3600) OVER (PARTITION BY RECORD_NUMBER ORDER BY CREATE_DATE) DIFF
      FROM (
        SELECT RECORD_NUMBER, CREATE_DATE,
               LEAD(CREATE_DATE) OVER (PARTITION BY RECORD_NUMBER ORDER BY CREATE_DATE) NEXT_DATE
        FROM "TEST".TEST_TABLE)
      GROUP BY RECORD_NUMBER, CREATE_DATE, NEXT_DATE)))
GROUP BY RECORD_NUMBER, TO_DATE2
ORDER BY RECORD_NUMBER, CREATE_DATE;


  RECORD_NUMBER  CREATE_DATE
1 567,546        Jul 28, 2015 10:45:24.0 AM
2 567,546        Jul 29, 2015 5:13:26.0 PM
3 567,597        Jul 26, 2015 11:34:29.0 AM
4 567,597        Jul 28, 2015 1:12:43.0 PM
5 567,597        Jul 29, 2015 6:39:51.0 PM

lbreddemann
Active Contributor
0 Kudos

Creating a running total of the time distance to the next event is a good idea.

I tried the same but couldn't get it to work (as I was trying to work the time backwards to the starting point.. ).

Your solution is (with proper formatting) also quite easy to understand and follow.

Very nice!

- Lars

henrique_pinto
Active Contributor
0 Kudos

I built the cumulative total, then made the outermost sort descending by create date, so that the MIN() aggregation on TO_DATE would put the desired target value on the target records. When I finally aggregate by MIN(CREATE_DATE) (re-sorting in ascending order by then), it groups correctly.

former_member182114
Active Contributor
0 Kudos

Hi Henrique,

I ran a test on the attached sample and it did not work as expected.

From 30 lines it returns 29, when the expectation is 19 (as far as I understood the proposition); anyhow, it's explained in the attached Excel sheet.

Regards, Fernando Da Rós

former_member182114
Active Contributor
0 Kudos

Hi Colleagues,

I thought using WITH was easy to follow step by step and readable, but Lars' latest words suggest I was wrong... 😞

No problem, there is always time to go back and do it better.

Converting the WITH version to plain SQL (which can also be modeled as a graph) led to a clear solution that, as far as I tested, works for many cases.


select
  a.record_number,
  min(a.create_date) as create_date
from
  test_table as a inner join
  (select
     -- get first create_date per record_number to make the grouping of series
     record_number,
     min(create_date) as min_dt
   from
     test_table
   group by
     record_number) as b
  on b.record_number = a.record_number
group by
  a.record_number,
  -- compute grouping key by interval from first date with desired interval (here 24 hours = 1 day = 86400 seconds)
  round(seconds_between(a.create_date, b.min_dt)/(1*86400), 0, ROUND_DOWN)
order by 1, 2;
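As a reading aid (not from the thread): the grouping key above assigns every row to a fixed 24-hour bucket counted from the record's first timestamp, and MIN keeps the earliest row per bucket. A Python sketch of the same idea (helper name hypothetical):

```python
from datetime import datetime

def pick_by_fixed_buckets(rows):
    """Bucket = floor(seconds since the record's first timestamp / 86400);
    keep the earliest row of each (record_number, bucket) pair."""
    first = {}
    for rec, ts in sorted(rows):
        first.setdefault(rec, ts)
    best = {}
    for rec, ts in rows:
        bucket = int((ts - first[rec]).total_seconds() // 86400)
        key = (rec, bucket)
        if key not in best or ts < best[key]:
            best[key] = ts
    return sorted((rec, ts) for (rec, _), ts in best.items())
```

Note that fixed buckets can differ from chaining "24 hours after the previously kept row": rows 30 and 50 hours after the first fall into different buckets and are both kept, while the chained reading would drop the 50-hour row (only 20 hours after the kept one).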

Regards, Fernando Da Rós

henrique_pinto
Active Contributor
0 Kudos

Don't "indian" me! LOL!

I made it work for the original data set. 🙂

Now you make it work for the whole world; you're the distinguished engineer, not me.

former_member182114
Active Contributor
0 Kudos

Hi Henrique,

I need some translation to understand, but OK, I got it.

As for whether it works, I expect some more testing from colleagues to prove it.

Best regards, Fernando Da Rós

henrique_pinto
Active Contributor
0 Kudos

There you go.

lbreddemann
Active Contributor
0 Kudos

No no no no no!

I didn't mean to say (or imply) that you're wrong - your solution is very fast and produces, as far as I could see, the correct result.

The fault is on my side, since I simply struggle to understand _why_ your solution is correct.

But that's really on me and surely caused by my ridiculously short attention span, my lack of abstract thinking and my general fear of everything too technical!

@All: guys, this is one of the most interesting and entertaining threads in a long time here on SCN.

I'd hope there will be something like that in the future 1DX version of discussions, too.

Thanks for that and cheers!

- Lars

Answers (3)


Former Member
0 Kudos

What about this solution?


select
  record_number,
  create_date
from (
  select
    tbl.record_number,
    tbl.create_date,
    cat.datemin,
    to_integer((seconds_between(cat.datemin, tbl.create_date)/3600)/24) as days,
    mod((seconds_between(cat.datemin, tbl.create_date)/3600), 24) as dif
  from test_table tbl
  inner join (
    select distinct
      record_number,
      min(CREATE_DATE) as datemin
    from test_table
    group by record_number) cat
  on tbl.record_number = cat.record_number)
where dif = 0
   or days >= 1;
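A side note not in the original reply: the `dif = 0 or days >= 1` predicate can be mimicked in Python (hypothetical helper) to see its behavior. Since `days >= 1` keeps every row from the second day onward, not just one per 24-hour window, it happens to match the small sample but over-selects on denser data:

```python
from datetime import datetime

def keep_rows(rows):
    """Mimics the predicate above: keep a row when its offset from the
    record's first timestamp is exactly 0, or spans at least one full day."""
    first = {}
    for rec, ts in sorted(rows):
        first.setdefault(rec, ts)
    kept = []
    for rec, ts in sorted(rows):
        hours = (ts - first[rec]).total_seconds() / 3600
        days, dif = int(hours // 24), hours % 24
        if dif == 0 or days >= 1:
            kept.append((rec, ts))
    return kept
```

On the original sample this returns exactly the five expected rows; add a second row more than a day after the first (say 567546 at 2015-07-29 20:00:00) and it is kept too, even though it is well within 24 hours of the previously kept row.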

former_member182114
Active Contributor
0 Kudos

Hi José,

It needs some adjustment. It produces the same result as Henrique's SQL, but it also doesn't work on the new data. The new data is here, but I'll reproduce it again:

The inserts:

INSERT INTO TEST_TABLE VALUES(3112,'2015-07-26 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-26 20:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-27 07:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-27 18:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-28 05:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-28 16:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-07-29 03:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-01 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-02 05:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-03 00:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-03 19:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-04 14:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-05 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-06 04:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-10 05:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-11 06:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-12 07:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-13 08:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-14 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-15 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-16 11:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-17 12:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 08:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-20 11:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 08:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 09:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 10:00:00');
INSERT INTO TEST_TABLE VALUES(3112,'2015-09-21 11:00:00');

Attached the excel explaining midnights:

Current SQL statement:


select
  a.record_number,
  min(a.create_date) as create_date
from
  test_table as a inner join
  (select
     -- get first create_date per record_number to make the grouping of series
     record_number,
     min(create_date) as min_dt
   from
     test_table
   group by
     record_number) as b
  on b.record_number = a.record_number
group by
  a.record_number,
  -- compute grouping key by interval from first date with desired interval (here 24 hours = 1 day = 86400 seconds)
  round(seconds_between(a.create_date, b.min_dt)/(1*86400), 0, ROUND_DOWN)
order by 1, 2;

Here the expected output:

BTW: Where is the owner of the thread? Jump in and provide feedback; the question is yours. We are just playing around with it, but your silence doesn't tell us whether this helps with your initial question.

Best regards, Fernando Da Rós

Former Member
0 Kudos

Hello Venkat,

Just an idea from my side... it may need data-type casting, and with slight adjustments to the code it may work:


DROP PROCEDURE <procedure>;

CREATE PROCEDURE <procedure> LANGUAGE SQLSCRIPT AS
BEGIN
  DECLARE v_var1 TIMESTAMP;
  DECLARE v_temp BIGINT;
  DECLARE c_operator NVARCHAR(1) := '+';
  DECLARE c_24hours TIME := '24:00:00';
  DECLARE CURSOR c_cursor FOR
    SELECT record_number, created_date FROM table1 ORDER BY record_number;

  FOR cur_row AS c_cursor DO
    v_var1 := cur_row.created_date || c_operator || c_24hours;

    IF cur_row.record_number <> v_temp THEN
      -- this should pick the first record for each order_number and its corresponding timestamp
      EXECUTE IMMEDIATE 'INSERT INTO SCHEMA.TABLE2 (RECORD_NUMBER, CREATED_DATE)
           SELECT RECORD_NUMBER, CREATED_DATE FROM TABLE1
            WHERE RECORD_NUMBER = ' || cur_row.record_number || '
              AND CREATED_DATE  = ''' || cur_row.created_date || '''';

      -- this should pick the same record number, but posted more than 24 hours
      -- after the timestamp of the previous record
      EXECUTE IMMEDIATE 'INSERT INTO SCHEMA.TABLE2 (RECORD_NUMBER, CREATED_DATE)
           SELECT RECORD_NUMBER, CREATED_DATE FROM TABLE1
            WHERE RECORD_NUMBER = ' || cur_row.record_number || '
              AND CREATED_DATE > ''' || :v_var1 || '''';
    END IF;

    v_temp := cur_row.record_number;
  END FOR;
END;

Thanks

Siva


lbreddemann
Active Contributor
0 Kudos

Please post the create table and insert statements so that we can work with your data.

- Lars

Former Member
0 Kudos

Hi lars,

Thanks for the reply. below are the create and insert statements.

CREATE COLUMN TABLE <SCHEMA>.TEST_TABLE (RECORD_NUMBER BIGINT, CREATE_DATE SECONDDATE);

INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-28 10:45:24');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-28 14:13:45');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-28 22:53:19');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-29 08:34:39');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567546,'2015-07-29 17:13:26');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-26 11:34:29');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-27 10:45:32');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-28 13:12:43');
INSERT INTO <SCHEMA>.TEST_TABLE VALUES(567597,'2015-07-29 18:39:51');

lbreddemann
Active Contributor
0 Kudos

Thanks for the DDL and INSERT statements.

I really liked this question and I spent some time tinkering with it.

Unfortunately, I did not find an elegant solution, since SAP HANA currently doesn't support recursive queries.

Because the new starting point for any 24-hour interval is based on a calculated value in a row "before" the current one, and there can be any number of "before" rows, neither a join nor a sub-select works here.

The solution to this, unfortunately seems to be a procedure:


create global temporary column table my_tmp (record_number bigint, create_date seconddate);

drop procedure max_by_24_hrs;

create procedure max_by_24_hrs (out max_events
                                table (record_number bigint, create_date seconddate))
language SQLSCRIPT as
begin
  declare cursor c_raw for select record_number, create_date
                           from test_table
                           order by record_number, create_date;
  declare curr_grp_record bigint;
  declare curr_grp_date seconddate;
  declare curr_record bigint;
  declare curr_date seconddate;

  -- initialize the loop variables
  truncate table my_tmp;
  select min(record_number), min(create_date)
         into curr_grp_record, curr_grp_date
  from test_table;

  for cr as c_raw do
    curr_record := cr.record_number;
    curr_date := cr.create_date;

    -- records group
    if :curr_record > :curr_grp_record then
      -- new group, new date
      curr_grp_record := :curr_record;
      curr_grp_date := :curr_date;

      insert into my_tmp values (:curr_record, :curr_grp_date);
    elseif (:curr_record = :curr_grp_record
            and add_days(:curr_grp_date, 1) <= :curr_date) then
      -- same record no, but date is at least a day higher than the last starting point
      -- new date
      curr_grp_date := :curr_date;
      insert into my_tmp values (:curr_record, :curr_grp_date);
    end if;
  end for;

  max_events = select * from my_tmp;
end;


call max_by_24_hrs(?);

RECORD_NUMBER  CREATE_DATE
567546         2015-07-28 10:45:24.0
567546         2015-07-29 17:13:26.0
567597         2015-07-26 11:34:29.0
567597         2015-07-28 13:12:43.0
567597         2015-07-29 18:39:51.0

This version at least avoids dynamic SQL and loops over the data only once.

Would be interesting to see if anyone finds a more elegant or better performing option.

- Lars

former_member182114
Active Contributor
0 Kudos

Hi Colleagues,

Here are two other ways to do it.

I've used WITH just because it's easy, but it can be ported to a calc view easily.

This one is hardcoded to cover only two days; depending on the requirement, I'd choose this one...


with
  lt_first as (
    select record_number, min(create_date) as dt
    from test_table
    group by record_number),
  lt_1day as (
    select b.record_number, min(b.create_date) as dt
    from lt_first as a inner join test_table as b
      on  b.record_number = a.record_number
      and b.create_date > add_days(a.dt, 1)
    group by b.record_number),
  lt_2day as (
    select b.record_number, min(b.create_date) as dt
    from lt_1day as a inner join test_table as b
      on  b.record_number = a.record_number
      and b.create_date > add_days(a.dt, 1)
    group by b.record_number)
select * from lt_first union all
select * from lt_1day union all
select * from lt_2day
order by 1, 2;

... the other one handles any number of intervals, but it has the potential to crash a server due to the cross join:


with
  lt_24hrs_pair as (
    select a.record_number, a.create_date as dt, b.create_date as dt_next
    from test_table as a inner join test_table as b
      on  b.record_number = a.record_number
      and b.create_date > add_days(a.create_date, 1)),
  lt_min_next as (
    select record_number, dt, min(dt_next) as dt_next
    from lt_24hrs_pair
    group by record_number, dt),
  lt_min_min as (
    select record_number, min(dt) as dt, dt_next
    from lt_min_next
    group by record_number, dt_next),
  lt_last as (
    select record_number, max(dt_next) as dt
    from lt_24hrs_pair
    group by record_number)
select record_number, dt from lt_min_min union
select record_number, dt from lt_last
order by 1, 2;

I've added more rows to test whether the result stays the same, and yes, it is still the same.


INSERT INTO TEST_TABLE VALUES(567546,'2015-07-28 10:45:59');
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-28 14:13:59');
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-28 22:53:59');
INSERT INTO TEST_TABLE VALUES(567546,'2015-07-29 08:34:59');
INSERT INTO TEST_TABLE VALUES(567597,'2015-07-26 11:34:59');
INSERT INTO TEST_TABLE VALUES(567597,'2015-07-27 10:45:59');
INSERT INTO TEST_TABLE VALUES(567597,'2015-07-28 13:12:59');

Regards, Fernando Da Rós

PS: I couldn't make the SCN editor show the SQL formatted for better reading 😞


lbreddemann
Active Contributor
0 Kudos

Hi Fernando

So glad someone else came up with something, too. Thanks for that!

This one kept me puzzling for a while...

And yes, the (hopefully) much improved code display is something I am looking forward to on the new discussion platform, too!

Admittedly, I also tried to make do with the WITH clause, but since the RECURSIVE feature is not there (and no other feature allows recursion), I switched to the sledgehammer approach that I posted.

Concerning your second approach, how does this one allow for arbitrary number of 24-hr steps?

To me a core difficulty was the fact that there could be any number of records within a 24hr period which would be the start of a new group.

How does this work in your solution?

BTW: the procedure version really is not that good for mass processing. I blew the record set up to 1.3 million records and it already took more than a minute. Without additional filters, we really need recursive/hierarchy support for such requirements.

- Lars

lbreddemann
Active Contributor
0 Kudos

Shoot... I just remembered that we already discussed such a requirement some time ago...

Looking at it, I created the same solution again.

Back then I came up with the parent-child hierarchy approach (which essentially stands in for the missing recursion functionality in this scenario).

I find that quite frustrating, as that discussion was over a year ago, on rev. 74...

Let's keep fingers crossed for more recursion support in SPS after SPS 10!

- Lars

former_member182114
Active Contributor
0 Kudos

Hi Lars,

I love this kind of hands-on work, and your request worked on me like a call to arms. It was my luck that I read it and could spend some time on it (not spare time). I still have a "background job" with this.

You are right, it doesn't work for series of 24 hours. The failure is exposed by a sample series spaced 23 hours apart:

26/Jul   09:00

27/Jul   08:00 <-- discarded

28/Jul   07:00

29/Jul   06:00 <-- discarded

30/Jul   05:00

31/Jul   04:00 <-- discarded

01/Aug   03:00

02/Aug   02:00 <-- discarded

03/Aug   01:00

04/Aug   00:00 <-- discarded

04/Aug   23:00

05/Aug   22:00 <-- discarded

The first 24 hours start on 26/Jul 09:00 and should end on 27/Jul 08:59; the second cycle starts two hours earlier than it should, the third four hours earlier, and so on. 😞
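The discarding pattern above can be checked mechanically; here is a small Python sketch (not from the post) of the 23-hour series with greedy 24-hour chaining:

```python
from datetime import datetime, timedelta

# Series spaced 23 hours apart, starting 26 Jul 09:00 (as in the example above)
series = [datetime(2015, 7, 26, 9) + i * timedelta(hours=23) for i in range(12)]

def chain_24h(timestamps):
    """Keep a timestamp only when it is at least 24h after the last kept one."""
    kept = []
    for ts in sorted(timestamps):
        if not kept or ts >= kept[-1] + timedelta(hours=24):
            kept.append(ts)
    return kept
```

Every other row is kept, matching the "discarded" annotations, and each kept timestamp drifts earlier against a fixed daily grid.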

I got confused by what I understood from:


1) Take the first record for each order_number and corresponding time stamp.

2) Take the record posted for the same order number after 24 hours.

3) Take the record posted after a further 24 hours, based on the time stamp of the 2nd record.

Anyhow, the correctness depends on the requirement: if the 24-hour requirement demands harmonization among all records, then date-based grouping is most interesting.

If the chaining must follow the start of each series, I guess it can be achieved by using the first occurrence as the parameter for grouping the next sets. I'll give it a try later and feed back to the thread...

Best regards, Fernando Da Rós

Former Member
0 Kudos

Here's one way to approach it.

- Create DATE() and TIME() GENERATED ALWAYS columns

- Find the minimum timestamp and add it as a calculated column for all rows

- Subtract this time from the current time into another calculated column "calctime"

- Rank on the minimum "calctime" for each date

I think that could be built as a graphical calc view, but it might suck.

Wondering what Werner Steyn thinks of all this.

lbreddemann
Active Contributor
0 Kudos

Hmm... not sure I get your points here.

What do we need the generated columns for?

The approach of looking at the running added up time difference between the current row and the groups minimum is straight forward.

Unfortunately it doesn't help with the core problem - that the local minimum time is clear only after the fact of sequentially adding up the time differences for every row.

I could be missing something here, but I'd really love to see this being build and working without a procedure.

- Lars

Former Member
0 Kudos

The generated columns allow us to separate the time and date so we can:

1) Subtract the first time from all the others, to know when the virtual "midnight" is

2) Group by date in the rank function so we create a window function

I'm pretty certain this would work, don't have time to test right now.

John

lbreddemann
Active Contributor
0 Kudos

I'm eagerly waiting to see this working.

I still believe this doesn't solve the issue: it is actually not clear upfront what the reference point in time is for the 3rd, 4th, etc. event.

Would like to learn better and see how this is done.

- Lars

former_member182114
Active Contributor
0 Kudos

Hi Lars,

First, take a look at the attached sheet with the target I designed to achieve (on the left side, the behavior of the second WITH posted, and the plan to reach the third one). I've also attached the script to run the sample from the sheet.

The SQL below is didactic, in 4 steps, trying to be readable (at least I hope so):



with
  lt_starter_per_record as (
    select *, seconds_between(to_date(to_char(min_dt,'yyyy-mm-dd')), min_dt) as sec_starter
    from (select record_number, min(create_date) as min_dt
          from test_table
          group by record_number)),
  lt_day_range as (
    select record_number, create_date, min_dt, sec_starter, dt,
           case when seconds_between(dt, create_date) >= sec_starter then 0 else 1 end as ind_day_range
    from (select a.record_number, a.create_date, b.sec_starter, b.min_dt,
                 to_date(to_char(create_date,'yyyy-mm-dd')) as dt
          from test_table as a, lt_starter_per_record as b
          where b.record_number = a.record_number)),
  lt_compute_range as (
    select record_number, create_date,
           add_seconds(add_days(dt, -ind_day_range), sec_starter) as dt_range_ini
    from lt_day_range),
  lt_pick_the_ones as (
    select record_number, create_date
    from (select record_number, min(create_date) as create_date, dt_range_ini
          from lt_compute_range
          group by record_number, dt_range_ini))
select * from lt_pick_the_ones order by 1, 2;

After it worked functionally, I also tested performance, adding 10 million rows over a one-year range with the script below:

select top 10000000
  round(rand()*1000)+1000 as record_number,
  add_seconds(add_days(to_date('2000-01-01'), rand()*365), round(rand()*86400, 0)) as create_date
from objects, objects
into test_table;

The execution over all data, with no filter, took about 2.5 seconds:

Statement 'with lt_starter_per_record as (select ...'
successfully executed in 2.641 seconds (server processing time: 2.407 seconds)

Fetched 1000 row(s) in 491 ms 283 µs (server processing time: 0 ms 734 µs)


This is what I said earlier, but reading your comments, the blog, and John's as well, I understood that there's also a need for ranges different from the one in this thread (24 hours).


I guess it could be done by changing the way the group is calculated in lt_day_range / lt_compute_range, using a formula like this:

- use an interval in seconds (for example, 1209600 seconds = 14 days)

- calculate first_date_time as above

- compute each date's offset from first_date_time in seconds, divide it by the interval (1209600), and take the integer part

- use this integer for grouping


If it's not clear, I can give it a try later.
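The steps above can be sketched in plain Python (not from the post; function name hypothetical, interval defaulting to 14 days as in the example):

```python
from datetime import datetime

def pick_by_interval(rows, interval_seconds=14 * 86400):
    """Bucket each row by floor(offset from the record's first timestamp
    / interval) and keep the earliest row per (record_number, bucket)."""
    first = {}
    for rec, ts in sorted(rows):
        first.setdefault(rec, ts)
    best = {}
    for rec, ts in rows:
        bucket = int((ts - first[rec]).total_seconds() // interval_seconds)
        if (rec, bucket) not in best or ts < best[(rec, bucket)]:
            best[(rec, bucket)] = ts
    return sorted((rec, ts) for (rec, _), ts in best.items())
```

With `interval_seconds=86400` this reduces to the 24-hour bucket grouping discussed earlier in the thread.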


Best regards, Fernando Da Rós