on 09-14-2014 7:37 PM
Hi Experts,
I have a scenario in which I have a column with Boolean values and need to calculate the cumulative values in the interval where there are consecutive ones as mentioned below,
ID Date column 1
1 9-14-2014 14:22:00 1
1 9-14-2014 14:25:00 1
2 9-14-2014 13:22:00 0
1 9-14-2014 15:02:00 0
1 9-14-2014 14:37:00 0
2 9-14-2014 14:25:00 1
2 9-14-2014 14:32:00 1
1 9-14-2014 14:05:00 1
2 9-14-2014 14:45:00 0
2 9-14-2014 14:59:00 0
1 9-14-2014 15:12:00 1
1 9-14-2014 15:18:00 1
1 9-14-2014 15:21:00 1
First needs to group by 'ID' and Order By ' Date' and calculate the calculated column 'cumulative sum' for the consecutive ones in 'column1' as
ID Date column 1 Cumulative sum
1 9-14-2014 14:05:00 1 1
1 9-14-2014 14:22:00 1 2
1 9-14-2014 14:25:00 1 3
1 9-14-2014 14:37:00 0 0
1 9-14-2014 15:02:00 0 0
1 9-14-2014 15:12:00 1 1
1 9-14-2014 15:18:00 1 2
1 9-14-2014 15:21:00 1 3
2 9-14-2014 13:22:00 0 0
2 9-14-2014 14:25:00 1 1
2 9-14-2014 14:32:00 1 2
2 9-14-2014 14:45:00 0 0
2 9-14-2014 14:59:00 0 0
Is there any function and way to calculate this without loops in procedure?
Please help!!. Thank you!!
-Gayathri
create column table TAB(ID int, DATE TIMESTAMP, VAL int);
insert into tab values(1,'2014-09-14 14:22:00',1);
insert into tab values(1,'2014-09-14 14:25:00',1);
insert into tab values(2,'2014-09-14 13:22:00',0);
insert into tab values(1,'2014-09-14 15:02:00',0);
insert into tab values(1,'2014-09-14 14:37:00',0);
insert into tab values(2,'2014-09-14 14:25:00',1);
insert into tab values(2,'2014-09-14 14:32:00',1);
insert into tab values(1,'2014-09-14 14:05:00',1);
insert into tab values(2,'2014-09-14 14:45:00',0);
insert into tab values(2,'2014-09-14 14:59:00',0);
insert into tab values(1,'2014-09-14 15:12:00',1);
insert into tab values(1,'2014-09-14 15:18:00',1);
insert into tab values(1,'2014-09-14 15:21:00',1);
select ID,DATE,VAL,row_number() over (partition by ID,GROUPID order by DATE) as CUMULATIVE_SUM from (
select *,(row_number() over (partition by id order by date)-row_number() over (partition by id,val order by date)) as GROUPID
from TAB
)
where VAL = 1
union all select ID,DATE,VAL,VAL as CUMULATIVE_SUM from TAB where VAL = 0
order by ID,DATE
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
This is achieved using a window function. There's a good example in this SCN post. It's a SQL thing not a HANA-specific thing.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Just to be clear, you need to sum 1 if it's 1 and = 0 if it's 0?
If so, the same LAG() function I talked about in the other question you made should suffice.
Something like:
select id, date, col1, ((lag(col1, 1, 0) over (order by id, date) +col1)*col1) as cummulative_sum from table
The idea is to get the last value from the cummulative sum and sum with the current column 1 value and then multiply by the same column 1 current value. If it's 0, you'll get 0, if it's 1, you'll get it incremented by 1, since the multiplication by 1 is the same value.
This only works if column 1 is always either 0 or 1.
If it can assume other values, then use a case clause checking over the value of column 1:
select id, date, col1, (case col1 when 0 then 0 else col1 + lag(col1,1,0) over (order by id, date) end) as cummulative_sum from table
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Henrique,
the SQL,
select id, date, col1, ((lag(col1, 1, 0) over (partition by ID order by date) +col1)*col1) as cummulative_sum from Table
Give the result as mentioned below, which is not the requirement I am looking for . I have also given the correct values of my requirement below,
ID Date column 1 Cumulative sum correct Cum_sum
1 9-14-2014 14:05:00 1 1 1
1 9-14-2014 14:22:00 1 2 2
1 9-14-2014 14:25:00 1 2 3
1 9-14-2014 14:37:00 0 0 0
1 9-14-2014 15:02:00 0 0 0
1 9-14-2014 15:12:00 1 1 1
1 9-14-2014 15:18:00 1 2 2
1 9-14-2014 15:21:00 1 2 3
2 9-14-2014 13:22:00 0 0 0
2 9-14-2014 14:25:00 1 1 1
2 9-14-2014 14:32:00 1 2 2
2 9-14-2014 14:45:00 0 0 0
2 9-14-2014 14:59:00 0 0 0
we need start the cumulative sum for every time the column 1 has consecutive 1s
Hi Henrique,
The SQL,
SELECT id, date, col1, sum(col1) over (partition by col1 order by id, date) as cumulative_sum
will give me the cumulative value for all the col 1 with same value which is not my requirement,
the above will give the result as shown below. I have also added the correct values as per my requirement as well
ID Date column 1 Cumulative sum correct ***_sum
1 9-14-2014 14:05:00 1 1 1
1 9-14-2014 14:22:00 1 2 2
1 9-14-2014 14:25:00 1 2 3
1 9-14-2014 14:37:00 0 0 0
1 9-14-2014 15:02:00 0 0 0
1 9-14-2014 15:12:00 1 4 1
1 9-14-2014 15:18:00 1 5 2
1 9-14-2014 15:21:00 1 6 3
2 9-14-2014 13:22:00 0 0 0
2 9-14-2014 14:25:00 1 1 1
2 9-14-2014 14:32:00 1 2 2
for my requirement , I need to find the cumulative value only for the consecutive 1s if there are 0s in between then there should be a break and cumulative sum should start again for the next set of consecutive 1s
Hope you could help. Thanks.
-Gayathri
Apologies that my replies gave such an impression. I just wanted to explain my requirements correctly. but, Even before starting this discussion I have tried all these solutions you provided and also some other possibility of using window functions, self joins and so .
Only possibility which is giving an correct answer would be using loops in SQL Script but this will lead to performance issue whit large number of data which I do not want. That's the reason I posted this to know if I could take a fresh look of the problem other than using the same methods mentioned in my other discussion
Try taking my advice above and post on StackOverflow, and ask how to do this in SQL. I don't believe there is a way because you are looking for a function for which I can't see an obvious set-based solution.
Whatever you do, don't use SQLScript loops for this, they will suck.
Instead, bring the result into an array in the front end and write some C++/Java/Python/JavaScript to achieve your result.
Why do you think performance would be a problem for this? I don't see a way to aggregate your dataset so you must be supplying the whole dataset back to the client. Presumably it's not all that big.
You could have just said all that in your original post and save my and (everyone else who spent some thought into it)'s time.
As John said, this is not a HANA specific question but rather a SQL question.
StackOverflow is a good starting point, but it brings to me the realization that we do not have a proper place within SCN to ask SQL questions. We have for ABAP & Java, might be time for us to have a space for SQL & JavaScript as well.
not clear about the requirement
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Sreehari,
I will simplify the requirement , below is the sample data, the explanation of the calculation involved,
ID Date column 1 Cumulative sum Comment
1 9-14-2014 14:05:00 1 1 (col1) =1
1 9-14-2014 14:22:00 1 2 (col1+previous cumulative sum) =1+1
1 9-14-2014 14:25:00 1 3 (col1+Previous cumulative sum) =1+2
1 9-14-2014 14:37:00 0 0 (col1)=0
1 9-14-2014 15:02:00 0 0 (col1)=0
1 9-14-2014 15:12:00 1 1 (col1)=1
1 9-14-2014 15:18:00 1 2 (col1+previous cumulative sum)=1+1
1 9-14-2014 15:21:00 1 3 (col1+previous cumulative sum)=1+2
User | Count |
---|---|
87 | |
10 | |
10 | |
10 | |
7 | |
6 | |
6 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.