on 09-15-2015 9:48 AM
Dear all,
I would like to know, if HANA can use streaming algorithms to determine results of top-N queries.
suppose table_a and table_b:
create column table table_a (
id bigint not null primary key generated by default as identity,
val_a numeric
);
create column table table_b (
id bigint not null primary key generated by default as identity,
val_b numeric
);
My query goes like this:
select top 10 * from (
select a.id, b.id, val_a*val_b as similarity
from table_a a
, table_b b
order by val_a * val_b desc
)
where table_a and table_b are large tables, so that the cartesian join will not fit in the memory at hand (that means, a couple of millions each). .
This query works for small numbers of rows, but seems to materialize the complete result set, although this can be achieved with constant memory using a streaming algorithm.
Does HANA have streaming algorithms for this problem available?
Is there a way to tell HANA to use these streaming algorithms?
Can one use this to find the top-N matching elements in b for each element in a? For example by using rank() over (partition by a.id order by val_a*val_b desc)
Thanks in advance,
Jens
There's no way in SQL to do just that in a 'streaming' fashion.
However, for this specific case you might want to look into the option to proof that
TOP 10 ( val_a) X TOP 10 (val_b) = TOP 10 (val_a x val_b)
If you can do that, then you don't need the streaming algorithm and can save a lot of time and memory.
For things like "top n val_bs for a given val_a" the co-related sub-query would be the weapon of choice.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
80 | |
24 | |
11 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.