cancel
Showing results for 
Search instead for 
Did you mean: 

Streaming algorithms in HANA?

Former Member
0 Kudos

Dear all,

I would like to know, if HANA can use streaming algorithms to determine results of top-N queries.

suppose table_a and table_b:

create column table table_a (

   id bigint not null primary key generated by default as identity,

   val_a numeric

);

create column table table_b (

  id bigint not null primary key generated by default as identity,

  val_b numeric

);

My query goes like this:

select top 10 * from (

  select a.id, b.id, val_a*val_b as similarity

    from table_a a

          , table_b b

  order by val_a * val_b desc

)

where table_a and table_b are large tables, so that the cartesian join will not fit in the memory at hand (that means, a couple of millions each). .

This query works for small numbers of rows, but seems to materialize the complete result set, although this can be achieved with constant memory using a streaming algorithm.

Does HANA have streaming algorithms for this problem available?

Is there a way to tell HANA to use these streaming algorithms?

Can one use this to find the top-N matching elements in b for each element in a? For example by using rank() over (partition by a.id order by val_a*val_b desc)

Thanks in advance,

Jens

Accepted Solutions (0)

Answers (1)

Answers (1)

lbreddemann
Active Contributor
0 Kudos

There's no way in SQL to do just that in a 'streaming' fashion.


However, for this specific case you might want to look into the option to proof that

TOP 10 ( val_a)  X TOP 10 (val_b) = TOP 10 (val_a x val_b)

If you can do that, then you don't need the streaming algorithm and can save a lot of time and memory.

For things like "top n val_bs for a given val_a" the co-related sub-query would be the weapon of choice.