
Remove Duplicate products on HANA

Former Member
0 Kudos

Hello all,

First I'm going to explain what I'm doing:

I'm using predictive analysis to create product suggestions for the customer. For example: people who buy bread usually buy milk too.

But I'm having trouble running the algorithm (APRIORI) because I need to remove duplicates first.

Then a guy (Bimal) helped me by creating an algorithm that removes duplicates, but when I add more columns (for example, the place) it gives an error. He then suggested that I remove the duplicates in HANA. Does anyone here know how I can do that?

Here is a "fake" table to illustrate:

    Product    UserID    Store    Purchase Nº
    A          1234      Aa       1
    A          1234      Aa       1
    B          1234      Aa       1
    C          2345      Bb       2
    A          1234      Bb       2
    C          2345      Aa       3

In this example, you can see that user 1234 bought product A three times: twice in purchase 1 at store Aa, and once in purchase 2 at store Bb. Does anyone know how I can remove the duplicates within the same purchase number?

Regards!

Accepted Solutions (1)

yeushengteo
Advisor
0 Kudos

Hi,

If you run a SQL SELECT DISTINCT * query on the table, each distinct record is returned only once, so you get the records without duplicates.

I'm not sure how you are going to implement it next. I suppose you can load the non-duplicate records into a new table, assuming you cannot remove records from the original master table.
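A minimal sketch of this suggestion, assuming the table is "BEST"."Test" with the four columns used elsewhere in this thread. "BEST"."Test_Dedup" is a hypothetical name for the new table, and the exact CREATE TABLE ... AS options may differ between HANA revisions:

```sql
-- Return every distinct combination of the four columns exactly once.
SELECT DISTINCT "Product", "UserId", "Store", "PurchaseNo"
FROM "BEST"."Test";

-- Copy the de-duplicated rows into a new table and leave the original
-- table untouched. "Test_Dedup" is a made-up name for illustration.
CREATE COLUMN TABLE "BEST"."Test_Dedup" AS (
    SELECT DISTINCT "Product", "UserId", "Store", "PurchaseNo"
    FROM "BEST"."Test"
);
```

On the sample table from the question, the two identical rows (A, 1234, Aa, 1) would collapse into one, while the row (A, 1234, Bb, 2) stays because its store and purchase number differ.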

Regards.

YS

Former Member
0 Kudos

Thank you!

But I don't know anything about SQL, so I'll try to learn how it works and then follow your tip.

Regards

Answers (1)

former_member182302
Active Contributor
0 Kudos

Hi Jurgen,

You can either use DISTINCT or a GROUP BY clause, like below:

SELECT "Product", "UserId", "Store", "PurchaseNo"
FROM "BEST"."Test"
GROUP BY "Product", "UserId", "Store", "PurchaseNo"


Use this in a script-based view on the HANA side, and use that view as the input for your algorithms. This ensures that no duplicate records come through when you run the APRIORI algorithm.
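As a rough illustration only: the sketch below wraps the GROUP BY query in a plain SQL view, which is a simpler stand-in for a script-based view and not the same thing. The view name "V_TEST_DEDUP" is hypothetical, and the schema and column names are taken from the query above. The second statement is an optional check that lists which rows are duplicated in the source table.

```sql
-- Plain SQL view that exposes each distinct row once.
-- "V_TEST_DEDUP" is a made-up name for illustration.
CREATE VIEW "BEST"."V_TEST_DEDUP" AS
SELECT "Product", "UserId", "Store", "PurchaseNo"
FROM "BEST"."Test"
GROUP BY "Product", "UserId", "Store", "PurchaseNo";

-- Optional check: show rows that occur more than once in the source,
-- together with how many copies exist.
SELECT "Product", "UserId", "Store", "PurchaseNo", COUNT(*) AS "Copies"
FROM "BEST"."Test"
GROUP BY "Product", "UserId", "Store", "PurchaseNo"
HAVING COUNT(*) > 1;
```

If the algorithm only needs the purchase number and the product, grouping by just those two columns would also drop repeats of a product within one purchase regardless of store. Whether that is the right input shape depends on how the APRIORI step is configured.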

Regards,

Krishna Tangudu