KNN Text Classification Execution Time in HANA

Former Member

Hi folks,

I have a question about the duration of the KNN Text Categorization (TM_CATEGORIZE_KNN function in Text Mining) in HANA.

I have a training (labelled) table with 24,600 records. It has text, maincategory, and subcategory columns, where maincategory and subcategory are my labels. I also have another table with 8,095 records, each of which needs a prediction for both labels. When I start the process, it takes about 140 seconds to finish all predictions for the 8,095 records (for both labels) and insert the results into one final table. What will happen when I have 8 million records to predict (assuming the training table stays the same size, although it may grow as well)? Will it take about 140,000 seconds, which is roughly 38 hours? Is that normal, or is there a way to speed up the process?
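
As a rough check of that estimate, assuming the run time scales linearly with the number of records to predict and the training table stays at 24,600 rows (both are assumptions, not measurements):

```latex
\begin{align*}
t_{\text{record}} &= \frac{140\ \text{s}}{8095} \approx 0.0173\ \text{s} \\
T_{8\,\text{M}} &\approx 8{,}000{,}000 \times \frac{140\ \text{s}}{8095}
  \approx 138{,}000\ \text{s} \approx 38.4\ \text{h}
\end{align*}
```

The roughly 17 ms per record covers two TM_CATEGORIZE_KNN calls plus one insert and one commit, so the figure reflects the whole per-record loop and not the categorization alone.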

Note: I am using an AWS r3.2xlarge instance, which has 8 cores, 61 GB memory, and 1x160 GB SSD. The version is 1.00.110.00.1447753075.


For this process, I created an outer procedure (KNN_CHURN_TEST_OUTER) that reads the unlabelled records from a table, and an inner procedure (KNN_CHURN_TEST_INNER) that makes the predictions for a single record (I have two labels, so it makes two predictions per record). The outer procedure calls the inner procedure once for each record.


Thanks,

Inanc


Here are the outer and inner procedures.




CREATE PROCEDURE "SYSTEM"."KNN_CHURN_TEST_OUTER" ()
    LANGUAGE SQLSCRIPT
    AS
BEGIN
    /*****************************
        Write your procedure logic
    *****************************/
    DECLARE new_text NCLOB;
    DECLARE id INT;
    DECLARE CURSOR c_products FOR
        SELECT "id","text_data"
        FROM "SYSTEM"."AVEA_CHURN_TABLE_TEST";

    FOR cur_row as c_products DO
        new_text := cur_row."text_data";
        id := cur_row."id";
        call "SYSTEM"."KNN_CHURN_TEST_INNER" (id, new_text);
    END FOR;
END;


CREATE PROCEDURE "SYSTEM"."KNN_CHURN_TEST_INNER" (IN id INT, IN new_text nclob)
    LANGUAGE SQLSCRIPT
    AS
BEGIN
    /*****************************
        Write your procedure logic
    *****************************/
    DECLARE sub_cat NVARCHAR(128);
    DECLARE main_cat NVARCHAR(128);
    DECLARE num INT := 0;
    DECLARE num2 INT := 0;

    DECLARE CURSOR c_products FOR
        SELECT T.CATEGORY_VALUE, T.NEIGHBOR_COUNT, T.SCORE
        FROM TM_CATEGORIZE_KNN(
            DOCUMENT :new_text
                MIME TYPE 'text/plain'
            SEARCH NEAREST NEIGHBORS 22 "text"
                FROM "SYSTEM"."aveaLabelledData"
            RETURN top 1
                "main_category"
                from "SYSTEM"."aveaLabelledData"
        ) AS T;

    DECLARE CURSOR c_products2 FOR
        SELECT T.CATEGORY_VALUE, T.NEIGHBOR_COUNT, T.SCORE
        FROM TM_CATEGORIZE_KNN(
            DOCUMENT :new_text
                MIME TYPE 'text/plain'
            SEARCH NEAREST NEIGHBORS 22 "text"
                FROM "SYSTEM"."aveaLabelledData"
            RETURN top 1
                "sub_category"
                from "SYSTEM"."aveaLabelledData"
        ) AS T;

    open c_products;
    begin
        FOR cur_row as c_products DO
            main_cat := cur_row."CATEGORY_VALUE";
            num := num + 1;
        END FOR;
        IF :num = 0
        THEN
            main_cat := 'unknown';
        END IF;
    end;
    close c_products;

    open c_products2;
    begin
        FOR cur_row2 as c_products2 DO
            sub_cat := cur_row2."CATEGORY_VALUE";
            num2 := num2 + 1;
        END FOR;
        IF :num2 = 0
        THEN
            sub_cat := 'unknown';
        END IF;
    end;
    close c_products2;

    insert into "SYSTEM"."KNN_RESULTS" values (:id, :new_text, main_cat, sub_cat);
    commit;
END;



Accepted Solutions (0)

Answers (1)


Former Member

Hi guys,

The issue persists. I still need your help.

lucas_oliveira
Advisor

Hello Inanc,

That level of detail is indeed not clear in the documentation. My suggestion is to run a volume test (matching your expectations for data volume and parallel access) and see for yourself how the algorithm scales.
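
One way to read the results of such a volume test, sketched in Python purely as an illustration: time the outer procedure on a few input sizes, then fit a power law to the (rows, seconds) pairs. The helper names `fit_power_law` and `predict_seconds` are hypothetical and not part of HANA, and the power-law model itself is an assumption about how the run time grows.

```python
import math


def fit_power_law(measurements):
    """Fit seconds ~= c * rows**k by least squares in log-log space.

    measurements: list of (rows, seconds) pairs taken from your own test runs.
    Returns (c, k); k close to 1 indicates roughly linear scaling.
    """
    if len(measurements) < 2:
        raise ValueError("need at least two (rows, seconds) measurements")

    xs = [math.log(rows) for rows, _ in measurements]
    ys = [math.log(seconds) for _, seconds in measurements]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("measurements must cover at least two distinct row counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    k = sxy / sxx                         # slope in log-log space = scaling exponent
    c = math.exp(mean_y - k * mean_x)     # intercept converted back to seconds
    return c, k


def predict_seconds(c, k, rows):
    """Extrapolate the fitted model to a new row count."""
    return c * rows ** k
```

Call `fit_power_law` with your own measured pairs and pass the result to `predict_seconds` for 8,000,000 rows. An exponent k near 1 would support the linear estimate of roughly 38 hours; a noticeably larger k would mean the full run takes longer than that.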

Beyond that, I can only suggest getting in touch with development. If you choose that option, please open an SAP support ticket and provide as much detail as possible.

BRs,

Lucas de Oliveira

Former Member

Thanks Lucas, I will run a volume test and see what happens. An SAP support ticket is another option that I didn't know about. Thanks for the suggestion.