Solved: GMM prediction function

former_member186543 · ‎09-15-2016

Hi team,

I am trying to build a GMM to estimate the probability distribution of my input features and then mark each observation as anomalous or not.

I have run the PAL GMM function and got back the probabilities and the model table for my training set. Which PAL function is best suited to predicting the probability values for each new data point using the trained GMM model?

The SAP documentation has no information about this.

Example: the K-means PAL function trains a k-means model, and the CREATEDT and PREDICTWITHDT functions are used to predict the k-means classification for new values. I am not sure whether the same approach works for GMM, since here we need the probability values and not only the cluster.

My understanding is that we may be able to use PREDICTWITHDT directly on the model returned by GMM. However, we need the probability of each new input for each cluster in Gaussian space, similar to the output GMM produces for the training data. Please advise!

Update: I passed the JSON model from GMM to the PREDICTWITHDT function and now get the error message below:

Could not execute 'CALL "HRAFIQ".PAL_DT_SCORING_PROC(PAL_DT_SCORING_DATA_TBL, #PAL_CONTROL_TBL, ZPREDICTED_MODEL, ...' in 651 ms 411 µs .

SAP DBTech JDBC: [423]: AFL error: search table error: _SYS_AFL.AFLPAL:PREDICTWITHDT: [423] (range 3) AFL error exception: exception 73001060: PAL error[73001060]:Internal error. Check trace for details.

Thanks,

Hasan

Former Member · ‎09-21-2016

Hi Hasan,

GMM is a clustering algorithm and is usually regarded as an unsupervised learning algorithm. A decision tree is a supervised learning algorithm, and PREDICTWITHDT is used for models trained with C4.5, CHAID, and CART. We cannot pass a GMM clustering result to a decision tree prediction function. Even among models trained by different supervised algorithms, each must be scored with its corresponding scoring function.

As to your question, I understand that you want to apply the model to new data points to estimate the probability of each point belonging to each cluster. Because this is unsupervised learning, there is usually no such cluster assignment function. The reason is that there is no guarantee that the new data come from the same distribution as the original data. This is especially true for outlier detection: if a new type of outlier appears, the assignment will place it into one of the existing clusters, which may be a mis-clustering.

PAL does have a cluster assignment function, which assigns new data points to existing clusters generated by a clustering algorithm, under the assumption that the user knows the new data come from the same distribution. Unfortunately, GMM is not yet supported. In your case, if the data set is not huge, you can re-run GMM with the new data and get the new clusters and probabilities.
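For illustration only, the per-cluster probability of a new point can also be computed outside PAL, directly from the parameters of a fitted mixture. The sketch below is not a PAL function. It assumes the component weights, means, and covariance matrices can be read out of the GMM model output; the function names and the example values are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal distribution at point x."""
    n = len(x)
    l = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution: solve L z = (x - mean), so z.z is the squared Mahalanobis distance.
    z = []
    for i in range(n):
        s = sum(l[i][k] * z[k] for k in range(i))
        z.append((diff[i] - s) / l[i][i])
    maha = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + maha)

def gmm_score(x, weights, means, covs):
    """Return (posterior probability per component, log mixture density) for x."""
    logs = [math.log(w) + log_gaussian(x, m, c)
            for w, m, c in zip(weights, means, covs)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    log_density = top + math.log(sum(math.exp(v - top) for v in logs))
    posteriors = [math.exp(v - log_density) for v in logs]
    return posteriors, log_density

# Hypothetical two-component model in two dimensions (values made up for the example).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]

posteriors, log_density = gmm_score([0.5, -0.2], weights, means, covs)
print(posteriors, log_density)
```

The posteriors always sum to 1, so on their own they cannot reveal a new kind of outlier, which is the caveat described above. The log mixture density can: a point that lies far from every component gets a very low value, and comparing it against a threshold chosen from the training data is one possible way to flag anomalies.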

Best regards,

Xingtian

achab · ‎09-16-2016

Hi Hasan, I am looping in

Best regards

Antoine
