on 11-20-2015 1:53 PM
Hello Experts,
I am working with SAP HANA PAL SPS09 and am currently using the Naive Bayes classification algorithm for one of my use cases. Please let me know how to get Predictive Power and Confidence for the HANA PAL Naive Bayes classification algorithm, or how to calculate the error margin for the predicted output. Also, please let me know how to find out which attributes contributed most to my prediction, as shown in the SAP Predictive Analytics tool.
Thanks,
Pragati Gupta
Hello Pragati,
In PAL, we do support
- AUC calculation (since SPS11)
- the parameter selection and model evaluation functions for cross-validation, which may also be helpful
- also since SPS11, we do support Random Forest, which additionally provides Variable Importance output.
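As a rough illustration of what the AUC figure measures, here is a plain Python sketch. It is not the PAL procedure and does not call any PAL function. The function name `auc` and the labels and scores are made up for the example. It computes AUC as the fraction of (positive, negative) pairs in which the positive case gets the higher score, counting ties as half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive class)
    scores: iterable of predicted scores for the positive class
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6]
example_auc = auc(example_labels, example_scores)
print(round(example_auc, 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.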
Regards,
Christoph
Thank you Christoph Morgen, Xingtian Shi for your inputs...
I am currently working on SPS09. I tried APL for my use case and was able to get the top contributing attributes along with their individual contribution values. I'll definitely try PAL in the SPS10 and SPS11 versions.
Regards,
Pragati Gupta
Hi Gupta,
Unfortunately, the log conditional a-posteriori probabilities for each class are not directly returned by the prediction function in SPS09. We've added this recently and it will be delivered in the next revision if possible. As a workaround, you could extract 1) the probability of the independent variables given the class and 2) the probability of the class from the PMML file returned by the Naive Bayes training. You can refer to http://dmg.org/pmml/v4-1/NaiveBayes.html for how to interpret the file. For continuous variables, we return the mean and variance of the normal distribution. You can then calculate the scores with your own logic. Of course, I understand this is not convenient. In PAL, decision trees (all three), logistic regression (including multi-class logistic regression), and random forest (SPS11 only) return the prediction probability. I recommend using these algorithms if possible. In PAL SPS10 and SPS11, we also have confusion matrix, ROC, and AUC for model evaluation.
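A minimal Python sketch of the "calculate the scores with your own logic" step might look like the following. It assumes the class priors, the per-class conditional probabilities of categorical attributes, and the per-class mean and variance of continuous attributes have already been read from the PMML file and converted to probabilities. The function names, the attribute names (`segment`, `age`) and all numbers are hypothetical, and parsing the PMML itself is not shown.

```python
import math


def gaussian_log_pdf(x, mean, variance):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (x - mean) ** 2 / (2.0 * variance)


def naive_bayes_posteriors(priors, categorical, continuous, row):
    """Return P(class | row) for every class.

    priors:      {class: P(class)}
    categorical: {attribute: {value: {class: P(value | class)}}}
    continuous:  {attribute: {class: (mean, variance)}}
    row:         {attribute: observed value}
    Assumes all probabilities are non-zero (for example, already smoothed).
    """
    log_scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for name, value in row.items():
            if name in categorical:
                score += math.log(categorical[name][value][cls])
            elif name in continuous:
                mean, variance = continuous[name][cls]
                score += gaussian_log_pdf(value, mean, variance)
        log_scores[cls] = score

    # Normalize in log space (log-sum-exp) to avoid underflow.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {cls: math.exp(s - top) / total for cls, s in log_scores.items()}


# Hypothetical model values, for illustration only.
priors = {"yes": 0.6, "no": 0.4}
categorical = {"segment": {"A": {"yes": 0.7, "no": 0.2},
                           "B": {"yes": 0.3, "no": 0.8}}}
continuous = {"age": {"yes": (40.0, 25.0), "no": (30.0, 25.0)}}

posteriors = naive_bayes_posteriors(priors, categorical, continuous,
                                    {"segment": "A", "age": 35.0})
print(posteriors)
```

The normalized values can be read as the prediction probability per class. The unnormalized `log_scores` correspond to the log a-posteriori scores up to a constant that is the same for every class.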
Best regards,
Xingtian
Hi,
Have you also considered the two following complementary options:
1. Installing APL on your HANA system and using the HANA Auto Classification algorithm?
2. Using Automated Analytics on top of your SAP HANA data?
Cheers,
Antoine
Hi Antoine
Actually, I wanted to use the Naive Bayes algorithm in particular for classification. Yes, APL is available, but as it is more automated, we can't really tweak the coefficients to reduce the error margin.
Right now I am unable to figure out the error margin or summary of my model, or the attributes that contribute to the predictions. In the SAP PA tool, by contrast, we get a proper summary, predictive power/confidence, variable contributions, etc.
I couldn't really find any material on this topic either. Kindly help.
Regards
Pragati
But as per my understanding, you will not get the most influential variables. I would recommend that you position and compare the performance of the different models you can build; in most cases the automated approach will provide the right balance between performance, simplicity and handling of complex data.
In terms of material, see the user guide for Expert Analytics: http://help.sap.com/businessobject/product_guides/pa23/en/pa23_expert_user_en.pdf