Predicting Wine Quality with SAP Predictive Analysis 1.0.16
I wanted to share my first impressions of the SAP Predictive Analysis 1.0.16 release, which I was fortunate enough to get an early version of. To make it more interesting and reader-friendly, I have used SAP Predictive Analysis to predict wine quality in an end-to-end scenario with actual data.
This is my personal view and does not necessarily represent the views of SAP; development might still change functionality, timelines, etc.
Overall, my first impression is that this release is absolutely awesome. The main new addition to Predictive Analysis is an algorithm based on the famous KXEN Classification algorithm, named InfiniteInsight Classification, together with a stunning new charting capability.
Predicting Wine Quality with SAP Predictive Analysis 1.0.16 - end-to-end presentation:
The new algorithm based on InfiniteInsight Classification:
Using the InfiniteInsight Classification to visualize the variable contributions:
Using the InfiniteInsight Classification algorithm to visualize a Gain chart:
One other new addition is the enhanced Confusion Matrix:
SAP PA 1.0.16 Confusion Matrix details:
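For readers who have not worked with a confusion matrix before, here is a minimal Python sketch of what such a matrix counts for a two-class target like quality_y_n. It is not how SAP PA computes or displays its matrix; the function names and the labels are made up purely for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a two-class target."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(True, True)],    # good wine, predicted good
        "fn": counts[(True, False)],   # good wine, predicted not good
        "fp": counts[(False, True)],   # not good, predicted good
        "tn": counts[(False, False)],  # not good, predicted not good
    }


def summary(cm):
    """Derive a few common ratios from the four cell counts."""
    total = sum(cm.values())
    positives = cm["tp"] + cm["fn"]
    flagged = cm["tp"] + cm["fp"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "sensitivity": cm["tp"] / positives if positives else 0.0,
        "precision": cm["tp"] / flagged if flagged else 0.0,
    }


# Made-up labels, not taken from the wine data set.
actual = [True, True, False, False, True, False, False, True]
predicted = [True, False, False, False, True, True, False, True]

cm = confusion_matrix(actual, predicted)
print(cm)
print(summary(cm))
```

The four counts are the cells of the matrix; the ratios in `summary` are the kind of detail figures typically reported next to it.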
The capabilities of SAP Predictive Analysis can easily be extended with R-based functions. As shown below, SAP PA is enhanced with a visualisation of the correlation between the variables. The interesting part here is that Alcohol has a positive correlation with Wine Quality: an increase in Alcohol also seems to go along with an increase in the quality of the wine. Likewise, volatile_acidity is negatively correlated with Wine Quality.
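To make the sign of the correlation concrete, here is a small Python sketch of Pearson's r. It is not the R script used in SAP PA; the function name `pearson_r` and the handful of values are assumptions chosen only to show a positive and a negative correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative values only, not rows from the wine data set.
alcohol = [9.0, 10.0, 11.0, 12.0]
volatile_acidity = [0.7, 0.6, 0.4, 0.3]
quality = [5, 5, 6, 7]

print(pearson_r(alcohol, quality))           # positive: rises with quality
print(pearson_r(volatile_acidity, quality))  # negative: falls as quality rises
```

A value close to +1 or -1 indicates a strong linear relationship and a value near 0 a weak one; in neither case does r on its own say that one variable causes the other.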
Interpreting the correlation:
Happy predicting - the approach showcased in the video could of course also be applied to other predictive use cases.
SAP Global Predictive Services team
The wine data set is publicly available from: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality
A slightly modified version is attached.
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
12 - quality (score between 0 and 10)
13 - quality_y_n (quality > 6), for use with the InfiniteInsight binary classification algorithm.
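The extra quality_y_n column can be derived from the quality score in many ways; the following Python sketch shows one of them. It assumes a semicolon-separated file, a 1/0 encoding of the flag, and a helper named `add_quality_flag`, none of which is confirmed by the attached file, and the sample rows are invented for illustration.

```python
import csv
import io

# Invented sample rows; only three of the columns are shown.
SAMPLE = """alcohol;volatile_acidity;quality
9.4;0.70;5
12.8;0.30;7
10.5;0.45;6
11.9;0.28;8
"""


def add_quality_flag(rows, threshold=6):
    """Return copies of the rows with quality_y_n = 1 if quality > threshold."""
    flagged_rows = []
    for row in rows:
        new_row = dict(row)
        new_row["quality_y_n"] = 1 if int(row["quality"]) > threshold else 0
        flagged_rows.append(new_row)
    return flagged_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
flagged = add_quality_flag(rows)
print([r["quality_y_n"] for r in flagged])
```

With the threshold of 6 taken from the attribute description, a wine scored exactly 6 is not flagged as good.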
Correlation script: http://scn.sap.com/docs/DOC-48269
Correlation coefficient: Pearson's r (Svend Kreiner, "Statistisk problemløsning. Præmisser, teknik og analyse.", 2nd edition, 2007, Jurist- og økonomforbundets forlag, ISBN: 978-87-574-1686-9).