Customizing Decision Tree in PAL 2.4

Former Member
0 Kudos

Hi, I am currently learning Predictive Analytics for a project. I was trying out the Decision Tree in Automated mode. My explanatory variable is a set of company names, such as Google, Apple, Microsoft, Cisco etc. My target variable is Gender.

After running the model, the software automatically classified Google and Apple into one node, and all the other companies (I have 16 different company names in my data) into another node called KxOther. I can't choose the number of branches I want, and it's difficult to see each company's prediction this way. How can I customize the number of branches and show each individual company name? For example, I would like to see Google in one node and Apple in another node by itself.

I have tried the Simulation, but for some reason only a few company names appear in the list instead of all 16. See below: only Apple, KxOther and Microsoft appear in the list; the rest of the company names are not listed.

Any advice on how to resolve this problem? I have attached sample data (50 rows) if you would like to run it.

Thanks a lot.

Accepted Solutions (1)

achab
Product and Topic Expert
0 Kudos

Hi Rin,

Can you please attach the sample data?

Two points here:

- In Automated Analytics, we do not build a decision tree directly; we derive the decision tree from the classification or regression model that was built. The classification or regression algorithm encodes the data in a certain way. For nominal values in particular, it groups values that show similar behavior, using the KxOther category. As you can see in your screenshot, the group [Apple, Google] tends to reduce the percentage of positive target.

Now if you want to predict a value for a particular company that's not Apple or Google, you would have to use the KxOther category in the "Simulating the Model" screen. BTW it's strange that the values listed there are Apple and Microsoft.

- For a more "classical" decision tree algorithm, you might want to try Expert Analytics, which offers R-based algorithms. You can even compare the performance of this approach side by side with the Automated one.
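
To make the KxOther idea from the first point more concrete, here is a minimal Python sketch of grouping nominal values into a catch-all category. This is not the encoding that Automated Analytics actually performs: the function name, the `min_count` threshold and the toy list are assumptions made purely for illustration, and grouping by frequency is only one plausible stand-in for "values showing similar behavior".

```python
from collections import Counter


def group_rare_categories(values, min_count=3, other_label="KxOther"):
    """Replace categories seen fewer than min_count times with other_label.

    Hypothetical helper: a simple frequency rule, not SAP's algorithm.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Toy data (made up): two frequent companies and three rare ones.
companies = [
    "Google", "Google", "Google",
    "Apple", "Apple", "Apple",
    "Cisco", "SAP", "Microsoft",
]

grouped = group_rare_categories(companies)
print(Counter(grouped))
```

With such a rule, the rare companies no longer appear under their own names, which is similar in spirit to most of the 16 companies ending up in a single KxOther node.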

It might be easier to help if you explain what your use case is and what you are trying to achieve.

Best regards & thanks

Antoine

Former Member
0 Kudos

Hi Antoine

Please download the sample dataset using the Google Drive link, as I can't attach files in this discussion box.

Sample Data (100 Rows).xlsx - Google Drive

My use case is:

I am trying to find the probability that a student will choose a given company, e.g. what is the probability that a student from the IT/Computing school (Faculty), taking their Diploma (Academic Level), will choose SAP or Microsoft as their top preferred company.
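
As a rough baseline for this use case, the probability can be estimated as a plain frequency among matching students. The Python sketch below is only an illustration and not what Automated Analytics computes: the column names (`faculty`, `academic_level`, `top_company`), the function name and the toy rows are all assumptions.

```python
from collections import Counter


def choice_probabilities(rows, faculty, level):
    """Empirical share of each top company among students matching faculty and level."""
    matching = [
        r["top_company"]
        for r in rows
        if r["faculty"] == faculty and r["academic_level"] == level
    ]
    if not matching:
        return {}  # no matching students, so nothing can be estimated
    total = len(matching)
    return {company: n / total for company, n in Counter(matching).items()}


# Toy rows (made up), using hypothetical column names.
rows = [
    {"faculty": "IT/Computing", "academic_level": "Diploma", "top_company": "SAP"},
    {"faculty": "IT/Computing", "academic_level": "Diploma", "top_company": "Microsoft"},
    {"faculty": "IT/Computing", "academic_level": "Diploma", "top_company": "SAP"},
    {"faculty": "IT/Computing", "academic_level": "Diploma", "top_company": "Google"},
    {"faculty": "Business", "academic_level": "Degree", "top_company": "Google"},
]

probs = choice_probabilities(rows, "IT/Computing", "Diploma")
print(probs)
```

Such frequencies become unreliable as soon as a faculty/level combination has only a handful of students, which is one reason to use a proper classification model instead.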

I have not tried Expert mode yet. Do I need to install R packages before I can perform the decision tree algorithm?

Thanks a lot.

Rin

achab
Product and Topic Expert
0 Kudos

Yes, you need to install R packages.

Former Member
0 Kudos

Thanks Antoine

Can a decision tree be built from my current data (based on the sample I provided previously), or would I need to reformat it? I am rather unsure how to go about this as I am new to the tool.

Appreciate your kind help.

Rin

achab
Product and Topic Expert
0 Kudos

I need to drop off now, but I will look at this in detail tomorrow.

Former Member
0 Kudos

No problem, thank you for your kind help.

achab
Product and Topic Expert
0 Kudos

I had a look at your file.

If the output variable to predict is the top preferred company, like Google or SAP, you need to rework the file to generate one binary variable for each output you would like to predict.

For instance, I created a variable "Output" with yes/no values.

"Yes" means the student's first choice is to work at Google; "no" means it is not.

Once you have done that, you can try to explain the output variable using the input variables.

You can use as many output variables as you like, which means you can generate the models for every company in one shot.
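
The rework described above can be sketched in Python as follows. This is only an illustration under assumptions: the column name `top_company`, the `Output_<company>` naming and the toy rows are hypothetical, and the same result can be produced in a spreadsheet or any data preparation tool.

```python
def add_binary_targets(rows, choice_column="top_company"):
    """Return copies of rows with one yes/no column per distinct company.

    Hypothetical helper: column names are assumptions, not taken from the file.
    """
    companies = sorted({r[choice_column] for r in rows})
    prepared = []
    for r in rows:
        new_row = dict(r)  # copy so the input rows are left unchanged
        for company in companies:
            new_row[f"Output_{company}"] = "yes" if r[choice_column] == company else "no"
        prepared.append(new_row)
    return prepared, companies


# Toy rows (made up).
rows = [
    {"faculty": "IT/Computing", "top_company": "Google"},
    {"faculty": "Business", "top_company": "SAP"},
    {"faculty": "IT/Computing", "top_company": "Google"},
]

prepared, companies = add_binary_targets(rows)
for row in prepared:
    print(row)
```

Each `Output_<company>` column can then serve as a separate target. The original choice column should be left out of the explanatory variables, since it would reveal the answer directly.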

Here are some results on the "Google" model:

The model uses 11 variables. It looks robust (predictive confidence = 0.94) but explains only part of the output variable (predictive power = 0.55). This means you need more samples to improve the confidence, since we usually trust models with a confidence above 0.95, and you definitely need more variables to explain the students' choice.

As an example when we click on Country, we see that Singapore as a Country is influencing the choice of Google as a top pick.

Let's say you follow the multi-target approach I suggested earlier. When you apply these models to new individuals, filling in all the input variables, you get the probability that each individual chooses each particular company.

Please note that the sample data file you provided is very small; for instance, there are only a few entries for SAP. The product cannot build reliable models on very small data sets.

Hope this helps,

Best regards

Antoine

Former Member
0 Kudos

Thanks so much Antoine. You have been so helpful towards a beginner like me!

May I clarify if I can perform the analysis in Automated mode so long as I rework my data? How about the decision tree model that I tried out previously?


Should I use the R-CNR Decision tree in Expert mode? Or does a decision tree not make sense for my use case at all?


Thanks again, Antoine.


Best Regards

Rin

achab
Product and Topic Expert
0 Kudos

Hi,

Here are my answers to your questions:

May I clarify if I can perform the analysis in Automated mode so long as I rework my data?

Yes, as I explained


How about the decision tree model that I tried out previously?

We are now discussing a classification use case, and decision tree models can be derived from classification models in Automated Analytics.


Should I use the R-CNR Decision tree in Expert mode? Or does a decision tree not make sense for my use case at all?

You can definitely try it as well as other classification algorithms available in Expert Analytics (I have seen at least 4 of them). You can also use our model comparison feature to compare the performance of the different algorithms including the Auto Classification one, which is exactly the same as the one in Automated Analytics. Please refer to the Expert Analytics user guide for more information, see SAP Predictive Analytics 2.4 – SAP Help Portal Page


I gave it a try this morning but it's better if you try & learn by yourself.


Happy predictive,


Antoine


Could you please close the question if you think it's answered for now? You can always open new questions if need be.





Former Member
0 Kudos

I see! Thanks a lot, Antoine. I will definitely try out the different algorithms and open another discussion should problems arise.

Thank you!

Answers (0)