on 03-11-2016 7:32 AM
Hello everybody,
I am having problems with the PAL CHAID algorithm in SAP PA 2.4.
When I connect to HANA online and run the CHAID algorithm on about 30 independent variables,
I don't get any results back; it keeps loading forever.
I don't think I should need to wait any longer. I only use the VARCHAR, INTEGER and DOUBLE data types.
The HANA trace gives me the following:
[6154]{200947}[57/59567858] 2016-03-10 21:01:18.295235 i TraceContext TraceContext.cpp(00827) : UserName=HGUESSMANN, ApplicationUserName=hguessmann, ApplicationName=SAPVisualIntelligence
[6154]{200947}[57/59567858] 2016-03-10 21:01:18.295227 e CalcEngine ceRepositoryAccessor.cpp(00066) : RepositoryAccessor::getCalculationScenario(): for scenario 'HGUESSMANN:PAS79_3_PROC' failed
[6125]{219478}[65/59568461] 2016-03-10 21:10:05.844571 i TraceContext TraceContext.cpp(00827) : UserName=HGUESSMANN, ApplicationUserName=hguessmann, ApplicationName=SAPVisualIntelligence
[6125]{219478}[65/59568461] 2016-03-10 21:10:05.844564 e CalcEngine ceRepositoryAccessor.cpp(00066) : RepositoryAccessor::getCalculationScenario(): for scenario 'HGUESSMANN:PAS80_READER_0_PROC' failed
[24678]{219478}[65/59568590] 2016-03-10 21:10:17.362373 i TraceContext TraceContext.cpp(00827) : UserName=HGUESSMANN, ApplicationUserName=hguessmann, ApplicationName=SAPVisualIntelligence
[24678]{219478}[65/59568590] 2016-03-10 21:10:17.362365 e CalcEngine ceRepositoryAccessor.cpp(00066) : RepositoryAccessor::getCalculationScenario(): for scenario 'HGUESSMANN:PAS80_1_PROC' failed
[13937]{-1}[-1/-1] 2016-03-10 21:10:54.410049 e TrexNet Request.cpp(00741) : ERROR: new Request without host!
[13937]{-1}[-1/-1] 2016-03-10 21:10:54.410177 e Executor X2.cpp(04909) : failed to send listPlan request to an invalid parameter was given
[13937]{-1}[-1/-1] 2016-03-10 21:11:24.411503 e TrexNet Request.cpp(00741) : ERROR: new Request without host!
[13937]{-1}[-1/-1] 2016-03-10 21:11:24.411622 e Executor X2.cpp(04909) : failed to send listPlan request to an invalid parameter was given
[13937]{-1}[-1/-1] 2016-03-10 21:11:54.412741 e TrexNet Request.cpp(00741) : ERROR: new Request without host!
[13937]{-1}[-1/-1] 2016-03-10 21:11:54.412866 e Executor X2.cpp(04909) : failed to send listPlan request to an
…
The last part goes on forever, even after I kill SAP PA in the Task Manager.
Here is the tricky part: when I use only a few variables, I get results.
I would appreciate your help.
Using fewer variables is not really a solution for me.
Additional question: when do the procedures that SAP PA creates on HANA ("PAS##_PROC") get deleted?
Hi Heiko,
Are you able to isolate which variable(s) are causing the problem to appear?
Thanks & regards
Antoine
Hi Antoine,
I did not come close to identifying the variables that cause the trouble, because I have to restart SAP PA every time it crashes, and that takes a long time.
An error along the lines of "Execution plan aborted. Transaction rolled back" popped up once.
I don't have the exact message anymore.
Do you have a hint as to what could be wrong with the variables?
Thanks for the quick reply!
Best regards,
Heiko
When I call the CHAID PAL function directly from SAP HANA Studio, it also stays in progress forever. I think there is a general problem with my data. Are there any known limitations with CHAID?
I attached the logfile.
Thank you.
Best regards,
Heiko
I see this
Prerequisites
● The target column of the training data must not have null values, and other columns should have at least one valid value (not null).
● The table used to store the tree model is a column table.
Note: CHAID treats null values as special values.
http://help.sap.com/hana/SAP_HANA_Predictive_Analysis_Library_PAL_en.pdf
(p. 148)
Thanks & regards
Antoine
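The prerequisites quoted above could be checked on a sample of the training data before calling CHAID. This is only a minimal sketch in plain Python, assuming the rows have already been exported from HANA into a list of dictionaries with None standing for NULL; the function name `check_chaid_prerequisites` and the row layout are hypothetical and not part of PAL.

```python
from typing import Any, Dict, List, Optional

# Assumption: one dict per row, column name -> value, None represents NULL.
Row = Dict[str, Optional[Any]]


def check_chaid_prerequisites(rows: List[Row], target: str) -> List[str]:
    """Return a list of violations; an empty list means both checks passed."""
    if not rows:
        return ["training data is empty"]

    problems: List[str] = []

    # Prerequisite 1: the target column must not have null values.
    null_targets = sum(1 for r in rows if r.get(target) is None)
    if null_targets:
        problems.append(
            f"target column '{target}' has {null_targets} null value(s)"
        )

    # Prerequisite 2: every other column needs at least one non-null value.
    columns = {c for r in rows for c in r if c != target}
    for col in sorted(columns):
        if all(r.get(col) is None for r in rows):
            problems.append(f"column '{col}' contains only null values")

    return problems
```

An empty result does not mean CHAID will finish quickly; it only rules out the two data prerequisites from the documentation.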
I also spotted this in our release restrictions note, which might be the reason:
In Expert Analytics, HANA CHAID algorithm performance is inversely proportionate to the number of distinct values for categorical features in the training dataset. As a result, when using Expert Analytics, if you do not find the performance of CHAID optimal for your use case, it is recommended to use other decision tree algorithms such as HANA C4.5. Note: Additional configurable parameters will be exposed in a future release of Expert Analytics to allow a threshold parameter that will help stop merging of categories beyond a specified threshold. This will allow use of CHAID algorithm with all kinds of datasets.
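Since the note ties CHAID performance to the number of distinct values per categorical feature, counting distinct values per column may help to find the variables worth looking at first. The following is a rough sketch in plain Python, again assuming the data is available as a list of dictionaries; the names `distinct_counts` and `high_cardinality` and the threshold are illustrative assumptions, not values from SAP.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumption: one dict per row, column name -> value, None represents NULL.
Row = Dict[str, Optional[Any]]


def distinct_counts(rows: List[Row], columns: List[str]) -> Dict[str, int]:
    """Count distinct values per column. None is counted as its own value,
    which roughly mirrors the remark that CHAID treats nulls as special values."""
    return {col: len({r.get(col) for r in rows}) for col in columns}


def high_cardinality(
    rows: List[Row], columns: List[str], threshold: int
) -> List[Tuple[str, int]]:
    """Return (column, distinct count) pairs above the threshold,
    highest count first."""
    counts = distinct_counts(rows, columns)
    flagged = [(c, n) for c, n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda item: -item[1])
```

Columns that come out at the top of this list would be the first candidates to drop, bin or test separately.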
Hi,
PAL C4.5, Auto Classification and the R algorithms work without a problem. I noticed that there are still execution threads open in SAP HANA from CHAID calls made a few days ago. As far as I can see, some come from SAP PA and some from calls via the AFM, so I need to shut them down manually.
I will move on to other algorithms or try binning with CHAID.
Thank you!
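For the binning idea, one simple option is equal-width binning of a numeric column, which caps the number of distinct values at the number of bins. The sketch below is plain Python and only an illustration; the function name `equal_width_bins` and the choice of equal-width bins are assumptions, and other binning schemes may suit the data better.

```python
from typing import List, Optional


def equal_width_bins(
    values: List[Optional[float]], n_bins: int
) -> List[Optional[int]]:
    """Map each numeric value to a bin index in 0..n_bins-1.
    None (NULL) stays None so it can still be handled as a special value."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    present = [v for v in values if v is not None]
    if not present:
        return [None] * len(values)

    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins

    binned: List[Optional[int]] = []
    for v in values:
        if v is None:
            binned.append(None)
        elif width == 0:
            # All non-null values are identical: a single bin is enough.
            binned.append(0)
        else:
            # The maximum value would land in bin n_bins, so clamp it.
            binned.append(min(int((v - lo) / width), n_bins - 1))
    return binned
```

For example, `equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0, None], 4)` gives `[0, 1, 2, 3, 3, None]`, so a DOUBLE column with many distinct values is reduced to at most four categories plus null.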