
Maximum size / complexity of ESP Model

Former Member
0 Kudos

Hi everyone,

First of all I would like to say hello to the community, as this is my very first posting here. I am working for the SAP Innovation Centre in Potsdam, Germany, where we are working on a project together with Bigpoint, an online gaming company, using Sybase ESP 5.0 to stream live event data from Bigpoint's online games.

In this context, and as an ESP newbie, I have tons of questions regarding CCL, SPLASH, adapters, pub/sub and all that stuff, which I hope to get answered here.

To start, I have a question regarding the maximum size or complexity of ESP models in 5.0.

To understand the problem, I should say that we have a rather complex problem that we would like to solve with ESP. We want to collect vectors of event fields for every active user in a game, reflecting their in-game behaviour. We have a very sparse event landscape with about 250 different events and over 2,000 fields in total.

Based on metadata about this event landscape we generate our CCL file.
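For context, our generator works roughly like this (a hypothetical Python sketch; the metadata layout, event names and helper names are illustrative, not our actual tooling):

```python
# Hypothetical sketch of generating one CCL CREATE INPUT STREAM per event
# type from event metadata. Metadata layout and names are made up.

EVENT_METADATA = {
    "player_login":  [("userId", "integer"), ("serverId", "integer")],
    "item_purchase": [("userId", "integer"), ("itemId", "integer"),
                      ("price", "float")],
}

def emit_input_stream(event_name, fields):
    """Render a CCL CREATE INPUT STREAM statement for one event type."""
    cols = ",\n    ".join(f"{name} {ccl_type}" for name, ccl_type in fields)
    return (f"CREATE INPUT STREAM {event_name} SCHEMA (\n"
            f"    {cols}\n);")

ccl = "\n\n".join(emit_input_stream(name, fields)
                  for name, fields in sorted(EVENT_METADATA.items()))
print(ccl)
```

With ~250 events and up to ~150 fields each, this kind of generation is what produces the ~38,000-line file mentioned below.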

We have tried different approaches in generating the ccl with different results.

Now we are at a point where we seem to have reached a maximum complexity of CCL in some dimension. When trying to compile our CCL we get the following error message from the compiler:

terminate called after throwing an instance of 'std::out_of_range'

  what():  vector::_M_range_check

Aborted

Does anybody have information about how complex models can get?

Here are some numbers describing our model:

- ~38,000 lines of code in CCL

- ~250 input streams, one for each event

- up to ~150 fields in a single input stream, describing the fields for that event

- one flex stream with ~250 IN streams

- one flex stream with ~2,000 fields in its schema

Why we have such big numbers for the model is a separate discussion, and we are rethinking whether we really need this on our end anyway. At the moment it would simply be interesting for us to know which of these numbers is causing issues here and resulting in the out_of_range error.

Best regards,

Dave

EDIT: I just cut the metadata in half and tried this out. 13,012 lines of CCL took ~11 minutes to compile, during which the compiler allocated ~7 GB of RAM. The resulting .ccx file is ~150 MB. Trying to start it produces the following errors in esp_server.log:

2012-09-11 17:43:58.421 | 25693 | container | [SP-2-720005] (31.885) sp(9159) Manager.registerApplication() status=failure

2012-09-11 17:43:58.421 | 25693 | container | [SP-2-720002] (31.885) sp(9159) ClusterContainerHeartbeatThread::execute() container registration failed

2012-09-11 17:43:58.421 | 9159 | container | [SP-3-100005] (31.885) sp(9159) Cannot register project with cluster or was requested to stop...stopping

Accepted Solutions (1)


JWootton
Advisor
0 Kudos

Dave - a bit more info. I discussed this with the engineering team. They confirmed there are no hard limits, just resource limits. They said you shouldn't be seeing the error you're seeing, and should open a support case.

Former Member
0 Kudos

Ok, thanks for the info. That leaves only one question open: How/where do I open a support case?

Former Member
0 Kudos

I just sent the instructions for opening a case as an SAP employee to you via DM.

Answers (3)


Former Member
0 Kudos

Hi everyone.

Thank you Mike and Jeff for the instructions on opening support cases. I will go ahead and do so, as I understand this should not be occurring.

However, for our use case we found a suitable workaround/redesign to get it working with a much leaner model!

Here is some explanation:

As I said, our first approach was to save metadata and then generically generate our ESP model based on that data. For a metadata base with over 2,000 distinct fields in ~250 distinct events, the resulting model was huge and crashed the compiler.

Taking a step back and rethinking the whole setup, we came up with the idea of not generating a static ESP model generically, but having the model itself handle the generic part.

So we did the following:

- instead of 250 input streams, some wide, some lean, with a total of 2,000+ fields -> one generic input stream with 4 fields (two key fields + field name + value)

- a self-written flex stream to do the aggregation based on (key field 1, key field 2, field name)

- persisting the information into a dictionary of dictionaries (two-dimensional) instead of a stream, with one dimension being the key fields and the other being the value fields

- a self-written flex stream to recreate the two-dimensional vector for output
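In Python terms, the logic of the new flex streams looks roughly like this (a sketch modeling the SPLASH behaviour, not actual CCL; the field names are illustrative):

```python
from collections import defaultdict

# Rough Python model of the generic approach: one narrow input record
# (two key fields + field name + value) instead of ~250 wide streams,
# aggregated into a dictionary of dictionaries keyed on the two keys.
# Names here are illustrative, not our actual CCL/SPLASH code.

# profiles[(userId, gameId)] -> {fieldName: latest value}
profiles = defaultdict(dict)

def on_event(user_id, game_id, field_name, value):
    """What the aggregating flex stream does per incoming record."""
    profiles[(user_id, game_id)][field_name] = value

def output_vector(user_id, game_id, field_order):
    """What the output flex stream does: rebuild the wide row on demand."""
    row = profiles.get((user_id, game_id), {})
    return [row.get(f) for f in field_order]  # None for sparse fields

# A few generic events instead of typed per-event streams:
on_event(42, 1, "loginCount", 3)
on_event(42, 1, "goldSpent", 1500)
on_event(42, 1, "loginCount", 4)  # a later event overwrites the field

print(output_vector(42, 1, ["loginCount", "goldSpent", "kills"]))
# -> [4, 1500, None]
```

The sparseness is handled naturally: fields that never arrived for a user simply stay absent from the inner dictionary instead of occupying columns in a 2,000-field schema.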

Result:

We now have a model with somewhat more complex logic in the flex streams and the dictionary as the main persistence. The new model does exactly the same (prints the same outputs). The new model has 110 lines of CCL instead of 38,000!

You also see the difference when taking a look at the load on ESP server.

Old model: ~1,000 inserts into ESP per second, complex model, ESP CPU consumption ~3%

New model: ~60,000 inserts into ESP per second, smaller model, ESP CPU consumption ~78% (parallelized)

The results were still identical, and it kept working when we pushed the insert rate to ~200,000 inserts per second (~156% CPU consumption).

We were quite sure that our model was not optimal before, but we didn't expect such a compression factor. Still, we wouldn't have expected the compiler to crash in the other case.

If you want closer insight into our models and what we changed, do not hesitate to contact me.

Best regards,

Dave

JWootton
Advisor
0 Kudos

I'm not aware of fixed limits. I asked around and got this input from Dave Rosenblum, one of the people with the most experience in this area:

Dave:   "I have a project with 150 streams and windows, about 7k lines. The widest stream/window is over 300 fields. Some only have 5 fields. I have noticed that Studio is a bit sluggish. I have never had a flex stream with that many inputs. It may be a bottleneck. I would rethink things and make the streams much narrower. Use a bunch of unions to get the many streams into fewer."
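Dave's union suggestion can be pictured like this (a Python sketch with made-up event shapes; in CCL this would be a union of narrow streams feeding a single downstream flex stream):

```python
# Sketch of the "union" idea: instead of one flex stream with ~250 IN
# streams, normalize many narrow per-event streams into one shared shape
# and merge them, so downstream operators see a single input.
# Event shapes and names here are made up for illustration.

def normalize(event_type, record):
    """Map a typed event record to a common (type, key, payload) shape."""
    return {"type": event_type,
            "userId": record["userId"],
            "payload": {k: v for k, v in record.items() if k != "userId"}}

login_stream = [{"userId": 1, "serverId": 7}]
purchase_stream = [{"userId": 1, "itemId": 99, "price": 4.5}]

# The "union": one merged stream for downstream operators to consume.
merged = ([normalize("login", r) for r in login_stream] +
          [normalize("purchase", r) for r in purchase_stream])

for rec in merged:
    print(rec["type"], rec["userId"], sorted(rec["payload"]))
```

The trade-off is the same one Dave describes: each stream stays narrow, and the fan-in happens once in the union rather than in a single operator with hundreds of inputs.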

I would suggest you open a support case and get the ESP support team to help you troubleshoot - they'd be happy to help.

vijaigeethach
Product and Topic Expert
0 Kudos

Hi Dave,

Typically the maximum size of an ESP model is governed by the hardware and the platform it runs on. Each query in ESP is a thread, so the amount of resource available to a thread depends on the machine and the OS you are running on. With a model as huge as yours you can easily run into performance bottlenecks, particularly with the one flex stream with ~250 inputs. We would need to analyze the model to see where exactly the issue lies, whether it is the number of fields or the huge number of streams. Can you open a technical support case with us so we can analyze your model? Have you considered splitting the project into multiple projects and using bindings to connect them?

Thanks,

Geetha

Former Member
0 Kudos

Hi Geetha,

Thanks for the hints. We are well aware that our model is probably not optimized. However, as I said, we are generating it based on metadata, and we were simply curious to find out where the limits are for our model.

We are trying different approaches, workarounds and divide-and-conquer to tackle this issue.

However, I really don't think that hardware is the issue here. We have been trying to get it started on an 80-core / 1 TB RAM machine, but as I said, on the first attempt the compiler stopped with an error, and when I cut the metadata in half the error came during startup (not at runtime, so before the first event was processed!).

Again: it was more out of curiosity that I posted this question, because I was not able to find any numbers in that direction in the documentation. For our concrete problem I think it is better to redesign and optimize the model so we don't run into that issue anymore.

Best regards,
Dave