on 11-10-2013 9:52 PM
Dear HANA experts,
How does HANA handle OOM (out of memory) situations? Could you provide more details about this?
As far as I can see, the situation is better in rev 72, but OOM is still possible.
OOM situations are one of HANA's main weak points.
And it doesn't matter how much memory you have: 1 TB, 2 TB, 8 TB.
I hope the HANA dev team is working to change the situation.
We opened a ticket with SAP to clarify this and are waiting for news:
"If an OOM happens and the indexserver service restarts, will all current sessions on the server be terminated?"
We've upgraded to the latest rev 69 pl1.
Yesterday we got an OOM because of DSO activation (this process is very RAM-hungry).
After the indexserver was automatically restarted, we had problems with log backups (the log backup process hung). The only solution was to restart HANA. We have opened a ticket about it.
Colleagues, OOM handling in HANA is a nightmare. I hope SAP can fix this situation and make OOM handling less fatal. We can't restart a HANA scale-out every time an OOM occurs.
Which revision of HANA are you running? On my system, the bad session gets killed. An index server restart is a bug, so please raise an OSS message if you get this.
John
So I was doing some testing for an unrelated matter, to see what happens in OOM situations in HANA. In my instance I caused a large number (100+) of very expensive queries to run.
Note that I am using HANA Rev.69 and there are a number of fixes in it for various scenarios. It's possible that our revisions behave differently.
These queries would need hundreds of TBs of memory to actually run, because they cause massive materializations. I can run one or two of these concurrently; with 100 concurrent, we expect queries to fail.
During this, I start to run a few smaller queries which access a lot of data, but which would normally run in 3-4 seconds.
I find that all my big queries fail (expected) as follows:
* 2048: column store error: search table error: [9] Error executing physical plan: Memory allocation failed;in executor::Executor in cube: _SYS_BIC:demo.rca.data/AV_BOC_RCA_TRANS_CUST SQLSTATE: HY000 (the same error is repeated for each failed query)
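Since memory pressure from concurrent queries is often transient, one pragmatic client-side mitigation is to retry statements that fail with this error. Below is a generic, DB-API-style sketch; the error-string match and the retry policy are my assumptions for illustration, not a HANA-specific API:

```python
import time

def execute_with_retry(cursor, sql, retries=3, backoff_s=5.0):
    """Retry a statement that fails with a memory-allocation error,
    on the theory that concurrent memory pressure is transient."""
    for attempt in range(retries + 1):
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        except Exception as exc:  # in practice, catch the driver's error class
            if "Memory allocation failed" in str(exc) and attempt < retries:
                time.sleep(backoff_s * (attempt + 1))  # linear backoff
                continue
            raise
```

This does not fix the underlying capacity problem, of course; it only smooths over short spikes.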
Interestingly, I find that my smaller queries complete, though 5-6x slower than usual.
Statement 'SELECT GENDER, NAME, SUM(TXAMOUNT)/COUNT(TXAMOUNT) AS AVG_SPEND FROM ...'
successfully executed in 18.200 seconds (server processing time: 18.198 seconds)
Fetched 211 row(s) in 2 ms 164 µs (server processing time: 0 ms 674 µs)
I think this is fairly good behavior on HANA's part? Hope this helps.
John
Hi John,
Thanks for the very informative post. This will definitely help. If it's not too much to ask, may I request one more thing?
Can you please also try a massive data load at the same time as the concurrent executions? Such a scenario has caused the indexserver to fail on my system. We are in the process of upgrading to rev 69 and will also try similar scenarios in the days to come.
Once again, thanks for sharing your experience.
Regards,
Ravi
So I do this regularly. I've not seen indexserver crashes in this instance, even back to Rev.52.
I'd make a few notes:
- You should only do bulk loads (40-80 threads) when there are no users on the system. Bulk loads are very resource-intensive and cause major latency problems in query execution because you have very large delta stores. With bulk loads you should expect around 1-5m rows/sec, depending on the table width.
- If you want good concurrent query execution, then try loading in a single thread. You will still get a reasonable load rate (100k+ rows/sec), and in my tests it has a negligible impact (10-20%) on query response.
In my tests we were doing as many as 250k inserts/sec using ESP (multiple threads to multiple tables). What are your requirements for concurrent loading and reporting?
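To put those rates in perspective, here is a quick back-of-envelope for the single-thread vs. bulk trade-off (the rates are illustrative values taken from the reported ranges above, not guarantees):

```python
def load_time_seconds(total_rows, rows_per_sec):
    """Estimate wall-clock time for a load at a given sustained rate."""
    return total_rows / rows_per_sec

ONE_BILLION = 1_000_000_000
bulk = load_time_seconds(ONE_BILLION, 2_000_000)   # multi-threaded bulk load
single = load_time_seconds(ONE_BILLION, 100_000)   # single-threaded load
# bulk finishes in minutes; single-threaded takes hours but barely
# disturbs concurrent queries
```

The arithmetic makes the operational choice concrete: pay hours of load time for minimal query impact, or minutes of load time while locking users out.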
Regards,
John
Sure. When you reach 95% memory usage, HANA unloads partition-columns on a least-recently-used (LRU) basis.
In most instances we find that the unloaded partition-columns weren't being used, so there's no impact.
If you have a situation where your working set of partition-columns plus your calculation memory required for the queries coming in exceeds available RAM, then bad things happen. Much like in any other computer software ever made 🙂
If you have a situation where your query consumes all available RAM, it will terminate with an error.
John
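The unload behavior described above can be sketched as a simple LRU eviction loop. This is a conceptual model only, not HANA's actual implementation; the capacity, threshold, and column sizes are made-up illustrative numbers:

```python
from collections import OrderedDict

class LruColumnCache:
    """Conceptual model of column unloading: once memory used exceeds a
    threshold, the least-recently-used columns are unloaded first."""

    def __init__(self, capacity_bytes, threshold=0.95):
        self.capacity = capacity_bytes
        self.threshold = threshold
        self.columns = OrderedDict()  # name -> size_bytes, in LRU order
        self.used = 0

    def touch(self, name, size_bytes):
        """Load or access a column; evict LRU columns while over threshold."""
        if name in self.columns:
            self.columns.move_to_end(name)  # mark as most recently used
        else:
            self.columns[name] = size_bytes
            self.used += size_bytes
        while self.used > self.capacity * self.threshold and len(self.columns) > 1:
            _, lru_size = self.columns.popitem(last=False)  # evict oldest
            self.used -= lru_size

cache = LruColumnCache(capacity_bytes=100)
cache.touch("A", 50)
cache.touch("B", 40)
cache.touch("A", 50)  # "A" becomes most recently used
cache.touch("C", 30)  # usage hits 120 > 95, so "B" (the LRU column) is evicted
```

The point John makes then follows directly from the model: eviction only helps when the evicted columns are cold. If the hot working set plus query scratch memory exceeds capacity, no eviction order can save you.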
We have found that HANA has strange behavior under OOM.
Classical RDBMSs take a session-oriented approach: if a session misbehaves, the RDBMS kills that session.
HANA takes a server-oriented (indexserver) approach:
"If an OOM happens, the indexserver service is restarted and all current sessions on the server are terminated."
So you lose everything. Imagine you have 1000+ sessions in a HANA ERP system and one session hits an OOM. You lose them all, and hear the disastrous screaming of your users.
SAP seems to need a more failure-tolerant OOM handling approach in SAP HANA.