on 03-24-2014 7:57 AM
Hello,
We upgraded our HANA scale-out system from rev. 69.1 to rev. 72.
About four hours after the upgrade, our HDB crashed with OOM errors on the master indexserver.
Since then we have been trying to start it back up and keep hitting OOM errors during startup, in the index rebuild phase:
The errors clearly state that HANA runs out of allocatable memory for Pool/IndexRebuildAllocator during startup.
It looks like HANA requires more memory during this phase than rev. 69.1 did.
Our master node has 512 GB of physical memory, and the allocation limit is set to 95% of this (the default).
Here is what we read from our logs during startup:
....
[3331]{-1}[-1/-1] 2014-03-24 01:35:36.735498 i Service_Startup ptime_master_start.cc(00719) : Rebuilding system indexes.
[3331]{-1}[-1/-1] 2014-03-24 01:35:37.236971 i Service_Startup ptime_master_start.cc(00733) : Rebuilding system indexes done.
[3331]{-1}[-1/-1] 2014-03-24 01:35:37.237002 i Service_Startup ptime_master_start.cc(00735) : Rebuilding indexes.
[3331]{-1}[-1/-1] 2014-03-24 01:35:37.244660 i Service_Startup IndexManager_rebuild.cc(00478) : Number of indexes: 2837
[3331]{-1}[-1/-1] 2014-03-24 01:35:37.245606 i Service_Startup IndexManager_rebuild.cc(00580) : Number of JobEx indexes: 2798
[3367]{-1}[9/-1] 2014-03-24 01:36:33.729391 w ResMan ResourceContainer.cpp(01300) : Information about shrink at 24.03.2014 01:28:46 000 Mon:
Reason for shrink: Precharge for big block allocation. User size: 6269307200
ShrinkCaller
....
[4123]{-1}[9/-1] 2014-03-24 01:36:42.465154 w Memory PoolAllocator.cpp(01060) : Out of memory for Pool/IndexRebuildAllocator, size 48B, flags 0x0
[4123]{-1}[9/-1] 2014-03-24 01:36:42.465168 e Memory ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.
[3538]{-1}[9/-1] 2014-03-24 01:36:42.465164 w Memory PoolAllocator.cpp(01060) : Out of memory for Pool/IndexRebuildAllocator, size 48B, flags 0x0
[3538]{-1}[9/-1] 2014-03-24 01:36:42.465177 e Memory ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.
[4123]{-1}[9/-1] 2014-03-24 01:36:42.465168 e Memory ReportMemoryProblems.cpp(00733) : Failed to allocate 48 byte.
[3538]{-1}[9/-1] 2014-03-24 01:36:42.465177 e Memory ReportMemoryProblems.cpp(00733) : Failed to allocate 48 byte.
[4123]{-1}[9/-1] 2014-03-24 01:36:42.465168 e Memory ReportMemoryProblems.cpp(00733) : Current callstack:
[3538]{-1}[9/-1] 2014-03-24 01:36:42.465177 e Memory ReportMemoryProblems.cpp(00733) : Current callstack:
[3562]{-1}[9/-1] 2014-03-24 01:36:42.465245 w Memory PoolAllocator.cpp(01060) : Out of memory for Pool/IndexRebuildAllocator, size 48B, flags 0x0
[3562]{-1}[9/-1] 2014-03-24 01:36:42.465256 e Memory ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.
....
GLOBAL_ALLOCATION_LIMIT (GAL) = 520645177866b (484.88gb), SHARED_MEMORY = 243742983024b (227gb), CODE_SIZE = 6919073792b (6.44gb)
PID=2987 (hdbnameserver), PAL=487793667686, AB=1596952576, UA=0, U=1415644701, FSL=0
PID=3196 (hdbcompileserve), PAL=487793667686, AB=447041536, UA=0, U=356200477, FSL=0
PID=3193 (hdbpreprocessor), PAL=487793667686, AB=416477184, UA=0, U=292814063, FSL=0
PID=3254 (hdbstatisticsse), PAL=54199296409, AB=1040187392, UA=0, U=862081043, FSL=0
PID=3257 (hdbxsengine), PAL=487793667686, AB=1100451840, UA=0, U=907234077, FSL=0
PID=3251 (hdbindexserver), PAL=487793667686, AB=265382010522, UA=0, U=218447843045, FSL=0
Total allocated memory= 520645177866b (484.88gb)
Total used memory = 472943874222b (440.46gb)
Sum AB = 269983121050
Sum Used = 222281817406
Heap memory fragmentation: 9
Top allocators (ordered descending by inclusive_size_in_use).
1: / 218448756093b (203.44gb)
2: Pool 214254480696b (199.53gb)
3: Pool/IndexRebuildAllocator 196297409104b (182.81gb)
4: Pool/PersistenceManager 9920106968b (9.23gb)
5: Pool/PersistenceManager/PersistentSpace(0) 9779577952b (9.10gb)
6: Pool/PersistenceManager/PersistentSpace(0)/RowStoreLPA 9473884512b (8.82gb)
7: Pool/ResourceContainer 2852314824b (2.65gb)
8: AllocateOnlyAllocator-unlimited 2832931544b (2.63gb)
9: Pool/malloc 2660566632b (2.47gb)
10: Pool/malloc/libhdbrskernel.so 2467813288b (2.29gb)
11: Pool/RowEngine 2284285840b (2.12gb)
12: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>/MemoryMapLevel2Blocks 2135949312b (1.98gb)
13: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1> 2135949312b (1.98gb)
14: Pool/RowEngine/CpbTree 1417842512b (1.32gb)
15: AllocateOnlyAllocator-limited 1184520640b (1.10gb)
16: AllocateOnlyAllocator-limited/ResourceHeader 1184517680b (1.10gb)
17: Pool/RowEngine/LockTable 536881408b (512mb)
18: AllocateOnlyAllocator-unlimited/FLA-UL<120,256>/BigBlockInfoAllocator 360752520b (344.04mb)
19: AllocateOnlyAllocator-unlimited/FLA-UL<120,256> 360752520b (344.04mb)
20: Pool/PersistenceManager/PersistentSpace(0)/RowStoreConverter 239921680b (228.80mb)
Top allocators (ordered descending by exclusive_size_in_use).
1: Pool/IndexRebuildAllocator 196297409104b (182.81gb)
2: Pool/PersistenceManager/PersistentSpace(0)/RowStoreLPA 9473884512b (8.82gb)
3: Pool/ResourceContainer 2852314824b (2.65gb)
4: Pool/malloc/libhdbrskernel.so 2467813288b (2.29gb)
5: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>/MemoryMapLevel2Blocks 2135949312b (1.98gb)
6: Pool/RowEngine/CpbTree 1417842512b (1.32gb)
7: AllocateOnlyAllocator-limited/ResourceHeader 1184517680b (1.10gb)
8: Pool/RowEngine/LockTable 536881408b (512mb)
9: AllocateOnlyAllocator-unlimited/FLA-UL<120,256>/BigBlockInfoAllocator 360752520b (344.04mb)
10: Pool/PersistenceManager/PersistentSpace(0)/RowStoreConverter/ConvPage 239075328b (228mb)
11: Pool/RowEngine/Internal 205837824b (196.30mb)
12: StackAllocator 176672768b (168.48mb)
13: AllocateOnlyAllocator-unlimited/FLA-UL<48,128>/FreeBigBlockInfoAllocator 144301008b (137.61mb)
14: Pool/RowEngine/Transaction 103391528b (98.60mb)
15: Pool/malloc/libhdbexpression.so 90507984b (86.31mb)
16: Pool/malloc/libhdbbasement.so 90380472b (86.19mb)
17: AllocateOnlyAllocator-unlimited/ReserveForUndoAndCleanupExec 84029440b (80.13mb)
18: AllocateOnlyAllocator-unlimited/ReserveForOnlineCleanup 84029440b (80.13mb)
19: Pool/Statistics 83825720b (79.94mb)
20: Pool/PersistenceManager/ContainerNameDirectory 59182968b (56.44mb)
In order to fix this bottleneck we first need to get HDB started, but how?
Is there a way to avoid loading row store tables during startup? (This would leave enough memory for the index rebuild.)
Is there a way to skip the index rebuild during startup?
Can we increase the allocation limit beyond 95% of physical memory? For example, we could configure swap space to be utilized, just to get over the edge during startup, bring our HDB back up, and then work on reducing the memory requirement on the master indexserver.
Kind Regards
Florian Wittmann
PS: we also have a call open with SAP.
To update everyone who runs into the same issue during HANA startup, here is the solution that worked for us. But I highly recommend having such an approach reviewed by SAP Service before you perform it!
For us the only remaining option to bring HANA up was to configure swap space on the master node. We configured 20 GB; we could not know how much memory the startup would need to get past the index rebuild, so that was a guess, and it turned out to be just enough to get past the memory peak during the startup phase.
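For reference, this is roughly how temporary swap space can be provisioned on Linux. The file path and the 20 GB size are just our choices, not anything HANA-specific; treat this as a sketch to review with your Linux and SAP teams first, and run it as root:

```shell
# Create a 20 GB swap file on the master node (run as root).
fallocate -l 20G /hana_startup_swap   # or: dd if=/dev/zero of=/hana_startup_swap bs=1M count=20480
chmod 600 /hana_startup_swap          # swap files must not be readable by other users
mkswap /hana_startup_swap             # initialize the swap area
swapon /hana_startup_swap             # enable it immediately (no fstab entry needed for a one-off)
swapon -s                             # verify the new swap space is active
# Later, once HANA is back up and the memory issue is fixed:
#   swapoff /hana_startup_swap && rm /hana_startup_swap
```

We removed the swap file again as soon as HANA was back within its normal allocation limit, since running HANA on swap is only acceptable as an emergency measure.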
After that we adjusted the custom parameter file /sapmnt/<SID>/global/hdb/custom/config/global.ini
and maintained a global_allocation_limit (in MB) that we calculated manually as (512 GB physical memory + 20 GB swap) * 0.95,
e.g.
[memorymanager]
global_allocation_limit=517530
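For anyone recalculating this for a different host: global_allocation_limit is specified in MB, and our value is simply the combined physical plus swap memory scaled by the default 95% factor. A quick sanity check of the arithmetic (the 512 GB and 20 GB figures are from our setup; substitute your own):

```shell
phys_gb=512   # physical RAM on the master node
swap_gb=20    # swap space we added temporarily
# (512 + 20) GB * 1024 MB/GB * 0.95, rounded to the nearest MB
gal_mb=$(awk -v p="$phys_gb" -v s="$swap_gb" 'BEGIN { printf "%d\n", (p + s) * 1024 * 0.95 + 0.5 }')
echo "$gal_mb"   # 517530
```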
Then we started HANA up again, and it made use of the swap space during the index rebuild in the startup phase. Of course this dramatically slows down the startup; effectively you are doing an index rebuild in swap space. But after about one hour we got past the index rebuild and memory utilization went down again. (From this error we learned that HANA has a significantly higher memory consumption during startup, and the upgrade from SPS 6 to SPS 7 must have increased this slightly, since we managed to start our HANA db before the upgrade without this issue; but that was not the root cause!)
Now that we had HANA back up, we investigated memory-consuming indexes together with SAP, and with the help of the query below we identified the row store table RSMONMESS as holding roughly 25 GB of secondary indexes!
select table_name,
       round(sum(index_size) / 1024 / 1024, 2) as size_in_mb,
       count(*) as number_of_indexes
from M_RS_INDEXES
group by table_name
order by size_in_mb desc;
Before dropping the secondary indexes:
TABLE_NAME;SIZE_IN_MB;NUMBER_OF_INDEXES
RSMONMESS;30,883.8;6
drop index SAP<SID>."RSMONMESS~AXX";
drop index SAP<SID>."RSMONMESS~TIM";
drop index SAP<SID>."RSMONMESS~AU1";
drop index SAP<SID>."RSMONMESS~RID";
drop index SAP<SID>."RSMONMESS~AXY";
After drop:
TABLE_NAME;SIZE_IN_MB;NUMBER_OF_INDEXES
RSMONMESS;5,708.43;1
With that change in place we removed our custom global_allocation_limit parameter and went back to the default (95% of physical memory), removed the swap file, and started HANA again. During this startup we stayed within the global_allocation_limit and brought HANA back up!