
After upgrading HANA from rev. 69.1 to rev. 72, OOM errors occur during HDB startup

Former Member

Hello,

We upgraded our HANA scale-out system from rev. 69.1 to rev. 72.

About 4 hours after the upgrade, our HDB crashed with OOM errors on the master indexserver.

Since then we have been trying to start it again, but we hit OOM errors during startup in the index rebuild phase.

The errors clearly state that HANA runs out of allocatable memory for Pool/IndexRebuildAllocator during startup.

It looks like HANA requires more memory during this phase than it did on rev. 69.1.

Our master node has 512 GB of physical memory, and the allocation limit is set to 95% of this (default).

Here is what we read from our logs during startup:

....

[3331]{-1}[-1/-1] 2014-03-24 01:35:36.735498 i Service_Startup  ptime_master_start.cc(00719) : Rebuilding system indexes.

[3331]{-1}[-1/-1] 2014-03-24 01:35:37.236971 i Service_Startup  ptime_master_start.cc(00733) : Rebuilding system indexes done.

[3331]{-1}[-1/-1] 2014-03-24 01:35:37.237002 i Service_Startup  ptime_master_start.cc(00735) : Rebuilding indexes.

[3331]{-1}[-1/-1] 2014-03-24 01:35:37.244660 i Service_Startup  IndexManager_rebuild.cc(00478) : Number of indexes: 2837

[3331]{-1}[-1/-1] 2014-03-24 01:35:37.245606 i Service_Startup  IndexManager_rebuild.cc(00580) : Number of JobEx indexes: 2798

[3367]{-1}[9/-1] 2014-03-24 01:36:33.729391 w ResMan           ResourceContainer.cpp(01300) : Information about shrink at 24.03.2014 01:28:46 000 Mon:

Reason for shrink: Precharge for big block allocation. User size:  6269307200

ShrinkCaller

....

[4123]{-1}[9/-1] 2014-03-24 01:36:42.465154 w Memory           PoolAllocator.cpp(01060) : Out of memory for Pool/IndexRebuildAllocator, size 48B, flags 0x0

[4123]{-1}[9/-1] 2014-03-24 01:36:42.465168 e Memory           ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.

[3538]{-1}[9/-1] 2014-03-24 01:36:42.465164 w Memory           PoolAllocator.cpp(01060) : Out of memory for Pool/IndexRebuildAllocator, size 48B, flags 0x0

[3538]{-1}[9/-1] 2014-03-24 01:36:42.465177 e Memory           ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.

[4123]{-1}[9/-1] 2014-03-24 01:36:42.465168 e Memory           ReportMemoryProblems.cpp(00733) : Failed to allocate 48 byte.

[3538]{-1}[9/-1] 2014-03-24 01:36:42.465177 e Memory           ReportMemoryProblems.cpp(00733) : Failed to allocate 48 byte.

[4123]{-1}[9/-1] 2014-03-24 01:36:42.465168 e Memory           ReportMemoryProblems.cpp(00733) : Current callstack:

[3538]{-1}[9/-1] 2014-03-24 01:36:42.465177 e Memory           ReportMemoryProblems.cpp(00733) : Current callstack:

[3562]{-1}[9/-1] 2014-03-24 01:36:42.465245 w Memory           PoolAllocator.cpp(01060) : Out of memory for Pool/IndexRebuildAllocator, size 48B, flags 0x0

[3562]{-1}[9/-1] 2014-03-24 01:36:42.465256 e Memory           ReportMemoryProblems.cpp(00733) : OUT OF MEMORY occurred.

....

GLOBAL_ALLOCATION_LIMIT (GAL) = 520645177866b (484.88gb), SHARED_MEMORY = 243742983024b (227gb), CODE_SIZE = 6919073792b (6.44gb)

PID=2987 (hdbnameserver), PAL=487793667686, AB=1596952576, UA=0, U=1415644701, FSL=0

PID=3196 (hdbcompileserve), PAL=487793667686, AB=447041536, UA=0, U=356200477, FSL=0

PID=3193 (hdbpreprocessor), PAL=487793667686, AB=416477184, UA=0, U=292814063, FSL=0

PID=3254 (hdbstatisticsse), PAL=54199296409, AB=1040187392, UA=0, U=862081043, FSL=0

PID=3257 (hdbxsengine), PAL=487793667686, AB=1100451840, UA=0, U=907234077, FSL=0

PID=3251 (hdbindexserver), PAL=487793667686, AB=265382010522, UA=0, U=218447843045, FSL=0

Total allocated memory= 520645177866b (484.88gb)

Total used memory     = 472943874222b (440.46gb)

Sum AB                = 269983121050

Sum Used              = 222281817406

Heap memory fragmentation: 9

Top allocators (ordered descending by inclusive_size_in_use).

1: /                                                                       218448756093b (203.44gb)

2: Pool                                                                    214254480696b (199.53gb)

3: Pool/IndexRebuildAllocator                                              196297409104b (182.81gb)

4: Pool/PersistenceManager                                                 9920106968b (9.23gb)

5: Pool/PersistenceManager/PersistentSpace(0)                              9779577952b (9.10gb)

6: Pool/PersistenceManager/PersistentSpace(0)/RowStoreLPA                  9473884512b (8.82gb)

7: Pool/ResourceContainer                                                  2852314824b (2.65gb)

8: AllocateOnlyAllocator-unlimited                                         2832931544b (2.63gb)

9: Pool/malloc                                                             2660566632b (2.47gb)

10: Pool/malloc/libhdbrskernel.so                                           2467813288b (2.29gb)

11: Pool/RowEngine                                                          2284285840b (2.12gb)

12: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>/MemoryMapLevel2Blocks 2135949312b (1.98gb)

13: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>                       2135949312b (1.98gb)

14: Pool/RowEngine/CpbTree                                                  1417842512b (1.32gb)

15: AllocateOnlyAllocator-limited                                           1184520640b (1.10gb)

16: AllocateOnlyAllocator-limited/ResourceHeader                            1184517680b (1.10gb)

17: Pool/RowEngine/LockTable                                                536881408b (512mb)

18: AllocateOnlyAllocator-unlimited/FLA-UL<120,256>/BigBlockInfoAllocator   360752520b (344.04mb)

19: AllocateOnlyAllocator-unlimited/FLA-UL<120,256>                         360752520b (344.04mb)

20: Pool/PersistenceManager/PersistentSpace(0)/RowStoreConverter            239921680b (228.80mb)

Top allocators (ordered descending by exclusive_size_in_use).

1: Pool/IndexRebuildAllocator                                               196297409104b (182.81gb)

2: Pool/PersistenceManager/PersistentSpace(0)/RowStoreLPA                   9473884512b (8.82gb)

3: Pool/ResourceContainer                                                   2852314824b (2.65gb)

4: Pool/malloc/libhdbrskernel.so                                            2467813288b (2.29gb)

5: AllocateOnlyAllocator-unlimited/FLA-UL<3145728,1>/MemoryMapLevel2Blocks  2135949312b (1.98gb)

6: Pool/RowEngine/CpbTree                                                   1417842512b (1.32gb)

7: AllocateOnlyAllocator-limited/ResourceHeader                             1184517680b (1.10gb)

8: Pool/RowEngine/LockTable                                                 536881408b (512mb)

9: AllocateOnlyAllocator-unlimited/FLA-UL<120,256>/BigBlockInfoAllocator    360752520b (344.04mb)

10: Pool/PersistenceManager/PersistentSpace(0)/RowStoreConverter/ConvPage    239075328b (228mb)

11: Pool/RowEngine/Internal                                                  205837824b (196.30mb)

12: StackAllocator                                                           176672768b (168.48mb)

13: AllocateOnlyAllocator-unlimited/FLA-UL<48,128>/FreeBigBlockInfoAllocator 144301008b (137.61mb)

14: Pool/RowEngine/Transaction                                               103391528b (98.60mb)

15: Pool/malloc/libhdbexpression.so                                          90507984b (86.31mb)

16: Pool/malloc/libhdbbasement.so                                            90380472b (86.19mb)

17: AllocateOnlyAllocator-unlimited/ReserveForUndoAndCleanupExec             84029440b (80.13mb)

18: AllocateOnlyAllocator-unlimited/ReserveForOnlineCleanup                  84029440b (80.13mb)

19: Pool/Statistics                                                          83825720b (79.94mb)

20: Pool/PersistenceManager/ContainerNameDirectory                           59182968b (56.44mb)
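The figures in the dump above can be cross-checked with a little arithmetic. The following is only a sketch in Python using values copied from the log; the variable names are illustrative, and the reading that "Total allocated" equals shared memory plus code size plus the sum of the per-process AB values is inferred from the numbers matching, not taken from SAP documentation.

```python
# Values copied from the OOM dump above, in bytes. Names are illustrative.
GAL = 520_645_177_866               # GLOBAL_ALLOCATION_LIMIT
SHARED_MEMORY = 243_742_983_024
CODE_SIZE = 6_919_073_792
SUM_AB = 269_983_121_050            # "Sum AB" line
SUM_USED = 222_281_817_406          # "Sum Used" line
INDEX_REBUILD = 196_297_409_104     # Pool/IndexRebuildAllocator
INDEXSERVER_USED = 218_447_843_045  # U= value of hdbindexserver

GIB = 1024 ** 3

# Assumption: the totals are shared memory + code + per-process heap.
total_allocated = SHARED_MEMORY + CODE_SIZE + SUM_AB
total_used = SHARED_MEMORY + CODE_SIZE + SUM_USED

# Share of the indexserver's used heap held by the index rebuild.
rebuild_share = INDEX_REBUILD / INDEXSERVER_USED

print(f"total allocated: {total_allocated / GIB:.2f} GiB, limit reached: {total_allocated == GAL}")
print(f"total used:      {total_used / GIB:.2f} GiB")
print(f"IndexRebuildAllocator share of indexserver heap: {rebuild_share:.1%}")
```

Under that reading, the allocated total is exactly the global allocation limit, and the index rebuild accounts for most of the indexserver's used heap, while shared memory (which presumably includes the row store) takes up a large part of the rest.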

In order to fix this bottleneck, we first need to get HDB started, but how?

Is there a way to avoid loading row store tables during startup? (This would leave enough memory for the index rebuild.)

Is there a way to skip the index rebuild during startup?

Can we increase the allocation limit to more than 95% of the physical memory? (For example, we could configure swap space just to get over the memory peak during startup, bring our HDB back up, and then work on reducing the memory requirement on the master indexserver.)

Kind Regards

Florian Wittmann

P.S. We also have a call open with SAP.

Accepted Solutions (1)


Former Member

To update everyone who runs into the same issue during HANA startup, here is the solution that worked for us. I highly recommend having such an approach reviewed by SAP service before you perform it!

For us, the only chance left to bring HANA up was to configure swap space on the master node. (We configured 20 GB. We could not know how much memory the startup would need to get past the index rebuild, so that was just a guess, and it turned out to be just enough to get us past the memory peak during the startup phase.)

After that we adjusted the custom parameter file /sapmnt/<SID>/global/hdb/custom/config/global.ini

and maintained a global_allocation_limit that we calculated manually as (512 GB physical memory + 20 GB swap) * 0.95, expressed in MB

e.g.

[memorymanager]

global_allocation_limit=517530
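The value above can be reproduced as 95% of physical memory plus swap, converted to MB. This is a minimal sketch in Python; the helper name `allocation_limit_mb` is hypothetical, and it assumes that global_allocation_limit is given in MB, which is consistent with the number used here.

```python
def allocation_limit_mb(physical_gb, swap_gb, fraction=0.95):
    """Return a candidate global_allocation_limit in MB (illustrative helper)."""
    total_mb = (physical_gb + swap_gb) * 1024  # GB -> MB
    return round(total_mb * fraction)


# 512 GB physical memory plus the 20 GB of swap configured for the emergency start.
limit = allocation_limit_mb(512, 20)
print(f"global_allocation_limit={limit}")
```

Note that the calculation multiplies by 0.95; dividing by 0.95 would give a limit above the combined physical and swap memory.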

Then we started HANA again, and the index rebuild in the startup phase made use of the swap space. Of course this dramatically slows down the startup (you are effectively running an index reorganization in swap), but after 1 hour we got past the index rebuild and memory utilization went down again. (From this error we learned that HANA has significantly higher memory consumption during startup. The upgrade from SP 6 to SP 7 must have increased it slightly, since we were able to start our HANA DB before the upgrade without this issue, but the upgrade was not the root cause!)

Now that HANA was back up, we investigated memory-consuming indexes together with SAP. With the help of the query below, we identified row store table RSMONMESS as holding 25 GB of secondary indexes!

select table_name,
       round(sum(index_size) / 1024 / 1024, 2) as SIZE_IN_MB,
       count(*) as number_of_indexes
from M_RS_INDEXES
group by table_name
order by SIZE_IN_MB desc

Before dropping the secondary indexes:

TABLE_NAME;SIZE_IN_MB;NUMBER_OF_INDEXES

RSMONMESS;30.883,8;6

drop index SAP<SID>."RSMONMESS~AXX";

drop index SAP<SID>."RSMONMESS~TIM";

drop index SAP<SID>."RSMONMESS~AU1";

drop index SAP<SID>."RSMONMESS~RID";

drop index SAP<SID>."RSMONMESS~AXY";

After drop:

TABLE_NAME;SIZE_IN_MB;NUMBER_OF_INDEXES

RSMONMESS;5.708,43;1
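To put the two results side by side, here is a small Python sketch that computes how much index memory the drops freed. It assumes the SIZE_IN_MB values are printed in German number format ('.' as thousands separator, ',' as decimal mark), which fits the roughly 25 GB mentioned above; the helper `parse_de_number` is hypothetical.

```python
def parse_de_number(text):
    """Parse a number with '.' as thousands separator and ',' as decimal mark."""
    return float(text.replace(".", "").replace(",", "."))


before_mb = parse_de_number("30.883,8")  # RSMONMESS with 6 indexes
after_mb = parse_de_number("5.708,43")   # RSMONMESS with 1 index left
freed_mb = before_mb - after_mb

print(f"freed: {freed_mb:.2f} MB ({freed_mb / 1024:.2f} GB)")
```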

With that change in place, we removed our custom global_allocation_limit setting to go back to the default (95% of physical main memory), removed the swap file, and started HANA again. During this startup we stayed within our global_allocation_limit, and HANA came back up!
