911: My HANA One system on AWS has fallen and it can't get back up - Indexserver crash

Former Member
0 Kudos

I was trying to load a 5GB CSV file into a ROW table when HANA's IndexServer crashed and wrote a crashdump file. I have the stuff zipped up and copied to my local drive. A quick examination of the trace files shows an issue in the indexserver. Whenever I stop and start the HANA database, the indexserver crashes again. I can't use SAP HANA Studio for anything. This is on the "production" HANA One instance I have on AWS with 63GB of RAM, so I figured I'd be OK. Right now, my server is useless. The good news is I have a backup of my main schema; the bad news is that there is another schema where I have some stored procedures that weren't in source control.

Here are the trace specifics from indexserver_alert_hanaserver.trc:

[3084]{0}[0] 2013-01-17 19:45:09.598506 e Metadata     ptl_shm.cc(00520) : ShmSystem::attach (shmid=12550492, align=67108864) - Cannot allocate memory

[3084]{0}[0] 2013-01-17 19:45:09.599115 e Row_Engine   msglog.cc(00088) : Error during RowStore recovery: transaction rolled back due to unavailable resource (at ptime/storage/recovery/CheckpointMgr.cc:535 )

[3084]{0}[0] 2013-01-17 19:45:09.599273 e Basis        Crash.cpp(00558) : Crash at /HDB/IMP/NewDB100_REL/src///sys/src/Basis/Diagnose/impl/FaultProtectionImpl.cpp:531

Reason:

exception  1: no.2100002  (Basis/Diagnose/impl/FaultProtectionImpl.cpp:531)

    Illegal call to exit(), _exit() or _Exit() detected

exception throw location:

1: 0x00007f573f5b1f6a in exit_handler+0x46 at FaultProtectionImpl.cpp:531 (libhdbbasis.so)

2: 0x0000000000542bba in _exit+0x16 at LinuxMallocInitializer.cpp:143 (hdbindexserver)

3: 0x00007f5735f55897 in ptime::CheckpointMgr::restarter(void*)+0xd3 at CheckpointMgr.cc:536 (libhdbrskernel.so)

4: 0x00007f57354e9a00 in ptime::PtimeThread::run(void*)+0x10 at ptime_thread.h:131 (libhdbrskernel.so)

5: 0x00007f574c16c000 in TrexThreads::PoolThread::run()+0xc30 at PoolThread.cpp:255 (libhdbbasement.so)

6: 0x00007f574c16d7a8 in TrexThreads::PoolThread::run(void*&)+0x14 at PoolThread.cpp:104 (libhdbbasement.so)

7: 0x00007f573f641545 in Execution::Thread::staticMainImp(void**)+0x671 at Thread.cpp:448 (libhdbbasis.so)

8: 0x00007f573f64170d in Execution::Thread::staticMain(void*)+0x39 at Thread.cpp:512 (libhdbbasis.so)

[3084]{0}[0] 2013-01-17 19:45:09.645464 e Basis        FaultProtectionImpl.cpp(00961) : SIGNAL 6 (SIGABRT) caught, sender PID:  3042, PID: 3042, thread: 2710[thr=3084]: Checkpointer, value int: 464540312, ptr: 0xffff880f1bb05298, time: 2013-01-17 19:45:09 000 Local

Instance HDB/00, OS Linux hanaserver 2.6.32.27-0.2-default #1 SMP 2010-12-29 15:03:02 +0100 x86_64
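
For anyone sifting through a long trace like the one above, here is a minimal Python sketch that pulls out only the error-level entries. The line layout in the regular expression is an assumption inferred solely from the lines quoted in this post (bracketed IDs, timestamp, a one-letter level, component, source file with line number, then the message); it is not an official trace-format definition, and `error_entries` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred only from the trace lines quoted in this thread:
# [thread]{n}[n] YYYY-MM-DD HH:MM:SS.ffffff level component source(line) : message
TRACE_LINE = re.compile(
    r"^\[(?P<thread>\d+)\]\{(?P<conn>-?\d+)\}\[(?P<txn>-?\d+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<level>\w)\s+"
    r"(?P<component>\S+)\s+"
    r"(?P<source>\S+\(\d+\))\s+:\s+"
    r"(?P<message>.*)$"
)


def error_entries(lines):
    """Yield (timestamp, component, source, message) for lines whose level is 'e'."""
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match and match.group("level") == "e":
            yield (
                match.group("timestamp"),
                match.group("component"),
                match.group("source"),
                match.group("message"),
            )


# Two lines copied from the trace above: one error entry, one stack frame.
sample = [
    "[3084]{0}[0] 2013-01-17 19:45:09.598506 e Metadata     ptl_shm.cc(00520) : "
    "ShmSystem::attach (shmid=12550492, align=67108864) - Cannot allocate memory",
    "1: 0x00007f573f5b1f6a in exit_handler+0x46 at FaultProtectionImpl.cpp:531 (libhdbbasis.so)",
]

for timestamp, component, source, message in error_entries(sample):
    print(timestamp, component, source, message)
```

Stack-frame lines and anything else that does not match the assumed layout are skipped silently, so the output is just the chain of errors leading up to the crash.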

The crash occurred while I was loading approx 30GB of data for my test database into ROW tables on the 63 GB instance of HANA One. The import of two of the tables took forever, so I stopped the instance and restarted it. I then learned about ALTER SYSTEM LOGGING OFF. However, the damage was done. I believe that I managed to get too much data into the ROW tables and HANA ran out of memory. Here are the sizes of the data files for the database:

hanaserver:/hanadata/HDB/data/mnt00001> ls -l ./hdb00001                                                                                                                                                     

total 24668

-rw------- 1 hdbadm sapsys 335577088 2013-01-17 22:43 datavolume_0000.dat

-rw-rw-r-- 1 hdbadm sapsys        36 2013-01-17 19:43 landscape.id

hanaserver:/hanadata/HDB/data/mnt00001> ls -l ./hdb00002                                                                                                                                                     

total 34172160

-rw------- 1 hdbadm sapsys 35130195968 2013-01-16 02:48 datavolume_0000.dat

hanaserver:/hanadata/HDB/data/mnt00001> ls -l ./hdb00003                                                                                                                                                     

total 96152

-rw------- 1 hdbadm sapsys 351059968 2013-01-16 02:03 datavolume_0000.dat

hanaserver:/hanadata/HDB/data/mnt00001> ls -l ./hdb00004                                                                                                                                                     

total 26444

-rw------- 1 hdbadm sapsys 271745024 2013-01-16 02:03 datavolume_0000.dat
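
To put the listing above into perspective, here is a small Python sketch that totals the data volume sizes and compares them with the instance's RAM. The byte counts are copied from the `ls -l` output; treating "63GB" as 63 GiB is an assumption, and the on-disk size of a data volume is only a rough proxy for what the row store needs in memory.

```python
# File sizes in bytes, copied from the ls -l output above.
datavolume_bytes = {
    "hdb00001": 335577088,
    "hdb00002": 35130195968,
    "hdb00003": 351059968,
    "hdb00004": 271745024,
}

GIB = 1024 ** 3
RAM_BYTES = 63 * GIB  # assumption: the "63GB" instance has 63 GiB of RAM

total = sum(datavolume_bytes.values())
largest = max(datavolume_bytes, key=datavolume_bytes.get)

for name, size in datavolume_bytes.items():
    print(f"{name}: {size / GIB:.2f} GiB")

print(f"total: {total / GIB:.2f} GiB ({total / RAM_BYTES:.0%} of assumed RAM)")
print(f"largest volume: {largest}")
```

Nearly all of the data sits in one volume (hdb00002), which fits the suspicion that the row tables alone account for the memory pressure.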

The instance seems to be running (I say this because the ./HDB stop command takes about 6 minutes to shut it down), but the SapService isn't running, so HANA Studio can't connect even to get the diagnostic traces. I have to go to Linux.

Is there any way to delete tables or truncate tables in a database so that I can recover it?

Thanks,

Bill

Accepted Solutions (0)

Answers (3)

danielculp
Explorer
0 Kudos

Hi Bill,

Please have a look at the description below and check the kernel parameters. Maybe they are not configured appropriately.

Best Regards

Daniel

OS Kernel Parameters for Shared Memory Use

In the HANA database, hdbnameserver and the row store use shared memory, so the kernel must be configured properly for it.

In most cases, the HANA installer adjusts this automatically. But if you encounter an error like "shared memory allocation failure ...", you can check the three kernel parameters below.

cat /proc/sys/kernel/shmmax

cat /proc/sys/kernel/shmall

cat /proc/sys/kernel/shmmni

SHMMAX (max segment size in bytes)=   268435456 (256MB, independent of the # of installations)

SHMMNI (system-wide number of segments)= 5 (for daemon, nameserver, …) + 16386 * (# of indexservers on the same machine + 1(for statisticsserver) )

SHMALL (system-wide total shared memory size in OS pages)= SHMMNI * 64MB / 4KB

_Although we haven’t observed problems from parameter values larger than needed, if SHMALL is larger than the system's physical memory size, the following values can be used for SHMALL and SHMMNI instead._

SHMALL = ( physical memory size in B / 4KB )

SHMMNI = ( physical memory size in B / 64MB ).
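
The formulas above can be worked through in a short Python sketch. Every constant is taken from this post as written (including the 16386 multiplier and the 64MB segment size, which matches the `align=67108864` in the trace Bill quoted); the function names `recommended_shm` and `read_current` are hypothetical, and the result is an illustration of the arithmetic rather than an official sizing tool.

```python
from pathlib import Path

PAGE_SIZE = 4 * 1024            # 4KB OS page, as stated in the post
SEGMENT_SIZE = 64 * 1024 ** 2   # 64MB per segment, as stated in the post
SHMMAX = 268435456              # 256MB, independent of the number of installations


def recommended_shm(indexservers, physical_memory_bytes):
    """Apply the post's formulas, capping by physical memory when SHMALL exceeds it."""
    # 5 segments for daemon, nameserver, ... plus 16386 per indexserver
    # and one more set for the statisticsserver (numbers copied from the post).
    shmmni = 5 + 16386 * (indexservers + 1)
    shmall = shmmni * SEGMENT_SIZE // PAGE_SIZE  # SHMALL is counted in OS pages

    physical_pages = physical_memory_bytes // PAGE_SIZE
    if shmall > physical_pages:
        # Fallback from the post: size both values from physical memory instead.
        shmall = physical_pages
        shmmni = physical_memory_bytes // SEGMENT_SIZE

    return {"shmmax": SHMMAX, "shmmni": shmmni, "shmall": shmall}


def read_current(proc_dir="/proc/sys/kernel"):
    """Read the live values on Linux; a parameter is None if its file is missing."""
    values = {}
    for name in ("shmmax", "shmall", "shmmni"):
        path = Path(proc_dir) / name
        values[name] = int(path.read_text()) if path.exists() else None
    return values


# Example: one indexserver on a machine assumed to have 63 GiB of RAM.
print(recommended_shm(indexservers=1, physical_memory_bytes=63 * 1024 ** 3))
```

Comparing the output of `recommended_shm` with `read_current()` on the server shows at a glance whether any of the three parameters is below the value the formulas suggest.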


Former Member
0 Kudos

Hi Daniel,

I was using a stock HANA One instance on AWS. If this is a problem for me, it's a problem for everyone using HANA from the AWS Marketplace download. I had to terminate the instance because I needed to save money by deleting the storage volumes. The next time I fire up a new instance, I'll take a look. In the meantime, I'll just keep this open. Maybe when SPS5 becomes available on AWS, it will no longer be a problem.

Thanks,

Bill

swapan_saha
Employee
0 Kudos

Hi Bill,


Daniel asked you to stop HANA by running "HDB stop" as the hdbadm OS user at the command prompt. Just another suggestion: while you aren't using the system, or while we ask for additional information, you may stop the EC2 instance from the EC2 Management Console so that you are not billed while the system is not in use.

Regards,

Swapan

danielculp
Explorer
0 Kudos

Bill,

can you please shut down the instance, remove all the trace files (rm * in the trace subdirectory), and then restart HDB?

Then, if it is not coming up (what does "./HDB status" tell you?), please zip up all the trace files and attach them to this message.

Thank you & Best Regards

Daniel

Former Member
0 Kudos

Hi Daniel,

   There is no ./HDB status command with x38. I'll work on getting a clean set of trace files to you in the next couple of hours.

Regards,

Bill

danielculp
Explorer
0 Kudos

I meant "HDB info". Sorry for the confusion.
