SIGSEGV Signal 11 Crashdumps TimerThread after "un...

Former Member · 05-05-2015

Greetings all,

We are just surviving on a now-undersized BW7.3 SP8 on HDB rev83 (6-node scale-out). While we wait for new hardware over the next few months, we are exploring ways to keep our heads above water for longer. One of these options is enabling HANA to unload columns that have not been used for some time (e.g. non-reporting data in a typical LSA propagation layer) to avoid main memory contention during the day while users run their reports. We are mindful this is not the SAP-recommended approach, and are only exploring this option as a workaround.

2127458 - FAQ: SAP HANA Loads and Unloads

This note describes a way to enable the displacement of columns from memory once they have been unused for a specified retention period.
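For anyone wanting to reproduce the setup, here is a minimal sketch that only builds the SQL text for the system-wide and per-host variants. It assumes the parameter named in the thread title, unused_retention_period, sits in the [memoryobjects] section of global.ini and takes a value in seconds; please verify both against Note 2127458 before using it. The helper name retention_sql and the 4-hour value are illustrative only.

```python
from typing import Optional


def retention_sql(seconds: int, host: Optional[str] = None) -> str:
    """Build an ALTER SYSTEM statement for unused_retention_period.

    Assumption: the parameter lives in global.ini, section [memoryobjects],
    and is expressed in seconds. Check the SAP Note before relying on this.
    """
    if seconds < 0:
        raise ValueError("retention period must be non-negative")
    # SYSTEM layer applies to all hosts; HOST layer targets a single host.
    # The host name is inserted as-is, so pass only trusted values.
    layer = "'SYSTEM'" if host is None else f"'HOST', '{host}'"
    return (
        f"ALTER SYSTEM ALTER CONFIGURATION ('global.ini', {layer}) "
        f"SET ('memoryobjects', 'unused_retention_period') = '{seconds}' "
        "WITH RECONFIGURE"
    )


# System-wide, 4 hours (illustrative value)
print(retention_sql(4 * 3600))
# One slave host only (host name taken from the trace file names in this thread)
print(retention_sql(4 * 3600, host="saphdp-an4"))
```

The statements are only printed, not executed, so they can be reviewed before being run in a SQL console.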

When testing with this enabled 'System'-wide across all hosts, we found that the master node was crashdumping and rte-dumping frequently.

Here is a sample of the trace files generated:

indexserver_saphdp-an1.30003.crashdump.20150427-155006.060789.trc

indexserver.perspage.20150427-134110000.0x00000000002ad1e8L.0x0000c00000120000P.1.ok.dmp

indexserver_saphdp-an1.30003.rtedump.20150427-134037.060789.page.trc

indexserver.perspage.20150427-134042000.0x00000000002ad1e8L.0x0000c00000120000P.0.corrupt.dmp

indexserver_saphdp-an1.30003.crashdump.20150427-132538.058793.trc

indexserver.perspage.20150427-132101000.0x00000000002ad1e8L.0x0000c00000120000P.9.corrupt.dmp

indexserver_saphdp-an1.30003.rtedump.20150427-132032.058793.page.trc

indexserver.perspage.20150427-132033000.0x00000000002ad1e8L.0x0000c00000120000P.0.corrupt.dmp

We then tested with this enabled only on the SLAVE hosts, and found that they were crashdumping occasionally.

Here is a sample of the trace files generated:

indexserver_saphdp-an4.30003.crashdump.20150505-064045.050854.trc

indexserver_saphdp-an3.30003.crashdump.20150503-152138.003832.trc

indexserver_saphdp-an4.30003.crashdump.20150503-131004.037534.trc

indexserver_saphdp-an5.30003.crashdump.20150503-074249.029552.trc

indexserver_saphdp-an3.30003.crashdump.20150503-021555.040263.trc

indexserver_saphdp-an5.30003.crashdump.20150502-182801.028108.trc
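To see how the crashes spread across hosts, the file names above can be tallied with a short script. This is a rough sketch: the naming pattern (host, port, timestamp, numeric suffix) is inferred purely from the names listed in this thread, and the meaning of the trailing number is not confirmed, so it is kept as an opaque string. The function name parse_crashdump_name is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern inferred from the file names in this thread (an assumption,
# not an official naming specification).
CRASHDUMP_RE = re.compile(
    r"^indexserver_(?P<host>[^.]+)\.(?P<port>\d+)\.crashdump\."
    r"(?P<stamp>\d{8}-\d{6})\.(?P<suffix>\d+)\.trc$"
)


def parse_crashdump_name(name):
    """Return host, port, time and suffix for a crashdump file name, or None."""
    m = CRASHDUMP_RE.match(name)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "port": int(m.group("port")),
        "time": datetime.strptime(m.group("stamp"), "%Y%m%d-%H%M%S"),
        "suffix": m.group("suffix"),  # meaning not confirmed; kept as text
    }


names = [
    "indexserver_saphdp-an4.30003.crashdump.20150505-064045.050854.trc",
    "indexserver_saphdp-an3.30003.crashdump.20150503-152138.003832.trc",
    "indexserver_saphdp-an4.30003.crashdump.20150503-131004.037534.trc",
    "indexserver_saphdp-an5.30003.crashdump.20150503-074249.029552.trc",
    "indexserver_saphdp-an3.30003.crashdump.20150503-021555.040263.trc",
    "indexserver_saphdp-an5.30003.crashdump.20150502-182801.028108.trc",
]

crashes = [c for c in map(parse_crashdump_name, names) if c is not None]
per_host = Counter(c["host"] for c in crashes)
for host, count in sorted(per_host.items()):
    print(host, count)
```

For the six files listed, this counts two crashdumps on each of the three slave hosts.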

In the crashdump trace files, we notice it is always a Signal 11 on the TimerThread of the particular indexserver that crashed. Here is a summary (the words highlighted in bold are found in every crashdump trace file):

"[CRASH_SHORTINFO] exception short info: (2015-05-05 06:40:45 366 Local)

SIGNAL 11 (SIGSEGV) caught, thread: 4302[thr=nnnnn]: TimerThread, etc..."

Also, if it's any help, error code 1 is reported for Signal 11:

"[CRASH_EXTINFO] extended exception info: (2015-05-05 06:40:45 367 Local)

----> Dump of siginfo contents <----

signal: 11(SIGSEGV)

code: 1(SEGV_MAPERR: address not mapped to object)"
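To check that every crashdump shows the same signal, thread, and siginfo code, the two excerpts above can be extracted with regular expressions. This is only a sketch: the patterns are written against the lines quoted in this post (including the nnnnn placeholder), the layout of the rest of a real trace file is not assumed, and summarize_crash is a hypothetical helper name.

```python
import re

# Patterns written against the excerpts quoted in this thread only.
SHORTINFO_RE = re.compile(
    r"SIGNAL (?P<signo>\d+) \((?P<signame>\w+)\) caught, "
    r"thread: (?P<tid>\d+)\[thr=(?P<thr>\w+)\]: (?P<thread>\w+)"
)
SIGINFO_CODE_RE = re.compile(r"code: (?P<code>\d+)\((?P<name>\w+):")


def summarize_crash(trace_text):
    """Pull the signal name, crashing thread and siginfo code from trace text."""
    short = SHORTINFO_RE.search(trace_text)
    code = SIGINFO_CODE_RE.search(trace_text)
    return {
        "signal": short.group("signame") if short else None,
        "thread": short.group("thread") if short else None,
        "si_code": code.group("name") if code else None,
    }


# Sample text copied from the excerpts in this post.
sample = """[CRASH_SHORTINFO] exception short info: (2015-05-05 06:40:45 366 Local)
SIGNAL 11 (SIGSEGV) caught, thread: 4302[thr=nnnnn]: TimerThread, etc...
[CRASH_EXTINFO] extended exception info: (2015-05-05 06:40:45 367 Local)
----> Dump of siginfo contents <----
signal: 11(SIGSEGV)
code: 1(SEGV_MAPERR: address not mapped to object)
"""

print(summarize_crash(sample))
```

Running this over each crashdump file and comparing the resulting dictionaries is a quick way to confirm the pattern holds across hosts.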

We have logged an SAP message and are going through the motions of opening system access, uploading trace files, etc.

Just wanted to check with the community here whether you have encountered and/or resolved similar crashdump issues for Rev83?

Thanks.

Former Member · 05-06-2015

Here's the SAP reply:

"Dear customer,

This is a known crash. Unfortunately, we do not know the root cause. Therefore, we implemented (sic) further tracing starting with Revision 91.

Best regards.

Moritz"

The parameter from the Note is a great idea: it brings forward the unload of columns that have been unused for, say, 4 hours, instead of waiting until HANA hits the threshold for memory contention. In our now-undersized scale-out rev83 system, we have found that whenever HANA memory is under peak operational stress, LRU unloading does not keep up with the flood of data requesting to be loaded into memory. This leads to "oom/rtedump" on the slave hosts, where most of the column stores reside. Once the users repeat their BW queries 5 minutes later, the required data is available. It is a little frustrating, although it only occurs once a week.

Pity about the Signal 11 crashdumps due to the Note's parameter, otherwise this would be the panacea until the hardware is upgraded.

We have consultants pushing a hard-sell of an unproven bespoke application to unload the unused columns at varying times by manually specifying the BW tables. Great concept but, needless to say, this is not likely to have SAP's blessing. This would be money down the drain once SAP resolve the Signal 11 crashdumps.

Would be interested in the community's view.

Thanks.

SIGSEGV Signal 11 Crashdumps TimerThread after "unused_retention_period" enabled (Note 2127458)
