
MaxDB 7.6.06.03 suddenly fails to start on openSUSE 11.1

obermiller_w
Explorer

Hi all,

My installation of MaxDB 7.6.06.03 (32-bit) has been running happily on openSUSE 11.1 for more than a year.

Kernel: Linux 2.6.27.45-0.1-pae #1 SMP 2010-02-22 16:49:47 +0100 i686 i686 i386 GNU/Linux.

However, four days ago the database failed to start with the following message:

2010-03-01 07:55:01 0x000013a6 ERR -24700 DBMSrv ERR_DBMSRV_NOSTART: Could not start DBM server.
0x000013a6 ERR -24832 DBMSrv ERR_SHMNOTAVAILABLE: Shared memory not available
0x000013a6 ERR -24686 DBMSrv ERR_SHMNOCLEANUP: Could not cleanup the DBM server Shared Memory
0x000013a6 ERR -24740 DBMSrv ERR_SHMSHIFTERROR: Could not change size of Shared Memory (b77422ea, -16777216)
0x000013a6 ERR -24827 DBMSrv ERR_SHMALLOCFAILED: ID /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm, requested size 4278190898
0x000013a6 ERR 9 RTEIPC Mapping of the shared memory file /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm of length 4278194176 into the address space of the calling process failed, rc = Cannot allocate memory.
2010-03-01 07:55:01 0x000013a6 INF 226 DBMSrv DBM Server client disconnected: PID 5022 on computer columbus.s.netic.de

2010-03-01 18:52:47 0x00003767 ERR -24700 DBMSrv ERR_DBMSRV_NOSTART: Could not start DBM server.
0x00003767 ERR -24832 DBMSrv ERR_SHMNOTAVAILABLE: Shared memory not available
0x00003767 ERR -24686 DBMSrv ERR_SHMNOCLEANUP: Could not cleanup the DBM server Shared Memory
0x00003767 ERR -24827 DBMSrv ERR_SHMALLOCFAILED: ID /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm, requested size 4278190898
0x00003767 ERR 9 RTEIPC Mapping of the shared memory file /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm of length 4278194176 into the address space of the calling process failed, rc = Cannot allocate memory.

I'd appreciate any pointers that could help me find out what is happening.

-walt

Accepted Solutions (1)


markus_doehr2
Active Contributor

> my installation of Maxdb 7.6.06.03 32 bit has been running happily on OpenSuse 11.1. for more than one year.

> Kernel: Linux 2.6.27.45-0.1-pae #1 SMP 2010-02-22 16:49:47 +0100 i686 i686 i386 GNU/Linux.

> However, four days ago the database failed to start with the following message:

What happened prior to this error? Did the system crash? Or did you change any parameters?

Apparently the system is trying to allocate about 4 GB of memory, which you can't do on a 32-bit OS.
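The sizes in the error log can be checked with a few lines of bash arithmetic (bash uses 64-bit signed integers on common platforms). This is a sketch: treating the logged size as an unsigned 32-bit value that may originally have been negative is an assumption, not documented DBM server behaviour, and the 4096-byte page size is the usual x86 value. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Decode the shared memory sizes reported in the DBM server error log.

# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
to_signed32() {
  local v=$1
  echo $(( v >= (1 << 31) ? v - (1 << 32) : v ))
}

# Round a byte count up to a whole number of pages (default 4096 bytes).
round_to_page() {
  local bytes=$1 page=${2:-4096}
  echo $(( (bytes + page - 1) / page * page ))
}

requested=4278190898   # ERR -24827 ... requested size
echo "requested:        $(( requested / 1024 / 1024 )) MiB"
echo "as signed 32-bit: $(to_signed32 "$requested")"
echo "page-rounded:     $(round_to_page "$requested")"
```

This prints 4080 MiB, -16776398 and 4278194176. The page-rounded value is exactly the mapping length in the RTEIPC message, and the signed value equals 818 - 16777216, where -16777216 is the shift reported by ERR_SHMSHIFTERROR. One possible reading is that a small size was reduced by 16 MiB, went negative, and was then used as a roughly 4 GB request; this is consistent with a corrupted shared memory file, but it is not confirmed.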

Markus

obermiller_w
Explorer

I did some more research after posting this query.

From what I can see, on Feb 26 and Feb 27 I applied a routine kernel patch as part of a normal update from the openSUSE repository.

There were no crashes or similar events that I am aware of. I enclose the corresponding lines from my knldiag.err file.

The fact that shared memory suddenly can no longer be allocated, after it could be allocated without a hitch for almost a year, makes me wonder whether this could be a Linux kernel issue.

************** KNLDIAG.ERR **************
2010-02-27 17:07:52 ___ Stopping GMT 2010-02-27 16:07:52 7.6.06 Build 003-121-202-135
2010-02-27 19:29:27 --- Starting GMT 2010-02-27 18:29:27 7.6.06 Build 003-121-202-135
2010-02-27 21:54:59 ___ Stopping GMT 2010-02-27 20:54:59 7.6.06 Build 003-121-202-135
2010-02-28 11:57:13 --- Starting GMT 2010-02-28 10:57:13 7.6.06 Build 003-121-202-135
2010-02-28 12:00:28 ___ Stopping GMT 2010-02-28 11:00:28 7.6.06 Build 003-121-202-135
2010-02-28 18:34:39 --- Starting GMT 2010-02-28 17:34:39 7.6.06 Build 003-121-202-135
2010-02-28 18:49:15 ___ Stopping GMT 2010-02-28 17:49:15 7.6.06 Build 003-121-202-135
2010-03-01 07:54:55 --- Starting GMT 2010-03-01 06:54:55 7.6.06 Build 003-121-202-135
2010-03-01 07:55:01 5006 ERR 20125 RTE Database automatic restart failed
2010-03-01 18:52:57 ___ Stopping GMT 2010-03-01 17:52:57 7.6.06 Build 003-121-202-135
2010-03-02 08:17:57 --- Starting GMT 2010-03-02 07:17:57 7.6.06 Build 003-121-202-135
2010-03-02 08:18:04 5016 ERR 20125 RTE Database automatic restart failed
2010-03-02 18:29:48 ___ Stopping GMT 2010-03-02 17:29:48 7.6.06 Build 003-121-202-135
************** KNLDIAG.ERR **************

-walt.

obermiller_w
Explorer

Markus,

> Apparently the system is trying to allocate about 4 GB of memory, which you can't do on a 32-bit OS.

/proc/sys seems to show a large maximum for shm:

uname -a

Linux columbus 2.6.27.45-0.1-pae #1 SMP 2010-02-22 16:49:47 +0100 i686 i686 i386 GNU/Linux

cat /proc/sys/kernel/shmmax

4294967295

-w.

markus_doehr2
Active Contributor

> cat /proc/sys/kernel/shmmax

> 4294967295

That is only a (theoretical) upper limit.

A 32-bit process can't, by default, allocate more than roughly 2-3 GB of RAM: its entire address space is 2^32 bytes (4 GiB), and part of that is reserved for the kernel.
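The point about shmmax being only an upper limit can be made concrete: the failing request passes the shmmax check but cannot fit into a 32-bit process's address space. The sketch hard-codes the values from this thread (on a live system shmmax would be read from /proc/sys/kernel/shmmax). The 3 GiB user address space assumes the usual 3G/1G user/kernel split of 32-bit Linux kernels; it was not measured on this machine.

```bash
#!/usr/bin/env bash
# Compare the failing request with the kernel limit and the address space.

shmmax=4294967295                # cat /proc/sys/kernel/shmmax (from this thread)
requested=4278190898             # requested size from the DBM server log
user_space=$(( 3 * 1024 ** 3 ))  # assumed 3 GiB of user address space

# Print "yes" if BYTES <= LIMIT, otherwise "no".
fits() {
  if (( $1 <= $2 )); then echo yes; else echo no; fi
}

echo "shmmax == 2^32 - 1:      $(( shmmax == (1 << 32) - 1 ))"
echo "request within shmmax:   $(fits "$requested" "$shmmax")"
echo "request within 3 GiB VA: $(fits "$requested" "$user_space")"
```

The output is 1, yes, no: shmmax is 2^32 - 1 and so does not restrict anything on a 32-bit system, which suggests the "Cannot allocate memory" comes from mapping the file into the process rather than from the shmmax setting.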

Maybe the new patch introduced a different internal buffer calculation and you're exceeding those limits.

How big is your CACHE_SIZE? If it's more than 200,000 pages I would decrease it to 150,000 and try to restart the database with that amount.

Markus

obermiller_w
Explorer

> That is only a (theoretical) upper limit.

> A 32-bit process can't, by default, allocate more than roughly 2-3 GB of RAM: its entire address space is 2^32 bytes (4 GiB), and part of that is reserved for the kernel.

> Maybe the new patch introduced a different internal buffer calculation and you're exceeding those limits.

> How big is your CACHE_SIZE? If it's more than 200,000 pages I would decrease it to 150,000 and try to restart the database with that amount.

From my knldiag file: I guess I am well below the 200,000 pages you mention.

2010-03-06 18:58:43 4494 20235 RTE CACHE_SIZE=2500

2010-03-06 18:58:43 4494 20235 RTE CALLSTACKLEVEL=0

2010-03-06 18:58:43 4494 20235 RTE CATCACHE_MINSIZE=262144

2010-03-06 18:58:43 4494 20235 RTE CAT_CACHE_SUPPLY=3264

How (and where) is the amount of requested shared memory computed?

-w.

markus_doehr2
Active Contributor

> From my knldiag-file: I guess I am well below the 200000 pages you mention.

>

> 2010-03-06 18:58:43 4494 20235 RTE CACHE_SIZE=2500

True - you have a VERY small cache (2500 * 8 KB = 20 MB).
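The cache-size arithmetic, including the 150,000 and 200,000 page thresholds mentioned earlier, as a small bash helper. The 8 KB page size is taken from the post; `pages_to_mib` is a hypothetical helper name, and results are rounded to the nearest MiB.

```bash
#!/usr/bin/env bash
# Convert a MaxDB CACHE_SIZE (in pages) to MiB, rounded to the nearest MiB.
pages_to_mib() {  # usage: pages_to_mib PAGES [PAGE_KB]
  local pages=$1 page_kb=${2:-8}   # 8 KB pages assumed, as in the post
  echo $(( (pages * page_kb + 512) / 1024 ))
}

for p in 2500 150000 200000; do
  echo "CACHE_SIZE=$p -> $(pages_to_mib "$p") MiB"
done
```

This gives about 20 MiB for 2500 pages, 1172 MiB for 150,000 pages and 1563 MiB for 200,000 pages, so a 2500-page cache is nowhere near the 32-bit limits.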

What's the output of

ipcs -m

Markus

obermiller_w
Explorer

> True - you have a VERY small cache (2500 * 8 KB = 20 MB).

> What's the output of

> ipcs -m

free
             total       used       free     shared    buffers     cached
Mem:       4018436    2117200    1901236          0     249556    1403724
-/+ buffers/cache:     463920    3554516
Swap:      4194296          0    4194296

ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00005d8b 32768      root       777        404        1
0x00000000 98305      root       600        67108864   7          dest
0x44000000 131074     sdb        660        1507328    2
0x4400118e 163843     sdb        600        6684672    2
0x44000001 196612     sdb        666        131208     1
0x44000002 229381     sdb        666        262360     1
0x44000003 262150     sdb        666        1311576    1
0x00000000 12648455   walter     600        393216     2          dest
0x00000000 12976136   walter     600        4          2          dest
0x00000000 13008905   walter     600        4          2          dest
0x00000000 13041674   walter     600        4          2          dest
0x00000000 13631499   walter     600        99600      2          dest
0xcbc384f8 3145740    walter     600        64528      1

-walt.

markus_doehr2
Active Contributor

So, as you can see, the shared memory segments for the database are still allocated.

The easiest way to solve this problem would be to reboot your machine and, once it is back up, rename the file

/var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm

and then try to start your database again.

You can also remove all the keys related to user sdb using ipcrm -M <key>.
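A sketch of that cleanup which only prints the commands, so they can be reviewed before anything is removed. It assumes the database and all its processes are stopped, and that ipcs is run with LC_ALL=C so the English column layout (key, shmid, owner, ...) is used. It removes segments by shmid with `ipcrm -m` instead of by key with `ipcrm -M`, because segments whose key is 0x00000000 (IPC_PRIVATE) cannot be addressed by key. The function name `print_ipcrm_cmds` is hypothetical.

```bash
#!/usr/bin/env bash
# Print (do not run) an ipcrm command for every shared memory segment
# owned by the given user. Reads `ipcs -m` output on stdin.
print_ipcrm_cmds() {  # usage: LC_ALL=C ipcs -m | print_ipcrm_cmds sdb
  local owner=${1:-sdb}
  # Data rows start with a hex key; column 2 is the shmid, column 3 the owner.
  awk -v owner="$owner" '$1 ~ /^0x/ && $3 == owner { print "ipcrm -m " $2 }'
}
```

For the listing posted above it would print five commands, for shmids 131074, 163843, 196612, 229381 and 262150.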

Markus

obermiller_w
Explorer

> So, as you can see, the shared memory segments for the database are still allocated.

> The easiest way to solve this problem would be to reboot your machine and, once it is back up, rename the file

> /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm

> and then try to start your database again.

> You can also remove all the keys related to user sdb using ipcrm -M <key>

Markus, I have tried to do just that. I renamed MAXDB1.dbm.shm to MAXDB1.dbm.shm-RENAMED.

This is getting mysterious:

When the database was restarted after rebooting, it crashed again, this time leaving a directory named rtedump_dir containing a file named rtedump:


ERR 11112  CONSOLE  Incompatible version of running kernel and console!
ERR 11112  CONSOLE  Running kernel-version is: 
ERR 11112  CONSOLE  Actual console-version is: X32/LINUX 7.6.06   Build 003-121-202-135
ERR 11108  CONSOLE  console: kernel shared segment attach error 2 key 0x44000000
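If "error 2" in the rtedump is a Linux errno value (an assumption), it is ENOENT, which is what shmget() reports when no segment exists for the requested key. One way to test that reading is to check whether a segment with key 0x44000000 is present at the moment the console fails. The function name `has_shm_key` is hypothetical; run ipcs with LC_ALL=C so the first column is the key.

```bash
#!/usr/bin/env bash
# Succeed if a System V shared memory key appears in `ipcs -m` output on stdin.
has_shm_key() {  # usage: LC_ALL=C ipcs -m | has_shm_key 0x44000000
  local key=$1
  # Exit status 0 if a row's first column equals the key, 1 otherwise.
  awk -v key="$key" '$1 == key { found = 1 } END { exit !found }'
}
```

Usage: `LC_ALL=C ipcs -m | has_shm_key 0x44000000 && echo present || echo missing`.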

After the restart attempt there is NO new MAXDB1.dbm.shm! So this time it must have died before it attempted to allocate shared memory.


knldiag.err:
2010-03-07 17:54:39  5188 ERR 11000 di_main  semop error: Identifier removed
2010-03-07 17:54:39                          --- Starting GMT 2010-03-07 16:54:39           7.6.06   Build 003-121-202-135
2010-03-07 17:54:40     0 ERR 12006 DBCRASH  Kernel exited without core and exit status 0x6
2010-03-07 17:54:40     0 ERR 12009 DBCRASH  Kernel exited due to signal 6(SIGABRT)
2010-03-07 17:54:40                          ___ Stopping GMT 2010-03-07 16:54:40           7.6.06   Build 003-121-202-135
2010-03-07 17:54:54 21191 ERR 20125 RTE      Database automatic restart failed
2010-03-07 17:56:39                          ___ Stopping GMT 2010-03-07 16:56:39           7.6.06   Build 003-121-202-135
2010-03-07 17:58:03                          --- Starting GMT 2010-03-07 16:58:03           7.6.06   Build 003-121-202-135
2010-03-07 17:58:22  5312 ERR 20125 RTE      Database automatic restart failed

thanks,

-w.

markus_doehr2
Active Contributor

It seems your database was not completely updated/upgraded to the new version; some of the tools still seem to be older versions.

Since the supported way of upgrading databases is SDBUPD (not RPM packages), the only options left are to

- force a reinstall of the RPM

- contact the RPM package maintainer to solve this problem.

Markus

obermiller_w
Explorer

Markus,

This was a completely NEW install; I never upgraded either the software or the instance.

One of the first things I did yesterday was to run sdbverify, which told me:

VERIFICATION SUMMARY:

*********************

INVALID PACKAGES: 0

VALID PACKAGES: 12

INCONSISTENT PACKAGES: 0

TOTAL FILES: 380

MISSED FILES: 0

MODIFIED FILES: 0

FILES WITH MODIFIED PERMISSIONS: 0

I'm stumped.

-w.

obermiller_w
Explorer

Markus,

I forgot to mention that I did not install from an .rpm distribution (I'm not aware that there are any) but from the original MaxDB distribution (the real thing), using SDBSETUP.

-w.

markus_doehr2
Active Contributor

I'm sorry, I have no further ideas about what went wrong...

Markus

former_member229109
Active Contributor

Hello,

1. Please post the output of the following commands:

ps -efe | grep dbmsrv

ls -l /var/lib/sdb/dbm/ipc

ipcs -m | wc -l

sysctl -a | grep kernel.shmmni

dbmcli inst_enum

dbmcli db_enum -s

2. Please post the dbmsrv*.err files located in /sapdb/data/wrk.

3. First<!> check that you have no active dbmrfc/dbmsrv processes.

Stop the x_server.

- Kill all these processes manually if they were not released after closing DBMGUI and dbmcli sessions and stopping the application server.

- Try to check/remove the shared memory:

/sapdb/MAXDB1/db/pgm/dbmshm CHECK /var/lib/sdb/dbm/ipc MAXDB1

/sapdb/MAXDB1/db/pgm/dbmshm DELETE /var/lib/sdb/dbm/ipc MAXDB1

- Check in /var/lib/sdb/dbm/ipc whether you have the files MAXDB1.dbm.shi and MAXDB1.dbm.shm.

< rename both files to MAXDB1.dbm.shi.old and MAXDB1.dbm.shm.old>

- Try to connect to the database using the dbmcli tool and post the results.

4. Are you an SAP customer?
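Step 3 above can be scripted roughly as follows, in the same order: refuse to run while a dbmsrv or dbmrfc process is alive, call dbmshm CHECK and DELETE if the binary exists at the path given in the post, then move both IPC files aside. The database name MAXDB1, the IPC directory and the dbmshm path come from this thread; the function names, the `pgrep -x` process test and the default arguments are assumptions. The x_server must already be stopped, and the script must run as a user allowed to modify /var/lib/sdb/dbm/ipc.

```bash
#!/usr/bin/env bash
# Sketch of step 3: clean up the DBM server's shared memory for one database.
set -u

# Move <db>.dbm.shi and <db>.dbm.shm aside (to *.old) if they exist.
rename_dbm_ipc() {  # usage: rename_dbm_ipc IPC_DIR DBNAME
  local dir=$1 db=$2 ext f
  for ext in shi shm; do
    f="$dir/$db.dbm.$ext"
    if [ -e "$f" ]; then
      mv -- "$f" "$f.old" && echo "renamed $f -> $f.old"
    fi
  done
}

main() {
  local dir=${1:-/var/lib/sdb/dbm/ipc} db=${2:-MAXDB1}
  local dbmshm="/sapdb/$db/db/pgm/dbmshm"

  # Refuse to continue while DBM server processes are still running.
  if pgrep -x dbmsrv >/dev/null || pgrep -x dbmrfc >/dev/null; then
    echo "dbmsrv/dbmrfc still running; stop them first" >&2
    return 1
  fi

  # Let dbmshm check and delete the shared memory, if the tool is present.
  if [ -x "$dbmshm" ]; then
    "$dbmshm" CHECK "$dir" "$db"
    "$dbmshm" DELETE "$dir" "$db"
  fi

  rename_dbm_ipc "$dir" "$db"
}

# Only run main when the script is executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  main "$@"
fi
```

Usage: save it under any name and run it with the IPC directory and database name, e.g. `bash cleanup.sh /var/lib/sdb/dbm/ipc MAXDB1`, then try connecting with dbmcli as described.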

Thank you and best regards, Natalia Khlopina

Edited by: Natalia Khlopina on Mar 8, 2010 9:26 PM

obermiller_w
Explorer

Natalia,

thanks for your intervention.

It seems that clearing the IPC files (shm & shi) did the trick.

The database came up again and is running now.

I wish I understood better what happened; for some reason the shared memory information must have been corrupted.

Thanks a lot,

-walter

Answers (1)


Former Member

hi,

check this thread if helpful:

thnx.

Prasanna

obermiller_w
Explorer

Thanks for the pointer, but the corrupt data page seems to be a different problem.

The question here seems to be why the shm cannot be allocated, and how this can be changed.

-w.