cancel
Showing results for 
Search instead for 
Did you mean: 

SAPDB stuck with no recognizable pattern

Former Member
0 Kudos

We are using sapdb 7.6.0.34 as database in a webapp (spring, hibernate as orm mapper, commons dbcp) with the jdbc-driver version 7.6.0.30.5567. Database and application are on the same machine and the cache hit rate is at 100% as our cache is the same size than the database.

After some days the application stucks and is inresponsible after that. After killing and restarting the app it is responsible again

Doing a thread-dump at that point reveals that all threads are waiting for the database.

At that point i find tons of that in the xserver-log:

2008-07-28 10:51:12 13306 12915 XSERVER Connection lost

2008-07-28 10:51:12 13306 ERR -11987 COMMUNIC semop (receive 44793978) interrupted and check reports error

2008-07-28 10:51:12 13288 12915 XSERVER Connection lost

2008-07-28 10:51:12 13288 ERR -11987 COMMUNIC semop (receive 44269674) interrupted and check reports error

About half an hour before we can see a dramatic growth in connections. It seems the application starts opening new connections as the available ones are bound and waiting for a call to the db to return:

2008-07-28 06:38:53 4089 11560 COMMUNIC Releasing T244

2008-07-28 06:38:53 4089 12827 COMMUNIC wait for connection T244

2008-07-28 06:39:00 4073 11561 COMMUNIC Connecting T266 localhost 0

2008-07-28 06:39:00 4089 12929 TASKING Task T266 started

2008-07-28 06:39:00 4089 11007 COMMUNIC wait for connection T266

2008-07-28 06:39:00 4089 11561 COMMUNIC Connected T266 localhost 0

2008-07-28 08:06:04 4073 11561 COMMUNIC Connecting T242 localhost 0

2008-07-28 08:06:04 4089 12929 TASKING Task T242 started

2008-07-28 08:06:04 4089 11007 COMMUNIC wait for connection T242

2008-07-28 08:06:04 4089 11561 COMMUNIC Connected T242 localhost 0

2008-07-28 08:28:26 4073 11561 COMMUNIC Connecting T230 localhost 0

2008-07-28 08:28:26 4089 12929 TASKING Task T230 started

2008-07-28 08:28:26 4089 11007 COMMUNIC wait for connection T230

2008-07-28 08:28:26 4089 11561 COMMUNIC Connected T230 localhost 0

2008-07-28 09:24:44 4073 11561 COMMUNIC Connecting T218 localhost 0

2008-07-28 09:24:44 4089 12929 TASKING Task T218 started

2008-07-28 09:24:44 4089 11007 COMMUNIC wait for connection T218

2008-07-28 09:24:44 4089 11561 COMMUNIC Connected T218 localhost 0

2008-07-28 10:19:50 4073 11561 COMMUNIC Connecting T300 localhost 0

2008-07-28 10:19:50 4089 12929 TASKING Task T300 started

2008-07-28 10:19:50 4089 11007 COMMUNIC wait for connection T300

2008-07-28 10:19:50 4089 11561 COMMUNIC Connected T300 localhost 0

2008-07-28 10:19:52 4073 11561 COMMUNIC Connecting T243 localhost 0

2008-07-28 10:19:52 4089 12929 TASKING Task T243 started

2008-07-28 10:19:52 4089 11007 COMMUNIC wait for connection T243

2008-07-28 10:19:52 4089 11561 COMMUNIC Connected T243 localhost 0

2008-07-28 10:19:55 4073 11561 COMMUNIC Connecting T234 localhost 0

2008-07-28 10:19:55 4089 12929 TASKING Task T234 started

2008-07-28 10:19:55 4089 11007 COMMUNIC wait for connection T234

2008-07-28 10:19:55 4089 11561 COMMUNIC Connected T234 localhost 0

2008-07-28 10:20:15 4089 12929 TASKING Task T267 started

2008-07-28 10:20:15 4073 11561 COMMUNIC Connecting T267 localhost 0

2008-07-28 10:20:15 4089 11007 COMMUNIC wait for connection T267

2008-07-28 10:20:15 4089 11561 COMMUNIC Connected T267 localhost 0

2008-07-28 10:20:16 4073 11561 COMMUNIC Connecting T233 localhost 0

2008-07-28 10:20:16 4089 12929 TASKING Task T233 started

2008-07-28 10:20:16 4089 11007 COMMUNIC wait for connection T233

2008-07-28 10:20:16 4089 11561 COMMUNIC Connected T233 localhost 0

2008-07-28 10:21:08 4089 12929 TASKING Task T290 started

2008-07-28 10:21:08 4073 11561 COMMUNIC Connecting T290 localhost 0

2008-07-28 10:21:08 4089 11007 COMMUNIC wait for connection T290

2008-07-28 10:21:08 4089 11561 COMMUNIC Connected T290 localhost 0

2008-07-28 10:21:17 4089 12929 TASKING Task T235 started

2008-07-28 10:21:17 4073 11561 COMMUNIC Connecting T235 localhost 0

2008-07-28 10:21:17 4089 11007 COMMUNIC wait for connection T235

2008-07-28 10:21:17 4089 11561 COMMUNIC Connected T235 localhost 0

2008-07-28 10:21:21 4073 11561 COMMUNIC Connecting T215 localhost 0

2008-07-28 10:21:21 4089 12929 TASKING Task T215 started

2008-07-28 10:21:21 4089 11007 COMMUNIC wait for connection T215

2008-07-28 10:21:21 4089 11561 COMMUNIC Connected T215 localhost 0

2008-07-28 10:21:33 4073 11561 COMMUNIC Connecting T263 localhost 0

2008-07-28 10:21:33 4089 12929 TASKING Task T263 started

2008-07-28 10:21:33 4089 11007 COMMUNIC wait for connection T263

2008-07-28 10:21:33 4089 11561 COMMUNIC Connected T263 localhost 0

2008-07-28 10:21:41 4073 11561 COMMUNIC Connecting T225 localhost 0

I've googled for that sort of problem but the best answer i found is "this can happen if there is a problem with the socket communication" so that the socket is brocken but the jdbc-driver never realizes this and sits there, waiting for an answer.

Now i'm totally hosed, i've no clue where to look next.

Any hint would be appreciated.

Regards

Dirk Lachowski

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

Hi Dirk,

> About half an hour before we can see a dramatic growth in connections. It seems the application starts opening new connections as the available ones are bound and waiting for a call to the db to return:

> Now i'm totally hosed, i've no clue where to look next.

>

> Any hint would be appreciated.

I propose to check why the client starts to create so many connections.

If the db calls hang on the client with semaphore operators in the JDBC driver, than perhaps checking the JAVA runtime environment might be paying off.

It don't sound like a db (kernel) problem at all.

regards,

Lars

BTW: the MaxDB version you use is pretty outdated - get the current patch here in SDN. As this also comes with a new version of the JDBC driver this perhaps improves things for you.

Former Member
0 Kudos

Hi Lars,

i finaly solved the problem, but i'm still unsure if it isn't a kernel problem.

As it turns out we was missing some indexes. Using diagnose monitor we found some selects with a very bad selectivity. The problem was, that under high load (approx. 1000 sql-commands/s) some of these selects seem to get stucked. The command sits there, doing nothing but holding a shared lock on a table because it started to do a full table scan but never returns. In turn, other commands waiting for a write lock also get stuck and there we are, everything stoped.

After adding some indexes the problem is gone, but i still don't understand the reason.

lbreddemann
Active Contributor
0 Kudos

HI Dirk,

good to hear you solved the problem.

Concerning the root cause: it would have been interesting to see, whether the tasks were really doing nothing.

X_CONS and the database trace (vtrace) would have been the 'weapons of choice' here.

Can you reproduce the issue on a test system?

regards,

Lars

Former Member
0 Kudos

Hi Lars,

setting up a test-system should be no big deal but creating a high enough load to reproduce the error?

I'll give it a try in the next weeks.

Regards,

Dirk

Answers (0)