cancel
Showing results for 
Search instead for 
Did you mean: 

SAP loses connection to Sybase ASE database

Former Member
0 Kudos

Hello all,

I'm currently having an issue on a newly installed SAP system on Sybase ASE 15.7.0.0042, where the SAP workprocesses seem to lose the connection to the database and go into reconnect status. A reconnection is never achieved until the entire instance is restarted (both SAP and the Sybase ASE database). While the SAP processes cannot connect to the database anymore, I can still connect to the database using (db)isql.

Here's a segment of a SAP work process developer trace:

C Mon Apr 22 12:47:24 2013

C  database is local: connect_timeout: 3

C  Thread ID:4084

C  lib_dbsl 720 03/06/2013 13:33:00

C  720_EXT build

C  Using dynamic link library 'E:\usr\sap\XMD\DVEBMGS00\exe\dbsybslib.dll'

C  dbsybslib.dll patch info

C    SAP patchlevel  0

C    SAP patchno  420

C    Last SYBASE DBSL patchlevel 0

C    Last SYBASE DBSL patchno         420

C    Last SYBASE DBSL patchcomment Sybase IQ: Enable remote/secondary connect to Sybase IQ (1737415)

C  lpc:(local) connection used on XPEMON01

C Mon Apr 22 12:47:27 2013

C  ERROR: -1 in function SQLConnectWithRetry (SQLConnectWithRetry) [line 4028]

C  (30012) [08001] [Sybase][ODBC Driver]Client unable to establish a connection

C  syb_glob.dbsl_ca.errclass set to DBSL_EC_RESTART

I'm running kernel 7.20_EXT x64 Unicode, patch 402, dbsybslib.dll patch 420 and version 15.7.0.506 of the ODBC driver.

Does anybody have any ideas about this? Thanks in advance!

Kindest regards, WIlbert

Accepted Solutions (0)

Answers (4)

Answers (4)

debrun_julien
Discoverer
0 Kudos

Hello everybody,

I had the same issue. Solved by changing in vmware the network adapter into VMXNET3 (recommended by SAP)


Situation :

- Windows Server 2012
- virtual machine on vmware

- Sybase

I noticed a warning into the System event viewer in Windows :



Log Name:      System
Source:        e1iexpress
Event ID:      27
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      --- censored by me! ---
Description:
Intel(R) 82574L Gigabit Network Connection
Network link is disconnected.

and just after something about the "Service TCP/IP Netbios stop" and just after a start.

By changing the adapter, no more warnings an the systems are stable.
If its not enough it could be helpful to add a loopback network adapter (needed to solve a same problem with an Oracle database)


I hop this help !
Julien

Former Member
0 Kudos

Hi,

it maybe can due to low number of user connections.

isql -Usapsa -S <SID> -w999

1> use master

2> go

1> sp_configure "number of user connections"

2> go

Parameter Name                 Default     Memory Used Config Value Run Value    Unit                 Type

------------------------------
----------- ----------- ------------ ------------ --------------------
--------------------

number of user
connections              25       20976           48           48 number               dynamic

(1 row affected)

(return status = 0)

48 in this case wasn't enough for 30WPs

1> sp_configure "number of user connections",

2> 100

3> go

Parameter Name                 Default     Memory Used Config Value Run Value    Unit                 Type

  ------------------------------
----------- ----------- ------------ ------------ --------------------
--------------------

  number of user
connections              25       44396          100          100 number               dynamic

(1 row affected)

Configuration option changed. ASE need not be rebooted since the option is dynamic.

Changing the value of 'number of user connections' to '100' increases the amount of memory ASE uses by 25380 K.

 

(return status = 0)

Former Member
0 Kudos

Hello Heitbrock,

Thanks for your reaction.

At a very early stage in troubleshooting this issue, I've already set this parameter to a value of 250. With this setting, the problem still occurs.

Kindest regards, Wilbert

former_member215961
Contributor
0 Kudos

Hi Wilbert,

How did you solve the problem? We have the same issue.

Thanks in advance.

Former Member
0 Kudos

Hello Wilbert and Sargan,

We are also having this issue. Did you find a solution?

Thank you!

Former Member
0 Kudos

Hello Sargan and Maria,

We've seemingly solved the issue by updating the database and the drivers to the latest versions. Also, a number of parameters have been changed in the course of this problem. This includes the parameters "number of locks" (from 1m to 5m), "statement cache size" (set to 256mb currently) and "procedure cache size" (set to 512mb currently). Also, review note 1749935 for the current parameter recommendations. Hope this helps you.

Kindest regards, Wilbert

Former Member
0 Kudos

Hi Wilbert,

Thank you for the guidance.

Best regards,

Maria Sahukar

dawn_kim
Contributor
0 Kudos

Hi Wilbert,

I had a client issue with similar issues and it was an ASE bug, not sure if your version of ASE has this fix in it, but an ASE person can check. CR # 721809 TDS RESPONSE sequence incorrect at times. The work- around for this was to change the kernel mode option on the ASE from threaded to process. This is fixed in a later version of the ASE. This was a new feature of ASE 15.7: SyBooks Online. Anyways something to look at.

Thanks,

Dawn

Former Member
0 Kudos

The problem is now also occurring on another system using 15.7.0.042!

I can see that the sqlsrvr.exe process takes about 50% of the CPU time, but no connection is possible (either by disp+work or using dbisql or isql). Stopping the ASE service results in the process taking 98% CPU and not stopping properly.

Trace from disp+work:

Wed Jun 19 08:16:02 2013

ERROR: -1 in function SQLConnectWithRetry (SQLConnectWithRetry) [line 4028]

(30012) [08001] [Sybase][ODBC Driver]Client unable to establish a connection

syb_glob.dbsl_ca.errclass set to DBSL_EC_RESTART

Connection 0 failed using DRIVER={Adaptive Server Enterprise};server=<OUR_HOST>;port=4901;uid=SAPSR3;db=<OUR_SID>;pwd=xxxxxxxxx;TextSize=2147483647;FetchArraySize=1000;ReleaseLocksOnCursorClose=1;DynamicPrepare=1;UseCursor=2;homogeneousbatch=1;charset=utf8;EncryptPassword=2;EnableLOBLocator=0;app=R3D004 00 comm rd ODBC;logintimeout=3;ProtocolCapture=cap_w4_comm_rd;supresstdscontroltokens=1;SuppressRowFormat=1;SuppressParamFormat=1;

***LOG BV3=> severe db error 30012     ; work process is stopped [dbsh         1241]

***LOG BY2=> sql error 30012  performing CON [dblink       544]

***LOG BY0=> [ASE Error SQL30012:08001]Client unable to establish a connection[ASE Error SQL30012:08001]Client unable to establish a connection [dblink       544]

***LOG R19=> ThInit, db_connect ( DB-Connect 000256) [thxxhead.c   1484] in_ThErrHandle: 1

*** ERROR => ThInit: db_connect (step 1, th_errno 13, action 3, level 1) [thxxhead.c   11318]

Trace from ASE:

00:0009:00000:00000:2013/06/19 03:34:19.66 kernel  Sybperf helper thread is setting up counters for a windows Performance Monitor connection

00:0009:00000:00000:2013/06/19 03:34:21.72 kernel  Sybperf helper thread is setting up counters for a windows Performance Monitor connection

00:0009:00000:00000:2013/06/19 05:24:19.93 kernel  Sybperf helper thread is setting up counters for a windows Performance Monitor connection

00:0009:00000:00000:2013/06/19 05:24:22.10 kernel  Sybperf helper thread is setting up counters for a windows Performance Monitor connection

00:0009:00000:00000:2013/06/19 07:34:22.65 kernel  Sybperf helper thread is setting up counters for a windows Performance Monitor connection

System:

SAP ECC 6.05

Windows Server 2008R2 SP1

Kernel 720_EXT 402, Database library 720_EXT 420

Sybase ASE 15.0.7.042

DBCLNT157SP08_2

former_member184473
Active Contributor
0 Kudos

Hello Wilbert,

Kindly execute a R3trans -d and provide the trans.log generated.

Regards,

Eduardo Rezende

Former Member
0 Kudos

Hello Eduardo,

At this point this is not interesting because the system has been restarted and SAP is able to connect without problems at the moment. Please find a log file attached anyway. When the problem occurs again, I'll provide the requested data again.

Kindest regards, Wilbert

former_member184473
Active Contributor
0 Kudos

Hi Wilbert,

The attached trans.log does not contain any issues.

When you faced this issue again, execute a R3trans -d and check/provide the trans.log,

Regards,

Eduardo

Former Member
0 Kudos

Hello Eduardo,

The problem just occurred again, might be related to high stress situations on the database (it was building a shadow instance at the time of this occurrence). Please find the R3trans.log attached.

Kindest regards, Wilbert

former_member184473
Active Contributor
0 Kudos

Hi Wilbert,

High load in the system could cause this behavior.

Is there any information in the ASE log?

Regards,

Eduardo

Former Member
0 Kudos

Hello Eduardo,

The problem just hapened again. Nothing much in the ASE log:

0:0006:00000:00096:2013/05/14 00:00:03.80 server  Increase the config parameter 'number of open indexes' to avoid descriptor reuse. Reuse may result in performance degradation.

00:0006:00000:00012:2013/05/14 11:31:02.40 kernel  Begin processing to generate RSA keypair.

00:0006:00000:00012:2013/05/14 11:31:02.58 kernel  Completed processing to generate RSA keypair.

00:0006:00000:00012:2013/05/14 11:31:02.58 kernel  Begin processing to generate RSA keypair.

00:0006:00000:00012:2013/05/14 11:31:02.71 kernel  Completed processing to generate RSA keypair.

00:0006:00000:00162:2013/05/14 13:47:23.68 server  Increase the config parameter 'number of open indexes' to avoid descriptor reuse. Reuse may result in performance degradation.

00:0006:00000:00019:2013/05/14 15:20:01.06 kernel  JS: failed to send jstask request

00:0006:00000:00019:2013/05/14 15:20:01.06 kernel  Job Scheduler Task lost its Agent connection (No error)

00:0006:00000:00019:2013/05/14 15:20:20.11 kernel  Job Scheduler Task failed (re)connecting to its Agent (No error)

00:0006:00000:00019:2013/05/14 15:20:20.31 kernel  Job Scheduler Task state set to stop

These are all the entries for today. In the job scheduler log, I can see this:

00:03832:03828:2013/05/14 15:15:21.35 jamain  -> JS: Task/Agent socket error.

00:03832:03828:2013/05/14 15:15:21.36 jamain  -> JS: Failed to receive communication buffer.

00:03832:03828:2013/05/14 15:15:21.41 jamain  Client message: ct_send(): network packet layer: internal net library error: Net-Library operation terminated due to disconnect

00:03832:03828:2013/05/14 15:15:21.41 jamain  ct_send() failed

00:03832:03828:2013/05/14 15:15:21.41 jamain  jsd_MakeLanguageCommand() failed

00:03832:03828:2013/05/14 15:15:21.41 jamain  Client message: ct_cmd_drop(): user api layer: external error: The connection has been marked dead.

00:03832:03828:2013/05/14 15:15:21.41 jamain  Client message: ct_cmd_drop(): user api layer: external error: The connection has been marked dead.

00:03832:03828:2013/05/14 15:15:21.41 jamain  ct_cmd_drop() failed. Potential memory leak.

00:03832:03828:2013/05/14 15:15:21.43 jamain  Client message: ct_close(CS_UNUSED): user api layer: external error: The connection has been marked dead.

00:03832:03828:2013/05/14 15:15:21.43 jamain  ct_close(NORMAL) failed

Any thoughts?

Kindest regards, Wilbert

kimon_moschandreou
Contributor
0 Kudos

Hi,

connect to ASE and increase the mentioned parameter

sp_configure 'number of open indexes'

to check the current value

sp_configure 'number of open indexes',5000

to set it to 5000

The command should be run by a login with sa_role. Set a value 50% bigger than the current setting.

Best regards,

Kimon

Former Member
0 Kudos

Hi,

This looks more like the connection was just lost between the two. Is there anything reported in the Windows event log?

If all else fails and doesn't show any external problem, you could try enabling TF 3641 on the ASE, this should dump a lot more information to the ASE log that might help debug (at least it will show you what JS activity is taking place prior to the failure). I really would suggest trying to hit this on a QA or DEV system rather than PROD though.

To add a TF (this is not advised for the prod system)

Open regedit, and find:-

HKEY_LOCAL_MACHINE\SOFTWARE\SYBASE\Server\<SID>\Parameters

On here you will see a number of string values Arg<N>

Please add a further string value called Arg<x+1> (where x is the current maximum Arg value) with the values -T3641.

So if currently your last entry is Arg4, add one more, Arg5 with the value -T3641.

You can then restart the ASE service.

Cheers,

Simon

Former Member
0 Kudos

Hello Simon,

Thanks for the extra tip. I've set the extra argument and restarted the ASE.

I can see this segment repeating itself, with a lot of "js__callout" messages in between:

JS: Task verified HEART BEAT on Agent with 0 jobs in its queue

JS: Task sent poll request.

JS: Task waiting for agent response ...

JS: Task received agent response.

JS: Task next cycle frequency for sjobid 13 set to 297 seconds

The last entries in the ASE log are these (the system is not accessible at this time):

JSREQ : numRun = 4, numExpire = 0, numTerminate = 0

JS: Task/Agent socket error.

JS: failed to send jstask request

Job Scheduler Task lost its Agent connection (No error)

JS: Task cycle frequency set to 1 seconds

JS: jvmh_ntwk_shutdown() status [0]

JS: jvmh_ntwk_closesocket() status [0]

JS: Task attempting to connect to Agent on port 4903

JS: Connecting to fe80::34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Connecting to 10.100.100.1

JS: Failed to connect.

JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Failed to connect on socket.

JS: Task waiting for Agent retry cnt 4

JS: Task attempting to connect to Agent on port 4903

JS: Connecting to fe80::34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Connecting to 10.100.100.1

JS: Failed to connect.

JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Failed to connect on socket.

JS: Task waiting for Agent retry cnt 3

JS: Task attempting to connect to Agent on port 4903

JS: Connecting to fe80::34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Connecting to 10.100.100.1

JS: Failed to connect.

JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Failed to connect on socket.

JS: Task waiting for Agent retry cnt 2

JS: Task attempting to connect to Agent on port 4903

JS: Connecting to fe80::34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Connecting to 10.100.100.1

JS: Failed to connect.

JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Failed to connect on socket.

JS: Task waiting for Agent retry cnt 1

JS: Task attempting to connect to Agent on port 4903

JS: Connecting to fe80::34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Connecting to 10.100.100.1

JS: Failed to connect.

JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe

JS: Failed to connect.

JS: Failed to connect on socket.

Job Scheduler Task failed (re)connecting to its Agent (No error)

Job Scheduler Task state set to stop

Seems there are only messages logged related to the Job Scheduler, and I'm not sure these are errors.

The strange thing is that I can still connect to ASE using dbisql. I can run queries and sp_configure statements. Port 4901 is also still listening:

netstat -nao |findstr 4901

TCP    10.100.100.1:4901      0.0.0.0:0              LISTENING       3636

TCP    10.100.100.1:4901      10.100.100.1:50233     ESTABLISHED     3636

TCP    10.100.100.1:4901      10.100.100.1:51693     ESTABLISHED     3636

TCP    10.100.100.1:4901      10.100.100.1:51697     ESTABLISHED     3636

TCP    10.100.100.1:4901      10.100.100.1:51698     ESTABLISHED     3636

TCP    10.100.100.1:4901      10.100.100.1:52029     ESTABLISHED     3636

TCP    10.100.100.1:50233     10.100.100.1:4901      ESTABLISHED     3596

TCP    10.100.100.1:51693     10.100.100.1:4901      ESTABLISHED     3596

TCP    10.100.100.1:51697     10.100.100.1:4901      ESTABLISHED     3596

TCP    10.100.100.1:51698     10.100.100.1:4901      ESTABLISHED     3596

TCP    10.100.100.1:52029     10.100.100.1:4901      ESTABLISHED     6500

TCP    [2001:0:9d38:6ab8:c54:3be1:f59b:9bfe]:4901  [::]:0                 LISTENING       3636

TCP    [fe80::c54:3be1:f59b:9bfe%13]:4901  [::]:0                 LISTENING    3636

UDP    0.0.0.0:54901          *:*                                    1252

The problem seems to be in the parts between ASE and SAP (db client?).

Kindest regards, Wilbert

Message was edited by: Wilbert Jeuken

Former Member
0 Kudos

Hello Kimon,

Note 1539125 gives a recommendation for "number of open indexes" as  >= 30000. The current running value in my system is 60000. I've raised the number to 90000 nonetheless, just to see whether the warning will disappear.

Kindest regards, Wilbert

kimon_moschandreou
Contributor
0 Kudos

Hello Wilbert,

I think you did well, I am a Sybase ASE DBA without any experience yet on SAP system and I wasn't familiar with this note

Best regards,

Kimon