on 04-22-2013 11:46 AM
Hello all,
I'm currently having an issue on a newly installed SAP system on Sybase ASE 15.7.0.0042, where the SAP workprocesses seem to lose the connection to the database and go into reconnect status. A reconnection is never achieved until the entire instance is restarted (both SAP and the Sybase ASE database). While the SAP processes cannot connect to the database anymore, I can still connect to the database using (db)isql.
Here's a segment of a SAP work process developer trace:
C Mon Apr 22 12:47:24 2013
C database is local: connect_timeout: 3
C Thread ID:4084
C lib_dbsl 720 03/06/2013 13:33:00
C 720_EXT build
C Using dynamic link library 'E:\usr\sap\XMD\DVEBMGS00\exe\dbsybslib.dll'
C dbsybslib.dll patch info
C SAP patchlevel 0
C SAP patchno 420
C Last SYBASE DBSL patchlevel 0
C Last SYBASE DBSL patchno 420
C Last SYBASE DBSL patchcomment Sybase IQ: Enable remote/secondary connect to Sybase IQ (1737415)
C lpc:(local) connection used on XPEMON01
C
C Mon Apr 22 12:47:27 2013
C ERROR: -1 in function SQLConnectWithRetry (SQLConnectWithRetry) [line 4028]
C (30012) [08001] [Sybase][ODBC Driver]Client unable to establish a connection
C syb_glob.dbsl_ca.errclass set to DBSL_EC_RESTART
I'm running kernel 7.20_EXT x64 Unicode, patch 402, dbsybslib.dll patch 420 and version 15.7.0.506 of the ODBC driver.
Does anybody have any ideas about this? Thanks in advance!
Kindest regards, WIlbert
Hello everybody,
I had the same issue. Solved by changing in vmware the network adapter into VMXNET3 (recommended by SAP)
Situation :
- Windows Server 2012
- virtual machine on vmware
- Sybase
I noticed a warning into the System event viewer in Windows :
Log Name: System
Source: e1iexpress
Event ID: 27
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: --- censored by me! ---
Description:
Intel(R) 82574L Gigabit Network Connection
Network link is disconnected.
and just after something about the "Service TCP/IP Netbios stop" and just after a start.
By changing the adapter, no more warnings an the systems are stable.
If its not enough it could be helpful to add a loopback network adapter (needed to solve a same problem with an Oracle database)
I hop this help !
Julien
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi,
it maybe can due to low number of user connections.
isql -Usapsa -S <SID> -w999
1> use master
2> go
1> sp_configure "number of user connections"
2> go
Parameter Name Default Memory Used Config Value Run Value Unit Type
------------------------------
----------- ----------- ------------ ------------ --------------------
--------------------
number of user
connections 25 20976 48 48 number dynamic
(1 row affected)
(return status = 0)
48 in this case wasn't enough for 30WPs
1> sp_configure "number of user connections",
2> 100
3> go
Parameter Name Default Memory Used Config Value Run Value Unit Type
------------------------------
----------- ----------- ------------ ------------ --------------------
--------------------
number of user
connections 25 44396 100 100 number dynamic
(1 row affected)
Configuration option changed. ASE need not be rebooted since the option is dynamic.
Changing the value of 'number of user connections' to '100' increases the amount of memory ASE uses by 25380 K.
(return status = 0)
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hello Sargan and Maria,
We've seemingly solved the issue by updating the database and the drivers to the latest versions. Also, a number of parameters have been changed in the course of this problem. This includes the parameters "number of locks" (from 1m to 5m), "statement cache size" (set to 256mb currently) and "procedure cache size" (set to 512mb currently). Also, review note 1749935 for the current parameter recommendations. Hope this helps you.
Kindest regards, Wilbert
Hi Wilbert,
I had a client issue with similar issues and it was an ASE bug, not sure if your version of ASE has this fix in it, but an ASE person can check. CR # 721809 TDS RESPONSE sequence incorrect at times. The work- around for this was to change the kernel mode option on the ASE from threaded to process. This is fixed in a later version of the ASE. This was a new feature of ASE 15.7: SyBooks Online. Anyways something to look at.
Thanks,
Dawn
The problem is now also occurring on another system using 15.7.0.042!
I can see that the sqlsrvr.exe process takes about 50% of the CPU time, but no connection is possible (either by disp+work or using dbisql or isql). Stopping the ASE service results in the process taking 98% CPU and not stopping properly.
Trace from disp+work:
Wed Jun 19 08:16:02 2013
ERROR: -1 in function SQLConnectWithRetry (SQLConnectWithRetry) [line 4028]
(30012) [08001] [Sybase][ODBC Driver]Client unable to establish a connection
syb_glob.dbsl_ca.errclass set to DBSL_EC_RESTART
Connection 0 failed using DRIVER={Adaptive Server Enterprise};server=<OUR_HOST>;port=4901;uid=SAPSR3;db=<OUR_SID>;pwd=xxxxxxxxx;TextSize=2147483647;FetchArraySize=1000;ReleaseLocksOnCursorClose=1;DynamicPrepare=1;UseCursor=2;homogeneousbatch=1;charset=utf8;EncryptPassword=2;EnableLOBLocator=0;app=R3D004 00 comm rd ODBC;logintimeout=3;ProtocolCapture=cap_w4_comm_rd;supresstdscontroltokens=1;SuppressRowFormat=1;SuppressParamFormat=1;
***LOG BV3=> severe db error 30012 ; work process is stopped [dbsh 1241]
***LOG BY2=> sql error 30012 performing CON [dblink 544]
***LOG BY0=> [ASE Error SQL30012:08001]Client unable to establish a connection[ASE Error SQL30012:08001]Client unable to establish a connection [dblink 544]
***LOG R19=> ThInit, db_connect ( DB-Connect 000256) [thxxhead.c 1484] in_ThErrHandle: 1
*** ERROR => ThInit: db_connect (step 1, th_errno 13, action 3, level 1) [thxxhead.c 11318]
Trace from ASE:
00:0009:00000:00000:2013/06/19 03:34:19.66 kernel Sybperf helper thread is setting up counters for a windows Performance Monitor connection
00:0009:00000:00000:2013/06/19 03:34:21.72 kernel Sybperf helper thread is setting up counters for a windows Performance Monitor connection
00:0009:00000:00000:2013/06/19 05:24:19.93 kernel Sybperf helper thread is setting up counters for a windows Performance Monitor connection
00:0009:00000:00000:2013/06/19 05:24:22.10 kernel Sybperf helper thread is setting up counters for a windows Performance Monitor connection
00:0009:00000:00000:2013/06/19 07:34:22.65 kernel Sybperf helper thread is setting up counters for a windows Performance Monitor connection
System:
SAP ECC 6.05
Windows Server 2008R2 SP1
Kernel 720_EXT 402, Database library 720_EXT 420
Sybase ASE 15.0.7.042
DBCLNT157SP08_2
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hello Wilbert,
Kindly execute a R3trans -d and provide the trans.log generated.
Regards,
Eduardo Rezende
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hello Eduardo,
At this point this is not interesting because the system has been restarted and SAP is able to connect without problems at the moment. Please find a log file attached anyway. When the problem occurs again, I'll provide the requested data again.
Kindest regards, Wilbert
Hello Eduardo,
The problem just hapened again. Nothing much in the ASE log:
0:0006:00000:00096:2013/05/14 00:00:03.80 server Increase the config parameter 'number of open indexes' to avoid descriptor reuse. Reuse may result in performance degradation.
00:0006:00000:00012:2013/05/14 11:31:02.40 kernel Begin processing to generate RSA keypair.
00:0006:00000:00012:2013/05/14 11:31:02.58 kernel Completed processing to generate RSA keypair.
00:0006:00000:00012:2013/05/14 11:31:02.58 kernel Begin processing to generate RSA keypair.
00:0006:00000:00012:2013/05/14 11:31:02.71 kernel Completed processing to generate RSA keypair.
00:0006:00000:00162:2013/05/14 13:47:23.68 server Increase the config parameter 'number of open indexes' to avoid descriptor reuse. Reuse may result in performance degradation.
00:0006:00000:00019:2013/05/14 15:20:01.06 kernel JS: failed to send jstask request
00:0006:00000:00019:2013/05/14 15:20:01.06 kernel Job Scheduler Task lost its Agent connection (No error)
00:0006:00000:00019:2013/05/14 15:20:20.11 kernel Job Scheduler Task failed (re)connecting to its Agent (No error)
00:0006:00000:00019:2013/05/14 15:20:20.31 kernel Job Scheduler Task state set to stop
These are all the entries for today. In the job scheduler log, I can see this:
00:03832:03828:2013/05/14 15:15:21.35 jamain -> JS: Task/Agent socket error.
00:03832:03828:2013/05/14 15:15:21.36 jamain -> JS: Failed to receive communication buffer.
00:03832:03828:2013/05/14 15:15:21.41 jamain Client message: ct_send(): network packet layer: internal net library error: Net-Library operation terminated due to disconnect
00:03832:03828:2013/05/14 15:15:21.41 jamain ct_send() failed
00:03832:03828:2013/05/14 15:15:21.41 jamain jsd_MakeLanguageCommand() failed
00:03832:03828:2013/05/14 15:15:21.41 jamain Client message: ct_cmd_drop(): user api layer: external error: The connection has been marked dead.
00:03832:03828:2013/05/14 15:15:21.41 jamain Client message: ct_cmd_drop(): user api layer: external error: The connection has been marked dead.
00:03832:03828:2013/05/14 15:15:21.41 jamain ct_cmd_drop() failed. Potential memory leak.
00:03832:03828:2013/05/14 15:15:21.43 jamain Client message: ct_close(CS_UNUSED): user api layer: external error: The connection has been marked dead.
00:03832:03828:2013/05/14 15:15:21.43 jamain ct_close(NORMAL) failed
Any thoughts?
Kindest regards, Wilbert
Hi,
connect to ASE and increase the mentioned parameter
sp_configure 'number of open indexes'
to check the current value
sp_configure 'number of open indexes',5000
to set it to 5000
The command should be run by a login with sa_role. Set a value 50% bigger than the current setting.
Best regards,
Kimon
Hi,
This looks more like the connection was just lost between the two. Is there anything reported in the Windows event log?
If all else fails and doesn't show any external problem, you could try enabling TF 3641 on the ASE, this should dump a lot more information to the ASE log that might help debug (at least it will show you what JS activity is taking place prior to the failure). I really would suggest trying to hit this on a QA or DEV system rather than PROD though.
To add a TF (this is not advised for the prod system)
Open regedit, and find:-
HKEY_LOCAL_MACHINE\SOFTWARE\SYBASE\Server\<SID>\Parameters
On here you will see a number of string values Arg<N>
Please add a further string value called Arg<x+1> (where x is the current maximum Arg value) with the values -T3641.
So if currently your last entry is Arg4, add one more, Arg5 with the value -T3641.
You can then restart the ASE service.
Cheers,
Simon
Hello Simon,
Thanks for the extra tip. I've set the extra argument and restarted the ASE.
I can see this segment repeating itself, with a lot of "js__callout" messages in between:
JS: Task verified HEART BEAT on Agent with 0 jobs in its queue
JS: Task sent poll request.
JS: Task waiting for agent response ...
JS: Task received agent response.
JS: Task next cycle frequency for sjobid 13 set to 297 seconds
The last entries in the ASE log are these (the system is not accessible at this time):
JSREQ : numRun = 4, numExpire = 0, numTerminate = 0
JS: Task/Agent socket error.
JS: failed to send jstask request
Job Scheduler Task lost its Agent connection (No error)
JS: Task cycle frequency set to 1 seconds
JS: jvmh_ntwk_shutdown() status [0]
JS: jvmh_ntwk_closesocket() status [0]
JS: Task attempting to connect to Agent on port 4903
JS: Connecting to fe80::34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Connecting to 10.100.100.1
JS: Failed to connect.
JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Failed to connect on socket.
JS: Task waiting for Agent retry cnt 4
JS: Task attempting to connect to Agent on port 4903
JS: Connecting to fe80::34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Connecting to 10.100.100.1
JS: Failed to connect.
JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Failed to connect on socket.
JS: Task waiting for Agent retry cnt 3
JS: Task attempting to connect to Agent on port 4903
JS: Connecting to fe80::34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Connecting to 10.100.100.1
JS: Failed to connect.
JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Failed to connect on socket.
JS: Task waiting for Agent retry cnt 2
JS: Task attempting to connect to Agent on port 4903
JS: Connecting to fe80::34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Connecting to 10.100.100.1
JS: Failed to connect.
JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Failed to connect on socket.
JS: Task waiting for Agent retry cnt 1
JS: Task attempting to connect to Agent on port 4903
JS: Connecting to fe80::34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Connecting to 10.100.100.1
JS: Failed to connect.
JS: Connecting to 2001:0:4137:9e76:34d4:2e52:f59b:9bfe
JS: Failed to connect.
JS: Failed to connect on socket.
Job Scheduler Task failed (re)connecting to its Agent (No error)
Job Scheduler Task state set to stop
Seems there are only messages logged related to the Job Scheduler, and I'm not sure these are errors.
The strange thing is that I can still connect to ASE using dbisql. I can run queries and sp_configure statements. Port 4901 is also still listening:
netstat -nao |findstr 4901
TCP 10.100.100.1:4901 0.0.0.0:0 LISTENING 3636
TCP 10.100.100.1:4901 10.100.100.1:50233 ESTABLISHED 3636
TCP 10.100.100.1:4901 10.100.100.1:51693 ESTABLISHED 3636
TCP 10.100.100.1:4901 10.100.100.1:51697 ESTABLISHED 3636
TCP 10.100.100.1:4901 10.100.100.1:51698 ESTABLISHED 3636
TCP 10.100.100.1:4901 10.100.100.1:52029 ESTABLISHED 3636
TCP 10.100.100.1:50233 10.100.100.1:4901 ESTABLISHED 3596
TCP 10.100.100.1:51693 10.100.100.1:4901 ESTABLISHED 3596
TCP 10.100.100.1:51697 10.100.100.1:4901 ESTABLISHED 3596
TCP 10.100.100.1:51698 10.100.100.1:4901 ESTABLISHED 3596
TCP 10.100.100.1:52029 10.100.100.1:4901 ESTABLISHED 6500
TCP [2001:0:9d38:6ab8:c54:3be1:f59b:9bfe]:4901 [::]:0 LISTENING 3636
TCP [fe80::c54:3be1:f59b:9bfe%13]:4901 [::]:0 LISTENING 3636
UDP 0.0.0.0:54901 *:* 1252
The problem seems to be in the parts between ASE and SAP (db client?).
Kindest regards, Wilbert
Message was edited by: Wilbert Jeuken
User | Count |
---|---|
93 | |
10 | |
10 | |
9 | |
9 | |
7 | |
6 | |
5 | |
5 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.