Database connections get exhausted on SAP GRC

0 Kudos

Hi,


I administer an SAP GRC system running on Linux x86_64 with Oracle 11.2.0.2.0.
Since November 2012, we have had the following problem:

The database connections get exhausted, meaning that the number of OS processes with LOCAL=NO keeps increasing until it reaches the maximum allowed connections. As the connections are never closed, the system hangs. At that point we restart the system and all the connections are released.

At first this issue occurred about once every two weeks. But for more than a month now, as soon as the system is restarted, the number of connections starts increasing again without being released, and we have to restart the system every 2 or 3 days just to release the connections and get a functional system for a while.
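For reference, this is roughly how we follow the number of remote connections on the database host (a minimal sketch; it simply counts the LOCAL=NO shadow processes mentioned above):

# count the Oracle shadow processes created for remote (LOCAL=NO) connections
ps -ef | grep "LOCAL=NO" | grep -v grep | wc -l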

At first we increased the Oracle parameter "processes" from 150 to 300, and Max Connections also to 300. But even though we increase the number of processes and sessions, they still get exhausted after a while, so it only delays the inevitable a bit.
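The parameter change itself was nothing special, roughly like this (a sketch only; we use an spfile, and the instance has to be restarted before the new value is active):

sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET processes=300 SCOPE=SPFILE;
EOF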

We opened an SAP support call in the GRC category, and so far it does not seem to be a problem on the GRC application side; they haven't found anything wrong.


From the database perspective we tried switching from dedicated server processes to shared server processes, but this resulted in a similar issue. Even though there are no more LOCAL=NO processes, after a while the system gives a timeout when we try to log in and a restart is needed again.


Have you ever encountered a similar issue, or do you have any suggestions on what else to try?

Thank you,
Delia

Accepted Solutions (0)

Answers (5)


Reagan
Product and Topic Expert
0 Kudos

Hello

I ran into this issue on GRC systems on a few occasions, and the support from SAP was not that convincing.

This used to be a recurring issue on GRC systems, especially when the background jobs for Risk Analysis were being executed.

As soon as I started the GRC application, the system was consuming all the processes at the DB level.

I played with a few settings by changing:

The Thread Manager Configuration in Config Tool

The JCo Pool Configuration for the Virsa JCo's

Increased the Max Connections in the JDBC Connector in the Visual Administrator and reduced the cleanup thread interval.

Increased the Oracle processes to 320 and sessions to 640 (2*processes)

Not sure which one worked, but you can give it a shot.

Good Luck

RB

0 Kudos

Hi RB,

Thank you very much for your response. It's good to hear that the problem has been encountered before.

We changed the Connection Pooling values to the following:
Maximum Connections: 400
Maximum Time to Wait for Connection: 120
Connection Lifetime: 450
Cleanup Thread: 150

The Oracle parameter processes is now 480 and sessions 960.

We restarted the SAP system and DB last evening, and in about 17 hours the number of "LOCAL=NO" connections reached 100. It seems to grow even faster than before.

I will continue analyzing the first two points you mentioned (the Thread Manager Configuration in the Config Tool and the JCo Pool Configuration for the Virsa JCo's) to see what needs to be changed there.

Have a great day,
Delia

steven_pelser
Participant
0 Kudos

Hi

We are having the same problem, and it is happening on all our systems. We logged a call with SAP and they said we must move from the 720 kernel to the 721 EXT kernel, patch 400.

We did this and it's still happening.

I now go in and terminate the LOCAL=NO processes manually, i.e. kill -9 <process number>.
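Roughly like this (a sketch only; kill -9 on shadow processes aborts whatever they were doing, so verify that each PID really is an orphaned LOCAL=NO process before killing it):

# list candidate shadow processes, then kill a verified one
ps -ef | grep "LOCAL=NO" | grep -v grep
kill -9 <pid>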

Thanks Morgs


0 Kudos

Hi,

Our issue was resolved after upgrading the following components:

VIRACLP

VIRAE

VIRCC

VIREPRTA

VIRFF 

VIRRE 

Have a great day,

Delia

stefan_koehler
Active Contributor
0 Kudos

Hi Delia,

> Meaning that the number of OS processes with LOCAL=NO keeps increasing until it reaches the maximum allowed connections. As the connections are never closed, the system hangs. At that point we restart the system and all the connections are released. At first we increased the parameter "processes" from 150 to 300, and Max Connections also to 300.

I am not very familiar with GRC, but that is the way JDBC connection pooling (in a J2EE environment) works. It will never work if you set the Oracle init parameter processes to 300 and max connections to 300 as well (if all connections are needed by the J2EE layer), because the processes limit must also cover the Oracle background processes.

Just refer to the Oracle documentation of init parameter processes:

http://docs.oracle.com/cd/E11882_01/server.112/e25513/initparams200.htm#i1132608

PROCESSES specifies the maximum number of operating system user processes that can simultaneously connect to Oracle. Its value should allow for all background processes such as locks, job queue processes, and parallel execution processes.

J2EE connection pooling:

http://help.sap.com/saphelp_nw2004s/helpdata/DE/2d/292391fa6745488f3e0e0d4b03c64e/content.htm

> From the database perspective we tried switching from dedicated server processes to shared server processes.

Bad idea, as you already have connection pooling on the application layer, and the database will use several (memory) areas differently.

However, you can track down the connections by using the listener trace file (/oracle/diag/tnslsnr/<hostname>/listener/trace), as all the connections are "LOCAL=NO".
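A minimal sketch of how that tracing could be enabled (assuming the default listener name LISTENER; USER-level tracing produces large files, so set it back to OFF afterwards):

# in listener.ora
TRACE_LEVEL_LISTENER = USER
# then reload (or restart) the listener so the change takes effect
lsnrctl reload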

Three more points need to be clarified:

  • What is the exact JDBC connection pool setting?
  • Is this issue caused by the process limit or by the session limit (one "connection" can create several sessions)? A query sketch for checking this follows below this list.
  • Are all processes related to an active connection (from the connection pool), or are there dead ones as well that are never cleaned up? (This can be checked with lsof, for example.)
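For the second point, this is roughly what I mean (a sketch only; run as SYSDBA on the GRP instance):

sqlplus -s / as sysdba <<'EOF'
-- compare current and maximum usage against the configured limits
SELECT resource_name, current_utilization, max_utilization, limit_value
  FROM v$resource_limit
 WHERE resource_name IN ('processes', 'sessions');
EOF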

Do you have Wily Introscope running? You can track down issues on the J2EE layer with it, if the problem is related to active JDBC connections in the pool.

Regards

Stefan

0 Kudos

Hi Stefan,

Thank you for your detailed response.

Yes, the Max Connections value is indeed incorrect, as it should be smaller than processes, but it was a mistake we figured would not affect the system that much, since the connections grow as far as they can but never actually reach 300. Nevertheless, yesterday we changed Max Connections to 400, processes to 480 and sessions to 960.

Also, in the Connection Pooling settings we changed:

Cleanup Thread from 300 to 150
Connection Lifetime from 600 to 450

In the listener trace file we encounter the same error over and over again:

** DBGRL Error: ARB Alert Log
** DBGRL Error: SLERC_OERC, 48180
** DBGRL Error: Linux-x86_64 Error: 13: Permission denied
Additional information: 1
** DBGRL Error: <msg time='2013-05-08T13:49:43.176+02:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='<hostname>'
host_addr='<IP>'>
<txt>08-MAY-2013 13:49:43 * service_update * GRP * 0
</txt>
</msg>

Still, we see this error in the trace files of other SAP systems on the same host as well, so we thought it couldn't be the root cause.


•What is the exact JDBC connection pool setting?
Before starting this thread we had:
Initial Connections: 1
Maximum Connections: 300
Maximum Time to Wait for Connection: 120
Connection Lifetime: 600
Cleanup Thread: 300

After checking the link you provided and Reagan's recommendations, we did the following:
Initial Connections: 1
Maximum Connections: 400
Maximum Time to Wait for Connection: 120
Connection Lifetime: 450
Cleanup Thread: 150

•Is this issue caused by the process limit or by session limit (as one "connection" can create several sessions)?
The issue is caused by the process limit.

•Are all processes related to an active connection (from the connection pool) or are there dead ones as well, that are not cleaned up? (can be checked with lsof for example)
As far as I can figure out, all connections seem to be active.
I executed the command lsof -i@hostname (replacing the word hostname with the actual hostname) and all the resulting connections had status "ESTABLISHED". Also, the number of resulting connections is close to the number of "LOCAL=NO" processes. For example, lsof -i@hostname returned 111 results, while ps -ef | grep LOCAL returned 99.
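For completeness, this is roughly how I compared the two counts (a sketch; 1527 is our listener port):

# Oracle shadow processes serving remote connections
ps -ef | grep "LOCAL=NO" | grep -v grep | wc -l
# established TCP connections on the listener port
lsof -nP -iTCP:1527 -sTCP:ESTABLISHED | wc -l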

Do you have Wily Introscope running?

Yes, we have Wily Introscope, but we don't have access to it (only our customer does; only the SAP Basis part is outsourced to us).

I will keep investigating.

Have a beautiful day,

Delia

stefan_koehler
Active Contributor
0 Kudos

Hi Delia,

> In the listener trace file we encounter the same error over and over again:

> ** DBGRL Error: SLERC_OERC, 48180

> ** DBGRL Error: Linux-x86_64 Error: 13: Permission denied

Do you use the same listener name (and ADR directory) for every listener on the same host? Judging from the provided output of "lsnrctl status", it seems to be a generic folder like "/oracle/diag/tnslsnr/<hostname>/listener". This could be a possible and valid reason for those permission errors (just use a different ADR_BASE for each listener).


Oracle Documentation: http://docs.oracle.com/cd/E11882_01/network.112/e10835/listener.htm#BGBHDGFI
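A sketch of what I mean, assuming a second listener on the same host (the listener name and path below are purely illustrative):

# in the listener.ora of the second listener, here hypothetically named LISTENER_GRQ
DIAG_ADR_ENABLED_LISTENER_GRQ = ON
ADR_BASE_LISTENER_GRQ = /oracle/GRQ/saptrace

Each listener then writes its ADR log and trace files below its own base directory, so they no longer compete for the same folder.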

> After checking the link you provided and Reagan's recommendations, we did the following

OK, great, but it seems like you have missed the option (Expiration) that enables those limits.

> I executed the command lsof -i@hostname (replacing the word hostname with the actual hostname) and all the resulting connections had status "ESTABLISHED".

I am sorry, but maybe I was a little bit too imprecise in my previous response about lsof. I mean that you should check each connection between a dedicated shadow process and its corresponding J2EE pool connection (in your case there should usually be one connection for each Oracle shadow process, on port 1527 and the corresponding client port). The opposite port will be missing if the connection is broken, has disappeared, or whatever.
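For example, roughly like this (a sketch only; <pid> stands for one of the LOCAL=NO shadow processes, and 1527 is the listener port from your configuration):

# show the TCP endpoints of one Oracle shadow process
lsof -nP -p <pid> -a -iTCP
# the J2EE server process should hold the other end of the same client port
lsof -nP -iTCP:1527 -sTCP:ESTABLISHED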

> Yes, we have Wily Introscope, but we don't have access to it

Then you should request access immediately or investigate this together with your customer. As you previously mentioned that the system hangs at regular intervals, there could be a thousand reasons. Some common ones are exhausted application thread pools (threads blocking, looping, freezing or waiting on something). I have investigated such hangs very deeply in the past (due to a nasty bug in the Oracle client which caused exactly this behavior), but you will need access to Wily for that.

Regards

Stefan

0 Kudos

Hi Stefan,

> OK, great, but it seems like you have missed the option (Expiration) that enables those limits
The option was enabled; I forgot to mention it.

Meanwhile, we did an Oracle client upgrade from 10.2.0.4.0 V5 to 11.2.0.2.0 V1 (it took a while because we did it first on dev and last on production). But the number of connections keeps increasing right after the restart.

We still have no access to Wily introscope, but we will work on it and keep investigating and I will post any update here.

Thank you for the advice so far.

Have a great day,
Delia

0 Kudos

Thank you, Jim and Mark, for the quick replies.

@Jim:

listener.ora:

ADMIN_RESTRICTIONS_LISTENER = on
LISTENER =
  (ADDRESS_LIST =
        (ADDRESS =
          (PROTOCOL = IPC)
          (KEY = GRP.WORLD)
        )
        (ADDRESS=
          (PROTOCOL = IPC)
          (KEY = GRP)
        )
        (ADDRESS =
          (COMMUNITY = SAP.WORLD)
          (PROTOCOL = TCP)
          (HOST = <hostname>)
          (PORT = 1527)
        )
  )
STARTUP_WAIT_TIME_LISTENER = 0
CONNECT_TIMEOUT_LISTENER = 10
TRACE_LEVEL_LISTENER = OFF
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = GRP)
      (ORACLE_HOME = /oracle/GRP/112_64)
    )

  )


sqlnet.ora:

AUTOMATIC_IPC = ON
TRACE_LEVEL_CLIENT = OFF
NAMES.DEFAULT_DOMAIN = WORLD
TCP.INVITED_NODES = <IP>
SQLNET.EXPIRE_TIME = 10
DEFAULT_SDU_SIZE=32768

lsnrctl status:

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 07-MAY-2013 14:55:07

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=IPC)(KEY=GRP.WORLD))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                23-APR-2013 15:51:30
Uptime                    13 days 23 hr. 3 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/GRP/112_64/network/admin/listener.ora
Listener Log File         /oracle/diag/tnslsnr/<hostname>/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=GRP.WORLD)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=GRP)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=<IP>)(PORT=1527)))
Services Summary...
Service "GRP" has 2 instance(s).
  Instance "GRP", status UNKNOWN, has 1 handler(s) for this service...
  Instance "GRP", status READY, has 1 handler(s) for this service...
The command completed successfully

I replaced the actual hostname and IP of the server with <hostname> and <IP>.

@Mark: I forgot to mention that the system is a stand-alone Java system, so I cannot use SMGW.

Have a great day,

Delia

Former Member
0 Kudos

Do you see errors in the system log like "Maximum number of xxx connectable SAP gateways reached"?

If so, go to transaction SMGW -> Goto -> Remote Gateways. If it has reached your maximum setting, delete the older ones.

To fix this, apply the following profile parameters:

gw/close_routes 60
gw/so_keepalive 1

BR

Mark

JimSpath
Active Contributor
0 Kudos

What is in listener.ora and sqlnet.ora?  Can you post the output of "lsnrctl status"?