Background jobs getting stuck in Ready Status

liamclark1 · ‎11-20-2014

Hi,

We are currently experiencing an issue where certain background jobs are scheduled/triggered but then get stuck in Ready status and do not run when their start time arrives.

When they get stuck in Ready status, we can see in SM37 that the executing server is always the application server.

Our central server is a Unix system and our application server(s) are Windows, all running against the Oracle database hosted on the central server.

The jobs that are getting stuck are both technical jobs (RDDIMPDP, SAP_COLLECTOR_PERFMON etc.) and user-triggered jobs.

I've noticed that we sometimes receive messages stating 'unable to connect to oracle'. I'm assuming these are related, as they seem to have become more frequent as the number of stuck jobs has increased.

In the work process logs these messages keep appearing:

dblink[db_reconnect]: { new_reconnect_message=1
dbcon[db_con_reconnect]: { reco_trials=3, reco_sleep_time=5
00: name=R/3, con_id=000000000, state=INACTIVE    , tx=NO , bc=NO , hc=NO , perm=YES, reco=NO , info=NO ,
     timeout=000, con_max=255, con_opt=255, occ=NO , prog=
dbcon[db_con_reconnect]: } rc=0
***LOG BV4=> reconnect state is set for the work process [dblink       1999]
***LOG BYY=> work process left reconnect status [dblink       2000]
dblink[db_reconnect]: } rc=0
ThHdlReconnect: reconnect o.k.

Does anyone have an idea of where to look to resolve this?

We have looked at SAP Note 1902517 and tried various values for the parameters SQLNET.SEND_TIMEOUT and SQLNET.RECV_TIMEOUT, but they have not resolved this. Since we upgraded the kernel and support packs on the system, it seems to have worsened in our dev environment too (where these parameters were changed).

The only workaround is to go into SM37 and start the job manually, but doing this for every stuck job takes too much time.

Let me know if you want more information, and thanks in advance.

Liam

ken_halvorsen2 · ‎12-09-2014

Hi Liam

I've just experienced the same issue in our production system today, although we are using SQL Server. I noticed that none of the batch jobs in Ready status could be changed, rescheduled or cancelled. They didn't even have a work process that I could kill on the server.

There were available background processes on all 4 app servers, so there was no work process contention.

What I did was remove all of the background work processes in RZ04 (CCMS: Maintain Work Process Distribution). After activating the specific "Mode", I was able to go to SM37, select the background job and run "Job -> Check Status", which allowed me to start the process and cancel it.

I then added the background processes back into the mode and was able to run background jobs on that server again.

For every batch job that was in that Ready state, there were event log entries stating a database error: TemSe for table TST01 key (or similar), failure to read status, and "Failed to enter message 00& (application area 55) in job log".

My theory is that there was a communication issue between this app server and the message server. Thankfully I didn't have to reboot the production servers during the business day, but we are planning to reboot at the weekend during the maintenance period.

Coincidentally, MS Server patches were deployed the previous weekend.

Hope this helps

Ken

former_member185239 · ‎11-24-2014

Hi Liam,

When your jobs hang, run transaction SARFC and paste the output.

If it shows that few resources are available, then you need to tune the SAP parameters.

With Regards

Ashutosh Chaturvedi

ACE-SAP · ‎11-20-2014

Hi

Do you have errors/alerts in the DB system log?

You could check the maximum number of processes used on your DB and verify whether you get close to the limit you have defined with the instance parameter 'processes'.

select RESOURCE_NAME, INITIAL_ALLOCATION, MAX_UTILIZATION from v$resource_limit where RESOURCE_NAME in ('processes', 'sessions');

If so, you could increase the number of processes (and the related sessions parameter, sessions = 2 x processes).

alter system set "PROCESSES" = 150 scope = spfile sid='*';

alter system set "SESSIONS" = 300 scope = spfile sid='*';
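As a rough sketch of the "close to the limit" check on the query output above, something like the following could be used. The 90% threshold, the function name and the sample rows are assumptions for illustration only; they do not come from an SAP note or from a real system.

```python
def near_limit(rows, threshold=0.9):
    """Return the resources whose peak usage reached `threshold` of the limit.

    rows: iterable of (RESOURCE_NAME, INITIAL_ALLOCATION, MAX_UTILIZATION)
    tuples, in the column order of the query above.
    threshold: assumed warning level (0.9 = 90%), not an official value.
    """
    flagged = []
    for name, initial_allocation, max_utilization in rows:
        # INITIAL_ALLOCATION is a text column, so it may be padded
        # or non-numeric (for example UNLIMITED).
        text = str(initial_allocation).strip()
        if not text.isdigit():
            continue
        limit = int(text)
        if limit > 0 and int(max_utilization) / limit >= threshold:
            flagged.append(name)
    return flagged


# Hypothetical query output, for illustration only.
sample = [("processes", "       150", 148), ("sessions", "       300", 120)]
print(near_limit(sample))
```

If 'processes' is flagged, raising PROCESSES and SESSIONS as shown above would be the next thing to try.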

I do not think that the internal jobs could consume so many processes that none are left to serve the SAP work processes.

I'm also not sure that this error comes from an exhaustion of the Oracle processes.

You could check the parameter rsdb/reco_add_error_codes to verify that some Oracle errors are not being trapped as DB disconnect problems (SAP Note 24806 - Database Reconnect: technical details and settings).

Regards

1431798 - Oracle 11.2.0: Database Parameter Settings

processes = #ABAP work processes * 2 + #J2EE server processes * <max-connections> + PARALLEL_MAX_SERVERS + 40
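To make the formula above concrete, here is a small worked sketch. The input numbers are hypothetical, not values from Liam's system, and the sessions value simply applies the 2 x processes rule of thumb mentioned earlier in this reply.

```python
def recommended_processes(abap_work_processes, j2ee_server_processes=0,
                          max_connections=0, parallel_max_servers=0):
    """Evaluate the PROCESSES sizing formula quoted above."""
    return (abap_work_processes * 2
            + j2ee_server_processes * max_connections
            + parallel_max_servers
            + 40)


# Hypothetical ABAP-only system: 40 work processes in total across all
# instances, PARALLEL_MAX_SERVERS = 10, no J2EE server processes.
processes = recommended_processes(abap_work_processes=40, parallel_max_servers=10)
sessions = 2 * processes  # rule of thumb: sessions = 2 x processes
print(processes, sessions)  # 130 260
```

Comparing the result with the value currently set in the database shows whether the configured limit is too low for the number of work processes.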

Former Member · ‎11-20-2014

Hi,

This issue seems to occur because your Oracle processes are exhausted.

To resolve this, check whether the Oracle internal maintenance jobs are active and, if so, disable them using SAP Note 974781 - Oracle internal maintenance jobs.

Also check SAP Note 1898521 - "Process list grows after ORA-01017; ORA-00020".

Hope this resolves your issue.

Regards

Bhupesh A
