
IDM 7.2 not provisioning, no errors in any logs after SAP update

Former Member

We are currently running version 7.20.8 (2013-06-21, 720_VAL_REL).

About two weeks ago our system stopped provisioning users over a holiday break. It went unnoticed, and the provisioning queue built up to about 3,400 entries. I have been tasked with resolving the issue, which has brought our production environment to a standstill, and a high-severity SAP message has produced no real help in the last 5 days.

I performed the following steps before contacting SAP.

Stopped and restarted the dispatchers, and got the following errors in the dispatcher log:

exception: The Server Failed to resume the transaction. Desc:40000000001c

SQL Exception.  The server failed to secure the transaction Desc.4000000001c

In the job/system log we would get a few notifications that tasks had completed, either OK or with warnings, just before or just after the dispatcher error came through.

Then nothing would provision, and the semaphore table would hold a lock from the dispatcher that just sat in sleep mode and never released. We could stop and restart the dispatcher and get a couple of jobs to run, with the same results.

On top of this, the other dispatchers would no longer run at all.

When the query "select * from MXP_Provision" is run from SQL Server Management Studio, our queue has 3400+ entries, 17 of them in failed status waiting on another job that failed.

Next, we reassociated the correct .jar files with the dispatchers and regenerated the scripts. This had no effect; same issues as above.

Reinstalled the dispatcher services. This had no effect, with the same issues as above, except that now no further jobs came through at all: the queue stopped shrinking, and jobs just sat in failed status waiting on other jobs, or in pending/sleeping/waiting status.

Contacted SAP. We confirmed that we are not running SQL Server in cluster mode, confirmed our two dispatchers (one for provisioning, one for housekeeping), and provided SQL semaphore lock table data showing that no other process is running except the locked semaphore in sleeping status. (It is waiting on system information to release; we presume this lock is causing other jobs to fail and the other dispatchers not to start.)

SAP sent us a fix in the following reply to our message:

----------------------------------------------------------------------------------------------------------

SAP has a script called hangman that extracts important information from SQL Server and can be run in situations like this:

https://service.sap.com/sap/support/notes/948633

But I see Christopher Leonard and others have already requested the main information this gives, so, more importantly, let's try to resolve the issue.

As I already mentioned, we suspect the problem is that a table or lock remains on the MC_SEMAPHORE table after commit is issued, and that it stays there until the session (dispatcher process) is terminated. We've created a replacement set of semaphore handling procedures that attempt to work around and solve this issue. These have been attached and can be installed in your environment.

The new version is in the ProcAndTableUpdates.sql file.

This is run as the _OPER account, and you need to do a search/replace of mxmc_ with <prefix>_ if your instance was not created using the default mxmc name.
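The prefix change can be scripted instead of done by hand in an editor. What follows is a minimal Python sketch. It assumes three things: the attached script is plain text (UTF-8 by default, though SSMS may save UTF-16), every name to rename contains the literal lowercase mxmc_, and a blind text substitution is acceptable, which is what the reply describes. The function names and the "idm" prefix in the usage line are hypothetical. Review the output before running it as the _OPER account.

```python
from pathlib import Path

DEFAULT_PREFIX = "mxmc_"


def replace_prefix(sql_text: str, new_prefix: str, old_prefix: str = DEFAULT_PREFIX) -> str:
    """Replace every occurrence of old_prefix with new_prefix + '_'.

    Plain, case-sensitive text substitution mirroring the manual search/replace
    described in the SAP reply; it does not parse the SQL.
    """
    target = new_prefix.rstrip("_") + "_"  # accept both "idm" and "idm_"
    return sql_text.replace(old_prefix, target)


def rewrite_script(src: Path, dst: Path, new_prefix: str, encoding: str = "utf-8") -> int:
    """Write a prefixed copy of src to dst and return the number of replacements.

    Pass encoding="utf-16" if the script was saved by SSMS as Unicode.
    """
    text = src.read_text(encoding=encoding)
    count = text.count(DEFAULT_PREFIX)
    dst.write_text(replace_prefix(text, new_prefix), encoding=encoding)
    return count
```

For example, rewrite_script(Path("ProcAndTableUpdates.sql"), Path("ProcAndTableUpdates_idm.sql"), "idm") writes a copy for an instance whose prefix is idm and returns how many names were changed. Because the match is case-sensitive, any uppercase MXMC_ in the script is left as-is, so search for that separately.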

You can revert to the original SP8 procedures by installing the contents of "set procs original.sql" if there are any other issues.
If you prefer to have us online with you during this process, please let us know and we'll set up a conference call session with desktop sharing.

Best regards

Per Christian Krabsetsve

SAP Labs Norway

-------------------------------------------------------------------------------------

We applied the fix and now get the following error once in the dispatcher log, after which no other job will run:

"interrupted due to invalid semiphore"

We received the following error in the system log for about 3 hours, and no other errors after that:

mc_trans_commit_all implicit transaction set off sema:444:1 SPID:83

Using the audit trail in SQL, I have identified the failed provisioning tasks and tried to run them directly, hoping at least to get a new error message that would show whether a configuration setting in the task or system is causing the failure. But when I click the "Run now" button, it only updates the last-used time to when I pressed it. No errors are logged at all (dispatcher, system, or job log), and none of the jobs ever shows a running status in the status view. It's as if the system never recognizes any job or task run.

I did get one final system log entry, "1 stale semaphores released", and since then nothing else will run (4 hours now).

Any help would be appreciated.

At this point the high-severity ticket has been with SAP Support for 5 days. My SCN searches turn up nothing that seems relevant to this release and patch level for troubleshooting why our dispatchers consistently failed while they were still seeing jobs, and now run nothing at all, even with the provisioning queue showing 3400+ items in pending status.

Accepted Solutions (0)

Answers (4)

Former Member

@Matt,

Chris was able to provide us with a cleanup job, with specific instructions to run it only when the queue is otherwise completely clean and we hit a deadlock where a job won't rerun to completion, as was the case with this job.

We believe the semaphore error we were getting was directly related to this 2-month-old job that was in failed status and locking up the semaphore table.

I am now running a privilege sync job for our PRD ECC system, and will then build a couple of new business roles and add rights in preparation for a new set of assignments out of HCM, so that we can test how the system is functioning since the changes.

I am currently putting together a SQL script document covering what we have gone through. Per your offer, I may send you that document for comments and, if possible, any additions you feel would benefit us from a maintenance standpoint.

Kris appreciates the positive feedback you guys gave him, FYI!

former_member2987
Active Contributor

Michael,

That's great news.  Yeah, Chris is very good.  To think we shared an office once!

Regards,

Matt


Hi Michael,

I'm not sure if this will make you feel any better, but the SAP guy working on your message is excellent, so you have the right man looking at it.

Good luck.

Ian

former_member2987
Active Contributor

Indeed Chris is one of the best IDM database guys there is. 

former_member2987
Active Contributor

Michael,

What do your DBAs say? Have they looked at the database?

This will help SAP troubleshoot.

Matt

Former Member

@Matt: The DBAs were seeing the semaphore lock go into sleep status, waiting on another job to contact it back.

On a side note, SAP finally got back in touch with us. It looks like they have seen this problem in clustered environments, but this is the first time it has happened in a non-clustered environment.

The provisioning task is deadlocking while waiting for a corrupted log ID, which is causing the failure.

That failure is now stuck in the queue, and I am not sure how to get it to restart. I have gone to the parent task and tried to restart it, but it won't, and when I query for tasks in pending status, the queue is empty in that respect:

select t.taskname, COUNT(p.mskey) from MXP_Provision P, mxp_tasks T where t.TaskID = p.ActionID and P.state=2 group by t.taskname

But the actual provisioning queue shows one user still sitting in "waiting" status, with about 12 different linked tasks all on hold until the failed one completes. This failed task is causing our lock to go into sleep status and the dispatcher to fail.
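The query above keeps only rows with state 2, so an entry parked in any other state never shows up in its output, even though the queue view still lists it. Below is a minimal Python sketch of the same join-and-count, grouped by state as well, so every entry is visible. It assumes the rows have already been fetched from MXP_Provision as (ActionID, MSKEY, State) tuples and from mxp_tasks as a TaskID-to-TaskName mapping. The column order and the function name are assumptions. The thread does not say which numeric codes mean pending, waiting, or failed, so the sketch reports raw state codes.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def count_queue(
    provision_rows: Iterable[tuple],
    task_names: Mapping[int, str],
    state: Optional[int] = None,
) -> Counter:
    """Count MXP_Provision entries per (task name, state).

    provision_rows: (ActionID, MSKEY, State) tuples (assumed column order).
    task_names: TaskID -> TaskName, as read from mxp_tasks.
    state: if given, keep only that state, like "P.state=2" in the query.
    """
    counts: Counter = Counter()
    for action_id, mskey, row_state in provision_rows:
        if state is not None and row_state != state:
            continue
        if mskey is None:  # COUNT(p.mskey) ignores NULL values
            continue
        name = task_names.get(action_id)
        if name is None:  # inner-join semantics: no matching task, no row
            continue
        counts[(name, row_state)] += 1
    return counts
```

Calling count_queue(rows, names) with no state filter lists every (task, state) pair still in the queue, which is where a single stuck entry and its dependent tasks would appear. Passing state=2 reproduces the filtered query.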

I have validated the ID in IDM, and it looks like it provisioned the 105 record in HCM correctly. Since it is technically not an SAP user, all of the other tasks should have triggered the "no action required" step, but they didn't.

What I would like to do is clear this job out of the queue completely and just retry that specific ID's MSKEY value, to see if IDM picks it up for provisioning again. However, I have not been able to find a SQL query that retries a single ID's MSKEY value.

If you have one, I would appreciate it.

On a side note:

It also seems SAP identified an infinite loop in one of our provisioning tasks that was holding up most of our other tasks. We corrected it, and that cleared out everything except the issue above.


@michael: you really need to update to SP8 Patch 3. We had this issue, and yes, it was in a clustered environment, but the fixes in Patch 3 are not only for clusters. At the very least, the fixes produce more logging, and they no longer lock the DB/dispatcher.

The use of sp_getapplock inside some procedures around the mc_semaphore table seems to be the reason for these locks up to SP8 Patch 2.
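This fits SAP's earlier description of a lock that survives the commit. In SQL Server, a lock taken with sp_getapplock is owned either by the transaction, which is the default and is released at commit or rollback, or by the session (@LockOwner = 'Session'), which holds it until sp_releaseapplock is called or the session ends. The thread does not say which owner the SP8 procedures use. The following is only a toy Python model of those two release rules, not IDM or SQL Server code. The class, the resource name "MC_SEMAPHORE", and session 83 (taken from the log line above) are illustrative.

```python
from dataclasses import dataclass, field


class LockHeld(Exception):
    """Raised when another session already owns the requested resource."""


@dataclass
class LockManager:
    # resource name -> (owning session id, owner kind: "Transaction" or "Session")
    held: dict = field(default_factory=dict)

    def get_applock(self, session: int, resource: str, owner: str = "Transaction") -> None:
        current = self.held.get(resource)
        if current is not None and current[0] != session:
            raise LockHeld(resource)  # real sp_getapplock waits or times out instead
        self.held[resource] = (session, owner)  # toy model: no lock-count tracking

    def release_applock(self, session: int, resource: str) -> None:
        if self.held.get(resource, (None, None))[0] == session:
            del self.held[resource]

    def commit(self, session: int) -> None:
        # Commit or rollback frees only this session's transaction-owned locks.
        self.held = {r: h for r, h in self.held.items()
                     if not (h[0] == session and h[1] == "Transaction")}

    def end_session(self, session: int) -> None:
        # Logging out (e.g. killing the dispatcher) frees everything it still owns.
        self.held = {r: h for r, h in self.held.items() if h[0] != session}
```

With locks = LockManager(), calling locks.get_applock(83, "MC_SEMAPHORE", owner="Session") and then locks.commit(83) leaves "MC_SEMAPHORE" in locks.held. Any other session's get_applock then fails until locks.end_session(83) runs, which matches the symptom of a dispatcher that sleeps on the semaphore until its process is restarted.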

best regards,

Tobias

See SAP Note 1946824 (Dispatcher freezes/stops processing tasks under heavy load).

former_member2987
Active Contributor

Well SAP isn't always crazy about cleaning the provisioning queues as it's not recommended as a best practice.

If you PM me (email's in my profile) I can give you some pointers about it.

Regards,

Matt

Former Member

This is Kris's reply to our request for more information on the note you referenced.

The note you refer to is from one of the other customers where we saw this issue (on a cluster), and they received a similar fix to the one I sent in the CSS ticket. They were happy with the workaround they received and have so far not spent much time with us trying to resolve the actual issue. While the workaround seems to work fine in many installations, we have one where it did not help, so we're still not happy, and I believe that is why the note itself has not been released. What you received is more advanced in its error reporting than what is present in SP8 Patch 3, so for note 1946824 alone there is no reason to upgrade.


Maybe updating to the latest patch (Patch 3, released in December) will help, especially some of the updated SPs.

We had a very long OSS message thread with SAP about this issue, and Patch 3 includes some workarounds for it.

Best regards,

Tobias