on 04-08-2013 7:07 AM
Hello,
I regularly have the following error with a recently installed Oracle 11.2.0.3.
---
Process W000 died, see its trace file
Sun Apr 07 00:48:23 2013
***********************************************************************
Fatal NI connect error 12537, connecting to:
(LOCAL=NO)
VERSION INFORMATION:
TNS for Solaris: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for Solaris: Version 11.2.0.3.0 - Production
TCP/IP NT Protocol Adapter for Solaris: Version 11.2.0.3.0 - Production
Time: 07-APR-2013 00:48:23
Tracing not turned on.
Tns error struct:
ns main err code: 12537
TNS-12537: TNS:connection closed
ns secondary err code: 12560
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Sun Apr 07 00:49:29 2013
opiodr aborting process unknown ospid (17179) as a result of ORA-609
Sun Apr 07 01:01:14 2013
Errors in file /oracle/xxx/saptrace/diag/rdbms/xxx/xxx/trace/xxx_smco_7937.trc (incident=85849):
ORA-00445: background process "W000" did not start after 120 seconds
Sun Apr 07 01:03:30 2013
Incident details in: /oracle/xxx/saptrace/diag/rdbms/xxx/xxx/incident/incdir_85849/xxx_smco_7937_i85849.trc
Sun Apr 07 01:17:37 2013
Process 0x480433ce8 appears to be hung while dumping
Current time = 938816049, process death time = 938749879 interval = 60000
Attempting to kill process 0x480433ce8 with OS pid = 7937
OSD kill succeeded for process 480433ce8
Sun Apr 07 01:18:22 2013
Restarting dead background process SMCO
Sun Apr 07 01:18:38 2013
SMCO started with pid=30, OS id=26990
---
Would you know how to solve this problem?
Thanks in advance for your answer.
Hello,
The problem was related to blocking write issues at the Solaris level.
After opening an Oracle Solaris SR, we changed the following kernel parameter:
Regards,
Hi Benoit,
Please check how much memory is being used by the ZFS file system cache. Please run this command:
mdb -k (details in http://docs.oracle.com/cd/E23824_01/html/821-1448/zfspools-4.html).
Your memory could be occupied by the ZFS file system cache.
Regards,
Achmad
Hello,
Achmad Dimyati wrote:
Your memory could be occupied by ZFS filesystem cache.
Memory consumption looks low:
mdb -k
Loading modules: [ unix genunix specfs dtrace zfs scsi_vhci sd mpt px crypto mac ip hook neti arp kssl qlc fctl sockfs random emlxs fcp ssd idm cpc fcip logindmux ptm sppp ufs lofs nfs ipc mpt_sas pmcs ]
> ::memstat
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 1080217 8439 13%
ZFS File Data 871387 6807 11%
Anon 1462200 11423 18%
Exec and libs 44207 345 1%
Page cache 70407 550 1%
Free (cachelist) 50076 391 1%
Free (freelist) 4656721 36380 57%
Total 8235215 64337
Physical 8213541 64168
> $q
Regards,
Hi Benoit,
I had the same issue, but in a different environment (AIX and NAS). It turned out to be a problem with the cache configuration in the NAS. So I think the issue is related to the OS and ZFS configuration.
Could you review this SCN thread and these notes?
http://scn.sap.com/thread/752504
notes:
Note 1343733 - Support for SAP on ZFS
Note 832871 - SAP relevant patches for Oracle Solaris 10 on SPARC
Is this the only system with ZFS and Oracle? If not, do you have the same issue on another system?
Regards,
Achmad
Hello Achmad,
Achmad Dimyati wrote:
Is this the only system with ZFS and Oracle? If not, do you have the same issue on another system?
No, all our SAP systems are on ZFS.
I do not have the issue on the other systems, which run Oracle 11.2.0.2 instead of 11.2.0.3
and Solaris 10 instead of Solaris 11.
Regards,
Hi Benoit,
Is this a test environment? Since Oracle 11.2.0.2 on Solaris 10 is OK, you could try Oracle 11.2.0.3 on Solaris 10 just to eliminate one of the components and determine whether the issue lies with Oracle 11.2.0.3 or with the OS/ZFS.
In my case the issue was also very obvious during dbv, because of its heavy disk access.
I came across an article about a ZFS L2ARC bug causing a panic loop (http://simplex.swordsaint.net/?p=199). It might be worth running the test from that article.
Best Regards,
Achmad
Achmad Dimyati wrote:
Is this a test environment? Since Oracle 11.2.0.2 on Solaris 10 is OK, you could try Oracle 11.2.0.3 on Solaris 10 just to eliminate one of the components and determine whether the issue lies with Oracle 11.2.0.3 or with the OS/ZFS.
In my case the issue was also very obvious during dbv, because of its heavy disk access.
This is not a test environment; it is our new SolMan 7.1.
It is going to be tough to test it on Solaris 10.
Why do you say that in your case the issue was very obvious?
Regards,
Hi Benoit,
In my previous environment, we installed a fresh ECC under AIX with Oracle 10g (NAS storage). From the beginning we noticed slowness, both during the R3load phase of the installation and when accessing the system. I checked CPU and memory; both were fine. iostat was normal; there was nothing special except that it was slow.
But when we ran dbv, the CPU immediately hit 100% with high I/O wait, and after a while (around 15-20 minutes) the database crashed. So I complained to the vendor.
I asked them to check the disk subsystem. They applied an additional patch to the OS; dbv was still slow, but there were no more crashes. So they investigated further and found out that one of the built-in caches in the NAS storage had not been installed. After installation of the cache, dbv took only 30 minutes to finish.
So, in my case dbv made the issue very obvious, but the symptoms were there all along (slowness, I/O wait, high CPU).
Best Regards,
Achmad
Hello Benoit,
What value do you have in the environment variables and listener.ora for ORACLE_HOME?
Regards,
Eduardo Rezende
Eduardo Rezende wrote:
Hello Benoit,
What value do you have in the environment variables and listener.ora for ORACLE_HOME?
Regards,
Eduardo Rezende
1. /oracle/MYSID/112_64 in SID_LIST_LISTENER subsection in listener.ora
2. not defined in LISTENER subsection in listener.ora
3. /oracle/MYSID/112_64 in env of sidadm
What are you thinking about?
Thanks in advance for your answer.
We observed some issues with customers that upgraded from Oracle 11.2.0.2 to 11.2.0.3 and had different values in the environment and in listener.ora, even though the alias pointed to the correct location.
e.g.:
/oracle/MYSID/112_64 pointing to /oracle/MYSID/11203
env: /oracle/MYSID/11203
listener.ora: /oracle/MYSID/112_64
How much CPU, RAM, and swap do you have allocated?
Hi Benoit
Have you tried the following solutions?
http://vivekbhutani.wordpress.com/2012/01/30/ora-609-and-tns-12547-in-11g-alert-log/
I have also attached the Oracle note suggested in the link provided by Deepak.
Best Regards
Marius
Hello,
The error occurs during online db verify.
Would you have any suggestion on how I could avoid this problem?
Thanks in advance for your answers.
Hi Benoît,
an ORA-00445 combined with a TNS-12537 error can have 4 possible reasons (and none of them are really Oracle database related):
I would consider points 1, 2, or 3, as you already mentioned that this issue only occurs during a dbverify action (which of course forces a lot of additional I/O).
You need to investigate this issue at OS level in more detail (when you run the dbverify). However, you can extend the default timeout of 120 seconds to avoid the Oracle error / abort.
shell> sqlplus / as sysdba
SQL> alter system set events '10281 trace name context forever, level <X>';
---- <X> is the number of seconds to timeout
This event can be set dynamically.
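As a purely illustrative example (the value 300 is an assumption; pick a timeout that fits your environment), raising the spawn timeout and later reverting it would look like this:

```sql
-- Illustrative only: raise the process-spawn timeout to 300 seconds
ALTER SYSTEM SET EVENTS '10281 trace name context forever, level 300';

-- Revert to the default behavior (120 seconds)
ALTER SYSTEM SET EVENTS '10281 trace name context off';
```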
Regards
Stefan
Susan Eck wrote:
Hello,
Did you ever resolve your issue? I am having a similar problem.
Thanks,
Susan
Hello,
Yes, I still have the issue when I run db verify.
SAP Support has asked me to switch to RMAN for dbverify, but it does not help.
I am still waiting for another solution from them.
On what OS does your DB run?
Regards,
Hello Stefan,
Stefan Koehler wrote:
You need to investigate this issue on OS level in more detail (when you run the dbverify). However you can extend the default timeout of 120 seconds to avoid the Oracle error / abort.
At OS level, I see with iostat that the disk is 100% busy, but the asvc_t times are still correct.
Basically, both RMAN and non-RMAN dbverify use I/O.
But it has always been like that, and I can still work on the machine.
1. Why could W000 not perform its work within 120 seconds?
2. Do you know what W000 is supposed to perform during those 120 seconds?
3. Are there any risks or side effects of increasing the following setting?
events '10281 trace name context forever, level 600'
4. Is it possible to restrict the I/O usage available to RMAN?
Thanks in advance for your answers.
Hi Benoît,
first of all, this kind of issue can occur with any other database process that is spawned by Oracle (like M000 or PZ99), too.
> 1. Why could W000 not perform its work within 120 seconds?
It cannot be spawned within 120 seconds; that is the problem (not really the work that has to be performed).
> 2. Do you know what W000 is supposed to perform during those 120 seconds?
Oracle Documentation: http://docs.oracle.com/cd/E11882_01/server.112/e25513/bgprocesses.htm#REFRN104
Wnnn processes are slave processes dynamically spawned by SMCO to perform space management tasks in the background. These tasks include preallocating space into locally managed tablespace and SecureFiles segments based on space usage growth analysis, and reclaiming space from dropped segments. At most 10 Wnnn slaves can run on one database instance. After being started, the slave acts as an autonomous agent. After it finishes task execution, it automatically picks up another task from the queue. The process terminates itself after being idle for a long time.
You will see exactly this behavior (by SMCO and W000), if you check the posted errors closely.
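If you want to watch this behavior yourself, one simple way (a sketch; PNAME is available in V$PROCESS as of 11g) is to poll the process list while the space management tasks run:

```sql
-- List SMCO and any Wnnn slaves currently present in the instance
SELECT pid, spid, pname, program
  FROM v$process
 WHERE pname = 'SMCO' OR pname LIKE 'W00%';
```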
> 3. Are there any risks or side effects of increasing the following setting?
No. Set this event in addition to all the other events (which are recommended by SAP). It only provides more time to spawn the process.
> 4. Is it possible to restrict the I/O usage available to RMAN?
Yes, there is an RMAN option called RATE.
Oracle Documentation: http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmtunin.htm#BRADV90070
This option (rman_rate) is implemented in BR*Tools as well, if you are not using native RMAN.
SAP Documentation: http://help.sap.com/saphelp_nwpi71/helpdata/DE/46/afaf3a26a60319e10000000a11466f/content.htm
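A minimal sketch of how RATE could be used with a validate run in native RMAN (the 20M throughput cap per second is just an example value to tune for your system):

```text
RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE DISK RATE 20M;
  BACKUP VALIDATE CHECK LOGICAL DATABASE;
  RELEASE CHANNEL ch1;
}
```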
> At OS level, I see with iostat that the disk is 100% busy, but the asvc_t times are still correct. But it has always been like that, and I can still work on the machine.
OK, but I meant deep investigation at OS level, not just running an I/O monitor or so. I was talking about truss and DTrace (luckily, as you are running on Solaris) on both processes.
Regards
Stefan
Hello Stefan,
Stefan Koehler wrote:
OK, but I meant deep investigation at OS level, not just running an I/O monitor or so. I was talking about truss and DTrace (luckily, as you are running on Solaris) on both processes.
I can reproduce the problem.
Which process should I run truss on to diagnose this problem?
Why do you say that I am lucky to have Solaris? 🙂
Thanks in advance for your answer.
Hi Benoît,
> Which process should I run truss on to diagnose this problem?
Run truss on the PID of SMCO and possibly also on the PID of W000 (it depends on where the spawned process gets stuck). Start truss before you reproduce the issue.
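A sketch of what that could look like on Solaris (the PIDs are placeholders; find them for example with ps -ef | grep ora_smco and ps -ef | grep ora_w000):

```text
# attach truss (with timestamps) to SMCO before reproducing the issue
truss -d -o /tmp/smco_truss.out -p <SMCO_PID>

# in a second session, the same for the W000 slave (if it is already spawned)
truss -d -o /tmp/w000_truss.out -p <W000_PID>
```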
> Why do you say that I am lucky to have Solaris? 🙂
Because DTrace is available in its full scope on Solaris only. DTrace has been ported to OEL (Oracle Enterprise Linux) as well, but the functionality is/was not the same as on Solaris.
You can use DTrace to trace the application code (functions, call stacks, and so on) if you cannot find any root cause with truss.
Regards
Stefan
Update: I discovered that Solaris quarterly security patches had been installed recently. The SA backed those out and we are no longer getting the errors during the RMAN backups.
When we were experiencing the problem, we noticed that the server became very sluggish while the backups were running. Simple commands like ls, cd and ps could take 30 seconds or more. Something related to the patches was causing contention and thus causing the TNS errors (12535, 12537), the ORA-00445, and other intermittent "timeout" errors that we were getting (ORA-00494 and ORA-12751).
We don't yet know why the patches caused the issue but at least we have pinpointed the source.
Good luck in resolving your issue!
Hello,
Susan Eck wrote:
Update: I discovered that Solaris quarterly security patches had been installed recently. The SA backed those out and we are no longer getting the errors during the RMAN backups.
Could you please provide the references of the patches with which you had the problem,
and of the ones that solved it?
Is it Solaris 10 or 11?
Thanks in advance for your answers.
Hello Stefan,
Stefan Koehler wrote:
Run truss on the PID of SMCO and maybe possible PID of W000 (depends on where the spawn process gets stuck). Start truss before you reproduce the issue.
When I run RMAN dbv, SMCO and W000 are not started.
The truss shows:
SMCO:
---
27228: 0.0000 semtimedop(1526726746, 0xFFFFFFFF7FFFD284, 1, 0xFFFFFFFF7FFFD270) Err#11 EAGAIN
27228: 0.0000 times(0xFFFFFFFF7FFFD6E0) = 1279307926
27228: 0.0000 times(0xFFFFFFFF7FFFD6E0) = 1279307926
27228: semtimedop(1526726746, 0xFFFFFFFF7FFFD284, 1, 0xFFFFFFFF7FFFD270) (sleeping...)
27228: 0.0000 semtimedop(1526726746, 0xFFFFFFFF7FFFD284, 1, 0xFFFFFFFF7FFFD270) Err#11 EAGAIN
27228: 0.0000 times(0xFFFFFFFF7FFFD6E0) = 1279308226
27228: 0.0000 times(0xFFFFFFFF7FFFD6E0) = 1279308226
27228: semtimedop(1526726746, 0xFFFFFFFF7FFFD284, 1, 0xFFFFFFFF7FFFD270) (sleeping...)
---
W000
---
27230: semtimedop(1526726746, 0xFFFFFFFF7FFFC814, 1, 0xFFFFFFFF7FFFC800) (sleeping...)
27230: 0.0000 semtimedop(1526726746, 0xFFFFFFFF7FFFC814, 1, 0xFFFFFFFF7FFFC800) Err#11 EAGAIN
27230: 0.0000 times(0xFFFFFFFF7FFFCD50) = 1279312003
27230: 0.0000 times(0xFFFFFFFF7FFFCD50) = 1279312003
27230: semtimedop(1526726746, 0xFFFFFFFF7FFFC814, 1, 0xFFFFFFFF7FFFC800) (sleeping...)
---
Is it right to say that there seems to be a deadlock involving an operation on an IPC semaphore?
Thanks in advance for your help.
Hi Benoît,
thanks for the truss output.
There are several known OS (Solaris) bugs regarding semtimedop with "Err#11 EAGAIN". Unfortunately we need the corresponding function calls (call stack) to locate the specific bug.
Please provide the function trace with DTrace or oradebug (I guess oradebug / short_stack should be enough in this case as well).
By the way .. are you running Oracle on a ZFS file system?
Regards
Stefan
Hello Stefan,
There are several known OS (Solaris) bugs regarding semtimedop with "Err#11 EAGAIN". Unfortunately we need the corresponding function calls (call stack) to locate the specific bug.
Please provide the function trace with DTrace or oradebug (I guess oradebug / short_stack should be enough in this case as well).
By the way .. are you running Oracle on a ZFS file system?
Yes, we are using zfs.
BTW, I have checked: Err#11 is not an error; it occurs when semtimedop reaches its defined timeout value.
Could you please provide me the exact dtrace options that I should use?
Could you please tell me what I should write in sqlplus to get the oradebug?
Thanks in advance for your answers.
Hi Benoît,
> Could you please tell me what I should write in sqlplus to get the oradebug?
Please check out my blog (Point "3. Using stack traces or system call / signal traces"): http://scn.sap.com/community/oracle/blog/2013/01/10/oracle-advanced-performance-troubleshooting-with...
Please attach to both processes (SMCO and W000) and perform several short_stack dumps while the process is hung.
> Yes, we are using zfs.
What is the value of "zfs_arc_max"? Try 8 G, if it is set to 16 G. This is just a "blind guess" as we do not have all the needed information.
Solaris Documentation: http://docs.oracle.com/cd/E19253-01/817-0404/gjhec/index.html
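For reference, on Solaris the ARC cap is usually set in /etc/system (value in bytes, active after a reboot). An 8 GB cap would look like the fragment below; treat the number as a starting point for your own tuning, not a recommendation:

```text
* /etc/system: limit the ZFS ARC to 8 GB (8 * 1024^3 bytes)
set zfs:zfs_arc_max = 8589934592
```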
Regards
Stefan
Hello,
Stefan Koehler wrote:
What is the value of "zfs_arc_max"? Try 8 G, if it is set to 16 G. This is just a "blind guess" as we do not have all the needed information.
zfs_arc_max is set to 25 GB on a host with 64 GB.
My Solaris colleagues fine-tune this by checking the memory used on the system,
as described in http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
Why have you suggested 8 GB?
I will do the traces and post them.
Regards,
Hello again Stefan,
Here is the information you requested:
SQL> oradebug setospid 27917
Oracle pid: 28, Unix process pid: 27917, image: oracle@xxx (SMCO)
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212<-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212<-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212<-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212<-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ksarcv()+212<-ksbabs()+764<-ksbrdp()+1616<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug setospid 575
Oracle pid: 35, Unix process pid: 575, image: oracle@xxx (W000)
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ktsj_slave_main()+976<-ksvrdp()+1872<-opirip()+1972<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ktsj_slave_main()+976<-ksvrdp()+1872<-opirip()+1972<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ktsj_slave_main()+976<-ksvrdp()+1872<-opirip()+1972<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ktsj_slave_main()+976<-ksvrdp()+1872<-opirip()+1972<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
SQL> oradebug short_stack
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3524<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<-sskgpwwait()+236<-ksliwat()+1752<-kslwait()+240<-ktsj_slave_main()+976<-ksvrdp()+1872<-opirip()+1972<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
Regards,
Hi Benoît,
because there are several known issues (slow startup of the database or of processes) when this value is set too large. Unfortunately, ZFS behaves differently in each environment (depending on the ZFS pools, pool and I/O configuration, database configuration, and so on).
I have been involved in such ZFS performance evaluations for Oracle databases recently, and it was a mess (just check out my latest blog about ZFS and async/sync I/O for Oracle databases).
Regards
Stefan
Hi Benoît,
thanks for the short stack traces. It really looks like the two processes are blocked / hung.
Can you please do a system state dump and attach this dump here (make sure that you do not limit the trace file size)?
Check my blog for details about creating a system state dump:
Regards
Stefan
Hello,
Stefan Koehler wrote:
Can you please do a system state dump and attach this dump here (make sure that you do not limit the trace file size)?
- Recreate situation for SMCO / W000
- Execute system state dump with level 266 while the problem persists
I have performed the following:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> host date
Wednesday, May 22, 2013 08:39:34 AM MEST
SQL> oradebug dump systemstate 266
Statement processed.
SQL> host date
Wednesday, May 22, 2013 08:40:00 AM MEST
Could you please provide the filename you are interested in?
-rw-r----- 1 orasid dba 502K May 22 08:39 SID_ora_20464.trm
-rw-r----- 1 orasid dba 2.0M May 22 08:39 SID_ora_20464.trc
-rw-r----- 1 orasid dba  23K May 22 08:45 alert_SID.log
-rw-r----- 1 orasid dba  173 May 22 08:46 SID_lgwr_19803.trm
-rw-r----- 1 orasid dba 1.6K May 22 08:46 SID_lgwr_19803.trc
-rw-r----- 1 orasid dba   84 May 22 08:47 SID_mmon_19813.trm
-rw-r----- 1 orasid dba 1.1K May 22 08:47 SID_mmon_19813.trc
Thanks in advance for your answer.
Hi Benoît,
sorry for the delay. I have cross-checked the call stack and system state dump with an Oracle 11.2.0.3 database on Solaris x86 right now.
The call stack (function order) looks nearly the same, but the two processes are missing in your system state dump. I can imagine two possible reasons for that:
- Both processes are not attached to the SGA at all
- Both processes already died when you performed the system state dump (the runtime of the system state dump was quite large on my system)
It is pretty hard to troubleshoot this issue without access to your system.
Regards
Stefan
Hello,
Stefan Koehler wrote:
The call stack (function order) looks nearly the same, but the two processes are missing in your system state dump. I can imagine two possible reasons for that:
- Both processes are not attached to SGA at all
- Both Processes already died when you performed the system state dump (the runtime of the system state dump was quite large on my system)
I am surprised by your answer.
From what I see, the process did not die and seems to be attached to all shared memory segments that belong to the
Oracle database:
# ipcs -m | gawk '/^m / {gsub("^m +[0-9]+",sprintf("m %12#x",$2))} {print}' | grep -i orasid
m 0x520000a1 0x6ab0fb88 --rw-r----- orasid dba
m 0x520000a0 0x0 --rw-r----- orasid dba
m 0x5200009f 0x0 --rw-r----- orasid dba
# pmap 21524 | grep -i shm
0000000380000000 32768K rwxsR [ ism shmid=0x5200009f ]
00000003C0000000 3178496K rwxsR [ ism shmid=0x520000a0 ]
00000004C0000000 16K rwxsR [ ism shmid=0x520000a1 ]
Regards,
Hi Benoit,
A Similar issue is addressed here
https://cn.forums.oracle.com/forums/thread.jspa?threadID=2378159
Additionally please check the following configuration
If Async I/O is used, Oracle may hang with "startup nomount". The following error message is displayed after two minutes:
ORA-00445: background process "PMON" did not start after 120 seconds
This problem is due to a missing MLOCK privilege for the dba group and the oper group. First use "getprivgrp dba" and "getprivgrp oper" to check whether this privilege exists. If it does not exist, you can assign it as root, either temporarily by using "setprivgrp dba MLOCK"/"setprivgrp oper MLOCK" or permanently by creating the /etc/privgroup file with the following contents:
dba MLOCK
oper MLOCK
If the above solution is not suitable, please share the following log: /oracle/xxx/saptrace/diag/rdbms/xxx/xxx/trace/xxx_smco_7937.trc
Regards,
Deepak Kori
Hi Benoit,
The thread talks about adjusting the /etc/resolv.conf file with relevant entries.
Refer example thread : https://forums.oracle.com/forums/thread.jspa?threadID=2373696
Please share the trace file logs so that we have a detailed description of the error.
Regards,
Deepak Kori
Hello Deepak,
Deepak Kori wrote:
This problem is due to a missing MLOCK privilege for the dba group and the oper group. First use "getprivgrp dba" and "getprivgrp oper" to check whether this privilege exists. If it does not exist, you can assign it as a root either temporarily by using "setprivgrp dba MLOCK"/"setprivgrp oper MLOCK"
I have checked resolv.conf.
Everything is fine:
search unige.ch
nameserver x.y.z.w
nameserver x.y.z.w
nameserver x.y.z.w
I have read the Oracle Doc ID that you provided.
It is for Linux, not Solaris.
As for your commands, they are not applicable on Solaris.
How should I proceed for Solaris?
Thanks in advance for your answers.