on 04-19-2007 12:53 PM
Dear all,
After a disk failure/replacement in a SAN disk array that we are using for storage, we had to restore our Oracle database. After the restore the database does not start properly: It takes around 12 min to move from the state "Mount" to the state "Open". It finally opens but even after this, when I start SAP, only 5 workprocesses connect and logging into SAP is not possible, the gui just hangs
There are no errors in the alert_RSP.log. I attach an extract here to show the time difference between the mount state and the open state
Thu Apr 19 11:16:07 2007
Starting ORACLE instance (normal)
Thu Apr 19 11:16:29 2007
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 3
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.7.0.
System parameters with non-default values:
processes = 240
sessions = 288
timed_statistics = TRUE
event = 38068 trace name context forever, level 100
shared_pool_size = 620756992
sga_max_size = 1159695232
shared_pool_reserved_size= 20401094
java_pool_size = 167772160
enqueue_resources = 8000
filesystemio_options = SETALL
control_files = /oracle/RSP/sapdata1/cntrl/cntrlRSP.dbf, /oracle/RSP/origlogA/cntrl/cntrlRSP.dbf, /oracle/RSP/sa
parch/cntrl/cntrlRSP.dbf
db_block_checksum = FALSE
db_block_size = 8192
compatible = 9.2.0
log_archive_start = TRUE
log_archive_dest = /oracle/RSP/saparch/RSParch
log_buffer = 1048576
log_checkpoint_interval = 0
log_checkpoint_timeout = 0
db_files = 254
db_file_multiblock_read_count= 8
log_checkpoints_to_alert = TRUE
control_file_record_keep_time= 30
dml_locks = 5000
replication_dependency_tracking= FALSE
transaction_auditing = FALSE
undo_management = AUTO
undo_tablespace = PSAPUNDO
undo_suppress_errors = TRUE
undo_retention = 43200
remote_os_authent = TRUE
db_domain = WORLD
session_cached_cursors = 800
hash_join_enabled = FALSE
background_dump_dest = /oracle/RSP/saptrace/background
user_dump_dest = /oracle/RSP/saptrace/usertrace
max_dump_file_size = 20000
core_dump_dest = /oracle/RSP/saptrace/background
optimizer_features_enable= 9.2.0
sort_area_size = 2097152
sort_area_retained_size = 0
db_name = RSP
open_cursors = 800
optimizer_mode = choose
sortelimination_cost_ratio= 10
btree_bitmap_plans = FALSE
pushjoin_predicate = FALSE
optimizerjoin_sel_sanity_check= TRUE
optimizer_index_cost_adj = 10
optimpeek_user_binds = FALSE
optimizeror_expansion = DEPTH
PMON started with pid=2
DBW0 started with pid=3
DBW1 started with pid=4
DBW2 started with pid=5
DBW3 started with pid=6
LGWR started with pid=7
CKPT started with pid=8
SMON started with pid=9
RECO started with pid=10
Thu Apr 19 11:16:31 2007
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=11
ARC0: Archival started
ARC1 started with pid=12
ARC1: Archival started
Thu Apr 19 11:16:31 2007
ARCH: STARTING ARCH PROCESSES COMPLETE
Thu Apr 19 11:16:31 2007
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no FAL' ARCHARC0: Thread not mounted
Thu Apr 19 11:16:31 2007
ARC1: Becoming the heartbeat ARCH
ARC1: Becoming the heartbeat ARCHARC1: Thread not mounted
Thu Apr 19 11:16:31 2007
ALTER DATABASE MOUNT
Thu Apr 19 11:16:36 2007
Successful mount of redo thread 1, with mount id 1209727119
Thu Apr 19 11:16:36 2007
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Thu Apr 19 11:16:36 2007
ALTER DATABASE OPEN
Thu Apr 19 11:16:37 2007
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thread 1 opened at log sequence 6727
Current log# 12 seq# 6727 mem# 0: /oracle/RSP/origlogB/log_g12m1.dbf
Current log# 12 seq# 6727 mem# 1: /oracle/RSP/mirrlogB/log_g12m2.dbf
Successful open of redo thread 1
Thu Apr 19 11:16:37 2007
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Apr 19 11:16:38 2007
SMON: enabling cache recovery
Thu Apr 19 11:16:49 2007
Successfully onlined Undo Tablespace 27.
Thu Apr 19 11:16:49 2007
SMON: enabling tx recovery
Thu Apr 19 11:16:50 2007
Database Characterset is WE8DEC
Thu Apr 19 11:29:09 2007
Completed: ALTER DATABASE OPEN
Please note the time difference between 11:16:50 and 11:29:09, when the database is mounted and we are just waiting for it to open
I run dbverify, all ok
I reviewed note 521264 and my issue is described in case 2c. However, after reviewing all 127 datafiles and online redologs, I do not have any files with an "S" in their permissions
We run Solaris 10, Oracle 9.2.0.7 and WAS 6.40 SP16
Any help will be very much appreciated
Many thanks
Andreas
Hello,
I will check about the patches, thanks. I do not think it is a recovery issue, I mean we tried to start it several times, restored it from a backup, restored the binaries, still have the same issue
At the moment, we managed to restore the system to other hardware, using the same disks and it worked fine. This denotes that it is an OS problem , we opened a call to SUN about this. I will update this thread when we have some news from them
Many thanks
Andreas
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
For your information, the issue with Oracle taking too much time to move from state "mount" to state "open" after being recovered successfully
was a problem with memory fragmentation in Sun T2000 servers when running Oracle
It was solved by setting the following parameter
set disable_ism_large_pages=0x34
in /etc/system
Regards
Andreas
Thanks for your replies so far
My Unix admin has tried various things to solve this. We just managed to prove that the disks were not part of the problem, by recreating the system to another hardware and keep the data in the same disks as before. The system started ok without any issues there
Now, we are not sure why the current hardware (it is a Solaris 10 zone actually ) can not open Oracle ok and we are investigating on this. Neither a Solaris zone reboot nor a reboot of the whole physical server has helped
Andreas
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Recovery is probably the correct answer for the time. Don't forget, you need to read all the archive logs generated since the backup. This is what takes the largest amount of time, the redo is read after the archives, and generally only involves 1 log that is the same size (or smaller ) than the archive logs.
Vinod
Hello,
Here are the details:
R3trans -d returns 0000 instantly. I am not including trans.log here as the connection to the database works . I have no space issues neither, have around 70GB left under /oracle
The startup log is as follows
Trace of system startup/check of SAP System RSP on Fri Apr 20 08:38:10 BST 2007
Called command: /usr/sap/RSP/SYS/exe/run/startsap
Starting SAP Instance DVEBMGS00
-
SAP-R/3-Startup Program Rel 640 V1.8 (2003/04/24)
-
Starting at 2007/04/20 08:38:10
Startup Profile: "/usr/sap/RSP/SYS/profile/START_DVEBMGS00_rspda002"
Execute Pre-Startup Commands
-
(6285) Local: /usr/sap/RSP/SYS/exe/run/sapmscsa -n pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
/usr/sap/RSP/SYS/exe/run/sapmscsa: SCSA is attached and useable.
/usr/sap/RSP/SYS/exe/run/sapmscsa: finished.
(6285) Local: rm -f ms.sapRSP_DVEBMGS00
(6285) Local: ln -s -f /usr/sap/RSP/SYS/exe/run/msg_server ms.sapRSP_DVEBMGS00
(6285) Local: rm -f dw.sapRSP_DVEBMGS00
(6285) Local: ln -s -f /usr/sap/RSP/SYS/exe/run/disp+work dw.sapRSP_DVEBMGS00
(6285) Local: rm -f co.sapRSP_DVEBMGS00
(6285) Local: ln -s -f /usr/sap/RSP/SYS/exe/run/rslgcoll co.sapRSP_DVEBMGS00
(6285) Local: rm -f se.sapRSP_DVEBMGS00
(6285) Local: ln -s -f /usr/sap/RSP/SYS/exe/run/rslgsend se.sapRSP_DVEBMGS00
Starting Programs
-
(6304) Starting: local ms.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
(6305) Starting: local dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
(6306) Starting: local co.sapRSP_DVEBMGS00 -F pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
(6307) Starting: local se.sapRSP_DVEBMGS00 -F pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
(6285) Waiting for Child Processes to terminate.
Instance on host rspda002 started
ps -ef | grep dw is shown below: It only starts 5 -6 dialog processes. All the other processes are not started (normally I have around 40 )
rspda002:rspadm 33% ps -ef | grep dw
rspadm 6317 6305 0 08:38:17 ? 0:00 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6310 6305 0 08:38:16 ? 0:01 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6311 6305 0 08:38:16 ? 0:00 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6624 5523 0 08:43:35 pts/1 0:00 grep dw
rspadm 6305 6285 0 08:38:11 ? 0:04 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6312 6305 0 08:38:17 ? 0:01 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6313 6305 0 08:38:17 ? 0:00 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
That is ps -ef | grep sap
rspda002:rspadm 32% ps -ef | grep sap
rspadm 6317 6305 0 08:38:17 ? 0:00 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6310 6305 0 08:38:16 ? 0:01 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
root 12375 650 0 11:15:58 ? 3:56 /usr/sap/RSP/SYS/exe/run/saposcol
rspadm 6311 6305 0 08:38:16 ? 0:00 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6309 6305 0 08:38:16 ? 0:01 icman -attach pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6306 6285 0 08:38:11 ? 0:00 co.sapRSP_DVEBMGS00 -F pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6307 6285 0 08:38:11 ? 0:00 se.sapRSP_DVEBMGS00 -F pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6304 6285 0 08:38:11 ? 0:00 ms.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6308 6305 0 08:38:16 ? 0:02 gwrd -dp pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6622 5523 0 08:43:29 pts/1 0:00 grep sap
rspadm 6305 6285 0 08:38:11 ? 0:04 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 20243 650 0 15:11:34 ? 13:11 sapccm4x -DCCMS pf==/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6285 650 0 08:38:11 ? 0:00 /usr/sap/RSP/SYS/exe/run/sapstart pf=/usr/sap/RSP/SYS/profile/START_DVEBMGS00_r
rspadm 6312 6305 0 08:38:17 ? 0:01 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
rspadm 6313 6305 0 08:38:17 ? 0:00 dw.sapRSP_DVEBMGS00 pf=/usr/sap/RSP/SYS/profile/RSP_DVEBMGS00_rspda002
Many thanks
Andreas
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Andreas,
Can't it be that your new disk is not correctly assigned from the SAN?
Just to test access time, you can run:
<i>time dd if=<your_physical_disk_path> of=/dev/null count=1000 bs=10k</i>
...and repeat it with other Oracle disk from the SAN. Compare the results.
Hope it could help you!
Regards,
JC Llanes.
Hi,
Do you have the response or log from startsap?
Can you do a ps -ef|grep dw and send response to me?
Can you do a R3trans -d and send reply?
Thanks,
George
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
89 | |
10 | |
10 | |
9 | |
6 | |
6 | |
6 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.