SAP resources not starting in Solaris cluster after Kernel upgrade

Former Member
0 Kudos

Hi,

We were in the process of upgrading the SAP kernel version from 236 to 254. The upgrade was successful in our dev and test systems, but we ran into an issue in Production.

Our ECC version is 6.0 with kernel release 700. In Production, we have a Solaris cluster in place, and the SAP and DB resources are clustered across 2 servers for high availability. The process followed for the kernel upgrade in Production was as follows:

1) Extracted the 2 SAR files downloaded from SAP Service Marketplace to a new directory exe_new under /sapmnt/<SID>

2) Stopped the 4 application servers

3) Stopped SAP resources and Oracle resources in cluster

4) Mounted the file system /sapmnt/<SID> on the server hosting the central instance

5) Stopped all services running under <SIDADM> in central instance and application server

6) Renamed exe folder in /sapmnt/<PRD> to exe_backup

7) Renamed exe_new folder in /sapmnt/<SID> to exe

8) Started Oracle resources in cluster, which was successful

9) Started SAP resources in cluster, which was not getting started

Can someone please check the issue and tell us whether we missed any steps?

Regards,

BIJOY

Accepted Solutions (0)

Answers (4)

willi_eimler
Contributor
0 Kudos

Hi,

I forgot icmbnd. Please check it. icmbnd runs in root context with the setuid bit set, like this:

-rwsr-x---   1 root       sapsys     4107696 May 28 07:23 icmbnd
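The check above can be scripted. This is only a minimal sketch: `check_setuid` is a hypothetical helper name, and it assumes a POSIX shell where `ls -ld` prints the owner in the third column. The mode `-rwsr-x---` corresponds to `4750`.

```shell
# check_setuid FILE [OWNER]
# Report whether FILE has the setuid bit set and is owned by OWNER
# (OWNER defaults to root). Hypothetical helper, not part of the SAP kernel.
check_setuid() {
    file=$1
    owner=${2:-root}
    if [ ! -u "$file" ]; then
        echo "$file: setuid bit not set"
        return 1
    fi
    # ls -ld prints the owner name in the third column
    actual=$(ls -ld "$file" | awk '{print $3}')
    if [ "$actual" != "$owner" ]; then
        echo "$file: owned by $actual, expected $owner"
        return 1
    fi
    echo "$file: ok"
}

# Example call (path is an assumption, adjust to your system):
# check_setuid /usr/sap/<SID>/SYS/exe/run/icmbnd root
```

If the check fails after extracting a new kernel, running saproot.sh as root is the usual way to restore ownership and permissions.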

Best regards

Willi Eimler

willi_eimler
Contributor
0 Kudos

Hi,

First of all: sorry for my bad English, I'm short on time!

But I think your shared memory segments were not cleaned up and the instance exe directories were not supplied with the new kernel. Try this procedure:

1.) Make a copy of the old kernel


2.) stop all sap instances with stopsap


3.) Upgrade saphostagent (on every instance):
  root> cd /tmp
  root> mkdir saphostagent; cd saphostagent
  root> /sapmnt/PIE/SAPCAR -xvf /<Path of saphostagent SAR file>/SAPHOSTAGENT<Version>.SAR
  root> ./saphostexec -upgrade
  root> /usr/sap/hostctrl/exe/saphostexec -stop
  root> /usr/sap/hostctrl/exe/saposcol -k

4.) Stop diagnostic agent on all instances
    
5.) Stop sapstartsrv on all instances
      sidadm> sapcontrol -nr <Systemnumber of instances> -prot NI_HTTP -function StopService;
 
6.) Check shared memory segments
    sidadm> showipc all

    if there are still segments then:
    sidadm> cleanipc <Systemnumber shown by showipc> remove

7.) Delete old Kernel:
    root> cd /usr/sap/<SID>/SYS/exe/run/
    root> rm -rf *

8.) Extract Kernel
    sidadm> /sapmnt/PIE/SAPCAR -xvf <Path and filename of SAR files>
    Don't forget the sapcryptolib, if used ;)

9.) saproot:
    root> cd /usr/sap/<SID>/SYS/exe/run/
    root> ./saproot.sh <SID>

10.) Delete and rebuild the exe directories of the instance directories
    on every instance do:
    root> cd /usr/sap/<SID>/<Instance e.g. DVEBMGS00 or D10>/exe
    root> rm -rf *
   
    Rebuild:
    sidadm> cd /usr/sap/<SID>/<Instance e.g. DVEBMGS00 or D10>/work
    sidadm> sapcpe pf=/usr/sap/<SID>/SYS/profile/<SID>_<Instance e.g. DVEBMGS00 or D10>_<HOST>

11.) startsap

Best regards

Willi Eimler

Former Member
0 Kudos

> We were in the process of upgrading the SAP kernel version from 236 to 254.

If you want to patch the SAP kernel to a higher PL within the same release level, you need to extract the new .SAR files into the existing kernel directory (without removing the old files).
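A rough sketch of that approach, for illustration only: it backs up the current kernel directory by copying it, then copies the new files over the existing directory so that anything not shipped with the new kernel stays in place. `overlay_kernel` and the directory names in the example call are assumptions, and all SAP processes are assumed to be stopped first.

```shell
# overlay_kernel EXE_DIR NEW_DIR BACKUP_DIR
# 1. Copy the current kernel directory to BACKUP_DIR (must not exist yet).
# 2. Copy the new kernel files over EXE_DIR; files that are not part of
#    the new kernel are left untouched.
# Hypothetical helper, run with all SAP processes stopped.
overlay_kernel() {
    exe_dir=$1
    new_dir=$2
    backup_dir=$3
    [ -d "$exe_dir" ] && [ -d "$new_dir" ] || { echo "missing directory" >&2; return 1; }
    [ ! -e "$backup_dir" ] || { echo "$backup_dir already exists" >&2; return 1; }
    cp -pR "$exe_dir" "$backup_dir" || return 1
    # "$new_dir/." copies the directory contents, including dot files
    cp -pR "$new_dir/." "$exe_dir/" || return 1
}

# Example call (paths are assumptions):
# overlay_kernel /sapmnt/<SID>/exe /sapmnt/<SID>/exe_new /sapmnt/<SID>/exe_backup
```

Copying instead of renaming keeps site-specific files in the exe directory, at the cost of possibly leaving obsolete files behind.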

nelis
Active Contributor
0 Kudos

Hi,

> If you want to patch SAP Kernel to higher PL within the same release level you need to extract new .SAR files to existing kernel directory (not to remove old files).

Actually that is not true. The reason people do it this way is that it saves the time of re-adding other dependencies that may be in use, e.g. sapcryptolib, IGS etc. Sometimes it's a good idea to start fresh and update all these components in a new directory where no old files are left behind.

Having said that, the reason the system is not starting up may be that some of these "other dependencies" are missing, and he will need to check that.
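One way to check for missing files is to compare the old and new kernel directories by name. The sketch below is a hypothetical helper (`list_missing`) that only looks at the top level of each directory, not at subdirectories.

```shell
# list_missing OLD_DIR NEW_DIR
# Print the names of entries that exist in OLD_DIR but not in NEW_DIR.
# Only the top level is compared. Hypothetical helper.
list_missing() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    ls -A "$1" | sort > "$old_list"
    ls -A "$2" | sort > "$new_list"
    # comm -23 keeps the lines that appear only in the first file
    comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}

# Example call (paths are assumptions):
# list_missing /sapmnt/<SID>/exe_backup /sapmnt/<SID>/exe
```

Anything it prints was present before the switch and is gone now, so each name is worth a look before the instance is started.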

Regards,

Nelis

csaba_goetz
Contributor
0 Kudos

Hi Bijoy,

Please share more information about point

9) Started SAP resources in cluster, which was not getting started

- Are there any errors in OS syslog about the SAP resources?

- Review / attach sapstart.log, sapstartsrv.log, stderr and dev_disp in the /usr/sap/SID/Instance/work folder; they probably contain relevant information.
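When scanning sapstart.log, the lines to look for are child processes that ended with a non-zero status. A small sketch, assuming the log reports them as "Child <pid> terminated with Status <n>" (`failed_children` is a hypothetical name):

```shell
# failed_children < sapstart.log
# Print "<pid> <status>" for every child process that sapstart reports
# with a non-zero exit status. Hypothetical helper.
failed_children() {
    sed -n 's/.*Child \([0-9][0-9]*\) terminated with Status \([0-9][0-9]*\).*/\1 \2/p' |
        awk '$2 != 0'
}

# Example call (path is an assumption):
# failed_children < /usr/sap/<SID>/<Instance>/work/sapstart.log
```

The pid can then be matched against the "(<pid>) Starting:" lines in the same log to see which program it was.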

Rather than renaming /sapmnt/SID/exe, move its content to another folder (e.g. exe_backup) and extract the new kernel into /sapmnt/SID/exe. Try it this way.

Best regards,

Adam

Former Member
0 Kudos

Hello Adam,

I have listed the logs below. I also have one question about your last point on directory renaming: I followed the same process in our dev and test systems and it worked fine there; the main difference in Production is the Solaris cluster. Are there any specific steps or configurations needed on the cluster side during a kernel upgrade?

sapstart.log -


SAP-R/3-Startup Program Rel 700 V1.8 (2003/04/24)
-------------------------------------------------

Starting at 2013/06/15 13:12:13
Startup Profile: "/usr/sap/PRD/SYS/profile/START_DVEBMGS00_sscprdsap"

Execute Pre-Startup Commands
----------------------------
(8989) Local: /usr/sap/PRD/SYS/exe/run/sapmscsa -n pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(8993) Local: ln -s -f /usr/sap/PRD/SYS/exe/run/rslgcoll co.sapPRD_DVEBMGS00
(8995) Local: ln -s -f /usr/sap/PRD/SYS/exe/run/rslgsend se.sapPRD_DVEBMGS00
(8997) Local: ln -s -f /usr/sap/PRD/SYS/exe/run/msg_server ms.sapPRD_DVEBMGS00
(8999) Local: ln -s -f /usr/sap/PRD/SYS/exe/run/disp+work dw.sapPRD_DVEBMGS00

Starting Programs
-----------------
(9015) Starting: local co.sapPRD_DVEBMGS00 -F pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9015) New Child Process created.
(9015) Starting local Command:
Command:  co.sapPRD_DVEBMGS00
           -F
           pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9016) Starting: local se.sapPRD_DVEBMGS00 -F pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9016) New Child Process created.
(9017) Starting: local ms.sapPRD_DVEBMGS00 pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9016) Starting local Command:
Command:  se.sapPRD_DVEBMGS00
           -F
           pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9018) Starting: local dw.sapPRD_DVEBMGS00 pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9017) New Child Process created.
(9017) Starting local Command:
Command:  ms.sapPRD_DVEBMGS00
           pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9018) New Child Process created.
(9018) Starting local Command:
Command:  dw.sapPRD_DVEBMGS00
           pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9019) Starting: local /usr/sap/PRD/SYS/exe/run/igswd_mt -mode=profile pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(9019) New Child Process created.
(8988) Waiting for Child Processes to terminate.
(9019) Starting local Command:
Command:  /usr/sap/PRD/SYS/exe/run/igswd_mt
           -mode=profile
           pf=/usr/sap/PRD/SYS/profile/PRD_DVEBMGS00_sscprdsap
(8988) **** 2013/06/15 13:12:16 Child 9016 terminated with Status 2 . ****
(9016) **** 2013/06/15 13:12:16 No RestartProgram command for program 2  ****

sapstartsrv.log - (old)


---------------------------------------------------
trc file: "sapstartsrv.log", trc level: 0, release: "700"
---------------------------------------------------
pid        8605

Sat Jun 15 09:40:19 2013
No halib defined => HA support disabled
Initializing SAPControl Webservice
SapSSLInit failed => https support disabled
Starting WebService thread
Webservice thread started, listening on port 50013
Trusted http connect via Unix domain socket '/tmp/.sapstream50013' enabled.

sapstartsrv.log -


---------------------------------------------------
trc file: "sapstartsrv.log", trc level: 0, release: "700"
---------------------------------------------------
pid        8984

Sat Jun 15 13:12:13 2013
No halib defined => HA support disabled
Initializing SAPControl Webservice
SapSSLInit failed => https support disabled
Starting WebService thread
Webservice thread started, listening on port 50013
Trusted http connect via Unix domain socket '/tmp/.sapstream50013' enabled.

dev_disp (old) -


---------------------------------------------------
trc file: "dev_disp.new", trc level: 1, release: "700"
---------------------------------------------------
sysno      00
sid        PRD
systemid   370 (Solaris on SPARCV9 CPU)
relno      7000
patchlevel 0
patchno    236
intno      20050900
make:      single threaded, ASCII, 64 bit, optimized
pid        8631


Sat Jun 15 09:40:28 2013
kernel runs with dp version 243(ext=110) (@(#) DPLIB-INT-VERSION-243)
length of sys_adm_ext is 364 bytes
*** SWITCH TRC-HIDE on ***
***LOG Q00=> DpSapEnvInit, DPStart (00 8631) [dpxxdisp.c   1287]
shared lib "dw_xml.so" version 236 successfully loaded
shared lib "dw_xtc.so" version 236 successfully loaded
shared lib "dw_stl.so" version 236 successfully loaded
shared lib "dw_gui.so" version 236 successfully loaded
shared lib "dw_mdm.so" version 236 successfully loaded
rdisp/softcancel_sequence :  -> 0,5,-1
use internal message server connection to port 13900
MtxInit: 30000 0 0
DpSysAdmExtInit: ABAP is active
DpSysAdmExtInit: VMC (JAVA VM in WP) is not active
DpIPCInit2: start server >sscprdsap_PRD_00                        <
DpShMCreate: sizeof(wp_adm)  40656 (1232)
DpShMCreate: sizeof(tm_adm)  53610880 (26792)
DpShMCreate: sizeof(wp_ca_adm)  88064 (88)
DpShMCreate: sizeof(appc_ca_adm) 176000 (88)
DpCommTableSize: max/headSize/ftSize/tableSize=2000/8/2192040/2192048
DpShMCreate: sizeof(comm_adm)  2192048 (1088)
DpSlockTableSize: max/headSize/ftSize/fiSize/tableSize=0/0/0/0/0
DpShMCreate: sizeof(slock_adm)  0 (104)
DpFileTableSize: max/headSize/ftSize/tableSize=0/0/0/0
DpShMCreate: sizeof(file_adm)  0 (72)
DpShMCreate: sizeof(vmc_adm)  0 (1840)
DpShMCreate: sizeof(wall_adm)  (224040/346312/80/104)
DpShMCreate: sizeof(gw_adm) 48
DpShMCreate: SHM_DP_ADM_KEY  (addr: ffffffff70800000, size: 56686064)
DpShMCreate: allocated sys_adm at ffffffff70800000
DpShMCreate: allocated wp_adm at ffffffff70801e18
DpShMCreate: allocated tm_adm_list at ffffffff7080bce8
DpShMCreate: allocated tm_adm at ffffffff7080bd48
DpShMCreate: allocated wp_ca_adm at ffffffff73b2c6c8
DpShMCreate: allocated appc_ca_adm at ffffffff73b41ec8
DpShMCreate: allocated comm_adm at ffffffff73b6ce48
DpShMCreate: system runs without slock table
DpShMCreate: system runs without file table
DpShMCreate: allocated vmc_adm_list at ffffffff73d840f8
DpShMCreate: allocated gw_adm at ffffffff73d84178
DpShMCreate: system runs without vmc_adm
DpShMCreate: allocated ca_info at ffffffff73d841a8
DpShMCreate: allocated wall_adm at ffffffff73d841b0
MBUF state OFF
DpCommInitTable: init table for 2000 entries
rdisp/queue_size_check_value :  -> off

Sat Jun 15 09:40:29 2013
ThTaskStatus: rdisp/reset_online_during_debug 0
EmInit: MmSetImplementation( 2 ).
MM global diagnostic options set: 0
<ES> client 0 initializing ....
<ES> InitFreeList
<ES> block size is 4096 kByte.
Using implementation std
<ES> Info: use normal pages (no huge table support available)
EsStdUnamFileMapInit: ES base = 0xfffffffba8000000
EsStdInit: Extended Memory 15360 MB allocated
<ES> 3839 blocks reserved for free list.
ES initialized.
mm.dump: set maximum dump mem to 96 MB

Sat Jun 15 09:41:30 2013
rdisp/http_min_wait_dia_wp : 1 -> 1
***LOG Q0K=> DpMsAttach, mscon ( sscprdsap) [dpxxdisp.c   12650]
use SAPLOCALHOST=<sscprdsap> as internal hostname
DpStartStopMsg: send start message (myname is >sscprdsap_PRD_00                        <)
DpStartStopMsg: start msg sent
CCMS: AlInitGlobals : alert/use_sema_lock = TRUE.
DpMsgAdmin: Set release to 7000, patchlevel 0
MBUF state PREPARED
MBUF component UP
DpMBufHwIdSet: set Hardware-ID
***LOG Q1C=> DpMBufHwIdSet [dpxxmbuf.c   1050]
DpMsgAdmin: Set patchno for this platform to 236
Release check o.K.

Sat Jun 15 09:41:41 2013
MBUF state ACTIVE
DpModState: change server state from STARTING to ACTIVE

Sat Jun 15 09:46:11 2013
DpSigInt: caught signal 2
DpHalt: shutdown server >sscprdsap_PRD_00                        < (normal)
DpModState: change server state from ACTIVE to SHUTDOWN
Stop work processes

Sat Jun 15 09:46:13 2013
Stop gateway
Stop icman
Terminate gui connections
wait for end of work processes
wait for end of gateway
waiting for termination of gateway ...

Sat Jun 15 09:46:15 2013
wait for end of icman
waiting for termination of icman ...

Sat Jun 15 09:46:16 2013
waiting for termination of icman ...

Sat Jun 15 09:46:17 2013
waiting for termination of icman ...

Sat Jun 15 09:46:18 2013
waiting for termination of icman ...

Sat Jun 15 09:46:19 2013
waiting for termination of icman ...

Sat Jun 15 09:46:21 2013
DpStartStopMsg: send stop message (myname is >sscprdsap_PRD_00                        <)
DpStartStopMsg: stop msg sent

Sat Jun 15 09:46:22 2013
DpHalt: sync with message server o.k.
detach from message server
***LOG Q0M=> DpMsDetach, ms_detach () [dpxxdisp.c   12996]
MBUF state OFF
MBUF component DOWN
cleanup EM
cleanup event management
cleanup shared memory/semaphores
Profile configuration error detected, use temporary corrected setup
Shared Pool 40: ipc/shm_psize_40 = 128000000 (too small)
Shared Pool 40: (smaller than min requirement 153676088)
Shared Pool 40: (estimated size assumed 156000000)
*** INFO  Shm 42 in Pool 40    17547 KB estimated     12037 KB real (   -5510 KB    -32 %)
removing request queue
***LOG Q05=> DpHalt, DPStop ( 8631) [dpxxdisp.c   11467]
*** shutdown completed - server stopped ***

Thanks in advance for any help on the topic.

Regards,

BIJOY

csaba_goetz
Contributor
0 Kudos

Hello Bijoy,

One more remark reg.  point

5) Stopped all services running under <SIDADM> in central instance and application server

sapstartsrv process needs to be stopped as well. This process may be started by root (by sapinit script while booting, see e.g. SAP note 936273 / 823941).

Best regards,

Adam

Former Member
0 Kudos

Check log files for rslgsend process.

Former Member
0 Kudos

Hello All,

Under  /var/adm/messages, I could see the error as follows:

SC[SUNW.sap_ci_v2,prd-sap-rg,prd-sap-ci-res,sap_ci_svc_start]: [ID 930059 daemon.error] /sapmnt/PRD/exe/startsap_sscprdsap_00: No such file or directory
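To pull every such path out of a large messages file, a small filter helps. This is a sketch only: `missing_paths` is a hypothetical name, and it assumes the entries look like the one above, with the path directly before ": No such file or directory".

```shell
# missing_paths < /var/adm/messages
# Print each distinct absolute path that a log line reports as
# "No such file or directory". Hypothetical helper.
missing_paths() {
    sed -n 's|.*] \(/[^ :]*\): No such file or directory.*|\1|p' | sort -u
}
```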

Regards,

BIJOY

csaba_goetz
Contributor
0 Kudos

Hello Bijoy,

we can see that the rslgsend process (se.sapPRD_DVEBMGS00) stopped

**** 2013/06/15 13:12:16 Child 9016 terminated with Status 2 . ****

but this is not the reason for the startup issue.

Refer to my previous comment as well:

One more remark reg.  point

5) Stopped all services running under <SIDADM> in central instance and application server

sapstartsrv process needs to be stopped as well. This process may be started by root (by sapinit script while booting, see e.g. SAP note 936273 / 823941).

Adam

Former Member
0 Kudos

:

sapstartsrv processes start (and must start) as the <sid>adm user, not root (-u option in the /usr/sap/sapservices file).

What is the relation between the sapstartsrv service (the administration and monitoring service for an SAP instance) and the SAP instance itself that could prevent the instance from starting?

Former Member:

Can you attach the dev_w* logs as well?

csaba_goetz
Contributor
0 Kudos

Hello,

Just an example:

probud2:bcsadm 51> ps -ef | grep sapstartsrv

bcsadm    4118     1  0 Jun11 ?        00:00:00 /usr/sap/BCS/SCS01/exe/sapstartsrv pf=/usr/sap/BCS/SYS/profile/START_SCS01_probud2 -D -u bcsadm

Process is started by root and running with uid sidadm.

How it must be running and how it is running are sometimes two different things...

When changing the kernel sapstartsrv must be stopped as well. This is important.
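To see whether any sapstartsrv is still around before the kernel is switched, the `ps -ef` output can be filtered. This is a sketch: `sapstartsrv_procs` is a hypothetical name, and it assumes sapstartsrv was started with a full path, as in the example above, which also keeps the `grep sapstartsrv` line itself out of the result.

```shell
# ps -ef | sapstartsrv_procs
# Read "ps -ef" output on stdin and print "<user> <pid> <executable>"
# for every process whose command is a path ending in /sapstartsrv.
# Hypothetical helper.
sapstartsrv_procs() {
    awk '{
        # start at field 3 so a two-word start time does not matter
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\/sapstartsrv$/) { print $1, $2, $i; break }
        }
    }'
}
```

An empty result means no sapstartsrv with a full path is running; anything listed should be stopped first.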

How is sapstartsrv connected to starting an SAP instance? It's very simple. Just have a look at the startsap script: it calls sapcontrol to start the SAP system (sapcontrol -nr NR -host HOST -prot PROT -function StartWait XXX YY). sapcontrol is the control program for sapstartsrv. Therefore, when sapcontrol (...) StartWait (or just -function Start) is called, the request goes to sapstartsrv, and sapstartsrv starts the SAP system. So if sapstartsrv is, e.g., not running (or hanging) when sapcontrol calls the StartWait function, SAP won't start either. Take a look at note 936273 for example.

Adam

Former Member
0 Kudos

Adam Csaba Goetz wrote:

> probud2:bcsadm 51> ps -ef | grep sapstartsrv
>
> bcsadm    4118     1  0 Jun11 ?        00:00:00 /usr/sap/BCS/SCS01/exe/sapstartsrv pf=/usr/sap/BCS/SYS/profile/START_SCS01_probud2 -D -u bcsadm
>
> Process is started by root and running with uid sidadm.
>
> How it must be running and how it is running are sometimes two different things...

Your sapstartsrv is started as the bcsadm user. You can't start the sapstartsrv service as the root user until you adjust the user profile (e.g. LD_LIBRARY_PATH to resolve dependencies; manual actions).

> How is sapstartsrv connected to starting an SAP instance? It's very simple. Just have a look at the startsap script: it calls sapcontrol to start the SAP system (sapcontrol -nr NR -host HOST -prot PROT -function StartWait XXX YY). sapcontrol is the control program for sapstartsrv. Therefore, when sapcontrol (...) StartWait (or just -function Start) is called, the request goes to sapstartsrv, and sapstartsrv starts the SAP system. So if sapstartsrv is, e.g., not running (or hanging) when sapcontrol calls the StartWait function, SAP won't start either. Take a look at note 936273 for example.

But in this case you simply get a web service method call error. sapstartsrv starts the SAP instance the same way you usually do with the startsap command (most likely). Moreover, on UNIX you can start an SAP instance without sapstartsrv running; startsap will start it during startup. I can't see how sapstartsrv could influence the result of the SAP instance startup (success or failure). Also, in cluster environments, startup of SAP instances is handed to the cluster software.

csaba_goetz
Contributor
0 Kudos

Hi,

> Your sapstartsrv is started as the bcsadm user. You can't start the sapstartsrv service as the root user until you adjust the user profile (e.g. LD_LIBRARY_PATH to resolve dependencies; manual actions).

Yes I can without any adjustment:

probud2:~ # ps -ef | grep sapstartsrv

bcsadm    4118     1  0 Jun11 ?        00:00:00 /usr/sap/BCS/SCS01/exe/sapstartsrv pf=/usr/sap/BCS/SYS/profile/START_SCS01_probud2 -D -u bcsadm

probud2:~ # kill 4118

probud2:~ # /usr/sap/BCS/SCS01/exe/sapstartsrv pf=/usr/sap/BCS/SYS/profile/START_SCS01_probud2 -D

probud2:~ # ps -ef | grep sapstartsrv

root     26365     1  0 11:11 ?        00:00:00 /usr/sap/BCS/SCS01/exe/sapstartsrv pf=/usr/sap/BCS/SYS/profile/START_SCS01_probud2 -D

But it is NOT about my SAP test system... It's about the logic and the possibilities.

Let's focus on Bijoy's question.

> But in this case you simply get a web service method call error. sapstartsrv starts the SAP instance the same way you usually do with the startsap command (most likely).

When you call startsap, it starts sapstartsrv as well. But what did I write before?

> This way if sapstartsrv is e.g. not running (or hanging) WHEN sapcontrol calls the StartWait function SAP won't start either.

sapstartsrv does not run -> startsap is called -> sapstartsrv gets started -> sapcontrol ... StartWait is called -> SAP will start (but this is NOT what I was talking about)

sapstartsrv does not run -> startsap is called -> sapstartsrv cannot be started (for any reasons) or hanging -> sapcontrol ... StartWait is called -> SAP won't start (this is what I was talking about)

> Moreover, on UNIX you can start an SAP instance without sapstartsrv running.

If you call sapstart pf=path/startup_profile in the background, it will. But this is not how it should be started, at least not with SAP NW release 700 or later, and it is not how startsap works for these releases.

Adam

csaba_goetz
Contributor
0 Kudos

Hello Bijoy,

> SC[SUNW.sap_ci_v2,prd-sap-rg,prd-sap-ci-res,sap_ci_svc_start]: [ID 930059 daemon.error] /sapmnt/PRD/exe/startsap_sscprdsap_00: No such file or directory

startsap_sscprdsap_00 is an alias of startsap, which should be available in the $HOME folder of the sidadm user and not in /sapmnt/PRD/exe. See e.g. this document (quite old but helpful).

- Does startsap_sscprdsap_00 exist in /home/prdadm? Or in /sapmnt/PRD/exe?

- Can you see any reference in the login scripts (.cshrc, ...) of prdadm (in /home/prdadm/.*) to 'alias startsap'?

- Is startsap_<host>_<nr> available in the other systems' /sapmnt/SID/exe folder?

Perhaps the file got lost during the steps

6) Renamed exe folder in /sapmnt/<PRD> to exe_backup

7) Renamed exe_new folder in /sapmnt/<SID> to exe

Try to copy it back from exe_backup to exe when you change the kernel.

Or review the aliases, make an alias startsap to $HOME/startsap_<host>_<nr> and copy it to $HOME (if it is still not there).

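The three questions above come down to finding out where the start script still exists. A minimal sketch for that check follows; `locate_script` is a hypothetical helper, and the directories in the example call are assumptions taken from this thread.

```shell
# locate_script NAME DIR...
# Print the full path for each DIR that contains an executable file
# called NAME; return 1 if none does. Hypothetical helper.
locate_script() {
    name=$1
    shift
    found=1
    for dir in "$@"; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            found=0
        fi
    done
    return $found
}

# Example call (paths are assumptions):
# locate_script startsap_sscprdsap_00 /home/prdadm /sapmnt/PRD/exe /sapmnt/PRD/exe_backup
```

If the script only shows up under exe_backup, copying it into the location the cluster resource expects would be the next thing to try.
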
Best regards,
Adam
Former Member
0 Kudos

> sapstartsrv does not run -> startsap is called -> sapstartsrv cannot be started (for any reasons) or hanging -> sapcontrol ... StartWait is called -> SAP won't start (this is what I was talking about)

In that case you receive web service method call errors (like NIECONN_REFUSED). Moreover, according to the log provided by Former Member, sapstartsrv was started successfully. But for some reason the dispatcher was stopped, as dev_disp shows:

> DpSigInt: caught signal 2
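
Before reading too much into a trace, it may be worth confirming which kernel wrote it. The dev_disp posted above is labelled "(old)" and its header shows patchno 236, with timestamps earlier than the 13:12 start attempt in sapstart.log. A small sketch for reading that header field (`trace_patchno` is a hypothetical name):

```shell
# trace_patchno < dev_disp
# Print the kernel patch number from the header of a developer trace,
# i.e. the value on the first line whose first word is "patchno".
# Hypothetical helper.
trace_patchno() {
    awk '$1 == "patchno" { print $2; exit }'
}
```

If the value is still the old patch level, the trace was written before the kernel switch and does not describe the failed start.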