Strange sapinst start problem on SLES10 SP1

Former Member
0 Kudos

Hi folks

I am trying to start sapinst. It starts, and even the GUI comes up correctly, but after about 10 seconds sapinst terminates, stating that the GUI did not log in properly. The strange thing is that I was able to start sapinst correctly once or twice.

host1:/Installation_Master_6.20_6.40_07_07/IM_LINUX_X86_64/SAPINST/UNIX/LINUXX86_64 # ./sapinst                   
[==============================] | extracting...  done!

guiengine: no GUI connected; waiting for a connection on host migzm210, port 21212 to continue with the installation

guiengine: login in process.
..............................

guiengine: login timeout; the client was unable to establish a valid connection
CSynEvent::~CSynEvent: an error occured;: Success

Suse, sapinst and java versions:

host1:/Installation_Master_6.20_6.40_07_07/IM_LINUX_X86_64/SAPINST/UNIX/LINUXX86_64 # cat /etc/SuSE-release 
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 1

host1:/Installation_Master_6.20_6.40_07_07/IM_LINUX_X86_64/SAPINST/UNIX/LINUXX86_64 # ./sapinst -v
[==============================] | extracting...  done!

This is SAPinst, version 642, build 917371
compiled on Jul 30 2007, 02:42:28

host1:/Installation_Master_6.20_6.40_07_07/IM_LINUX_X86_64/SAPINST/UNIX/LINUXX86_64 # java -version
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 2.2)
IBM J9SE VM (build 2.2, J2RE 1.4.2 IBM J9 2.2 Linux amd64-64 j9xa64142ifx-20070808 (JIT enabled)
J9VM - 20070807_1500_LHdSMr
JIT  - r7_level20070315_1745)

Has anyone had this before?

Regards

Michael

Accepted Solutions (1)

Former Member
0 Kudos

I'm not sure, but port 21212 on host migzm210 might still be in use by another process, e.g. one left over from a previous, aborted run of sapinst.

So look for such a process and stop it. If everything else fails, a reboot could sort it out.

Hope this helps

Former Member
0 Kudos

Nope, when the port is in use, this message appears:

guiengine: call to bind() for socket 3 failed. Address already in use
ERROR      2008-04-11 12:22:40 [iaxxgenimp.cpp:532]
           init() 
FGE-00006  Attempt to open a communication port connection failed. Check whether the port 21212 is already in use.

ERROR      2008-04-11 12:22:40 
FCO-00034  An error occurred during the installation. Problem: error in GUI server subsystem.

I tried different ports as well, and they behave the same way. I can even see the GUI connect, but the server still disconnects after 10 seconds. I also started the server with --nogui and connected from another server: same problem.

Regards

Michael

markus_doehr2
Active Contributor
0 Kudos

Try

netstat -an

and check whether the port is still in use.

Markus
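
The netstat check above can be narrowed to a single port. This is only a minimal sketch: it assumes the Linux `netstat -an` column layout (local address in column 4, state in column 6) and uses port 21212 from the log earlier in the thread. The function name `listeners_on_port` is made up for illustration.

```shell
# Hypothetical helper: filter `netstat -an` output (read from stdin) down to
# TCP sockets that are listening on one port.
listeners_on_port() {
    port=$1
    # Column 4 is the local address (e.g. 0.0.0.0:21212 or :::21212),
    # column 6 is the connection state on TCP lines.
    awk -v port="$port" '$6 == "LISTEN" && $4 ~ (":" port "$")'
}

# Usage on a live system (not run here):
#   netstat -an | listeners_on_port 21212
```

No output means nothing is listening on that port. One line while sapinst is waiting for the GUI is what you would expect to see.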

Former Member
0 Kudos

Hi Markus

Please note that I do NOT have the 'port in use' problem.

We installed the newest Java version according to note 861215, but sapinst still closes 10 seconds after the GUI connects.

Regards

Michael

markus_doehr2
Active Contributor
0 Kudos

The only thing left could be that accept() fails because it gets an EPERM. Is your firewall active? Is SELinux active? If so, disable it and try again.

Markus

Former Member
0 Kudos

Hi again

I checked iptables: all chains are on ACCEPT policy, and I can telnet to port 21212 without any problem. SELinux is not active either.

I also got the latest sapinst from SWDC (which was actually older than the one on the master DVD): same problem. Even a NW 7.0 sapinst is not working. I am going to open an OSS message now. Thanks so far, but I am still waiting for input.

Best regards

Michael

markus_doehr2
Active Contributor
0 Kudos

Do you see the same error if you completely deactivate the firewall? I've installed a LOT of systems using sapinst on SLES 10 SP1 and I've never come across that problem...

Markus

Former Member
0 Kudos

Hi again

The firewall is inactive. In the meantime I found out that I am able to start sapinst correctly as the sidadm user, but it still doesn't work as root. I am now checking for differences between the two users.

Regards Michael

markus_doehr2
Active Contributor
0 Kudos

This is REALLY strange.

I installed a system just yesterday where everything worked as expected...

Another "guess":

- check the ulimits for root (/etc/security/limits.conf) + ulimit -a

- check JAVA_HOME for the root user

- use a "remote" sapinst

By the last point I mean: start sapinst without a DISPLAY variable set, run sapinst on another machine and connect to the Linux system. That's how I do the installation if I have no possibility to run X remotely...

Markus
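
Since sapinst works as sidadm but not as root, the first two checks above can be turned into a side-by-side comparison. This is a rough sketch, not an SAP tool: `snapshot_settings` is a hypothetical helper, and the list of variables (JAVA_HOME, DISPLAY and LD_ASSUME_KERNEL, which comes up elsewhere in this thread) is an assumption about what is worth comparing.

```shell
# Hypothetical helper: print the per-user settings discussed in this thread
# so that two users' output can be compared with diff.
snapshot_settings() {
    echo "user=$(id -un)"
    echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
    echo "DISPLAY=${DISPLAY:-<unset>}"
    echo "LD_ASSUME_KERNEL=${LD_ASSUME_KERNEL:-<unset>}"
    # Resource limits of the current shell, including "pending signals".
    ulimit -a
}

# Usage (not run here; <sid>adm stands for your SAP admin user):
#   snapshot_settings > /tmp/settings.root      # in a root login shell
#   snapshot_settings > /tmp/settings.sidadm    # in a <sid>adm login shell
#   diff /tmp/settings.root /tmp/settings.sidadm
```

Run it in a real login shell for each user so the profile scripts have been applied; otherwise the comparison may miss exactly the settings that differ.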

Answers (7)

Former Member
0 Kudos

Hi,

I'm getting exactly this problem on a RHEL cluster.

Can you give me more info on the fix from SUSE so I can try to match it to RHEL?

Many Thanks

Former Member
0 Kudos

Sorry, I cannot point you to the exact SUSE bug, but I suggest you go directly to your Red Hat support with the signal queue information.

Besides that, the Linux NW 7.0 SR3 version worked for me as well; it seems to ignore the handshake check.

Regards Michael

Former Member
0 Kudos

Hi, I also ran into this problem with NW 7.0 SR3. My host is AIX 5.3 and the client is Windows XP, and I have tried Java versions from 1.4.2_08 to 1.6. Is there any advice on this issue?

hannes_kuehnemund
Active Contributor
0 Kudos

Hi,

the SAP LinuxLab is currently investigating what is going wrong here. We were able to reproduce the problem and are working together with Novell to fix the issue. When we're done, I'm going to post the solution. Until then, I have locked the topic.

Thanks,

Hannes

Former Member
0 Kudos

Hi everybody

The SAP LinuxLab and SUSE were able to reproduce the error. The problem turned out to be a Linux kernel bug. If you are having the problem, please check whether you see strange signal queue values:

cat /proc/<pid>/status

where <pid> is the process ID of the sapinst process. You most probably have the bug if you see something like this: SigQ: 18446744073709545431/71679

Normally the first number (signals currently queued) must not exceed the second (the limit). This is OK: SigQ: 0/71679

SUSE will include the fix in the next maintenance update of the kernel, so be sure to apply it if you run into the issue.

Thanks to everybody for your valuable input! Best regards

Michael
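
The SigQ check described above can be scripted. The sketch below is an illustration only; the function names are hypothetical, and the rule it applies ("queued greater than limit means suspicious") is simply the rule of thumb from the post. As a side note, 18446744073709545431 equals 2^64 - 6185, which would be consistent with a counter that dropped below zero and wrapped around as an unsigned 64-bit value. A queue that looks over its limit might also explain why tgkill() returned EAGAIN in the strace excerpt in this thread, but that is an inference, not something confirmed here.

```shell
# Hypothetical helper: succeed (exit 0) if a "SigQ: queued/limit" line shows
# more queued signals than the limit allows.
sigq_is_suspicious() {
    # Split e.g. "SigQ: 0/71679" on ":", "/" and whitespace, then compare
    # numerically. awk uses floating point, which is precise enough to tell
    # a wrapped counter near 2^64 from a limit in the tens of thousands.
    printf '%s\n' "$1" |
        awk -F'[:/ \t]+' '$1 == "SigQ" { bad = ($2 + 0 > $3 + 0) } END { exit !bad }'
}

# Hypothetical wrapper: check a running process by PID.
check_pid_sigq() {
    line=$(grep '^SigQ:' "/proc/$1/status") || return 2
    echo "$line"
    sigq_is_suspicious "$line"
}

# Usage (not run here):
#   check_pid_sigq <pid> && echo "SigQ looks wrong, check for a kernel update"
```

Plain shell arithmetic is avoided on purpose, because the wrapped value does not fit into a signed 64-bit integer.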

Former Member
0 Kudos

Hi,

check /etc/hosts for the localhost entry and try setting it to the server's IP address (not 127.0.0.1). I had a similar problem and somehow solved it by changing the JAVA_HOME variable and editing /etc/hosts.

Regs,

FS
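
One way to read the /etc/hosts suggestion above is "make sure the server's hostname does not resolve to a loopback address". That reading is an assumption, and so is the helper below (`maps_to_loopback` is a made-up name). It only inspects hosts-file lines passed on stdin and changes nothing.

```shell
# Hypothetical helper: read hosts-file lines on stdin and succeed (exit 0)
# if NAME appears on a line whose address starts with 127.
maps_to_loopback() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # ignore trailing comments
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    '
}

# Usage (not run here):
#   maps_to_loopback "$(hostname)" < /etc/hosts && echo "hostname maps to loopback"
```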

Former Member
0 Kudos

Michael

the remote sapinst gui was started from a Win2k PC.

If I start sapinst with SAPINST_START_GUI=false as either root or my adm user, my remote sapinst connection fails with the SSL error.

I'm able to continue using sapinst running locally on the SLES10 server, so although it is a pain, it is not a showstopper.

regards,

Stephen

Former Member
0 Kudos

Hi

I'm also having the same problem when using the nogui option.

When I start my local sapinst and connect to the remote server, it times out after 10 seconds with:

Network input/output exception has occurred: Remote host closed connection during handshake 
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake

I also had an issue when starting sapinst on my Linux SLES10 server as root without setting the nogui option. However, that works now.

I still have the SSL handshake issue.

Former Member
0 Kudos

Well, I can update you with my status:

After several exchanges with the developer, he provided a sapinst that deliberately skips the handshake check. They are still figuring out what is going wrong. I will post here when the issue is finally resolved.

@Stephen: I am not fully convinced that we have exactly the same problem, although we obviously both get an error during the GUI-to-sapinst connection handshake. In our case the problem occurs whether or not -nogui is specified. Even worse, if we start sapinst as sidadm it always works, but if we start it as root the handshake succeeds only very rarely. Did you start the remote GUI on a SLES10 box as well, or from a Windows client?

Regards

Michael

hannes_kuehnemund
Active Contributor
0 Kudos

Hey Michael,

this might not be related to the error, but it is worth mentioning. Instead of calling sapinst this way

host1:/Installation_Master_6.20_6.40_07_07/IM_LINUX_X86_64/SAPINST/UNIX/LINUXX86_64 # ./sapinst

please use the following method instead:

host1:~ # mkdir /tmp/sapinst_install
host1:~ # cd /tmp/sapinst_install
host1:/tmp/sapinst_install # /Installation_Master_6.20_6.40_07_07/IM_LINUX_X86_64/SAPINST/UNIX/LINUXX86_64/sapinst

Thanks,

Hannes

Former Member
0 Kudos

@Markus, Hannes: I tried your suggestions, no luck so far. I checked the limits, JAVA_HOME, and the mkdir /tmp/sapinst_install approach. I also tried with a remote GUI (it is possible to start sapinst with --nogui). The problem seems to be sapinst itself, not the GUI. As soon as I connect with the GUI, sapinst terminates after a short time.

I have not had an answer from SAP support so far, but I am waiting for some trace and debug options for sapinst.

Thanks so far, i will post any new findings, regards

Michael

markus_doehr2
Active Contributor
0 Kudos

What you could also try is

strace -F <sapinst> >& strace_sapinst.log &

and then check the last lines of the log. Maybe we can find out which syscall is failing.

Markus

Former Member
0 Kudos

Hi Markus

We had to add -f for the forked processes to work:

strace -fF <sapinst> 2> strace_sapinst.log &

The problem seems to be a child process or thread that dies. Here is the relevant part:

[pid  9199] clone(Process 9200 attached
child_stack=0x42003260, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x420039d0, tls=0x42003940, child_tidptr=0x420039d0) = 9200
[pid  9199] recvfrom(5,  <unfinished ...>
[pid  9200] write(2, "\nguiengine: login in process.", 29
guiengine: login in process.) = 29
[pid  9200] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid  9200] rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
[pid  9200] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  9200] nanosleep({1, 0},  <unfinished ...>
[pid  9199] <... recvfrom resumed> "\0s<?xml version=\"1.0\"?><sapinstg"..., 1000, 0, NULL, NULL) = 117
[pid  9199] getpeername(5, {sa_family=AF_INET, sin_port=htons(21730), sin_addr=inet_addr("146.67.64.232")}, [841813590032]) = 0
[pid  9199] futex(0x2adcd4965f50, FUTEX_WAKE, 2147483647) = 0
[pid  9199] sendto(5, "\0`<?xml version=\"1.0\" encoding=\""..., 98, 0, NULL, 0) = 98
[pid  9199] tgkill(9160, 9200, SIGRTMIN) = -1 EAGAIN (Resource temporarily unavailable)

We suspect a problem with NPTL. Under SLES9 (which always works) we always set LD_ASSUME_KERNEL. Under SLES10 this is not possible anymore.

Regards, Michael
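
In a long strace log, the failing call is easier to spot if you keep only the lines where a syscall returned -1. This is a small sketch under the assumption that the log uses strace's usual "= -1 ERRNO (description)" suffix, as the tgkill line above does; `failed_syscalls` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only strace lines for syscalls that failed,
# i.e. lines containing "= -1 ERRNO (description)".
# Reads the files given as arguments, or stdin if there are none.
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+ \(' "$@"
}

# Usage (not run here):
#   failed_syscalls strace_sapinst.log | tail -n 20
```

Many failed calls in such a log are harmless (for example ENOENT while searching library paths), so the entries closest to the point where the process exits are usually the ones to look at first.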

markus_doehr2
Active Contributor
0 Kudos

I have installed/copied a lot of systems using SLES 10 SP1 and I haven't seen that error...

Do you use the SuSE provided IBM JDK (which is known to not work correctly) or do you have the one from the SAP site?

Markus

hannes_kuehnemund
Active Contributor
0 Kudos

Dear Michael,

please do not set LD_ASSUME_KERNEL on SLES10. What it does changed from SLES9 to SLES10 (actually, it changed with the kernel version). Please remove any LD_ASSUME_KERNEL settings when using SLES10. SAP's sapinst should be fully aware of this and will not set LD_ASSUME_KERNEL. To verify: do you set LD_ASSUME_KERNEL in any of the profiles you are using (you mentioned that sapinst works for sidadm but not for root)?

Thanks,

Hannes

Former Member
0 Kudos

@Markus: J2RE 1.4.2 IBM build j9xa64142-20080130 (SR10). This is the latest version mentioned in SAP note 861215. Which one do you have?

@Hannes: Correct, we are not using LD_ASSUME_KERNEL under SLES10. It won't work, because the libraries it needs are not there anymore.

Regards, Michael

Former Member
0 Kudos

146.67.64.23

Is this the correct IP address of host migzm210, and is it on the correct eth interface?

Regards

Manuel

markus_doehr2
Active Contributor
0 Kudos

Do not use the SuSE provided Java installation but the one on the site given in

note 861215 - Recommended Settings for the Linux on AMD64/EM64T JVM

It contains special fixes for SAP installations.

Are you installing on the console or remotely?

Markus