Dispatcher doesn't reroute requests to 2nd server node when 1st node down

Former Member
0 Kudos

Hello,

We have a J2EE Engine with 2 server nodes.

I have made a little test: I brought one of the nodes down and left the other one running.

The result was that when users tried to log into the portal, they received the 500 error message: Dispatcher running but server not connected.

Isn't the whole idea of the dispatcher to reroute requests to the 2nd server node once the 1st one is down? Perhaps something is wrong in my configuration?

I don't think this should matter but on this server one node is configured with debug on and the other is not.

I'll be happy to hear your opinion on that!

Roy

Accepted Solutions (0)

Answers (2)

Former Member
0 Kudos

Here is one log entry from the dispatcher defaulttrace that looks related:

14.06.2008 09:01:59.348 WARNING JMX connector exception occurred while processing external JMX request [ JMX request (java) v1.0 len: 314 | src: cluster target-node: 5648100 req: invoke params-number: 4 params-bytes: 0 | :name=com.sap.portal.prt.bridge.service.mbeans.PRTMBeanRuntime,j2eeType=PRTBridge_JMX_SECTION,SAP_J2EEClusterNode=5648100,SAP_J2EECluster="" null null null ]

[EXCEPTION]

com.sap.engine.services.jmx.exception.JmxConnectorException: Unable to de-serialize request parameters, message [ JMX request (java) v1.0 len: 314 | src: cluster target-node: 5648100 req: invoke params-number: 4 params-bytes: 0 | :name=com.sap.portal.prt.bridge.service.mbeans.PRTMBeanRuntime,j2eeType=PRTBridge_JMX_SECTION,SAP_J2EEClusterNode=5648100,SAP_J2EECluster="" null null null ]

at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:537)

at com.sap.engine.services.jmx.RequestMessage.getParams(RequestMessage.java:586)

at com.sap.engine.services.jmx.MBeanServerInvoker.invokeMbs(MBeanServerInvoker.java:90)

at com.sap.engine.services.jmx.JmxServiceConnectorServer.receiveWait(JmxServiceConnectorServer.java:172)

at com.sap.engine.core.service630.context.cluster.message.MessageListenerWrapper.process(MessageListenerWrapper.java:81)

at com.sap.engine.core.cluster.impl6.ms.MSListenerThread.run(MSListenerThread.java:47)

at com.sap.engine.frame.core.thread.Task.run(Task.java:64)

at com.sap.engine.core.thread.impl6.SingleThread.execute(SingleThread.java:78)

at com.sap.engine.core.thread.impl6.SingleThread.run(SingleThread.java:148)

Caused by: javax.management.InstanceNotFoundException: MBean with name com.sap.default:name=com.sap.portal.prt.bridge.service.mbeans.PRTMBeanRuntime,j2eeType=PRTBridge_JMX_SECTION,SAP_J2EEClusterNode=5648100,SAP_J2EECluster=P1P not found in repository

at com.sap.pj.jmx.server.MBeanServerImpl.getClassLoaderFor(MBeanServerImpl.java:1408)

at com.sap.pj.jmx.server.interceptor.MBeanServerWrapperInterceptor.getClassLoaderFor(MBeanServerWrapperInterceptor.java:455)

at com.sap.engine.services.jmx.CompletionInterceptor.getClassLoaderFor(CompletionInterceptor.java:576)

at com.sap.pj.jmx.server.interceptor.BasicMBeanServerInterceptor.getClassLoaderFor(BasicMBeanServerInterceptor.java:438)

at com.sap.jmx.provider.ProviderInterceptor.getClassLoaderFor(ProviderInterceptor.java:330)

at com.sap.engine.services.jmx.RedirectInterceptor.getClassLoaderFor(RedirectInterceptor.java:501)

at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.getClassLoaderFor(MBeanServerInterceptorChain.java:443)

at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:531)

... 8 more

Here is another entry we see, but we are not sure if it's related:

14.06.2008 13:37:37.907 ERROR Connection [1093120] is NOT removed! serverToConnections = [{5641951={}, 5641952={}, 5641953={}, 21570950={}, 21570951={}, 5641900={}, 5638850={}, 5638851={}, 5638852={}, 5638853={}, 21570900={}, 5638800={}, 5635750={}, 5635751={}, 5635752={}, 5635753={}, 5651250={}, 5651251={}, 5651252={}, 5651253={}, 5635700={}, 5648150={com.sap.engine.core.manipulator.TCPRunnableConnection@8e3cf25[closed=false, initialize=false, markedForClose=false, passCount=0, replyThreadRunning=false, replyThreadToStart=false, readThreadRunning=true, serviceName=http, synchronous=1, workingThreads=1, inProcess=false, dataFromServer=false, connectionID=680448]}, 5651200={}, 5648151={com.sap.engine.core.manipulator.TCPRunnableConnection@8e3cf25[closed=false, initialize=false, markedForClose=false, passCount=0, replyThreadRunning=false, replyThreadToStart=false, readThreadRunning=true, serviceName=http, synchronous=1, workingThreads=1, inProcess=false, dataFromServer=false, connectionID=680448]}, 5648152={com.sap.engine.core.manipulator.TCPRunnableConnection@8e3cf25[closed=false, initialize=false, markedForClose=false, passCount=0, replyThreadRunning=false, replyThreadToStart=false, readThreadRunning=true, serviceName=http, synchronous=1, workingThreads=1, inProcess=false, dataFromServer=false, connectionID=680448]}, 5648153={com.sap.engine.core.manipulator.TCPRunnableConnection@8e3cf25[closed=false, initialize=false, markedForClose=false, passCount=0, replyThreadRunning=false, replyThreadToStart=false, readThreadRunning=true, serviceName=http, synchronous=1, workingThreads=1, inProcess=false, dataFromServer=false, connectionID=680448], com.sap.engine.core.manipulator.TCPRunnableConnection@23fe500a[closed=false, initialize=false, markedForClose=false, passCount=0, replyThreadRunning=false, replyThreadToStart=false, readThreadRunning=true, serviceName=http, synchronous=2, workingThreads=0, inProcess=true, dataFromServer=false, connectionID=194816]}, 5645050={}, 5645051={}, 5645052={}, 5645053={}, -1={}, 5641950={}, 5645000={}}]

Thanks,

Tom

Former Member
0 Kudos

That is the idea. How about testing your message server to see if it still shows the node as online?

http://<server>:<msg http port>/msgserver/text/logon

jwise
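The message server check suggested above can also be scripted. The following is a minimal sketch that only builds the URL pattern quoted in the post and returns whatever text comes back; it makes no assumption about the response format. The host name and port in the usage comment are placeholders, not values from this thread.

```python
from urllib.request import urlopen


def build_logon_url(host: str, msg_http_port: int) -> str:
    """Build the message server URL using the pattern quoted in the post."""
    return f"http://{host}:{msg_http_port}/msgserver/text/logon"


def fetch_logon_list(host: str, msg_http_port: int, timeout: float = 5.0) -> str:
    """Return the raw text reported by the message server, without parsing it."""
    with urlopen(build_logon_url(host, msg_http_port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (placeholder host and port; substitute your own system's values):
#   print(fetch_logon_list("portalhost", 8101))
```

Running this before and after stopping a node would show whether the message server still lists the stopped node.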

Former Member
0 Kudos

Hi Joshua,

No it doesn't.

But what I have noticed is this: if I bring down server1 and stay with server0, which is in debug mode, the whole portal seems to be down. BUT if I bring down server0 and stay with server1, which is in productive mode, the whole portal seems to be up.

So I assume the fact that the server is in its debug settings does matter.

What I need to know is whether I am right here and, if I am, why this is happening and whether a debug server can participate in a cluster environment.

Former Member
0 Kudos

From what I have seen on my own systems:

Say user A is on server 0 and user B is on server 1.

Server 0 goes down, and user A or anyone else on that server gets kicked. They will get anything from iView errors to 500 server errors. User B and other users go on like nothing happened.

User C attempts to log into the portal and gets a 500 error. Why? I have seen the dispatcher favor server 0. If server 0 is busy, requests go to server 1. If server 0 is down, the dispatcher will still attempt to route new requests to server 0.

NOW if user A is on server 0 and user B is on server 1.

Server 1 goes down. User B gets kicked with anyone else on server 1. User A keeps on going.

If user C attempts to log on, they can.

A separate web dispatcher on another box may cure this.

Former Member
0 Kudos

Hello David,

So what you are saying is that if server0 is down, then no matter how many other server nodes are up and running, new requests will fail until server0 is back up.

In my case it is happening with server1 and not server0.

Even so, the whole idea of the dispatcher is to reroute requests to a running server node, and that doesn't seem to work.

What I would like to know is whether this is how it is supposed to work, because it doesn't seem OK to me...

Roy

Former Member
0 Kudos

Did you do your install with Rapid Installer?

Former Member
0 Kudos

Nope.

Former Member
0 Kudos

How much heap memory does the dispatcher have?

Former Member
0 Kudos

2048K why?

Former Member
0 Kudos

Is that your server heap or your dispatcher?

Former Member
0 Kudos

Apologies, server.

Dispatcher is on 1024K

Message was edited by:

Roy Cohen

Former Member
0 Kudos

Whoa, that's a lot!

Try max heap at 256 and:

-XX:NewSize=85m

-XX:MaxNewSize=85m

-XX:NewSize=28m

-XX:MaxNewSize=28m

-XX:+DisableExplicitGC

-Xms170m

And test it again.

Former Member
0 Kudos

Well, we have plenty of memory on the machine why not use it?

Anyway, before I change any of the JVM parameters, can you please tell me why you think different memory settings would solve the problem?

Former Member
0 Kudos

I have found that the dispatcher does not need that much memory to operate. When I set the memory lower, garbage collection occurred more often, flushing out memory.

The theory is that if the memory is flushed and stops pointing to an inactive server, it may not save the users on the dead server, but new logons will go to the active one.

Also, shift that extra memory to the servers then.

Former Member
0 Kudos

Hi David,

I don't think I agree with this theory, because a dispatcher should detect that a server is down no matter how much memory it has. Your settings will cause more major GC collections on the tenured generation, which will slow down the responses.

(The servers have more than enough memory by the way...:))

Former Member
0 Kudos

Hmmmm, sounds like someone has a beefy server and budget to match.

Ever build a web dispatcher? I never have, but from what I can gather they are a must for a cluster.

Former Member
0 Kudos

Agree with the beefy budget

Regarding the web dispatcher, that still doesn't answer my question. You should look at the dispatcher and the two server nodes as a small logical cluster, if you will, and as such it should work. If you think about it, it's not that hard either: just ping the server before forwarding the request... I don't believe it is not there already; what I don't understand is why it isn't working.
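The "ping the server before forwarding" idea can be sketched roughly as follows. This is a generic illustration, not how the J2EE dispatcher is actually implemented; the `Node` shape, `tcp_alive`, and `pick_live_node` are hypothetical names introduced here, and a plain TCP connect stands in for whatever health check a real dispatcher would use.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Node = Tuple[str, int]  # (host, port) of a server node; hypothetical shape


def tcp_alive(node: Node, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the node can be opened."""
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


def pick_live_node(
    nodes: Sequence[Node],
    is_alive: Callable[[Node], bool] = tcp_alive,
) -> Optional[Node]:
    """Return the first node that answers the probe, or None if none do."""
    for node in nodes:
        if is_alive(node):
            return node
    return None
```

With a check like this, a new request would skip a stopped node instead of being sent to it. The probe is passed in as a parameter so that a different check can be swapped in.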

Former Member
0 Kudos

In the error log of the dispatcher, what is it saying? Is it giving you an httpIOexception, or out of memory, or something else?

Former Member
0 Kudos

Where can I find the dispatcher log files on the server?

Former Member
0 Kudos

Found it

I'll need to duplicate the problem and have another look.

Will let you know

Former Member
0 Kudos

I have just looked at the log (I haven't duplicated the problem yet, but I tested it 2 days ago and the trace is still valid).

I don't see these exceptions. I see many others, of course, but not the ones you mentioned...

Former Member
0 Kudos

Hi Roy,

>> So, I assume the fact that the server is in it's debug settings does matter.

Yes, it surely does. When a server is switched to debug mode, it should be excluded from the server load balancing (i.e. to ensure that no productive user requests can be dispatched there).

The reasoning can be found in the following paragraph of the debugging documentation (http://help.sap.com/saphelp_nw04/helpdata/en/d4/31e24044b80b06e10000000a155106/frameset.htm):

"The problem comes with reaching breakpoints in the application being debugged. When the execution of such an application reaches a breakpoint, the Java Virtual Machine of the application is stopped. When the application runs on the J2EE Engine, this means that the entire server process which is handling the request will stop too. One server process may be processing many client requests at a time. So if the process is blocked because a debug request reached a breakpoint, all other requests will be blocked too, as they run in the same VM. If some of them are productive requests (meaning a request from a real user), this will be a problem. Furthermore, with the J2EE Engine, the entire cluster may be stopped. The reason is that the rest of the processes may wait for a response from the node processing the request – this could be cluster communication using the same database, holding a cluster-wide lock. For productive systems, this may be a critical problem."

So, if you would like to test load balancing, you should have at least two servers running in normal mode. Then, if you kill one of the servers, the user requests should be redirected to the second available server.

Hope that helps!
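The behaviour described in this answer can be illustrated with a toy model. This is only a sketch under the assumption that debug nodes are simply filtered out of the routing candidates; `ServerNode`, `eligible_nodes`, and `route` are hypothetical names, and the real engine's selection logic is not shown in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerNode:
    name: str
    running: bool
    debug: bool  # True if the node is switched to debug mode


def eligible_nodes(nodes: List[ServerNode]) -> List[ServerNode]:
    """Keep only nodes that are running and not in debug mode."""
    return [n for n in nodes if n.running and not n.debug]


def route(nodes: List[ServerNode]) -> str:
    """Return the name of the target node, or the error text if none is eligible."""
    targets = eligible_nodes(nodes)
    if not targets:
        return "500 Dispatcher running but server not connected"
    return targets[0].name


# Setup from this thread: server0 in debug mode, server1 productive.
server1_stopped = [ServerNode("server0", True, True), ServerNode("server1", False, False)]
server0_stopped = [ServerNode("server0", False, True), ServerNode("server1", True, False)]
print(route(server1_stopped))
print(route(server0_stopped))
```

In this model, stopping server1 leaves only the debug node, so no node is eligible and the error is returned, while stopping server0 still leaves server1 available.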

Former Member
0 Kudos

That's exactly what I was thinking; I just couldn't find the formal reference and justification for it!

I will check it, thanks.

Former Member
0 Kudos

Hi Roy

We are having a similar issue. Can you please explain how you resolved it?

Thank You

Reddy