
Full GC on java server nodes

Former Member
0 Kudos

Hi guys,

My portal system is on SAP NW 7.0 EHP2 and the backend is on ECC 6 EHP6. Recently, the portal system has been getting a lot of Full GCs and concurrent mode failures. I have two server nodes, each with a 3 GB heap (it was 2 GB initially). When the Full GCs and concurrent mode failures occur, users cannot work in the portal application and it times out. It only becomes operational again long after a heap dump occurs, or when I restart the instance manually. Please suggest how I can resolve this issue.

Additional info: the portal is a Tax Officer Portal used to view, enter and amend tax returns. Since the portal is a Java stack, my ECC system also uses it for ADS (Z-reports rendered as PDF).

So the issue is that I cannot determine what is causing the frequent Full GCs and the OOM heap dumps. I tried analyzing with Memory Analyzer, but the results were not clear enough. Please advise.

Regards.

Accepted Solutions (0)

Answers (6)

Reagan
Advisor
0 Kudos

I suggest you check the memory allocated to this particular VM.

There should be a way to find that out; then check whether the memory allocated to the VM is larger than the combined memory allocated to the database and SAP.

Regards

RB

Former Member
0 Kudos

Hi all,

Regarding the main memory for the VM, please note that this VM shares the 256 GB of the main machine (global zone). The main machine also hosts other VMs. I can see how much memory the VM is consuming (currently around 13 GB). What I can do is cap the memory of the portal server at something like 32 GB, so that those 32 GB are reserved for this portal server.
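For reference, a minimal sketch of setting such a cap with zonecfg on Solaris 10 or later (the zone name portal-zone is hypothetical; substitute the real one):

zonecfg -z portal-zone
zonecfg:portal-zone> add capped-memory
zonecfg:portal-zone:capped-memory> set physical=32G
zonecfg:portal-zone:capped-memory> end
zonecfg:portal-zone> commit
zonecfg:portal-zone> exit

The physical cap is enforced by rcapd; a running zone may need a reboot (or an rcapadm adjustment) before the cap takes effect.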

regards

Former Member
0 Kudos

Hi all,

Just to give some feedback: I raised the issue with SAP. SAP said it is a bug for which a particular patch (SAP J2EE ENGINE 7.02) needs to be applied.

Thanks for all the helpful tips.

regards.

Former Member
0 Kudos

To answer your question, we need logs for analysis. Can you provide them?

Former Member
0 Kudos

Hi all,

In the morning it was working OK, but just a few minutes ago server0 started producing this (std_server0.out):

58103.556: [GC 58103.556: [ParNew: 451883K->36605K(512000K), 0.1618678 secs] 2472347K->2058569K(3043328K), 0.1623421 secs] [Times: user=2.29 sys=0.01, real=0.16 secs]
58107.064: [GC 58107.065: [ParNew: 446205K->36078K(512000K), 0.1622301 secs] 2468169K->2060564K(3043328K), 0.1629539 secs] [Times: user=2.27 sys=0.01, real=0.16 secs]
58110.573: [GC 58110.573: [ParNew: 445678K->36381K(512000K), 0.1551900 secs] 2470164K->2062137K(3043328K), 0.1556522 secs] [Times: user=2.10 sys=0.01, real=0.16 secs]
58110.732: [GC [1 CMS-initial-mark: 2025756K(2531328K)] 2069865K(3043328K), 0.0710652 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]
58110.804: [CMS-concurrent-mark-start]
58114.084: [GC 58114.085: [ParNew: 445981K->35487K(512000K), 0.1408452 secs] 2471737K->2062783K(3043328K), 0.1414961 secs] [Times: user=1.93 sys=0.01, real=0.14 secs]
58117.622: [GC 58117.622: [ParNew: 445087K->42786K(512000K), 0.1572854 secs] 2472383K->2071419K(3043328K), 0.1577878 secs] [Times: user=2.12 sys=0.01, real=0.16 secs]
58120.534: [CMS-concurrent-mark: 9.319/9.730 secs] [Times: user=24.35 sys=0.22, real=9.73 secs]
58120.535: [CMS-concurrent-preclean-start]
58120.576: [CMS-concurrent-preclean: 0.039/0.041 secs] [Times: user=0.09 sys=0.00, real=0.04 secs]
58120.577: [CMS-concurrent-abortable-preclean-start]
58121.073: [GC 58121.073: [ParNew: 452386K->38914K(512000K), 0.1523676 secs] 2481019K->2069226K(3043328K), 0.1528477 secs] [Times: user=2.06 sys=0.01, real=0.15 secs]
58123.710: [CMS-concurrent-abortable-preclean: 2.938/3.133 secs] [Times: user=9.03 sys=0.11, real=3.13 secs]
58123.716: [GC[YG occupancy: 368254 K (512000 K)]58123.717: [Rescan (parallel) , 0.2551403 secs]58123.972: [weak refs processing, 0.0149601 secs]58123.987: [class unloading, 0.3664999 secs]58124.354: [scrub symbol table, 0.2478379 secs]58124.603: [scrub string table, 0.0268882 secs] [1 CMS-remark: 2030311K(2531328K)] 2398565K(3043328K), 1.1023869 secs] [Times: user=5.66 sys=0.11, real=1.10 secs]
58124.819: [CMS-concurrent-sweep-start]
58125.523: [GC 58125.523: [ParNew: 448514K->38014K(512000K), 0.1271674 secs] 2472868K->2063735K(3043328K), 0.1276991 secs] [Times: user=1.57 sys=0.01, real=0.13 secs]
58126.907: [CMS-concurrent-sweep: 1.937/2.088 secs] [Times: user=5.80 sys=0.07, real=2.09 secs]
58126.909: [CMS-concurrent-reset-start]
58126.951: [CMS-concurrent-reset: 0.042/0.042 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
58128.948: [GC 58128.949: [ParNew: 447614K->46857K(512000K), 0.1323817 secs] 2463404K->2063979K(3043328K), 0.1328720 secs] [Times: user=1.54 sys=0.01, real=0.13 secs]
58132.450: [GC 58132.451: [ParNew: 456457K->40349K(512000K), 0.1694101 secs] 2473579K->2059096K(3043328K), 0.1699432 secs] [Times: user=2.40 sys=0.01, real=0.17 secs]
58136.153: [GC 58136.154: [ParNew: 449949K->39679K(512000K), 0.1694896 secs] 2468696K->2060115K(3043328K), 0.1701838 secs] [Times: user=2.28 sys=0.01, real=0.17 secs]
58139.679: [GC 58139.679: [ParNew: 449279K->37786K(512000K), 0.1319183 secs] 2469715K->2059800K(3043328K), 0.1324555 secs] [Times: user=1.57 sys=0.01, real=0.13 secs]
58143.143: [GC 58143.144: [ParNew: 447386K->45807K(512000K), 0.1184929 secs] 2469400K->2069109K(3043328K), 0.1192081 secs] [Times: user=1.32 sys=0.01, real=0.12 secs]
58146.689: [GC 58146.689: [ParNew: 455407K->40078K(512000K), 0.1266728 secs] 2478709K->2064879K(3043328K), 0.1271572 secs] [Times: user=1.51 sys=0.01, real=0.13 secs]
58149.653: [GC 58149.654: [ParNew: 449678K->39412K(512000K), 0.1464457 secs] 2474479K->2065844K(3043328K), 0.1470831 secs] [Times: user=2.15 sys=0.01, real=0.15 secs]
58149.804: [GC [1 CMS-initial-mark: 2026432K(2531328K)] 2073143K(3043328K), 0.0816621 secs] [Times: user=0.08 sys=0.00, real=0.08 secs]
58149.887: [CMS-concurrent-mark-start]
58153.255: [GC 58153.256: [ParNew: 449012K->46885K(512000K), 0.1606395 secs] 2475444K->2074547K(3043328K), 0.1613918 secs] [Times: user=2.08 sys=0.01, real=0.16 secs]
58155.721: [CMS-concurrent-mark: 5.643/5.833 secs] [Times: user=14.00 sys=0.53, real=5.83 secs]
58155.722: [CMS-concurrent-preclean-start]
58155.792: [CMS-concurrent-preclean: 0.068/0.071 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]
58155.793: [CMS-concurrent-abortable-preclean-start]
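For comparison, the snippet above shows only minor ParNew collections and a regular CMS background cycle. A concurrent mode failure would appear as a stop-the-world Full GC entry; an illustrative line (the numbers are invented, only the format follows a HotSpot CMS log) would look like:

58200.123: [Full GC 58200.123: [CMS (concurrent mode failure): 2525311K->2490113K(2531328K), 12.3456789 secs] 3020567K->2490113K(3043328K), [CMS Perm : 409876K->409870K(524288K)], 12.3460021 secs] [Times: user=12.40 sys=0.08, real=12.35 secs]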

I got the heap dump as well, but users are still able to access the portal, though it is a bit slower.

Any suggestions....

regards.

Former Member
0 Kudos

Please attach full log file to message.

Former Member
0 Kudos

Hi,

Which JVM are you using? How much memory do you have on the server? How many server nodes do you have in your landscape?

SAP JVM or HP JVM?

Note 1506939 - Recommendations of Concurrent Mark & Sweep for HPJVM

Thanks

Rishi Abrol

Former Member
0 Kudos

Hi all,

I use SAP JVM 4.1.

As mentioned earlier, I have 2 server nodes on the portal server. The portal server is a Solaris container: it uses memory from the global zone (main server), which has 256 GB. The main server hosts other containers that share the same memory.

I'm attaching an extract of DefaultTrace0.log from the time of a Full GC.

regards.

Former Member
0 Kudos

Please attach the std_server0.out log to the message, not defaultTraceX.log.

Former Member
0 Kudos

Hi,

Try adding one more server node and see if it takes care of the load.

You can go up to 4 server nodes to get the best result.

I think the portal is carrying more load than it was sized for.

Thanks

Rishi abrol

Former Member
0 Kudos

Hi all,

I'm attaching the std_server0.out file.

Maybe I need to add one additional server node, because as I said earlier, on the ECC side I have some reports that use the portal server's ADS to render PDFs. I have some 600 users, so when several users run these PDF reports at once, it may be putting extra strain on the portal.

Please give your suggestions.

regards.

Former Member
0 Kudos

Thanks. In addition, please also attach the dev_server0 log file.

Former Member
0 Kudos

Hi,

I've attached the dev_server0 file also.

regards.

Former Member
0 Kudos

Hi,

I just got a visual view of the std_server.out data and found that memory usage suddenly peaks.

[160]   12:16:34***Warning: Potential spurious wakeup occurred at object <0xfffffffe7c6b3208> (a java.lang.Object), awaking thread "SAPEngine_System_Thread[impl:5]_65" tid=0x0000000105f55000.

"SAPEngine_System_Thread[impl:5]_65" cpu=4246.35 [reset 4246.35] ms allocated=30582056 B (29.17 MB) [reset 30582056 B (29.17 MB)] defined_classes=37

io= file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0  [reset file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 ]

prio=3 tid=0x0000000105f55000 nid=0xa0 / 160  lwp-id=160 runnable  [_thread_in_vm (_running), stack(0xfffffffe4a300000,0xfffffffe4a500000)] [0xfffffffe4a4ff000]

    at java.lang.Object.wait(J)V(Native Method)

    - waiting on <0xfffffffe7c6b3208> (a java.lang.Object)

    at java.lang.Object.wait()V(Object.java:429)

    at com.sap.jms.server.destinationcontainer.AgentThreadSystem.run()V(AgentThreadSystem.java:148)

    - locked <0xfffffffe7c6b3208> (a java.lang.Object)

    at com.sap.engine.frame.core.thread.Task.run()Ljava/lang/Object;(Task.java:64)

    at com.sap.engine.core.thread.impl5.SingleThread.execute()V(SingleThread.java:83)

    at com.sap.engine.core.thread.impl5.SingleThread.run()V(SingleThread.java:156)

Can you please check the size of the PDF that users download?

Below is the graph of heap usage after GC.

These lines are interesting.

allocated=30582056 B (29.17 MB) [reset 30582056 B (29.17 MB)] defined_classes=37

io= file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0  [reset file i/o: 93991/14346 B, net i/o: 15690/1193 B

Thanks

Rishi Abrol

Former Member
0 Kudos

Hi Rishi,

Thanks for the analysis. Which tool did you use to display the graphs?

We have many PDF forms that users run. There is one that is around 500 KB when the user saves it. I guess if several users are running this report, it could be a cause for concern.

I need to check the reports individually....

But is it normal for a single thread to show something like 29.17 MB allocated? What might be causing this?

regards.

Former Member
0 Kudos

Hi,

It is one of the tools, called jmeter.

Second, I think a PDF size of 500 KB is quite large. What is this report?

For example, since you said you have 600 users, I take 10% as concurrent users, which gives 60 concurrent users.

If I assume a PDF holds 5000 records at about 50 MB, rendering it could take around 256 MB of memory.

256 * 60 = roughly 15 GB of memory required.

This is a rough figure, just to give an idea.
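Written out with units (the 256 MB per render is an assumption carried over from the estimate above): 60 concurrent users * 256 MB per PDF render = 15,360 MB, i.e. about 15 GB of heap across the portal.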

Thanks

Rishi Abrol

Former Member
0 Kudos

Hi Rishi,

Is it Apache JMeter? Does it run on Windows? It seems very useful.

Regarding the issue, I also think these PDF reports might be the cause. The report consists of many fields and a lot of text and can run over 5 pages. I don't know whether the developers can optimise it further.

Well, I'm thinking of 2 solutions:

1) On the ECC side, I point the ADS RFC to another Java server instead of the live portal server. That way, portal users won't face the performance degradation.

2) I create a third server node, also with max heap set to 3 GB.

What do you think?

regards.

Former Member
0 Kudos

Hi,

No, it's just a tool called jmeter; you might not find it.

Please go ahead and try adding a node.

But first I would check with the developers how long it takes to generate and download the PDF, and whether anything can be done to reduce its size.

Another thing I found is an old note for a related issue seen in std_server.out:

Note 804048 - Unnecessary high CPU load and JMS exceptions

This note is old and not for your release, but you can raise a message with SAP and see what help they can provide.

Thanks

Rishi Abrol

Former Member
0 Kudos

Can you also attach the OOM heap dump?

Former Member
0 Kudos

Hi all,

Roman,

The heap dump is around 3 GB, so it would be difficult to attach.

Is there any other info I can provide?

regards.

Former Member
0 Kudos

OK. How much RAM is available to the host? Are both server nodes on the same host? Is the DBMS on the same host? If yes, how much RAM is allocated to it? Can you also attach the dev_server1 log file?

Former Member
0 Kudos

Hi all,

Roman,

As I mentioned earlier, the portal server is a Solaris container that uses memory from the global zone, which has 256 GB. The DBMS is on the same host (container).

I've attached the dev_server1 file.

I haven't created the 3rd server node yet. I'm investigating whether the developers can reduce the size of the PDF forms or enable some kind of caching.

In case I do need to add the 3rd server node, can someone please outline the steps (so that I don't miss anything)?

By the way, yesterday I got the issue again (OOM heap dump) and had to restart the instance manually. I'm attaching the std_server0 and std_server1 files.

regards.

Former Member
0 Kudos

Hi,

Just add the server node in the Config Tool; a rough outline of the steps follows below.

You might have to revisit all the parameters on the new server node, as it will be created with default parameters based on the bootstrap, so check that all the parameters are correct. Please also check whether you have done any additional customizing within your node, such as anything related to the logoff URL.
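A sketch of the procedure (hedged: the exact menu labels vary between 7.0x patch levels, so verify against the documentation for your version):

1. Open the Config Tool (configtool.sh or configtool.bat) on the portal host.
2. Select the instance and use its add-server-node action to create the new node.
3. Compare the new node's JVM parameters (heap, perm, GC flags) against an existing node and adjust them, since the node starts with bootstrap defaults.
4. Save the configuration and restart the instance so the new node comes up.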

Thanks

Rishi abrol

Former Member
0 Kudos

OK Rishi, thanks.

In fact, I'll create the 3rd node the same as the other two nodes, i.e. with a 3 GB heap size.

Can you please explain what you mean by 'additional customizing related to the logoff URL or things within your node'? Do you mean changing the logoff pages, etc.?

regards

Former Member
0 Kudos

Without the OOM heap dump it's hard to solve the problem and impossible to find the root cause of the issue. But try the following: increase -Xms and -Xmx to 4096M. You can set these back afterwards:

-XX:MaxPermSize=512M

-XX:PermSize=512M

Make sure that you have enough free main memory to avoid paging. After some time, attach the std_server0 and std_server1 log files again for analysis.
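Put together, the suggestion above would mean the following JVM parameters on each server node (a sketch of this proposal, to be set via the Config Tool):

-Xms4096M

-Xmx4096M

-XX:MaxPermSize=512M

-XX:PermSize=512M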

Former Member
0 Kudos

Hi,

What I really mean is that any setting you have made on the existing nodes should be checked on the new node.

Sorry about the logoff page; that one is global across the server nodes. I was talking about configuration specific to a node, such as configuring the portal start page:

Visual Administrator -> Cluster -> Server_X -> Services -> HTTP Provider

Thanks

Rishi Abrol

Former Member
0 Kudos

Hi,

It seems the problem is with the heap memory; setting it to 3047 MB for each server node should solve the issue.

http://scn.sap.com/message/10565377 - Answered.

Regards,

Prabhakar

Former Member
0 Kudos

Hi guys,

As I mentioned earlier, I've set -Xms to 3072M, increased -XX:MaxPermSize to 768M and -XX:PermSize to 768M, and restarted the system. So far I'm not getting any Full GCs or concurrent mode failures, but I'm still monitoring. Let's see how it behaves over a few days.

Is there any good documentation on concurrent mode failure, Full GC, GC, etc.?

regards.

Former Member
0 Kudos

Hi Suraj,

Great, the issue is now fixed. Please close the thread as answered.

You should not get the issue again; please refer to my document, where this was fixed in a similar way.

We can have a new thread for concurrent mode failure, Full GC, GC, etc.

BR,

Prabhakar

Reagan
Advisor
0 Kudos

Hello

I recommend you first run an analysis to identify where the memory leak is, with the help of MAT:

http://www.eclipse.org/mat/
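Since the dump is around 3 GB, the MAT GUI may struggle with it; MAT also ships a headless parser. A minimal sketch (assuming MAT is unpacked on a machine with sufficient memory, and OOM.hprof is the dump name from the -XX:HeapDumpPath setting above):

./ParseHeapDump.sh OOM.hprof org.eclipse.mat.api:suspects

This parses the dump and writes a Leak Suspects report (a zip of HTML pages) next to the dump file, which is small enough to share here.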

Adding more memory to the heap will not help by itself.

You first need to identify whether there is enough memory on the machine.

If the system is configured with less memory than the memory allocated to the applications, there is no point in increasing the memory parameters.

First check the total memory allocated to the database and the J2EE stack.

You also need to consider the CPU configuration of the system.

Regards

RB

Former Member
0 Kudos

Hi all,

Thanks for your replies.

1) Below are my JVM parameters (same for both server nodes):

Max Heap size in MB is 3072.

-Xms2048M

-XX:MaxNewSize=600M

-XX:NewSize=600M

-XX:MaxPermSize=512M

-XX:PermSize=512M

-XX:+DisableExplicitGC

-XX:TargetSurvivorRatio=90

-XX:SurvivorRatio=4

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-verbose:gc

-XX:SoftRefLRUPolicyMSPerMB=1

-XX:+HeapDumpOnOutOfMemoryError

-Djco.jarm=1

-Djava.awt.headless=true

-Djava.security.policy=./java.policy

-Djava.security.egd=file:/dev/urandom

-Dsun.io.useCanonCaches=false

-Dorg.omg.CORBA.ORBClass=com.sap.engine.system.ORBProxy

-Dorg.omg.CORBA.ORBSingletonClass=com.sap.engine.system.ORBSingletonProxy

-Dorg.omg.PortableInterceptor.ORBInitializerClass.com.sap.engine.services.ts.jts.ots.PortableInterceptor.JTSInitializer

-Djavax.rmi.CORBA.PortableRemoteObjectClass=com.sap.engine.system.PortableRemoteObjectProxy

-Xss2M

-XX:+UseConcMarkSweepGC

-XX:HeapDumpPath=OOM.hprof

-Dcom.sap.jvm.scenario=j2ee

2) I don't have Wily Introscope at the moment. I tried configuring it earlier but didn't proceed; I need to look at it afresh.

3) As RB says, I'm also reluctant to increase the heap memory further.

On the server/architecture side, note the following details:

The portal resides on a VM (Solaris container). It shares its memory with the global zone (main server, 256 GB). The main server also hosts some other VMs. Memory usage at OS level looks fine. Maybe I need to cap the memory of the portal VM at something like 16 GB or 32 GB.

4) On the database side, note the below details:

SQL> show parameter sga;

NAME                                 TYPE        VALUE

lock_sga                             boolean     FALSE

pre_page_sga                         boolean     FALSE

sga_max_size                         big integer 4512M

sga_target                           big integer 0

SQL> show parameter pga;

NAME                                 TYPE        VALUE

pga_aggregate_target                 big integer 3068217262

5) I was also thinking about the scenario where my ECC uses the portal server's ADS to display Z-reports. Could this be a cause?

Can you please advise based on the above info?

regards.

Reagan
Advisor
0 Kudos

Max Heap size in MB is 3072.

-Xms2048M

As far as I know, -Xmx and -Xms should have the same value.

The information you provided shows that the max heap size (-Xmx) is 3072 MB while -Xms is 2048M.

Regards

RB

Former Member
0 Kudos

Hi RB,

Indeed, I was also wondering about that. I understood 2048 to be the initial heap size and 3072 the maximum. In fact, the heap dumps are 3 GB in size.

So do you advise that I set the same value (3072) for both -Xmx and -Xms?

regards.

Reagan
Advisor
0 Kudos

Set them both to 3072M and restart the system.

If possible, also increase the perm size to 768M:

-XX:MaxPermSize=768M

-XX:PermSize=768M

If there is still heavy garbage collection, check what is being executed from the ECC system.

With this configuration, the system (Java + DB) will consume approximately 12 to 13 GB of memory.
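As a rough check of that estimate, using only values already posted in this thread: 2 server nodes x (3072M heap + 768M perm) is about 7.5 GB for the Java stack, plus an sga_max_size of 4512M (about 4.4 GB) for the database, giving roughly 12 GB before Oracle PGA usage is counted.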

Regards

RB

Former Member
0 Kudos

Ok RB,

I'll try this and update here. Hope it works out fine.

regards.

Former Member
0 Kudos

Hi,

How did you find out that Full GCs are occurring? Do you see them in the std_serverX.out file?

Note 723909 - Java VM settings for J2EE 6.40/7.0

Thanks

Rishi Abrol

Former Member
0 Kudos

Hi,

As the note mentions, the initial and maximal heap sizes should be equal: -Xmx1024m -Xms1024m.

So you can set them to the same value, but they can also be different.

Thanks

Rishi Abrol

Former Member
0 Kudos

Hi Suraj,

Do you have Wily Introscope configured? It is the best tool for watching memory usage and growth, and you can drill down into the program causing the memory usage.

Regards,

Graham

bxiv
Active Contributor
0 Kudos

As @Graham suggested, you need a Java heap tool (Wily, while free to use, takes time to set up on a Solution Manager system and should be set up before issues occur).

As a test, you could raise one of the server nodes to 4 or 6 GB of heap and see whether the GCs start occurring after about the same time or later; this could help narrow down whether the cause is the number of users connecting to the system or a bad transport that changed something.

Another option, if you do have Solution Manager set up, is to try an E2E trace under the RCA tab, as this might show where/what business processes are doing to the systems.

naveegarg
Explorer
0 Kudos

Hi Suraj,

Please check if any external applications are deployed on this portal and if possible paste the defaulttrace to check further what happened during OOM error.

Also please paste the parameters for server node (e.g new generation , heap etc from Confg tool)

Regards

Naveen Garg