on 10-22-2013 12:54 PM
Hi guys,
My portal system is on SAP NW 7.0 EHP2 and the backend is on ECC 6 EHP6. Recently, on the portal system I'm getting a lot of Full GCs and concurrent mode failures. I have two server nodes with the heap size set to 3 GB for both (it was 2 GB initially). When the Full GCs and concurrent mode failures occur, users are not able to work in the portal application and it times out. It only becomes operational again long after a heap dump occurs, or when I restart the instance manually. Please suggest how I can resolve this issue.
Additional info: the portal is a Tax Officer Portal used to view, input and amend tax returns. Since the portal is a Java stack, my ECC system also uses this Java stack for ADS (Z-reports displayed as PDF).
So the issue is that I can't determine what is causing the frequent Full GC situations and the OOM heap dumps. I tried to analyze them with Memory Analyzer, but the results aren't clear enough. Please advise.
Regards.
I suggest you check the memory allocated to this particular VM.
There should be a way to find it; then check whether the memory allocated to the VM is higher than the memory needed by the database and SAP.
Regards
RB
Hi all,
Regarding the main memory for the VM, please note that this VM 'shares' the 256 GB of the main machine (global zone). There are other VMs on this main machine as well. I can see how much memory the VM is consuming (currently around 13 GB). What I can do is cap the memory of the portal server at something like 32 GB, so that those 32 GB are reserved for this portal server.
regards
To answer your question, we need logs for analysis. Can you provide them?
Hi all,
In the morning it was working OK, but just a few minutes ago server0 started producing this (std_server0.out):
58103.556: [GC 58103.556: [ParNew: 451883K->36605K(512000K), 0.1618678 secs] 2472347K->2058569K(3043328K), 0.1623421 secs] [Times: user=2.29 sys=0.01, real=0.16 secs]
58107.064: [GC 58107.065: [ParNew: 446205K->36078K(512000K), 0.1622301 secs] 2468169K->2060564K(3043328K), 0.1629539 secs] [Times: user=2.27 sys=0.01, real=0.16 secs]
58110.573: [GC 58110.573: [ParNew: 445678K->36381K(512000K), 0.1551900 secs] 2470164K->2062137K(3043328K), 0.1556522 secs] [Times: user=2.10 sys=0.01, real=0.16 secs]
58110.732: [GC [1 CMS-initial-mark: 2025756K(2531328K)] 2069865K(3043328K), 0.0710652 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]
58110.804: [CMS-concurrent-mark-start]
58114.084: [GC 58114.085: [ParNew: 445981K->35487K(512000K), 0.1408452 secs] 2471737K->2062783K(3043328K), 0.1414961 secs] [Times: user=1.93 sys=0.01, real=0.14 secs]
58117.622: [GC 58117.622: [ParNew: 445087K->42786K(512000K), 0.1572854 secs] 2472383K->2071419K(3043328K), 0.1577878 secs] [Times: user=2.12 sys=0.01, real=0.16 secs]
58120.534: [CMS-concurrent-mark: 9.319/9.730 secs] [Times: user=24.35 sys=0.22, real=9.73 secs]
58120.535: [CMS-concurrent-preclean-start]
58120.576: [CMS-concurrent-preclean: 0.039/0.041 secs] [Times: user=0.09 sys=0.00, real=0.04 secs]
58120.577: [CMS-concurrent-abortable-preclean-start]
58121.073: [GC 58121.073: [ParNew: 452386K->38914K(512000K), 0.1523676 secs] 2481019K->2069226K(3043328K), 0.1528477 secs] [Times: user=2.06 sys=0.01, real=0.15 secs]
58123.710: [CMS-concurrent-abortable-preclean: 2.938/3.133 secs] [Times: user=9.03 sys=0.11, real=3.13 secs]
58123.716: [GC[YG occupancy: 368254 K (512000 K)]58123.717: [Rescan (parallel) , 0.2551403 secs]58123.972: [weak refs processing, 0.0149601 secs]58123.987: [class unloading, 0.3664999 secs]58124.354: [scrub symbol table, 0.2478379 secs]58124.603: [scrub string table, 0.0268882 secs] [1 CMS-remark: 2030311K(2531328K)] 2398565K(3043328K), 1.1023869 secs] [Times: user=5.66 sys=0.11, real=1.10 secs]
58124.819: [CMS-concurrent-sweep-start]
58125.523: [GC 58125.523: [ParNew: 448514K->38014K(512000K), 0.1271674 secs] 2472868K->2063735K(3043328K), 0.1276991 secs] [Times: user=1.57 sys=0.01, real=0.13 secs]
58126.907: [CMS-concurrent-sweep: 1.937/2.088 secs] [Times: user=5.80 sys=0.07, real=2.09 secs]
58126.909: [CMS-concurrent-reset-start]
58126.951: [CMS-concurrent-reset: 0.042/0.042 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
58128.948: [GC 58128.949: [ParNew: 447614K->46857K(512000K), 0.1323817 secs] 2463404K->2063979K(3043328K), 0.1328720 secs] [Times: user=1.54 sys=0.01, real=0.13 secs]
58132.450: [GC 58132.451: [ParNew: 456457K->40349K(512000K), 0.1694101 secs] 2473579K->2059096K(3043328K), 0.1699432 secs] [Times: user=2.40 sys=0.01, real=0.17 secs]
58136.153: [GC 58136.154: [ParNew: 449949K->39679K(512000K), 0.1694896 secs] 2468696K->2060115K(3043328K), 0.1701838 secs] [Times: user=2.28 sys=0.01, real=0.17 secs]
58139.679: [GC 58139.679: [ParNew: 449279K->37786K(512000K), 0.1319183 secs] 2469715K->2059800K(3043328K), 0.1324555 secs] [Times: user=1.57 sys=0.01, real=0.13 secs]
58143.143: [GC 58143.144: [ParNew: 447386K->45807K(512000K), 0.1184929 secs] 2469400K->2069109K(3043328K), 0.1192081 secs] [Times: user=1.32 sys=0.01, real=0.12 secs]
58146.689: [GC 58146.689: [ParNew: 455407K->40078K(512000K), 0.1266728 secs] 2478709K->2064879K(3043328K), 0.1271572 secs] [Times: user=1.51 sys=0.01, real=0.13 secs]
58149.653: [GC 58149.654: [ParNew: 449678K->39412K(512000K), 0.1464457 secs] 2474479K->2065844K(3043328K), 0.1470831 secs] [Times: user=2.15 sys=0.01, real=0.15 secs]
58149.804: [GC [1 CMS-initial-mark: 2026432K(2531328K)] 2073143K(3043328K), 0.0816621 secs] [Times: user=0.08 sys=0.00, real=0.08 secs]
58149.887: [CMS-concurrent-mark-start]
58153.255: [GC 58153.256: [ParNew: 449012K->46885K(512000K), 0.1606395 secs] 2475444K->2074547K(3043328K), 0.1613918 secs] [Times: user=2.08 sys=0.01, real=0.16 secs]
58155.721: [CMS-concurrent-mark: 5.643/5.833 secs] [Times: user=14.00 sys=0.53, real=5.83 secs]
58155.722: [CMS-concurrent-preclean-start]
58155.792: [CMS-concurrent-preclean: 0.068/0.071 secs] [Times: user=0.07 sys=0.00, real=0.07 secs]
58155.793: [CMS-concurrent-abortable-preclean-start]
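For what it's worth, the CMS cycles in this extract complete without a promotion failure, and the old generation settles around 2.0 GB of the 2.47 GB available after each sweep. A quick way to track post-GC occupancy over a whole log is to parse the ParNew lines. This is a minimal sketch (the regex assumes the `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps` format shown above):

```python
import re

# Matches ParNew entries like:
# 58103.556: [GC 58103.556: [ParNew: 451883K->36605K(512000K), 0.16 secs]
#   2472347K->2058569K(3043328K), 0.1623421 secs] ...
PARNEW = re.compile(
    r"(?P<ts>\d+\.\d+): \[GC .*?\[ParNew: .*?\] "
    r"(?P<before>\d+)K->(?P<after>\d+)K\((?P<total>\d+)K\)"
)

def post_gc_occupancy(log_lines):
    """Yield (timestamp, heap after GC in MB, % of total heap) per ParNew event."""
    for line in log_lines:
        m = PARNEW.search(line)
        if m:
            after_k = int(m.group("after"))
            total_k = int(m.group("total"))
            yield float(m.group("ts")), after_k / 1024, 100.0 * after_k / total_k

sample = [
    "58103.556: [GC 58103.556: [ParNew: 451883K->36605K(512000K), "
    "0.1618678 secs] 2472347K->2058569K(3043328K), 0.1623421 secs]"
]
for ts, mb, pct in post_gc_occupancy(sample):
    print(f"{ts}: {mb:.0f} MB live after GC ({pct:.1f}% of heap)")
```

If the "MB live after GC" figure keeps climbing between CMS sweeps, that points at a leak rather than a sizing problem.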
I got the heap dump as well, but users are still able to access the portal, though it's a bit slower.
Any suggestions?
regards.
Hi,
Which JVM are you using: SAP JVM or HP JVM? How much memory do you have on the server? How many server nodes do you have in your landscape?
Note 1506939 - Recommendations of Concurrent Mark & Sweep for HPJVM
Thanks
Rishi Abrol
Hi all,
I use SAP JVM 4.1.
As mentioned earlier, I have 2 server nodes on the portal server. The portal server is a Solaris container; it uses memory from the global zone (main server), which has 256 GB. The main server hosts other containers that share the same memory.
I'm attaching an extract of DefaultTrace0.log at time of Full GC.
regards.
Hi all,
I'm attaching the std_server0.out file.
Maybe I need to add an additional server node, because as I said earlier, on the ECC side I have some reports that use the ADS of the portal server to display as PDF. I have some 600 users, so maybe when several users run these PDF reports it puts extra strain on the portal.
Please give your suggestions.
regards.
Hi,
I just got a visual view of the std_server.out and found that the memory suddenly peaks.
[160] 12:16:34 | ***Warning: Potential spurious wakeup occurred at object <0xfffffffe7c6b3208> (a java.lang.Object), awaking thread "SAPEngine_System_Thread[impl:5]_65" tid=0x0000000105f55000. |
"SAPEngine_System_Thread[impl:5]_65" cpu=4246.35 [reset 4246.35] ms allocated=30582056 B (29.17 MB) [reset 30582056 B (29.17 MB)] defined_classes=37
io= file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 [reset file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 ]
prio=3 tid=0x0000000105f55000 nid=0xa0 / 160 lwp-id=160 runnable [_thread_in_vm (_running), stack(0xfffffffe4a300000,0xfffffffe4a500000)] [0xfffffffe4a4ff000]
at java.lang.Object.wait(J)V(Native Method)
- waiting on <0xfffffffe7c6b3208> (a java.lang.Object)
at java.lang.Object.wait()V(Object.java:429)
at com.sap.jms.server.destinationcontainer.AgentThreadSystem.run()V(AgentThreadSystem.java:148)
- locked <0xfffffffe7c6b3208> (a java.lang.Object)
at com.sap.engine.frame.core.thread.Task.run()Ljava/lang/Object;(Task.java:64)
at com.sap.engine.core.thread.impl5.SingleThread.execute()V(SingleThread.java:83)
at com.sap.engine.core.thread.impl5.SingleThread.run()V(SingleThread.java:156)
Can you please check the size of the PDF that the user downloads?
Below is the graph of heap usage after GC collection.
These lines are interesting.
allocated=30582056 B (29.17 MB) [reset 30582056 B (29.17 MB)] defined_classes=37
io= file i/o: 93991/14346 B, net i/o: 15690/1193 B, files opened:0, socks opened:0 [reset file i/o: 93991/14346 B, net i/o: 15690/1193 B
Thanks
Rishi Abrol
Hi Rishi,
Thanks for the analysis. What is the tool you have used to display the graphs etc?
We have many PDF forms that the users run. There is one that is around 500 KB when the user saves it. I guess if several users are running this report, it could be a cause for concern.
I need to check the reports individually.
But is it normal for a thread to show something like 29.17 MB of memory allocated? What could be causing this?
regards.
Hi,
This is one of the tools, called jmeter.
Secondly, I think a PDF size of 500 KB is quite large. What is this report?
For example, since you said you have 600 users, I'll take 10% as the concurrency, so that is 60 concurrent users.
If there are around 5000 records in the PDF, rendering it might take on the order of 256 MB of memory, so 256 MB * 60 = ~15 GB of memory required.
This is a rough figure, just to give an idea.
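The back-of-the-envelope sizing above can be sketched as follows (the 10% concurrency and the 256 MB per render are Rishi's assumptions from this thread, not measured values):

```python
total_users = 600
concurrency = 0.10                    # assumed: ~10% of users active at once
concurrent_users = int(total_users * concurrency)

mb_per_render = 256                   # assumed heap cost of rendering one large PDF
required_mb = concurrent_users * mb_per_render

print(f"{concurrent_users} concurrent users x {mb_per_render} MB "
      f"= {required_mb / 1024:.1f} GB of heap under peak load")
# 60 concurrent users x 256 MB = 15.0 GB of heap under peak load
```

With two 3 GB server nodes, peak demand on that scale would comfortably exhaust the heap, which is consistent with the concurrent mode failures observed.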
Thanks
Rishi Abrol
Hi Rishi,
Is it Apache JMeter? Is it installed on Windows? It seems very useful.
Regarding the issue, I'm also thinking these PDF reports might be the cause. The report consists of many fields and a lot of text and can run over 5 pages. I don't know whether the developers can optimise it further.
Well I'm thinking of 2 solutions:
1) On the ECC side, I point the ADS RFC to another Java server instead of the live portal server. That way, portal users won't face performance degradation.
2) I create a third server node, with max heap also set to 3 GB.
What do you think?
regards.
Hi,
No, it's just a tool called jmeter; you might not find it easily.
Please go ahead and try to add the node.
But first I would check with the developers how long it takes to generate and download the PDF, and whether something can be done to reduce its size.
Another thing I found was an old note related to the issue in std_server.out:
Note 804048 - Unnecessary high CPU load and JMS exceptions
This note is old and not for your release, but you can raise a message with SAP and see what help they can provide.
Thanks
Rishi Abrol
Hi all,
Roman,
As I mentioned earlier, the portal server is a Solaris container that uses memory from the global zone, which has 256 GB. The DBMS is on the same host (container).
I've attached the dev_server1 file.
I haven't created the 3rd server node yet. I'm investigating whether the developers can reduce the PDF forms or enable some kind of caching.
In case I do need to add the 3rd server node, can someone please outline the steps (so that I don't miss anything)?
By the way, yesterday I got the issue again (OOM heap dump) and had to restart the instance manually. I'm attaching the std_server0 and std_server1 files.
regards.
Hi,
Just try to add the server node in the Config Tool.
You might have to revisit all the parameters in the new server node, as it will be created with default parameters based on the bootstrap. So check that all the parameters are correct. Please also check any additional customizing you have done within your node, such as the logoff URL.
Thanks
Rishi abrol
Without an OOM heap dump it's hard to solve the problem and impossible to find the root cause of the issue. But try the following: increase -Xms and -Xmx to 4096M. You can reset these back later:
-XX:MaxPermSize=512M
-XX:PermSize=512M
Make sure that you have enough free main memory to avoid paging. After some time, attach the std_server0 and std_server1 log files again for analysis.
Hi,
What I really mean is that any setting you have made in the existing node should be checked in the new node.
Sorry about the logoff page; that one is global across the server nodes. I was talking about configuration specific to a node, like configuring the portal start page:
Visual Administrator -> Cluster -> Server_X -> Services -> HTTP Provider
Thanks
Rishi Abrol
Hi,
It seems the problem is with the heap memory; setting it to 3047 MB for each server node should solve the issue.
http://scn.sap.com/message/10565377 - Answered.
Regards,
Prabhakar
Hi guys,
As I mentioned earlier, I've set -Xms to 3072M, increased -XX:MaxPermSize to 768M and -XX:PermSize to 768M, and restarted the system. As of now I'm not getting any Full GCs or concurrent mode failures, but I'm monitoring it. Let's see how it behaves over the next few days.
Is there any good documentation on concurrent mode failure, Full GC, GC in general, etc.?
regards.
Hello
I recommend you first do an analysis with MAT to identify where the memory leak is.
Adding more memory to the heap will not help.
You need to first identify whether there is enough memory on the machine.
If the machine is configured with less memory than what is allocated to the applications, there is no point in increasing the memory parameters.
First check the total memory allocated to the database and the J2EE stack.
You also need to consider the CPU configuration of the system.
Regards
RB
Hi all,
Thanks for your replies.
1) Below are my JVM parameters (same for both server nodes):
Max Heap size in MB is 3072.
-Xms2048M
-XX:MaxNewSize=600M
-XX:NewSize=600M
-XX:MaxPermSize=512M
-XX:PermSize=512M
-XX:+DisableExplicitGC
-XX:TargetSurvivorRatio=90
-XX:SurvivorRatio=4
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-verbose:gc
-XX:SoftRefLRUPolicyMSPerMB=1
-XX:+HeapDumpOnOutOfMemoryError
-Djco.jarm=1
-Djava.awt.headless=true
-Djava.security.policy=./java.policy
-Djava.security.egd=file:/dev/urandom
-Dsun.io.useCanonCaches=false
-Dorg.omg.CORBA.ORBClass=com.sap.engine.system.ORBProxy
-Dorg.omg.CORBA.ORBSingletonClass=com.sap.engine.system.ORBSingletonProxy
-Dorg.omg.PortableInterceptor.ORBInitializerClass.com.sap.engine.services.ts.jts.ots.PortableInterceptor.JTSInitializer
-Djavax.rmi.CORBA.PortableRemoteObjectClass=com.sap.engine.system.PortableRemoteObjectProxy
-Xss2M
-XX:+UseConcMarkSweepGC
-XX:HeapDumpPath=OOM.hprof
-Dcom.sap.jvm.scenario=j2ee
2) I don't have Wily Introscope at the moment. I had tried configuring it earlier but didn't proceed; I need to look at it afresh.
3) Like RB says, I'm also reluctant to increase the heap memory further.
On the server/architecture side, note the following details:
The portal resides on a VM (Solaris container). It shares its memory with the global zone (main server, 256 GB). There are some other VMs on the main server as well. Memory usage at OS level looks fine. Maybe I need to cap the memory of the portal VM at something like 16 GB or 32 GB.
4) On the database side, note the details below:
SQL> show parameter sga;

NAME                 TYPE        VALUE
-------------------- ----------- ----------
lock_sga             boolean     FALSE
pre_page_sga         boolean     FALSE
sga_max_size         big integer 4512M
sga_target           big integer 0

SQL> show parameter pga;

NAME                 TYPE        VALUE
-------------------- ----------- ----------
pga_aggregate_target big integer 3068217262
5) I was also thinking about the scenario where my ECC uses the ADS of the portal server for displaying Z-reports. Could this be a cause?
Can you please advise based on the above info?
regards.
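As a sanity check, the generation sizes implied by the flags in point 1 match the GC log posted earlier: with `-XX:NewSize=600M` and `-XX:SurvivorRatio=4`, eden is 400 MB and each survivor space is 100 MB, so ParNew reports 512000K usable (eden plus one survivor) and the CMS old generation is 3072 - 600 = 2472 MB (2531328K). A minimal sketch of that arithmetic:

```python
heap_mb = 3072          # -Xmx (max heap size from the Config Tool)
new_mb = 600            # -XX:NewSize / -XX:MaxNewSize
survivor_ratio = 4      # -XX:SurvivorRatio = eden size / one survivor space

# New gen = eden + 2 survivor spaces, where eden = survivor_ratio * one survivor.
survivor_mb = new_mb / (survivor_ratio + 2)            # 100 MB each
eden_mb = survivor_mb * survivor_ratio                 # 400 MB
parnew_usable_k = int((eden_mb + survivor_mb) * 1024)  # eden + 1 survivor
old_gen_k = (heap_mb - new_mb) * 1024                  # CMS generation
total_usable_k = parnew_usable_k + old_gen_k

print(parnew_usable_k, old_gen_k, total_usable_k)
# 512000 2531328 3043328 -- the figures reported in std_server0.out
```

So the parameters are being applied as configured; the question is purely whether ~2.4 GB of old generation is enough for the live set plus the ADS rendering load.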
Set -Xms and -Xmx to 3072M and restart the system.
If possible, increase the perm size to 768M:
-XX:MaxPermSize=768M
-XX:PermSize=768M
If there is still heavy garbage collection, then check what is being executed from the ECC system.
With this configuration the system (Java + DB) will be consuming approx. 12 to 13 GB of memory.
Regards
RB
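RB's 12-13 GB figure roughly follows from adding up the pieces already posted in this thread: two 3 GB server-node heaps plus the Oracle SGA (4512M) and the PGA target (3068217262 bytes, about 2.9 GB). A sketch of that sum:

```python
server_nodes = 2
heap_gb = 3.0                      # -Xmx3072M per server node
sga_gb = 4512 / 1024               # sga_max_size = 4512M  -> ~4.4 GB
pga_gb = 3068217262 / 1024**3      # pga_aggregate_target in bytes -> ~2.9 GB

total_gb = server_nodes * heap_gb + sga_gb + pga_gb
print(f"~{total_gb:.1f} GB for Java heaps + Oracle SGA/PGA")
# ~13.3 GB for Java heaps + Oracle SGA/PGA
```

Note this excludes perm gen, JVM native overhead and the OS file cache, so capping the Solaris zone at 16 GB would leave little headroom; 32 GB is the safer cap.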
Hi,
How did you find out that you are getting Full GCs?
Do you see them in the std_serverX.out file?
Note 723909 - Java VM settings for J2EE 6.40/7.0
Thanks
Rishi Abrol
Hi Suraj,
Do you have Wily Introscope configured? It is the best tool for watching memory usage and growth, and you can drill down into the program causing the memory usage.
Regards,
Graham
As @Graham suggested, you need a Java heap tool (Wily, while free to use, takes time to set up on a Solution Manager system and should be set up before issues occur).
As a test, you could raise one of the server nodes to 4 or 6 GB of heap and see whether the time it takes for the GCs to start occurring stays about the same or shrinks; this could help narrow down whether it is the number of users connecting to the system or a bad transport that went through and changed something.
Another option, if you do have Solution Manager set up, is to try an E2E trace under the RCA tab, as this might tell you where/what business processes are doing to the systems.
Hi Suraj,
Please check whether any external applications are deployed on this portal, and if possible paste the default trace so we can check further what happened during the OOM error.
Also please paste the parameters for the server node (e.g. new generation, heap, etc. from the Config Tool).
Regards
Naveen Garg