Solution Manager Monitoring - proper monitoring metric for disk performance

Former Member
0 Kudos

Dear all,

We have set up Solution Manager with some templates to monitor the health of our SAP systems. One of the areas we covered is the operating system, where we use the Average Service Time per Disk metric with the standard threshold of 100 ms. However, we are seeing a huge number of alerts, several times per day. Looking at the details, I noticed that the alerts almost always occur for internal disks. We are running SUSE Linux Enterprise Server 11 and all disks are set up with Device Mapper, so every one of them appears as dm-XX regardless of whether it is a storage disk or an internal one, and we can't distinguish between them.
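To tell the dm-XX devices apart, a minimal sketch like the following could help, assuming the kernel exposes the Device Mapper name under `/sys/block/dm-N/dm/name` and the underlying devices under `/sys/block/dm-N/slaves` (older kernels may lack the `dm/name` attribute). The function name `describe_dm_devices` is only illustrative.

```python
from pathlib import Path


def describe_dm_devices(sysfs_block="/sys/block"):
    """Map each dm-N kernel name to (device-mapper name, underlying devices).

    Assumes the sysfs layout described above; a missing dm/name file
    yields None and a missing slaves directory yields an empty list.
    """
    devices = {}
    for dev in sorted(Path(sysfs_block).glob("dm-*")):
        name_file = dev / "dm" / "name"
        slaves_dir = dev / "slaves"
        dm_name = name_file.read_text().strip() if name_file.is_file() else None
        slaves = sorted(p.name for p in slaves_dir.iterdir()) if slaves_dir.is_dir() else []
        devices[dev.name] = (dm_name, slaves)
    return devices
```

Calling `describe_dm_devices()` on the host and printing the result should show which dm-XX entries sit on an internal disk and which sit on storage LUNs, based on the underlying device names.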

Based on the current scenario, I started working with our Unix team to identify any disk bottleneck, but so far we have been unable to find any issue in the server performance statistics.

From what I can see in Solution Manager, this metric is collected from SAPOSCOL:

"This is the average service time in milliseconds for I/O requests. This metric is retrieved from the saposcol running on the host."

My question is how SAPOSCOL gets that information from the OS. I looked at the iostat documentation, and it says the "Service Time" metric (svctm) will be removed in future releases, so I am wondering whether the same applies to SAPOSCOL and this might not be an accurate value.
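For reference, here is a rough sketch of how iostat-style values can be derived from two snapshots of `/proc/diskstats`. Whether SAPOSCOL computes its value the same way is not confirmed anywhere in this thread, so treat it purely as an illustration of the iostat approach. The helper names `parse_diskstats` and `io_times` are made up for this example.

```python
from collections import namedtuple

# Cumulative counters per device: completed I/Os, total time requests
# spent waiting (ms), and total time the device was busy (ms).
Sample = namedtuple("Sample", "ios wait_ms busy_ms")


def parse_diskstats(text):
    """Parse the contents of /proc/diskstats into {device: Sample}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip short or malformed lines
        reads, read_ms = int(f[3]), int(f[6])
        writes, write_ms = int(f[7]), int(f[10])
        busy_ms = int(f[12])
        stats[f[2]] = Sample(reads + writes, read_ms + write_ms, busy_ms)
    return stats


def io_times(before, after):
    """Return (await_ms, svctm_ms) for the interval between two samples."""
    ios = after.ios - before.ios
    if ios <= 0:
        return 0.0, 0.0
    await_ms = (after.wait_ms - before.wait_ms) / ios
    svctm_ms = (after.busy_ms - before.busy_ms) / ios
    return await_ms, svctm_ms
```

Note that the "service time" here is just busy time divided by completed I/Os. With only a handful of I/Os in an interval, a single slow request can push the average over 100 ms, which may be part of why the iostat documentation warns against trusting svctm.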

Also, for those who are using Solution Manager for system monitoring, I would like to know how you monitor disk performance for your systems.

Thanks in advance,

Itamar

Accepted Solutions (0)

Answers (1)

bxiv
Active Contributor
0 Kudos

From a Windows Server perspective, I have not had to adjust the 100 ms threshold, and I see this trigger more on liveCache servers.

All of our SAP systems are sitting in VMware with a SAN using 4 x 8 GB Fiber connections; I am unsure if you have a similar setup, but as you are looking to compare apples to apples, this is probably something good to keep in mind.

What PL are you at for your saphostexec and saphostcontrol? Also, what values are you seeing in SolMan for the highs, and have you verified how long they stay that high?
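To check how long the values stay high, something like the sketch below could be run over exported samples. It assumes you already have `(timestamp, value_ms)` pairs sorted by time from whatever source you use; the function name `breach_runs` and the input format are assumptions for illustration, not a SolMan export format.

```python
def breach_runs(samples, threshold_ms=100.0):
    """Return (first_ts, last_ts) for each consecutive run above the threshold.

    samples: iterable of (timestamp, value_ms) pairs, sorted by timestamp.
    """
    runs = []
    start = last = None
    for ts, value in samples:
        if value > threshold_ms:
            if start is None:
                start = ts  # a new run begins
            last = ts
        elif start is not None:
            runs.append((start, last))  # the run ended on the previous sample
            start = None
    if start is not None:
        runs.append((start, last))  # run still open at the end of the data
    return runs
```

A run whose first and last timestamps are identical is a single-sample spike, which is much less worrying than a run that spans many collection intervals.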

Former Member
0 Kudos

Hi Billy,

Thanks for your answer. What we are actually seeing is that the problem only happens on physical servers and only for internal disks. What I also found interesting is that the metric only shows this behavior for volumes using LVM (the devices whose names start with dm, for Device Mapper). I think it is related to the OS rather than to the infrastructure itself. From what I could see, it seems to happen when OS processes are writing logs to the OS. It is a little bit different from what you have in your installation.

The current SAP Host Agent is 720 PL 181.

Also, I found a document on the Red Hat support site that partially explains the situation:

https://access.redhat.com/solutions/38912

Based on what I have found so far, the question I still have is how saposcol gets that information from the Linux OS, so that I can understand whether the metric is really useful or not.

Thanks again,

Itamar

bxiv
Active Contributor
0 Kudos

Saposcol should be a part of your host agent. I had an issue with Windows for a while when the page file was set to span all the disks: SolMan would see a value of 0 and constantly be wrong. I am not sure whether this was fixed by a stack upgrade on SolMan or by the PL of the host agent.

As an easy test you could upgrade from 181; I believe I upgraded to 206 last week or the week prior.

Former Member
0 Kudos

Hi Billy,

Thanks for that. I will compare the results from other systems we have with newer versions to see how the results differ.

Thanks,

Itamar