Timeouts increased after we moved USR, SAP data files and TLogs to new SAN

neil_hoff
Participant
0 Kudos

We are having issues with timeouts after we moved our USR, SAP SQL Datafiles and SAP Transaction Logs from our old SAN to a new SAN.

Timeouts for SAPGUI users are set to 10 minutes.

We are running Windows Server 2003 with SQL Server 2005.

The SAP database has 8 datafiles with a total size of about 350GB.

___________________________________________

Procedure we used to move SAP to new SAN:

1. Attached 3 new SAN Volumes

-a. USR

-b. Data Files

-c. Transaction Logs

2. Shut down the SAP and SQL services

3. Aligned the new volumes with a 1024 KB offset and gave the data file and transaction log volumes a 64 KB allocation unit size. (The alignment and the 64 KB allocation unit size were not set up for these volumes on the old SAN.)

4. Copied the 3 volumes from old to new.

5. Changed the new volumes drive letter to the drive letters of the old volumes.

-a. I had to restart in order to change the USR volume.

-b. Because of this I had to set up the sapmnt and saploc shares again.

6. Started SQL services and then SAP services and everything came up just fine.
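As a rough illustration of the alignment arithmetic behind step 3, the sketch below checks whether a partition's starting offset is a whole multiple of the allocation unit size. The function name and the legacy 63-sector offset (the usual Windows Server 2003 default for unaligned partitions) are assumptions for illustration; the thread does not state the old volumes' actual offset.

```python
KB = 1024

def is_aligned(partition_offset_bytes: int, unit_bytes: int) -> bool:
    """Return True when the partition start is a whole multiple of unit_bytes."""
    return partition_offset_bytes % unit_bytes == 0

new_offset = 1024 * KB     # offset used on the new volumes (step 3)
allocation_unit = 64 * KB  # allocation unit size used on the new volumes (step 3)
legacy_offset = 63 * 512   # assumption: 63 sectors of 512 bytes, not taken from the thread

print(is_aligned(new_offset, allocation_unit))     # True
print(is_aligned(legacy_offset, allocation_unit))  # False
```

The same check can be repeated against the stripe or page size reported by the storage vendor, since that value is not given here.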

____________________________________________

The week before we had anywhere from 1 to 9 timeouts per day.

This week: Monday had 20 and Tuesday had 26.

On Monday we saw that MD07 was the only transaction that was timing out, but Tuesday had others as well.

The number of users in the system is about the same. The number of orders going in is about the same. No big transports went in right before we switched.

Performance counters that I know about for disk look a lot better on the Data Files.

- PAGEIOLATCH_SH ms/request is about 50% better

- Under I/O Performance in DBACOCKPIT:

- MS/OP is now anywhere from 5 to 30 - Old SAN: 50 to 300

- The Hit Ratio is over 99% - same as the old SAN

Looking at Wiley Introscope graphs:

- The "SAP Host: Average queue length" is about 30% to 40% lower than on the old SAN.

- The "SAP Host: Disk utilization in %" is about the same.

Questions:

1. Did we do anything wrong or miss anything with our move procedure?

a. Do we have to do anything in SQL since we changed volumes even though we kept the drive letters the same?

2. What other logs or performance counters should I be looking at?

Thank you,

Neil

Accepted Solutions (1)


Former Member
0 Kudos

How does the overall performance look in ST03? You can get more details about the disks from ST06N. Also make sure that the new storage is configured correctly in terms of cache, etc.

neil_hoff
Participant
0 Kudos

Hi Sunil,

Two things I noticed in ST03 from this week compared to last week on our central instance:

Under the Database tab:

Under the "Average Sequential Read Time" column:

AUTOTH: was 0 on the old SAN, now shows 37.0

BUFFER SYNC: was 0 on the old SAN, now shows 30.6

Why would these numbers get worse when almost all the other statistics are getting better?

Thank you,

Neil

Former Member
0 Kudos

Have you compared the dialog response time in ST03? Is it better or worse?

I think you must show the above findings to your storage admin and vendor.

Answers (4)


xymanuel
Active Participant
0 Kudos

Hi,

What read times do you have?

Should be below 15ms for Data LUNs.

Write time for Transaction Log LUN should be below 5ms.

How long are your Disk Queues? Disk queue length should not exceed disk spindle count of your attached LUN.
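The rules of thumb above (data LUN reads below 15 ms, transaction log writes below 5 ms, queue length no higher than the spindle count) can be sketched as a simple check. The `LunStats` structure, the `check_lun` function, and the sample numbers are hypothetical and only illustrate the comparison; they are not measurements from this system.

```python
from dataclasses import dataclass
from typing import List

DATA_READ_LIMIT_MS = 15.0  # rule of thumb for data LUNs
LOG_WRITE_LIMIT_MS = 5.0   # rule of thumb for the transaction log LUN

@dataclass
class LunStats:
    name: str
    role: str            # "data" or "log"
    read_ms: float       # average read latency
    write_ms: float      # average write latency
    queue_length: float  # average disk queue length
    spindles: int        # number of disks behind the LUN

def check_lun(stats: LunStats) -> List[str]:
    """Return a list of warnings for one LUN; an empty list means no rule was violated."""
    warnings = []
    if stats.role == "data" and stats.read_ms >= DATA_READ_LIMIT_MS:
        warnings.append(f"{stats.name}: read time {stats.read_ms} ms is not below 15 ms")
    if stats.role == "log" and stats.write_ms >= LOG_WRITE_LIMIT_MS:
        warnings.append(f"{stats.name}: write time {stats.write_ms} ms is not below 5 ms")
    if stats.queue_length > stats.spindles:
        warnings.append(f"{stats.name}: queue length {stats.queue_length} exceeds {stats.spindles} spindles")
    return warnings

# Made-up sample values, for illustration only.
for lun in (LunStats("DATA1", "data", 22.0, 8.0, 4.0, 12),
            LunStats("TLOG", "log", 3.0, 2.5, 1.0, 4)):
    print(lun.name, check_lun(lun) or "OK")
```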

Do you have a good distribution of your LUNs over your storage processors? They should not all be on one.

Are the HBA settings OK? There may be a parameter for the "Queue depth".

Are the HBAs set to load balancing? (Use paths in parallel.)

How many IOPS does your DB server generate? What is the peak count? How many and what type of disks do you have in a LUN?

What is the RAID level?

One FC Disk with 15k should be able to handle about 180 IOPS.
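A back-of-the-envelope sketch of that rule of thumb: multiply the disk count by roughly 180 IOPS to estimate what a LUN can sustain, or divide a peak IOPS figure by 180 to estimate how many disks are needed. The function names and the sample figures are illustrative assumptions, and the estimate deliberately ignores RAID write penalties and controller cache.

```python
import math

IOPS_PER_15K_FC_DISK = 180  # rule of thumb quoted above

def estimated_lun_iops(disk_count: int, iops_per_disk: int = IOPS_PER_15K_FC_DISK) -> int:
    """Raw IOPS estimate for a LUN, ignoring RAID write penalty and cache effects."""
    return disk_count * iops_per_disk

def disks_needed(peak_iops: float, iops_per_disk: int = IOPS_PER_15K_FC_DISK) -> int:
    """Smallest number of disks whose raw IOPS estimate covers the given peak."""
    return math.ceil(peak_iops / iops_per_disk)

# Made-up figures, for illustration only.
print(estimated_lun_iops(8))  # 1440
print(disks_needed(2000))     # 12
```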

Alignment at 1 MB with a 64 KB block size is correct.

And just out of interest, what type is your new storage?

Kind regards

Manuel

Edited by: Manuel Herr on Jan 28, 2011 10:53 PM

neil_hoff
Participant
0 Kudos

Our new SAN Vendor is Compellent. They have been fantastic. I would highly recommend checking them out.

The reasons for the timeouts had nothing to do with the SAN...Well kind of anyway.

I decided to check t-code SM20 to see what users were doing when these timeouts were happening. What I found was that the program R_BAPI_NETWORK_MAINTAIN was being called thousands of times in a matter of 10 to 15 minutes at random times throughout the day. It accounted for about 50 to 80 percent of the programs being executed during these times.

So, I sent this information to our developers and they found out that R_BAPI_NETWORK_MAINTAIN was being called from another program that was looping thousands of times. The trigger to stop the loop wasn't happening fast enough. They made a change and we haven't seen the timeouts since.

I think that the performance increases allowed the loop to run faster, which caused the slowdowns and timeouts to happen more often.
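For anyone repeating the SM20 check, here is a minimal sketch of the counting involved: group program executions into 15-minute windows and compute what share of each window belongs to one program. It assumes the audit log has already been exported into (timestamp, program name) pairs; the export format, the function names, and the sample events are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def bucket_start(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its window."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

def program_share_by_window(events, program: str, minutes: int = 15) -> dict:
    """Map each window start to the fraction of executions that belong to `program`."""
    total, hits = Counter(), Counter()
    for ts, name in events:
        window = bucket_start(ts, minutes)
        total[window] += 1
        if name == program:
            hits[window] += 1
    return {w: hits[w] / total[w] for w in sorted(total)}

# Made-up events, for illustration only.
events = [
    (datetime(2011, 1, 25, 10, 1), "R_BAPI_NETWORK_MAINTAIN"),
    (datetime(2011, 1, 25, 10, 5), "R_BAPI_NETWORK_MAINTAIN"),
    (datetime(2011, 1, 25, 10, 7), "R_BAPI_NETWORK_MAINTAIN"),
    (datetime(2011, 1, 25, 10, 9), "OTHER_PROGRAM"),
    (datetime(2011, 1, 25, 10, 20), "OTHER_PROGRAM"),
]
for window, share in program_share_by_window(events, "R_BAPI_NETWORK_MAINTAIN").items():
    print(window.strftime("%H:%M"), f"{share:.0%}")
```

Windows where one program dominates the execution count are the ones worth lining up against the timeout timestamps.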

Thank you to everyone for their help!

Neil

markus_doehr2
Active Contributor
0 Kudos

...and I found

http://msdn.microsoft.com/en-us/library/dd758814.aspx

Maybe that will help to trace the problem down.

Markus

neil_hoff
Participant
0 Kudos

Hi Markus,

Thanks for the reply. We actually followed the recommendations in that document (http://msdn.microsoft.com/en-us/library/dd758814.aspx) to set up our disks.

Our network admin contacted our storage vendor and they showed him all the stats, and everything looks good on their end. Way better than it did on our old SAN.

This really doesn't make any sense to me. The SQL backup, run at night, takes about 35 minutes instead of 50 minutes, but during the day the system runs super slow and we get a bunch of timeouts.

I am in the process of looking at all of our profile parameters to see if there are any issues.

Thank you,

Neil

markus_doehr2
Active Contributor
0 Kudos

I would contact the storage vendor, maybe a wrong alignment of data is set on the storage side (we had this problem in the past).

Markus

Former Member
0 Kudos

Hi Neil,

For SAN performance issues I'd watch the disk utilization and the disk queue lengths in ST06. I'd also check to see if the load across the SAN is being balanced correctly.

Has anyone else commented that the application is slower? Has anything changed on the network? It seems odd that your performance numbers are better yet users are getting bounced out.

J. Haynes

neil_hoff
Participant
0 Kudos

Hi Joe,

Disk utilization is almost the same as it was before. During the day it bounces between 80% and 100%. This is one area I thought would have been better with the new SAN.

Disk Queue Length has gotten about 50% better with the new SAN.

How is Disk Utilization calculated? What should it be at during busy times?

Thank you,

Neil