
I/O bottleneck verification

jmtorres
Active Participant
0 Kudos

Good Day,

We have IQ 15.4 ESD#3 running on Red Hat Linux 5.5, attached to an old EMC Symmetrix SAN.

During peak hours, roughly 9:00am to noon and 3:00pm to 5:00pm (basically all day), we have monitored the server with top and iostat. Typical (average) output looks like this:

16 core machine

CPU     %user     %system     %iowait
all     35.92     83.00       60.15

The iostat output is attached.


If you look at avgqu-sz, await, svctm and %util, we see values that, according to the Linux documentation, indicate a disk bottleneck, which causes top to show a low %user time and a high %system time.

Also, some HG indexes are missing and should be created, but we are not sure that this would fully resolve the I/O problem.

Could you confirm this analysis?

Note: Regarding "svctm", the iostat man page carries this note: "Do not trust this field anymore. This field will be removed in a future sysstat version." Does this mean we should ignore this parameter?
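For anyone checking the same columns, here is a minimal sketch that flags devices whose await or %util exceed a threshold. The function name flag_busy_disks and the default thresholds (50 ms, 90%) are illustrative assumptions, not values taken from the sysstat or IQ documentation. It finds the columns by header name and expects a single `await` column, as printed by the sysstat shipped with RHEL 5; newer sysstat releases report r_await and w_await instead, in which case this sketch prints nothing.

```shell
#!/bin/sh
# flag_busy_disks (hypothetical helper): reads `iostat -x` style text on stdin
# and prints one line per device whose await or %util is above the thresholds.
# $1 = await threshold in ms (assumed default 50)
# $2 = %util threshold in percent (assumed default 90)
flag_busy_disks() {
    await_max=${1:-50}
    util_max=${2:-90}
    awk -v amax="$await_max" -v umax="$util_max" '
        # Header line: remember where the await and %util columns are.
        $1 == "Device:" || $1 == "Device" {
            for (i = 1; i <= NF; i++) {
                if ($i == "await") a = i
                if ($i == "%util") u = i
            }
            next
        }
        # Device line: only considered once both columns are known.
        a && u && NF >= u && ($a + 0 > amax || $u + 0 > umax) {
            printf "%s await=%s util=%s\n", $1, $a, $u
        }
    '
}

# Possible usage (not executed here):
#   iostat -x 5 3 | flag_busy_disks 50 90
```

Since the man page warns against svctm, the sketch deliberately ignores that column and relies on await and %util only.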

Thank you

Regards

Jose-Miguel

Accepted Solutions (1)


0 Kudos

A very high await time (10,000 to 25,000 milliseconds) indicates that threads are not getting enough CPU time alongside the disk processing. Along with iostat, you should check the vmstat output and see whether many threads appear under the kernel threads "blocked" column. If so, the CPU is overloaded and you should also look at adding more CPU power to the machine.

Are these 16 cores physical CPUs, or fewer physical CPUs with multi-threading presenting 16 logical CPUs?
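One way to answer the physical-versus-logical question on Linux is to compare the number of `processor` entries in /proc/cpuinfo with the number of distinct (physical id, core id) pairs. The sketch below does that; count_cores is a hypothetical helper name, and it assumes an x86-style /proc/cpuinfo that exposes the `physical id` and `core id` fields (some virtual machines and other architectures do not, in which case it reports physical=0).

```shell
#!/bin/sh
# count_cores (hypothetical helper): prints logical and physical core counts
# from a cpuinfo-format file. $1 = file to read (defaults to /proc/cpuinfo).
count_cores() {
    awk -F: '
        {
            key = $1; gsub(/[ \t]+$/, "", key)   # trim trailing tabs/spaces
            val = $2; gsub(/^[ \t]+/, "", val)   # trim leading tabs/spaces
        }
        key == "processor"   { logical++ }
        key == "physical id" { pkg = val }
        key == "core id"     { seen[pkg "," val] = 1 }
        END {
            n = 0
            for (k in seen) n++
            printf "logical=%d physical=%d\n", logical, n
        }
    ' "${1:-/proc/cpuinfo}"
}

# Possible usage (not executed here):
#   count_cores
```

If the two numbers match, there is no hyper-threading in play; if logical is a multiple of physical, the extra CPUs are hardware threads.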

Regards

Shashi

jmtorres
Active Participant
0 Kudos

Shashi,

Our vmstat output shows “procs” instead of “kthr”. Is this the same?

Kernel: 2.6.18-238.el5

All cores are physical (no hyper threads)

Thank you

JMT

0 Kudos

Hi Jose,

Can you please post the vmstat output?

Regards

Shashi

jmtorres
Active Participant
0 Kudos

Shashi,

I tried to attach an Excel sheet, but this page wouldn't allow it. Here is a text file.

Please review only the last quarter of the document.

Thank you

Regards

Jose-Miguel

0 Kudos

You definitely have a CPU resource issue, which is causing the additional problem of an I/O bottleneck.

From the vmstat output I see the following:

32 235  49232 1179624 972076 104000384    0    0 153760 59023 8211 56896 35 50  1 15  0

50 123  49232 1179284 972260 104001928    0    0 90284 100199 11973 48552 42 44  0 13  0

and then,

73 126  49232 1161728 976076 104003552    0    0 45075 35660 6094 24953 39 55  4  3  0

27 91  49232 1157320 976324 104005536    0    0 136411 45769 6727 35513 25 65  1  9  0

35 67  49232 1155796 976560 104008048    0    0 156995 46076 5838 35792 19 70  1 10  0

27 113  49232 1157060 976816 104008336    0    0 65365 48806 6049 31659 29 63  1  7  0

29 126  49232 1157476 977036 104007464    0    0 34984 69843 6330 28955 19 69  4  7  0

48 93  49232 1158464 977280 104008608    0    0 56310 49762 6776 34677 25 67  3  6  0

12 131  49232 1151384 977512 104010960    0    0 86959 92209 6479 47788 36 48  3 13  0

35 129  49232 1152320 977784 104012144    0    0 93003 21899 7592 67730 40 32  7 22  0

24 76  49232 1149172 978020 104012856    0    0 47984 62455 5677 31412 34 60  2  3  0

29 23  49232 1150272 978212 104013928    0    0 92435 56222 7290 26770 50 46  1  2  0

You have 16 CPUs and the blocked threads number in the hundreds, which is causing the issue. The most likely cause is that IQ, or some other process on this server (if another application is running), is performing CPU-intensive as well as I/O-intensive work, and you do not have enough CPU power to handle the I/O. Even if you reduce iqnumbercpus in the .cfg file, it will only restrict the optimizer, not the threads.

I see this pattern at intermittent intervals in the vmstat output, indicating that the I/O is also random. As you also have extremely high I/O wait, the FIRST thing you should do is check disk speeds at the OS level, and then the controllers.
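To put numbers on this comparison, a rough sketch like the one below counts how many vmstat samples have more runnable (r) or blocked (b) processes than there are cores. The name vmstat_pressure is hypothetical. It assumes raw Linux vmstat output in which r and b are the first two columns; the vmstat man page describes r as processes waiting for run time and b as processes in uninterruptible sleep, so looking at both columns helps separate CPU pressure from I/O waits. Lines that start with anything other than two integers (headers, or samples prefixed with a timestamp) are skipped.

```shell
#!/bin/sh
# vmstat_pressure (hypothetical helper): summarises r and b against core count.
# $1 = number of cores; vmstat text is read from stdin.
vmstat_pressure() {
    cores=$1
    awk -v cores="$cores" '
        # Only sample lines: first two fields must be plain integers.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            r = $1 + 0; b = $2 + 0
            samples++
            if (r > cores + 0) r_over++
            if (b > cores + 0) b_over++
            if (b > max_b) max_b = b
        }
        END {
            printf "samples=%d r_over=%d b_over=%d max_b=%d\n", samples, r_over, b_over, max_b
        }
    '
}

# Possible usage (not executed here):
#   vmstat 2 30 | vmstat_pressure 16
```

Fed three of the samples quoted above (the rows starting 32 235, 12 131 and 29 23) with 16 cores, it reports samples=3 r_over=2 b_over=3 max_b=235.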

What is your "number of users" setting and how many users are actually active at peak time on the IQ server?

The vmstat script below also adds a timestamp to each sample, which helps when correlating the samples with the .iqmsg file.

cat vmstat.scr

DIR_NAME=SET_THIS_AS_PER_YOUR_ENVIRONMENT_TO_COLLECT_DATA

MON=`date +%m`

DAY=`date +%d`

Hour=`date +%H`

MIN=`date +%M`

SEC=`date +%S`

PLATFORM=`uname -s`

export PLATFORM SEC MIN Hour DAY MON DIR_NAME

LOGFILE=$DIR_NAME/vmstat$MON$DAY$Hour$MIN$SEC.log

export LOGFILE

      vmstat 2 2 > $DIR_NAME/@@vmstat.log

      if [ $PLATFORM != "AIX" ]

      then

         echo `date +%m:%d:%H:%M:%S` `cat $DIR_NAME/@@vmstat.log |sed 3d | sed 2,3d`  >> $LOGFILE

      else

         echo `date +%m:%d:%H:%M:%S` `cat $DIR_NAME/@@vmstat.log |sed 7d | sed 6,7d`  >> $LOGFILE

      fi

      if [ $PLATFORM != "AIX" ]

      then

         echo `date +%m:%d:%H:%M:%S` `cat $DIR_NAME/@@vmstat.log |sed 3d | sed 1d | sed 2d`  >> $LOGFILE

      else

         echo `date +%m:%d:%H:%M:%S` `cat $DIR_NAME/@@vmstat.log |sed 7d | sed 1,5d | sed 2d`  >> $LOGFILE

      fi

   while true

   do

      vmstat 2 2 > $DIR_NAME/@@vmstat.log

      # vmstat runs in the foreground above, so there is no background PID to record

      if [ $PLATFORM != "AIX" ]

      then

         echo `date +%m:%d:%H:%M:%S` `cat $DIR_NAME/@@vmstat.log |sed 1,3d`  >> $LOGFILE

      else

         echo `date +%m:%d:%H:%M:%S` `cat $DIR_NAME/@@vmstat.log |sed 1,7d`  >> $LOGFILE

      fi

     sleep 2

  done

jmtorres
Active Participant
0 Kudos

Shashi,

-gm is set to 100, and at peak hours the user connections are usually 100% used. Anyway, this weekend we're moving to new storage (HP 3PAR). We ran a full DB backup on the new SAN and it took ~10 hours for 6.5 TB, versus 30 hours on the old EMC Symmetrix.

Also, as you pointed out, there is a Linux process that runs periodically called “SPAZIO MFT/S Managed and Secure File Transfer”, which we assume could be causing some I/O problems and CPU contention. Do you know anything about this process?
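As a quick sanity check on those backup figures, the arithmetic can be sketched as below. throughput_mb_s is a hypothetical helper, and it assumes decimal units (1 TB = 1,000,000 MB); with binary units the results would be slightly higher.

```shell
#!/bin/sh
# throughput_mb_s (hypothetical helper): average throughput in MB/s.
# $1 = size in TB (decimal units assumed), $2 = duration in hours
throughput_mb_s() {
    awk -v tb="$1" -v h="$2" 'BEGIN { printf "%.1f\n", tb * 1000000 / (h * 3600) }'
}

throughput_mb_s 6.5 10   # new SAN, ~10 hours
throughput_mb_s 6.5 30   # old SAN, 30 hours
```

This works out to roughly 180.6 MB/s on the new SAN against 60.2 MB/s on the old one, about a threefold difference in sustained backup throughput.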

Thank you very much for the vmstat script!

Regards

Jose-Miguel

Answers (1)


saroj_bagai
Contributor
0 Kudos

First you will need to resolve the disk contention issue; adding an HG index will not eliminate it.