High await in iostat?

  • Follow


Greetings,

     We have a RH 5.5 machine running v7.0.2 of the sysstat tools:

# iostat -V
sysstat version 7.0.2

    This is hooked up to an HP EVA8100 via HBA's (4GB) running
multipathing (v0.4.7) and LVM.

# multipath -h
multipath-tools v0.4.7 (03/12, 2006)

     When I try "dd if=/dev/urandom of=./gig1.6 bs=4096 count=409600"
to create a 1.6G file, I see the following
in "iostat":

# iostat -xN -d vg1-lv1 5 5
Linux 2.6.18-194.17.4.el5 (burl-artest)         11/15/2010

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
vg1-lv1           0.00     0.00  0.03 21.75     0.22   173.98
8.00     0.55   22.35   0.04   0.08

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
vg1-lv1           0.00     0.00  0.00 3278.40     0.00 26227.20
8.00   219.91   67.08   0.01   4.64

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
vg1-lv1           0.00     0.00  0.00 1923.20     0.00 15385.60
8.00    56.66   29.46   0.01   2.78

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
vg1-lv1           0.00     0.00  0.00 2005.60     0.00 16044.80
8.00    58.70   29.27   0.01   2.84

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
vg1-lv1           0.00     0.00  0.00 2005.20     0.00 16041.60
8.00    57.38   28.61   0.01   2.80


    Any ideas what might be causing the high await times here? I've
checked the EVA and everything
is fine. I'm at a lost to explain this and it's much worse when
running "dt" to generate the file.

    Thanks for any help in advance.
                Tom




0
Reply T 11/15/2010 1:25:22 PM

am I the only one who finds iostat output really hard to parse?  a number of years ago I wrote the tool collectl that essentially reports the same data but in an an easier to read format but more important also can show cpu load, network, memory and a whole slew of other things, making it much easier to see on a single line per sampling period.

But back to the original question - iowait is one of the most confusing measurements in linux as it had nothing to do with CPU usage!  Rather, it just tells you the cpu had nothing else to do AND there was I/O in progress, something you'd expect to see when moving files around.  Clearly you should care if your other CPU loads indicators are high but this is not one of them.

If you want to see for yourself, try doing some file copies and watch iowait which nothing else is going on.  Now run some compute-bound tasks and repeat the I/O tests.  the iowait will go down but the file copies will run at the same speed.  In other words, the system used those SPARE cycles reported by iowait to do real work.  Make sense?

-mark
0
Reply Mark 11/28/2010 3:55:09 PM


At Sun, 28 Nov 2010 09:55:07 -0600 Mark Seger <user@compgroups.net/> wrote:

> 
> am I the only one who finds iostat output really hard to parse?  a number of years ago I wrote the tool collectl that essentially reports the same data but in an an easier to read format but more important also can show cpu load, network, memory and a whole slew of other things, making it much easier to see on a single line per sampling period.
> 
> But back to the original question - iowait is one of the most confusing measurements in linux as it had nothing to do with CPU usage!  Rather, it just tells you the cpu had nothing else to do AND there was I/O in progress, something you'd expect to see when moving files around.  Clearly you should care if your other CPU loads indicators are high but this is not one of them.
> 
> If you want to see for yourself, try doing some file copies and watch iowait which nothing else is going on.  Now run some compute-bound tasks and repeat the I/O tests.  the iowait will go down but the file copies will run at the same speed.  In other words, the system used those SPARE cycles reported by iowait to do real work.  Make sense?

Sure it makes sense.  High I/O Wait times just mean you have spare
processor cycles AND you are doing lots of I/O to *relatively* slow I/O
devices (slow relative to your processor and the devices are 'smart'
and 'automious', needing little or no processor cycles). If your system
is always showing I/O Wait, it just means you might consider installing
SETI at home (or something like that). 

> 
> -mark
> 
> 
> 
>                           

-- 
Robert Heller             -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software        -- http://www.deepsoft.com/
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments


                                                                                                          
0
Reply Robert 11/28/2010 4:56:08 PM

On Nov 28, 10:55=A0am, Mark Seger <u...@compgroups.net/> wrote:
> am I the only one who finds iostat output really hard to parse? =A0a numb=
er of years ago I wrote the tool collectl that essentially reports the same=
 data but in an an easier to read format but more important also can show c=
pu load, network, memory and a whole slew of other things, making it much e=
asier to see on a single line per sampling period.
>
> But back to the original question - iowait is one of the most confusing m=
easurements in linux as it had nothing to do with CPU usage! =A0Rather, it =
just tells you the cpu had nothing else to do AND there was I/O in progress=
, something you'd expect to see when moving files around. =A0Clearly you sh=
ould care if your other CPU loads indicators are high but this is not one o=
f them.
>
> If you want to see for yourself, try doing some file copies and watch iow=
ait which nothing else is going on. =A0Now run some compute-bound tasks and=
 repeat the I/O tests. =A0the iowait will go down but the file copies will =
run at the same speed. =A0In other words, the system used those SPARE cycle=
s reported by iowait to do real work. =A0Make sense?
>
> -mark

Hi,

Thanks for the follow up. If I understand you correctly your saying
that the high "awaits" in iostat should be expected with faster CPU's
but slower disks, it showing how much faster the CPU is compared to
the disk...

The man page says:

await:
The average  time (in milliseconds) for I/O requests issued to the
device to be served. This includes the time spent by the requests in
queue and the time spent servicing them.

I thought that meant how fast the disk could run the command and how
long it spent waiting to run the command...

I took a look at collectl (to see if I can find the bottleneck in the
system). I ran:

dt of=3Dtest limit=3D60g bs=3D1m disable=3Dcompare,verify

I then ran collectl:

% collectl -scmd -i 5
waiting for 5 second sample...
#<--------CPU--------><-----------Memory-----------><----------
Disks----------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads
KBWrit Writes
   8   8  1511    922  52G 364M   9G   7G   1G 280M      0      0
162538    492
  11  11  1531    677  50G 364M  10G   8G   1G 280M      0      0
177973    524
   7   7  1490   1509  47G 364M  13G  11G   1G 280M      0      0
170507    483
   7   7  1438   1196  45G 364M  15G  13G   1G 280M      0      0
138596    427
   8   8  1436   1131  42G 364M  17G  15G   1G 280M      0      0
138558    422
   8   8  1457   1197  40G 364M  20G  18G   1G 280M      0      0
132041    433
   8   8  1430   1189  37G 364M  22G  20G   1G 281M      0      0
132460    428
   8   8  1436   1193  35G 364M  25G  22G   1G 281M      0      0
132174    429
   8   8  1414   1117  33G 364M  27G  25G   1G 281M      0      0
133698    408
  11  11  1502    901  31G 364M  29G  27G   1G 281M      0      0
148501    487
   8   8  1440   1255  28G 364M  31G  29G   2G 280M      0      0
147887    431
   7   7  1398   1227  26G 364M  33G  31G   2G 280M      0      0
129834    381
   7   7  1433   1141  24G 364M  35G  33G   2G 280M      0      0
134437    419
   8   8  1415   1131  22G 364M  37G  35G   2G 280M      0      0
136116    405
   3   3  1424   1161  21G 364M  38G  36G   2G 280M      0      0
133642    411
   4   4  1441   1109  20G 364M  38G  36G   2G 280M      0      0
135465    430
   3   3  1423   1140  20G 364M  39G  37G   2G 280M      0      0
137583    413
   3   3  1428   1185  19G 364M  40G  38G   2G 281M      0      0
133996    419
   3   3  1456   1268  18G 364M  40G  38G   2G 280M      0      0
133798    443
   3   3  1415   1261  18G 364M  41G  39G   2G 281M      0      0
133591    408
   3   3  1413   1160  17G 364M  41G  39G   2G 281M      0      0
133241    404
   3   3  1406   1183  17G 364M  42G  40G   2G 281M      0      0
138246    401
   3   3  1411   1182  16G 364M  43G  41G   2G 281M      0      0
133536    408
   3   3  1400   1274  15G 364M  43G  41G   2G 281M      0      0
135719    394
Ouch!


If I'm reading this correctly, everything looks fine... but this
machine is slower than a 3 year old
Dell server and I can't explain why...

Thanks
  Tom
0
Reply T 11/30/2010 1:19:58 PM

> Thanks for the follow up. If I understand you correctly your saying
> that the high "awaits" in iostat should be expected with faster CPU's
> but slower disks, it showing how much faster the CPU is compared to
> the disk...
>
> The man page says:
>
> await:
> The average =A0time (in milliseconds) for I/O requests issued to the
> device to be served. This includes the time spent by the requests in
> queue and the time spent servicing them.
>
> I thought that meant how fast the disk could run the command and how
> long it spent waiting to run the command...
>

consider what an I/O wait time on 1000msec means when doing
synchronous I/O, for example in vi.  you type a single character and
vi writes it to disk (for recovery) and waits for the data to actually
be committed to disk.  If you have to wait 1000msec for your data to
get there you literally end up waiting 1 full second for your
character to echo!  Some people may think this system is overloaded
when in fact the cpu might even be idle.  You're just waiting for your
request to get through the queue.  I've seen systems with a wait of 5K
msec on which vi was unusable.  On the other hand asynchronous i/o
would run just fine.

> I took a look at collectl (to see if I can find the bottleneck in the
> system). I ran:
>
> dt of=3Dtest limit=3D60g bs=3D1m disable=3Dcompare,verify
>
> I then ran collectl:
>
> % collectl -scmd -i 5
> waiting for 5 second sample...
> #<--------CPU--------><-----------Memory-----------><----------
> Disks----------->
> #cpu sys inter =A0ctxsw Free Buff Cach Inac Slab =A0Map KBRead =A0Reads
> KBWrit Writes
> =A0 =A08 =A0 8 =A01511 =A0 =A0922 =A052G 364M =A0 9G =A0 7G =A0 1G 280M =
=A0 =A0 =A00 =A0 =A0 =A00
> 162538 =A0 =A0492
> =A0 11 =A011 =A01531 =A0 =A0677 =A050G 364M =A010G =A0 8G =A0 1G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 177973 =A0 =A0524
> =A0 =A07 =A0 7 =A01490 =A0 1509 =A047G 364M =A013G =A011G =A0 1G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 170507 =A0 =A0483
> =A0 =A07 =A0 7 =A01438 =A0 1196 =A045G 364M =A015G =A013G =A0 1G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 138596 =A0 =A0427
> =A0 =A08 =A0 8 =A01436 =A0 1131 =A042G 364M =A017G =A015G =A0 1G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 138558 =A0 =A0422
> =A0 =A08 =A0 8 =A01457 =A0 1197 =A040G 364M =A020G =A018G =A0 1G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 132041 =A0 =A0433
> =A0 =A08 =A0 8 =A01430 =A0 1189 =A037G 364M =A022G =A020G =A0 1G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 132460 =A0 =A0428
> =A0 =A08 =A0 8 =A01436 =A0 1193 =A035G 364M =A025G =A022G =A0 1G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 132174 =A0 =A0429
> =A0 =A08 =A0 8 =A01414 =A0 1117 =A033G 364M =A027G =A025G =A0 1G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133698 =A0 =A0408
> =A0 11 =A011 =A01502 =A0 =A0901 =A031G 364M =A029G =A027G =A0 1G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 148501 =A0 =A0487
> =A0 =A08 =A0 8 =A01440 =A0 1255 =A028G 364M =A031G =A029G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 147887 =A0 =A0431
> =A0 =A07 =A0 7 =A01398 =A0 1227 =A026G 364M =A033G =A031G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 129834 =A0 =A0381
> =A0 =A07 =A0 7 =A01433 =A0 1141 =A024G 364M =A035G =A033G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 134437 =A0 =A0419
> =A0 =A08 =A0 8 =A01415 =A0 1131 =A022G 364M =A037G =A035G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 136116 =A0 =A0405
> =A0 =A03 =A0 3 =A01424 =A0 1161 =A021G 364M =A038G =A036G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133642 =A0 =A0411
> =A0 =A04 =A0 4 =A01441 =A0 1109 =A020G 364M =A038G =A036G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 135465 =A0 =A0430
> =A0 =A03 =A0 3 =A01423 =A0 1140 =A020G 364M =A039G =A037G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 137583 =A0 =A0413
> =A0 =A03 =A0 3 =A01428 =A0 1185 =A019G 364M =A040G =A038G =A0 2G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133996 =A0 =A0419
> =A0 =A03 =A0 3 =A01456 =A0 1268 =A018G 364M =A040G =A038G =A0 2G 280M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133798 =A0 =A0443
> =A0 =A03 =A0 3 =A01415 =A0 1261 =A018G 364M =A041G =A039G =A0 2G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133591 =A0 =A0408
> =A0 =A03 =A0 3 =A01413 =A0 1160 =A017G 364M =A041G =A039G =A0 2G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133241 =A0 =A0404
> =A0 =A03 =A0 3 =A01406 =A0 1183 =A017G 364M =A042G =A040G =A0 2G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 138246 =A0 =A0401
> =A0 =A03 =A0 3 =A01411 =A0 1182 =A016G 364M =A043G =A041G =A0 2G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 133536 =A0 =A0408
> =A0 =A03 =A0 3 =A01400 =A0 1274 =A015G 364M =A043G =A041G =A0 2G 281M =A0=
 =A0 =A00 =A0 =A0 =A00
> 135719 =A0 =A0394
> Ouch!
>
> If I'm reading this correctly, everything looks fine... but this
> machine is slower than a 3 year old
> Dell server and I can't explain why...
>

alas, with this command you're actually looking at total [summary data
in collectl-speak] and not the same thing iostat would show.  Try the
command "collectl -sD" [uppercase D means detail data] and you'll see
the same thing I/O stats shows.  Now try the command "collectl -sD --
home" and it will move the cursor to the home position, resulting in
top-like output for disk usage.  Very slick if I do say so myself.
You can also toss in cpu, memory and network summary data if you like
by "collectl -scmnD --home" or any other combination of subsystem data
with -s.

enjoy
-mark
0
Reply Mark 12/1/2010 2:56:56 PM

4 Replies
2692 Views

(page loaded in 0.078 seconds)

Similiar Articles:













7/20/2012 1:48:25 PM


Reply: