
iostat demystified

Recently, we’ve been looking into our options for a new SAN at work. I’ll save that for a whole other post. In our search, it became apparent that we didn’t truly understand how heavily we were utilizing our current system. Our current product requires that we purchase a license just to check these statistics on the SAN, so we turned to the servers for some more insight.

The majority (if not ALL) of our servers are running some flavour of Linux, most of which are RHEL 4.x and 5.x. RHEL (and most other distros) offers a package called sysstat, which includes an I/O reporting tool called iostat.
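If iostat isn’t already on a box, it comes from that sysstat package; on RHEL 5 or Fedora it should just be a yum install away (RHEL 4 would use up2date instead):

[war@somehost ~]$ sudo yum install sysstat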

The output of iostat looks something like:

[war@somehost ~]$ iostat -x
Linux 2.6.30.5-43.fc11.i686.PAE (somehost) 	09/24/2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.87    0.01    1.52    0.22    0.00   96.38

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.08    24.70    4.79    4.36   258.97   232.48    53.74     0.21   23.45   2.27   2.07
sda1              0.00     0.00    0.00    0.00     0.00     0.00    16.26     0.00    7.58   6.91   0.00
sda2              0.08    24.70    4.79    4.36   258.97   232.48    53.74     0.21   23.45   2.27   2.07
dm-0              0.00     0.00    4.83   28.95   258.63   231.64    14.51     0.04    1.21   0.61   2.07
dm-1              0.00     0.00    0.04    0.10     0.34     0.84     8.00     0.02  120.62   1.57   0.02

This is a bit daunting: lots of info, and no real descriptions. sda{1,2} are your partitions, dm-{0,1} are device-mapper virtual devices used by LVM (if you’re using LVM). The rest is somewhat cryptic. The man page for iostat clears things up slightly, but you may not have a full understanding after just reading these descriptions.

(from the iostat man page)
rrqm/s: The number of read requests merged per second that were queued to the device.
wrqm/s: The number of write requests merged per second that were queued to the device.
r/s: The number of read requests that were issued to the device per second.
w/s: The number of write requests that were issued to the device per second.
rsec/s: The number of sectors read from the device per second.
wsec/s: The number of sectors written to the device per second.
rkB/s: The number of kilobytes read from the device per second.
wkB/s: The number of kilobytes written to the device per second.
avgrq-sz: The average size (in sectors) of the requests that were issued to the device.
avgqu-sz: The average queue length of the requests that were issued to the device.
await: The average time (in milliseconds) for I/O requests issued to the device to be served.
svctm: The average service time (in milliseconds) for I/O requests that were issued to the device.
%util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

Personally, I’m working on learning this output, so I’m going to use this blog entry as my notes on what these stats mean, and how they react to disk activity. I’ll review all of the stats which I’ve been able to figure out.

rrqm/s and wrqm/s, r/s and w/s

These are all about requests sitting in the device’s queue. rrqm/s and wrqm/s count requests that the kernel merged with adjacent ones while they were queued (several small sequential requests become one big one), while r/s and w/s count the requests actually issued to the device after merging. You can drive these up with some simple tests.

Use dd to write a lot of data to a local disk, and you’ll see the wrqm/s and w/s counters rise.

I started iostat, and then started dd, writing a 2 GB file to my home directory. Here’s the dd run:

[war@somehost ~]$ dd if=/dev/zero of=foo bs=8k count=262144
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 31.0321 s, 69.2 MB/s
[war@somehost ~]$ 

Now, here’s the iostat command; the -x flag displays extended statistics, and the 1 tells it to refresh every second.

[war@somehost ~]$ iostat -x 1
Linux 2.6.30.5-43.fc11.i686.PAE (somehost) 	09/24/2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.71    0.00    2.99    0.00    0.00   93.30

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.44    0.00    5.78   12.15    0.00   80.63

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00  7811.00    3.00  395.00    40.00 29024.00    73.03    48.30   76.60   1.11  44.10
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00  7811.00    3.00  395.00    40.00 29024.00    73.03    48.30   76.60   1.11  44.10
dm-0              0.00     0.00    3.00 8354.00    40.00 66832.00     8.00   949.32   51.63   0.05  44.10
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.29    0.00    4.29   35.00    0.00   56.43

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 13542.00    0.00  372.00     0.00 108336.00   291.23   141.91  350.92   2.69 100.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00 13542.00    0.00  372.00     0.00 108336.00   291.23   141.91  350.92   2.69 100.00
dm-0              0.00     0.00    0.00 13897.00     0.00 111176.00     8.00  5494.24  349.96   0.07 100.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.44    0.00    4.56   32.73    0.00   61.27

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00 18120.00    0.00  468.00     0.00 147456.00   315.08   138.46  316.45   2.14 100.00
sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00 18120.00    0.00  468.00     0.00 147456.00   315.08   138.46  316.45   2.14 100.00
dm-0              0.00     0.00    0.00 18592.00     0.00 148736.00     8.00  5450.12  313.56   0.05 100.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

/dev/sda2 is the partition holding the LVM physical volume on my system.
/dev/dm-0 is the device-mapper device behind the logical volume that contains / (look at the iostat output above and you’ll see what I mean: look at the w/s on dm-0!)
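You don’t actually have to guess at the dm-N to logical volume mapping; device-mapper will tell you. The minor number in the output lines up with the N in dm-N (the names below are the stock RHEL/Fedora defaults, yours may differ):

[war@somehost ~]$ sudo dmsetup ls
VolGroup00-LogVol00	(253, 0)
VolGroup00-LogVol01	(253, 1)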

Now, let’s see if we can get the read counters to rise.

First I tried scp’ing a file from my workstation to my laptop. That didn’t really get me the dramatic rise in activity that dd did. Understandably, it’s a much slower process. Let’s see what else I can abuse.

I connected my BlackBerry via USB 2.0. It’s got 8 GB of memory. This is the closest thing to a USB mass storage device I had handy.

This was slightly better, but still not extremely fast. I suppose the best way to stress this would be a local-drive-to-local-drive copy. At any rate, I did see the r/s and rrqm/s counters rise while the copy was being performed.

Ah ha! /dev/null is the answer. Copy your 2 GB file (created by dd earlier) to /dev/null and you’ll see r/s jump. I got about 800 out of my test.
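One caveat: since I just wrote that file, Linux may serve the copy straight out of the page cache and barely touch the disk at all. Flushing the cache first (the drop_caches knob exists in kernels 2.6.16 and later) makes sure the reads actually hit the device:

[war@somehost ~]$ sync
[war@somehost ~]$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
[war@somehost ~]$ dd if=foo of=/dev/null bs=8k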

rsec/s and wsec/s

These counters are very similar to r/s and w/s, except that they deal with sectors rather than requests. Whether these are useful to you or not depends on what sort of data collection you’re looking for.

In our example from earlier, you can see that wsec/s rose along with w/s and wrqm/s:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda2              0.00  7811.00    3.00  395.00    40.00 29024.00    73.03    48.30   76.60   1.11  44.10
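A quick sanity check on what that means in bandwidth terms: iostat counts sectors in 512-byte units (regardless of the drive’s actual geometry), so from the sample above, 29024 wsec/s * 512 bytes = 14,860,288 bytes/s, or roughly 14 MB/s written.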

await

This is a rather important stat. It tells us how long requests sent to the drive are waiting, in milliseconds, and it covers both the time spent sitting in the queue and the time the drive takes to service the request. The higher this number gets, the more of a bottleneck we can see in our storage.
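If you only care about a single device, you can pass its name to iostat and watch just its numbers; handy for keeping an eye on await during a load test:

[war@somehost ~]$ iostat -x sda 1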

I’m continuing to work with this utility, I’ll post more progress as it comes along. I’m hoping to truly get a feel for the rest of the stats.

-War…


PeerGuardian lists, imported to iptables.

At home, I have a Smoothwall which connects my network to the internet. It’s a very robust replacement for those SOHO routers that everyone seems to use. It’s not quite as plug-and-play, but it works very well, and I have a lot more control over it.

I also run PeerGuardian, from Phoenix Labs, on my workstations to help block certain access to my machines. PeerGuardian is a great program, and most of the time it works very well. The problem is, sometimes it has issues, and to be honest, I always thought it’d be cleaner to put the firewalling on my… firewall! So I set out to find a way to add PeerGuardian’s lists to my Smoothie.

There’s a project called MoBlock which is supposed to do this. Well, I’ve never seen it work. That’s not to say it doesn’t work; I just couldn’t get it working on my Smoothie. So for a very long time, I went on using PeerGuardian locally. Recently, I happened to be watching PeerGuardian run its update, and realized that it’s pulling its lists from an HTTP address. Makes sense that I might be able to do the same, right? So I pointed my web browser there, and sure enough, I was presented with a list of rules! Rules that don’t match iptables syntax, but look very easy to parse! So, I did just that: I started writing my own parser, and before long, I had a very long list of iptables-compatible rules. By very long, I mean long! Over 226,000 lines!
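For the curious: the lists were in PeerGuardian’s plaintext p2p format, one Name:startIP-endIP range per line. As a rough sketch of the idea (my actual script in the tarball below is more involved, and the iprange match here is just the most direct translation, not necessarily what it uses; list.p2p stands in for whichever list you downloaded), the transformation can be as simple as:

[war@somehost ~]$ awk -F: 'NF >= 2 { print "-A PGBLOCK -m iprange --src-range " $NF " -j DROP" }' list.p2p > pg.firewall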

I decided that the best way to make this list easy to update was to create a new chain, called PGBLOCK, and put my rules in there. I also created a chain called PGALLOW which supersedes the block list, so I can add exceptions if I’d like.

So, on my Fedora 11 test machine I added the following to /etc/sysconfig/iptables.

After the chain definitions (the :CHAINNAME [number:number] lines) I added four lines:
-N PGALLOW
-N PGBLOCK
-A INPUT -j PGALLOW
-A INPUT -j PGBLOCK

This creates the chains and hooks them into the iptables INPUT chain. Because these jumps come before any other INPUT rules in the file, every inbound packet passes through my chains before it touches any other rules.
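To make that concrete, here’s what an entry in each chain might look like (the addresses are made up for the example). Because INPUT jumps to PGALLOW first, the ACCEPT wins and the exception overrides the block:

-A PGALLOW -s 198.51.100.25 -j ACCEPT
-A PGBLOCK -s 198.51.100.0/24 -j DROP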

At first, I tried entering all of my rules into the PGBLOCK chain. This worked, but it delayed every inbound packet to the point that my network connection was almost useless.

So I made a slight change. I made a new chain for each class A network, 253 in all (I skipped 10. and 127.), and then I set up dispatch rules inside of the PGBLOCK chain. PGBLOCK now contains lines similar to:

-A PGBLOCK -s 1.0.0.0/8 -j PGBLOCK1
-A PGBLOCK -s 2.0.0.0/8 -j PGBLOCK2
..
-A PGBLOCK -s 254.0.0.0/8 -j PGBLOCK254
-A PGBLOCK -s 255.0.0.0/8 -j PGBLOCK255

Now each packet gets subjected to a couple hundred (or thousand) rules at most, instead of all 226,000 of them.
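Hand-writing 253 chain definitions and dispatch rules would be painful, so it’s worth noting the whole skeleton can be generated with a short shell loop (a sketch of the idea, not the code from getpg.tar.gz; PGBLOCK itself must already be declared above these lines):

for a in $(seq 1 255); do
    [ "$a" -eq 10 ] && continue    # skipped: RFC 1918 private space
    [ "$a" -eq 127 ] && continue   # skipped: loopback
    echo ":PGBLOCK$a - [0:0]"                      # declare the per-class-A chain
    echo "-A PGBLOCK -s $a.0.0.0/8 -j PGBLOCK$a"   # dispatch rule in PGBLOCK
done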

Wondering if you can get ahold of my script?
Here it is: http://www.undrground.org/scripts/getpg.tar.gz

Making this work is pretty easy. There are a few variables at the top of the script that point to where you’d like some things saved. It needs a scratch directory for the lists it downloads, and you need write access, as the user you’re running as, to the directory you’re running it from and to the lists directory, of course. Just set all that up and run the script. It’ll generate a file called pg.firewall; use that along with iptables-restore to build the firewall.

iptables-restore --noflush < pg.firewall

Now, updating the firewall is a little more tricky: you need to flush the tables manually before re-importing. I did this with a Perl script that looks something like:

#!/usr/bin/perl
# flush each per-class-A block chain, then re-import the fresh list
foreach (1..255) {
    next if ($_ == 10 || $_ == 127);    # no chains were created for 10. and 127.
    system("/usr/sbin/iptables -F PGBLOCK$_");
}
system("/usr/sbin/iptables-restore --noflush < /root/pg.firewall");

This flushes the tables, and then imports the new list.

I hope this helps someone else out along the way. Enjoy!

-War…