Month: October 2009

Categories

Just a quick note. I’ve taken a little time to set up categories, and then took a bit more time moving all of my entries into their appropriate categories. Given the random nature of this blog, I thought this would be helpful for those of you who like reading my tech articles, but don’t really care…


SLES 10, your kernel is not safe!

So, I recently came across a startling discovery. On a SLES 10 server, when you install a kernel update, the update process kindly DELETES your old kernel. It’s not clear to me yet whether it does this after the next successful reboot, or during the update itself. In other words, the people at SuSE/Novell are _SO_ confident that you’ll never have a problem with a brand spankin’ new kernel that they perform without a net.

I’m a very conservative sysadmin. I don’t like to do anything without a backup plan. When it comes to kernel updates, that backup plan is option 1 in my grub.conf (option 1 being the second entry in my boot list: generally, my old kernel). From what I’m reading, there’s also no way to tell the update process NOT to delete the old kernel, so you’re stuck with this behaviour.

This actually bit us a few days ago when, due to some rather odd circumstances, we ended up with a SLES box trying to boot a kernel that was one revision old. Because SLES thought it had cleaned up this kernel, the /lib/modules/<version>/ directory for it was empty. This obviously caused some confusion on the kernel’s part, and it refused to boot. If the update process had left the older boot/module files alone, and left it up to a responsible sysadmin to clean up old kernels when they saw fit, this wouldn’t have happened. Granted, in this case the server had other issues, but that’s a different story.

So I’ve set out to fix this. Giving yourself some peace of mind is as simple as taking your kernel and its modules and locking copies of them away in a safe deposit box (or at least a backup directory) during the update process, then putting them back somewhere accessible afterwards and re-adding the old kernel to grub. This is all well and good if you have one, maybe two servers to worry about; go ahead and do it manually. If you have a couple dozen, this is a considerable amount of work to do by hand, and it takes up your time!
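For the manual route, here’s a minimal sketch of the idea in shell. The SLES-style vmlinuz-<version>/initrd-<version> names and the backup path are my assumptions, so adjust for what’s actually in your /boot:

# before the update: stash the running kernel, initrd, and modules
KVER=$(uname -r)
BACKUP=/root/kernel-backup/$KVER
mkdir -p $BACKUP
cp /boot/vmlinuz-$KVER /boot/initrd-$KVER $BACKUP/
cp -a /lib/modules/$KVER $BACKUP/modules

# after the update: put the old files back beside the new kernel
# (note the version you backed up; uname -r reports the new one post-reboot)
cp $BACKUP/vmlinuz-$KVER $BACKUP/initrd-$KVER /boot/
cp -a $BACKUP/modules /lib/modules/$KVER
# then re-add a boot entry for the old kernel to /boot/grub/menu.lst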

So, I wrote a script to do it for you! It’s a Perl script, and it should run on a base install of SLES (or so it has in my testing).
You can download it here.

Just download it to your SLES server and run it; it’ll do the work for you. Run it before your update and select option 1, which backs up the kernel. Then run it again after the update and select option 2, which restores the kernel.

Enjoy!

-War…


Zettabyte File System (ZFS)

We’ve been doing a lot of storage research lately, and there’s been a lot of talk about ZFS. I’m going to spare you the magazine article (if you want to read more on what it is and where it comes from, look elsewhere) and get straight to the guts.

ZFS is a 128-bit file system, and unfortunately isn’t likely to be built into the Linux kernel anytime soon. You can, however, use it in userspace via zfs-fuse, similar to how you might use NTFS on Linux (for those of us still dual booting). The machine I’m running on runs solely Fedora 11 and has a handsome amount of beef behind it. It’s also got 500 GB of local storage, so I can play around with huge files, no sweat. You can do the same things I’m doing with smaller files, if you’d like.

First of all, you’ll need to install zfs-fuse; this was simple on Fedora.

$ sudo yum install zfs-fuse

Next, some blank disk images to toy with.

$ mkdir zfs
$ cd zfs
$ for i in $(seq 8); do dd if=/dev/zero of=$i bs=1024 count=2097152;done

This gives me eight 2 GB blobs. Make these smaller if you’d like; I wanted enough space to throw some large files at ZFS. You’ll see in a bit.

Now let’s make our first zfs pool.

$ sudo zpool create jose ~/zfs/1 ~/zfs/2 ~/zfs/3 ~/zfs/4 

I named my pool jose. I like it when my blog entries have personality. 😛

zfs list will give you a list of your zfs pools.

$ sudo zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
jose    72K  7.81G    18K  /jose

Creating the pool also mounts it.

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      454G  210G  221G  49% /
/dev/sda1             190M   30M  151M  17% /boot
tmpfs                 2.0G   25M  2.0G   2% /dev/shm
jose                  7.9G   18K  7.9G   1% /jose

An interesting note: I never created a file system on this pool, I just told ZFS to have at it. ZFS handles both the volume management and the file system itself, working with the underlying devices at the block level, so there’s no separate mkfs step.
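Since ZFS owns the mount itself, properties like the mountpoint are set through zfs rather than /etc/fstab. A quick illustration (the /mnt/jose path is just an example):

$ sudo zfs set mountpoint=/mnt/jose jose

ZFS remounts the pool at the new location on its own.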

Now, let’s poke jose with a stick, and see what he does.

$ sudo dd if=/dev/zero of=/jose/testfile bs=1024 count=2097512
2097512+0 records in
2097512+0 records out
2147852288 bytes (2.1 GB) copied, 118.966 s, 18.1 MB/s

$ sudo zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
jose  2.00G  5.81G  2.00G  /jose

It’s worth noting that with zpool add you can tack more devices onto a pool of this sort and grow it on the fly.
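For example, something like this (not part of my run above, using one of the spare image files) would grow jose by another 2 GB:

$ sudo zpool add jose ~/zfs/5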

That’s all fun, but this is essentially just a large file system. No really cool features yet. Let’s see what we can really do with this thing.

Let’s make a raid group, instead of just a standard pool.

Goodbye, jose.

$ sudo zpool destroy jose

From jose’s ashes, let’s make a new pool.

$ sudo zpool create susan raidz ~/zfs/1 ~/zfs/2 ~/zfs/3 ~/zfs/4
$ sudo zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
susan  92.0K  5.84G  26.9K  /susan

Notice that susan is smaller than jose, using the same disks. This isn’t because susan has made more trips to the gym than jose; rather, it’s because of the raid set. raidz is similar to RAID 5, where one disk is taken for parity, so you lose one disk’s worth of capacity: with four 2 GB disks, that’s roughly 3 × 2 GB = 6 GB usable, which lines up with the 5.84G shown above.

Let’s remedy that, by throwing more (virtual) hardware at it.

You can’t expand a raidz group by adding a disk, so we’ll do it by recreating the group.

$ sudo zpool destroy susan
$ sudo zpool create susan raidz ~/zfs/1 ~/zfs/2 ~/zfs/3 ~/zfs/4 ~/zfs/5
$ sudo zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
susan  98.3K  7.81G  28.8K  /susan

And there you go, about 8 GB again.
Now let’s poke susan with a stick.

First, here’s her status:

$ sudo zpool status
  pool: susan
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Tue Oct  6 15:22:24 2009
config:

	NAME                    STATE     READ WRITE CKSUM
	susan                   ONLINE       0     0     0
	  raidz1                ONLINE       0     0     0
	    /home/lagern/zfs/1  ONLINE       0     0     0
	    /home/lagern/zfs/2  ONLINE       0     0     0
	    /home/lagern/zfs/3  ONLINE       0     0     0
	    /home/lagern/zfs/4  ONLINE       0     0     0
	    /home/lagern/zfs/5  ONLINE       0     0     0

errors: No known data errors

Now we’ll dd another file to susan, and we’ll see if we can damage the array.

$ sudo dd if=/dev/zero of=/susan/testfile bs=1024 count=2097512

Then, in another terminal…

$ sudo zpool offline susan ~/zfs/4
$ sudo zpool status
  pool: susan
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
 scrub: scrub completed after 0h0m with 0 errors on Tue Oct  6 15:22:24 2009
config:

	NAME                    STATE     READ WRITE CKSUM
	susan                   DEGRADED     0     0     0
	  raidz1                DEGRADED     0     0     0
	    /home/lagern/zfs/1  ONLINE       0     0     0
	    /home/lagern/zfs/2  ONLINE       0     0     0
	    /home/lagern/zfs/3  ONLINE       0     0     0
	    /home/lagern/zfs/4  OFFLINE      0     0     0
	    /home/lagern/zfs/5  ONLINE       0     0     0

errors: No known data errors

The dd is still running.

$ sudo zpool online susan ~/zfs/4

dd’s still going…

dd finally finished. It took a little longer than the first copy, but it finished, and the file appears correct.
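If you want more assurance than “appears correct”, the idiomatic ZFS check is a scrub, which re-reads every block in the pool and verifies it against ZFS’s own checksums:

$ sudo zpool scrub susan
$ sudo zpool status

Any problems show up in the CKSUM column of the status output.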

Now, let’s try something else. With raid, you generally won’t just take a drive offline and then bring it right back, so let’s see what happens if you replace the drive.

Another dd session, and then the drive swap commands.

$ sudo dd if=/dev/zero of=/susan/testfile2 bs=1024 count=2097512

In another terminal…

$ sudo zpool status
  pool: susan
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Tue Oct  6 15:26:06 2009
config:

	NAME                    STATE     READ WRITE CKSUM
	susan                   ONLINE       0     0     0
	  raidz1                ONLINE       0     0     0
	    /home/lagern/zfs/1  ONLINE       0     0     0
	    /home/lagern/zfs/2  ONLINE       0     0     0
	    /home/lagern/zfs/3  ONLINE       0     0     0
	    /home/lagern/zfs/4  ONLINE       0     0     0
	    /home/lagern/zfs/5  ONLINE       0     0     0

errors: No known data errors
$ sudo zpool offline susan ~/zfs/4
$ sudo zpool replace susan ~/zfs/4 ~/zfs/6
$ sudo zpool status
  pool: susan
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 25.87% done, 0h3m to go
config:

	NAME                      STATE     READ WRITE CKSUM
	susan                     DEGRADED     0     0     0
	  raidz1                  DEGRADED     0     0     0
	    /home/lagern/zfs/1    ONLINE       0     0     0
	    /home/lagern/zfs/2    ONLINE       0     0     0
	    /home/lagern/zfs/3    ONLINE       0     0     0
	    replacing             DEGRADED     0     0     0
	      /home/lagern/zfs/4  OFFLINE      0     0     0
	      /home/lagern/zfs/6  ONLINE       0     0     0
	    /home/lagern/zfs/5    ONLINE       0     0     0

errors: No known data errors

This procedure seriously degraded the speed of the dd. It also made my music chop, once.
After the dd finished, the status was happy again:

$ sudo dd if=/dev/zero of=/susan/testfile2 bs=1024 count=2097512
2097512+0 records in
2097512+0 records out
2147852288 bytes (2.1 GB) copied, 356.92 s, 6.0 MB/s

$ sudo zpool status
  pool: susan
 state: ONLINE
 scrub: resilver completed after 0h4m with 0 errors on Tue Oct  6 15:35:52 2009
config:

	NAME                    STATE     READ WRITE CKSUM
	susan                   ONLINE       0     0     0
	  raidz1                ONLINE       0     0     0
	    /home/lagern/zfs/1  ONLINE       0     0     0
	    /home/lagern/zfs/2  ONLINE       0     0     0
	    /home/lagern/zfs/3  ONLINE       0     0     0
	    /home/lagern/zfs/6  ONLINE       0     0     0
	    /home/lagern/zfs/5  ONLINE       0     0     0

errors: No known data errors

Note that 4 is now replaced with 6.

Time for some coffee………..

Now let’s look at some really neat things.

I mentioned that you can’t expand a raidz volume by adding disks. What you can do is replace the disks with larger ones. It’s unclear how this affects your data, though (at least, it’s unclear to me!), so I’m going to try it.

First let’s make some larger “disks”.

$ for i in $(seq 9 13); do dd if=/dev/zero of=$i bs=1024 count=4195024; done

Here we are at the beginning:

$ sudo zpool status
  pool: susan
 state: ONLINE
 scrub: resilver completed after 0h4m with 0 errors on Tue Oct  6 15:35:52 2009
config:

	NAME                    STATE     READ WRITE CKSUM
	susan                   ONLINE       0     0     0
	  raidz1                ONLINE       0     0     0
	    /home/lagern/zfs/1  ONLINE       0     0     0
	    /home/lagern/zfs/2  ONLINE       0     0     0
	    /home/lagern/zfs/3  ONLINE       0     0     0
	    /home/lagern/zfs/6  ONLINE       0     0     0
	    /home/lagern/zfs/5  ONLINE       0     0     0

errors: No known data errors

$ sudo zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
susan  4.00G  3.82G  4.00G  /susan

The new disks I created are 4 GB, so we should be able to double the capacity of this pool using these disks.

$ sudo zpool replace susan ~/zfs/1 ~/zfs/9
$ sudo zpool replace susan ~/zfs/2 ~/zfs/10
$ sudo zpool status
  pool: susan
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 12.94% done, 0h6m to go
config:

	NAME                       STATE     READ WRITE CKSUM
	susan                      ONLINE       0     0     0
	  raidz1                   ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/1   ONLINE       0     0     0
	      /home/lagern/zfs/9   ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/2   ONLINE       0     0     0
	      /home/lagern/zfs/10  ONLINE       0     0     0
	    /home/lagern/zfs/3     ONLINE       0     0     0
	    /home/lagern/zfs/6     ONLINE       0     0     0
	    /home/lagern/zfs/5     ONLINE       0     0     0

errors: No known data errors
$ sudo zpool replace susan ~/zfs/3 ~/zfs/11
$ sudo zpool replace susan ~/zfs/6 ~/zfs/12
$ sudo zpool replace susan ~/zfs/5 ~/zfs/13
$ sudo zpool status
  pool: susan
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 8.21% done, 0h5m to go
config:

	NAME                       STATE     READ WRITE CKSUM
	susan                      ONLINE       0     0     0
	  raidz1                   ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/1   ONLINE       0     0     0
	      /home/lagern/zfs/9   ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/2   ONLINE       0     0     0
	      /home/lagern/zfs/10  ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/3   ONLINE       0     0     0
	      /home/lagern/zfs/11  ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/6   ONLINE       0     0     0
	      /home/lagern/zfs/12  ONLINE       0     0     0
	    replacing              ONLINE       0     0     0
	      /home/lagern/zfs/5   ONLINE       0     0     0
	      /home/lagern/zfs/13  ONLINE       0     0     0

errors: No known data errors

This took a while, and really hit my system hard. I’d recommend doing this one drive at a time.
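Something along these lines would serialize the replacements, waiting for each resilver to finish before kicking off the next. It’s a sketch, assuming the same old:new disk pairings I used above:

# replace one disk at a time; old:new pairs taken from the steps above
for pair in 1:9 2:10 3:11 6:12 5:13; do
    old=${pair%:*}; new=${pair#*:}
    sudo zpool replace susan ~/zfs/$old ~/zfs/$new
    # wait for this resilver to complete before starting the next
    while sudo zpool status susan | grep -q 'resilver in progress'; do
        sleep 30
    done
done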

$ top

top - 16:12:10 up 25 days,  5:27, 25 users,  load average: 11.36, 9.27, 6.20
Tasks: 280 total,   2 running, 278 sleeping,   0 stopped,   0 zombie
Cpu0  : 10.2%us,  1.3%sy,  0.0%ni, 61.0%id, 27.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  1.6%us,  2.9%sy,  0.0%ni,  5.5%id, 89.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  0.7%us,  0.7%sy,  0.0%ni, 92.7%id,  5.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  3.9%us,  2.0%sy,  0.0%ni, 94.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.0%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  1.3%us,  2.0%sy,  0.0%ni,  9.8%id, 86.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  5.4%us,  6.8%sy,  0.0%ni, 87.3%id,  0.0%wa,  0.0%hi,  0.6%si,  0.0%st
Cpu7  :  1.6%us,  1.3%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4121040k total,  4004956k used,   116084k free,    13756k buffers
Swap:  5406712k total,   322328k used,  5084384k free,  1441452k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                       
11021 lagern    20   0 1417m 1.1g  35m S 14.2 26.8   2393:07 VirtualBox                                                    
  313 lagern    20   0 1077m 555m  13m R 12.6 13.8   1089:52 firefox                                                       
22170 root      20   0  565m 221m 1428 S  6.6  5.5   5:57.71 zfs-fuse     

I think I’ll go read some things on my laptop while this finishes.

Done! Took about 15 minutes to complete. My test files are still present in the pool:

$ ls -lh /susan
total 4.0G
-rw-r--r-- 1 root root 2.1G 2009-10-06 15:27 testfile
-rw-r--r-- 1 root root 2.1G 2009-10-06 15:35 testfile2

My pool does not yet show the new size….

$ sudo zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
susan  4.00G  3.82G  4.00G  /susan

I remounted…

$ sudo zfs umount /susan
$ sudo zfs mount susan

No change….

According to harryd, a reboot is necessary. I’m not in the rebooting mood at the moment. I’ll try this, and report back if it doesn’t work.
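If you’d rather not reboot, an export/import cycle may be worth trying first. I haven’t verified this here; the -d flag tells zpool import where to search for our file-backed vdevs:

$ sudo zpool export susan
$ sudo zpool import -d ~/zfs susan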

So, there you have it: zfs! Oh, another note: raidz is not the only raid option. raidz2 keeps two parity drives, like RAID 6. You can specify this via the same zpool create command, using raidz2 where raidz was.
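For instance, recreating susan with double parity would look like this (a sketch reusing our image files, not something I ran above):

$ sudo zpool create susan raidz2 ~/zfs/9 ~/zfs/10 ~/zfs/11 ~/zfs/12 ~/zfs/13

With five disks and two taken for parity, you’d end up with roughly three disks’ worth of usable space.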

Enjoy!

-War…