
Re: [Libguestfs] mkfs.ext2 succeeds despite nbd write errors?



On Sat, Nov 07, 2015 at 12:21:29AM -0600, Jason Pepas wrote:
> Hi,
> 
> So I've been hacking together an nbdkit plugin (similar to the "file"
> plugin, but it splits the file up into chunks):
> https://github.com/pepaslabs/nbdkit-chunks-plugin
> 
> I got it to the point of being a working prototype.  Then I threw it
> onto a raspberry pi, which it turns out only has a 50/50 shot of
> fallocate() working correctly.
> 
> I'm checking the return code of fallocate(), and my chunks_pwrite()
> returns -1 if it fails.  No problems there.
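
For reference, here is a minimal sketch of that kind of check. open_chunk() is a hypothetical helper, not the plugin's actual code, and the 256 KiB size is taken from the chunk listing further down. It uses posix_fallocate(), whose convention differs from Linux fallocate(): fallocate() returns -1 and sets errno, while posix_fallocate() returns the error number and leaves errno untouched. The sketch therefore copies the result into errno before returning -1. glibc's posix_fallocate() also falls back to writing zeros when the filesystem lacks fallocate support, which might sidestep an unreliable fallocate() at some cost in speed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed chunk size, matching the 262144-byte files in the listing. */
#define CHUNK_SIZE 262144

/* Hypothetical helper: open (or create) one chunk file and make sure
 * CHUNK_SIZE bytes are allocated.  Returns an open fd, or -1 with errno
 * set so a caller such as a pwrite callback can fail the request. */
static int open_chunk(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;                /* errno already set by open() */

    /* posix_fallocate() returns an error number; it does not set errno. */
    int rc = posix_fallocate(fd, 0, CHUNK_SIZE);
    if (rc != 0) {
        close(fd);                /* may clobber errno, so set it after */
        errno = rc;
        return -1;
    }
    return fd;
}
```
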
> 
> When I run mkfs.ext2 /dev/nbd0 on the client, I see this on the nbd-server:
> 
> 
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000030723'
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000030724'
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000030725'
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000030726'
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000030727'
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000030728'
> nbdkit: chunks[1]: error: Unable to fallocate
> '/home/cell/nbds/default/chunks/00000000000000031232'
> 
> 
> Indeed, there is definitely a problem with fallocate, as some of the
> chunks are the correct size (256k), and some are zero length:
> 
> cell pi1$ pwd
> /home/cell/nbds/default/chunks
> cell pi1$ ls -l | tail
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032256
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032257
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032258
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032259
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032260
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032261
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032262
> -rw------- 1 cell cell 262144 Nov  7 06:01 00000000000000032263
> -rw------- 1 cell cell      0 Nov  7 06:01 00000000000000032264
> -rw------- 1 cell cell      0 Nov  7 06:01 00000000000000032767
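
The file names in this listing appear to be the chunk index as a zero-padded 20-digit decimal number, and each full chunk is 256 KiB. That is consistent with the 8192 MB (8589934592-byte) export reported by nbd-client: 8589934592 / 262144 = 32768 chunks, numbered 0 through 32767. Below is a sketch of mapping a device byte offset to its chunk file under that assumed layout. chunk_for_offset() is hypothetical and the real plugin may name or size chunks differently.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout, inferred from the listing: 256 KiB chunks named by
 * their zero-padded 20-digit decimal index. */
#define CHUNK_SIZE UINT64_C(262144)

/* Map a byte offset on the exported device to the chunk that holds it.
 * Writes "dir/<index>" into buf and the offset inside that chunk into
 * *within.  Returns the chunk index. */
static uint64_t chunk_for_offset(const char *dir, uint64_t offset,
                                 char *buf, size_t buflen, uint64_t *within)
{
    uint64_t index = offset / CHUNK_SIZE;
    *within = offset % CHUNK_SIZE;
    snprintf(buf, buflen, "%s/%020" PRIu64, dir, index);
    return index;
}
```

A write that crosses a 256 KiB boundary would need to be split and applied to each chunk it touches in turn.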
> 
> 
> But that's not my concern.  The problem is that, alarmingly, mkfs.ext2
> isn't fazed by this at all:
> 
> 
> root debian:~# nbd-client pi1 10809 /dev/nbd0
> Negotiation: ..size = 8192MB
> bs=1024, sz=8589934592 bytes
> root debian:~# mkfs.ext2 /dev/nbd0
> mke2fs 1.42.12 (29-Aug-2014)
> Creating filesystem with 2097152 4k blocks and 524288 inodes
> Filesystem UUID: 2230269c-6d2a-4927-93df-d9dd9f4fa40c
> Superblock backups stored on blocks:
>         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
> 
> Allocating group tables: done
> Writing inode tables: done
> Writing superblocks and filesystem accounting information: done
> 
> root debian:~#
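
The backup locations printed by mke2fs follow from the ext2 sparse_super rule: besides group 0, superblock backups are kept only in group 1 and in groups whose number is a power of 3, 5 or 7. With 2097152 blocks and, assuming the default of 32768 blocks per group for 4 KiB blocks (which matches the first backup at block 32768), there are 64 groups. The sketch below reproduces the list. The function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* sparse_super: group g >= 1 holds a backup if g == 1 or g is a power
 * of 3, 5 or 7. */
static int has_backup(uint32_t g)
{
    if (g <= 1)
        return g == 1;
    const uint32_t bases[] = {3, 5, 7};
    for (int i = 0; i < 3; i++) {
        uint32_t p = bases[i];
        while (p < g)
            p *= bases[i];
        if (p == g)
            return 1;
    }
    return 0;
}

/* Fill out[] with backup block numbers in ascending order; returns the
 * count.  Assumes first_data_block == 0, as with 4 KiB blocks. */
static size_t backup_blocks(uint32_t total_blocks, uint32_t per_group,
                            uint32_t *out, size_t max)
{
    uint32_t groups = (total_blocks + per_group - 1) / per_group;
    size_t n = 0;
    for (uint32_t g = 1; g < groups && n < max; g++)
        if (has_backup(g))
            out[n++] = g * per_group;
    return n;
}
```

Called as backup_blocks(2097152, 32768, out, 16), this yields the eight blocks mke2fs printed, from 32768 to 1605632. Under the assumed 256 KiB chunk layout, each chunk holds 64 of these 4 KiB blocks, so the backup at block 1605632 would live in chunk 25088.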
> 
> 
> However, the nbd-client's dmesg is chock full of errors:
> 
> 
> [ 9832.409219] block nbd0: Other side returned error (22)
> [ 9832.457401] block nbd0: Other side returned error (22)
> [ 9832.503100] block nbd0: Other side returned error (22)
> [ 9832.542457] block nbd0: Other side returned error (22)
> [ 9832.590394] block nbd0: Other side returned error (22)
> [ 9832.642393] block nbd0: Other side returned error (22)
> [ 9832.681455] block nbd0: Other side returned error (22)
> [ 9832.721355] block nbd0: Other side returned error (22)
> [ 9832.722676] quiet_error: 15129 callbacks suppressed
> [ 9832.722679] Buffer I/O error on device nbd0, logical block 6293248
> [ 9832.724274] lost page write due to I/O error on nbd0
> [ 9832.724282] Buffer I/O error on device nbd0, logical block 6293249
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293250
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293251
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293252
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293253
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293254
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293255
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293256
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.725110] Buffer I/O error on device nbd0, logical block 6293257
> [ 9832.725110] lost page write due to I/O error on nbd0
> [ 9832.743111] block nbd0: Other side returned error (22)
> [ 9832.744420] blk_update_request: 125 callbacks suppressed
> [ 9832.744422] end_request: I/O error, dev nbd0, sector 12587008
> [ 9832.758203] block nbd0: Other side returned error (22)
> [ 9832.759513] end_request: I/O error, dev nbd0, sector 12845056
> [ 9832.777635] block nbd0: Other side returned error (22)
> [ 9832.779511] end_request: I/O error, dev nbd0, sector 12849160
> [ 9832.805950] block nbd0: Other side returned error (22)
> [ 9832.810278] end_request: I/O error, dev nbd0, sector 12849416
> [ 9832.846880] block nbd0: Other side returned error (22)
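
To correlate these client-side errors with files on the server, the 512-byte sector numbers in the end_request lines can be converted to chunk indices. The error number in "Other side returned error (22)" is EINVAL on Linux. The sketch below assumes the 256 KiB chunk size from the listing. As one worked example, sector 12845056 is byte 12845056 * 512 = 6576668672, which is 4 KiB block 1605632, the last superblock backup location mke2fs printed. So at least one request covering a superblock backup failed even though mke2fs reported "done". describe_failure() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Kernel end_request messages use 512-byte sectors; the chunk size is
 * the assumed 256 KiB layout. */
#define SECTOR_SIZE UINT64_C(512)
#define CHUNK_SIZE  UINT64_C(262144)

/* Divide by sectors-per-chunk to avoid overflowing sector * 512. */
static uint64_t sector_to_chunk(uint64_t sector)
{
    return sector / (CHUNK_SIZE / SECTOR_SIZE);
}

/* Format a failed sector and the errno value reported by the server. */
static void describe_failure(char *buf, size_t len, uint64_t sector, int err)
{
    snprintf(buf, len, "sector %" PRIu64 " -> chunk %020" PRIu64 ": %s",
             sector, sector_to_chunk(sector), strerror(err));
}
```
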
> 
> 
> So, my question / concern is, how is it that the nbd-client's kernel
> is correctly detecting massive I/O errors, but apparently not sending
> them through to mkfs.ext2?

It's definitely not good, but I don't think it can be nbdkit, since
nbd-client is seeing the errors.

> Or perhaps mkfs.ext2 doesn't check for I/O errors?  That's a bit hard
> to believe...

How about running 'strace mkfs.ext2 ..' to see whether any system calls
are returning errors?  That would show you whether nbd-client is throwing
the errors away, or whether mkfs is getting them and ignoring them
(seems pretty unlikely, but you never know).
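
One thing worth watching for in that strace output: write() to a block device normally just copies data into the page cache and returns success. A failure during the later write-back, which is what the "lost page write due to I/O error" lines in dmesg suggest, can only be reported afterwards, through fsync() (or sync/close). Whether mke2fs 1.42.12 checks for that is not something this thread establishes. Below is a minimal sketch of the pattern a program needs in order to see such deferred errors. write_and_sync() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes at offset off, then fsync().  Returns 0, or -1 with
 * errno set.  An error from the pwrite() loop is immediate; an error
 * from background write-back (e.g. EIO) only shows up at fsync(). */
static int write_and_sync(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the same chunk */
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return fsync(fd);
}
```
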

After that, it'd be down to tracing where the errors end up in the
kernel.

Rich.

> Anyway, I'm sure someone on this list has run into similar issues, so
> I thought I'd reach out before I go too far down a rabbit hole.
> 
> Thanks,
> Jason Pepas
> 
> _______________________________________________
> Libguestfs mailing list
> Libguestfs@redhat.com
> https://www.redhat.com/mailman/listinfo/libguestfs

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

