[linux-lvm] Page cache corruption when creating a snapshot

malahal at us.ibm.com malahal at us.ibm.com
Fri Feb 29 19:29:20 UTC 2008


Greg Hudson [ghudson at MIT.EDU] wrote:
> On Fri, 2008-02-29 at 18:31 +0000, Alasdair G Kergon wrote:
> > On Fri, Feb 29, 2008 at 12:32:41PM -0500, ghudson at MIT.EDU wrote:
> > > The reproduction recipe looks like:
> > >   rm -rf /tmp/test
> > >   mkdir /tmp/test
> > >   # Put around 60MB of files into /tmp/test
> > >   find /tmp/test -type f | xargs md5sum > /tmp/sum.pre
> > >   lvcreate --size 2G --snapshot /dev/dink/gutsy-i386-sbuild --name testsnapshot
> > >   find /tmp/test -type f | xargs md5sum > /tmp/sum.post
> > 
> > Can you do that twice?
> >     find /tmp/test -type f | xargs md5sum > /tmp/sum.post2
> > and check the two post files are the same?
> 
> In three reproductions of the page cache corruption, sum.post2 was
> always the same as sum.post.
> 
> In my experiences with this problem in general, the page cache
> corruption is not particularly transient; once it happens, the file
> continues to appear modified (with the same incorrect contents) for the
> indefinite future, until the machine is rebooted.
> 
> > And add some syncs/blockdev --flushbufs at different places
> > in the script to see if you can make it go away.
> 
> Nope, that never made it go away.  I'm not sure in what situations
> flushing write buffers would have any effect.  If I had a way to throw
> away the read-only page cache and force a file reload from disk, I would
> expect that to eliminate the visible effect of the corruption; at the
> moment the only reliable way I know how to do that is to reboot.

I'm not an expert on O_DIRECT, but it is supposed to read from the disk
without populating the page cache. I don't really know what it does if
the pages are already cached! The "dd" command supports O_DIRECT, so see
whether the corrupted file's contents look any different when you read
it with "dd" with and without O_DIRECT.
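
Something like the following is a rough sketch of that comparison. It
assumes GNU dd (where O_DIRECT is requested with iflag=direct) and
md5sum; the function name compare_reads is made up for illustration,
and I have not tried this against the corrupted files:

```shell
# compare_reads FILE
# Checksum FILE once through the page cache and once with O_DIRECT,
# then report whether the two reads agree.
compare_reads() {
    file=$1

    # Ordinary buffered read: may be served from the page cache.
    cached=$(dd if="$file" bs=1M 2>/dev/null | md5sum | cut -d' ' -f1)
    echo "cached: $cached"

    # Probe first: O_DIRECT is not supported on every filesystem,
    # and a failed open should not be mistaken for a mismatch.
    if ! dd if="$file" of=/dev/null bs=1M count=1 iflag=direct 2>/dev/null; then
        echo "direct: O_DIRECT read not supported here"
        return 0
    fi

    # Same read, but asking dd to open the input with O_DIRECT.
    direct=$(dd if="$file" bs=1M iflag=direct 2>/dev/null | md5sum | cut -d' ' -f1)
    echo "direct: $direct"

    if [ "$cached" = "$direct" ]; then
        echo "result: same"
    else
        echo "result: DIFFERENT"
        return 1
    fi
}
```

Run it on one of the files whose checksum changed between sum.pre and
sum.post. If the direct read matches the sum.pre value while the cached
read matches sum.post, that would suggest the data on disk is fine and
only the cached pages are bad. If both reads give the bad sum, it tells
us less, since I don't know whether O_DIRECT bypasses pages that are
already cached.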

--Malahal.



