[Linux-cachefs] File corruption using OpenVZ, NFS and cachefilesd
Kelsey Cummings
kelsey.cummings at sonic.com
Fri Jan 13 00:58:01 UTC 2017
We have OpenVZ containers in an HA web cluster using NetApp shared NFS
storage. NFS is almost exclusively RO and is used for serving PHP and
associated objects to the web servers. This config has been in place
for several years.*
The typical upgrade procedure for the app is to copy the new version of the
app into the bound mount from within one of the guests. Over the
last few weeks we've seen occasional corruption of files as returned by
one of the guests. There's no indication of hardware-related issues.
Today, we found Guest B exhibiting corruption on a recently updated
file. Guest A and a third-party host both showed the correct, uncorrupted
version of the file. The contents of the file returned by B appeared to
be the original version truncated to the length of the new version. The
stat() results on both guests and the third host all matched. Touching the
file on B did not cause it to correct itself. Touching the file on one
of the other hosts did cause B to return the correct contents.
Hopefully this explanation makes sense! Any ideas?
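To make the symptom concrete, here is a rough Python sketch of the comparison described above. It is only an illustration, not the exact procedure we ran: the function names are made up, and it assumes copies of both the old and the new version of the file are at hand. fingerprint() is meant to be run on each host against the same path, and looks_like_truncated_old() checks for the specific pattern we saw on B.

```python
import hashlib
import os


def fingerprint(path):
    """Return (size, mtime, sha256 hex digest) for path as seen from this host."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return st.st_size, int(st.st_mtime), digest.hexdigest()


def looks_like_truncated_old(observed, old, new):
    """True if observed is the old content cut to the new length (and not the new content)."""
    return (
        observed != new
        and len(observed) == len(new)
        and observed == old[:len(new)]
    )
```

If the size and mtime from fingerprint() agree across hosts but the digest differs on one of them, that matches what we observed: identical stat() data, different contents.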
Host A (OpenVZ Host)
  nfs mount /nfs/apps
  cachefilesd
  Guest A (Container)
    bound mount /nfs/apps/X /nfs/approot

Host B (OpenVZ Host)
  nfs mount /nfs/apps
  cachefilesd
  Guest B (Container)
    bound mount /nfs/apps/X /nfs/approot
*) While investigating this we discovered that the fsc option was not
enabled on the NFS mounts. Despite this, the cache was clearly being
populated, and by files stored on NFS. Presumably this means caching is
active and reads are being served from the cache as well.
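For anyone wanting to check the same thing, a small Python sketch along these lines reports whether fsc appears among the options of each NFS mount. It assumes the kernel lists fsc in /proc/mounts when the option is active, which may not hold for every kernel; the function name and the sample host and volume names are hypothetical.

```python
def nfs_mounts_with_fsc(mounts_text):
    """Map each NFS mount point to True/False: is 'fsc' among its mount options?"""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        result[fields[1]] = "fsc" in fields[3].split(",")
    return result


# Hypothetical sample in /proc/mounts format; on a real host, pass
# open("/proc/mounts").read() instead.
sample = (
    "filer:/vol/apps /nfs/apps nfs ro,relatime,vers=3,fsc 0 0\n"
    "filer:/vol/other /nfs/other nfs rw,relatime,vers=3 0 0\n"
)
for mount_point, fsc in sorted(nfs_mounts_with_fsc(sample).items()):
    print("%s: fsc=%s" % (mount_point, "yes" if fsc else "no"))
```

In our case a check like this would be expected to report fsc=no for /nfs/apps, which is what makes the populated cache surprising.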
Scientific Linux 6.8 (rolling)
OpenVZ Kernel 2.6.32-042stab120.16.x86_64
cachefilesd-0.10.2-3
--
kelsey.cummings at sonic.com sonic.net, inc.
System Architect 2260 Apollo Way
707.522.1000 Santa Rosa, CA