[Linux-cachefs] File corruption using OpenVZ, NFS and cachefilesd

Kelsey Cummings kelsey.cummings at sonic.com
Fri Jan 13 00:58:01 UTC 2017


We have OpenVZ containers in an HA web cluster using NetApp shared NFS
storage. NFS is almost exclusively read-only and is used for serving PHP
and associated objects to the web servers.  This config has been in
place for several years.*

The typical upgrade procedure is to copy the new version of the app
into the bind mount from within one of the guests.  Over the last few
weeks we've seen occasional corruption of files as returned by one of
the guests.  There's no indication of hardware-related issues.

Today, we found Guest B exhibiting corruption on a recently updated
file.  Guest A and a third-party host both showed the correct,
uncorrupted version of the file.  The contents of the file returned by B
appeared to be the original version truncated to the length of the new
version.  The stat() output on both guests and the third host matched.
Touching the file on B did not cause it to correct itself.  Touching the
file on one of the other hosts did cause B to return the correct
contents.
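
For anyone wanting to repeat the comparison above, here is a minimal
sketch, assuming Python is available on each host.  The function names
are hypothetical and this is not the procedure we actually used.
fingerprint() reports the size, mtime and a content hash, so matching
stat() data with differing contents shows up directly;
looks_like_truncated_old() tests for the "old version cut to the new
length" pattern when copies of both versions are at hand.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1 << 20):
    """Return (size, mtime, sha256 hex digest) for the file at path."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return st.st_size, int(st.st_mtime), digest.hexdigest()


def looks_like_truncated_old(observed, old, new):
    """True if observed is the old contents cut to the new version's length."""
    return (
        observed != new
        and len(observed) == len(new)
        and observed == old[:len(new)]
    )
```

Running fingerprint() on the same path from each guest and from the
third host, then comparing the tuples, would show whether only the
digest differs.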

Hopefully this explanation makes sense!  Any ideas?

Host A (OpenVZ Host)
  nfs mount /nfs/apps
  cachefilesd

Guest A (Container)
  bind mount /nfs/apps/X /nfs/approot

Host B (OpenVZ Host)
  nfs mount /nfs/apps
  cachefilesd

Guest B (Container)
  bind mount /nfs/apps/X /nfs/approot

*) While investigating this we discovered that the fsc option was not
enabled on the NFS mounts.  Despite this, the cache was clearly being
populated by files stored on NFS.  Presumably this implies it is
working and reading from the cache as well.
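
One way to check for the fsc option is to look at the mount options the
kernel reports.  The sketch below is an illustration only: it assumes
the options column of /proc/mounts lists fsc when it is enabled, and
the function name is hypothetical.

```python
def nfs_mounts_without_fsc(mounts_text):
    """Return mount points of NFS mounts whose options do not include fsc."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; match fsc as a whole option only.
        if fs_type in ("nfs", "nfs4") and "fsc" not in options.split(","):
            missing.append(mount_point)
    return missing
```

Passing it the contents of /proc/mounts on each OpenVZ host would list
the NFS mount points that lack the option.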

Scientific Linux 6.8 (rolling)
OpenVZ Kernel 2.6.32-042stab120.16.x86_64
cachefilesd-0.10.2-3

-- 
kelsey.cummings at sonic.com                 sonic.net, inc.
System Architect                          2260 Apollo Way
707.522.1000                              Santa Rosa, CA



