[Linux-cachefs] Caveats for using cachefs

Fri Sep 5 01:11:57 UTC 2014

Hello

This email is for people who are looking to use cachefs in a scenario 
where there are lots of small clients accessing the storage and where 
file sizes are small (under 100MB) or IO patterns are non-streaming. 
These are my experiences and may not be the same for everyone.

1. There is no way to monitor the benefit of the caching from outside 
your application. The stats that are provided don't include reads served 
from memory, only reads from the disk cache, so they greatly underreport 
cache hits. There is also no way to determine the data churn in your 
cache.
2. Metadata is not cached. When you are accessing files, particularly 
application/executable files, reads are quite small. Every read 
generates a metadata read on the original filer. Metadata operations 
for small-read IO patterns can keep the filers very busy.
3. It doesn't handle dynamic IP clustering. In the case of EMC Isilon, 
the client is given a dynamic IP. This means that even if the same 
location is mounted, the cache is invalidated if a different IP is given.
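
For anyone who wants to watch the disk-cache counters from point 1 
anyway, here is a rough Python sketch. It assumes the kernel exposes 
FS-Cache statistics at /proc/fs/fscache/stats (only present when built 
with CONFIG_FSCACHE_STATS) and that each line looks like 
"Label: key=value key=value"; the exact labels and field names vary by 
kernel version, so none are hard-coded. The helper names parse_stats and 
delta are made up for illustration. As noted above, these counters do 
not include reads served from memory, so this only shows disk-cache 
activity between two snapshots.

```python
import re
from pathlib import Path

# Assumed location; the file only exists if the kernel was built with
# CONFIG_FSCACHE_STATS.
STATS_PATH = Path("/proc/fs/fscache/stats")


def parse_stats(text):
    """Parse 'Label: key=value ...' lines into {label: {key: int}}."""
    counters = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header or other line without a label
        pairs = re.findall(r"(\w+)=(\d+)", rest)
        if pairs:
            # The same label can appear on several lines; merge them.
            group = counters.setdefault(label.strip(), {})
            for key, value in pairs:
                group[key] = int(value)
    return counters


def delta(before, after):
    """Return only the counters that changed between two snapshots."""
    changed = {}
    for label, group in after.items():
        old = before.get(label, {})
        diff = {k: v - old.get(k, 0)
                for k, v in group.items() if v != old.get(k, 0)}
        if diff:
            changed[label] = diff
    return changed
```

To use it, call parse_stats(STATS_PATH.read_text()) before and after a 
test workload and pass both results to delta().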

We were trying to use it in a very controlled scenario, accessing 
read-only data where files are only ever created and never modified. 
Even in this situation the benefits were minimal.

We have been excited about this feature since it was a tech preview in 
RHEL5.1 in November 2007. It has only become stable for us in RHEL6.5, 
released in November 2013, where it is still a tech preview.

If anyone would like clarification on the above, please feel free to 
email me.

I hope this helps others.

Grant