[Linux-cachefs] locking/refcount problems in cachefiles.
NeilBrown
neilb at suse.de
Wed Jan 29 01:25:53 UTC 2014
Hi,
I have a report of a crash in fs/cachefiles that I'm looking into.
The crash happened on a 3.0 based kernel (SLES11-SP3) but the code looks
much the same in mainline.
Analysis of the crash dump suggests that fscache_object_destroy(), and thus
__rb_erase_color(), is being called on an object that has already been
destroyed and is no longer in the rb tree. The rbtree code gets upset and
crashes.
Looking at the code, I think it is easy to see what happened (well ... easy
by about the third time I looked; before then it was impossible). However I
might have missed something and I don't know the code very well, hence this
email to check my understanding and to ask for comments on a possible fix.
The problem I see revolves around the refcounting in struct
cachefiles_object. This has the fairly common property that when the count
reaches zero, the object is destroyed. In such cases it is vitally
important that it is never increased from zero to non-zero.
There are two rules to achieving this.
- The first is that normally we only take a reference when we already own
a reference.
- The second is that if we don't own a reference, then we must hold a lock
that prevents the last reference being dropped while we take our
reference.
The second rule is commonly implemented using "atomic_dec_and_lock()".
cachefiles_objects live in an rbtree, and membership in the tree does not
imply a reference to the object. Merely finding an object in the rbtree
therefore does not guarantee that a reference exists, so it is only safe to
take a reference while holding an appropriate lock.
cachefiles_mark_object_active() together with cachefiles_put_object() get
this wrong.
If cachefiles_mark_object_active() finds an object in the rbtree, it
increments the refcount while still holding ->active_lock, which is
correct. However cachefiles_put_object() does not take that lock when the
refcount might be dropping to zero.
Thus you can get a race:

   Thread 1                                 Thread 2
   --------                                 --------
   cachefiles_mark_object_active()
   finds a conflicting object.

                                            cachefiles_put_object()
                                            decrements ->usage to zero,
                                            so decides to free it.

   cachefiles_mark_object_active()
   increments ->usage (to 1) and
   drops the lock.

                                            cachefiles_put_object() calls
                                            fscache_object_destroy(), which
                                            unlinks the object from the rb tree.

   cachefiles_mark_object_active() notices
   that CACHEFILES_OBJECT_ACTIVE is not set,
   so it calls ->put_object(), which calls
   into cachefiles_put_object() and
   fscache_object_destroy(), and the object
   is removed from the rbtree a second time.
A simple fix would be to replace the atomic_dec_and_test() in
cachefiles_put_object() with atomic_dec_and_lock(), taking
cache->active_lock, and to release ->active_lock only after
fscache_object_destroy() has been called. This looks safe from a
locking-hierarchy perspective, but I could be missing something.

As ->active_lock is an rwlock rather than a spinlock, we need to create an
atomic_dec_and_write_lock(), but that is quite straightforward.
So, I'm proposing the following. If you agree with my reasoning and my fix,
I'll submit a properly described patch, and maybe even move
atomic_dec_and_write_lock() into a library somewhere.
(Note: the patch is compile-tested but not run.)
Thanks,
NeilBrown
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 57e17fe6121a..4e9c230f1be6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -301,6 +301,22 @@ static void cachefiles_drop_object(struct fscache_object *_object)
_leave("");
}
+/* cribbed from lib/dec_and_lock.c */
+static int atomic_dec_and_write_lock(atomic_t *atomic, rwlock_t *lock)
+{
+ /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+ if (atomic_add_unless(atomic, -1, 1))
+ return 0;
+
+ /* Otherwise do it the slow way */
+ write_lock(lock);
+ if (atomic_dec_and_test(atomic))
+ return 1;
+ write_unlock(lock);
+ return 0;
+}
+
+
/*
* dispose of a reference to an object
*/
@@ -308,6 +324,7 @@ static void cachefiles_put_object(struct fscache_object *_object)
{
struct cachefiles_object *object;
struct fscache_cache *cache;
+ struct cachefiles_cache *cfcache;
ASSERT(_object);
@@ -323,7 +340,9 @@ static void cachefiles_put_object(struct fscache_object *_object)
ASSERTIFCMP(object->fscache.parent,
object->fscache.parent->n_children, >, 0);
- if (atomic_dec_and_test(&object->usage)) {
+ cache = object->fscache.cache;
+ cfcache = container_of(cache, struct cachefiles_cache, cache);
+ if (atomic_dec_and_write_lock(&object->usage, &cfcache->active_lock)) {
_debug("- kill object OBJ%x", object->fscache.debug_id);
ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
@@ -340,8 +359,8 @@ static void cachefiles_put_object(struct fscache_object *_object)
object->lookup_data = NULL;
}
- cache = object->fscache.cache;
fscache_object_destroy(&object->fscache);
+ write_unlock(&cfcache->active_lock);
kmem_cache_free(cachefiles_object_jar, object);
fscache_object_destroyed(cache);
}