[Linux-cachefs] locking/refcount problems in cachefiles.
NeilBrown
neilb at suse.de
Wed Jan 29 01:25:53 UTC 2014
Hi,
I have a report of a crash in fs/cachefiles that I'm looking into.
The crash happened on a 3.0 based kernel (SLES11-SP3) but the code looks
much the same in mainline.
Analysis of the crash dump suggests that fscache_object_destroy(), and thus
__rb_erase_color(), is being called on an object that has already been
destroyed and is no longer in the rb tree. The rbtree code gets upset and
crashes.
Looking at the code, I think it is easy to see what happened (well ... easy
by about the third time I looked; before then it was impossible). However I
might have missed something and I don't know the code very well, hence this
email to check my understanding and to ask for comments on a possible fix.
The problem I see revolves around the refcounting in struct
cachefiles_object. This has the fairly common property that when the count
reaches zero, the object is destroyed. In such cases it is vitally
important that it is never increased from zero to non-zero.
There are two rules to achieving this.
- The first is that normally we only take a reference when we already own
a reference.
- The second is that if we don't own a reference, then we must hold a lock
that prevents the last reference being dropped while we take our
reference.
The second rule is commonly implemented using "atomic_dec_and_lock()".
cachefiles_objects live in an rbtree, and membership in the tree does not
imply a reference to the object. Merely finding an object in the rbtree
therefore does not guarantee that a reference exists, so it is only safe to
take a reference while holding an appropriate lock.
cachefiles_mark_object_active() together with cachefiles_put_object() get
this wrong.
If cachefiles_mark_object_active() finds an object in the rbtree, it
increments the refcount while still holding ->active_lock, which is
correct. However cachefiles_put_object() does not take that lock when the
refcount might be dropping to zero.
Thus you can get a race:

   Thread 1                                 Thread 2
   --------                                 --------
   cachefiles_mark_object_active()
   finds a conflicting object.

                                            cachefiles_put_object()
                                            decrements ->usage to zero,
                                            so decides to free it.

   cachefiles_mark_object_active()
   increments ->usage (to 1) and
   drops the lock.

                                            cachefiles_put_object() calls
                                            fscache_object_destroy(), which
                                            unlinks the object from the rb tree.

   cachefiles_mark_object_active() notices
   that CACHEFILES_OBJECT_ACTIVE is not set,
   so it calls ->put_object(), which calls
   into cachefiles_put_object() and
   fscache_object_destroy(), and the object
   is removed from the rbtree a second time.
A simple fix would be to replace the atomic_dec_and_test() in
cachefiles_put_object() with atomic_dec_and_lock(), taking
cache->active_lock, and to release ->active_lock only after
fscache_object_destroy() has been called. This looks safe from a
locking-hierarchy perspective, but I could be missing something.

As ->active_lock is an rwlock rather than a spinlock, we need to create an
atomic_dec_and_write_lock(), but that is quite straightforward.
So, I'm proposing the following. If you agree with my reasoning and my fix,
I'll submit a properly described patch, and maybe even move
atomic_dec_and_write_lock() into a library somewhere.
(Note: the patch is compile-tested but not run.)
Thanks,
NeilBrown
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 57e17fe6121a..4e9c230f1be6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -301,6 +301,22 @@ static void cachefiles_drop_object(struct fscache_object *_object)
_leave("");
}
+/* cribbed from lib/dec_and_lock.c */
+static int atomic_dec_and_write_lock(atomic_t *atomic, rwlock_t *lock)
+{
+ /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+ if (atomic_add_unless(atomic, -1, 1))
+ return 0;
+
+ /* Otherwise do it the slow way */
+ write_lock(lock);
+ if (atomic_dec_and_test(atomic))
+ return 1;
+ write_unlock(lock);
+ return 0;
+}
+
+
/*
* dispose of a reference to an object
*/
@@ -308,6 +324,7 @@ static void cachefiles_put_object(struct fscache_object *_object)
{
struct cachefiles_object *object;
struct fscache_cache *cache;
+ struct cachefiles_cache *cfcache;
ASSERT(_object);
@@ -323,7 +340,9 @@ static void cachefiles_put_object(struct fscache_object *_object)
ASSERTIFCMP(object->fscache.parent,
object->fscache.parent->n_children, >, 0);
- if (atomic_dec_and_test(&object->usage)) {
+ cache = object->fscache.cache;
+ cfcache = container_of(cache, struct cachefiles_cache, cache);
+ if (atomic_dec_and_write_lock(&object->usage, &cfcache->active_lock)) {
_debug("- kill object OBJ%x", object->fscache.debug_id);
ASSERT(!test_bit(CACHEFILES_OBJECT_ACTIVE, &object->flags));
@@ -340,8 +359,8 @@ static void cachefiles_put_object(struct fscache_object *_object)
object->lookup_data = NULL;
}
- cache = object->fscache.cache;
fscache_object_destroy(&object->fscache);
+ write_unlock(&cfcache->active_lock);
kmem_cache_free(cachefiles_object_jar, object);
fscache_object_destroyed(cache);
}