[linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"

Heming Zhao heming.zhao at suse.com
Mon Oct 14 03:13:13 UTC 2019


The issue in bcache_flush is related to cache->errored.

Here is my fix. I believe there should be a better solution than mine.

Solution:
Keep cache->errored, but use this list only to hold the blocks whose writes
failed; those blocks are never resent.
bcache_flush checks cache->errored: when the errored list is not empty,
bcache_flush returns false, which triggers the caller (upper layer) to do the cleanup.
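
To illustrate the intended behaviour, here is a minimal self-contained model. It is only a sketch: the names toy_block, toy_cache, toy_write, toy_flush and toy_discard_errored are made up for this example and are not part of lib/device/bcache.c, and a closed fd is modelled by a flag that makes the write fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real bcache types. */
enum blk_state { BLK_CLEAN, BLK_DIRTY, BLK_ERRORED };

struct toy_block {
	enum blk_state state;
	bool fd_open;      /* false models an fd that was closed before the flush */
	unsigned writes;   /* number of write attempts issued for this block */
};

struct toy_cache {
	struct toy_block *blocks;
	size_t nr_blocks;
};

/* Issue one write; it fails when the fd is no longer open. */
static bool toy_write(struct toy_block *b)
{
	b->writes++;
	return b->fd_open;
}

/*
 * Write dirty blocks only. Blocks that are already errored are never
 * resubmitted; their presence just makes the flush report failure.
 */
static bool toy_flush(struct toy_cache *c)
{
	bool ok = true;

	assert(c != NULL);
	for (size_t i = 0; i < c->nr_blocks; i++) {
		struct toy_block *b = &c->blocks[i];

		if (b->state == BLK_DIRTY)
			b->state = toy_write(b) ? BLK_CLEAN : BLK_ERRORED;
		if (b->state == BLK_ERRORED)
			ok = false;
	}
	return ok;
}

/* Caller-side cleanup: drop errored blocks so a later flush can succeed. */
static size_t toy_discard_errored(struct toy_cache *c)
{
	size_t dropped = 0;

	assert(c != NULL);
	for (size_t i = 0; i < c->nr_blocks; i++) {
		if (c->blocks[i].state == BLK_ERRORED) {
			c->blocks[i].state = BLK_CLEAN;
			dropped++;
		}
	}
	return dropped;
}
```

In this model a flush keeps returning false, without issuing any new write for the failed blocks, until the caller calls toy_discard_errored.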

```
commit 17e959c0ba58edc67b6caa7669444ecffa40a16f (HEAD -> master)
Author: Zhao Heming <heming.zhao at suse.com>
Date:   Mon Oct 14 10:57:54 2019 +0800

     The fd in cache->errored may already be closed before calling bcache_flush,
     so bcache_flush shouldn't rewrite data in cache->errored. Currently
     solution is return error to caller when cache->errored is not empty, and
     caller should do all the clean jobs.
     
     Signed-off-by: Zhao Heming <heming.zhao at suse.com>

diff --git a/lib/device/bcache.c b/lib/device/bcache.c
index cfe01bac2f..2eb3f0ee34 100644
--- a/lib/device/bcache.c
+++ b/lib/device/bcache.c
@@ -897,16 +897,20 @@ static bool _wait_io(struct bcache *cache)
   * High level IO handling
   *--------------------------------------------------------------*/
  
-static void _wait_all(struct bcache *cache)
+static bool _wait_all(struct bcache *cache)
  {
+       bool ret = true;
         while (!dm_list_empty(&cache->io_pending))
-               _wait_io(cache);
+               ret = _wait_io(cache);
+       return ret;
  }
  
-static void _wait_specific(struct block *b)
+static bool _wait_specific(struct block *b)
  {
+       bool ret = true;
         while (_test_flags(b, BF_IO_PENDING))
-               _wait_io(b->cache);
+               ret = _wait_io(b->cache);
+       return ret;
  }
  
  static unsigned _writeback(struct bcache *cache, unsigned count)
@@ -1262,10 +1266,7 @@ void bcache_put(struct block *b)
  
  bool bcache_flush(struct bcache *cache)
  {
-       // Only dirty data is on the errored list, since bad read blocks get
-       // recycled straight away.  So we put these back on the dirty list, and
-       // try and rewrite everything.
-       dm_list_splice(&cache->dirty, &cache->errored);
+       bool ret = true;
  
         while (!dm_list_empty(&cache->dirty)) {
                 struct block *b = dm_list_item(_list_pop(&cache->dirty), struct block);
@@ -1275,11 +1276,18 @@ bool bcache_flush(struct bcache *cache)
                 }
  
                 _issue_write(b);
+               if (b->error) ret = false;
         }
  
-       _wait_all(cache);
+       ret = _wait_all(cache);
  
-       return dm_list_empty(&cache->errored);
+       // merge the errored list to dirty, return false to trigger caller to
+       // clean them.
+       if (!dm_list_empty(&cache->errored)) {
+               dm_list_splice(&cache->dirty, &cache->errored);
+               ret = false;
+       }
+       return ret;
  }
  
  //----------------------------------------------------------------
```
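
One detail of the diff above: _wait_all and _wait_specific assign `ret = _wait_io(...)` on every iteration, so they return only the result of the last wait, and `ret = _wait_all(cache)` in bcache_flush replaces whatever the write loop stored in ret. The final check of cache->errored may well cover these cases, but if the intermediate results matter, they could be accumulated instead. The sketch below contrasts the two patterns on a plain array of results; the function names are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the "ret = step()" pattern: only the last result survives. */
static bool wait_all_last_only(const bool *results, size_t n)
{
	bool ret = true;

	for (size_t i = 0; i < n; i++)
		ret = results[i];
	return ret;
}

/* Accumulating variant: every step is visited, and any failure sticks. */
static bool wait_all_accumulate(const bool *results, size_t n)
{
	bool ok = true;

	for (size_t i = 0; i < n; i++)
		if (!results[i])
			ok = false;
	return ok;
}
```

With the results { true, false, true }, the first function reports success and the second reports failure.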




