[linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"

Heming Zhao heming.zhao at suse.com
Fri Oct 11 09:22:57 UTC 2019


One thing still confuses me.
When a read/write error occurs, lvm calls bcache_invalidate_fd & _scan_dev_close to close the fd.
So the first device that is read successfully after f748 (i.e. f747) ends up with fc68's fd.
That should cause f747's metadata to be overwritten, not f748's.

The disk scanning sequence:
```
  scsi-360060e80072a670000302a670000fc69 <=== successful
  scsi-360060e80072a670000302a670000fc68 <=== first failed
  scsi-360060e80072a670000302a670000fc67
  scsi-360060e80072a670000302a670000fc66
  scsi-360060e80072a660000302a660000f74c
  scsi-360060e80072a660000302a660000f74a
  scsi-360060e80072a660000302a660000f749
  scsi-360060e80072a660000302a660000f748 (has fc68 metadata) <=== last failed
  scsi-360060e80072a660000302a660000f747 <=== first successfully read following last failed
```
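
The fd reuse suspected above can be illustrated with a minimal sketch. This is not LVM code and does not call bcache_invalidate_fd or _scan_dev_close. It only relies on the POSIX rule that open() returns the lowest free descriptor number, so a number that was just closed is handed to the next file opened. The file names `dev_a` and `dev_b`, the `stale_cache` dictionary and the function `demo_fd_reuse` are hypothetical stand-ins, and the sketch assumes a single-threaded process.

```python
import os
import tempfile


def demo_fd_reuse():
    """Show that a closed fd number is reused by the next open().

    Returns (reused, believed, actual):
      reused   - True if the second open() got the same fd number
      believed - device name a stale, fd-keyed cache reports for that fd
      actual   - bytes found in the second file after the write
    """
    with tempfile.TemporaryDirectory() as tmp:
        path_a = os.path.join(tmp, "dev_a")  # hypothetical stand-in for one disk
        path_b = os.path.join(tmp, "dev_b")  # hypothetical stand-in for another disk
        for path, content in ((path_a, b"metadata-A"), (path_b, b"metadata-B")):
            with open(path, "wb") as f:
                f.write(content)

        fd_a = os.open(path_a, os.O_RDWR)
        stale_cache = {fd_a: "dev_a"}  # cache keyed by fd number
        os.close(fd_a)                 # closed (e.g. after an I/O error), cache entry kept

        fd_b = os.open(path_b, os.O_RDWR)  # lowest free fd number is returned
        reused = fd_b == fd_a
        believed = stale_cache.get(fd_b)   # stale entry still names dev_a

        # A write meant for dev_a, issued through the cached fd number,
        # now lands on dev_b.
        os.write(fd_b, b"metadata-A")
        os.close(fd_b)

        with open(path_b, "rb") as f:
            actual = f.read()
        return reused, believed, actual


if __name__ == "__main__":
    print(demo_fd_reuse())
```

If a cache entry keyed by fd number outlives the close, any later write through that entry reaches whichever device was opened next. Whether this is what actually happens inside lvm during pvresize is exactly the open question here.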

I hope this explanation is clear.


On 10/11/19 4:11 PM, Heming Zhao wrote:
> Hello list,
> 
> I have been analyzing this issue for several days. It looks like a new bug.
> 
> Trigger steps:
> The user executes pvresize to enlarge the PV.
> After the command runs, one disk's LVM metadata has been overwritten by another disk's LVM metadata.
> 
> In one log (from executing the pvresize command), 7 disks hit read/write failures:
> ```
> scsi-360060e80072a670000302a670000fc68
> scsi-360060e80072a670000302a670000fc67
> scsi-360060e80072a670000302a670000fc66
> scsi-360060e80072a660000302a660000f74c
> scsi-360060e80072a660000302a660000f74a
> scsi-360060e80072a660000302a660000f749
> scsi-360060e80072a660000302a660000f748 (has fc68 metadata)
> ```
> The f748 metadata was overwritten by fc68's metadata.
> 



