[linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
Heming Zhao
heming.zhao at suse.com
Fri Oct 11 09:22:57 UTC 2019
There is one thing that still confuses me.
When a read/write error occurs, lvm calls bcache_invalidate_fd and _scan_dev_close to close the fd.
So the first device that is read successfully after f748 (i.e. f747) should end up with fc68's fd.
That would cause f747's metadata to be overwritten, not f748's.
The disk scanning sequence:
```
scsi-360060e80072a670000302a670000fc69 <=== successful
scsi-360060e80072a670000302a670000fc68 <=== first failed
scsi-360060e80072a670000302a670000fc67
scsi-360060e80072a670000302a670000fc66
scsi-360060e80072a660000302a660000f74c
scsi-360060e80072a660000302a660000f74a
scsi-360060e80072a660000302a660000f749
scsi-360060e80072a660000302a660000f748 (has fc68 metadata) <=== last failed
scsi-360060e80072a660000302a660000f747 <=== first successfully read following last failed
```
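To make the fd concern concrete, here is a minimal sketch of how a reused fd number can redirect a write. It is not lvm code. The function name demo_fd_reuse and the stale dict are hypothetical, the two temporary files only stand in for the devices fc68 and f747, and it assumes some cached state still refers to the old fd number after the close. It relies only on the POSIX rule that open() returns the lowest unused descriptor.

```python
import os
import tempfile


def demo_fd_reuse():
    """Show a write through a stale fd number landing in a different file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for two devices (names taken from the thread).
        path_a = os.path.join(tmp, "fc68")
        path_b = os.path.join(tmp, "f747")
        for path in (path_a, path_b):
            with open(path, "wb") as f:
                f.write(b"\0" * 512)

        fd_a = os.open(path_a, os.O_RDWR)
        # Assumption: a cache entry keeps the fd number and is not invalidated.
        stale = {"fc68": fd_a}
        os.close(fd_a)  # error path: the fd is closed after a failed read/write

        # POSIX open() returns the lowest unused fd, so the number is reused.
        fd_b = os.open(path_b, os.O_RDWR)
        reused = fd_b == fd_a

        if reused:
            # A write meant for fc68 goes through the stale number into f747.
            os.pwrite(stale["fc68"], b"fc68-metadata", 0)
        os.close(fd_b)

        with open(path_a, "rb") as f:
            head_a = f.read(13)
        with open(path_b, "rb") as f:
            head_b = f.read(13)
        return reused, head_a, head_b


if __name__ == "__main__":
    reused, head_a, head_b = demo_fd_reuse()
    print("fd number reused:", reused)
    print("fc68 stand-in starts with:", head_a)
    print("f747 stand-in starts with:", head_b)
```

In this sketch the data ends up in the file opened right after the close, which matches the expectation above that f747 would be the device hit. Why f748 was overwritten instead is the open question.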
I hope this explanation is clear.
On 10/11/19 4:11 PM, Heming Zhao wrote:
> Hello list,
>
> I have been analyzing this issue for some days. It looks like a new bug.
>
> Trigger steps:
> The user executes pvresize to enlarge the PV.
> After the command runs, the lvm metadata on one disk has been overwritten by the lvm metadata of another disk.
>
> In one log (from executing the pvresize command), 7 disks hit read/write failures:
> ```
> scsi-360060e80072a670000302a670000fc68
> scsi-360060e80072a670000302a670000fc67
> scsi-360060e80072a670000302a670000fc66
> scsi-360060e80072a660000302a660000f74c
> scsi-360060e80072a660000302a660000f74a
> scsi-360060e80072a660000302a660000f749
> scsi-360060e80072a660000302a660000f748 (has fc68 metadata)
> ```
> The f748 metadata was overwritten by that of fc68.
>