[lvm-devel] Handle transient errors for mirrored log in lvconvert --repair

Petr Rockai prockai at redhat.com
Thu Jul 29 16:49:34 UTC 2010


Hi Taka,

Takahiro Yasui <takahiro.yasui at hds.com> writes:
> As shown above, I hope this kind of code to be added to check the status
> of a log volume.

[snip]

> I understand your concern that this doesn't cover all cases. For example,
> there might be a problem when mirror_{log|mirror}_fault_policy is set
> to 'allocate' instead of 'removed.'

> Here is a discussion. We can rescue a case that 'removed' policy is set to
> mirror_{log|mirror}_fault_policy by adding lv_check_transient() for a mirrored
> log volume, while application will hang up when a transient error or medium
> error occurred on mirrored log without this patch.

> How about adding the patch for a short term solution to save in the case of
> 'removed' policy? We have already made a decision when the first patch is
> committed.

I think the main concern is that a sync over a partially failing PV will
make things a lot worse. On the other hand, I agree that having multiple
failing devices is a rare situation. I would concede to the following:

If a transient error is detected, repair the mirror as usual through
down-conversion, but refuse to do any allocation. This is the same
situation as when there are no spare PVs available. This is a
conservative over-approximation that is always correct. We issue a
log_warn that the mirror could not be restored to its previous state,
which should end up in syslog. From there, it is up to the sysadmin to
take action. The mirror should keep operating in a reduced mode in the
meantime.

If this were noted in the documentation, I think this would be
appropriate for RHEL6. The check wouldn't be very hard to do, I believe:
count the number of partial LVs in the VG before the transient check and
count again afterwards; if the numbers differ, forbid any new allocations.
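
To make the counting idea concrete, here is a rough sketch of the logic
in Python rather than in the actual LVM2 C code. Everything in it (the LV
and VG classes, count_partial_lvs, allocation_allowed, and the
transient_check callback) is a hypothetical stand-in for illustration,
not an existing LVM2 interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LV:
    # Hypothetical model of a logical volume; "partial" means that at
    # least one of its underlying PVs is considered missing or failed.
    name: str
    partial: bool = False


@dataclass
class VG:
    # Hypothetical model of a volume group holding its LVs.
    lvs: List[LV] = field(default_factory=list)


def count_partial_lvs(vg: VG) -> int:
    """Return how many LVs in the VG are currently partial."""
    return sum(1 for lv in vg.lvs if lv.partial)


def allocation_allowed(vg: VG, transient_check: Callable[[VG], None]) -> bool:
    """Run the transient check and compare the partial-LV counts.

    If the check changed the number of partial LVs, a transient error
    was detected, so new allocations are refused.
    """
    before = count_partial_lvs(vg)
    transient_check(vg)  # assumed to mark affected LVs as partial
    after = count_partial_lvs(vg)
    return before == after
```

A caller would then down-convert as usual and skip the allocation step
whenever allocation_allowed() returns False.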

Would you find such a solution acceptable? Overall, in the case of a
transient failure, it will work as if "remove" was specified, regardless
of the actual lvm.conf setting. For permanent failures, the policy is
respected as before.
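
Put as a small decision function, the resulting behaviour would look
roughly like this. Again a Python sketch only: effective_fault_policy is
a made-up name, and the strings simply mirror the "remove" and "allocate"
values of the lvm.conf fault policy settings.

```python
def effective_fault_policy(configured: str, transient: bool) -> str:
    """Return the policy that repair would actually follow.

    configured -- policy from lvm.conf, "remove" or "allocate"
    transient  -- True if a transient error was detected
    """
    if configured not in ("remove", "allocate"):
        raise ValueError("unknown fault policy: %r" % configured)
    if transient:
        # Transient failure: never allocate, whatever lvm.conf says.
        return "remove"
    # Permanent failure: respect the configured policy.
    return configured
```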

Yours,
   Petr.



