[linux-lvm] [lvmlockd] recovery lvmlockd after kill_vg
damon.devops at gmail.com
Thu Sep 27 14:12:44 UTC 2018
Thank you for your reply. I have another question about a related situation.
I usually run "vgck" to check whether the VG is healthy, but sometimes it
gets stuck and leaves a VGLK held in sanlock. (I'm sure an io error can
cause this, but sometimes it happens without an io error.)
Then I'll try "sanlock client release -r xxx" to release it, but that
also sometimes doesn't work (it gets stuck).
Then I may run "lvmlockctl -r" to drop the VG lockspace, but that can
still get stuck, and io is fine when it does.
This usually happens on multipath storage; I suspect multipath queueing
io is to blame, but I'm not sure.
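If multipath queueing is the suspect, the relevant knob is no_path_retry. A
hedged, illustrative multipath.conf fragment (the values are examples only,
not recommendations; tune them for your environment):

```
# /etc/multipath.conf -- illustrative fragment, not a recommendation.
defaults {
    polling_interval 5
    # "queue" (queue_if_no_path) holds io indefinitely while all paths are
    # down, which can leave sanlock lease io hung rather than failing fast.
    # A numeric value makes io fail after roughly
    # no_path_retry * polling_interval seconds (12 * 5s = 60s here), so
    # errors surface instead of indefinite hangs.
    no_path_retry 12
}
```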
Thanks again for your reply.
On Wed, Sep 26, 2018 at 12:44 AM David Teigland <teigland at redhat.com> wrote:
> On Tue, Sep 25, 2018 at 06:18:53PM +0800, Damon Wang wrote:
> > Hi,
> > AFAIK, once sanlock cannot access the lease storage, it sends
> > "kill_vg" to lvmlockd, and the standard procedure is to deactivate the
> > logical volumes and drop the VG locks.
> > But sometimes the storage recovers after kill_vg (and before we
> > deactivate or drop the locks), and then lvm commands print "storage
> > failed for sanlock leases", like this:
> > [root at dev1-2 ~]# vgck 71b1110c97bd48aaa25366e2dc11f65f
> > WARNING: Not using lvmetad because config setting use_lvmetad=0.
> > WARNING: To avoid corruption, rescan devices to make changes visible
> > (pvscan --cache).
> > VG 71b1110c97bd48aaa25366e2dc11f65f lock skipped: storage failed for
> > sanlock leases
> > Reading VG 71b1110c97bd48aaa25366e2dc11f65f without a lock.
> > So what should I do to recover from this, (preferably) without
> > affecting volumes that are in use?
> > I found a workaround, but it seems very tricky: save the "lvmlockctl -i"
> > output, run "lvmlockctl -r vg", and then activate the volumes listed in
> > that output.
> > Do we have an "official" way to handle this? Since it is pretty
> > common that when I find lvmlockd failed, the storage has already
> > recovered.
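The workaround described above can be sketched as follows. This is a dry-run
sketch, not an official procedure: the VG and LV names are placeholders, and
restarting the lockspace with "vgchange --lock-start" is my assumption about
what must happen before volumes can be reactivated.

```shell
# Dry-run sketch of the workaround quoted above. The wrapper only prints
# each command; replace `echo "+ $*"` with "$@" to actually execute them.
run() { echo "+ $*"; }

vg=vg0        # placeholder VG name
lv=lv0        # placeholder LV name, taken from the saved "lvmlockctl -i" output

run lvmlockctl -i                 # 1. record which LVs currently hold locks
run lvmlockctl -r "$vg"           # 2. forcibly drop the VG lockspace
run vgchange --lock-start "$vg"   # 3. restart the lockspace once storage is back
run lvchange -ay "$vg/$lv"        # 4. reactivate the volumes noted in step 1
```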
> Hi, to figure out that workaround, you've probably already read the
> section of the lvmlockd man page: "sanlock lease storage failure", which
> gives some background about what's happening and why. What the man page
> is missing is some help about false failure detections like you're seeing.
> It sounds like io delays from your storage are a little longer than
> sanlock is allowing for. With the default 10 sec io timeout, sanlock will
> initiate recovery (kill_vg in lvmlockd) after 80 seconds of no successful
> io from the storage. After this, it decides the storage has failed. If
> it's not failed, just slow, then the proper way to handle that is to
> increase the timeouts. (Or perhaps try to configure the storage to avoid
> such lengthy delays.) Once a failure is detected and recovery is begun,
> there's not an official way to back out of it.
> You can increase the sanlock io timeout with lvmlockd -o <seconds>.
> sanlock multiplies that by 8 to get the total length of time before
> starting recovery. I'd look at how long your temporary storage outages
> last and set io_timeout so that 8*io_timeout will cover it.
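The timing above can be checked with a small shell sketch (the 20-second
value is just an example of a raised timeout, not a recommendation):

```shell
# sanlock begins recovery (kill_vg) after 8 * io_timeout seconds with no
# successful io; the default io_timeout of 10s gives the 80s quoted above.
io_timeout=20                         # example value, passed as: lvmlockd -o 20
recovery_window=$((8 * io_timeout))   # seconds before sanlock starts recovery
echo "lvmlockd -o ${io_timeout} -> kill_vg after ${recovery_window}s of failed io"
```

Pick io_timeout so that 8*io_timeout exceeds the longest transient outage
you observe on the storage.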