[libvirt-users] libvirt with sanlock
Alex Jia
ajia at redhat.com
Wed Mar 14 09:56:05 UTC 2012
On 03/14/2012 05:39 PM, Frido Roose wrote:
> On Wed, Mar 14, 2012 at 8:32 AM, Alex Jia <ajia at redhat.com
> <mailto:ajia at redhat.com>> wrote:
>
> On 03/13/2012 10:42 PM, Frido Roose wrote:
>> Hello,
>>
>> I configured libvirtd with the sanlock lock manager plugin:
>>
>> # rpm -qa | egrep "libvirt-0|sanlock-[01]"
>> libvirt-lock-sanlock-0.9.4-23.el6_2.4.x86_64
>> sanlock-1.8-2.el6.x86_64
>> libvirt-0.9.4-23.el6_2.4.x86_64
>>
>> # egrep -v "^#|^$" /etc/libvirt/qemu-sanlock.conf
>> auto_disk_leases = 1
>> disk_lease_dir = "/var/lib/libvirt/sanlock"
>> host_id = 4
>>
>> # mount | grep sanlock
>> /dev/mapper/kvm--shared-sanlock on /var/lib/libvirt/sanlock type gfs2 (rw,noatime,hostdata=jid=0)
>>
>> # cat /etc/sysconfig/sanlock
>> SANLOCKOPTS="-R 1 -o 30"
>>
>> I increased the sanlock io_timeout to 30 seconds (default = 10),
>> because the sanlock dir is on a GFS2 volume and can be blocked
>> for some time while fencing and journal recovery take place.
>> With the default sanlock io timeout, I get lease timeouts because
>> IO is blocked (the 80 below is seconds since last_success: 3319 - 3239):
>> Mar 5 15:37:14 raiti sanlock[5858]: 3318 s1 check_our_lease warning 79 last_success 3239
>> Mar 5 15:37:15 raiti sanlock[5858]: 3319 s1 check_our_lease failed 80
>>
>> So far, all fine, but when I restart sanlock and libvirtd, it
>> takes about 2 * 30 seconds = 1 minute before libvirtd is usable.
>> "virsh list" hangs during this time. I can still live with that...
>> But it gets worse after a reboot, when even a "virsh list"
>> takes several minutes (about 5) before it responds.
>> After this initial time, virsh responds normally,
>> so it looks like an initialization issue to me.
>>
>> Is this a configuration issue, a bug, or expected behavior?
> Hi Frido,
> I'm not sure whether you hit a sanlock AVC error in your
> /var/log/audit/audit.log; could you check it and provide your
> selinux-policy version? In addition, you should turn on the
> SELinux boolean for sanlock, for example:
>
> # getsebool -a|grep sanlock
> virt_use_sanlock --> off
> # setsebool -P virt_use_sanlock on
> # getsebool -a|grep sanlock
> virt_use_sanlock --> on
>
>
> Hello Alex,
>
> Thanks for your suggestions! I don't have any AVC errors in
> audit.log, and in any case I have disabled SELinux on the nodes for now.
>
> In addition, could you provide the libvirt log as an attachment?
> Please use the following configuration:
>
> 1. /etc/libvirt/libvirtd.conf
>
> log_filters="1:libvirt 1:conf 1:locking"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> 2. service libvirtd restart
>
> 3. repeat your test steps
>
>
>
> I enabled the extra debug logging, which you can find attached.
> Instead of restarting libvirtd, I did an "echo b > /proc/sysrq-trigger"
> to force an unclean reboot.
> The result was that it took 300s to register the lockspace:
> 09:59:28.919: 3457: debug : virLockManagerSanlockInit:267 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0
> 10:05:29.539: 3457: debug : virLockManagerSanlockSetupLockspace:247 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been registered
>
> I also had a little discussion about this with David Teigland on the
> sanlock dev list. He gave me some more details about how the delays
> and timeouts work, which explain why it takes 300s.
> I'll quote his reply:
Frido, David gave a great explanation; thanks for forwarding it.
>
> David:
> "Yes, all the timeouts are derived from the io_timeout and are dictated by
> the recovery requirements and the algorithm the host_id leases are based
> on: "Light-Weight Leases for Storage-Centric Coordination" by Gregory
> Chockler and Dahlia Malkhi.
>
> Here are the actual equations copied from sanlock_internal.h.
> "delta" refers to host_id leases that take a long time to acquire at
> startup
> "free" corresponds to starting up after a clean shutdown
> "held" corresponds to starting up after an unclean shutdown
>
> You should find that with 30 sec io timeout these come out to 1 min /
> 4 min
> which you see when starting after a clean / unclean shutdown."
>
> Since I configured an io_timeout of 30s in sanlock
> (SANLOCKOPTS="-R 1 -o 30"), the delay at sanlock startup after an
> unclean shutdown is defined by the delta_acquire_held_min variable,
> which is calculated as:
>
> int max = host_dead_seconds;
> if (delta_large_delay > max)
>         max = delta_large_delay;
>
> int delta_acquire_held_min = max;
>
> So max is host_dead_seconds, which is calculated as:
> int host_dead_seconds = id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT;
>
> And id_renewal_fail_seconds is:
> int id_renewal_fail_seconds = 8 * io_timeout_seconds;
>
> WATCHDOG_FIRE_TIMEOUT = 60
>
> So that makes 8 * 30 + 60 = 300s before the lock can be acquired,
> and that's exactly the time shown in libvirtd.log before the
> lockspace is registered.
>
> When a proper reboot is done, or when sanlock/libvirtd is just
> restarted, delta_acquire_free_min defines the delay:
>
> int delta_short_delay = 2 * io_timeout_seconds;
> int delta_acquire_free_min = delta_short_delay;
> This is confirmed by the 60s delay in the libvirtd.log file:
> 10:33:54.097: 7983: debug : virLockManagerSanlockInit:267 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0
> 10:34:55.111: 7983: debug : virLockManagerSanlockSetupLockspace:247 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been registered
>
> So the whole delay is caused by the io_timeout, which is set to 30s
> because the lockspace is on GFS2, and the GFS2 volume can be blocked
> for some time while a node gets fenced and its journal is recovered.
> Depending on clean/unclean restarts, the delay differs.
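>
> For illustration, here is a minimal, compilable sketch of how both
> delays fall out of io_timeout (my own restatement of the equations
> quoted above, not the actual sanlock source; it assumes
> delta_large_delay <= host_dead_seconds, as in the case above):
>
> #include <stdio.h>
>
> #define WATCHDOG_FIRE_TIMEOUT 60
>
> int main(void)
> {
>         int io_timeout_seconds = 30; /* SANLOCKOPTS="-R 1 -o 30" */
>
>         /* clean shutdown: acquiring a free host_id lease */
>         int delta_short_delay = 2 * io_timeout_seconds;
>         int delta_acquire_free_min = delta_short_delay;
>
>         /* unclean shutdown: acquiring a held host_id lease */
>         int id_renewal_fail_seconds = 8 * io_timeout_seconds;
>         int host_dead_seconds = id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT;
>         int delta_acquire_held_min = host_dead_seconds;
>
>         /* prints "free 60s, held 300s", matching the libvirtd.log delays */
>         printf("free %ds, held %ds\n",
>                delta_acquire_free_min, delta_acquire_held_min);
>         return 0;
> }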
>
> So delta_acquire_held_min is based on host_dead_seconds, because
> sanlock wants to be sure that the lock won't be reacquired before
> the host is actually dead.
> David told me that sanlock is supposed to use a block device.
> I wonder if this makes sense when used on top of GFS2, which already
> has its own locking and safety mechanisms... It looks to me like it
> just adds the delay without reason in this case?
> Perhaps delta_acquire_held_min should be the same as
> delta_acquire_free_min, because we don't need this safety delay on
> top of GFS2.
>
> For NFS, it may be a whole other story...
>
> Best regards,
> Frido