[libvirt-users] libvirt with sanlock
Alex Jia
ajia at redhat.com
Wed Mar 14 09:56:05 UTC 2012
On 03/14/2012 05:39 PM, Frido Roose wrote:
> On Wed, Mar 14, 2012 at 8:32 AM, Alex Jia <ajia at redhat.com
> <mailto:ajia at redhat.com>> wrote:
>
> On 03/13/2012 10:42 PM, Frido Roose wrote:
>> Hello,
>>
>> I configured libvirtd with the sanlock lock manager plugin:
>>
>> # rpm -qa | egrep "libvirt-0|sanlock-[01]"
>> libvirt-lock-sanlock-0.9.4-23.el6_2.4.x86_64
>> sanlock-1.8-2.el6.x86_64
>> libvirt-0.9.4-23.el6_2.4.x86_64
>>
>> # egrep -v "^#|^$" /etc/libvirt/qemu-sanlock.conf
>> auto_disk_leases = 1
>> disk_lease_dir = "/var/lib/libvirt/sanlock"
>> host_id = 4
>>
>> # mount | grep sanlock
>> /dev/mapper/kvm--shared-sanlock on /var/lib/libvirt/sanlock type gfs2 (rw,noatime,hostdata=jid=0)
>>
>> # cat /etc/sysconfig/sanlock
>> SANLOCKOPTS="-R 1 -o 30"
>>
>> I increased the sanlock io_timeout to 30 seconds (default = 10),
>> because the sanlock dir is on a GFS2 volume and can be blocked
>> for some time while fencing and journal recovery take place.
>> With the default sanlock io timeout, I get lease timeouts because
>> IO is blocked (the 80 below is seconds since last_success: 3319 - 3239):
>> Mar 5 15:37:14 raiti sanlock[5858]: 3318 s1 check_our_lease warning 79 last_success 3239
>> Mar 5 15:37:15 raiti sanlock[5858]: 3319 s1 check_our_lease failed 80
>>
>> So far, all fine, but when I restart sanlock and libvirtd, it
>> takes about 2 * 30 seconds = 1 minute before libvirtd is usable.
>> "virsh list" hangs during this time. I can still live with that...
>> But it gets worse after a reboot, when even a "virsh list"
>> takes several minutes (about 5) before it responds.
>> After this initial time, virsh responds normally,
>> so it looks like an initialization issue to me.
>>
>> Is this a configuration issue, a bug, or expected behavior?
> Hi Frido,
> I'm not sure whether you hit a sanlock AVC error in your
> /var/log/audit/audit.log; could you check it and provide your
> selinux-policy version? In addition, you should turn on the
> SELinux boolean for sanlock, for example:
>
> # getsebool -a|grep sanlock
> virt_use_sanlock --> off
> # setsebool -P virt_use_sanlock on
> # getsebool -a|grep sanlock
> virt_use_sanlock --> on
>
>
> Hello Alex,
>
> Thanks for your suggestions! I don't have any AVC errors in
> audit.log, and in any case I have disabled SELinux on the nodes for now.
>
> In addition, could you provide the libvirt log as an attachment?
> Please use the following configuration:
>
> 1. /etc/libvirt/libvirtd.conf
>
> log_filters="1:libvirt 1:conf 1:locking"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> 2. service libvirtd restart
>
> 3. repeat your test steps
>
>
>
> I enabled the extra debug logging, which you can find attached.
> Instead of restarting libvirtd, I did an "echo b > /proc/sysrq-trigger"
> to force an unclean reboot.
> The result was that it took 300s to register the lockspace:
> 09:59:28.919: 3457: debug : virLockManagerSanlockInit:267 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0
> 10:05:29.539: 3457: debug : virLockManagerSanlockSetupLockspace:247 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been registered
>
> I also had a little discussion about this with David Teigland on the
> sanlock dev list. He gave me some more details about how the delays
> and timeouts work, which explain why it takes 300s.
> I'll quote his reply:
Frido, David gave a great explanation; thanks for forwarding it.
>
> David:
> "Yes, all the timeouts are derived from the io_timeout and are dictated by
> the recovery requirements and the algorithm the host_id leases are based
> on: "Light-Weight Leases for Storage-Centric Coordination" by Gregory
> Chockler and Dahlia Malkhi.
>
> Here are the actual equations copied from sanlock_internal.h.
> "delta" refers to host_id leases that take a long time to acquire at
> startup
> "free" corresponds to starting up after a clean shutdown
> "held" corresponds to starting up after an unclean shutdown
>
> You should find that with 30 sec io timeout these come out to 1 min /
> 4 min
> which you see when starting after a clean / unclean shutdown."
>
> Since I configured an io_timeout of 30s in sanlock
> (SANLOCKOPTS="-R 1 -o 30"), the delay at sanlock startup after an
> unclean shutdown is defined by the delta_acquire_held_min variable,
> which is calculated as:
>
> int max = host_dead_seconds;
> if (delta_large_delay > max)
>         max = delta_large_delay;
>
> int delta_acquire_held_min = max;
>
> So max is host_dead_seconds, which is calculated as:
> int host_dead_seconds = id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT;
>
> And id_renewal_fail_seconds is:
> int id_renewal_fail_seconds = 8 * io_timeout_seconds;
>
> WATCHDOG_FIRE_TIMEOUT = 60
>
> So that makes 8 * 30 + 60 = 300s before the lock can be acquired,
> and that's exactly the time shown in libvirtd.log before the
> lockspace is registered.
>
> When a proper reboot is done, or when sanlock/libvirtd is just
> restarted, delta_acquire_free_min defines the delay:
>
> int delta_short_delay = 2 * io_timeout_seconds;
> int delta_acquire_free_min = delta_short_delay;
> This is confirmed by the 60s delay in the libvirtd.log file:
> 10:33:54.097: 7983: debug : virLockManagerSanlockInit:267 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0
> 10:34:55.111: 7983: debug : virLockManagerSanlockSetupLockspace:247 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been registered
>
> So the whole delay is caused by the io_timeout, which is set to 30s
> because the lockspace is on GFS2, and the GFS2 volume can be blocked
> for some time while a node gets fenced and its journal is recovered.
> Depending on clean/unclean restarts, the delay differs.
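>
> For illustration, here is a minimal, compilable sketch of how both
> delays fall out of io_timeout (my own restatement of the equations
> quoted above, not the actual sanlock source; it assumes
> delta_large_delay <= host_dead_seconds, as in the case above):
>
> #include <stdio.h>
>
> #define WATCHDOG_FIRE_TIMEOUT 60
>
> int main(void)
> {
>         int io_timeout_seconds = 30; /* SANLOCKOPTS="-R 1 -o 30" */
>
>         /* clean shutdown: acquiring a free host_id lease */
>         int delta_short_delay = 2 * io_timeout_seconds;
>         int delta_acquire_free_min = delta_short_delay;
>
>         /* unclean shutdown: acquiring a held host_id lease */
>         int id_renewal_fail_seconds = 8 * io_timeout_seconds;
>         int host_dead_seconds = id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT;
>         int delta_acquire_held_min = host_dead_seconds;
>
>         /* prints "free 60s, held 300s", matching the libvirtd.log delays */
>         printf("free %ds, held %ds\n",
>                delta_acquire_free_min, delta_acquire_held_min);
>         return 0;
> }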
>
> So delta_acquire_held_min is based on host_dead_seconds, because
> sanlock wants to be sure that the lock won't be reacquired before
> the host is actually dead.
> David told me that sanlock is supposed to use a block device.
> I wonder if this makes sense when used on top of GFS2, which already
> has its own locking and safety mechanisms... It looks to me like it
> just adds the delay without reason in this case?
> Perhaps delta_acquire_held_min should be the same as
> delta_acquire_free_min, because we don't need this safety delay on
> top of GFS2.
>
> For NFS, it may be a whole other story...
>
> Best regards,
> Frido