<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 03/14/2012 05:39 PM, Frido Roose wrote:
<blockquote
cite="mid:CAAZ+1KrgpT7cwm8j+Xix0Jr=xE7SGcNoHOJpKZq2b5QJ9=-MSw@mail.gmail.com"
type="cite">
<div class="gmail_quote">On Wed, Mar 14, 2012 at 8:32 AM, Alex Jia
<span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:ajia@redhat.com" target="_blank">ajia@redhat.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div bgcolor="#ffffff" text="#000000">
<div>
<div> On 03/13/2012 10:42 PM, Frido Roose wrote:
<blockquote type="cite">Hello,
<div><br>
</div>
<div>I configured libvirtd with the sanlock lock
manager plugin:</div>
<div><br>
</div>
<div>
<div># rpm -qa | egrep "libvirt-0|sanlock-[01]"</div>
<div>libvirt-lock-sanlock-0.9.4-23.el6_2.4.x86_64</div>
<div>sanlock-1.8-2.el6.x86_64</div>
<div>libvirt-0.9.4-23.el6_2.4.x86_64</div>
</div>
<div><br>
</div>
<div>
<div># egrep -v "^#|^$"
/etc/libvirt/qemu-sanlock.conf </div>
<div>auto_disk_leases = 1</div>
<div>disk_lease_dir = "/var/lib/libvirt/sanlock"</div>
<div>host_id = 4</div>
</div>
<div><br>
</div>
<div>
<div># mount | grep sanlock</div>
<div>/dev/mapper/kvm--shared-sanlock on
/var/lib/libvirt/sanlock type gfs2
(rw,noatime,hostdata=jid=0)</div>
</div>
<div><br>
</div>
<div>
<div># cat /etc/sysconfig/sanlock </div>
<div>SANLOCKOPTS="-R 1 -o 30"</div>
</div>
<div><br>
</div>
<div>I increased the sanlock io_timeout to 30 seconds
(default = 10), because the sanlock dir is on a GFS2
volume and can be blocked for some time while
fencing and journal recovery take place.</div>
<div>With the default sanlock io timeout, I get lease
timeouts because IO is blocked:</div>
<div>
<div>
<div> Mar 5 15:37:14 raiti sanlock[5858]: 3318
s1 check_our_lease warning 79 last_success 3239</div>
<div> Mar 5 15:37:15 raiti sanlock[5858]: 3319
s1 check_our_lease failed 80</div>
</div>
</div>
<div><br>
</div>
<div>So far, all fine, but when I restart sanlock and
libvirtd, it takes about 2 * 30 seconds = 1 minute
before libvirtd is usable. "virsh list" hangs
during this time. I can still live with that...</div>
<div>But it gets worse after a reboot: "virsh list" then
takes about 5 minutes before it responds. After this
initial delay, virsh responds normally, so it
looks like an initialization issue to me.</div>
<div><br>
</div>
<div>Is this a configuration issue, a bug, or expected
behavior?</div>
</blockquote>
</div>
</div>
Hi Frido,<br>
I'm not sure whether you hit a sanlock AVC error in your
/var/log/audit/audit.log. Could you check it and provide
your selinux-policy version? In addition, you should turn on
the SELinux boolean for sanlock, for example: <br>
<br>
# getsebool -a|grep sanlock<br>
virt_use_sanlock --> off<br>
# setsebool -P virt_use_sanlock on<br>
# getsebool -a|grep sanlock<br>
virt_use_sanlock --> on<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>
<div>Hello Alex,</div>
<div><br>
</div>
<div>Thanks for your suggestions! I don't have any AVC errors
in audit.log, but I also disabled selinux on the nodes for
now.</div>
</div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div bgcolor="#ffffff" text="#000000"> In addition, could you
provide the libvirt log as an attachment? Please refer to the
following configuration:<br>
<br>
1. /etc/libvirt/libvirtd.conf<br>
<br>
log_filters="1:libvirt 1:conf 1:locking"<br>
log_outputs="1:<a moz-do-not-send="true">file:/var/log/libvirt/libvirtd.log</a>"<br>
<br>
2. service libvirtd restart<br>
<br>
3. repeat your test steps<br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>I enabled the extra debug logging, which you can find as an
attachment. Instead of restarting libvirtd, I did an echo b
&gt; /proc/sysrq-trigger to force an unclean reboot.</div>
<div>The result was that it took 300s to register the lockspace:</div>
<div>
<div> 09:59:28.919: 3457: debug :
virLockManagerSanlockInit:267 : version=1000000
configFile=/etc/libvirt/qemu-sanlock.conf flags=0</div>
<div> 10:05:29.539: 3457: debug :
virLockManagerSanlockSetupLockspace:247 : Lockspace
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been
registered</div>
</div>
<div><br>
</div>
<div>I also had a little discussion about this with David
Teigland on the sanlock dev list. He gave me some more
details about how the delays and timeouts work, and it
explains why it takes 300s.</div>
<div>I'll quote his reply:</div>
</div>
</blockquote>
Frido, David gave a great explanation, thanks for forwarding
it.<br>
<blockquote
cite="mid:CAAZ+1KrgpT7cwm8j+Xix0Jr=xE7SGcNoHOJpKZq2b5QJ9=-MSw@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<div><br>
</div>
<div>David:</div>
<div>"<span>Yes, all the timeouts are derived from the
io_timeout and are dictated by</span></div>
<span>the recovery requirements and the algorithm the host_id
leases are based</span><br>
<span>on: "Light-Weight Leases for Storage-Centric Coordination"
by Gregory</span><br>
<span>Chockler and Dahlia Malkhi.</span><br>
<br>
<span>Here are the actual equations copied from
sanlock_internal.h.</span><br>
<span>"delta" refers to host_id leases that take a long time to
acquire at startup</span><br>
<span>"free" corresponds to starting up after a clean shutdown</span><br>
<span>"held" corresponds to starting up after an unclean
shutdown</span><br>
<br>
<span>You should find that with 30 sec io timeout these come out
to 1 min / 4 min</span><br>
<span>which you see when starting after a clean / unclean
shutdown."</span></div>
<div class="gmail_quote">
<font color="#222222" face="arial, sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">Since I configured an io_timeout of 30s in sanlock
(SANLOCKOPTS="-R 1 -o 30"),</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">the delay at sanlock startup after an unclean shutdown is defined by
the delta_acquire_held_min variable, </font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">which is calculated as:</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">
<div class="gmail_quote"> int max = host_dead_seconds;</div>
<div class="gmail_quote"> if (delta_large_delay >
max)</div>
<div class="gmail_quote"> max =
delta_large_delay;</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote"> int delta_acquire_held_min =
max;</div>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">So max is host_dead_seconds, which is calculated
as:</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">int host_dead_seconds =
id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT;</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">And id_renewal_fail_seconds is:</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">int id_renewal_fail_seconds = 8 *
io_timeout_seconds;</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif"><br>
</font></div>
<div class="gmail_quote"><span style="color: rgb(34, 34, 34);
font-family: arial,sans-serif;">WATCHDOG_FIRE_TIMEOUT = 60</span><font
color="#222222" face="arial, sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">So that makes 8 * 30 + 60, or a total of 300s
before the lock can be acquired.</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">And that accounts for most of the time shown in
libvirtd.log before the lockspace is registered (the two log lines are about 360s apart).</font></div>
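<div class="gmail_quote">As a rough cross-check, the arithmetic above can be
reproduced in a few lines of Python. This is only a sketch of the formulas
quoted in this mail, not sanlock's own code: the function names are made up
for illustration, and because the definition of delta_large_delay is not
quoted in this thread, it is left as an optional argument that defaults to 0.</div>
<pre>
```python
# Sketch of the startup-delay formulas quoted from sanlock_internal.h.
# The function names are illustrative; they are not part of sanlock.

WATCHDOG_FIRE_TIMEOUT = 60  # seconds, as quoted in this mail


def delta_acquire_held_min(io_timeout_seconds, delta_large_delay=0):
    """Minimum wait when the host_id lease was held (unclean shutdown).

    delta_large_delay is not defined in this thread, so it is an
    optional argument defaulting to 0; pass the real value if known.
    """
    id_renewal_fail_seconds = 8 * io_timeout_seconds
    host_dead_seconds = id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT
    return max(host_dead_seconds, delta_large_delay)


def delta_acquire_free_min(io_timeout_seconds):
    """Minimum wait when the host_id lease was free (clean shutdown)."""
    delta_short_delay = 2 * io_timeout_seconds
    return delta_short_delay


if __name__ == "__main__":
    # Compare the default io_timeout (10) with the value used here (30).
    for io_timeout in (10, 30):
        print(io_timeout,
              delta_acquire_free_min(io_timeout),
              delta_acquire_held_min(io_timeout))
```
</pre>
<div class="gmail_quote">With an io_timeout of 30 this gives 60s and 300s;
with the default of 10 it gives 20s and 140s, provided delta_large_delay does
not exceed host_dead_seconds. The 8 * 10 = 80 for id_renewal_fail_seconds also
appears consistent with the "check_our_lease failed 80" message seen with the
default timeout.</div>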
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif"><br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">When a proper reboot is done, or when
sanlock/libvirtd is just restarted, delta_acquire_free_min</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">defines the delay:</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">int delta_short_delay = 2 *
io_timeout_seconds;<br>
</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">int delta_acquire_free_min = delta_short_delay;</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">Which is confirmed by the 60s delay in the
libvirtd.log file:</font></div>
<div class="gmail_quote"><font color="#222222" face="arial,
sans-serif">
<div class="gmail_quote">
10:33:54.097: 7983: debug : virLockManagerSanlockInit:267
: version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf
flags=0</div>
<div class="gmail_quote"> 10:34:55.111: 7983: debug :
virLockManagerSanlockSetupLockspace:247 : Lockspace
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been
registered</div>
</font>
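<div>The gap between two libvirtd.log lines can also be measured directly
instead of being read off by eye. The following is a minimal sketch that
assumes both timestamps fall on the same day and use the HH:MM:SS.mmm format
shown in the excerpts; the helper name elapsed_seconds is hypothetical.</div>
<pre>
```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    # Timestamp pairs copied from the libvirtd.log excerpts in this mail.
    print("unclean reboot:", elapsed_seconds("09:59:28.919", "10:05:29.539"))
    print("clean restart: ", elapsed_seconds("10:33:54.097", "10:34:55.111"))
```
</pre>
<div>For the two pairs quoted in this mail, the clean restart comes out at
about 61.0s, in line with the 60s computed from delta_acquire_free_min, and the
unclean reboot at about 360.6s, somewhat more than the 300s computed from
delta_acquire_held_min.</div>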
<div><br>
</div>
<div>So the whole delay is caused by the io_timeout which is set
to 30s because the lockspace is on GFS2,</div>
<div>and the GFS2 volume can be blocked for some time while a
node gets fenced and its journal is replayed.</div>
<div>Depending on clean/unclean restarts, the delay may differ.</div>
<div><br>
</div>
<div>So <span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">delta_acquire_held_min is based on </span><span
style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">host_dead_seconds, because sanlock wants
to be sure that the</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">lock won't be reacquired before the host
is actually dead.</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">David told me that sanlock is supposed to
use a block device.</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">I wonder if this makes sense when used on
top of GFS2, which already has its own locking and safety</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">mechanism... It looks to me like it just
adds the delay without any reason in this case?</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">Perhaps </span><span style="color:
rgb(34, 34, 34); font-family: arial,sans-serif;">delta_acquire_held_min
should be the same as</span><span style="color: rgb(34, 34,
34); font-family: arial,sans-serif;"> delta_acquire_free_min
because we don't need</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">this safety delay on top of gfs2.</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;"><br>
</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">For NFS, it may be a whole other story...</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;"><br>
</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">Best regards,</span></div>
<div><span style="color: rgb(34, 34, 34); font-family:
arial,sans-serif;">Frido</span></div>
</div>
</blockquote>
<br>
</body>
</html>