[libvirt] [PATCH V2] libxl: fix dom0 balloon logic
Martin Kletzander
mkletzan at redhat.com
Tue Mar 31 14:30:00 UTC 2015
On Tue, Mar 31, 2015 at 08:16:09AM -0600, Jim Fehlig wrote:
>Martin Kletzander wrote:
>> On Tue, Mar 24, 2015 at 02:43:42PM -0600, Jim Fehlig wrote:
>>> Recent testing on large memory systems revealed a bug in the Xen xl
>>> tool's freemem() function. When autoballooning is enabled, freemem()
>>> is used to ensure enough memory is available to start a domain,
>>> ballooning dom0 if necessary. When ballooning large amounts of memory
>>> from dom0, freemem() would exceed its self-imposed wait time and
>>> return an error. Meanwhile, dom0 continued to balloon. Starting the
>>> domain later, after sufficient memory was ballooned from dom0, would
>>> succeed. The libvirt implementation in libxlDomainFreeMem() suffers
>>> the same bug since it is modeled after freemem().
>>>
>>> In the end, the best place to fix the bug on the Xen side was to
>>> slightly change the behavior of libxl_wait_for_memory_target().
>>> Instead of failing after caller-provided wait_sec, the function now
>>> blocks as long as dom0 memory ballooning is progressing. It will return
>>> failure only when more memory is needed to reach the target and wait_sec
>>> have expired with no progress being made. See xen.git commit fd3aa246.
>>> There was a dicussion on how this would affect other libxl apps like
>>> libvirt
>>>
>>> http://lists.xen.org/archives/html/xen-devel/2015-03/msg00739.html
>>>
>>> If libvirt containing this patch was build against a Xen containing
>>> the old libxl_wait_for_memory_target() behavior, libxlDomainFreeMem()
>>> will fail after 30 sec and domain creation will be terminated.
>>> Without this patch and with old libxl_wait_for_memory_target() behavior,
>>> libxlDomainFreeMem() does not succeed after 30 sec, but returns success
>>> anyway. Domain creation continues resulting in all sorts of fun stuff
>>> like cpu soft lockups in the guest OS. It was decided to properly fix
>>> libxl_wait_for_memory_target(), and if anything improve the default
>>> behavior of apps using the freemem reference impl in xl.
>>>
>>> xl was patched to accommodate the change in
>>> libxl_wait_for_memory_target()
>>> with xen.git commit 883b30a0. This patch does the same in the libxl
>>> driver. While at it, I changed the logic to essentially match
>>> freemem() in $xensrc/tools/libxl/xl_cmdimpl.c. It was a bit cleaner
>>> IMO and will make it easier to spot future, potentially interesting
>>> divergences.
>>>
>>> Signed-off-by: Jim Fehlig <jfehlig at suse.com>
>>> ---
>>>
>>> V2: Actually use libxl_wait_for_memory_target(), instead of
>>> libxl_wait_for_free_memory()
>>>
>>> src/libxl/libxl_domain.c | 55
>>> +++++++++++++++++++++++-------------------------
>>> 1 file changed, 26 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c
>>> index 407a9bd..a1739aa 100644
>>> --- a/src/libxl/libxl_domain.c
>>> +++ b/src/libxl/libxl_domain.c
>>> @@ -1121,38 +1121,39 @@ libxlDomainFreeMem(libxlDomainObjPrivatePtr
>>> priv, libxl_domain_config *d_config)
>>> {
>>> uint32_t needed_mem;
>>> uint32_t free_mem;
>>> - size_t i;
>>> - int ret = -1;
>>> + int ret;
>>
>> This variable is unnecessary and confusing since you don't return it.
>> And without this, you can make ...
>>
>>> int tries = 3;
>>> int wait_secs = 10;
>>>
>>> - if ((ret = libxl_domain_need_memory(priv->ctx, &d_config->b_info,
>>> - &needed_mem)) >= 0) {
>>> - for (i = 0; i < tries; ++i) {
>>> - if ((ret = libxl_get_free_memory(priv->ctx, &free_mem))
>>> < 0)
>>> - break;
>>> + ret = libxl_domain_need_memory(priv->ctx, &d_config->b_info,
>>> &needed_mem);
>>> + if (ret < 0)
>>> + goto error;
>>>
>>
>> ... these conditions smaller, e.g.:
>>
>> if (libxl_domain_need_memory(priv->ctx, &d_config->b_info,
>> &needed_mem) < 0)
>> goto error;
>
>Yeah, too much copy and paste. I'll fix this up and send a V3.
>
>> Do you need this for the release? (sorry for noticing that late)
>
>No, this can wait. From a Xen perspective, more useful work for the
>release would have been this series
>
>https://www.redhat.com/archives/libvir-list/2015-March/msg01337.html
>
>But I think it is a bit too late now :-(.
>
I missed that, sorry. I'll have a look at it tomorrow. Not being
that Xen-knowledgeable, I probably haven't felt that confident
reviewing that.
>Regards,
>Jim
>
>--
>libvir-list mailing list
>libvir-list at redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20150331/ef246b8d/attachment-0001.sig>
More information about the libvir-list
mailing list