[libvirt] [PATCH 00/10] libxl: switch driver to use a single libxl_ctx
Jim Fehlig
jfehlig at suse.com
Tue Mar 24 22:44:07 UTC 2015
Michal Privoznik wrote:
> On 18.02.2015 04:56, Jim Fehlig wrote:
>
>> This series is a follow up to
>>
>> https://www.redhat.com/archives/libvir-list/2015-February/msg00024.html
>>
>> It goes a step further and changes the libxl driver to use one,
>> driver-wide libxl_ctx. Currently the libxl driver has one driver-wide
>> ctx for operations that are not domain-specific and a ctx for each
>> domain. This approach was necessary back in the old Xen 4.1 libxl days,
>> but with the newer libxl it is more of a hindrance than a benefit.
>> Ian Jackson suggested moving to a single ctx while discussing some
>> deadlocks and assertions encountered in the libxl driver when under
>> load from tests such as OpenStack Tempest.
>>
>> Making such a change involves quite a bit of code movement. I've tried
>> to split that up into a reviewable series, the result of which is the
>> 9 patches that follow. I've run this through all of my automated tests
>> as well as some hacky tests I created to reproduce failures revealed by
>> Tempest.
>>
>> One downside of moving to a single ctx is losing the per-domain log
>> files. Currently, a single log stream can be associated with ctx, hence
>> all logging from libxl will go to a single file. Ian is going to
>> investigate possibilities to accommodate per-domain log files in libxl,
>> but in the meantime folks using Xen are accustomed to a single
>> log file from the xend days.
>>
>> I've been testing this series on xen-unstable and Xen 4.4.1 + commits
>> 2ffeb5d7, 4b9143e4, 5a968257, 60ce518a, 66bff9fd, 77a1bf37, f49f9b41,
>> 6b5a5bba, 93699882d, f1335f0d, and 8bc64413. Results are much better
>> than before applying the series, but I do notice a stuck hypercall
>> after many (hundreds of) concurrent domain create/destroy operations.
>> The single libxl_ctx is locked in the callpath, essentially deadlocking
>> the driver.
>>
>> Thread 1 (Thread 0x7f0649a198c0 (LWP 2235)):
>> 0 0x00007f0645272397 in ioctl () from /lib64/libc.so.6
>> 1 0x00007f0645d8e353 in linux_privcmd_hypercall (xch=<optimized out>,
>> h=<optimized out>, hypercall=<optimized out>) at xc_linux_osdep.c:134
>> 2 0x00007f0645d854b8 in do_xen_hypercall (xch=xch@entry=0x7f0630039390,
>> hypercall=hypercall@entry=0x7fffd53f80e0) at xc_private.c:249
>> 3 0x00007f0645d86aa4 in do_sysctl (sysctl=sysctl@entry=0x7fffd53f8080,
>> xch=xch@entry=0x7f0630039390) at xc_private.h:281
>> 4 xc_sysctl (xch=xch@entry=0x7f0630039390,
>> sysctl=sysctl@entry=0x7fffd53f8170) at xc_private.c:656
>> 5 0x00007f0645d7bfbf in xc_domain_getinfolist (xch=0x7f0630039390,
>> first_domain=first_domain@entry=119, max_domains=max_domains@entry=1,
>> info=info@entry=0x7fffd53f8260) at xc_domain.c:382
>> 6 0x00007f0645fabca6 in domain_death_xswatch_callback
>> (egc=0x7fffd53f83f0, w=<optimized out>, wpath=<optimized out>,
>> epath=<optimized out>) at libxl.c:1041
>> 7 0x00007f0645fd75a8 in watchfd_callback (egc=0x7fffd53f83f0,
>> ev=<optimized out>, fd=<optimized out>, events=<optimized out>,
>> revents=<optimized out>) at libxl_event.c:515
>> 8 0x00007f0645fd8ac3 in libxl_osevent_occurred_fd (ctx=<optimized out>,
>> for_libxl=<optimized out>, fd=<optimized out>,
>> events_ign=<optimized out>, revents_ign=<optimized out>) at
>> libxl_event.c:1259
>> 9 0x00007f063a23402c in libxlFDEventCallback (watch=454, fd=33,
>> vir_events=1, fd_info=0x7f0608007e70) at libxl/libxl_driver.c:123
>>
>> There is no hint in any logs or dmesg suggesting a cause for the stuck
>> hypercall. Any suggestions or tips for further debugging are appreciated.
>>
FYI, this was not a hung hypercall but a loop all the way back in frame 6
(domain_death_xswatch_callback) that I overlooked. It was fixed in libxl
by the following commit:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=4783c99aab866f470bd59368cfbf5ad5f677b0ec

>> Jim Fehlig (10):
>> libxl: remove redundant calls to libxl_evdisable_domain_death
>> libxl: use libxl_ctx passed to libxlConsoleCallback
>> libxl: use driver-wide ctx in fd and timer event handling
>> libxl: Move setup of child processing code to driver initialization
>> libxl: move event registration to driver initialization
>> libxl: use global libxl_ctx in event handler
>> libxl: remove unnecessary libxlDomainEventsRegister
>> libxl: make libxlDomainFreeMem static
>> libxl: remove per-domain libxl_ctx
>> libxl: change libxl log stream to ERROR log level
>>
>> src/libxl/libxl_conf.c | 2 +-
>> src/libxl/libxl_domain.c | 438 ++++++---------------------------------
>> src/libxl/libxl_domain.h | 27 +--
>> src/libxl/libxl_driver.c | 484 +++++++++++++++++++++++++++++++-------------
>> src/libxl/libxl_migration.c | 17 +-
>> 5 files changed, 426 insertions(+), 542 deletions(-)
>>
>>
>
> ACK series
>
Thanks! 1 and 2 were pushed earlier as part of this trivial series
https://www.redhat.com/archives/libvir-list/2015-March/msg00102.html
I've now pushed 3-9, but held off on pushing 10 since it removes the
ability to get debug-level messages from libxl. I think a better
approach would be to introduce /etc/libvirt/libxl.conf with a
'log_level' setting, giving users the ability to change this a bit more
dynamically. Actually, an even better approach would be to set the libxl
debug level based on the level set in /etc/libvirt/libvirtd.conf. But
AFAIK, the settings in libvirtd.conf are generally not available to the
individual drivers.
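
To make the 'log_level' idea a bit more concrete, here is a rough,
standalone sketch. Nothing like this exists in the tree: the file
format ("log_level = N", reusing the 1=debug .. 4=error numbering from
libvirtd.conf), the sketch_* function names, and the local enum that
stands in for the xentoollog levels are all assumptions made purely for
illustration.

```c
#include <stdio.h>

/* Stand-in for the logger levels; a real driver would use the values
 * from <xentoollog.h> rather than this local, hypothetical enum. */
typedef enum {
    SKETCH_XTL_DEBUG,
    SKETCH_XTL_INFO,
    SKETCH_XTL_WARN,
    SKETCH_XTL_ERROR
} sketch_xtl_level;

/* Map a libvirtd.conf-style log_level (1=debug, 2=info, 3=warning,
 * 4=error) to a logger level. Anything else falls back to ERROR,
 * which is the level patch 10 would have hard-coded. */
static sketch_xtl_level
sketch_level_from_libvirt(int log_level)
{
    switch (log_level) {
    case 1:
        return SKETCH_XTL_DEBUG;
    case 2:
        return SKETCH_XTL_INFO;
    case 3:
        return SKETCH_XTL_WARN;
    default:
        return SKETCH_XTL_ERROR;
    }
}

/* Parse one line of the hypothetical libxl.conf. Accepts
 * "log_level = N" with optional whitespace and N in 1..4.
 * Returns 0 and stores N in *out on success, -1 otherwise. */
static int
sketch_parse_log_level(const char *line, int *out)
{
    int val;
    char trailing;

    /* Exactly one conversion must succeed: the integer. If the
     * trailing %c also matches, there is junk after the number. */
    if (sscanf(line, " log_level = %d %c", &val, &trailing) != 1)
        return -1;
    if (val < 1 || val > 4)
        return -1;

    *out = val;
    return 0;
}
```

The driver would read the setting once at initialization and pass the
resulting level to whatever logger it hands to the single libxl_ctx;
lines that fail to parse would simply leave the default in place.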
Regards,
Jim