On Sat, Jul 26, 2014 at 03:47:09PM +0800, James wrote:
> On 2014/7/25 18:07, Martin Kletzander wrote:
>> On Fri, Jul 25, 2014 at 04:45:55PM +0800, James wrote:
>>> There is a situation where, when libvirtd is under heavy load (for example, when we start many VMs at the same time), some libvirt APIs can take a long time to return. This blocks the upper-level job from finishing. We usually can't wait forever, so we want a timeout mechanism to help us out: when an API call takes longer than some limit, it returns a timeout result and does some rollback. So my question is: is there a plan to provide a timeout mechanism, or a better solution to this kind of problem, in the future? And when?
>>
>> Is it only because there are not enough workers available? If yes, then changing the limits in libvirtd.conf (both global and per-connection) might be the easiest way to go.
>>
>> Martin
>
> Thank you for the quick reply. Job pressure is just one reason for a timeout mechanism. If something really bad happens, such as a bug that blocks a libvirt API from ever returning (which is very rare), what can we do to make sure the job is not blocked by the blocked API? For example, process A calls libvirt API b, but b never returns, so A is blocked there forever. What is the best thing for us to do?
As that is a pretty rare case that cannot be dealt with inside the API (since the API is the place where it gets locked), it has to be dealt with outside it. I guess whatever you would do by hand is OK. If, for example, you are used to restarting libvirtd after the block is detected, then restart it and try again. You can spawn another process that will do it if you want some fine-grained control, or you can use client-side (and server-side) keepalive to be automatically disconnected in case the block happens inside the event loop (but it won't catch a block outside it).

I'm not sure how to answer more properly since this is not libvirt-specific. If there's something libvirt-specific I missed, let me know.

Martin
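
A rough sketch of the "spawn another process" idea, assuming the blocking operation can be driven from a separate command (for example a `virsh` invocation). The helper name `run_with_timeout` and the status strings are made up for illustration and are not part of libvirt. It only uses Python's standard `subprocess` module, which kills the child if the timeout expires:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process and return (status, stdout).

    status is "ok", "failed" (non-zero exit), or "timeout".
    Hypothetical helper for illustration; not a libvirt API.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so the
        # caller is no longer blocked and can roll back or retry.
        return "timeout", ""
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


if __name__ == "__main__":
    # Stand-in for a call that never returns: a child that sleeps 30 s.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    status, _ = run_with_timeout(hung, 1.0)
    print(status)  # prints "timeout"
```

In practice `cmd` might be something like `["virsh", "start", "guest1"]` (the guest name is a placeholder). Note that this only unblocks the caller; if libvirtd itself is stuck, the caller still has to decide what to do next, for example restart the daemon and try again as described above.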