On Tue, Aug 05, 2014 at 03:15:18PM +0800, James wrote:
In fact, to deal with this kind of situation, we added some timeout code to libvirtd's remote_dispatch process. The mechanism works like this:

1. When we call an API, we start a timer thread. When the timer expires, it sets a timeout flag for that API call and returns a timeout result to the libvirt client.
2. When the API returns to the remote_dispatch level, it checks the timeout flag to decide what to do next. If the call timed out, we perform a rollback action. For example, if the call attached a device, we detach it again.

This solution has some problems. First, we have to figure out suitable rollback actions. Second, I'm not sure it's the best way to solve this kind of blocking problem; it's not very elegant. What do you think about it?
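The two steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not libvirtd's remote_dispatch code: `dispatch_with_timeout`, `send_reply` (standing in for the reply to the libvirt client) and `rollback` are hypothetical names, and the API call is assumed to run synchronously in the dispatching thread.

```python
import threading
from typing import Any, Callable


def dispatch_with_timeout(
    api: Callable[[], Any],
    rollback: Callable[[], None],
    send_reply: Callable[[str], None],
    timeout: float,
) -> None:
    """Hypothetical sketch of the timer + flag + rollback scheme.

    send_reply stands in for answering the libvirt client; exactly one
    reply is sent per call, either "timeout" or "ok: <result>".
    """
    lock = threading.Lock()
    state = {"timed_out": False, "replied": False}

    def on_timer() -> None:
        # Step 1: the timer fired while the API is still running.
        # Set the timeout flag and answer the client immediately.
        with lock:
            if state["replied"]:
                return  # the API already finished and replied
            state["timed_out"] = True
            state["replied"] = True
        send_reply("timeout")

    timer = threading.Timer(timeout, on_timer)
    timer.start()
    try:
        result = api()  # may block for an arbitrarily long time
    finally:
        timer.cancel()  # no-op if the timer has already fired

    # Step 2: back at the dispatch level, check the timeout flag.
    with lock:
        timed_out = state["timed_out"]
        if not timed_out:
            state["replied"] = True  # stop a late timer from replying
    if timed_out:
        # The client was already told "timeout", so undo the side effect
        # (e.g. detach a device that the API call just attached).
        rollback()
    else:
        send_reply(f"ok: {result}")
```

With an `api` that runs longer than `timeout`, the client gets a single "timeout" reply and the rollback runs once the call finally returns; a fast `api` produces "ok: ..." and no rollback. Note that the dispatching thread itself still blocks until `api()` returns: the timer only frees the client, so a call that never returns is never rolled back.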
I'm not sure what you want to know. Yes, there are problems like "what rollback actions to do", which depends on where the call got stuck, and "what timeout should be set", which depends on thousands of factors. I can't think of any elegant solution that would properly prevent such lock-ups, mainly because this is literally the Halting problem [1] plus a bit more. I'd say that whatever works for you in this situation is OK, but it will (most probably) work only for your particular scenario.

Martin

[1] https://en.wikipedia.org/wiki/Halting_problem