[libvirt-users] Can not restore domain from a shared state file

Zhang Qian zhq527725 at gmail.com
Sun May 16 13:43:15 UTC 2010


Hi Laine,

Thanks for your reply.
I think I have found a different problem. I tried what you suggested:
suspend the domain with "virsh suspend <domain>" before saving it, then,
after restoring it with "virsh restore <image-file>", resume it with
"virsh resume <domain>". When I restore the domain on another host, it
still fails with the error I mentioned before:

error: Failed to restore domain from testRes.dat
error: operation failed: failed to start VM

But now I can work around the problem in an "ugly" way: before restoring
the domain on another host, I read the whole suspend image on that host.
Here is the code that does the read; it is very simple:
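    /* Open the suspend image on the target host. */
    if ((fd = open(suspendImage, O_RDONLY)) < 0) {
        goto error;
    }

    /* Read to end of file and discard the data; read() returns 0 at EOF. */
    while ((size = read(fd, buf, MAXLINELEN)) != 0) {
        if (size == -1) {
            close(fd);    /* do not leak the descriptor on a read error */
            goto error;
        }
    }

    close(fd);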

After this code has run, restoring the domain on that host succeeds! This
workaround (reading the suspend image completely on the target host before
restoring the domain there) has worked every time in my environment so far.
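
For anyone who wants to try the same workaround, here is a minimal,
self-contained sketch of the pre-read as a function. The name preread_file
and the 64 KiB buffer size are my own choices for illustration, not anything
from libvirt; it only uses the POSIX open/read/close calls and retries when
read() is interrupted by a signal.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read `path` from start to end and discard the data.
 * Returns the number of bytes read, or -1 on error (errno is set). */
long long preread_file(const char *path)
{
    char buf[65536];
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;               /* keep reading until end of file */
        } else if (n == 0) {
            break;                    /* end of file reached */
        } else if (errno != EINTR) {
            int saved = errno;        /* preserve errno across close() */
            close(fd);
            errno = saved;
            return -1;
        }
        /* n == -1 with EINTR: interrupted by a signal, retry the read */
    }

    if (close(fd) < 0)
        return -1;
    return total;
}
```

Calling preread_file("testRes.dat") on the target host just before
"virsh restore testRes.dat" would reproduce what I do; the returned byte
count can also be compared with the file size seen on the saving host.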

I am not sure why it works. Maybe the read triggers a refresh of the NFS
cache, so that the complete suspend image becomes accessible on the target
host. I don't know...
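
One way to probe this guess (a sketch only, and it would not prove the cache
theory) is to note the size of the state file on the saving host and then
check on the target host whether both the reported size and the number of
readable bytes agree with it. The function name check_image_size and its
expected parameter are hypothetical; the expected value is assumed to come
from the saving host, for example from "ls -l" there.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical diagnostic: compare the file as seen on this host with the
 * size recorded on the saving host.
 * Returns 1 if both the reported size (st_size) and the number of bytes
 * actually readable equal `expected`, 0 if either differs, -1 on error. */
int check_image_size(const char *path, long long expected)
{
    struct stat st;
    char buf[65536];
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* Count the bytes this host can really read from the file. */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    close(fd);
    if (n < 0)
        return -1;

    return (total == expected && (long long)st.st_size == expected) ? 1 : 0;
}
```

If this returned 0 on the target host right after the save finished, and 1 a
little later, that would at least be consistent with the target host seeing
an incomplete file for a while.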


Regards,
Qian

2010/5/12 Laine Stump <laine at laine.org>

> On 05/11/2010 04:40 AM, Zhang Qian wrote:
>
>> Hi,
>>
>> I have two KVM hosts, h1 and h2; both of them mount an NFS directory as
>> shared storage.
>> I can save (virsh save <domain> <file>) a domain on h1 to a state file in
>> the shared storage successfully, but restoring it on h2 fails with the
>> following error message:
>> # virsh restore testRes.dat
>> error: Failed to restore domain from testRes.dat
>> error: operation failed: failed to start VM
>>
>> I can always restore it on h1, but on h2 it only works sometimes (after
>> waiting a while, the "virsh restore" command may succeed on h2). I guess the
>> state file generated by "virsh save" is not intact from h2's point of view;
>> maybe this is caused by the cache of the NFS server?
>>
>
> There is a race condition in qemu when restarting a domain - it is possible
> for qemu to start the CPU before the domain image has been read from the
> file (this is regardless of where the file is stored). This may or may not
> be your problem (the error condition I saw due to this race was different
> from what you are seeing). It is easy to test for though - before saving
> your domain, first suspend it with "virsh suspend <domain>", then save it;
> after you've restored the domain with "virsh restore <image-file>", resume
> the domain with "virsh resume <domain>". If the domain successfully resumes,
> your problem was the race I describe. If not, you have found a different
> problem.
>
> I'm interested to know if this solves your problem.
>