[libvirt] [RFC] Add a new field to qemud_save_header to tell if the saved image is corrupt

Eric Blake eblake at redhat.com
Fri Aug 19 13:12:20 UTC 2011


On 08/18/2011 11:42 PM, Osier Yang wrote:
>> Remember that 'migrate' is a long-running async job command, and can
>> be interrupted. That is, 'service libvirtd restart' is a legal action
>> to take during step 3, and it is not as severe as a libvirtd crash,
>> and we have already recently added patches to remember async job
>> status across libvirtd restarts with the intention of making it legal
>> to restart libvirtd in the middle of an async job (whether the async
>> job should still succeed, or should remove the save file, is a
>> slightly different question; but removing the save file would require
>> that we save in the XML the name of the file to remove if libvirtd is
>> restarted).
>
> Hmm, what about restarting libvirtd in the middle of a managed save?
>
> The domain will then be restored from the corrupt save image
> automatically. Should we simply report an error like "image is corrupt"
> and quit starting the domain?
> That might not be good, as one would see a previously running domain
> fail to start after libvirtd restarts.
>
> Or do we want the managed save to still succeed? If so, we might need to:
>
> 1) continue the managed save job (since we already support remembering
> async job status across libvirtd restarts);
> 2) restore from the image saved in 1).

I think the easiest approach is:

If we restart libvirtd and see that an async job for save-to-file was 
in progress, then we abort the job (leaving the file marked unfinished, 
whether it was managed save or user save), and log the error.
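To make "leaving the file marked unfinished" concrete, here is a minimal C sketch. It assumes a hypothetical, simplified header with a new "complete" field, in the spirit of this RFC's subject. The struct, the job enum, and every example_* function are illustrative names, not libvirt's real qemud_save_header or driver code. The header is written with complete = 0 before any guest state and is rewritten to 1 only after all data is on disk. A libvirtd restart that aborts the job never reaches the finishing step, so the flag stays 0.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified save header. 'complete' is the proposed new
 * field; the real qemud_save_header has other fields omitted here. */
struct example_save_header {
    uint32_t version;    /* placeholder format version */
    uint32_t complete;   /* 0 while the save is in progress, 1 when done */
};

/* Hypothetical async job codes remembered across libvirtd restarts. */
enum example_async_job { EXAMPLE_JOB_NONE, EXAMPLE_JOB_SAVE };

/* Write the header marked unfinished, then the (placeholder) guest state. */
static int example_save_begin(FILE *fp, const void *state, size_t len)
{
    struct example_save_header hdr = { .version = 1, .complete = 0 };
    if (fwrite(&hdr, sizeof(hdr), 1, fp) != 1)
        return -1;
    if (len > 0 && fwrite(state, len, 1, fp) != 1)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Called only after all guest state was written: flip the flag to 1. */
static int example_save_finish(FILE *fp)
{
    struct example_save_header hdr = { .version = 1, .complete = 1 };
    if (fseek(fp, 0, SEEK_SET) != 0)
        return -1;
    if (fwrite(&hdr, sizeof(hdr), 1, fp) != 1)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Returns 1 if the header says the image is complete, 0 otherwise
 * (unfinished flag, or too short to even hold a header). */
static int example_save_is_complete(FILE *fp)
{
    struct example_save_header hdr;
    if (fseek(fp, 0, SEEK_SET) != 0 ||
        fread(&hdr, sizeof(hdr), 1, fp) != 1)
        return 0;
    return hdr.complete == 1;
}

/* On libvirtd startup: an interrupted save job is aborted, not resumed.
 * example_save_finish() is never called, so the file keeps complete = 0.
 * Returns 1 if a job was aborted, 0 if there was nothing to do. */
static int example_reconnect_job(enum example_async_job job, const char *path)
{
    if (job != EXAMPLE_JOB_SAVE)
        return 0;
    fprintf(stderr, "aborting interrupted save to '%s'\n", path);
    return 1;
}
```

Rewriting the header in place after the data is written means any reader sees either a fully written image or one that is explicitly flagged as unfinished. That holds regardless of where the interruption happened.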

On managed restore (virDomainCreate or autostart), if the save file 
exists but is incomplete, then log the fact that the file is unusable, 
then unlink() the file and proceed to do a normal boot (nothing we can 
do to recover the lost autosave, but we can at least clean up on the 
user's behalf).

On user restore (virDomainRestore), if the save file exists but is 
incomplete, report the error to the user.  No unlink(), and no rebooting 
the guest; it's up to the user to decide how to handle the failed save.

But if we can figure out how to do better, by making a libvirtd restart 
able to complete the save process rather than ditch it, then that would 
be nicer.  It's just that I don't know how easy that would be, and we 
have to start this patch somewhere.

-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org