[libvirt] [nicsysco.com] Weird Libvirt Behavior

Daniel P. Berrangé berrange at redhat.com
Wed Jan 30 14:42:37 UTC 2019


On Tue, Jan 29, 2019 at 06:18:21PM -0800, nico wrote:
> Hi folks,
> 
> First time contributor, but I felt that what I discovered was (probably) a
> very rare situation. 
> 
> I'm running a CentOS server (my only Linux deployment) to which customers
> all over the U.S. connect to process their micro-lender businesses. There
> are several VMs, among them one called a2, which runs the fortress system.
> In the beginning the .raw file was about 10GB, which at the time was 5X
> overkill in terms of capacity.
> 
> For years we had no problems and the CentOS box would tick over day after
> day without as much as a hiccup. 
> 
> About three months ago a2 started to slow down, almost to the point of
> timing out when applications and users logged on. The band-aid was to copy
> an earlier a2.raw backup over the current one on a regular basis, which
> would rectify the problem. At first, applying this band-aid on Sunday nights
> would suffice. But later we had to increase it to twice a week, and these
> last couple of weeks we had to do it almost every night. The system also
> sent alerts that a "Degraded Array event had been detected on md device
> /dev/md1". Inspecting the drives showed no crisis.
> 
> Today it folded completely and brought the system down, leaving our clients
> telling customers who walked into their stores that "our computers are down".
> Restarting the box just brought a2 up in a paused state, from which it never
> recovered. We had to killall to get rid of it.
> 
> Having nowhere else to go with it, I decided to rebuild a2 on another,
> separate drive, to at least address the degraded array alerts. As I edited
> the .xml file, I saw the following:
> 
> <source file='/var/lib/libvirt/:machines/a2/a2-disk1.raw'/>
> 
> What the hell was that colon doing there? I checked the size of the .raw
> file. It had grown to over 96GB. Just as a sanity check, I looked at the
> other VMs' .xml files and, as I expected, they didn't have a colon.
> 
> I removed the colon and virsh-started a2, which fired up immediately, with
> the rest of the system following suit. No doubt that ":" was the culprit!
> 
> My question is: Would that colon cause an append-action to the .raw file? We
> have no idea when it got in there or how. We haven't worked on that xml file
> for a long time. Why would a2 even fire up at all?

QEMU's -drive command line syntax allows a ":" to denote use of a
particular QEMU block driver backend. So conceivably a ":" in the filename
could confuse QEMU, but "/var/lib/libvirt/" would not be interpreted as
any kind of QEMU block driver AFAICT. In fact I'm rather puzzled how
it would work at all unless you really do have a directory called
"/var/lib/libvirt/:machines" on your host. I also can't explain why
QEMU would make that file grow arbitrarily. A raw file is a fixed size
from QEMU's pov and won't ever change unless you issue a "resize"
command to QEMU via libvirt.
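
To spot this kind of stray character before it causes trouble, here is a minimal Python sketch (not part of libvirt) that parses a domain definition, for example the saved output of `virsh dumpxml a2`, and flags any file-backed disk source path containing a ":". The function name `suspicious_disk_sources` and the trimmed sample XML are hypothetical; only `<source file='...'/>` elements under `<devices><disk>` are checked, so block-device or network disks are ignored.

```python
import xml.etree.ElementTree as ET


def suspicious_disk_sources(domain_xml: str) -> list[str]:
    """Return file-backed disk source paths that contain a ':' character."""
    root = ET.fromstring(domain_xml)
    flagged = []
    # File-backed disks keep their path in <devices><disk><source file='...'/>
    for source in root.findall("./devices/disk/source"):
        path = source.get("file")
        if path is not None and ":" in path:
            flagged.append(path)
    return flagged


# Hypothetical, trimmed domain XML reproducing the path from the report
example = """
<domain type='kvm'>
  <name>a2</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/:machines/a2/a2-disk1.raw'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
""".strip()

print(suspicious_disk_sources(example))
# ['/var/lib/libvirt/:machines/a2/a2-disk1.raw']
```

Running it over every domain's XML on a host gives a quick list of paths worth a second look; a colon is legal in a Linux filename, so a hit is a prompt to check, not proof of a problem.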

Is there any way you might have some background job running
that would resize the file, either directly or by talking to libvirt
or QEMU?
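
One way to narrow that down is to snapshot the image's size periodically (say, from cron) and compare snapshots. A minimal sketch, assuming a POSIX host where `os.stat()` reports `st_blocks` in 512-byte units; the helper names `disk_usage` and `size_changed` are hypothetical. It records both the apparent size and the allocated bytes, since a sparse raw image can report a large apparent size while far fewer blocks are actually allocated.

```python
import os


def disk_usage(path: str) -> tuple[int, int]:
    """Return (apparent_size_bytes, allocated_bytes) for a file.

    st_size is the apparent size; st_blocks counts 512-byte blocks actually
    allocated, which can be much smaller than st_size for a sparse file.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def size_changed(before: tuple[int, int], after: tuple[int, int]) -> bool:
    """True if the apparent size differs between two snapshots."""
    return before[0] != after[0]
```

Logging `disk_usage("/var/lib/libvirt/.../a2-disk1.raw")` with a timestamp at intervals would show when the apparent size jumps, which can then be lined up against cron, backup, or libvirt logs.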


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list