[rhelv6-list] KVM issues post RHEL6.1->6.2 update

Thu Dec 8 09:31:39 UTC 2011

Just thought I'd share my experiences of updating a KVM host and guests this 
morning.  I'll acknowledge up front that I didn't do things in the right 
order so the mistakes were mine.

Start: RHEL6.1 KVM host, two RHEL6.1 guests using .img files (LVM partitions 
inside).  Fully up to date as of just before the RHEL6.2 errata release.

I did "yum clean all ; yum update" on both the host and the guests at the 
same time (yeah, I know).  In my defence, a seemingly identical setup I did 
this on yesterday worked without issues.

At the point at which the host was completing its cleanup this happened in 
/var/log/messages:

Dec  8 07:14:47 frazil libvirtd: 07:14:47.926: 14778: warning : qemudDispatchSignalEvent:403 : Shutting down on signal 15
Dec  8 07:14:49 frazil yum[1235]: Updated: libvirt-0.9.4-23.el6_2.1.x86_64

and further down

  Dec  8 07:15:00 frazil kernel: br1: port 2(vnet1) entering disabled state
  Dec  8 07:15:00 frazil kernel: device vnet1 left promiscuous mode
  Dec  8 07:15:00 frazil kernel: br1: port 2(vnet1) entering disabled state
  Dec  8 07:15:02 frazil ntpd[2194]: Deleting interface #23 vnet1, fe80::fc54:ff:fe01:6b3b#123, interface stats: received=0, sent=0, dropped=0, active_time=7241352 secs
  Dec  8 07:15:05 frazil kernel: br0: port 2(vnet0) entering disabled state
  Dec  8 07:15:05 frazil kernel: device vnet0 left promiscuous mode
  Dec  8 07:15:05 frazil kernel: br0: port 2(vnet0) entering disabled state
  Dec  8 07:15:07 frazil ntpd[2194]: Deleting interface #25 vnet0, fe80::fc54:ff:fe49:fae6#123, interface stats: received=0, sent=0, dropped=0, active_time=7238050 secs

At this point I lost connection to the guests.  Judging by the SSH sessions 
I had open to them, both had apparently finished the cleanup phase of the 
yum update (the X/Y counter on the right-hand side was complete) but hadn't 
returned a prompt yet, so they were evidently still busy.

I guess the restart of the libvirtd service dropped the guests, although the 
same lines appear in the messages file of the other server, where the guests 
didn't get killed.
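
One way to compare the two hosts is to pull out which guest tap devices 
actually went away according to each messages file.  A minimal sketch, 
assuming the kernel log wording shown above; the function name 
dropped_vnets is made up for illustration:

```shell
# Print each vnetN tap device that the kernel reported as leaving
# promiscuous mode in the given syslog file, one name per line, sorted.
# Assumes the "device vnetN left promiscuous mode" wording seen above.
dropped_vnets() {
    sed -n 's/.*kernel: device \(vnet[0-9][0-9]*\) left promiscuous mode.*/\1/p' "$1" | sort -u
}
```

Running "dropped_vnets /var/log/messages" on each host should list the 
guest interfaces that were torn down; an empty result on the host whose 
guests survived would suggest the libvirtd restart on its own was not 
what killed them.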

Given I was rebooting the host anyway I didn't bother to bring the guests 
back up again and rebooted the host (yeah, I know).  On reboot neither of 
the guests autostarted, so I logged in to the host and tried to start them 
with "virsh start <domain>".  Both complained that

  error: internal error unable to reserve PCI address 0:0:2.0

and didn't start.  Checking the .xml files for both guests I noted that

  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>

was listed for the 'disk' device.  I also noticed that the following lines 
were missing

  <input type='mouse' bus='ps2'/>
  <graphics type='vnc' port='5901' autoport='no'/>
  <video>
    <model type='cirrus' vram='9216' heads='1'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
  </video>

whereas they were present for the guests on the other KVM host, which had 
updated successfully.  I added the lines in, moved the 'disk' to a different 
PCI address and, after restarting libvirtd, tried booting the guests again. 
Still no joy.  Still the same error.  In the end I commented out the 
"address type='pci'" line for 'video' and attempted to boot again.  This 
time I got failures booting the newly installed kernel at the point at which 
the root LVM mount was attempted.  It recommended I look at the "root=" part 
of the boot line, but didn't give me suggestions as to what to put there.
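
For the PCI address clash, a quick check for two devices explicitly 
claiming the same address in a domain XML file can be sketched as follows.  
This assumes the single-quoted attributes and the bus/slot/function order 
that libvirt writes (as in the lines quoted above); the function name 
duplicate_pci_addresses is made up for illustration.  It only catches 
explicit duplicates, not an address that libvirt reserves implicitly.

```shell
# Print every bus:slot.function that appears on more than one
# <address type='pci' .../> line in the given libvirt domain XML file.
# Assumes attributes are single-quoted and ordered bus, slot, function.
duplicate_pci_addresses() {
    sed -n "s/.*<address type='pci'.*bus='\([^']*\)' slot='\([^']*\)' function='\([^']*\)'.*/\1:\2.\3/p" "$1" \
        | sort | uniq -d
}
```

For example, "duplicate_pci_addresses /etc/libvirt/qemu/<domain>.xml" 
prints nothing when every explicit address is unique.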

At this point I tried mounting the guests' disk images to check whether the 
kernel update had failed to complete and left grub.conf in a mess:

  # losetup /dev/loop0 foo.img
  # kpartx -av /dev/loop0
  # mount /dev/mapper/loop0p1 /mnt
  ...
  # umount /mnt
  # kpartx -dv /dev/loop0
  # losetup -d /dev/loop0

Once inside the image I looked at the grub.conf files and couldn't see any 
issues.  I unmounted the image and tried booting into an older kernel, and the 
guests booted successfully.  "yum update" indicated an incomplete 
transaction so I ran "yum-complete-transaction" and then "yum update kernel" 
and rebooted both guests successfully into the new kernel.  All now 
seems well.  Phew.
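
For anyone repeating the grub.conf check, here is a rough sketch that 
prints the title of the entry a legacy GRUB config would boot by default, 
rather than reading the file by eye.  It assumes the usual layout of a 
"default=N" line followed by "title ..." entries counted from 0; the 
function name grub_default_title is made up for illustration.

```shell
# Print the title of the default boot entry in a legacy GRUB config.
# Assumes "default=N" appears before the "title" lines and that entries
# are numbered from 0; a missing "default=" line is treated as 0.
grub_default_title() {
    awk '
        /^default=/ { sub(/^default=/, ""); want = $0 + 0; next }
        /^[ \t]*title[ \t]/ {
            if (seen == want) { sub(/^[ \t]*title[ \t]+/, ""); print; exit }
            seen++
        }
    ' "$1"
}
```

With the image mounted as above, this would be pointed at the grub.conf 
under the mount point (the exact path depends on which partition is 
mounted where).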

My questions are:

1) Is it a bad idea to patch the host's libvirtd while guests are running?
2) Should libvirtd have killed the guests like that?
3) With this update to KVM/qemu/libvirtd, are the "address type='pci'" lines 
now unnecessary and removable from /etc/libvirt/qemu/<domain>.xml files, as 
PCI addresses are now dynamically assigned?

Ben
-- 
Unix Support, MISD, University of Cambridge, England
Plugger of wire, typer of keyboard, imparter of Clue
         Life Is Short.          It's All Good.