[Ovirt-devel] storage provisioning problems

Ian Main imain at redhat.com
Tue Jan 27 18:13:57 UTC 2009


On Tue, 27 Jan 2009 09:18:40 -0600
Eric Van Hensbergen <ericvh at gmail.com> wrote:

> On Mon, Jan 26, 2009 at 11:03 PM, Ian Main <imain at redhat.com> wrote:
> > On Mon, 26 Jan 2009 18:16:20 -0600
> > Eric Van Hensbergen <ericvh at gmail.com> wrote:
> >
> >> On Fri, Jan 23, 2009 at 11:06 AM, David Lutterkort <lutter at redhat.com> wrote:
> >> > On Fri, 2009-01-23 at 10:29 -0600, Eric Van Hensbergen wrote:
> >> >> I've been trying to deploy ovirt on a multi-node cluster for the past
> >> >> couple releases and always hit the same sticking point.
> >> >> Everything seems to come up fine, including the other nodes, but when
> >> >> I try to connect to an iSCSI server, the server gets stuck in
> >> >> pending_setup.
> >> >
> >> > You need to have a node in the same hardware pool turned on (i.e.,
> >> > listed as available and enabled) - if that's the case, and the pool
> >> > still stays in pending_setup, check the logs, both on the node and on
> >> > the server (in /var/log/ovirt-server/*)
> >> >
> >>
> >> Okay - I now notice that the hosts that I booted are showing up as
> >> unavailable (enabled).
> >> So, what's the magic here?  Is it the requirement that I manually
> >> install qemu on the appliance and nodes that Jeremy mentioned?
> >> qemu-kvm seems to be present on the nodes, but not on the appliance.
> >
> > Hrrm, that's a bit tricky to debug.  You can check the logs
> > in /var/log/ovirt-server/db-omatic.log and look for clues.
> 
> I see a bunch of sequences along the lines of:
> Wed Jan 21 23:40:44 +0000 2009: Marking object of type node as in service.
> Wed Jan 21 23:40:44 +0000 2009: Marking host node103.priv.ovirt.org as
> state available.
> ....
> Thu Jan 22 16:08:29 +0000 2009: db_omatic started.
> Thu Jan 22 16:08:31 +0000 2009: Marking host node103.priv.ovirt.org unavailable
> 
> Not sure what happened here - perhaps I should go for a complete
> reinstall of the appliance and look at the logs with only a single
> node to get a better idea of what is going on.
> 
> > The other
> > thing to try is to run
> > 'ruby /usr/share/ovirt-server/qmf-libvirt-example.rb' and you should
> > see a list of servers with various attributes printed out.  If you just
> > get ---'s then according to qpid, you don't have any nodes connected.
> 
> I get a bunch of nodes, the properties for the previously mentioned node are...
> ...
> node: node103.priv.ovirt.org
>   property: hostname, node103.priv.ovirt.org
>   property: uri, qemu:///system
>   property: libvirtVersion, 0.5.1
>   property: apiVersion, 0.5.1
>   property: hypervisorVersion, Unknown
>   property: hypervisorType, QEMU
>   property: model, x86_64
>   property: memory, 4060568
>   property: cpus, 4
>   property: mhz, 3000
>   property: nodes, 1
>   property: sockets, 2
>   property: cores, 2
>   property: threads, 1
> 
> > From there you need to get on the node(s) and make sure that
> > libvirt-qpid is running ok.
> >
> 
> Ah, okay, this must be the ticket -- it is not running.
> I restarted it by hand and get:
> 
> Tue Jan 27 15:12:31 +0000 2009: Marking host node103.priv.ovirt.org as
> state available.
> 
> in the db-o-matic log.
> 
> Strangely, when I go to the web interface now, I see every node except
> for node103 marked as available(enabled) and node103 marked as
> unavailable with no corresponding entry in the db-omatic log --
> although there are a ton of:
> 
> Tue Jan 27 15:04:26 +0000 2009: Marking object of type pool as in service.
> (one for about every 5 seconds since the cluster was brought online)
> -- which may be normal I guess, but seems like a lot of log data.
> 
> Also (back to the original problem) - even though now I have multiple
> nodes that are available(enabled), storage is still marked as
> pending_setup.

Bleh.. hehe.  I think the trick might be to restart db-omatic.  There
is actually a bug that causes db-omatic to mark the node as unavailable
if you kill libvirt-qpid: when you start a second instance of
libvirt-qpid, db-omatic still can't reach the previous instance, so
after 30 seconds it times out and marks the host unavailable again.  I
wasn't too worried about it because it only shows up when someone
meddles with the services by hand, but it's showing itself here.

If your nodes are showing up in the qmf example then they should be
marked as available unless db-omatic is screwing up somehow.
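
If you want to double-check what db-omatic has actually logged for that
host, a quick throwaway script (plain Ruby, nothing ovirt-specific) is
handy -- adjust the hostname to taste:

    # Print only the state-change lines db-omatic logged for one host.
    host = ARGV[0] || "node103.priv.ovirt.org"
    File.foreach("/var/log/ovirt-server/db-omatic.log") do |line|
      puts line if line.include?(host) && line.include?("Marking host")
    end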

Would you be willing to build the latest version?  There have been big
changes to taskomatic since the release you're running, and I'm curious
to see how it works for you.. especially the iSCSI part.  This is
something I'm going to be revisiting shortly to make sure it works
really well, so your help in debugging this is greatly appreciated.
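
For reference, what taskomatic is ultimately trying to get onto the
node for your iSCSI server is a libvirt storage pool.  The pool XML it
hands to libvirt looks roughly like the sketch below -- the host
address and IQN are just placeholders, and the exact XML taskomatic
generates may differ a bit:

    # Placeholders -- substitute your real iSCSI portal and target IQN.
    iscsi_host   = "192.168.50.2"
    iscsi_target = "iqn.2009-01.com.example:storage.disk1"

    pool_xml = <<-XML
      <pool type="iscsi">
        <name>#{iscsi_target}</name>
        <source>
          <host name="#{iscsi_host}"/>
          <device path="#{iscsi_target}"/>
        </source>
        <target>
          <path>/dev/disk/by-path</path>
        </target>
      </pool>
    XML

    puts pool_xml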

Thanks for your time on this!

	Ian



