[Ovirt-devel] Qmf::Query Hang in db-omatic

Pierre-Gilles Mialon pmialon at linagora.com
Wed Dec 2 10:27:38 UTC 2009


	Hi all,

	We are using the next branch of oVirt for preproduction. We have noticed that
many bugs appear as the number of messages in qpidd increases. qpidd itself
seems to be doing its job, and most of the issues appear to come from Qmf::Query.

	For example, in db-omatic, lines 265-296:
    When you restart db-omatic with multiple nodes, multiple threads are
launched (line 266), and they hang on:
qmf_host = @qmfc.objects(Qmf::Query.new(:class => "node"), 'hostname' => host_info['hostname'])

The call never returns, even though qpidd keeps answering the requests made
by ruby-qmf correctly.
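
To make a hang like this visible in the logs instead of silently losing the thread, the call could be wrapped in a timeout. This is only a minimal sketch: query_with_timeout and the two lambdas are hypothetical stand-ins, not part of db-omatic or ruby-qmf, and Timeout may not be able to interrupt a call that is blocked inside a native extension, so it would need to be tried against the real binding.

```ruby
require 'timeout'

# Hypothetical helper (not in db-omatic): run a query block and give up
# after `seconds`. Returns [:ok, result] or [:timeout, nil].
def query_with_timeout(seconds)
  result = Timeout.timeout(seconds) { yield }
  [:ok, result]
rescue Timeout::Error
  [:timeout, nil]
end

# Stand-ins for the real QMF call; they only simulate its behaviour.
fast_query = lambda { ['node-object'] }
hung_query = lambda { sleep } # never returns on its own

fast_status = query_with_timeout(1) { fast_query.call }
hung_status = query_with_timeout(0.2) { hung_query.call }

puts fast_status.inspect
puts hung_status.inspect
```

In db-omatic the block would contain the @qmfc.objects(...) call, and the :timeout case could be logged with the hostname so the stuck queries can be counted.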

Our workaround is to:
 - stop libvirt-qpid on every node,
 - restart db-omatic,
 - start libvirt-qpid on each node, one at a time.

Doing it this way works and gives us a consistent database for db-omatic.
What do you think about replacing the Thread.new on line 266 with a begin? The
concurrency of the requests that db-omatic makes to qpidd seems to be the cause
of the hang.
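
One thing to keep in mind is that a plain begin would make the main db-omatic loop itself sleep for 20 seconds per host. A possible middle ground, sketched below, is a single worker thread fed by a queue: the main loop only enqueues the hostname, and the checks run one after another so that only one query is in flight at a time. FakeConsole and the other names here are hypothetical stand-ins used to show the pattern; this is not the actual db-omatic or ruby-qmf code.

```ruby
require 'thread' # needed for Queue on old Ruby releases, harmless on new ones

# Hypothetical stand-in for the QMF console. It records how many calls
# overlap, so the effect of serializing the queries can be observed.
class FakeConsole
  attr_reader :max_concurrent

  def initialize
    @lock = Mutex.new
    @active = 0
    @max_concurrent = 0
  end

  def objects(hostname)
    @lock.synchronize do
      @active += 1
      @max_concurrent = [@max_concurrent, @active].max
    end
    sleep(0.05) # simulate a round trip to the broker
    @lock.synchronize { @active -= 1 }
    [hostname]
  end
end

console = FakeConsole.new
jobs = Queue.new
checked = []

# A single worker drains the queue, so at most one query runs at a time.
worker = Thread.new do
  while (hostname = jobs.pop) != :done
    # The real code would sleep(20) here before re-checking the host.
    checked << hostname unless console.objects(hostname).empty?
  end
end

# The main loop only enqueues the hostname and carries on immediately.
%w[node1 node2 node3].each { |h| jobs << h }
jobs << :done
worker.join

puts "checked=#{checked.inspect} max_concurrent=#{console.max_concurrent}"
```

The trade-off is that the 20 second waits add up: with many hosts coming back at once, the last one is checked much later than it is today.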

<code snippet of db-omatic lines 265-296>
 if state == Host::STATE_AVAILABLE
    Thread.new do
        @logger.info "#{host_info['hostname']} has moved to available, sleeping for updates to vms."
        sleep(20)

        # At this point we want to set all domains that are
        # unreachable to stopped.  We're using a thread here to
        # sleep for 10 seconds outside of the main dbomatic loop.
        # If after 10 seconds with this host up there are still
        # domains set to 'unreachable', then we're going to guess
        # the node rebooted and so the domains should be set to
        # stopped.
        @logger.info "Checking for dead VMs on newly available host #{host_info['hostname']}."

        # Double check to make sure this host is still up.
        begin
            qmf_host = @qmfc.objects(Qmf::Query.new(:class => "node"), 'hostname' => host_info['hostname'])
            if !qmf_host
                @logger.info "Host #{host_info['hostname']} is not up after waiting 20 seconds, skipping dead VM check."
            else
                db_vm = Vm.find(:all, :conditions => ["host_id = ? AND state = ?", db_host.id, Vm::STATE_UNREACHABLE])
                db_vm.each do |vm|
                    @logger.info "Moving vm #{vm.description} in state #{vm.state} to state stopped."
                    set_vm_stopped(vm)
                    vm.save!
                end
            end
        rescue Exception => e # just log any errors here
            @logger.info "Exception checking for dead VMs (could be normal): #{e.message}"
            @logger.info e.backtrace
        end
    end
end
</code>



-- 
Pierre-Gilles Mialon
Responsable hébergement :: Head of Hosting services
pmialon at linagora.com :: +33.1 58 18 65 46
Linagora :: http://www.linagora.com
27 rue de Berri :: 75008 PARIS