[Ovirt-devel] node becomes "unavailable"

Nicolas Ochem nicolas.ochem at alcatel-lucent.com
Tue Aug 17 07:37:07 UTC 2010


You can look at /var/log/ovirt-server/db-omatic.log. The node probably
times out because it no longer answers heartbeats.
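
For example, something like this will follow the log and show heartbeat
messages as they arrive (the exact log wording may differ between
versions, so adjust the pattern if nothing matches):

    # the grep pattern is a guess at the log wording; broaden it as needed
    tail -f /var/log/ovirt-server/db-omatic.log | grep -i heartbeat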

To get more detail, you can run the db-omatic script in no-daemon mode
(/usr/share/ovirt-server/db-omatic/db_omatic.rb -n).
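
If db-omatic is already running as a daemon, stop it first so the two
instances don't fight over the same state. Something like this should
work (the service name is from memory and may differ on your install):

    # service name is an assumption; check /etc/init.d for the actual one
    service ovirt-db-omatic stop
    /usr/share/ovirt-server/db-omatic/db_omatic.rb -n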

I see this very often on Fedora 13, and a bit less on Fedora 12.

This happens because the Ruby AMQP bindings get stuck when they have to
handle too many threads.

There's no fix for this yet, but there is a workaround: whenever it
happens, restart everything on the node and the server with these scripts:

http://ovirt.pastebin.com/JjNpEDak
http://ovirt.pastebin.com/tPAPJBpB

You can put those scripts in a cron job.
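
For example, a crontab entry along these lines would run the restart
script every ten minutes (the path is just illustrative; point it at
wherever you saved the script from the pastebin):

    # runs every 10 minutes; the script path below is illustrative
    */10 * * * * /usr/local/bin/restart-ovirt.sh >/dev/null 2>&1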

On 08/17/2010 05:35 AM, Justin Clacherty wrote:
> After running for a while, the node becomes "unavailable" in the server
> UI. All VMs running on that node also become unavailable. The node is
> still running fine, as are all the VMs; they're just no longer manageable.
>
> I looked on the node and everything appeared to be running fine. Looked
> on the server and ovirt-taskomatic was stopped (this seems to happen
> quite a bit). Restarted it, but that didn't help. Restarting Matahari
> on the node sends information to the server, but the node does not become
> available. The only way I've been able to get it back is to shut down
> all the VMs and reboot the node and the management server. Is anyone else
> seeing this happen? What else can I look at when it happens again?
>
> Cheers,
> Justin.
>



