[Linux-cluster] iscsi qdisk failure cause reboot

brem belguebli brem.belguebli at gmail.com
Mon Apr 12 14:48:19 UTC 2010


Hi Jose,

check the logs of the other nodes (the ones that stayed up) to see
whether they report that the node was killed "because it has rejoined
the cluster with existing state".

Also, you could add a max_error_cycles="your value" attribute to your
<quorumd .../> element to make qdisk exit after "your value" missed
cycles.
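For example, the attribute goes on the quorumd element in
cluster.conf; the interval/tko/label values below are only
placeholders from a hypothetical setup, not something you should copy
as-is:

    <quorumd interval="2" tko="10" votes="1" label="my_qdisk"
             allow_kill="0" reboot="1" max_error_cycles="20">
        <!-- your heuristics go here -->
    </quorumd>

With max_error_cycles set, qdiskd gives up and exits after that many
consecutive failed I/O cycles instead of hanging around with stale
state.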

I posted a message a while ago claiming this 'max_error_cycles'
feature didn't work, but I was wrong.... Thx Lon

If your quorum device is multipathed, make sure you don't queue I/O
(no_path_retry queue), since queuing never returns an I/O error to the
upper layer (qdisk). Also make sure the retry time isn't longer than
your qdisk interval (in my setup I use no_path_retry fail, which means
an immediate I/O error).
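In multipath.conf that looks something like the fragment below; this
is just a sketch of the relevant setting, the device section and other
attributes depend entirely on your storage:

    defaults {
            # "fail" = return an I/O error to qdisk as soon as all
            # paths are down, instead of queuing I/O forever ("queue")
            no_path_retry   fail
    }

A numeric value (e.g. no_path_retry 5) retries that many path-checker
intervals before failing; just keep that total below your qdisk
interval so qdisk actually sees the error in time.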

Brem


 2010/4/12 jose nuno neto <jose.neto at liber4e.com>:
> Hi2All
>
> I have the following setup:
> .2node + qdisk ( iscsi w/ 2network paths and multipath )
>
> on qdisk I have allow_kill=0 and reboot=1 since I have some heuristics
> and want to force some switching on network events
>
> the issue I'm facing now is that when iscsi fails on one node (network
> down, for example) there is no impact on the cluster (which is OK for
> me), but at recovery the node gets rebooted (not fenced by the other
> node)
>
> If, when iscsi goes down, I stop qdisk, let iscsi recover, and then
> start qdisk, I get no reboot
>
> Is this proper qdisk behavior? Does it keep track of errors and force a
> reboot?
>
> Thanks
> Jose
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>




More information about the Linux-cluster mailing list