[Linux-cluster] 2-node cluster + qdiskd: node whose link to the quorum disk breaks gets rebooted

jose nuno neto jose.neto at liber4e.com
Thu Feb 18 15:12:33 UTC 2010


Hi

I have a 2-node + qdisk (on iSCSI) cluster configuration.
It seems to be OK, except for one scenario: one node's link to the
quorum disk breaks (iSCSI break).
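
For reference, a minimal sketch of the kind of cluster.conf this
describes; the node names, qdisk label, and timing values below are
placeholders, not the real settings:

<?xml version="1.0"?>
<cluster name="mycluster" config_version="1">
  <!-- assumed: 2 node votes + 1 qdisk vote = 3 expected votes -->
  <cman expected_votes="3"/>
  <clusternodes>
    <clusternode name="node1.example.org" nodeid="1" votes="1"/>
    <clusternode name="node2.example.org" nodeid="2" votes="1"/>
  </clusternodes>
  <!-- qdiskd writes to the iSCSI device every "interval" seconds;
       a node is declared dead after "tko" missed cycles -->
  <quorumd interval="1" tko="10" votes="1" label="myqdisk"/>
</cluster>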

On the iSCSI link break, the node continues to work OK.
On iSCSI recovery, the node takes its eth interfaces down, loses
heartbeats, and gets fenced.
Is this OK?

Thanks
Jose


Logs
###########################
iSCSI FAIL
Feb 18 11:44:49 node2 kernel:  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4297680222, last ping 4297685222, now 4297690222
Feb 18 11:44:49 node2 kernel:  connection2:0: detected conn error (1011)
Feb 18 11:44:50 node2 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Feb 18 11:44:50 node2 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Feb 18 11:44:52 node2 iscsid: connect to 172.26.244.4:3260 failed (Connection refused)
Feb 18 11:44:52 node2 iscsid: connect to 172.26.244.4:3260 failed (Connection refused)
Feb 18 11:44:56 node2 iscsid: connect to 172.26.244.4:3260 failed (Connection refused)
Feb 18 11:44:56 node2 openais[6815]: [CMAN ] lost contact with quorum device
Feb 18 11:44:59 node2 iscsid: connect to 172.26.244.4:3260 failed (Connection refused)


iSCSI RECOVER

Feb 18 11:52:47 node2 kernel: Synchronizing SCSI cache for disk sde:
Feb 18 11:52:47 node2 kernel: Synchronizing SCSI cache for disk sdd:
Feb 18 11:52:49 node2 kernel: FAILED
Feb 18 11:52:49 node2 kernel:   status = 1, message = 00, host = 0, driver = 08
Feb 18 11:52:49 node2 kernel:   <6>sd: Current: sense key: Illegal Request
Feb 18 11:52:49 node2 kernel:     <<vendor>> ASC=0x94 ASCQ=0x1
------------------------------

Feb 18 11:52:49 node2 kernel: bonding: bond0: backup interface eth5 is now down
Feb 18 11:52:49 node2 kernel: ACPI: PCI interrupt for device 0000:1f:00.1 disabled
Feb 18 11:52:49 node2 kernel: bonding: bond0: link status down for active interface eth0, disabling it
Feb 18 11:52:49 node2 kernel: bonding: bond0: now running without any active interface !
Feb 18 11:52:49 node2 kernel: ACPI: PCI interrupt for device 0000:1f:00.0 disabled
Feb 18 11:52:49 node2 kernel: bonding: bond2: link status down for interface eth1, disabling it in 2000 ms.
Feb 18 11:52:49 node2 kernel: ACPI: PCI interrupt for device 0000:10:00.1 disabled
Feb 18 11:52:49 node2 kernel: ACPI: PCI interrupt for device 0000:10:00.0 disabled
Feb 18 11:52:49 node2 kernel: bonding: bond2: link status down for interface eth3, disabling it in 2000 ms.
Feb 18 11:52:50 node2 kernel: bonding: bond1: backup interface eth4 is now down
Feb 18 11:52:50 node2 kernel: ACPI: PCI interrupt for device 0000:0a:00.1 disabled
Feb 18 11:52:50 node2 kernel: bonding: bond1: link status down for active interface eth2, disabling it
Feb 18 11:52:50 node2 kernel: bonding: bond1: now running without any active interface !
Feb 18 11:52:50 node2 kernel: ACPI: PCI interrupt for device 0000:0a:00.0 disabled


CleanNode Messages

Feb 18 11:53:02 node1 kernel: dlm: closing connection to node 2
--------
Feb 18 11:54:42 node1 fenced[7310]: node2.lux.eib.org not a cluster member after 100 sec post_fail_delay
Feb 18 11:54:42 node1 fenced[7310]: fencing node "node2.lux.eib.org"
Feb 18 11:54:49 node1 fenced[7310]: fence "node2.lux.eib.org" success
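
The "100 sec post_fail_delay" above is fenced's post_fail_delay
setting; a minimal sketch of where it lives in cluster.conf, with the
value taken from the log:

  <!-- fenced waits this many seconds after a node fails before
       fencing it -->
  <fence_daemon post_fail_delay="100"/>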


