[Linux-cluster] rhel 6.2 network bonding interface in cluster environment

SATHYA - IT sathyanarayanan.varadharajan at precisionit.co.in
Mon Jan 9 06:18:22 UTC 2012


I am not sure whether you received the logs and the cluster.conf file, so I
am pasting them below.

On File Server 1:

Jan  8 03:15:04 filesrv1 kernel: imklog 4.6.2, log source = /proc/kmsg
started.
Jan  8 03:15:04 filesrv1 rsyslogd: [origin software="rsyslogd"
swVersion="4.6.2" x-pid="8765" x-info="http://www.rsyslog.com"] (re)start
Jan  8 10:52:42 filesrv1 kernel: imklog 4.6.2, log source = /proc/kmsg
started.
Jan  8 10:52:42 filesrv1 rsyslogd: [origin software="rsyslogd"
swVersion="4.6.2" x-pid="8751" x-info="http://www.rsyslog.com"] (re)start
Jan  8 10:52:42 filesrv1 kernel: Initializing cgroup subsys cpuset
Jan  8 10:52:42 filesrv1 kernel: Initializing cgroup subsys cpu
Jan  8 10:52:42 filesrv1 kernel: Linux version 2.6.32-220.el6.x86_64
(mockbuild at x86-004.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red
Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Jan  8 10:52:42 filesrv1 kernel: Command line: ro
root=/dev/mapper/vg01-LogVol01 rd_LVM_LV=vg01/LogVol01
rd_LVM_LV=vg01/LogVol00 rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8
SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=128M rhgb
quiet acpi=off
Jan  8 10:52:42 filesrv1 kernel: KERNEL supported cpus:
Jan  8 10:52:42 filesrv1 kernel:  Intel GenuineIntel
Jan  8 10:52:42 filesrv1 kernel:  AMD AuthenticAMD
Jan  8 10:52:42 filesrv1 kernel:  Centaur CentaurHauls
Jan  8 10:52:42 filesrv1 kernel: BIOS-provided physical RAM map:
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 0000000000000000 -
000000000009f400 (usable)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 000000000009f400 -
00000000000a0000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000000f0000 -
0000000000100000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 0000000000100000 -
00000000d762f000 (usable)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000d762f000 -
00000000d763c000 (ACPI data)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000d763c000 -
00000000d763d000 (usable)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000d763d000 -
00000000dc000000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000fec00000 -
00000000fee10000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000ff800000 -
0000000100000000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 0000000100000000 -
00000008a7fff000 (usable)
Jan  8 10:52:42 filesrv1 kernel: DMI 2.7 present.
Jan  8 10:52:42 filesrv1 kernel: SMBIOS version 2.7 @ 0xF4F40
Jan  8 10:52:42 filesrv1 kernel: last_pfn = 0x8a7fff max_arch_pfn =
0x400000000
Jan  8 10:52:42 filesrv1 kernel: x86 PAT enabled: cpu 0, old
0x7040600070406, new 0x7010600070106
Jan  8 10:52:42 filesrv1 kernel: last_pfn = 0xd763d max_arch_pfn =
0x400000000
.
.

On File Server 2:

Jan  8 03:09:06 filesrv2 rsyslogd: [origin software="rsyslogd"
swVersion="4.6.2" x-pid="8648" x-info="http://www.rsyslog.com"] (re)start
Jan  8 10:48:07 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Down
Jan  8 10:48:07 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Down
Jan  8 10:48:07 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth3, disabling it
Jan  8 10:48:07 filesrv2 kernel: bonding: bond1: now running without any
active interface !
Jan  8 10:48:07 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth4, disabling it
Jan  8 10:48:09 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:09 filesrv2 kernel: bond1: link status definitely up for
interface eth4, 1000 Mbps full duplex.
Jan  8 10:48:09 filesrv2 kernel: bonding: bond1: making interface eth4 the
new active one.
Jan  8 10:48:09 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:48:09 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:09 filesrv2 kernel: bond1: link status definitely up for
interface eth3, 1000 Mbps full duplex.
Jan  8 10:48:15 filesrv2 corosync[8933]:   [TOTEM ] A processor failed,
forming new configuration.
Jan  8 10:48:17 filesrv2 corosync[8933]:   [QUORUM] Members[1]: 2
Jan  8 10:48:17 filesrv2 corosync[8933]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jan  8 10:48:17 filesrv2 rgmanager[12557]: State change: clustsrv1 DOWN
Jan  8 10:48:17 filesrv2 corosync[8933]:   [CPG   ] chosen downlist: sender
r(0) ip(10.0.0.20) ; members(old:2 left:1)
Jan  8 10:48:17 filesrv2 corosync[8933]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Jan  8 10:48:17 filesrv2 kernel: dlm: closing connection to node 1
Jan  8 10:48:17 filesrv2 fenced[8989]: fencing node clustsrv1
Jan  8 10:48:17 filesrv2 kernel: GFS2: fsid=samba:ctdb.0: jid=1: Trying to
acquire journal lock...
Jan  8 10:48:17 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Trying to
acquire journal lock...
Jan  8 10:48:24 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Down
Jan  8 10:48:24 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Down
Jan  8 10:48:24 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth3, disabling it
Jan  8 10:48:24 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth4, disabling it
Jan  8 10:48:24 filesrv2 kernel: bonding: bond1: now running without any
active interface !
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Up, 100 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:25 filesrv2 kernel: bond1: link status definitely up for
interface eth4, 100 Mbps full duplex.
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: making interface eth4 the
new active one.
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Up, 100 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:25 filesrv2 kernel: bond1: link status definitely up for
interface eth3, 100 Mbps full duplex.
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Down
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Down
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth3, disabling it
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth4, disabling it
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: now running without any
active interface !
Jan  8 10:48:27 filesrv2 fenced[8989]: fence clustsrv1 success
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:ctdb.0: jid=1: Looking at
journal...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:ctdb.0: jid=1: Done
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen02.0: jid=1: Trying to
acquire journal lock...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Trying
to acquire journal lock...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen02.0: jid=1: Looking at
journal...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen02.0: jid=1: Done
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Looking at
journal...
Jan  8 10:48:28 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:28 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Acquiring
the transaction lock...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Replaying
journal...
Jan  8 10:48:28 filesrv2 kernel: bond1: link status definitely up for
interface eth3, 1000 Mbps full duplex.
Jan  8 10:48:28 filesrv2 kernel: bonding: bond1: making interface eth3 the
new active one.
Jan  8 10:48:28 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:48:28 filesrv2 kernel: bond1: link status definitely up for
interface eth4, 1000 Mbps full duplex.
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Replayed
29140 of 29474 blocks
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Found 334
revoke tags
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Journal
replayed in 2s
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Done
Jan  8 10:49:01 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Down
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth3, disabling it
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: making interface eth4 the
new active one.
Jan  8 10:49:01 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Down
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth4, disabling it
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: now running without any
active interface !
Jan  8 10:49:03 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:49:03 filesrv2 kernel: bond1: link status definitely up for
interface eth3, 1000 Mbps full duplex.
Jan  8 10:49:03 filesrv2 kernel: bonding: bond1: making interface eth3 the
new active one.
Jan  8 10:49:03 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:49:04 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:49:04 filesrv2 kernel: bond1: link status definitely up for
interface eth4, 1000 Mbps full duplex.
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Looking
at journal...
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1:
Acquiring the transaction lock...
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1:
Replaying journal...
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1:
Replayed 0 of 0 blocks
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Found 0
revoke tags
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Journal
replayed in 0s
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Done
Jan  8 10:52:37 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Down
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth3, disabling it
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: making interface eth4 the
new active one.
Jan  8 10:52:38 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Down
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: link status definitely down
for interface eth4, disabling it
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: now running without any
active interface !
Jan  8 10:52:40 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:52:40 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is
Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:52:40 filesrv2 kernel: bond1: link status definitely up for
interface eth3, 1000 Mbps full duplex.
Jan  8 10:52:40 filesrv2 kernel: bonding: bond1: making interface eth3 the
new active one.
Jan  8 10:52:40 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:52:40 filesrv2 kernel: bond1: link status definitely up for
interface eth4, 1000 Mbps full duplex.
Jan  8 10:52:44 filesrv2 corosync[8933]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Jan  8 10:52:44 filesrv2 corosync[8933]:   [QUORUM] Members[2]: 1 2
Jan  8 10:52:44 filesrv2 corosync[8933]:   [QUORUM] Members[2]: 1 2
Jan  8 10:52:44 filesrv2 corosync[8933]:   [CPG   ] chosen downlist: sender
r(0) ip(10.0.0.10) ; members(old:1 left:0)
Jan  8 10:52:44 filesrv2 corosync[8933]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Jan  8 10:52:51 filesrv2 kernel: dlm: got connection from 1
Jan  8 10:55:57 filesrv2 kernel: INFO: task gfs2_quotad:9389 blocked for
more than 120 seconds.
Jan  8 10:55:57 filesrv2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 10:55:57 filesrv2 kernel: gfs2_quotad   D ffff8808a7824900     0
9389      2 0x00000080
Jan  8 10:55:57 filesrv2 kernel: ffff88087580da88 0000000000000046
0000000000000000 00000000000001c3
Jan  8 10:55:57 filesrv2 kernel: ffff88087580da18 ffff88087580da50
ffffffff810ea694 ffff88088b184080
Jan  8 10:55:57 filesrv2 kernel: ffff88088e71a5f8 ffff88087580dfd8
000000000000f4e8 ffff88088e71a5f8
Jan  8 10:55:57 filesrv2 kernel: Call Trace:
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff810ea694>] ?
rb_reserve_next_event+0xb4/0x370
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81013563>] ?
native_sched_clock+0x13/0x60
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814eefb5>]
rwsem_down_failed_common+0x95/0x1d0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81013563>] ?
native_sched_clock+0x13/0x60
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814ef146>]
rwsem_down_read_failed+0x26/0x30
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81276e04>]
call_rwsem_down_read_failed+0x14/0x30
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814ee644>] ? down_read+0x24/0x30
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa04fe4b2>] dlm_lock+0x62/0x1e0
[dlm]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff810eab02>] ?
ring_buffer_lock_reserve+0xa2/0x160
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0550d62>] gdlm_lock+0xf2/0x130
[gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0550e60>] ? gdlm_ast+0x0/0xe0
[gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0550da0>] ? gdlm_bast+0x0/0x50
[gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa053430f>] do_xmote+0x17f/0x260
[gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa05344e1>] run_queue+0xf1/0x1d0
[gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0534807>]
gfs2_glock_nq+0x1b7/0x360 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff8107cb7b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa054db88>]
gfs2_statfs_sync+0x58/0x1b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814ed84a>] ?
schedule_timeout+0x19a/0x2e0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa054db80>] ?
gfs2_statfs_sync+0x50/0x1b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0545bb7>]
quotad_check_timeo+0x57/0xb0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0545e44>]
gfs2_quotad+0x234/0x2b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0545c10>] ?
gfs2_quotad+0x0/0x2b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan  8 10:57:57 filesrv2 kernel: INFO: task gfs2_quotad:9389 blocked for
more than 120 seconds.
Jan  8 10:57:57 filesrv2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 10:57:57 filesrv2 kernel: gfs2_quotad   D ffff8808a7824900     0
9389      2 0x00000080
Jan  8 10:57:57 filesrv2 kernel: ffff88087580da88 0000000000000046
0000000000000000 00000000000001c3
Jan  8 10:57:57 filesrv2 kernel: ffff88087580da18 ffff88087580da50
ffffffff810ea694 ffff88088b184080
Jan  8 10:57:57 filesrv2 kernel: ffff88088e71a5f8 ffff88087580dfd8
000000000000f4e8 ffff88088e71a5f8
Jan  8 10:57:57 filesrv2 kernel: Call Trace:
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff810ea694>] ?
rb_reserve_next_event+0xb4/0x370
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81013563>] ?
native_sched_clock+0x13/0x60
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814eefb5>]
rwsem_down_failed_common+0x95/0x1d0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81013563>] ?
native_sched_clock+0x13/0x60
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814ef146>]
rwsem_down_read_failed+0x26/0x30
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81276e04>]
call_rwsem_down_read_failed+0x14/0x30
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814ee644>] ? down_read+0x24/0x30
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa04fe4b2>] dlm_lock+0x62/0x1e0
[dlm]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff810eab02>] ?
ring_buffer_lock_reserve+0xa2/0x160
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0550d62>] gdlm_lock+0xf2/0x130
[gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0550e60>] ? gdlm_ast+0x0/0xe0
[gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0550da0>] ? gdlm_bast+0x0/0x50
[gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa053430f>] do_xmote+0x17f/0x260
[gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa05344e1>] run_queue+0xf1/0x1d0
[gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0534807>]
gfs2_glock_nq+0x1b7/0x360 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff8107cb7b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa054db88>]
gfs2_statfs_sync+0x58/0x1b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814ed84a>] ?
schedule_timeout+0x19a/0x2e0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa054db80>] ?
gfs2_statfs_sync+0x50/0x1b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0545bb7>]
quotad_check_timeo+0x57/0xb0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0545e44>]
gfs2_quotad+0x234/0x2b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0545c10>] ?
gfs2_quotad+0x0/0x2b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan  8 10:59:22 filesrv2 rgmanager[12557]: State change: clustsrv1 UP

cluster.conf file:

<?xml version="1.0"?>
<cluster config_version="8" name="samba">
	<fence_daemon clean_start="0" post_fail_delay="0"
post_join_delay="3"/>
	<clusternodes>
		<clusternode name="clustsrv1" nodeid="1" votes="1">
			<fence>
				<method name="fenceilo1">
					<device name="ilosrv1"/>
				</method>
			</fence>
			<unfence>
				<device action="on" name="ilosrv2"/>
			</unfence>
		</clusternode>
		<clusternode name="clustsrv2" nodeid="2" votes="1">
			<fence>
				<method name="fenceilo2">
					<device name="ilosrv2"/>
				</method>
			</fence>
			<unfence>
				<device action="on" name="ilosrv2"/>
			</unfence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices>
		<fencedevice agent="fence_ipmilan" ipaddr="192.168.129.157"
lanplus="1" login="fence" name="ilosrv1" passwd="xxxxxxx"/>
		<fencedevice agent="fence_ipmilan" ipaddr="192.168.129.158"
lanplus="1" login="fence" name="ilosrv2" passwd="xxxxxxx"/>
	</fencedevices>
	<rm>
		<failoverdomains/>
		<resources/>
	</rm>
</cluster>
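
To make it easier to cross-check which fence device each node refers to, here
is a minimal sketch that lists the device names per node. It assumes the file
has the same layout as the one pasted above; the function name
fence_device_refs is only illustrative and is not part of any cluster tool.
In the file as pasted, the fence method for clustsrv1 names ilosrv1, while
its unfence section names ilosrv2.

```python
import xml.etree.ElementTree as ET


def fence_device_refs(conf_xml):
    """Return (defined device names, per-node fence/unfence references).

    conf_xml is the text of a cluster.conf laid out like the one above.
    """
    root = ET.fromstring(conf_xml)
    # Names declared under <fencedevices>.
    defined = {dev.get("name") for dev in root.iter("fencedevice")}
    refs = {}
    for node in root.iter("clusternode"):
        refs[node.get("name")] = {
            # Devices referenced from <fence><method><device .../>.
            "fence": [d.get("name") for d in node.findall("./fence/method/device")],
            # Devices referenced from <unfence><device .../>.
            "unfence": [d.get("name") for d in node.findall("./unfence/device")],
        }
    return defined, refs
```

Calling it with the contents of /etc/cluster/cluster.conf and printing the
second return value shows, for every node, which names appear under fence and
under unfence, so they can be compared against the declared devices.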


Thanks

Sathya Narayanan V
Solution Architect	


-----Original Message-----
From: SATHYA - IT [mailto:sathyanarayanan.varadharajan at precisionit.co.in] 
Sent: Monday, January 09, 2012 11:21 AM
To: 'Digimer'; 'linux clustering'
Subject: Re: [Linux-cluster] rhel 6.2 network bonding interface in cluster
environment

Hi,

I am attaching /var/log/messages from both servers. Yesterday (8 Jan), one
of the servers was fenced by the other at around 10:48 AM. I am also
attaching the cluster.conf file for your reference.

On a related note, by heartbeat I mean the channel used by corosync. The
node names configured in the cluster.conf file resolve to addresses on bond1
only.
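
One way to double-check the name resolution is sketched below. It is only an
illustration: the function name nodes_on_network is made up, and the
10.0.0.0/24 prefix in the usage note is an assumption (the corosync lines show
only the addresses 10.0.0.10 and 10.0.0.20, not the netmask).

```python
import ipaddress
import socket


def nodes_on_network(names, network, resolve=socket.gethostbyname):
    """Map each node name to (resolved address, is it inside `network`?).

    `resolve` defaults to the system resolver; it can be replaced with
    any callable that takes a name and returns an IPv4 address string.
    """
    net = ipaddress.ip_network(network)
    result = {}
    for name in names:
        addr = resolve(name)
        result[name] = (addr, ipaddress.ip_address(addr) in net)
    return result
```

Running nodes_on_network(["clustsrv1", "clustsrv2"], "10.0.0.0/24") on each
node would show whether both cluster names resolve into the bond1 subnet,
under the assumed prefix.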

Regarding the network cards, we are using two dual-port cards, with one port
from each card in bond0 and the other port from each card in bond1. So it
does not seem to be a network-card-related issue. Moreover, we do not see
any errors related to bond0.
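
To back up the statement about bond0, the link events can be counted per bond
and interface. The sketch below is a rough illustration, not a finished tool:
it assumes the unwrapped lines of /var/log/messages (the mail client wrapped
the lines pasted here), and the name count_link_events is hypothetical.

```python
import re
from collections import Counter

# Matches the bonding driver messages that appear in the logs, for example
# "bonding: bond1: link status definitely down for interface eth3, disabling it"
# and "bond1: link status definitely up for interface eth4, 1000 Mbps full duplex."
LINK_RE = re.compile(
    r"(?P<bond>bond\d+): link status definitely (?P<state>up|down) "
    r"for interface (?P<iface>\w+)"
)


def count_link_events(lines):
    """Return a Counter keyed by (bond, interface, state)."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[(match["bond"], match["iface"], match["state"])] += 1
    return counts
```

Passing an open /var/log/messages file object to count_link_events gives the
number of up and down transitions for each slave; if no key starts with
"bond0", the log contains no link flaps for that bond.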

Thanks

Sathya Narayanan V
Solution Architect	

