[Linux-cluster] rhel 6.2 network bonding interface in cluster environment

SATHYA - IT sathyanarayanan.varadharajan at precisionit.co.in
Mon Jan 9 10:43:15 UTC 2012


Klaus,

To your point: the corosync network is not connected to a switch. The two
servers are cabled directly to each other (server to server).
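
For reference, the quoted logs below show bond1 built from eth3 and eth4,
with 10.0.0.x addresses on that link. A minimal sketch of the kind of ifcfg
files behind such a back-to-back bond (the bonding mode and miimon values
here are illustrative assumptions, not our exact settings):

# /etc/sysconfig/network-scripts/ifcfg-bond1 -- sketch; mode/miimon assumed
DEVICE=bond1
IPADDR=10.0.0.10
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
BONDING_OPTS="mode=1 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth3 -- one of the two slave ports
DEVICE=eth3
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
USERCTL=no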

Thanks

Sathya Narayanan V
Solution Architect	

-----Original Message-----
From: SATHYA - IT [mailto:sathyanarayanan.varadharajan at precisionit.co.in] 
Sent: Monday, January 09, 2012 11:48 AM
To: 'Digimer'; 'linux clustering'
Subject: RE: [Linux-cluster] rhel 6.2 network bonding interface in cluster
environment

I am not sure whether you received the logs and the cluster.conf file, so I
am pasting them below...

On File Server 1:

Jan  8 03:15:04 filesrv1 kernel: imklog 4.6.2, log source = /proc/kmsg started.
Jan  8 03:15:04 filesrv1 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="8765" x-info="http://www.rsyslog.com"] (re)start
Jan  8 10:52:42 filesrv1 kernel: imklog 4.6.2, log source = /proc/kmsg started.
Jan  8 10:52:42 filesrv1 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="8751" x-info="http://www.rsyslog.com"] (re)start
Jan  8 10:52:42 filesrv1 kernel: Initializing cgroup subsys cpuset
Jan  8 10:52:42 filesrv1 kernel: Initializing cgroup subsys cpu
Jan  8 10:52:42 filesrv1 kernel: Linux version 2.6.32-220.el6.x86_64 (mockbuild at x86-004.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed Nov 9 08:03:13 EST 2011
Jan  8 10:52:42 filesrv1 kernel: Command line: ro root=/dev/mapper/vg01-LogVol01 rd_LVM_LV=vg01/LogVol01 rd_LVM_LV=vg01/LogVol00 rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=128M rhgb quiet acpi=off
Jan  8 10:52:42 filesrv1 kernel: KERNEL supported cpus:
Jan  8 10:52:42 filesrv1 kernel:  Intel GenuineIntel
Jan  8 10:52:42 filesrv1 kernel:  AMD AuthenticAMD
Jan  8 10:52:42 filesrv1 kernel:  Centaur CentaurHauls
Jan  8 10:52:42 filesrv1 kernel: BIOS-provided physical RAM map:
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 0000000000100000 - 00000000d762f000 (usable)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000d762f000 - 00000000d763c000 (ACPI data)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000d763c000 - 00000000d763d000 (usable)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000d763d000 - 00000000dc000000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000fec00000 - 00000000fee10000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
Jan  8 10:52:42 filesrv1 kernel: BIOS-e820: 0000000100000000 - 00000008a7fff000 (usable)
Jan  8 10:52:42 filesrv1 kernel: DMI 2.7 present.
Jan  8 10:52:42 filesrv1 kernel: SMBIOS version 2.7 @ 0xF4F40
Jan  8 10:52:42 filesrv1 kernel: last_pfn = 0x8a7fff max_arch_pfn = 0x400000000
Jan  8 10:52:42 filesrv1 kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Jan  8 10:52:42 filesrv1 kernel: last_pfn = 0xd763d max_arch_pfn = 0x400000000
.
.

On File Server 2:

Jan  8 03:09:06 filesrv2 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="8648" x-info="http://www.rsyslog.com"] (re)start
Jan  8 10:48:07 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Down
Jan  8 10:48:07 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Down
Jan  8 10:48:07 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Jan  8 10:48:07 filesrv2 kernel: bonding: bond1: now running without any active interface !
Jan  8 10:48:07 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Jan  8 10:48:09 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:09 filesrv2 kernel: bond1: link status definitely up for interface eth4, 1000 Mbps full duplex.
Jan  8 10:48:09 filesrv2 kernel: bonding: bond1: making interface eth4 the new active one.
Jan  8 10:48:09 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:48:09 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:09 filesrv2 kernel: bond1: link status definitely up for interface eth3, 1000 Mbps full duplex.
Jan  8 10:48:15 filesrv2 corosync[8933]:   [TOTEM ] A processor failed, forming new configuration.
Jan  8 10:48:17 filesrv2 corosync[8933]:   [QUORUM] Members[1]: 2
Jan  8 10:48:17 filesrv2 corosync[8933]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan  8 10:48:17 filesrv2 rgmanager[12557]: State change: clustsrv1 DOWN
Jan  8 10:48:17 filesrv2 corosync[8933]:   [CPG   ] chosen downlist: sender r(0) ip(10.0.0.20) ; members(old:2 left:1)
Jan  8 10:48:17 filesrv2 corosync[8933]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jan  8 10:48:17 filesrv2 kernel: dlm: closing connection to node 1
Jan  8 10:48:17 filesrv2 fenced[8989]: fencing node clustsrv1
Jan  8 10:48:17 filesrv2 kernel: GFS2: fsid=samba:ctdb.0: jid=1: Trying to acquire journal lock...
Jan  8 10:48:17 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Trying to acquire journal lock...
Jan  8 10:48:24 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Down
Jan  8 10:48:24 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Down
Jan  8 10:48:24 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Jan  8 10:48:24 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Jan  8 10:48:24 filesrv2 kernel: bonding: bond1: now running without any active interface !
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:25 filesrv2 kernel: bond1: link status definitely up for interface eth4, 100 Mbps full duplex.
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: making interface eth4 the new active one.
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:25 filesrv2 kernel: bond1: link status definitely up for interface eth3, 100 Mbps full duplex.
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Down
Jan  8 10:48:25 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Down
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Jan  8 10:48:25 filesrv2 kernel: bonding: bond1: now running without any active interface !
Jan  8 10:48:27 filesrv2 fenced[8989]: fence clustsrv1 success
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:ctdb.0: jid=1: Looking at journal...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:ctdb.0: jid=1: Done
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen02.0: jid=1: Trying to acquire journal lock...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Trying to acquire journal lock...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen02.0: jid=1: Looking at journal...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen02.0: jid=1: Done
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Looking at journal...
Jan  8 10:48:28 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:28 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Acquiring the transaction lock...
Jan  8 10:48:28 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Replaying journal...
Jan  8 10:48:28 filesrv2 kernel: bond1: link status definitely up for interface eth3, 1000 Mbps full duplex.
Jan  8 10:48:28 filesrv2 kernel: bonding: bond1: making interface eth3 the new active one.
Jan  8 10:48:28 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:48:28 filesrv2 kernel: bond1: link status definitely up for interface eth4, 1000 Mbps full duplex.
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Replayed 29140 of 29474 blocks
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Found 334 revoke tags
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Journal replayed in 2s
Jan  8 10:48:30 filesrv2 kernel: GFS2: fsid=samba:gen01.0: jid=1: Done
Jan  8 10:49:01 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Down
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: making interface eth4 the new active one.
Jan  8 10:49:01 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Down
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Jan  8 10:49:01 filesrv2 kernel: bonding: bond1: now running without any active interface !
Jan  8 10:49:03 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:49:03 filesrv2 kernel: bond1: link status definitely up for interface eth3, 1000 Mbps full duplex.
Jan  8 10:49:03 filesrv2 kernel: bonding: bond1: making interface eth3 the new active one.
Jan  8 10:49:03 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:49:04 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:49:04 filesrv2 kernel: bond1: link status definitely up for interface eth4, 1000 Mbps full duplex.
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Looking at journal...
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Acquiring the transaction lock...
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Replaying journal...
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Replayed 0 of 0 blocks
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Found 0 revoke tags
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Journal replayed in 0s
Jan  8 10:50:13 filesrv2 kernel: GFS2: fsid=samba:hadata02.0: jid=1: Done
Jan  8 10:52:37 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Down
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: making interface eth4 the new active one.
Jan  8 10:52:38 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Down
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Jan  8 10:52:38 filesrv2 kernel: bonding: bond1: now running without any active interface !
Jan  8 10:52:40 filesrv2 kernel: bnx2 0000:03:00.1: eth3: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:52:40 filesrv2 kernel: bnx2 0000:04:00.0: eth4: NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Jan  8 10:52:40 filesrv2 kernel: bond1: link status definitely up for interface eth3, 1000 Mbps full duplex.
Jan  8 10:52:40 filesrv2 kernel: bonding: bond1: making interface eth3 the new active one.
Jan  8 10:52:40 filesrv2 kernel: bonding: bond1: first active interface up!
Jan  8 10:52:40 filesrv2 kernel: bond1: link status definitely up for interface eth4, 1000 Mbps full duplex.
Jan  8 10:52:44 filesrv2 corosync[8933]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan  8 10:52:44 filesrv2 corosync[8933]:   [QUORUM] Members[2]: 1 2
Jan  8 10:52:44 filesrv2 corosync[8933]:   [QUORUM] Members[2]: 1 2
Jan  8 10:52:44 filesrv2 corosync[8933]:   [CPG   ] chosen downlist: sender r(0) ip(10.0.0.10) ; members(old:1 left:0)
Jan  8 10:52:44 filesrv2 corosync[8933]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jan  8 10:52:51 filesrv2 kernel: dlm: got connection from 1
Jan  8 10:55:57 filesrv2 kernel: INFO: task gfs2_quotad:9389 blocked for more than 120 seconds.
Jan  8 10:55:57 filesrv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 10:55:57 filesrv2 kernel: gfs2_quotad   D ffff8808a7824900     0  9389      2 0x00000080
Jan  8 10:55:57 filesrv2 kernel: ffff88087580da88 0000000000000046 0000000000000000 00000000000001c3
Jan  8 10:55:57 filesrv2 kernel: ffff88087580da18 ffff88087580da50 ffffffff810ea694 ffff88088b184080
Jan  8 10:55:57 filesrv2 kernel: ffff88088e71a5f8 ffff88087580dfd8 000000000000f4e8 ffff88088e71a5f8
Jan  8 10:55:57 filesrv2 kernel: Call Trace:
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff810ea694>] ? rb_reserve_next_event+0xb4/0x370
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81013563>] ? native_sched_clock+0x13/0x60
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814eefb5>] rwsem_down_failed_common+0x95/0x1d0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81013563>] ? native_sched_clock+0x13/0x60
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814ef146>] rwsem_down_read_failed+0x26/0x30
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81276e04>] call_rwsem_down_read_failed+0x14/0x30
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814ee644>] ? down_read+0x24/0x30
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa04fe4b2>] dlm_lock+0x62/0x1e0 [dlm]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff810eab02>] ? ring_buffer_lock_reserve+0xa2/0x160
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0550d62>] gdlm_lock+0xf2/0x130 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0550e60>] ? gdlm_ast+0x0/0xe0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0550da0>] ? gdlm_bast+0x0/0x50 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa053430f>] do_xmote+0x17f/0x260 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa05344e1>] run_queue+0xf1/0x1d0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0534807>] gfs2_glock_nq+0x1b7/0x360 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff8107cb7b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa054db88>] gfs2_statfs_sync+0x58/0x1b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff814ed84a>] ? schedule_timeout+0x19a/0x2e0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa054db80>] ? gfs2_statfs_sync+0x50/0x1b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0545bb7>] quotad_check_timeo+0x57/0xb0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0545e44>] gfs2_quotad+0x234/0x2b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
Jan  8 10:55:57 filesrv2 kernel: [<ffffffffa0545c10>] ? gfs2_quotad+0x0/0x2b0 [gfs2]
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Jan  8 10:55:57 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan  8 10:57:57 filesrv2 kernel: INFO: task gfs2_quotad:9389 blocked for more than 120 seconds.
Jan  8 10:57:57 filesrv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 10:57:57 filesrv2 kernel: gfs2_quotad   D ffff8808a7824900     0  9389      2 0x00000080
Jan  8 10:57:57 filesrv2 kernel: ffff88087580da88 0000000000000046 0000000000000000 00000000000001c3
Jan  8 10:57:57 filesrv2 kernel: ffff88087580da18 ffff88087580da50 ffffffff810ea694 ffff88088b184080
Jan  8 10:57:57 filesrv2 kernel: ffff88088e71a5f8 ffff88087580dfd8 000000000000f4e8 ffff88088e71a5f8
Jan  8 10:57:57 filesrv2 kernel: Call Trace:
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff810ea694>] ? rb_reserve_next_event+0xb4/0x370
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81013563>] ? native_sched_clock+0x13/0x60
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814eefb5>] rwsem_down_failed_common+0x95/0x1d0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81013563>] ? native_sched_clock+0x13/0x60
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814ef146>] rwsem_down_read_failed+0x26/0x30
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81276e04>] call_rwsem_down_read_failed+0x14/0x30
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814ee644>] ? down_read+0x24/0x30
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa04fe4b2>] dlm_lock+0x62/0x1e0 [dlm]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff810eab02>] ? ring_buffer_lock_reserve+0xa2/0x160
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0550d62>] gdlm_lock+0xf2/0x130 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0550e60>] ? gdlm_ast+0x0/0xe0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0550da0>] ? gdlm_bast+0x0/0x50 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa053430f>] do_xmote+0x17f/0x260 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa05344e1>] run_queue+0xf1/0x1d0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0534807>] gfs2_glock_nq+0x1b7/0x360 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff8107cb7b>] ? try_to_del_timer_sync+0x7b/0xe0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa054db88>] gfs2_statfs_sync+0x58/0x1b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff814ed84a>] ? schedule_timeout+0x19a/0x2e0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa054db80>] ? gfs2_statfs_sync+0x50/0x1b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0545bb7>] quotad_check_timeo+0x57/0xb0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0545e44>] gfs2_quotad+0x234/0x2b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
Jan  8 10:57:57 filesrv2 kernel: [<ffffffffa0545c10>] ? gfs2_quotad+0x0/0x2b0 [gfs2]
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Jan  8 10:57:57 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan  8 10:59:22 filesrv2 rgmanager[12557]: State change: clustsrv1 UP

Cluster.conf File:

<?xml version="1.0"?>
<cluster config_version="8" name="samba">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="clustsrv1" nodeid="1" votes="1">
			<fence>
				<method name="fenceilo1">
					<device name="ilosrv1"/>
				</method>
			</fence>
			<unfence>
				<device action="on" name="ilosrv1"/>
			</unfence>
		</clusternode>
		<clusternode name="clustsrv2" nodeid="2" votes="1">
			<fence>
				<method name="fenceilo2">
					<device name="ilosrv2"/>
				</method>
			</fence>
			<unfence>
				<device action="on" name="ilosrv2"/>
			</unfence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices>
		<fencedevice agent="fence_ipmilan" ipaddr="192.168.129.157" lanplus="1" login="fence" name="ilosrv1" passwd="xxxxxxx"/>
		<fencedevice agent="fence_ipmilan" ipaddr="192.168.129.158" lanplus="1" login="fence" name="ilosrv2" passwd="xxxxxxx"/>
	</fencedevices>
	<rm>
		<failoverdomains/>
		<resources/>
	</rm>
</cluster>
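
If it helps, this is roughly how the file can be sanity-checked and pushed
out after an edit (generic RHEL 6 cluster commands, shown as a sketch rather
than output from our nodes):

# Validate cluster.conf against the schema
ccs_config_validate

# After incrementing config_version, propagate it to the running cluster
cman_tool version -r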


Thanks

Sathya Narayanan V
Solution Architect	


-----Original Message-----
From: SATHYA - IT [mailto:sathyanarayanan.varadharajan at precisionit.co.in]
Sent: Monday, January 09, 2012 11:21 AM
To: 'Digimer'; 'linux clustering'
Subject: Re: [Linux-cluster] rhel 6.2 network bonding interface in cluster
environment

Hi,

I am attaching the /var/log/messages from both servers. Yesterday (08 Jan)
one of the servers was fenced by the other at around 10:48 AM. I am also
attaching the cluster.conf file for your reference.

On a related note, regarding the heartbeat: I am referring to the channel
used by corosync, and the node names configured in the cluster.conf file
resolve to the bond1 addresses only.
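
A quick way to double-check that resolution (generic commands; we assume the
node names should map to the 10.0.0.x addresses seen in the corosync logs):

getent hosts clustsrv1 clustsrv2   # should return the bond1 (10.0.0.x) addresses
corosync-cfgtool -s                # shows the ring 0 address corosync is bound to
ip addr show bond1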

Regarding the network cards: we are using two dual-port cards, with one port
from each card in bond0 and the other port from each card in bond1. So it
does not seem to be a failure of a single card; moreover, we are not seeing
any errors on bond0.
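
The bond and slave link state can be re-verified with the usual checks (a
generic sketch, not output from our servers):

cat /proc/net/bonding/bond0   # per-slave link status and the current active slave
cat /proc/net/bonding/bond1
ethtool eth3                  # negotiated speed/duplex on each slave port
ethtool eth4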

Thanks

Sathya Narayanan V
Solution Architect	





