[Linux-cluster] Questions about active-active Samba
Huang Xiong
huangxiong at uit.com.cn
Tue Jul 17 10:03:39 UTC 2007
Hello,
This message is long, so please bear with me.
I am building an active-active Samba cluster across two nodes,
both running RHEL 4.5:
--------------------------
kaka1: 192.168.3.52
kaka2: 192.168.3.249
Here is "/etc/cluster/cluster.conf":
---------------------------
<cluster alias="seedorf" config_version="159" name="seedorf">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="kaka1" votes="1">
<fence>
<method name="1">
<device name="NPS" nodename="kaka1"/>
</method>
</fence>
</clusternode>
<clusternode name="kaka2" votes="1">
<fence>
<method name="1">
<device name="NPS" nodename="kaka2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_manual" name="NPS"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="failover-1" ordered="1">
<failoverdomainnode name="kaka1"
priority="1"/>
<failoverdomainnode name="kaka2"
priority="2"/>
</failoverdomain>
<failoverdomain name="failover-2" ordered="1">
<failoverdomainnode name="kaka1"
priority="2"/>
<failoverdomainnode name="kaka2"
priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<clusterfs device="/dev/milan/mirror"
force_unmount="0" fsid="37802" fstype="gfs" mountpoint="/nfsdata"
name="phillip_gfs" options="acl"/>
<smb name="samba_1" workgroup="samba_test"/>
<smb name="samba_2" workgroup="samba_test"/>
<script file="/etc/init.d/smb" name="smb_script"/>
<ip address="192.168.3.143" monitor_link="1"/>
<ip address="192.168.3.150" monitor_link="1"/>
</resources>
<service autostart="1" domain="failover-1" name="smb-1"
recovery="relocate">
<smb ref="samba_1">
<clusterfs ref="phillip_gfs"/>
<script ref="smb_script"/>
</smb>
<ip ref="192.168.3.143"/>
</service>
<service autostart="1" domain="failover-2" name="smb-2"
recovery="relocate">
<smb ref="samba_2">
<clusterfs ref="phillip_gfs"/>
<script ref="smb_script"/>
</smb>
<ip ref="192.168.3.150"/>
</service>
</rm>
</cluster>
-------------------------------------------
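In the configuration above, both services reference the same clusterfs and script resources. A minimal sketch (Python 3, standard library only) that lists every resource reference shared by more than one service follows. `RM_XML` is a trimmed copy of the <rm> section, and `shared_refs` is a hypothetical helper name, not part of any cluster tool. The sketch only reports sharing; it does not decide whether sharing is a problem.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the <rm> section from the cluster.conf above
# (failover domains and resource definitions omitted for brevity).
RM_XML = """
<rm>
  <service autostart="1" domain="failover-1" name="smb-1" recovery="relocate">
    <smb ref="samba_1">
      <clusterfs ref="phillip_gfs"/>
      <script ref="smb_script"/>
    </smb>
    <ip ref="192.168.3.143"/>
  </service>
  <service autostart="1" domain="failover-2" name="smb-2" recovery="relocate">
    <smb ref="samba_2">
      <clusterfs ref="phillip_gfs"/>
      <script ref="smb_script"/>
    </smb>
    <ip ref="192.168.3.150"/>
  </service>
</rm>
"""


def shared_refs(rm_xml):
    """Return {(tag, ref): [service names]} for refs used by 2+ services."""
    users = defaultdict(set)
    root = ET.fromstring(rm_xml)
    for service in root.iter("service"):
        # service.iter() also yields the <service> element itself,
        # which has no "ref" attribute and is therefore skipped.
        for elem in service.iter():
            ref = elem.get("ref")
            if ref is not None:
                users[(elem.tag, ref)].add(service.get("name"))
    return {key: sorted(names) for key, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    for (tag, ref), services in sorted(shared_refs(RM_XML).items()):
        print(f"{tag}:{ref} is referenced by {', '.join(services)}")
```

Run against the full file instead of the excerpt by reading /etc/cluster/cluster.conf into a string and passing it to `shared_refs`.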
When both nodes are running, /etc/samba/smb.conf.samba_1 is created
automatically on kaka1 and /etc/samba/smb.conf.samba_2 on kaka2:
On kaka1:
--------------------------
[root at kaka1 samba]# cat smb.conf.samba_1 | grep -v "#"
[global]
workgroup = samba_test
pid directory = /var/run/samba/samba_1
lock directory = /var/cache/samba/samba_1
log file = /var/log/samba/%m.log
encrypt passwords = yes
bind interfaces only = yes
netbios name = samba_1
interfaces = 192.168.3.143
[test]
public = yes
path = /nfsdata
read only = no
[root at kaka1 samba]# scp smb.conf.samba_1 kaka2:/etc/samba/
On kaka2:
---------------------------
[root at kaka2 samba]# cat smb.conf.samba_2 |grep -v "#"
[global]
workgroup = samba_test
pid directory = /var/run/samba/samba_2
lock directory = /var/cache/samba/samba_2
log file = /var/log/samba/%m.log
encrypt passwords = yes
bind interfaces only = yes
netbios name = samba_2
interfaces = 192.168.3.150
[test2]
public = yes
path = /nfsdata
read only = no
[root at kaka2 samba]# scp smb.conf.samba_2 kaka1:/etc/samba/
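To make the differences between the two generated files explicit, here is a minimal sketch that compares their [global] sections with Python's `configparser`. `SAMBA_1`, `SAMBA_2`, `global_options` and `differing_options` are illustrative names. Interpolation is disabled because the `%m` in the log file path would otherwise be treated as configparser interpolation syntax; treating smb.conf as INI is an assumption that holds for these simple files.

```python
import configparser

# [global] sections copied from the two files shown above.
SAMBA_1 = """\
[global]
workgroup = samba_test
pid directory = /var/run/samba/samba_1
lock directory = /var/cache/samba/samba_1
log file = /var/log/samba/%m.log
encrypt passwords = yes
bind interfaces only = yes
netbios name = samba_1
interfaces = 192.168.3.143
"""

SAMBA_2 = """\
[global]
workgroup = samba_test
pid directory = /var/run/samba/samba_2
lock directory = /var/cache/samba/samba_2
log file = /var/log/samba/%m.log
encrypt passwords = yes
bind interfaces only = yes
netbios name = samba_2
interfaces = 192.168.3.150
"""


def global_options(text):
    """Parse smb.conf-style text and return the [global] options as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["global"])


def differing_options(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    left, right = global_options(a), global_options(b)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(set(left) | set(right))
        if left.get(key) != right.get(key)
    }


if __name__ == "__main__":
    for key, (one, two) in differing_options(SAMBA_1, SAMBA_2).items():
        print(f"{key}: {one} | {two}")
```

For these two files the differing options are the interface address, the lock and pid directories, and the NetBIOS name; workgroup and log file are identical in both.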
Next, I rebooted both nodes and checked the cluster status:
---------------------------------
[root at kaka2 ~]# clustat
Member Status: Quorate
Member Name Status
------ ---- ------
kaka1 Online, rgmanager
kaka2 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
smb-1 kaka1 started
smb-2 kaka2 started
I can see that the floating IPs have been assigned:
----------------------------------
On kaka1:
[root at kaka1 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:e8:11:a1 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.52/24 brd 192.168.3.255 scope global eth0
inet 192.168.3.143/32 scope global eth0
inet6 fe80::20c:29ff:fee8:11a1/64 scope link
valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
On kaka2:
[root at kaka2 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
inet 192.168.3.150/32 scope global eth0
inet6 fe80::20c:29ff:fe24:c72/64 scope link
valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
At this point I powered off kaka1, and its floating IP
(192.168.3.143) was added to kaka2:
-------------------------------------------------
[root at kaka2 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
inet 192.168.3.150/32 scope global eth0
inet 192.168.3.143/32 scope global eth0
inet6 fe80::20c:29ff:fe24:c72/64 scope link
valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
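Checking by eye which floating addresses a node holds gets tedious across repeated failover tests. A minimal sketch that extracts the IPv4 addresses from `ip addr list` output and reports which of the two floating IPs are present follows. `IP_ADDR_OUTPUT` is the eth0 part of the output above, and `FLOATING_IPS` and `ipv4_addresses` are hypothetical names chosen for this example.

```python
import re

# eth0 portion of the "ip addr list" output from kaka2 shown above.
IP_ADDR_OUTPUT = """\
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
inet 192.168.3.150/32 scope global eth0
inet 192.168.3.143/32 scope global eth0
inet6 fe80::20c:29ff:fe24:c72/64 scope link
valid_lft forever preferred_lft forever
"""

# The two service addresses defined in cluster.conf.
FLOATING_IPS = {"192.168.3.143", "192.168.3.150"}


def ipv4_addresses(text):
    """Return every IPv4 address on an 'inet' line, in order of appearance."""
    # "inet " (with a trailing space) does not match "inet6" lines.
    return re.findall(r"^\s*inet (\d+\.\d+\.\d+\.\d+)/\d+", text, flags=re.MULTILINE)


if __name__ == "__main__":
    present = sorted(FLOATING_IPS & set(ipv4_addresses(IP_ADDR_OUTPUT)))
    missing = sorted(FLOATING_IPS - set(present))
    print("floating IPs present:", present)
    print("floating IPs missing:", missing)
```

On a live node the text could come from `subprocess.run(["ip", "addr", "list"], capture_output=True, text=True).stdout` instead of the pasted excerpt.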
The Samba services appear to keep running normally, and clients
accessing "192.168.3.143" do not notice any interruption.
---------------------------------------
[root at kaka2 ~]# clustat
Member Status: Quorate
Member Name Status
------ ---- ------
kaka1 Offline
kaka2 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
smb-1 kaka2 started
smb-2 kaka2 started
However, when I power kaka1 back on, the trouble starts: not only is
"192.168.3.143" removed from kaka2, but kaka2 also loses its own floating
IP, "192.168.3.150". The following errors appear in "/var/log/messages"
on kaka2:
---------------------------------------------
[root at kaka2 ~] # tail -f /var/log/messages
Jul 17 17:49:24 kaka2 kernel: CMAN: node kaka1 rejoining
Jul 17 17:49:33 kaka2 clurgmgrd[3393]: <info> Magma Event: Membership Change
Jul 17 17:49:33 kaka2 clurgmgrd[3393]: <info> State change: kaka1 UP
Jul 17 17:49:35 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-1
Jul 17 17:49:36 kaka2 clurgmgrd: [3393]: <info> Removing IPv4 address
192.168.3.143 from eth0
Jul 17 17:49:44 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb
status
Jul 17 17:49:46 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb stop
Jul 17 17:49:46 kaka2 smb: smbd shutdown succeeded
Jul 17 17:49:46 kaka2 nmbd[4571]: [2007/07/17 17:49:46, 0]
nmbd/nmbd.c:terminate(56)
Jul 17 17:49:46 kaka2 nmbd[4571]: Got SIGTERM: going down...
Jul 17 17:49:46 kaka2 nmbd[4571]: [2007/07/17 17:49:46, 0]
libsmb/nmblib.c:send_udp(790)
Jul 17 17:49:46 kaka2 nmbd[4571]: Packet send failed to 192.168.3.255(138)
ERRNO=Invalid argument
Jul 17 17:49:46 kaka2 smb: nmbd shutdown succeeded
Jul 17 17:49:47 kaka2 clurgmgrd: [3393]: <info> Stopping Samba
instance "samba_1"
Jul 17 17:49:47 kaka2 nmbd[6736]: [2007/07/17 17:49:47, 0]
nmbd/nmbd.c:terminate(56)
Jul 17 17:49:47 kaka2 nmbd[6736]: Got SIGTERM: going down...
Jul 17 17:49:47 kaka2 nmbd[6736]: [2007/07/17 17:49:47, 0]
libsmb/nmblib.c:send_udp(790)
Jul 17 17:49:47 kaka2 nmbd[6736]: Packet send failed to 192.168.3.255(138)
ERRNO=Invalid argument
Jul 17 17:49:47 kaka2 clurgmgrd[3393]: <notice> Service smb-1 is stopped
Jul 17 17:50:14 kaka2 clurgmgrd: [3393]: <err> share_start_stop: nmbd for
service died!
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> status on smb:samba_2 returned
255 (unspecified)
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-2
Jul 17 17:50:14 kaka2 clurgmgrd: [3393]: <info> Removing IPv4 address
192.168.3.150 from eth0
Jul 17 17:50:15 kaka2 nmbd[4488]: [2007/07/17 17:50:15, 0]
lib/interface.c:load_interfaces(220)
Jul 17 17:50:15 kaka2 nmbd[4488]: WARNING: no network interfaces found
Jul 17 17:50:15 kaka2 nmbd[4488]: [2007/07/17 17:50:15, 0]
nmbd/nmbd.c:reload_interfaces(265)
Jul 17 17:50:15 kaka2 nmbd[4488]: reload_interfaces: No subnets to listen
to. Shutting down...
Jul 17 17:50:24 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb stop
Jul 17 17:50:24 kaka2 smb: smbd shutdown failed
Jul 17 17:50:24 kaka2 smb: nmbd shutdown failed
Jul 17 17:50:24 kaka2 clurgmgrd: [3393]: <err> script:smb_script: stop
of /etc/init.d/smb failed (returned 1)
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <notice> stop on script:smb_script
returned 1 (generic error)
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <crit> #12: RG smb-2 failed to stop;
intervention required
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <notice> Service smb-2 is failed
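The log above mixes rgmanager messages with smbd/nmbd output, so a minimal sketch that keeps only the clurgmgrd lines and filters them by severity may help when reading it. `LOG_EXCERPT` holds a few lines copied from the log with the mail-wrapped lines rejoined; `LINE_RE` and `events` are illustrative names, and the pattern is an assumption based only on the two clurgmgrd prefixes visible here ("clurgmgrd[pid]:" and "clurgmgrd: [pid]:").

```python
import re

# A few clurgmgrd lines from the log above, with wrapped lines rejoined.
LOG_EXCERPT = """\
Jul 17 17:49:35 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-1
Jul 17 17:49:46 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb stop
Jul 17 17:49:46 kaka2 smb: smbd shutdown succeeded
Jul 17 17:49:47 kaka2 clurgmgrd[3393]: <notice> Service smb-1 is stopped
Jul 17 17:50:14 kaka2 clurgmgrd: [3393]: <err> share_start_stop: nmbd for service died!
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-2
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <crit> #12: RG smb-2 failed to stop; intervention required
"""

# Captures: time of day, severity tag, message text.
LINE_RE = re.compile(
    r"^\w{3} +\d+ (\d\d:\d\d:\d\d) \S+ clurgmgrd:? ?\[\d+\]: <(\w+)> (.*)$"
)


def events(text, levels=None):
    """Return (time, level, message) for clurgmgrd lines, optionally filtered."""
    found = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a clurgmgrd line (for example smb or nmbd output)
        time, level, message = match.groups()
        if levels is None or level in levels:
            found.append((time, level, message))
    return found


if __name__ == "__main__":
    for time, level, message in events(LOG_EXCERPT, {"err", "crit"}):
        print(time, level.upper(), message)
```

Calling `events(LOG_EXCERPT)` without a level filter gives the full rgmanager timeline, which places the "/etc/init.d/smb stop" run for smb-1 at 17:49:46 and the first error for smb-2 at 17:50:14.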
[root at kaka2 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
inet6 fe80::20c:29ff:fe24:c72/64 scope link
valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
[root at kaka2 ~]# clustat
Member Status: Quorate
Member Name Status
------ ---- ------
kaka1 Online, rgmanager
kaka2 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
smb-1 kaka1 started
smb-2 (kaka2) failed
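For repeated tests, the service table can be checked automatically. Below is a minimal sketch that parses the service section of the `clustat` output above and lists services that are not in the started state. `CLUSTAT_OUTPUT` is the output pasted above, `parse_services` is a hypothetical helper, and the parser assumes the three-column layout shown here; an owner printed in parentheses is read as the last owner, following the "Owner (Last)" column header.

```python
# clustat output from kaka2 after kaka1 rejoined, as shown above.
CLUSTAT_OUTPUT = """\
Member Status: Quorate
Member Name Status
------ ---- ------
kaka1 Online, rgmanager
kaka2 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
smb-1 kaka1 started
smb-2 (kaka2) failed
"""


def parse_services(text):
    """Return one dict per row of the service table in clustat output."""
    services = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["Service", "Name"]:
            in_table = True  # rows after this header belong to the service table
            continue
        # Skip the member table, the dashed rule, and anything not 3 columns wide.
        if not in_table or len(fields) != 3 or fields[0].startswith("-"):
            continue
        name, owner, state = fields
        services.append({
            "name": name,
            "owner": owner.strip("()"),
            "last_owner_only": owner.startswith("("),
            "state": state,
        })
    return services


if __name__ == "__main__":
    for service in parse_services(CLUSTAT_OUTPUT):
        if service["state"] != "started":
            print(f"{service['name']} is {service['state']} "
                  f"(last owner: {service['owner']})")
```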
As I understand an active-active Samba cluster, every Samba service should
keep running and must be able to fail over to another node when its own node
fails. In my case, however, when kaka1 is powered on again, the Samba service
"smb-2" on kaka2 fails and its floating IP is removed as well.
Would you please help me fix this issue? Any suggestion would be appreciated.
Regards,
Phillip