[Linux-cluster] Questions about active-active Samba

Tue Jul 17 10:03:39 UTC 2007

Hello,

This message is long, so please bear with me.

I am building an active-active Samba setup across two nodes.

Nodes (both running RHEL 4.5):
--------------------------
kaka1: 192.168.3.52
kaka2: 192.168.3.249

Here is "/etc/cluster/cluster.conf":
---------------------------
<cluster alias="seedorf" config_version="159" name="seedorf">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="kaka1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="NPS" nodename="kaka1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="kaka2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="NPS" nodename="kaka2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="NPS"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="failover-1" ordered="1">
                                <failoverdomainnode name="kaka1" priority="1"/>
                                <failoverdomainnode name="kaka2" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="failover-2" ordered="1">
                                <failoverdomainnode name="kaka1" priority="2"/>
                                <failoverdomainnode name="kaka2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <clusterfs device="/dev/milan/mirror" force_unmount="0" fsid="37802" fstype="gfs" mountpoint="/nfsdata" name="phillip_gfs" options="acl"/>
                        <smb name="samba_1" workgroup="samba_test"/>
                        <smb name="samba_2" workgroup="samba_test"/>
                        <script file="/etc/init.d/smb" name="smb_script"/>
                        <ip address="192.168.3.143" monitor_link="1"/>
                        <ip address="192.168.3.150" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="failover-1" name="smb-1" recovery="relocate">
                        <smb ref="samba_1">
                                <clusterfs ref="phillip_gfs"/>
                                <script ref="smb_script"/>
                        </smb>
                        <ip ref="192.168.3.143"/>
                </service>
                <service autostart="1" domain="failover-2" name="smb-2" recovery="relocate">
                        <smb ref="samba_2">
                                <clusterfs ref="phillip_gfs"/>
                                <script ref="smb_script"/>
                        </smb>
                        <ip ref="192.168.3.150"/>
                </service>
        </rm>
</cluster>
-------------------------------------------
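To make the service layout above easier to check, here is a minimal Python sketch (not part of my setup) that lists which resources are referenced by more than one service. The helper name `shared_refs` and the trimmed sample string are assumptions for illustration only; it relies solely on the standard library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the <service> blocks from the cluster.conf above
# (illustrative only; the real file has more sections).
SAMPLE_CONF = """
<cluster name="seedorf">
  <rm>
    <service name="smb-1" domain="failover-1">
      <smb ref="samba_1">
        <clusterfs ref="phillip_gfs"/>
        <script ref="smb_script"/>
      </smb>
      <ip ref="192.168.3.143"/>
    </service>
    <service name="smb-2" domain="failover-2">
      <smb ref="samba_2">
        <clusterfs ref="phillip_gfs"/>
        <script ref="smb_script"/>
      </smb>
      <ip ref="192.168.3.150"/>
    </service>
  </rm>
</cluster>
"""


def shared_refs(xml_text):
    """Return {(tag, ref): [service names]} for refs used by 2+ services."""
    root = ET.fromstring(xml_text)
    users = defaultdict(set)
    for service in root.iter("service"):
        # iter() also yields the <service> element itself, which has no
        # "ref" attribute and is therefore skipped.
        for elem in service.iter():
            ref = elem.get("ref")
            if ref is not None:
                users[(elem.tag, ref)].add(service.get("name"))
    return {key: sorted(names) for key, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    for (tag, ref), names in sorted(shared_refs(SAMPLE_CONF).items()):
        print(f"{tag}:{ref} is referenced by {', '.join(names)}")
```

For the configuration above, this reports the GFS mount "phillip_gfs" and the "/etc/init.d/smb" script resource "smb_script" as the two resources that both smb-1 and smb-2 reference.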

When both nodes are running, /etc/samba/smb.conf.samba_1 is created 
automatically on kaka1, and /etc/samba/smb.conf.samba_2 on kaka2:

On kaka1:
--------------------------
[root@kaka1 samba]# cat smb.conf.samba_1 | grep -v "#"
[global]
        workgroup = samba_test
        pid directory = /var/run/samba/samba_1
        lock directory = /var/cache/samba/samba_1
        log file = /var/log/samba/%m.log
        encrypt passwords = yes
        bind interfaces only = yes
        netbios name = samba_1
        interfaces = 192.168.3.143
[test]
        public = yes
        path = /nfsdata
        read only = no
[root@kaka1 samba]# scp smb.conf.samba_1 kaka2:/etc/samba/

On kaka2:
---------------------------
[root@kaka2 samba]# cat smb.conf.samba_2 | grep -v "#"
[global]
        workgroup = samba_test
        pid directory = /var/run/samba/samba_2
        lock directory = /var/cache/samba/samba_2
        log file = /var/log/samba/%m.log
        encrypt passwords = yes
        bind interfaces only = yes
        netbios name = samba_2
        interfaces = 192.168.3.150
[test2]
        public = yes
        path = /nfsdata
        read only = no
[root@kaka2 samba]# scp smb.conf.samba_2 kaka1:/etc/samba/

Next, I reboot the nodes and check the cluster status:
---------------------------------
[root@kaka2 ~]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  kaka1                                    Online, rgmanager
  kaka2                                    Online, Local, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  smb-1                kaka1                          started
  smb-2                kaka2                          started

I can see that the floating IPs have been assigned:
----------------------------------
On kaka1:

[root@kaka1 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:e8:11:a1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.52/24 brd 192.168.3.255 scope global eth0
    inet 192.168.3.143/32 scope global eth0
    inet6 fe80::20c:29ff:fee8:11a1/64 scope link
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

On kaka2:

[root@kaka2 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
    inet 192.168.3.150/32 scope global eth0
    inet6 fe80::20c:29ff:fe24:c72/64 scope link
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

At this point I power off kaka1, and kaka1's floating IP (192.168.3.143) 
is added to kaka2:
-------------------------------------------------
[root@kaka2 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
    inet 192.168.3.150/32 scope global eth0
    inet 192.168.3.143/32 scope global eth0
    inet6 fe80::20c:29ff:fe24:c72/64 scope link
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

The Samba services appear to keep running, and clients accessing 
"192.168.3.143" do not notice any interruption.
---------------------------------------
[root@kaka2 ~]# clustat
Member Status: Quorate
  Member Name                              Status
  ------ ----                              ------
  kaka1                                    Offline
  kaka2                                    Online, Local, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  smb-1                kaka2                          started
  smb-2                kaka2                          started

However, the trouble starts when I power kaka1 back on: not only is 
"192.168.3.143" removed from kaka2, but kaka2 also loses its own floating IP, 
"192.168.3.150". The following errors appear in "/var/log/messages" on kaka2:
---------------------------------------------
[root@kaka2 ~]# tail -f /var/log/messages

Jul 17 17:49:24 kaka2 kernel: CMAN: node kaka1 rejoining
Jul 17 17:49:33 kaka2 clurgmgrd[3393]: <info> Magma Event: Membership Change
Jul 17 17:49:33 kaka2 clurgmgrd[3393]: <info> State change: kaka1 UP
Jul 17 17:49:35 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-1
Jul 17 17:49:36 kaka2 clurgmgrd: [3393]: <info> Removing IPv4 address 192.168.3.143 from eth0
Jul 17 17:49:44 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb status
Jul 17 17:49:46 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb stop
Jul 17 17:49:46 kaka2 smb: smbd shutdown succeeded
Jul 17 17:49:46 kaka2 nmbd[4571]: [2007/07/17 17:49:46, 0] nmbd/nmbd.c:terminate(56)
Jul 17 17:49:46 kaka2 nmbd[4571]:   Got SIGTERM: going down...
Jul 17 17:49:46 kaka2 nmbd[4571]: [2007/07/17 17:49:46, 0] libsmb/nmblib.c:send_udp(790)
Jul 17 17:49:46 kaka2 nmbd[4571]:   Packet send failed to 192.168.3.255(138) ERRNO=Invalid argument
Jul 17 17:49:46 kaka2 smb: nmbd shutdown succeeded
Jul 17 17:49:47 kaka2 clurgmgrd: [3393]: <info> Stopping Samba instance "samba_1"
Jul 17 17:49:47 kaka2 nmbd[6736]: [2007/07/17 17:49:47, 0] nmbd/nmbd.c:terminate(56)
Jul 17 17:49:47 kaka2 nmbd[6736]:   Got SIGTERM: going down...
Jul 17 17:49:47 kaka2 nmbd[6736]: [2007/07/17 17:49:47, 0] libsmb/nmblib.c:send_udp(790)
Jul 17 17:49:47 kaka2 nmbd[6736]:   Packet send failed to 192.168.3.255(138) ERRNO=Invalid argument
Jul 17 17:49:47 kaka2 clurgmgrd[3393]: <notice> Service smb-1 is stopped
Jul 17 17:50:14 kaka2 clurgmgrd: [3393]: <err> share_start_stop: nmbd for service  died!
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> status on smb:samba_2 returned 255 (unspecified)
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-2
Jul 17 17:50:14 kaka2 clurgmgrd: [3393]: <info> Removing IPv4 address 192.168.3.150 from eth0
Jul 17 17:50:15 kaka2 nmbd[4488]: [2007/07/17 17:50:15, 0] lib/interface.c:load_interfaces(220)
Jul 17 17:50:15 kaka2 nmbd[4488]:   WARNING: no network interfaces found
Jul 17 17:50:15 kaka2 nmbd[4488]: [2007/07/17 17:50:15, 0] nmbd/nmbd.c:reload_interfaces(265)
Jul 17 17:50:15 kaka2 nmbd[4488]:   reload_interfaces: No subnets to listen to. Shutting down...
Jul 17 17:50:24 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb stop
Jul 17 17:50:24 kaka2 smb: smbd shutdown failed
Jul 17 17:50:24 kaka2 smb: nmbd shutdown failed
Jul 17 17:50:24 kaka2 clurgmgrd: [3393]: <err> script:smb_script: stop of /etc/init.d/smb failed (returned 1)
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <notice> stop on script:smb_script returned 1 (generic error)
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <crit> #12: RG smb-2 failed to stop; intervention required
Jul 17 17:50:24 kaka2 clurgmgrd[3393]: <notice> Service smb-2 is failed
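Because the nmbd messages make the log above hard to follow, here is a small Python sketch that keeps only the rgmanager (clurgmgrd) entries so the order of actions is easier to read. The function name `rgmanager_events` is hypothetical, the sample is a handful of lines taken from the log above, and the sketch assumes each log message sits on a single line.

```python
import re

# A few lines copied from the log above (each message on one line).
SAMPLE_LOG = """\
Jul 17 17:49:35 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-1
Jul 17 17:49:46 kaka2 clurgmgrd: [3393]: <info> Executing /etc/init.d/smb stop
Jul 17 17:49:46 kaka2 smb: smbd shutdown succeeded
Jul 17 17:49:47 kaka2 clurgmgrd[3393]: <notice> Service smb-1 is stopped
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> status on smb:samba_2 returned 255 (unspecified)
Jul 17 17:50:14 kaka2 clurgmgrd[3393]: <notice> Stopping service smb-2
"""

# Matches both "clurgmgrd[pid]:" and "clurgmgrd: [pid]:" as seen in the log.
EVENT_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s+\S+\s+clurgmgrd(?::\s)?\[\d+\]:\s+<(\w+)>\s+(.*)$"
)


def rgmanager_events(log_text):
    """Return (time, level, message) tuples for clurgmgrd lines, in order."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            events.append(match.groups())
    return events


if __name__ == "__main__":
    for time, level, message in rgmanager_events(SAMPLE_LOG):
        print(f"{time} [{level}] {message}")
```

On the sample lines, the filtered sequence shows "/etc/init.d/smb stop" being executed while smb-1 is stopped, followed about half a minute later by the failed status check on smb:samba_2.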

[root@kaka2 ~]# ip addr list
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
    inet6 fe80::20c:29ff:fe24:c72/64 scope link
       valid_lft forever preferred_lft forever
3: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
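The check I am doing by eye on the "ip addr list" output can be sketched in Python as follows. This is only an illustration: the function name `missing_floating_ips` is hypothetical, the sample is the eth0 part of the output above, and the expected addresses are the two floating IPs from my cluster.conf.

```python
import re

# eth0 section of the "ip addr list" output above.
SAMPLE_IP = """\
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:24:0c:72 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.249/24 brd 192.168.3.255 scope global eth0
    inet6 fe80::20c:29ff:fe24:c72/64 scope link
"""

# Matches IPv4 "inet" lines only; "inet6" lines do not match.
INET_RE = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)/\d+", re.MULTILINE)


def missing_floating_ips(ip_output, expected):
    """Return the expected addresses that do not appear in the output."""
    present = set(INET_RE.findall(ip_output))
    return [ip for ip in expected if ip not in present]


if __name__ == "__main__":
    floating = ["192.168.3.143", "192.168.3.150"]
    print(missing_floating_ips(SAMPLE_IP, floating))
```

For the output above, both 192.168.3.143 and 192.168.3.150 are reported as missing from kaka2.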

[root@kaka2 ~]# clustat
Member Status: Quorate
  Member Name                              Status
  ------ ----                              ------
  kaka1                                    Online, rgmanager
  kaka2                                    Online, Local, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  smb-1                kaka1                          started
  smb-2                (kaka2)                        failed
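For anyone who wants to script the same check, here is a minimal Python sketch that reads the service table from the clustat output above and lists services that are not in the "started" state. The function name `services_not_started` is hypothetical, and the parsing assumes the three-column layout shown above. It also treats an owner in parentheses as the last owner, which is my reading of the "Owner (Last)" column header.

```python
# Service table from the clustat output above.
SAMPLE_CLUSTAT = """\
Member Status: Quorate
  Member Name                              Status
  ------ ----                              ------
  kaka1                                    Online, rgmanager
  kaka2                                    Online, Local, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  smb-1                kaka1                          started
  smb-2                (kaka2)                        failed
"""


def services_not_started(clustat_output):
    """Return (service, owner, state) for services whose state is not 'started'."""
    problems = []
    in_service_table = False
    for line in clustat_output.splitlines():
        fields = line.split()
        if fields[:2] == ["Service", "Name"]:
            in_service_table = True
            continue
        # Service rows have exactly three fields; the dashed divider has five.
        if not in_service_table or len(fields) != 3:
            continue
        name, owner, state = fields
        if state != "started":
            problems.append((name, owner.strip("()"), state))
    return problems


if __name__ == "__main__":
    for name, owner, state in services_not_started(SAMPLE_CLUSTAT):
        print(f"{name} is {state} (last owner: {owner})")
```

For the output above, only smb-2 is reported, with kaka2 as its last owner.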

As I understand it, in an active-active Samba cluster each Samba service 
should keep running on its own node and fail over to the other node only when 
its own node fails. In my case, however, when kaka1 powers on again, the Samba 
service "smb-2" on kaka2 fails and its floating IP is removed as well.

Could you please help me fix this issue? Any suggestions would be appreciated.

Regards,
Phillip