[Linux-cluster] Clustered NFS problem

Hernando Garcia hernando.garcia at gmail.com
Thu Sep 22 10:08:36 UTC 2005


It would be better for you to open a support call with Red Hat Support
directly; they will be able to help you with this issue.

When the call is open, make sure you provide the sysreport output from both
cluster nodes.
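
For reference, on RHEL 3 the report is generated with the stock sysreport
tool; run it as root on each node (it prompts for some identifying details
and prints the path of the tarball it creates):

    sysreport

Attach both tarballs, from RAC1 and RACGFS, to the case.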

On Wed, 2005-09-21 at 11:03 -0400, Abbes Bettahar wrote:
> Hi,
> 
> We have two HP ProLiant 380 G3 servers (Red Hat Advanced Server 3) attached
> by fibre optic to an HP MSA1000 SAN, and we want to install and configure
> the Red Hat Cluster Suite.
> 
> I set up and configured a clustered NFS service on the two servers, RAC1 and RACGFS.
> 
> clumanager-1.2.26.1-1
> redhat-config-cluster-1.0.7-1
> 
> I created two quorum partitions, /dev/sdd2 and /dev/sdd3 (100 MB each).
> 
> I created another large partition, /dev/sdd4 (over 600 GB), and formatted it
> with an ext3 file system.
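
For reference, that format step is typically just the following; a minimal
sketch assuming default ext3 options:

    mkfs -t ext3 /dev/sdd4

On a 600 GB partition this can take a few minutes.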
> 
> I installed the cluster suite on the first node (RAC1) and the second node
> (RACGFS), and I started the rawdevices service on both nodes (it's OK).
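
For reference, the raw device bindings for the quorum partitions normally
live in /etc/sysconfig/rawdevices; a minimal sketch matching the partitions
described above (raw1/raw2 are what the cluster.xml below points at):

    # /etc/sysconfig/rawdevices
    /dev/raw/raw1 /dev/sdd2
    /dev/raw/raw2 /dev/sdd3

After editing the file, "service rawdevices restart" re-binds the devices,
and "raw -qa" lists the active bindings.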
> 
> This is the hosts file (/etc/hosts) on node 1 (RAC1) and node 2 (RACGFS):
> 
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> #127.0.0.1 rac1 localhost.localdomain localhost
> 127.0.0.1              localhost.localdomain localhost
> #
> # Private hostnames
> #
> 192.168.253.3           rac1.project.net     rac1
> 192.168.253.4           rac2.project.net     rac2
> 192.168.253.10          racgfs.project.net     racgfs
> 192.168.253.20          raclu_nfs.project.net   raclu_nfs
> #
> # Hostnames used for Interconnect
> #
> 1.1.1.1                 rac1i.project.net    rac1i
> 1.1.1.2                 rac2i.project.net    rac2i
> 1.1.1.3                 racgfsi.project.net    racgfsi
> #
> 192.168.253.5           infra.project.net       infra
> 192.168.253.7 ractest.project.net     ractest
> #
> 
> I generated an /etc/cluster.xml on the first node (RAC1) and the second node (RACGFS):
> 
> <?xml version="1.0"?>
> <cluconfig version="3.0">
>   <clumembd broadcast="no" interval="750000" loglevel="5" multicast="yes" multicast_ipaddress="225.0.0.11" thread="yes" tko_count="20"/>
>   <cluquorumd loglevel="5" pinginterval="1" tiebreaker_ip=""/>
>   <clurmtabd loglevel="5" pollinterval="4"/>
>   <clusvcmgrd loglevel="5" use_netlink="yes"/>
>   <clulockd loglevel="5"/>
>   <cluster config_viewnumber="24" key="978dcd78e05c5961cf1aaaa03b41209b" name="cisn"/>
>   <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
>   <members>
>     <member id="0" name="192.168.253.3" watchdog="no"/>
>     <member id="1" name="192.168.253.10" watchdog="no"/>
>   </members>
>   <services>
>     <service checkinterval="5" failoverdomain="cisncluster" id="0" maxfalsestarts="0" maxrestarts="0" name="nfs_cisn" userscript="None">
>       <service_ipaddresses>
>         <service_ipaddress broadcast="None" id="0" ipaddress="192.168.253.20" monitor_link="0" netmask="255.255.255.0"/>
>       </service_ipaddresses>
>       <device id="0" name="/dev/sdd4">
>         <mount forceunmount="yes" mountpoint="/u04"/>
>         <nfsexport id="0" name="/u04">
>           <client id="0" name="*" options="rw"/>
>         </nfsexport>
>       </device>
>     </service>
>   </services>
>   <failoverdomains>
>     <failoverdomain id="0" name="cisncluster" ordered="yes" restricted="no">
>       <failoverdomainnode id="0" name="192.168.253.3"/>
>       <failoverdomainnode id="1" name="192.168.253.10"/>
>     </failoverdomain>
>   </failoverdomains>
> </cluconfig>
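
Since the <sharedstate> section points at /dev/raw/raw1 and /dev/raw/raw2,
it is worth verifying that the quorum partitions were initialized and are
readable before starting the cluster. If I remember the clumanager tooling
correctly, something like:

    shutil -p /cluster/header

prints the shared-state header when the partitions are set up properly
(treat the exact invocation as an assumption and check the man page).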
> 
> I created an NFS share on /u04 (mounted on /dev/sdd4) using the cluster GUI
> manager on RAC1.
> On both nodes (RAC1 and RACGFS) I launched the following command:
> service clumanager start
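
For reference, to have this survive reboots, both init scripts are usually
enabled at boot time as well; a minimal sketch assuming the stock chkconfig
setup:

    chkconfig rawdevices on
    chkconfig clumanager on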
> 
> I checked the result on both nodes. On RAC1, clustat reports:
> 
> Cluster Status - project                                         09:04:34
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
> 
>   Member             Status
>   ------------------ ----------
>   192.168.253.3      Active     <-- You are here
>   192.168.253.10     Active
> 
>   Service        Status   Owner (Last)     Last Transition Chk Restarts
>   -------------- -------- ---------------- --------------- --- --------
>   nfs_cisn       started  192.168.253.3    09:07:59 Sep 21   5        0
> 
> 
> On RACGFS, clustat reports:
> 
> Cluster Status - cisn                                            09:07:39
> Cluster Quorum Incarnation #3
> Shared State: Shared Raw Device Driver v1.2
> 
>   Member             Status
>   ------------------ ----------
>   192.168.253.3      Active
>   192.168.253.10     Active     <-- You are here
> 
>   Service        Status   Owner (Last)     Last Transition Chk Restarts
>   -------------- -------- ---------------- --------------- --- --------
>   nfs_cisn       started  192.168.253.3    09:07:59 Sep 21   5        0
> 
> 
> 
> When I launched ifconfig on RAC1, we saw that the service IP address
> 192.168.253.20 was brought up on alias eth2:0.
> 
> Then, on the other (client) servers, I launched the following command:
> mount -t nfs 192.168.253.20:/u04 /u04
> 
> Everything is OK; I can list the contents of /u04 from any server.
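
As a side note, clients of a failover NFS service are usually mounted
"hard" so that I/O blocks and retries across a service relocation instead
of returning errors; a sketch, assuming NFSv3 defaults on RHEL 3:

    mount -t nfs -o hard,intr 192.168.253.20:/u04 /u04

With soft mounts, applications can see I/O errors during the failover
window.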
> 
> My only remaining problem is this:
> 
> To test whether the clustered NFS would fail over correctly, I rebooted
> RAC1 repeatedly. RACGFS continued to work as the failover server, and when
> I launched ifconfig on RACGFS, we saw that the service IP address
> 192.168.253.20 was brought up on alias eth0:0.
> We could list the contents of /u04 (the clustered NFS mount) from the other
> servers within a few seconds of rebooting RAC1.
> 
> But after many reboots, I ran into a big problem: neither cluster node
> obtains the service IP address 192.168.253.20 any more; it no longer appears
> when I launch ifconfig on either node.
> 
> On RAC1:
> 
> eth0      Link encap:Ethernet  HWaddr 00:0B:CD:EF:2B:C1
>           inet addr:1.1.1.1  Bcast:1.1.1.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:89170 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:87405 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:17288193 (16.4 Mb)  TX bytes:14452757 (13.7 Mb)
>           Interrupt:15
> 
> eth2      Link encap:Ethernet  HWaddr 00:0B:CD:FF:44:02
>           inet addr:192.168.253.3  Bcast:192.168.253.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1349991 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:435450 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1592635536 (1518.8 Mb)  TX bytes:162026101 (154.5 Mb)
>           Interrupt:7
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:1001181 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1001181 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:76097441 (72.5 Mb)  TX bytes:76097441 (72.5 Mb)
> 
> On RACGFS:
> 
> eth0      Link encap:Ethernet  HWaddr 00:14:38:50:D3:E4
>           inet addr:192.168.253.10  Bcast:192.168.253.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:211223 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:160026 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:14917480 (14.2 Mb)  TX bytes:13886063 (13.2 Mb)
>           Interrupt:25
> 
> eth1      Link encap:Ethernet  HWaddr 00:14:38:50:D3:E3
>           inet addr:1.1.1.3  Bcast:1.1.1.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:0 (0.0 b)  TX bytes:256 (256.0 b)
>           Interrupt:26
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:184529 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:184529 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:10971489 (10.4 Mb)  TX bytes:10971489 (10.4 Mb)
> 
> I tried many commands; I stopped the cluster services on both nodes and
> restarted them, but unfortunately it doesn't work, and we still cannot obtain
> the clustered NFS mount.
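
A few things worth checking when the service IP refuses to come up; a
diagnostic sketch, not a definitive fix (clusvcadm ships with clumanager):

    clustat                              # is nfs_cisn started, stopped or failed?
    clusvcadm -d nfs_cisn                # disable the service, then...
    clusvcadm -e nfs_cisn                # ...re-enable it on a quorate member
    grep clusvcmgrd /var/log/messages    # look for why the service failed to start

If clustat shows the service as "failed", it usually has to be disabled
before it can be enabled again.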
> 
> 
> Do you have any idea how to fix this problem?
> 
> Thanks for your replies and help
> 
> Abbes Bettahar
> 514-296-0756
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



