From gianpietro.sella at unipd.it Sun May 10 09:28:25 2015
From: gianpietro.sella at unipd.it (gianpietro.sella at unipd.it)
Date: Sun, 10 May 2015 11:28:25 +0200
Subject: [Linux-cluster] nfs cluster, problem with delete file in the failover case
Message-ID: <8c343e26ef810ff47bdc10ae3ece85ce.squirrel@webmail.unipd.it>

Hi, sorry for my bad English.
I am testing an active/passive NFS cluster (2 nodes).
I followed these instructions for NFS:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.html

The nodes run CentOS 7.1.
The two cluster nodes share the same iSCSI volume.
The NFS cluster works very well; I have only one problem.
I mount the folder exported by the NFS cluster on my client node (NFSv3 protocol).
I write a big data file (70GB) into the NFS folder:
dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat
Before the write finishes I put the active node into standby, and the resources migrate to the other node.
When the dd write finishes, the file is fine.
I then delete the file output.dat.
Now output.dat is no longer present in the NFS folder; it has been correctly removed.
But the space on the NFS volume is not freed: if I run df on the client (and on the new active node) I still see 70GB of used space on the exported volume.
Now, if I put the new active node into standby (migrating the resources back to the first node, where the write started) so that the other node becomes the active one again, the space of the deleted output.dat file is finally freed.
It is very strange.

From bfields at fieldses.org Mon May 11 21:20:15 2015
From: bfields at fieldses.org (J. Bruce Fields)
Date: Mon, 11 May 2015 17:20:15 -0400
Subject: [Linux-cluster] nfs cluster, problem with delete file in the failover case
In-Reply-To: <8c343e26ef810ff47bdc10ae3ece85ce.squirrel@webmail.unipd.it>
References: <8c343e26ef810ff47bdc10ae3ece85ce.squirrel@webmail.unipd.it>
Message-ID: <20150511212015.GB23754@fieldses.org>

On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella at unipd.it wrote:
> Hi, sorry for my bad english.
> I testing nfs cluster active/passsive (2 nodes).
> I use the next instruction for nfs:
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.html
>
> I use centos 7.1 on the nodes.
> The 2 node of the cluster share the same iscsi volume.
> The nfs cluster is very good.
> I have only one problem.
> I mount the nfs cluster exported folder on my client node (nfsv3 protocol).
> I write on the nfs folder an big data file (70GB):
> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat
> Before write is finished I put the active node in standby status.
> then the resource migrate in the other node.
> when the dd write finish the file is ok.
> I delete the file output.dat.

So, the dd and the later rm are both run on the client, and the rm after
the dd has completed and exited?  And the rm doesn't happen till after
the first migration is completely finished?  What version of NFS are you
using?

It sounds like a sillyrename problem, but I don't see the explanation.

--b.

> Now the file output.dat is not present in the nfs folder, it is correctly
> erased.
> but the space in the nfs volume is not free.
> If I execute an df command on the client (and on the new active node) I
> see 70GB on used space in the exported volume disk.
> Now if I put the new active node in standby status (migrate the resource > in the first node where start writing file), and the other node is now the > active node, the space of the deleted output.dat file is now free. > It is very strange. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gianpietro.sella at unipd.it Mon May 11 22:37:10 2015 From: gianpietro.sella at unipd.it (gianpietro.sella at unipd.it) Date: Tue, 12 May 2015 00:37:10 +0200 Subject: [Linux-cluster] nfs cluster, problem with delete file in the failover case In-Reply-To: <20150511212015.GB23754@fieldses.org> References: <8c343e26ef810ff47bdc10ae3ece85ce.squirrel@webmail.unipd.it> <20150511212015.GB23754@fieldses.org> Message-ID: > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella at unipd.it wrote: >> Hi, sorry for my bad english. >> I testing nfs cluster active/passsive (2 nodes). >> I use the next instruction for nfs: >> >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.html >> >> I use centos 7.1 on the nodes. >> The 2 node of the cluster share the same iscsi volume. >> The nfs cluster is very good. >> I have only one problem. >> I mount the nfs cluster exported folder on my client node (nfsv3 >> protocol). >> I write on the nfs folder an big data file (70GB): >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat >> Before write is finished I put the active node in standby status. >> then the resource migrate in the other node. >> when the dd write finish the file is ok. >> I delete the file output.dat. > > So, the dd and the later rm are both run on the client, and the rm after > the dd has completed and exited? And the rm doesn't happen till after > the first migration is completely finished? What version of NFS are you > using? > > It sounds like a sillyrename problem, but I don't see the explanation. > > --b. Hi Bruce, thank for your answer. yes the dd command and the rm command (all on the client node) finish without error. I use nfsv3, but is the same with nfsv4 protocol. the s.o. is centos 7.1, the nfs package is nfs-utils-1.3.0-0.8.el7.x86_64. the pacemaker configuration is: pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg exclusive=true --group nfsclusterha pcs resource create nfsclusterdata Filesystem device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" fstype="ext4" --group nfsclusterha pcs resource create nfsclusterserver nfsserver nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group nfsclusterha pcs resource create nfsclusterroot exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports fsid=0 --group nfsclusterha pcs resource create nfsclusternova exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports/nova fsid=1 -- group nfsclusterha pcs resource create nfsclusterglance exportfs clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash directory=/nfscluster/exports/glance fsid= 2 --group nfsclusterha pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 --group nfsclusterha pcs resource create nfsclusternotify nfsnotify source_host=192.168.61.180 --group nfsclusterha now I have done the next test. nfs cluster with 2 node. the first node in standby state. the second node in active state. 
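(For reference, a node is moved in and out of standby with commands roughly like
the following; the node name nfsnode1 is only an example, and I am assuming the
pcs cluster standby/unstandby subcommands of the pcs version shipped with
CentOS 7.1:)

# put the currently active node into standby so the nfsclusterha group migrates away
pcs cluster standby nfsnode1
# later, bring the node back so it can host the group again
pcs cluster unstandby nfsnode1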
I mount the empty (not used space) exported volume in the client with nfsv3 protocol (with nfs4 protocol is the same). I write on the client an big file (70GB) in the mount directory with dd (but is the same with cp command). while the command write the file I disable nfsnotify, Iaddr2, exportfs and nfsserver resource in this order (pcs resource disable ...) and next I enable the resource (pcs resource enable ...) in the reverse order. when disable resource writing freeze, when enable resource writing restart without error. when the writing command is finished I delete the file. the mount directory is empty and the used space of exported volume is 0, this is ok. now i repead the test. but now I disable/enable even the Filesystem resource: disable nfsnotify, Iaddr2, exportfs, nfsserver and Filesystem resource (writing freeze) then enable in the reverse order (writing restart without error). when writing command is finished I delete the file. now the mounted directory is empty (not file) but the used space is not 0 but is 70GB. this is not ok. now I execute the next command on the active node of the cluster where the volume is exported with nfs: mount -o remount /dev/nfsclustervg/nfsclusterlv where /dev/nfsclustervg/nfsclusterlv is the exported volume (iscsi volume configured with lvm). after this command the used space in the mounted directory of the client is 0, this is ok. I think that the problem is the Filesystem resource on the active node of the cluster. but is very strange. > >> Now the file output.dat is not present in the nfs folder, it is >> correctly >> erased. >> but the space in the nfs volume is not free. >> If I execute an df command on the client (and on the new active node) I >> see 70GB on used space in the exported volume disk. >> Now if I put the new active node in standby status (migrate the resource >> in the first node where start writing file), and the other node is now >> the >> active node, the space of the deleted output.dat file is now free. >> It is very strange. >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From bfields at fieldses.org Tue May 12 15:25:17 2015 From: bfields at fieldses.org (J. Bruce Fields) Date: Tue, 12 May 2015 11:25:17 -0400 Subject: [Linux-cluster] nfs cluster, problem with delete file in the failover case In-Reply-To: References: <8c343e26ef810ff47bdc10ae3ece85ce.squirrel@webmail.unipd.it> <20150511212015.GB23754@fieldses.org> Message-ID: <20150512152517.GB6370@fieldses.org> On Tue, May 12, 2015 at 12:37:10AM +0200, gianpietro.sella at unipd.it wrote: > > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella at unipd.it wrote: > >> Hi, sorry for my bad english. > >> I testing nfs cluster active/passsive (2 nodes). > >> I use the next instruction for nfs: > >> > >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.html > >> > >> I use centos 7.1 on the nodes. > >> The 2 node of the cluster share the same iscsi volume. > >> The nfs cluster is very good. > >> I have only one problem. > >> I mount the nfs cluster exported folder on my client node (nfsv3 > >> protocol). 
> >> I write on the nfs folder an big data file (70GB): > >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat > >> Before write is finished I put the active node in standby status. > >> then the resource migrate in the other node. > >> when the dd write finish the file is ok. > >> I delete the file output.dat. > > > > So, the dd and the later rm are both run on the client, and the rm after > > the dd has completed and exited? And the rm doesn't happen till after > > the first migration is completely finished? What version of NFS are you > > using? > > > > It sounds like a sillyrename problem, but I don't see the explanation. > > > > --b. > > > Hi Bruce, thank for your answer. > yes the dd command and the rm command (all on the client node) finish > without error. > I use nfsv3, but is the same with nfsv4 protocol. > the s.o. is centos 7.1, the nfs package is nfs-utils-1.3.0-0.8.el7.x86_64. > the pacemaker configuration is: > > pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg > exclusive=true --group nfsclusterha > > pcs resource create nfsclusterdata Filesystem > device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" > fstype="ext4" --group nfsclusterha > > pcs resource create nfsclusterserver nfsserver > nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group > nfsclusterha > > pcs resource create nfsclusterroot exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports fsid=0 --group > nfsclusterha > > pcs resource create nfsclusternova exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports/nova fsid=1 -- > group nfsclusterha > > pcs resource create nfsclusterglance exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports/glance fsid= > 2 --group nfsclusterha > > pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 > --group nfsclusterha > > pcs resource create nfsclusternotify nfsnotify source_host=192.168.61.180 > --group nfsclusterha > > now I have done the next test. > nfs cluster with 2 node. > the first node in standby state. > the second node in active state. > I mount the empty (not used space) exported volume in the client with nfsv3 > protocol (with nfs4 protocol is the same). > I write on the client an big file (70GB) in the mount directory with dd (but > is the same with cp command). > while the command write the file I disable nfsnotify, Iaddr2, exportfs and > nfsserver resource in this order (pcs resource disable ...) and next I > enable the resource (pcs resource enable ...) in the reverse order. > when disable resource writing freeze, when enable resource writing restart > without error. > when the writing command is finished I delete the file. > the mount directory is empty and the used space of exported volume is 0, > this is ok. > now i repead the test. > but now I disable/enable even the Filesystem resource: > disable nfsnotify, Iaddr2, exportfs, nfsserver and Filesystem resource > (writing freeze) then enable in the reverse order (writing restart without > error). > when writing command is finished I delete the file. > now the mounted directory is empty (not file) but the used space is not 0 > but is 70GB. > this is not ok. 
> now I execute the next command on the active node of the cluster where the > volume is exported with nfs: > mount -o remount /dev/nfsclustervg/nfsclusterlv > where /dev/nfsclustervg/nfsclusterlv is the exported volume (iscsi volume > configured with lvm). > after this command the used space in the mounted directory of the client is > 0, this is ok. > I think that the problem is the Filesystem resource on the active node of > the cluster. > but is very strange. So, the only difference between the "good" and "bad" cases was the addition of the stop/start of the filesystem resource? I assume that's equivalent to an umount/mount. I guess the server's dentry for that file is hanging around for a little while for some reason. We've run across at least one problem of that sort before (see d891eedbc3b1 "fs/dcache: allow d_obtain_alias() to return unhashed dentries"). In both cases after the restart the first operation the server will get for that file is a write with a filehandle, and it will have to look up that filehandle to find the file. (Whereas without the restart the initial discovery of the file will be a lookup by name.) In the "good" case the server already has a dentry cached for that file, in the "bad" case the umount/mount means that we'll be doing a cold-cache lookup of that filehandle. I wonder if the test case can be narrowed down any further.... Is the large file necessary? If it's needed only to ensure the writes are actually sent to the server promptly then it might be enough to do the nfs mount with -osync. Instead of the cluster migration or restart, it might be possible to reproduce the bug just with a echo 2 >/proc/sys/vm/drop_caches run on the server side while the dd is in progress--I don't know if that will reliably drop the one dentry, though. Maybe do a few of those in a row. --b. From gianpietro.sella at unipd.it Wed May 13 09:51:13 2015 From: gianpietro.sella at unipd.it (sella gianpietro) Date: Wed, 13 May 2015 11:51:13 +0200 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: <20150512152517.GB6370@fieldses.org> Message-ID: <20150513100053.25D711F3C@mydoom.unipd.it> J. Bruce Fields fieldses.org> writes: > > On Tue, May 12, 2015 at 12:37:10AM +0200, gianpietro.sella unipd.it wrote: > > > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella unipd.it wrote: > > >> Hi, sorry for my bad english. > > >> I testing nfs cluster active/passsive (2 nodes). > > >> I use the next instruction for nfs: > > >> > > >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/htm l/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.htm l > > >> > > >> I use centos 7.1 on the nodes. > > >> The 2 node of the cluster share the same iscsi volume. > > >> The nfs cluster is very good. > > >> I have only one problem. > > >> I mount the nfs cluster exported folder on my client node (nfsv3 > > >> protocol). > > >> I write on the nfs folder an big data file (70GB): > > >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat > > >> Before write is finished I put the active node in standby status. > > >> then the resource migrate in the other node. > > >> when the dd write finish the file is ok. > > >> I delete the file output.dat. > > > > > > So, the dd and the later rm are both run on the client, and the rm after > > > the dd has completed and exited? And the rm doesn't happen till after > > > the first migration is completely finished? What version of NFS are you > > > using? 
> > > > > > It sounds like a sillyrename problem, but I don't see the explanation. > > > > > > --b. > > > > > > Hi Bruce, thank for your answer. > > yes the dd command and the rm command (all on the client node) finish > > without error. > > I use nfsv3, but is the same with nfsv4 protocol. > > the s.o. is centos 7.1, the nfs package is nfs-utils-1.3.0-0.8.el7.x86_64. > > the pacemaker configuration is: > > > > pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg > > exclusive=true --group nfsclusterha > > > > pcs resource create nfsclusterdata Filesystem > > device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" > > fstype="ext4" --group nfsclusterha > > > > pcs resource create nfsclusterserver nfsserver > > nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group > > nfsclusterha > > > > pcs resource create nfsclusterroot exportfs > > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > > directory=/nfscluster/exports fsid=0 --group > > nfsclusterha > > > > pcs resource create nfsclusternova exportfs > > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > > directory=/nfscluster/exports/nova fsid=1 -- > > group nfsclusterha > > > > pcs resource create nfsclusterglance exportfs > > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > > directory=/nfscluster/exports/glance fsid= > > 2 --group nfsclusterha > > > > pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 > > --group nfsclusterha > > > > pcs resource create nfsclusternotify nfsnotify source_host=192.168.61.180 > > --group nfsclusterha > > > > now I have done the next test. > > nfs cluster with 2 node. > > the first node in standby state. > > the second node in active state. > > I mount the empty (not used space) exported volume in the client with nfsv3 > > protocol (with nfs4 protocol is the same). > > I write on the client an big file (70GB) in the mount directory with dd (but > > is the same with cp command). > > while the command write the file I disable nfsnotify, Iaddr2, exportfs and > > nfsserver resource in this order (pcs resource disable ...) and next I > > enable the resource (pcs resource enable ...) in the reverse order. > > when disable resource writing freeze, when enable resource writing restart > > without error. > > when the writing command is finished I delete the file. > > the mount directory is empty and the used space of exported volume is 0, > > this is ok. > > now i repead the test. > > but now I disable/enable even the Filesystem resource: > > disable nfsnotify, Iaddr2, exportfs, nfsserver and Filesystem resource > > (writing freeze) then enable in the reverse order (writing restart without > > error). > > when writing command is finished I delete the file. > > now the mounted directory is empty (not file) but the used space is not 0 > > but is 70GB. > > this is not ok. > > now I execute the next command on the active node of the cluster where the > > volume is exported with nfs: > > mount -o remount /dev/nfsclustervg/nfsclusterlv > > where /dev/nfsclustervg/nfsclusterlv is the exported volume (iscsi volume > > configured with lvm). > > after this command the used space in the mounted directory of the client is > > 0, this is ok. > > I think that the problem is the Filesystem resource on the active node of > > the cluster. > > but is very strange. > > So, the only difference between the "good" and "bad" cases was the > addition of the stop/start of the filesystem resource? 
I assume that's > equivalent to an umount/mount. yes is correct > > I guess the server's dentry for that file is hanging around for a little > while for some reason. We've run across at least one problem of that > sort before (see d891eedbc3b1 "fs/dcache: allow d_obtain_alias() to > return unhashed dentries"). > > In both cases after the restart the first operation the server will get > for that file is a write with a filehandle, and it will have to look up > that filehandle to find the file. (Whereas without the restart the > initial discovery of the file will be a lookup by name.) > > In the "good" case the server already has a dentry cached for that file, > in the "bad" case the umount/mount means that we'll be doing a > cold-cache lookup of that filehandle. > > I wonder if the test case can be narrowed down any further.... Is the > large file necessary? If it's needed only to ensure the writes are > actually sent to the server promptly then it might be enough to do the > nfs mount with -osync. I use sync options, is the same problem > > Instead of the cluster migration or restart, it might be possible to > reproduce the bug just with a > > echo 2 >/proc/sys/vm/drop_caches > > run on the server side while the dd is in progress--I don't know if that > will reliably drop the one dentry, though. Maybe do a few of those in a > row. no, with echo command is not possible reproduce the problem > > --b. > From gianpietro.sella at unipd.it Wed May 13 11:06:17 2015 From: gianpietro.sella at unipd.it (sella gianpietro) Date: Wed, 13 May 2015 13:06:17 +0200 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: <20150512152517.GB6370@fieldses.org> Message-ID: <20150513111531.3CFB11F31@mydoom.unipd.it> this is the inodes number in the exported folder of the volume in the server before write file in the client: [root at cld-blu-13 nova]# du --inodes 2 . this is the used block: [root at cld-blu-13 nova]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 1152845588 1% /nfscluster after write file in the client with umount/mount during writing: [root at cld-blu-13 nova]# du --inodes 3 . [root at cld-blu-13 nova]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 1131874068 2% /nfscluster thi is correct. now delete file: [root at cld-blu-13 nova]# du --inodes 2 . the number of the inodes is correct (from 3 to 2). [root at cld-blu-13 nova]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 1131874068 2% /nfscluster the number of used block is not correct. Do not return to initial value 33000 -----Messaggio originale----- Da: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Per conto di J. Bruce Fields Inviato: marted? 12 maggio 2015 17.25 A: linux clustering Oggetto: Re: [Linux-cluster] nfs cluster, problem with delete file in the failover case On Tue, May 12, 2015 at 12:37:10AM +0200, gianpietro.sella at unipd.it wrote: > > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella at unipd.it wrote: > >> Hi, sorry for my bad english. > >> I testing nfs cluster active/passsive (2 nodes). 
> >> I use the next instruction for nfs: > >> > >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/htm l/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.htm l > >> > >> I use centos 7.1 on the nodes. > >> The 2 node of the cluster share the same iscsi volume. > >> The nfs cluster is very good. > >> I have only one problem. > >> I mount the nfs cluster exported folder on my client node (nfsv3 > >> protocol). > >> I write on the nfs folder an big data file (70GB): > >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat > >> Before write is finished I put the active node in standby status. > >> then the resource migrate in the other node. > >> when the dd write finish the file is ok. > >> I delete the file output.dat. > > > > So, the dd and the later rm are both run on the client, and the rm after > > the dd has completed and exited? And the rm doesn't happen till after > > the first migration is completely finished? What version of NFS are you > > using? > > > > It sounds like a sillyrename problem, but I don't see the explanation. > > > > --b. > > > Hi Bruce, thank for your answer. > yes the dd command and the rm command (all on the client node) finish > without error. > I use nfsv3, but is the same with nfsv4 protocol. > the s.o. is centos 7.1, the nfs package is nfs-utils-1.3.0-0.8.el7.x86_64. > the pacemaker configuration is: > > pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg > exclusive=true --group nfsclusterha > > pcs resource create nfsclusterdata Filesystem > device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" > fstype="ext4" --group nfsclusterha > > pcs resource create nfsclusterserver nfsserver > nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group > nfsclusterha > > pcs resource create nfsclusterroot exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports fsid=0 --group > nfsclusterha > > pcs resource create nfsclusternova exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports/nova fsid=1 -- > group nfsclusterha > > pcs resource create nfsclusterglance exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports/glance fsid= > 2 --group nfsclusterha > > pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 > --group nfsclusterha > > pcs resource create nfsclusternotify nfsnotify source_host=192.168.61.180 > --group nfsclusterha > > now I have done the next test. > nfs cluster with 2 node. > the first node in standby state. > the second node in active state. > I mount the empty (not used space) exported volume in the client with nfsv3 > protocol (with nfs4 protocol is the same). > I write on the client an big file (70GB) in the mount directory with dd (but > is the same with cp command). > while the command write the file I disable nfsnotify, Iaddr2, exportfs and > nfsserver resource in this order (pcs resource disable ...) and next I > enable the resource (pcs resource enable ...) in the reverse order. > when disable resource writing freeze, when enable resource writing restart > without error. > when the writing command is finished I delete the file. > the mount directory is empty and the used space of exported volume is 0, > this is ok. > now i repead the test. 
> but now I disable/enable even the Filesystem resource:
> disable nfsnotify, Iaddr2, exportfs, nfsserver and Filesystem resource
> (writing freeze) then enable in the reverse order (writing restart without
> error).
> when writing command is finished I delete the file.
> now the mounted directory is empty (not file) but the used space is not 0
> but is 70GB.
> this is not ok.
> now I execute the next command on the active node of the cluster where the
> volume is exported with nfs:
> mount -o remount /dev/nfsclustervg/nfsclusterlv
> where /dev/nfsclustervg/nfsclusterlv is the exported volume (iscsi volume
> configured with lvm).
> after this command the used space in the mounted directory of the client is
> 0, this is ok.
> I think that the problem is the Filesystem resource on the active node of
> the cluster.
> but is very strange.

So, the only difference between the "good" and "bad" cases was the
addition of the stop/start of the filesystem resource?  I assume that's
equivalent to an umount/mount.

I guess the server's dentry for that file is hanging around for a little
while for some reason.  We've run across at least one problem of that
sort before (see d891eedbc3b1 "fs/dcache: allow d_obtain_alias() to
return unhashed dentries").

In both cases after the restart the first operation the server will get
for that file is a write with a filehandle, and it will have to look up
that filehandle to find the file.  (Whereas without the restart the
initial discovery of the file will be a lookup by name.)

In the "good" case the server already has a dentry cached for that file,
in the "bad" case the umount/mount means that we'll be doing a
cold-cache lookup of that filehandle.

I wonder if the test case can be narrowed down any further....  Is the
large file necessary?  If it's needed only to ensure the writes are
actually sent to the server promptly then it might be enough to do the
nfs mount with -osync.

Instead of the cluster migration or restart, it might be possible to
reproduce the bug just with a

echo 2 >/proc/sys/vm/drop_caches

run on the server side while the dd is in progress--I don't know if that
will reliably drop the one dentry, though.  Maybe do a few of those in a
row.

--b.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From vinh.cao at hp.com Wed May 13 11:38:51 2015
From: vinh.cao at hp.com (Cao, Vinh)
Date: Wed, 13 May 2015 11:38:51 +0000
Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case
In-Reply-To: <20150513111531.3CFB11F31@mydoom.unipd.it>
References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it>
Message-ID:

It sounds like the process that created the file while you were moving it to
the other node still has it open; in other words, you are deleting the file
and doing the failover at the same time. This has nothing to do with your
cluster setup.

I believe you can run lsof on the system where the disk space has not been
reclaimed and grep for the "deleted" marker, for example with the commands
below. You should see the process that still holds the file; kill that
process and the stale open file handle will be cleaned up.

That is how I see your problem. I don't think it has anything to do with the
OS cluster.
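Something along these lines, for example (the mount point /nfscluster is taken
from the cluster configuration earlier in the thread; adjust it to your setup):

# on the active NFS server: giving lsof a mount point lists every open file on
# that filesystem, including unlinked ones, which are marked as deleted
lsof -nP /nfscluster | grep -i deleted

# or scan the whole system for open-but-deleted files
lsof -nP | grep '(deleted)'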
Vinh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of sella gianpietro Sent: Wednesday, May 13, 2015 7:06 AM To: 'linux clustering' Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case this is the inodes number in the exported folder of the volume in the server before write file in the client: [root at cld-blu-13 nova]# du --inodes 2 . this is the used block: [root at cld-blu-13 nova]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 1152845588 1% /nfscluster after write file in the client with umount/mount during writing: [root at cld-blu-13 nova]# du --inodes 3 . [root at cld-blu-13 nova]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 1131874068 2% /nfscluster thi is correct. now delete file: [root at cld-blu-13 nova]# du --inodes 2 . the number of the inodes is correct (from 3 to 2). [root at cld-blu-13 nova]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 1131874068 2% /nfscluster the number of used block is not correct. Do not return to initial value 33000 -----Messaggio originale----- Da: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Per conto di J. Bruce Fields Inviato: marted? 12 maggio 2015 17.25 A: linux clustering Oggetto: Re: [Linux-cluster] nfs cluster, problem with delete file in the failover case On Tue, May 12, 2015 at 12:37:10AM +0200, gianpietro.sella at unipd.it wrote: > > On Sun, May 10, 2015 at 11:28:25AM +0200, gianpietro.sella at unipd.it wrote: > >> Hi, sorry for my bad english. > >> I testing nfs cluster active/passsive (2 nodes). > >> I use the next instruction for nfs: > >> > >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/htm l/High_Availability_Add-On_Administration/s1-resourcegroupcreatenfs-HAAA.htm l > >> > >> I use centos 7.1 on the nodes. > >> The 2 node of the cluster share the same iscsi volume. > >> The nfs cluster is very good. > >> I have only one problem. > >> I mount the nfs cluster exported folder on my client node (nfsv3 > >> protocol). > >> I write on the nfs folder an big data file (70GB): > >> dd if=/dev/zero bs=1M count=70000 > /Instances/output.dat Before > >> write is finished I put the active node in standby status. > >> then the resource migrate in the other node. > >> when the dd write finish the file is ok. > >> I delete the file output.dat. > > > > So, the dd and the later rm are both run on the client, and the rm > > after the dd has completed and exited? And the rm doesn't happen > > till after the first migration is completely finished? What version > > of NFS are you using? > > > > It sounds like a sillyrename problem, but I don't see the explanation. > > > > --b. > > > Hi Bruce, thank for your answer. > yes the dd command and the rm command (all on the client node) finish > without error. > I use nfsv3, but is the same with nfsv4 protocol. > the s.o. is centos 7.1, the nfs package is nfs-utils-1.3.0-0.8.el7.x86_64. 
> the pacemaker configuration is: > > pcs resource create nfsclusterlv LVM volgrpname=nfsclustervg > exclusive=true --group nfsclusterha > > pcs resource create nfsclusterdata Filesystem > device="/dev/nfsclustervg/nfsclusterlv" directory="/nfscluster" > fstype="ext4" --group nfsclusterha > > pcs resource create nfsclusterserver nfsserver > nfs_shared_infodir=/nfscluster/nfsinfo nfs_no_notify=true --group > nfsclusterha > > pcs resource create nfsclusterroot exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports fsid=0 --group nfsclusterha > > pcs resource create nfsclusternova exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports/nova fsid=1 -- group nfsclusterha > > pcs resource create nfsclusterglance exportfs > clientspec=192.168.61.0/255.255.255.0 options=rw,sync,no_root_squash > directory=/nfscluster/exports/glance fsid= > 2 --group nfsclusterha > > pcs resource create nfsclustervip IPaddr2 ip=192.168.61.180 cidr_netmask=24 > --group nfsclusterha > > pcs resource create nfsclusternotify nfsnotify > source_host=192.168.61.180 --group nfsclusterha > > now I have done the next test. > nfs cluster with 2 node. > the first node in standby state. > the second node in active state. > I mount the empty (not used space) exported volume in the client with nfsv3 > protocol (with nfs4 protocol is the same). > I write on the client an big file (70GB) in the mount directory with > dd (but > is the same with cp command). > while the command write the file I disable nfsnotify, Iaddr2, exportfs > and nfsserver resource in this order (pcs resource disable ...) and > next I enable the resource (pcs resource enable ...) in the reverse order. > when disable resource writing freeze, when enable resource writing > restart without error. > when the writing command is finished I delete the file. > the mount directory is empty and the used space of exported volume is > 0, this is ok. > now i repead the test. > but now I disable/enable even the Filesystem resource: > disable nfsnotify, Iaddr2, exportfs, nfsserver and Filesystem resource > (writing freeze) then enable in the reverse order (writing restart > without error). > when writing command is finished I delete the file. > now the mounted directory is empty (not file) but the used space is > not 0 but is 70GB. > this is not ok. > now I execute the next command on the active node of the cluster where > the volume is exported with nfs: > mount -o remount /dev/nfsclustervg/nfsclusterlv where > /dev/nfsclustervg/nfsclusterlv is the exported volume (iscsi volume > configured with lvm). > after this command the used space in the mounted directory of the > client is > 0, this is ok. > I think that the problem is the Filesystem resource on the active node > of the cluster. > but is very strange. So, the only difference between the "good" and "bad" cases was the addition of the stop/start of the filesystem resource? I assume that's equivalent to an umount/mount. I guess the server's dentry for that file is hanging around for a little while for some reason. We've run across at least one problem of that sort before (see d891eedbc3b1 "fs/dcache: allow d_obtain_alias() to return unhashed dentries"). In both cases after the restart the first operation the server will get for that file is a write with a filehandle, and it will have to look up that filehandle to find the file. 
(Whereas without the restart the initial discovery of the file will be a lookup by name.) In the "good" case the server already has a dentry cached for that file, in the "bad" case the umount/mount means that we'll be doing a cold-cache lookup of that filehandle. I wonder if the test case can be narrowed down any further.... Is the large file necessary? If it's needed only to ensure the writes are actually sent to the server promptly then it might be enough to do the nfs mount with -osync. Instead of the cluster migration or restart, it might be possible to reproduce the bug just with a echo 2 >/proc/sys/vm/drop_caches run on the server side while the dd is in progress--I don't know if that will reliably drop the one dentry, though. Maybe do a few of those in a row. --b. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From bfields at fieldses.org Wed May 13 15:46:52 2015 From: bfields at fieldses.org (J. Bruce Fields) Date: Wed, 13 May 2015 11:46:52 -0400 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it> Message-ID: <20150513154652.GB2070@fieldses.org> On Wed, May 13, 2015 at 11:38:51AM +0000, Cao, Vinh wrote: > Sounds like the process that has the file create while you are moving > it to another node still open. If I understand correctly, the filesystem is still unmountable. If a process held a file on the filesystem open, an unmount attempt would return -EBUSY. --b. > Meaning you are deleting the file and > doing failover at the same time. This has not things to do with your > cluster setup. > > I believed , you can run lsof command on the system that you're seeing > the disk size is still not clean up. Then grep for deteled arg. You > may see the process number that is still there. Then kill that process > and it will clean up the file handle process that is still open. > > That is how I see in your problem. I don't think it has any things to > do with OS cluster. From bfields at fieldses.org Wed May 13 15:45:19 2015 From: bfields at fieldses.org (J. Bruce Fields) Date: Wed, 13 May 2015 11:45:19 -0400 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: <20150513111531.3CFB11F31@mydoom.unipd.it> References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it> Message-ID: <20150513154519.GA2070@fieldses.org> On Wed, May 13, 2015 at 01:06:17PM +0200, sella gianpietro wrote: > this is the inodes number in the exported folder of the volume > in the server before write file in the client: > > [root at cld-blu-13 nova]# du --inodes > 2 . > > this is the used block: > > [root at cld-blu-13 nova]# df -T > Filesystem Type 1K-blocks Used Available > Use% Mounted on > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 1152845588 > 1% /nfscluster > > after write file in the client with umount/mount during writing: > > [root at cld-blu-13 nova]# du --inodes > 3 . > > [root at cld-blu-13 nova]# df -T > Filesystem Type 1K-blocks Used > Available Use% Mounted on > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > 1131874068 2% /nfscluster > > thi is correct. > now delete file: > > [root at cld-blu-13 nova]# du --inodes > 2 . > > the number of the inodes is correct (from 3 to 2). 
> > [root at cld-blu-13 nova]# df -T > Filesystem Type 1K-blocks Used > Available Use% Mounted on > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > 1131874068 2% /nfscluster > > the number of used block is not correct. > Do not return to initial value 33000 If you try "df -i", you'll probably also find that it gives the "wrong" result. (So, probably 3 inodes, though "du --inodes" is still only finding 2). --b. From gianpietro.sella at unipd.it Wed May 13 19:38:03 2015 From: gianpietro.sella at unipd.it (gianpietro sella) Date: Wed, 13 May 2015 19:38:03 +0000 (UTC) Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it> <20150513154519.GA2070@fieldses.org> Message-ID: J. Bruce Fields fieldses.org> writes: > > On Wed, May 13, 2015 at 01:06:17PM +0200, sella gianpietro wrote: > > this is the inodes number in the exported folder of the volume > > in the server before write file in the client: > > > > [root cld-blu-13 nova]# du --inodes > > 2 . > > > > this is the used block: > > > > [root cld-blu-13 nova]# df -T > > Filesystem Type 1K-blocks Used Available > > Use% Mounted on > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 1152845588 > > 1% /nfscluster > > > > after write file in the client with umount/mount during writing: > > > > [root cld-blu-13 nova]# du --inodes > > 3 . > > > > [root cld-blu-13 nova]# df -T > > Filesystem Type 1K-blocks Used > > Available Use% Mounted on > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > > 1131874068 2% /nfscluster > > > > thi is correct. > > now delete file: > > > > [root cld-blu-13 nova]# du --inodes > > 2 . > > > > the number of the inodes is correct (from 3 to 2). > > > > [root cld-blu-13 nova]# df -T > > Filesystem Type 1K-blocks Used > > Available Use% Mounted on > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > > 1131874068 2% /nfscluster > > > > the number of used block is not correct. > > Do not return to initial value 33000 > > If you try "df -i", you'll probably also find that it gives the "wrong" > result. (So, probably 3 inodes, though "du --inodes" is still only > finding 2). > > --b. 
> the problem is that after delete file the inode go in the orphaned state: [root at cld-blu-13 nova]# tune2fs -l /dev/nfsclustervg/nfsclusterlv |grep -i inode Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file Inode count: 72097792 Free inodes: 72097754 Inodes per group: 8192 Inode blocks per group: 512 First inode: 11 Inode size: 256 Journal inode: 8 First orphan inode: 53067783 Journal backup: inode blocks From unix.co at gmail.com Thu May 21 11:01:59 2015 From: unix.co at gmail.com (Umar Draz) Date: Thu, 21 May 2015 16:01:59 +0500 Subject: [Linux-cluster] iscsi-stonith-device stopped Message-ID: Hi I have created 2 node clvm cluster, everything apparently running file, but when I did *pcs status* it always display this Clone Set: dlm-clone [dlm] Started: [ clvm-1 clvm-2 ] Clone Set: clvmd-clone [clvmd] Started: [ clvm-1 clvm-2 ] * iscsi-stonith-device (stonith:fence_scsi): Stopped* Failed actions: iscsi-stonith-device_start_0 on clvm-1 'unknown error' (1): call=40, status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:23 2015', queued=0ms, exec=1154ms iscsi-stonith-device_start_0 on clvm-2 'unknown error' (1): call=38, status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:26 2015', queued=0ms, exec=1161ms PCSD Status: clvm-1: Online clvm-2: Online Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled Would you please help me why iscsi-stonith device stopped, and how I can solve this issue. Br. Umar -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianpietro.sella at unipd.it Thu May 21 12:10:15 2015 From: gianpietro.sella at unipd.it (sella gianpietro) Date: Thu, 21 May 2015 14:10:15 +0200 Subject: [Linux-cluster] R: iscsi-stonith-device stopped In-Reply-To: Message-ID: <20150521122011.D7CCBF706@kletz.unipd.it> that operating system do you use? I used fence_scsi with centos 7.1 but do not start. https://access.redhat.com/solutions/1421063 _____ Da: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Per conto di Umar Draz Inviato: gioved? 21 maggio 2015 13.02 A: linux-cluster at redhat.com Oggetto: [Linux-cluster] iscsi-stonith-device stopped Hi I have created 2 node clvm cluster, everything apparently running file, but when I did pcs status it always display this Clone Set: dlm-clone [dlm] Started: [ clvm-1 clvm-2 ] Clone Set: clvmd-clone [clvmd] Started: [ clvm-1 clvm-2 ] iscsi-stonith-device (stonith:fence_scsi): Stopped Failed actions: iscsi-stonith-device_start_0 on clvm-1 'unknown error' (1): call=40, status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:23 2015', queued=0ms, exec=1154ms iscsi-stonith-device_start_0 on clvm-2 'unknown error' (1): call=38, status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:26 2015', queued=0ms, exec=1161ms PCSD Status: clvm-1: Online clvm-2: Online Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled Would you please help me why iscsi-stonith device stopped, and how I can solve this issue. Br. Umar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From unix.co at gmail.com Thu May 21 12:28:18 2015 From: unix.co at gmail.com (Umar Draz) Date: Thu, 21 May 2015 17:28:18 +0500 Subject: [Linux-cluster] R: iscsi-stonith-device stopped In-Reply-To: <20150521122011.D7CCBF706@kletz.unipd.it> References: <20150521122011.D7CCBF706@kletz.unipd.it> Message-ID: Hi, Yes I am using Centos 7, so it will not work on CentOS 7? Br. Umar On Thu, May 21, 2015 at 5:10 PM, sella gianpietro wrote: > that operating system do you use? > > I used fence_scsi with centos 7.1 but do not start. > > https://access.redhat.com/solutions/1421063 > > > > > > > > > ------------------------------ > > *Da:* linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat.com] *Per conto di *Umar Draz > *Inviato:* gioved? 21 maggio 2015 13.02 > *A:* linux-cluster at redhat.com > *Oggetto:* [Linux-cluster] iscsi-stonith-device stopped > > > > Hi > > > > I have created 2 node clvm cluster, everything apparently running file, > but when I did > > > > *pcs** status* > > > > it always display this > > > > Clone Set: dlm-clone [dlm] > > Started: [ clvm-1 clvm-2 ] > > Clone Set: clvmd-clone [clvmd] > > Started: [ clvm-1 clvm-2 ] > > * iscsi-stonith-device (stonith:fence_scsi): Stopped* > > > > Failed actions: > > iscsi-stonith-device_start_0 on clvm-1 'unknown error' (1): call=40, > status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:23 > 2015', queued=0ms, exec=1154ms > > iscsi-stonith-device_start_0 on clvm-2 'unknown error' (1): call=38, > status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:26 > 2015', queued=0ms, exec=1161ms > > > > > > PCSD Status: > > clvm-1: Online > > clvm-2: Online > > > > Daemon Status: > > corosync: active/enabled > > pacemaker: active/enabled > > pcsd: active/enabled > > > > > > Would you please help me why iscsi-stonith device stopped, and how I can > solve this issue. > > > > Br. > > > > Umar > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Umar Draz Network Architect -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Thu May 21 12:33:23 2015 From: emi2fast at gmail.com (emmanuel segura) Date: Thu, 21 May 2015 14:33:23 +0200 Subject: [Linux-cluster] R: iscsi-stonith-device stopped In-Reply-To: References: <20150521122011.D7CCBF706@kletz.unipd.it> Message-ID: I don't know if the fence_scsi is the same used in redhat cluster 5, but i think that works with scsi reservation, so your disk need to support scsi 3 2015-05-21 14:28 GMT+02:00 Umar Draz : > Hi, > > Yes I am using Centos 7, so it will not work on CentOS 7? > > Br. > > Umar > > On Thu, May 21, 2015 at 5:10 PM, sella gianpietro > wrote: >> >> that operating system do you use? >> >> I used fence_scsi with centos 7.1 but do not start. >> >> https://access.redhat.com/solutions/1421063 >> >> >> >> >> >> >> >> >> >> ________________________________ >> >> Da: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] Per conto di Umar Draz >> Inviato: gioved? 
21 maggio 2015 13.02 >> A: linux-cluster at redhat.com >> Oggetto: [Linux-cluster] iscsi-stonith-device stopped >> >> >> >> Hi >> >> >> >> I have created 2 node clvm cluster, everything apparently running file, >> but when I did >> >> >> >> pcs status >> >> >> >> it always display this >> >> >> >> Clone Set: dlm-clone [dlm] >> >> Started: [ clvm-1 clvm-2 ] >> >> Clone Set: clvmd-clone [clvmd] >> >> Started: [ clvm-1 clvm-2 ] >> >> iscsi-stonith-device (stonith:fence_scsi): Stopped >> >> >> >> Failed actions: >> >> iscsi-stonith-device_start_0 on clvm-1 'unknown error' (1): call=40, >> status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:23 2015', >> queued=0ms, exec=1154ms >> >> iscsi-stonith-device_start_0 on clvm-2 'unknown error' (1): call=38, >> status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:26 2015', >> queued=0ms, exec=1161ms >> >> >> >> >> >> PCSD Status: >> >> clvm-1: Online >> >> clvm-2: Online >> >> >> >> Daemon Status: >> >> corosync: active/enabled >> >> pacemaker: active/enabled >> >> pcsd: active/enabled >> >> >> >> >> >> Would you please help me why iscsi-stonith device stopped, and how I can >> solve this issue. >> >> >> >> Br. >> >> >> >> Umar >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > -- > Umar Draz > Network Architect > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- .~. /V\ // \\ /( )\ ^`~'^ From bfields at fieldses.org Thu May 21 18:01:42 2015 From: bfields at fieldses.org (J. Bruce Fields) Date: Thu, 21 May 2015 14:01:42 -0400 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it> <20150513154519.GA2070@fieldses.org> Message-ID: <20150521180142.GA29163@fieldses.org> On Wed, May 13, 2015 at 07:38:03PM +0000, gianpietro sella wrote: > J. Bruce Fields fieldses.org> writes: > > > > > On Wed, May 13, 2015 at 01:06:17PM +0200, sella gianpietro wrote: > > > this is the inodes number in the exported folder of the volume > > > in the server before write file in the client: > > > > > > [root cld-blu-13 nova]# du --inodes > > > 2 . > > > > > > this is the used block: > > > > > > [root cld-blu-13 nova]# df -T > > > Filesystem Type 1K-blocks Used Available > > > Use% Mounted on > > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 1152845588 > > > 1% /nfscluster > > > > > > after write file in the client with umount/mount during writing: > > > > > > [root cld-blu-13 nova]# du --inodes > > > 3 . > > > > > > [root cld-blu-13 nova]# df -T > > > Filesystem Type 1K-blocks Used > > > Available Use% Mounted on > > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > > > 1131874068 2% /nfscluster > > > > > > thi is correct. > > > now delete file: > > > > > > [root cld-blu-13 nova]# du --inodes > > > 2 . > > > > > > the number of the inodes is correct (from 3 to 2). > > > > > > [root cld-blu-13 nova]# df -T > > > Filesystem Type 1K-blocks Used > > > Available Use% Mounted on > > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > > > 1131874068 2% /nfscluster > > > > > > the number of used block is not correct. > > > Do not return to initial value 33000 > > > > If you try "df -i", you'll probably also find that it gives the "wrong" > > result. 
(So, probably 3 inodes, though "du --inodes" is still only > > finding 2). > > > > --b. > > > > > the problem is that after delete file the inode go in the orphaned state: Yeah, that's consistent with everything else--we're not removing a dentry when we should for some reason, so the inode's staying referenced. --b. > > [root at cld-blu-13 nova]# tune2fs -l /dev/nfsclustervg/nfsclusterlv |grep -i inode > Filesystem features: has_journal ext_attr resize_inode dir_index > filetype needs_recovery sparse_super large_file > Inode count: 72097792 > Free inodes: 72097754 > Inodes per group: 8192 > Inode blocks per group: 512 > First inode: 11 > Inode size: 256 > Journal inode: 8 > First orphan inode: 53067783 > Journal backup: inode blocks > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gianpietro.sella at unipd.it Thu May 21 19:05:36 2015 From: gianpietro.sella at unipd.it (gianpietro.sella at unipd.it) Date: Thu, 21 May 2015 21:05:36 +0200 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: <20150521180142.GA29163@fieldses.org> References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it> <20150513154519.GA2070@fieldses.org> <20150521180142.GA29163@fieldses.org> Message-ID: <8467be50c20c996d71b0412b6e8a9677.squirrel@webmail.unipd.it> > On Wed, May 13, 2015 at 07:38:03PM +0000, gianpietro sella wrote: >> J. Bruce Fields fieldses.org> writes: >> >> > >> > On Wed, May 13, 2015 at 01:06:17PM +0200, sella gianpietro wrote: >> > > this is the inodes number in the exported folder of the volume >> > > in the server before write file in the client: >> > > >> > > [root cld-blu-13 nova]# du --inodes >> > > 2 . >> > > >> > > this is the used block: >> > > >> > > [root cld-blu-13 nova]# df -T >> > > Filesystem Type 1K-blocks Used >> Available >> > > Use% Mounted on >> > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 >> 1152845588 >> > > 1% /nfscluster >> > > >> > > after write file in the client with umount/mount during writing: >> > > >> > > [root cld-blu-13 nova]# du --inodes >> > > 3 . >> > > >> > > [root cld-blu-13 nova]# df -T >> > > Filesystem Type 1K-blocks Used >> > > Available Use% Mounted on >> > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 >> > > 1131874068 2% /nfscluster >> > > >> > > thi is correct. >> > > now delete file: >> > > >> > > [root cld-blu-13 nova]# du --inodes >> > > 2 . >> > > >> > > the number of the inodes is correct (from 3 to 2). >> > > >> > > [root cld-blu-13 nova]# df -T >> > > Filesystem Type 1K-blocks Used >> > > Available Use% Mounted on >> > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 >> > > 1131874068 2% /nfscluster >> > > >> > > the number of used block is not correct. >> > > Do not return to initial value 33000 >> > >> > If you try "df -i", you'll probably also find that it gives the >> "wrong" >> > result. (So, probably 3 inodes, though "du --inodes" is still only >> > finding 2). >> > >> > --b. >> > >> >> >> the problem is that after delete file the inode go in the orphaned >> state: > > Yeah, that's consistent with everything else--we're not removing a > dentry when we should for some reason, so the inode's staying > referenced. > > --b. > tanks Bruce. yes this is true. I use nfs cluster on 2 node for nova instances in openstack (the instamces are stored on nfs folder). 
the probability that I create an file before an failover and then I delete the file file after failover is very little. In this case I can execute an "mount -o remount" after the failover and delete command and the orpahned inode is deleted and the free disk space is ok. I do not understand who use the file after failover and delete command. After I delete the file I do not see process that use the deleted file. this is very strange. But my is just an curiosity. I think that the cause is the unmount operation on the failover node. >> >> [root at cld-blu-13 nova]# tune2fs -l /dev/nfsclustervg/nfsclusterlv |grep >> -i inode >> Filesystem features: has_journal ext_attr resize_inode dir_index >> filetype needs_recovery sparse_super large_file >> Inode count: 72097792 >> Free inodes: 72097754 >> Inodes per group: 8192 >> Inode blocks per group: 512 >> First inode: 11 >> Inode size: 256 >> Journal inode: 8 >> First orphan inode: 53067783 >> Journal backup: inode blocks >> >> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From unix.co at gmail.com Fri May 22 06:22:45 2015 From: unix.co at gmail.com (Umar Draz) Date: Fri, 22 May 2015 11:22:45 +0500 Subject: [Linux-cluster] R: iscsi-stonith-device stopped In-Reply-To: References: <20150521122011.D7CCBF706@kletz.unipd.it> Message-ID: Hi Thanks for your response, I will check about this scsi 3 supprot, Now I another question How i can remove the dead node from my cluster, I used this pcs cluster node remove leftnode but it wasn't working due to this error *(Error: pcsd is not running on leftnode)* Br. Umar On Thu, May 21, 2015 at 5:33 PM, emmanuel segura wrote: > I don't know if the fence_scsi is the same used in redhat cluster 5, > but i think that works with scsi reservation, so your disk need to > support scsi 3 > > 2015-05-21 14:28 GMT+02:00 Umar Draz : > > Hi, > > > > Yes I am using Centos 7, so it will not work on CentOS 7? > > > > Br. > > > > Umar > > > > On Thu, May 21, 2015 at 5:10 PM, sella gianpietro > > wrote: > >> > >> that operating system do you use? > >> > >> I used fence_scsi with centos 7.1 but do not start. > >> > >> https://access.redhat.com/solutions/1421063 > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> ________________________________ > >> > >> Da: linux-cluster-bounces at redhat.com > >> [mailto:linux-cluster-bounces at redhat.com] Per conto di Umar Draz > >> Inviato: gioved? 
On Thu, May 21, 2015 at 5:33 PM, emmanuel segura wrote:

> I don't know if the fence_scsi is the same used in redhat cluster 5,
> but i think that works with scsi reservation, so your disk need to
> support scsi 3
>
> 2015-05-21 14:28 GMT+02:00 Umar Draz:
> > Hi,
> >
> > Yes I am using Centos 7, so it will not work on CentOS 7?
> >
> > Br.
> >
> > Umar
> >
> > On Thu, May 21, 2015 at 5:10 PM, sella gianpietro wrote:
> >>
> >> that operating system do you use?
> >>
> >> I used fence_scsi with centos 7.1 but do not start.
> >>
> >> https://access.redhat.com/solutions/1421063
> >>
> >> ________________________________
> >>
> >> From: linux-cluster-bounces at redhat.com
> >> [mailto:linux-cluster-bounces at redhat.com] On behalf of Umar Draz
> >> Sent: Thursday, 21 May 2015 13:02
> >> To: linux-cluster at redhat.com
> >> Subject: [Linux-cluster] iscsi-stonith-device stopped
> >>
> >> Hi
> >>
> >> I have created a 2 node clvm cluster, everything apparently running fine,
> >> but when I did
> >>
> >> pcs status
> >>
> >> it always display this
> >>
> >> Clone Set: dlm-clone [dlm]
> >>     Started: [ clvm-1 clvm-2 ]
> >> Clone Set: clvmd-clone [clvmd]
> >>     Started: [ clvm-1 clvm-2 ]
> >> iscsi-stonith-device  (stonith:fence_scsi):   Stopped
> >>
> >> Failed actions:
> >>     iscsi-stonith-device_start_0 on clvm-1 'unknown error' (1): call=40,
> >>     status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:23 2015',
> >>     queued=0ms, exec=1154ms
> >>     iscsi-stonith-device_start_0 on clvm-2 'unknown error' (1): call=38,
> >>     status=Error, exit-reason='none', last-rc-change='Thu May 21 05:52:26 2015',
> >>     queued=0ms, exec=1161ms
> >>
> >> PCSD Status:
> >>   clvm-1: Online
> >>   clvm-2: Online
> >>
> >> Daemon Status:
> >>   corosync: active/enabled
> >>   pacemaker: active/enabled
> >>   pcsd: active/enabled
> >>
> >> Would you please help me why the iscsi-stonith device stopped, and how I can
> >> solve this issue.
> >>
> >> Br.
> >>
> >> Umar
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >
> > --
> > Umar Draz
> > Network Architect
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> .~.
> /V\
> // \\
> /(   )\
> ^`~'^
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Umar Draz
Network Architect

From tojeline at redhat.com  Wed May 27 10:54:59 2015
From: tojeline at redhat.com (Tomas Jelinek)
Date: Wed, 27 May 2015 12:54:59 +0200
Subject: [Linux-cluster] R: iscsi-stonith-device stopped
In-Reply-To:
References: <20150521122011.D7CCBF706@kletz.unipd.it>
Message-ID: <5565A283.9030505@redhat.com>

On 22.5.2015 08:22, Umar Draz wrote:
> Hi,
>
> Thanks for your response, I will check this SCSI-3 support. Now I have
> another question: how can I remove the dead node from my cluster? I used
>
> pcs cluster node remove leftnode
>
> but it did not work, because of this error: *(Error: pcsd is not running
> on leftnode)*

Hi,

You can remove it like this:
1. run 'pcs cluster localnode remove leftnode' on all remaining nodes
2. run 'pcs cluster reload corosync' on one remaining node
3. run 'crm_node -R leftnode --force' on one remaining node

Tomas
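
For reference, the three steps above could be scripted roughly as follows; clvm-1/clvm-2 stand in for the surviving nodes from the earlier pcs status output, leftnode for the dead node, and ssh is just one possible way to run the commands on each survivor:

DEAD=leftnode
SURVIVORS="clvm-1 clvm-2"
for n in $SURVIVORS; do
    ssh "$n" "pcs cluster localnode remove $DEAD"   # step 1, on every remaining node
done
ssh clvm-1 "pcs cluster reload corosync"            # step 2, on one remaining node
ssh clvm-1 "crm_node -R $DEAD --force"              # step 3, on one remaining node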
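
Coming back to the stopped iscsi-stonith-device at the start of this thread: a fence_scsi stonith resource on a Pacemaker cluster is typically defined along the lines of the sketch below. This is not the poster's actual configuration; the device path is a placeholder for the shared iSCSI disk, and the agent will only start if that LUN really supports SCSI-3 persistent reservations, as discussed above.

pcs stonith create iscsi-stonith-device fence_scsi \
    pcmk_host_list="clvm-1 clvm-2" \
    devices="/dev/mapper/mpathX" \
    meta provides=unfencing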