From nagemnna at gmail.com Wed Jul 1 02:51:16 2015 From: nagemnna at gmail.com (Megan .) Date: Tue, 30 Jun 2015 22:51:16 -0400 Subject: [Linux-cluster] new cluster setup error Message-ID: Good Evening! Anyone seen this before? I just setup these boxes and i'm trying to create a new cluster. I set the ricci password on all of the nodes, started ricci. I try to create cluster and i get the below. Thanks! Centos 6.6 2.6.32-504.23.4.el6.x86_64 ccs-0.16.2-75.el6_6.2.x86_64 ricci-0.16.2-75.el6_6.1.x86_64 cman-3.0.12.1-68.el6.x86_64 [root at admin1-dit cluster]# ccs --createcluster test Traceback (most recent call last): File "/usr/sbin/ccs", line 2450, in main(sys.argv[1:]) File "/usr/sbin/ccs", line 286, in main if (createcluster): create_cluster(clustername) File "/usr/sbin/ccs", line 939, in create_cluster elif get_cluster_conf_xml() != f.read(): File "/usr/sbin/ccs", line 884, in get_cluster_conf_xml xml = send_ricci_command("cluster", "get_cluster.conf") File "/usr/sbin/ccs", line 2340, in send_ricci_command dom = minidom.parseString(res[1].replace('\t','')) File "/usr/lib64/python2.6/xml/dom/minidom.py", line 1928, in parseString return expatbuilder.parseString(string) File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 940, in parseString return builder.parseString(string) File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 223, in parseString parser.Parse(string, True) xml.parsers.expat.ExpatError: no element found: line 1, column 0 [root at admin1-dit cluster]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Wed Jul 1 02:55:10 2015 From: lists at alteeve.ca (Digimer) Date: Tue, 30 Jun 2015 22:55:10 -0400 Subject: [Linux-cluster] new cluster setup error In-Reply-To: References: Message-ID: <5593568E.2020708@alteeve.ca> On 30/06/15 10:51 PM, Megan . wrote: > Good Evening! > > Anyone seen this before? I just setup these boxes and i'm trying to > create a new cluster. I set the ricci password on all of the nodes, > started ricci. I try to create cluster and i get the below. > > Thanks! > > > Centos 6.6 > 2.6.32-504.23.4.el6.x86_64 > > ccs-0.16.2-75.el6_6.2.x86_64 > ricci-0.16.2-75.el6_6.1.x86_64 > cman-3.0.12.1-68.el6.x86_64 > > [root at admin1-dit cluster]# ccs --createcluster test > > Traceback (most recent call last): > File "/usr/sbin/ccs", line 2450, in > main(sys.argv[1:]) > File "/usr/sbin/ccs", line 286, in main > if (createcluster): create_cluster(clustername) > File "/usr/sbin/ccs", line 939, in create_cluster > elif get_cluster_conf_xml() != f.read(): > File "/usr/sbin/ccs", line 884, in get_cluster_conf_xml > xml = send_ricci_command("cluster", "get_cluster.conf") > File "/usr/sbin/ccs", line 2340, in send_ricci_command > dom = minidom.parseString(res[1].replace('\t','')) > File "/usr/lib64/python2.6/xml/dom/minidom.py", line 1928, in parseString > return expatbuilder.parseString(string) > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 940, in > parseString > return builder.parseString(string) > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 223, in > parseString > parser.Parse(string, True) > xml.parsers.expat.ExpatError: no element found: line 1, column 0 Are the ricci and modclusterd daemons running? Does your firewall allow TCP ports 11111 and 16851 between nodes? Does the file /etc/cluster/cluster.conf exist and, if so, does 'ls -lahZ' show: -rw-r-----. 
root root system_u:object_r:cluster_conf_t:s0 /etc/cluster/cluster.conf

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

From swhiteho at redhat.com Wed Jul 1 08:06:21 2015
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 01 Jul 2015 09:06:21 +0100
Subject: [Linux-cluster] Finding the bottleneck between SAN and GFS2
In-Reply-To: <8761653s3s.fsf@hati.baby-gnu.org>
References: <8761653s3s.fsf@hati.baby-gnu.org>
Message-ID: <55939F7D.2060201@redhat.com>

Hi,

On 30/06/15 20:37, Daniel Dehennin wrote:
> Hello,
>
> We are experiencing slow VMs on our OpenNebula architecture:
>
> - two Dell PowerEdge M620
>   + Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
>   + 96GB RAM
>   + 2x146GB SAS drives
>
> - 2TB SAN LUN to store qcow2 images with GFS2 over cLVM
>
> We ran some tests, installing Linux OSes in parallel, and did not find
> any performance issues.
>
> For the last 3 weeks, 17 users have been running ~60 VMs and everything
> has become slow.
>
> The SAN administrator complains about very high IO/s, so we limited each
> VM to 80 IO/s with the libvirt configuration
>
> #+begin_src xml
> <total_iops_sec>80</total_iops_sec>
> #+end_src
>
> but it did not get better.
>
> Today I ran some benchmarks to try to find out what happens.
>
> Checking plocks/s
> =================
>
> I started with ping_pong[1] to see how many locks per second the GFS2
> can sustain.
>
> I used it as described on the samba wiki[2]; here are the results:
>
> - starting "ping_pong /var/lib/one/datastores/test_plock 3" on the first
>   node displays around 4k plocks/s
>
> - then starting "ping_pong /var/lib/one/datastores/test_plock 3" on the
>   second node displays around 2k on each node
>
> For the single-node process I was expecting a much higher rate; they
> speak of 500k to 1M locks/s.
>
> Do my numbers look strange?
>
> Checking fileio
> ===============
>
> I used "sysbench --test=fileio" to check inside the VM and outside (on
> the bare metal node), with files in cache or caches dropped.
>
> The short result is that bare metal access to the GFS2 without any cache
> is terribly slow, around 2Mb/s and 90 requests/s.
>
> Is there a way to find out if the problem comes from my
> GFS2/corosync/pacemaker configuration or from the SAN?
>
> Regards.
>
>
> Following are the full sysbench results:
>
> In the VM, qemu disk cache disabled, total_iops_sec = 0
> -------------------------------------------------------
>
> I tried with the IO limit but the difference is minimal:
>
> - the requests/s drop to ~80
> - the throughput is around 1.2Mb/s
>
> root at vm:~# sysbench --num-threads=16 --test=fileio --file-total-size=9G --file-test-mode=rndrw prepare
> sysbench 0.4.12: multi-threaded system evaluation benchmark
>
> 128 files, 73728Kb each, 9216Mb total
> Creating files for the test...
>
> root at vm:~# sysbench --num-threads=16 --test=fileio --file-total-size=9G --file-test-mode=rndrw run
> sysbench 0.4.12: multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 16
>
> Extra file open flags: 0
> 128 files, 72Mb each
> 9Gb total file size
> Block size 16Kb
> Number of random requests for random IO: 10000
> Read/Write ratio for combined random IO test: 1.50
> Periodic FSYNC enabled, calling fsync() each 100 requests.
> Calling fsync() at the end of test, Enabled.
> Using synchronous I/O mode
> Doing random r/w test
> Threads started!
> Done.
> > Operations performed: 6034 Read, 4019 Write, 12808 Other = 22861 Total > Read 94.281Mb Written 62.797Mb Total transferred 157.08Mb (1.4318Mb/sec) > 91.64 Requests/sec executed > > Test execution summary: > total time: 109.7050s > total number of events: 10053 > total time taken by event execution: 464.7600 > per-request statistics: > min: 0.01ms > avg: 46.23ms > max: 11488.59ms > approx. 95 percentile: 125.81ms > > Threads fairness: > events (avg/stddev): 628.3125/59.81 > execution time (avg/stddev): 29.0475/6.34 > > On the bare metal node, with the caches dropped > ----------------------------------------------- > > After creating the 128 files, I drop the caches to get ?from SAN? results. > > root at nebula1:/var/lib/one/datastores/bench# sysbench --num-threads=16 --test=fileio --file-total-size=9G --file-test-mode=rndrw prepare > sysbench 0.4.12: multi-threaded system evaluation benchmark > > 128 files, 73728Kb each, 9216Mb total > Creating files for the test... > > # DROP CACHES > root at nebula1: echo 3 > /proc/sys/vm/drop_caches > > root at nebula1:/var/lib/one/datastores/bench# sysbench --num-threads=16 --test=fileio --file-total-size=9G --file-test-mode=rndrw run > sysbench 0.4.12: multi-threaded system evaluation benchmark > > Running the test with following options: > Number of threads: 16 > > Extra file open flags: 0 > 128 files, 72Mb each > 9Gb total file size > Block size 16Kb > Number of random requests for random IO: 10000 > Read/Write ratio for combined random IO test: 1.50 > Periodic FSYNC enabled, calling fsync() each 100 requests. > Calling fsync() at the end of test, Enabled. > Using synchronous I/O mode > Doing random r/w test > Threads started! > Done. > > Operations performed: 6013 Read, 3999 Write, 12800 Other = 22812 Total > Read 93.953Mb Written 62.484Mb Total transferred 156.44Mb (1.5465Mb/sec) > 98.98 Requests/sec executed > > Test execution summary: > total time: 101.1559s > total number of events: 10012 > total time taken by event execution: 1109.0862 > per-request statistics: > min: 0.01ms > avg: 110.78ms > max: 13098.27ms > approx. 95 percentile: 164.52ms > > Threads fairness: > events (avg/stddev): 625.7500/114.50 > execution time (avg/stddev): 69.3179/6.54 > > > On the bare metal node, with the test files filled in the cache > --------------------------------------------------------------- > > I run md5sum on all the files to let the kernel cache them. > > # Load files in cache > root at nebula1:/var/lib/one/datastores/bench# md5sum test* > > root at nebula1:/var/lib/one/datastores/bench# sysbench --num-threads=16 --test=fileio --file-total-size=9G --file-test-mode=rndrw run > sysbench 0.4.12: multi-threaded system evaluation benchmark > > Running the test with following options: > Number of threads: 16 > > Extra file open flags: 0 > 128 files, 72Mb each > 9Gb total file size > Block size 16Kb > Number of random requests for random IO: 10000 > Read/Write ratio for combined random IO test: 1.50 > Periodic FSYNC enabled, calling fsync() each 100 requests. > Calling fsync() at the end of test, Enabled. > Using synchronous I/O mode > Doing random r/w test > Threads started! > Done. 
> > Operations performed: 6069 Read, 4061 Write, 12813 Other = 22943 Total > Read 94.828Mb Written 63.453Mb Total transferred 158.28Mb (54.896Mb/sec) > 3513.36 Requests/sec executed > > Test execution summary: > total time: 2.8833s > total number of events: 10130 > total time taken by event execution: 16.3824 > per-request statistics: > min: 0.01ms > avg: 1.62ms > max: 760.53ms > approx. 95 percentile: 5.51ms > > Threads fairness: > events (avg/stddev): 633.1250/146.90 > execution time (avg/stddev): 1.0239/0.33 > > > Footnotes: > [1] https://git.samba.org/?p=ctdb.git;a=blob;f=utils/ping_pong/ping_pong.c > > [2] https://wiki.samba.org/index.php/Ping_pong > > > Why are you using the ping_pong test? Does qemu use fcntl locks? Are you trying to share any of those images across nodes? (i.e. mounted on more than one node at once?) What is the raw speed of the block device? I'd also suggest checking the files that are created to see if they are being fragmented (the filefrag tool will tell you) in case that is the problem? Steve. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.dehennin at baby-gnu.org Wed Jul 1 09:02:52 2015 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Wed, 01 Jul 2015 11:02:52 +0200 Subject: [Linux-cluster] Finding the bottleneck between SAN and GFS2 In-Reply-To: <55939F7D.2060201@redhat.com> (Steven Whitehouse's message of "Wed, 01 Jul 2015 09:06:21 +0100") References: <8761653s3s.fsf@hati.baby-gnu.org> <55939F7D.2060201@redhat.com> Message-ID: <87zj3g2qtf.fsf@hati.baby-gnu.org> Steven Whitehouse writes: > Why are you using the ping_pong test? Does qemu use fcntl locks? Are > you trying to share any of those images across nodes? (i.e. mounted on > more than one node at once?) No, but I didn't know what and where to look, I test it if it could be relevant. > What is the raw speed of the block device? I'd also suggest checking > the files that are created to see if they are being fragmented (the > filefrag tool will tell you) in case that is the problem? Erf, I didn't check before before using it as LVM PV. Now I can only test through GFS2-cLVM-multipath. I'll see if I can have another LUN to check the raw speed of the block device. Regards. -- Daniel Dehennin R?cup?rer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL: From nagemnna at gmail.com Wed Jul 1 10:53:44 2015 From: nagemnna at gmail.com (Megan .) Date: Wed, 1 Jul 2015 06:53:44 -0400 Subject: [Linux-cluster] new cluster setup error In-Reply-To: <5593568E.2020708@alteeve.ca> References: <5593568E.2020708@alteeve.ca> Message-ID: Thanks for your help. I shutdown iptables, turned down selinux for now, ricci is up, modclusterd is up. Still get the same error as before. Right now i do not have a cluster.conf but if i put one that i generate manually there i still get the same error. [root at admin1-dit ~]# getenforce Permissive [root at admin1-dit ~]# service iptables status iptables: Firewall is not running. [root at admin1-dit ~]# pstree -paul | grep ricci |-ricci,6779,ricci -u ricci | |-grep,6853 ricci [root at admin1-dit ~]# pstree -paul | grep mod |-modclusterd,6815 | |-{modclusterd},6816 | `-{modclusterd},6817 | |-grep,6855 mod [root at admin1-dit ~]# On Tue, Jun 30, 2015 at 10:55 PM, Digimer wrote: > On 30/06/15 10:51 PM, Megan . 
wrote: > > Good Evening! > > > > Anyone seen this before? I just setup these boxes and i'm trying to > > create a new cluster. I set the ricci password on all of the nodes, > > started ricci. I try to create cluster and i get the below. > > > > Thanks! > > > > > > Centos 6.6 > > 2.6.32-504.23.4.el6.x86_64 > > > > ccs-0.16.2-75.el6_6.2.x86_64 > > ricci-0.16.2-75.el6_6.1.x86_64 > > cman-3.0.12.1-68.el6.x86_64 > > > > [root at admin1-dit cluster]# ccs --createcluster test > > > > Traceback (most recent call last): > > File "/usr/sbin/ccs", line 2450, in > > main(sys.argv[1:]) > > File "/usr/sbin/ccs", line 286, in main > > if (createcluster): create_cluster(clustername) > > File "/usr/sbin/ccs", line 939, in create_cluster > > elif get_cluster_conf_xml() != f.read(): > > File "/usr/sbin/ccs", line 884, in get_cluster_conf_xml > > xml = send_ricci_command("cluster", "get_cluster.conf") > > File "/usr/sbin/ccs", line 2340, in send_ricci_command > > dom = minidom.parseString(res[1].replace('\t','')) > > File "/usr/lib64/python2.6/xml/dom/minidom.py", line 1928, in > parseString > > return expatbuilder.parseString(string) > > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 940, in > > parseString > > return builder.parseString(string) > > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 223, in > > parseString > > parser.Parse(string, True) > > xml.parsers.expat.ExpatError: no element found: line 1, column 0 > > Are the ricci and modclusterd daemons running? Does your firewall allow > TCP ports 11111 and 16851 between nodes? Does the file > /etc/cluster/cluster.conf exist and, if so, does 'ls -lahZ' show: > > -rw-r-----. root root system_u:object_r:cluster_conf_t:s0 > /etc/cluster/cluster.conf > > -- > Digimer > Papers and Projects: https://alteeve.ca/w/ > What if the cure for cancer is trapped in the mind of a person without > access to education? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nagemnna at gmail.com Wed Jul 1 12:35:53 2015 From: nagemnna at gmail.com (Megan .) Date: Wed, 1 Jul 2015 08:35:53 -0400 Subject: [Linux-cluster] new cluster setup error In-Reply-To: References: <5593568E.2020708@alteeve.ca> Message-ID: I started ricci in debug mode and I get the below error. Any idea where its trying to open a temp file? as far as i can see everything in /var/lib/ricci is good [root at admin1-dit init.d]# ricci -u ricci -df failed to load authorized CAs failed to load authorized CAs client added ClientInstance.cpp:145: exception: unable to open temp file request completed in 8 milliseconds client removed On Wed, Jul 1, 2015 at 6:53 AM, Megan . wrote: > Thanks for your help. I shutdown iptables, turned down selinux for now, > ricci is up, modclusterd is up. Still get the same error as before. Right > now i do not have a cluster.conf but if i put one that i generate manually > there i still get the same error. > > [root at admin1-dit ~]# getenforce > > Permissive > > [root at admin1-dit ~]# service iptables status > > iptables: Firewall is not running. 
> > [root at admin1-dit ~]# pstree -paul | grep ricci > > |-ricci,6779,ricci -u ricci > > | |-grep,6853 ricci > > [root at admin1-dit ~]# pstree -paul | grep mod > > |-modclusterd,6815 > > | |-{modclusterd},6816 > > | `-{modclusterd},6817 > > | |-grep,6855 mod > > [root at admin1-dit ~]# > > On Tue, Jun 30, 2015 at 10:55 PM, Digimer wrote: > >> On 30/06/15 10:51 PM, Megan . wrote: >> > Good Evening! >> > >> > Anyone seen this before? I just setup these boxes and i'm trying to >> > create a new cluster. I set the ricci password on all of the nodes, >> > started ricci. I try to create cluster and i get the below. >> > >> > Thanks! >> > >> > >> > Centos 6.6 >> > 2.6.32-504.23.4.el6.x86_64 >> > >> > ccs-0.16.2-75.el6_6.2.x86_64 >> > ricci-0.16.2-75.el6_6.1.x86_64 >> > cman-3.0.12.1-68.el6.x86_64 >> > >> > [root at admin1-dit cluster]# ccs --createcluster test >> > >> > Traceback (most recent call last): >> > File "/usr/sbin/ccs", line 2450, in >> > main(sys.argv[1:]) >> > File "/usr/sbin/ccs", line 286, in main >> > if (createcluster): create_cluster(clustername) >> > File "/usr/sbin/ccs", line 939, in create_cluster >> > elif get_cluster_conf_xml() != f.read(): >> > File "/usr/sbin/ccs", line 884, in get_cluster_conf_xml >> > xml = send_ricci_command("cluster", "get_cluster.conf") >> > File "/usr/sbin/ccs", line 2340, in send_ricci_command >> > dom = minidom.parseString(res[1].replace('\t','')) >> > File "/usr/lib64/python2.6/xml/dom/minidom.py", line 1928, in >> parseString >> > return expatbuilder.parseString(string) >> > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 940, in >> > parseString >> > return builder.parseString(string) >> > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 223, in >> > parseString >> > parser.Parse(string, True) >> > xml.parsers.expat.ExpatError: no element found: line 1, column 0 >> >> Are the ricci and modclusterd daemons running? Does your firewall allow >> TCP ports 11111 and 16851 between nodes? Does the file >> /etc/cluster/cluster.conf exist and, if so, does 'ls -lahZ' show: >> >> -rw-r-----. root root system_u:object_r:cluster_conf_t:s0 >> /etc/cluster/cluster.conf >> >> -- >> Digimer >> Papers and Projects: https://alteeve.ca/w/ >> What if the cure for cancer is trapped in the mind of a person without >> access to education? >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Wed Jul 1 12:54:19 2015 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 1 Jul 2015 08:54:19 -0400 (EDT) Subject: [Linux-cluster] Finding the bottleneck between SAN and GFS2 In-Reply-To: <8761653s3s.fsf@hati.baby-gnu.org> References: <8761653s3s.fsf@hati.baby-gnu.org> Message-ID: <2128530162.28648539.1435755259681.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hello, > > We are experiencing slow VMs on our OpenNebula architecture: (snip) > The short result is that bare metal access to the GFS2 without any cache > is terribly slow, around 2Mb/s and 90 requests/s. > > Is there a way to find out if the problem comes from my > GFS2/corosync/pacemaker configuration or from the SAN? > > Regards. Hi Daniel, Diagnosing and solving GFS2 performance problems is a long and complex thing. Many things can be causing the slowdown, but there's usually a bottleneck or two that need to be identified and solved. 
We have a lot of tools in our arsenal to do this, so it's not something
that can be easily explained. There are several articles on the Red Hat
Customer Portal if you're a customer. For example:
https://access.redhat.com/articles/628093

If you don't have access to the portal, you can try to determine your
bottleneck by performing some tests yourself, such as:

1. Use filefrag to see if your file system is severely fragmented.
2. Raw speed of the device without any file system (as Steve Whitehouse
   suggested)
3. Test the throughput of the same device using a different file system.
4. Test the network throughput
5. DLM throughput (via dlm_klock)
6. See if you're running out of memory and swapping (via top)
7. See if there is GFS2 glock contention (via glocktop)
   http://people.redhat.com/rpeterso/Experimental/*/glocktop
   Source code in: http://people.redhat.com/rpeterso/Tools/
8. NUMA related problems. There's a good recent two-hour talk about
   performance tuning and NUMA from this year's Red Hat Summit:
   https://www.youtube.com/watch?v=ckarvGJE8Qc
9. Slowdowns due to small block sizes (4K is recommended)
10. Slowdowns due to journals being too small (128MB is recommended)
11. Slowdowns due to resource groups being too big (avoid 2GB rgrps for now).
12. Slowdowns due to backups pushing all the dlm lock masters to one node.
    (We have specific backup recommendations to avoid this. Or you can
    unmount after doing a backup).
13. Check how many glocks are in slab (slabtop and such)
14. Check if you're CPU bound ("perf top -g" and such)

I'd like to add that Red Hat has done a tremendous amount of work to speed
up GFS2 in the past couple years. We drastically reduced fragmentation.
We've added an Orlov block allocator. We've done a lot of things and newer
is better. Newer versions of RHEL6 are going to be faster than older RHEL6.
Older RHEL6 is going to be faster than RHEL5. RHEL7 is faster than RHEL6,
and so on. That's because we tend to focus our development efforts on the
newest release and don't often port performance improvements back to older
releases.

But like many things, GFS2 is only as strong as its weakest link, so you
need to identify what the weakest link is.

I hope this helps.

Regards,

Bob Peterson
Red Hat File Systems

From nagemnna at gmail.com Wed Jul 1 12:58:40 2015
From: nagemnna at gmail.com (Megan .)
Date: Wed, 1 Jul 2015 08:58:40 -0400
Subject: [Linux-cluster] new cluster setup error
In-Reply-To:
References: <5593568E.2020708@alteeve.ca>
Message-ID:

Got it. didn't have the correct perms on /tmp

On Wed, Jul 1, 2015 at 8:35 AM, Megan . wrote:

> I started ricci in debug mode and I get the below error. Any idea where
> its trying to open a temp file? as far as i can see everything in
> /var/lib/ricci is good
>
> [root at admin1-dit init.d]# ricci -u ricci -df
>
> failed to load authorized CAs
>
> failed to load authorized CAs
>
> client added
>
> ClientInstance.cpp:145: exception: unable to open temp file
>
> request completed in 8 milliseconds
>
> client removed
>
>
> On Wed, Jul 1, 2015 at 6:53 AM, Megan . wrote:
>
>> Thanks for your help. I shutdown iptables, turned down selinux for now,
>> ricci is up, modclusterd is up. Still get the same error as before. Right
>> now i do not have a cluster.conf but if i put one that i generate manually
>> there i still get the same error.
>>
>> [root at admin1-dit ~]# getenforce
>>
>> Permissive
>>
>> [root at admin1-dit ~]# service iptables status
>>
>> iptables: Firewall is not running.
>> >> [root at admin1-dit ~]# pstree -paul | grep ricci >> >> |-ricci,6779,ricci -u ricci >> >> | |-grep,6853 ricci >> >> [root at admin1-dit ~]# pstree -paul | grep mod >> >> |-modclusterd,6815 >> >> | |-{modclusterd},6816 >> >> | `-{modclusterd},6817 >> >> | |-grep,6855 mod >> >> [root at admin1-dit ~]# >> >> On Tue, Jun 30, 2015 at 10:55 PM, Digimer wrote: >> >>> On 30/06/15 10:51 PM, Megan . wrote: >>> > Good Evening! >>> > >>> > Anyone seen this before? I just setup these boxes and i'm trying to >>> > create a new cluster. I set the ricci password on all of the nodes, >>> > started ricci. I try to create cluster and i get the below. >>> > >>> > Thanks! >>> > >>> > >>> > Centos 6.6 >>> > 2.6.32-504.23.4.el6.x86_64 >>> > >>> > ccs-0.16.2-75.el6_6.2.x86_64 >>> > ricci-0.16.2-75.el6_6.1.x86_64 >>> > cman-3.0.12.1-68.el6.x86_64 >>> > >>> > [root at admin1-dit cluster]# ccs --createcluster test >>> > >>> > Traceback (most recent call last): >>> > File "/usr/sbin/ccs", line 2450, in >>> > main(sys.argv[1:]) >>> > File "/usr/sbin/ccs", line 286, in main >>> > if (createcluster): create_cluster(clustername) >>> > File "/usr/sbin/ccs", line 939, in create_cluster >>> > elif get_cluster_conf_xml() != f.read(): >>> > File "/usr/sbin/ccs", line 884, in get_cluster_conf_xml >>> > xml = send_ricci_command("cluster", "get_cluster.conf") >>> > File "/usr/sbin/ccs", line 2340, in send_ricci_command >>> > dom = minidom.parseString(res[1].replace('\t','')) >>> > File "/usr/lib64/python2.6/xml/dom/minidom.py", line 1928, in >>> parseString >>> > return expatbuilder.parseString(string) >>> > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 940, in >>> > parseString >>> > return builder.parseString(string) >>> > File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 223, in >>> > parseString >>> > parser.Parse(string, True) >>> > xml.parsers.expat.ExpatError: no element found: line 1, column 0 >>> >>> Are the ricci and modclusterd daemons running? Does your firewall allow >>> TCP ports 11111 and 16851 between nodes? Does the file >>> /etc/cluster/cluster.conf exist and, if so, does 'ls -lahZ' show: >>> >>> -rw-r-----. root root system_u:object_r:cluster_conf_t:s0 >>> /etc/cluster/cluster.conf >>> >>> -- >>> Digimer >>> Papers and Projects: https://alteeve.ca/w/ >>> What if the cure for cancer is trapped in the mind of a person without >>> access to education? >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpokorny at redhat.com Wed Jul 1 18:41:13 2015 From: jpokorny at redhat.com (Jan =?utf-8?Q?Pokorn=C3=BD?=) Date: Wed, 1 Jul 2015 20:41:13 +0200 Subject: [Linux-cluster] new cluster setup error In-Reply-To: References: <5593568E.2020708@alteeve.ca> Message-ID: <20150701184113.GC16450@redhat.com> Hello Megan once again, On 01/07/15 08:58 -0400, Megan . wrote: > Got it. didn't have the correct perms on /tmp glad to hear you've found out the cause. > On Wed, Jul 1, 2015 at 8:35 AM, Megan . wrote: >> On Wed, Jul 1, 2015 at 6:53 AM, Megan . wrote: >>> On Tue, Jun 30, 2015 at 10:55 PM, Digimer wrote: >>>> On 30/06/15 10:51 PM, Megan . wrote: >>>>> Anyone seen this before? I just setup these boxes and i'm trying to >>>>> create a new cluster. I set the ricci password on all of the nodes, >>>>> started ricci. I try to create cluster and i get the below. 
>>>>> [root at admin1-dit cluster]# ccs --createcluster test Best idea in general is to attempt running the failing command with increased/debug verbosity, with logging enabled and the like. For ccs, it is as easy as adding "-d" or "--debug" option. Admittedly, the traceback provides good hints in this case. >>>>> Traceback (most recent call last): >>>>> File "/usr/sbin/ccs", line 2450, in >>>>> main(sys.argv[1:]) >>>>> File "/usr/sbin/ccs", line 286, in main >>>>> if (createcluster): create_cluster(clustername) >>>>> File "/usr/sbin/ccs", line 939, in create_cluster >>>>> elif get_cluster_conf_xml() != f.read(): >>>>> File "/usr/sbin/ccs", line 884, in get_cluster_conf_xml >>>>> xml = send_ricci_command("cluster", "get_cluster.conf") >>>>> File "/usr/sbin/ccs", line 2340, in send_ricci_command >>>>> dom = minidom.parseString(res[1].replace('\t','')) >>>>> File "/usr/lib64/python2.6/xml/dom/minidom.py", line 1928, in >>>> parseString >>>>> return expatbuilder.parseString(string) >>>>> File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 940, in >>>>> parseString >>>>> return builder.parseString(string) >>>>> File "/usr/lib64/python2.6/xml/dom/expatbuilder.py", line 223, in >>>>> parseString >>>>> parser.Parse(string, True) >>>>> xml.parsers.expat.ExpatError: no element found: line 1, column 0 I've filed a bug to avoid unnecessary dying with a traceback: https://bugzilla.redhat.com/show_bug.cgi?id=1238392 The probability of being worked on is low at this point, though. -- Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From bfields at fieldses.org Wed Jul 8 18:15:30 2015 From: bfields at fieldses.org (J. Bruce Fields) Date: Wed, 8 Jul 2015 14:15:30 -0400 Subject: [Linux-cluster] R: nfs cluster, problem with delete file in the failover case In-Reply-To: <8467be50c20c996d71b0412b6e8a9677.squirrel@webmail.unipd.it> References: <20150512152517.GB6370@fieldses.org> <20150513111531.3CFB11F31@mydoom.unipd.it> <20150513154519.GA2070@fieldses.org> <20150521180142.GA29163@fieldses.org> <8467be50c20c996d71b0412b6e8a9677.squirrel@webmail.unipd.it> Message-ID: <20150708181530.GA19084@fieldses.org> On Thu, May 21, 2015 at 09:05:36PM +0200, gianpietro.sella at unipd.it wrote: > > On Wed, May 13, 2015 at 07:38:03PM +0000, gianpietro sella wrote: > >> J. Bruce Fields fieldses.org> writes: > >> > >> > > >> > On Wed, May 13, 2015 at 01:06:17PM +0200, sella gianpietro wrote: > >> > > this is the inodes number in the exported folder of the volume > >> > > in the server before write file in the client: > >> > > > >> > > [root cld-blu-13 nova]# du --inodes > >> > > 2 . > >> > > > >> > > this is the used block: > >> > > > >> > > [root cld-blu-13 nova]# df -T > >> > > Filesystem Type 1K-blocks Used > >> Available > >> > > Use% Mounted on > >> > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 33000 > >> 1152845588 > >> > > 1% /nfscluster > >> > > > >> > > after write file in the client with umount/mount during writing: > >> > > > >> > > [root cld-blu-13 nova]# du --inodes > >> > > 3 . > >> > > > >> > > [root cld-blu-13 nova]# df -T > >> > > Filesystem Type 1K-blocks Used > >> > > Available Use% Mounted on > >> > > /dev/mapper/nfsclustervg-nfsclusterlv xfs 1152878588 21004520 > >> > > 1131874068 2% /nfscluster > >> > > > >> > > thi is correct. > >> > > now delete file: > >> > > > >> > > [root cld-blu-13 nova]# du --inodes > >> > > 2 . 
> >> > >
> >> > > the number of the inodes is correct (from 3 to 2).
> >> > >
> >> > > [root cld-blu-13 nova]# df -T
> >> > > Filesystem                            Type       1K-blocks     Used  Available Use% Mounted on
> >> > > /dev/mapper/nfsclustervg-nfsclusterlv xfs       1152878588 21004520 1131874068   2% /nfscluster
> >> > >
> >> > > the number of used blocks is not correct.
> >> > > It does not return to the initial value 33000.
> >> >
> >> > If you try "df -i", you'll probably also find that it gives the
> >> > "wrong" result. (So, probably 3 inodes, though "du --inodes" is
> >> > still only finding 2).
> >> >
> >> > --b.
> >> >
> >>
> >>
> >> the problem is that after deleting the file the inode goes into the
> >> orphaned state:
> >
> > Yeah, that's consistent with everything else--we're not removing a
> > dentry when we should for some reason, so the inode's staying
> > referenced.
> >
> > --b.
> >
>
> thanks Bruce.
> yes, this is true.
> I use an NFS cluster on 2 nodes for Nova instances in OpenStack (the
> instances are stored on the NFS folder).
> the probability that I create a file before a failover and then delete
> the file after the failover is very low.
> In this case I can execute a "mount -o remount" after the failover and
> delete command, and the orphaned inode is deleted and the free disk space
> is ok.
> I do not understand who uses the file after the failover and delete command.
> After I delete the file I do not see any process that uses the deleted file.
> this is very strange.
> But mine is just a curiosity.
> I think that the cause is the unmount operation on the failover node.

Apologies for the delayed response. In the process of debugging something
else we ran into a problem that looks like yours. Is it possible for you
to test a kernel patch? If so, it would be interesting to know if your
problem is still reproducible after this patch:

http://marc.info/?l=linux-fsdevel&m=143631977822355&w=2

also appended.

--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 7a3f3e5..5c8ea15 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -642,7 +642,7 @@ static inline bool fast_dput(struct dentry *dentry)
 	/*
 	 * If we have a d_op->d_delete() operation, we sould not
-	 * let the dentry count go to zero, so use "put__or_lock".
+	 * let the dentry count go to zero, so use "put_or_lock".
 	 */
 	if (unlikely(dentry->d_flags & DCACHE_OP_DELETE))
 		return lockref_put_or_lock(&dentry->d_lockref);
@@ -697,7 +697,7 @@ static inline bool fast_dput(struct dentry *dentry)
 	 */
 	smp_rmb();
 	d_flags = ACCESS_ONCE(dentry->d_flags);
-	d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST;
+	d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;

 	/* Nothing to do? Dropping the reference was all we needed? */
 	if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && !d_unhashed(dentry))
@@ -776,6 +776,9 @@ repeat:
 	if (unlikely(d_unhashed(dentry)))
 		goto kill_it;

+	if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
+		goto kill_it;
+
 	if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
 		if (dentry->d_op->d_delete(dentry))
 			goto kill_it;
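
For anyone retesting after applying the patch above, the following is a
minimal client-side sketch of the df/du comparison used earlier in this
thread to spot the orphaned-inode symptom. The mount point /mnt/nfs, the
test file name and the 1G size are placeholders rather than values taken
from the thread, and the failover step is triggered manually where
indicated.

#!/bin/bash
# Minimal sketch of the before/after comparison described in this thread.
# MNT, the test file name and the 1G size are placeholders; adjust them
# for the actual NFS client mount of the clustered export.
MNT=/mnt/nfs

echo "== before =="
df -k "$MNT"            # block usage as reported by the NFS server
du --inodes "$MNT"      # inodes visible under the mount

dd if=/dev/zero of="$MNT/testfile" bs=1M count=1024 conv=fsync
# (optionally fail the NFS service over to the other node at this point)
rm -f "$MNT/testfile"
sync

echo "== after delete =="
df -k "$MNT"            # should drop back close to the "before" value
du --inodes "$MNT"      # if this is back to the old count while df still
                        # shows roughly 1G in use, the inode is orphaned;
                        # a "mount -o remount" of the export was reported
                        # above to release it

Because df on an NFS client reports the usage of the exported server
filesystem, the whole check can be run from the client; an inode count
that returns to its old value while df keeps showing the extra space in
use matches the symptom described above.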