[Linux-cluster] Linux-cluster Digest, Vol 92, Issue 19

SATHYA - IT sathyanarayanan.varadharajan at precisionit.co.in
Wed Dec 28 03:42:58 UTC 2011


Hi  Yevheniy,

I am interested in applying this patch to my 2-node cluster configured on
RHEL 6.2 with CLVMD + GFS2 + CTDB + CMAN. Can you please guide me on how to
download and apply this patch for that environment?

Thanks

Sathya Narayanan V
Solution Architect
M +91 9940680173 | T +91 44 42199500 | Service Desk +91 44 42199521
SERVICE - In PRECISION IT is a PASSION
----------------------------------------------------------------------------
Precision Infomatic (M) Pvt Ltd
22, 1st Floor, Habibullah Road, T. Nagar, Chennai - 600 017. India.
www.precisionit.co.in

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of
linux-cluster-request at redhat.com
Sent: Tuesday, December 27, 2011 10:30 PM
To: linux-cluster at redhat.com
Subject: Linux-cluster Digest, Vol 92, Issue 19

Send Linux-cluster mailing list submissions to
	linux-cluster at redhat.com

To subscribe or unsubscribe via the World Wide Web, visit
	https://www.redhat.com/mailman/listinfo/linux-cluster
or, via email, send a message with subject or body 'help' to
	linux-cluster-request at redhat.com

You can reach the person managing the list at
	linux-cluster-owner at redhat.com

When replying, please edit your Subject line so it is more specific than
"Re: Contents of Linux-cluster digest..."


Today's Topics:

   1. [PATCH] dlm: faster dlm recovery (Yevheniy Demchenko)
   2. Re: Corosync memory problem (Steven Dake)


----------------------------------------------------------------------

Message: 1
Date: Mon, 26 Dec 2011 23:52:29 +0100
From: Yevheniy Demchenko <zheka at uvt.cz>
To: linux-cluster at redhat.com
Subject: [Linux-cluster] [PATCH] dlm: faster dlm recovery
Message-ID: <4EF8FAAD.50504 at uvt.cz>
Content-Type: text/plain; charset=ISO-8859-1

Avoid running find_rsb_root by storing the last recovered rsb address for
each node.
Makes dlm recovery much faster for filesystems with a large number of files.

Signed-off-by: Yevheniy Demchenko <zheka at uvt.cz>
---
Current dlm recovery uses a small (4096-byte) buffer to communicate between
dlm_copy_master_names and dlm_recover_directory. This leads to running
find_rsb_root N*32/4096 times, where N is the number of locks to recover and
32 is DLM_RESNAME_MAXLEN+1. find_rsb_root itself takes N*c to complete, where
c is some constant, so overall dlm recovery time is proportional to N*N. For
an ocfs2 filesystem with one directory containing 300000 small files, every
mount on another node takes more than 2.5 minutes and every umount more than
5 minutes on fairly modern hardware with a 10Gb interconnect. During dlm
recovery the FS is not available on any node.
This patch makes mounts and umounts on non-locking-master nodes take less
than 2 seconds. It is not limited to ocfs2 and might make dlm recovery faster
in general (e.g. for gfs2).
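
As a rough estimate from the numbers in the test case below (an
illustration, not a measured figure): for N = 300000 names, the 4096-byte
buffer is refilled about 300000*32/4096 ~ 2300 times, and each refill
rescans up to 300000 rsbs, i.e. on the order of 7*10^8 list-walk steps.
With the per-node cache each refill resumes from the remembered rsb
instead of searching for it.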

Test case:
2-node RHCS cluster, OCFS2 with the cman cluster stack.
/sys/kernel/config/dlm/cluster/{lkbtbl_size,dirtbl_size,rsbtbl_size} = 16384 on both nodes

On node 1:
#mkfs.ocfs2 --fs-features=backup-super,sparse,inline-data,extended-slotmap,indexed-dirs,refcount,xattr,usrquota,grpquota,unwritten /dev/vg1/test1
#mount /dev/vg1/test1 /mnt/temp -o noatime,nodiratime
#mkdir /mnt/temp/test1
#for i in $(seq 1 300000) ; do dd if=/dev/urandom bs=4096 count=1 of=/mnt/temp/test1/$i ; done
#umount /mnt/temp                         #----- leave dlm and destroy locks
#mount /dev/vg1/test1 /mnt/temp -o noatime,nodiratime
#time (ls -l /mnt/temp/test1 | wc -l )    #----- create 300000 RR locks on node 1

On node 2:
#mount /dev/vg1/test1 /mnt/temp -o noatime,nodiratime   #----- dlm recovery starts and takes a long time if dlm is not patched
#umount /mnt/temp                                       #----- even longer; FS is not available on any node while recovery is running

After patching, both operations on node 2 take less than 2 seconds.

For now, the patch tries to detect inconsistencies and reverts to the
previous behaviour if any are found.
These checks, together with find_rsb_root and some now-redundant code, can
be dropped in the future.
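
A simplified userspace sketch of the lookup-with-fallback logic in the diff
below (illustrative only; last_rsb, find_rsb_slow and friends are stand-ins,
not the real dlm symbols in fs/dlm):

/*
 * Model of the per-node rsb cache: one remembered pointer per cluster
 * node, consulted before falling back to a linear search.
 */
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8
#define MAX_RSBS  16

struct rsb {
        char name[32];
};

static struct rsb rsb_table[MAX_RSBS] = {
        { "lock-A" }, { "lock-B" }, { "lock-C" },
};
static int num_rsbs = 3;

/* One cached pointer per node, like ls_recover_last_rsb in the patch. */
static struct rsb *last_rsb[MAX_NODES];

/* O(N) fallback, analogous to find_rsb_root(). */
static struct rsb *find_rsb_slow(const char *name)
{
        int i;

        for (i = 0; i < num_rsbs; i++)
                if (!strcmp(rsb_table[i].name, name))
                        return &rsb_table[i];
        return NULL;
}

/*
 * Fast path: if the cached rsb for this node matches the requested name,
 * use it; otherwise fall back to the linear search, as the patch does
 * when it detects an inconsistency.
 */
static struct rsb *lookup(int node_index, const char *name)
{
        struct rsb *r = last_rsb[node_index];

        if (r && !strcmp(r->name, name))
                return r;               /* cache hit, no list walk */

        return find_rsb_slow(name);     /* safety fallback */
}

int main(void)
{
        struct rsb *r;

        r = lookup(0, "lock-B");        /* miss: linear search */
        last_rsb[0] = r;                /* remember it for node 0 */
        r = lookup(0, "lock-B");        /* hit: no search needed */
        printf("found %s\n", r ? r->name : "(none)");
        return 0;
}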


diff -uNr vanilla/fs/dlm/dir.c v1.0/fs/dlm/dir.c
--- vanilla/fs/dlm/dir.c        2011-09-29 15:29:00.000000000 +0200
+++ v1.0/fs/dlm/dir.c   2011-12-26 22:00:21.068403493 +0100
@@ -196,6 +196,16 @@
        }
 }
 
+static int nodeid2index (struct dlm_ls *ls, int nodeid) {
+       int i;
+       for (i = 0; i < ls->ls_num_nodes ; i++) {
+               if (ls->ls_node_array[i] == nodeid)
+                       return (i);
+       }
+       log_debug(ls, "index not found for nodeid %d", nodeid);
+       return (-1);
+}
+
 int dlm_recover_directory(struct dlm_ls *ls)
 {
        struct dlm_member *memb;
@@ -375,11 +385,28 @@
        struct dlm_rsb *r;
        int offset = 0, dir_nodeid;
        __be16 be_namelen;
+       int index;
 
        down_read(&ls->ls_root_sem);
 
+       index = nodeid2index(ls, nodeid);
+
        if (inlen > 1) {
-               r = find_rsb_root(ls, inbuf, inlen);
+               if ((index > -1) && (ls->ls_recover_last_rsb[index])) {
+                       if (inlen == ls->ls_recover_last_rsb[index]->res_length &&
+                           !memcmp(inbuf, ls->ls_recover_last_rsb[index]->res_name, inlen)) {
+                               r = ls->ls_recover_last_rsb[index];
+                       } else {
+                               /* This should never happen! */
+                               log_error(ls, "copy_master_names: rsb cache failed 1: node %d: cached rsb %1.31s, needed rsb %1.31s;", nodeid,
+                                         ls->ls_recover_last_rsb[index]->res_name, inbuf);
+                               r = find_rsb_root(ls, inbuf, inlen);
+                       }
+               } else {
+                       /* Left for safety reasons, we should never get here */
+                       r = find_rsb_root(ls, inbuf, inlen);
+                       log_error(ls, "copy_master_names: rsb cache failed 2: ,searching for %1.31s, node %d", inbuf, nodeid);
+               }
                if (!r) {
                        inbuf[inlen - 1] = '\0';
                        log_error(ls, "copy_master_names from %d start %d
%s", @@ -421,6 +448,7 @@
                offset += sizeof(__be16);
                memcpy(outbuf + offset, r->res_name, r->res_length);
                offset += r->res_length;
+               ls->ls_recover_last_rsb[index] = r;
        }
 
        /*
diff -uNr vanilla/fs/dlm/dlm_internal.h v1.0/fs/dlm/dlm_internal.h
--- vanilla/fs/dlm/dlm_internal.h       2011-09-29 15:32:00.000000000 +0200
+++ v1.0/fs/dlm/dlm_internal.h  2011-12-22 23:51:00.000000000 +0100
@@ -526,6 +526,7 @@
        int                     ls_recover_list_count;
        wait_queue_head_t       ls_wait_general;
        struct mutex            ls_clear_proc_locks;
+       struct dlm_rsb          **ls_recover_last_rsb;
 
        struct list_head        ls_root_list;   /* root resources */
        struct rw_semaphore     ls_root_sem;    /* protect root_list */
diff -uNr vanilla/fs/dlm/member.c v1.0/fs/dlm/member.c
--- vanilla/fs/dlm/member.c     2011-09-29 15:29:00.000000000 +0200
+++ v1.0/fs/dlm/member.c        2011-12-23 19:55:00.000000000 +0100
@@ -128,6 +128,9 @@
 
        kfree(ls->ls_node_array);
        ls->ls_node_array = NULL;
+
+       kfree(ls->ls_recover_last_rsb);
+       ls->ls_recover_last_rsb = NULL;
 
        list_for_each_entry(memb, &ls->ls_nodes, list) {
                if (memb->weight)
@@ -146,6 +149,11 @@
        array = kmalloc(sizeof(int) * total, GFP_NOFS);
        if (!array)
                return;
+
+       ls->ls_recover_last_rsb = kcalloc(ls->ls_num_nodes+1, sizeof(struct dlm_rsb *), GFP_NOFS);
+
+       if (!ls->ls_recover_last_rsb)
+               return;
 
        list_for_each_entry(memb, &ls->ls_nodes, list) {
                if (!all_zero && !memb->weight)

--
Ing. Yevheniy Demchenko
Senior Linux Administrator
UVT s.r.o. 



------------------------------

Message: 2
Date: Tue, 27 Dec 2011 09:00:25 -0700
From: Steven Dake <sdake at redhat.com>
To: linux clustering <linux-cluster at redhat.com>
Subject: Re: [Linux-cluster] Corosync memory problem
Message-ID: <4EF9EB99.1050001 at redhat.com>
Content-Type: text/plain; charset=ISO-8859-1

On 12/21/2011 11:04 AM, Chris Alexander wrote:
> An update in case anyone ever runs into something like this - we had
> corosync-notify running on the servers and once we removed that and
> restarted the cluster stack, corosync seemed to return to normal.
> 
> Additionally, according to the corosync mailing list, the cluster 1.2.3
> version is basically very similar to (if not the same as) the 1.4 that
> they currently have released, someone's been backporting.
> 

The upstream 1.2.3 version hasn't had any backports applied to it.  Only
the RHEL 1.2.3-z versions have been backported.

Regards
-steve

> Cheers
> 
> Chris
> 
> On 19 December 2011 19:01, Chris Alexander <chris.alexander at kusiri.com
> <mailto:chris.alexander at kusiri.com>> wrote:
> 
>     Hi all,
> 
>     You may remember our recent issue, I believe this is being worsened
>     if not caused by another problem we have encountered.
> 
>     Every few days our nodes are (non-simultaneously) being fenced due
>     to corosync taking up vast amounts of memory (i.e. 100% of the box).
>     Please see a sample log message, we have several just like this, [1]
>     which occurs when this happens. Note that it is not always corosync
>     being killed - but it is clearly corosync eating all the memory (see
>     top output from three servers at various times since their last
>     reboot, [2] [3] [4]).
> 
>     The corosync version is 1.2.3:
>     [g at cluster1 ~]$ corosync -v
>     Corosync Cluster Engine, version '1.2.3'
>     Copyright (c) 2006-2009 Red Hat, Inc.
> 
>     We had a bit of a dig around and there are a significant number of
>     bugfix updates which address various segfaults, crashes, memory
>     leaks etc. in this minor as well as subsequent minor versions. [5] [6]
> 
>     We're trialling the Fedora 14 (fc14) RPMs for corosync and
>     corosynclib (v1.4.2) to see if it fixes the particular issue we are
>     seeing (i.e. whether or not the memory keeps spiralling way out of
>     control).
> 
>     Has anyone else seen an issue like this, and is there any known way
>     to debug or fix it? If we can assist debugging by providing further
>     information, please specify what this is (and, if non-obvious, how
>     to get it).
> 
>     Thanks again for your help
> 
>     Chris
> 
>     [1] http://pastebin.com/CbyERaRT
>     [2] http://pastebin.com/uk9ZGL7H
>     [3] http://pastebin.com/H4w5Zg46
>     [4] http://pastebin.com/KPZxL6UB
>     [5] http://rhn.redhat.com/errata/RHBA-2011-1361.html
>     [6] http://rhn.redhat.com/errata/RHBA-2011-1515.html
> 
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



------------------------------

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 92, Issue 19
*********************************************




