From mgrac at redhat.com Mon Apr 6 17:27:12 2015
From: mgrac at redhat.com (Marek "marx" Grac)
Date: Mon, 06 Apr 2015 19:27:12 +0200
Subject: [Linux-cluster] fence-agents-4.0.17 stable release
Message-ID: <5522C1F0.4010401@redhat.com>

Welcome to the fence-agents 4.0.17 release.

This release includes several bugfixes and features:

* HP iLO2 with firmware 2.27 has a broken implementation of TLS negotiation, and SSLv3 is disabled by default (POODLE attack). The option --tls1.0 (tls1.0 on stdin) was added to force the use of TLS v1.0, which allows users to use that firmware with fence agents.

* Fence agent for AMT: the password was not put correctly into the environment.

* Fix the login process on bladecenter, where 'last login' can occur in the message of the day and mislead the fence agent.

* The cipher for fence_ipmilan was previously set to 0. This turned out to be a poor default value, so the default value (3) of ipmitool is used instead.

The Git repository can be found at https://github.com/ClusterLabs/fence-agents/

The new source tarball can be downloaded here:
https://github.com/ClusterLabs/fence-agents/archive/v4.0.17.tar.gz

To report bugs or issues: https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other system administrators or power users.

Thanks and congratulations to all the people who contributed to this milestone.

m,

From anprice at redhat.com Tue Apr 7 17:03:41 2015
From: anprice at redhat.com (Andrew Price)
Date: Tue, 07 Apr 2015 18:03:41 +0100
Subject: [Linux-cluster] gfs2-utils 3.1.8 released
Message-ID: <55240DED.5010608@redhat.com>

Hi,

I am happy to announce the 3.1.8 release of gfs2-utils. This release includes the following visible changes:

* Performance improvements in fsck.gfs2, mkfs.gfs2 and gfs2_edit savemeta.
* Better checking of journals, the jindex, system inodes and inode 'goal' values in fsck.gfs2.
* gfs2_jadd and gfs2_grow are now separate programs instead of symlinks to mkfs.gfs2.
* Improved test suite and related documentation.
* No longer clobbers the configure script's --sbindir option.
* No longer depends on perl.
* Various minor bug fixes and enhancements.

See below for a complete list of changes.
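As a quick orientation to the tools named in the list above, a minimal usage sketch; the device, mount point and lock table name are placeholders and are not taken from this announcement:

  # Create a clustered GFS2 filesystem with three journals (one per mounting node).
  mkfs.gfs2 -p lock_dlm -t mycluster:share0 -j 3 /dev/vg_san/lv_share0

  # gfs2_grow and gfs2_jadd are now standalone programs rather than
  # symlinks to mkfs.gfs2; both operate on a mounted filesystem.
  gfs2_grow /mnt/share0        # grow into space added to the underlying device
  gfs2_jadd -j 1 /mnt/share0   # add one more journal, e.g. for an extra node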
The source tarball is available from:

https://fedorahosted.org/released/gfs2-utils/gfs2-utils-3.1.8.tar.gz

Please test, and report bugs against the gfs2-utils component of Fedora rawhide:

https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=gfs2-utils&version=rawhide

Regards,
Andy

Changes since version 3.1.7:

Abhi Das (6):
      fsck.gfs2: fix broken i_goal values in inodes
      gfs2_convert: use correct i_goal values instead of zeros for inodes
      tests: test for incorrect inode i_goal values
      mkfs.gfs2: addendum to fix broken i_goal values in inodes
      gfs2_utils: more gfs2_convert i_goal fixes
      gfs2-utils: more fsck.gfs2 i_goal fixes

Andrew Price (58):
      gfs2-utils tests: Build unit tests with consistent cpp flags
      libgfs2: Move old rgrp layout functions into fsck.gfs2
      gfs2-utils build: Add test coverage option
      fsck.gfs2: Fix memory leak in pass2
      gfs2_convert: Fix potential memory leaks in adjust_inode
      gfs2_edit: Fix signed value used as array index in print_ld_blks
      gfs2_edit: Set umask before calling mkstemp in savemetaopen()
      gfs2_edit: Fix use-after-free in find_wrap_pt
      libgfs2: Clean up broken rgrp length check
      libgfs2: Remove superfluous NULL check from gfs2_rgrp_free
      libgfs2: Fail fd comparison if the fds are negative
      libgfs2: Fix check for O_RDONLY
      fsck.gfs2: Remove dead code from scan_inode_list
      mkfs.gfs2: Terminate lockproto and locktable strings explicitly
      libgfs2: Add generic field assignment and print functions
      gfs2_edit: Use metadata description to print and assign fields
      gfs2l: Switch to lgfs2_field_assign
      libgfs2: Remove device_name from struct gfs2_sbd
      libgfs2: Remove path_name from struct gfs2_sbd
      libgfs2: metafs_path improvements
      gfs2_grow: Don't use PATH_MAX in main_grow
      gfs2_jadd: Don't use fixed size buffers for paths
      libgfs2: Remove orig_journals from struct gfs2_sbd
      gfs2l: Check unchecked returns in openfs
      gfs2-utils configure: Fix exit with failure condition
      gfs2-utils configure: Remove checks for non-existent -W flags
      gfs2_convert: Don't use a fixed sized buffer for device path
      gfs2_edit: Add bounds checking for the journalN keyword
      libgfs2: Make find_good_lh and jhead_scan static
      Build gfs2_grow, gfs2_jadd and mkfs.gfs2 separately
      gfs2-utils: Honour --sbindir
      gfs2-utils configure: Use AC_HELP_STRING in help messages
      fsck.gfs2: Improve reporting of pass timings
      mkfs.gfs2: Revert default resource group size
      gfs2-utils tests: Add keywords to tests
      gfs2-utils tests: Shorten TESTSUITEFLAGS to TOPTS
      gfs2-utils tests: Improve docs
      gfs2-utils tests: Skip unit tests if check is not found
      gfs2-utils tests: Document usage of convenience macros
      fsck.gfs2: Fix 'initializer element is not constant' build error
      fsck.gfs2: Simplify bad_journalname
      gfs2-utils build: Add a configure script summary
      mkfs.gfs2: Remove unused declarations
      gfs2-utils/tests: Fix unit tests for older check libraries
      fsck.gfs2: Fix memory leaks in pass1_process_rgrp
      libgfs2: Use the correct parent for rgrp tree insertion
      libgfs2: Remove some obsolete function declarations
      gfs2-utils: Move metafs handling into gfs2/mkfs/
      gfs2_grow/jadd: Use a matching context mount option in mount_gfs2_meta
      gfs2_edit savemeta: Don't read rgrps twice
      fsck.gfs2: Fetch directory inodes early in pass2()
      libgfs2: Remove some unused data structures
      gfs2-utils: Tidy up Makefile.am files
      gfs2-utils build: Remove superfluous passive header checks
      gfs2-utils: Consolidate some "bad constants" strings
      gfs2-utils: Update translation template
      libgfs2: Fix potential NULL deref in linked_leaf_search()
      gfs2_grow: Put back the definition of FALLOC_FL_KEEP_SIZE

Bob Peterson (15):
      fsck.gfs2: Detect and correct corrupt journals
      fsck.gfs2: Change basic dentry checks for too long of file names
      fsck.gfs2: Print out block number when pass3 finds a bad directory
      fsck.gfs2: Adjust when hash table is doubled
      fsck.gfs2: Revise "undo" processing
      fsck.gfs2: remove duplicate designation during undo
      fsck.gfs2: Fix a use-after-free in pass2
      fsck.gfs2: fix double-free bug
      fsck.gfs2: Reprocess nodes if anything changed
      fsck.gfs2: Rebuild system files if they don't have the SYS bit set
      fsck.gfs2: Check the integrity of the journal index
      fsck.gfs2: rgrp block count reform
      fsck.gfs2: Change block_map to match bitmap
      fsck.gfs2: Fix journal sequence number reporting problem
      fsck.gfs2: Fix coverity error in pass4.c

From daniel.dehennin at baby-gnu.org Wed Apr 1 12:47:30 2015
From: daniel.dehennin at baby-gnu.org (Daniel Dehennin)
Date: Wed, 01 Apr 2015 14:47:30 +0200
Subject: [Linux-cluster] [ClusterLabs] dlm_controld and fencing issue
Message-ID: <87h9srlv48.fsf@hati.baby-gnu.org>

Hello,

On a 4-node OpenNebula cluster, running Ubuntu Trusty 14.04.2, with:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.3
- dlm 4.0.1-0ubuntu1

Here is the node list with their IDs, to follow the logs:

- 1084811137 nebula1
- 1084811138 nebula2
- 1084811139 nebula3
- 1084811140 nebula4 (the actual DC)

I have an issue where fencing is working but dlm always waits for fencing; I needed to manually run "dlm_tool fence_ack 1084811138" this morning. Here are the logs:

Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759
Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138
Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759
Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith
Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status
Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759
Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor
[...]
Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing
Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing

The stonith actually worked:

Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)'
Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0)
Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list
Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list
Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list
Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list
Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490 at nebula4.39eaf3a2: OK

I attach the logs of the DC nebula4 from around 01:29:03, where everything worked fine (Got 4 replies, expecting: 4), to a little bit after.

To me, it looks like:

- dlm asks for fencing directly at 01:29:29; the node was fenced, since it had garbage in its /var/log/syslog exactly at 01:29:29, plus its uptime, but dlm did not get a good response

- pacemaker fences nebula2 at 01:29:30 because it's not part of the cluster anymore (since 01:29:26 [TOTEM ] ... Members left: 1084811138). This fencing works.

Do you have any idea?

Regards.
-- 
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nebula2-down-2015-01-04.log
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 342 bytes
Desc: not available
URL: 

From lists at alteeve.ca Wed Apr 8 00:45:13 2015
From: lists at alteeve.ca (Digimer)
Date: Tue, 07 Apr 2015 20:45:13 -0400
Subject: [Linux-cluster] gfs2-utils 3.1.8 released
In-Reply-To: <55240DED.5010608@redhat.com>
References: <55240DED.5010608@redhat.com>
Message-ID: <55247A19.1000206@alteeve.ca>

Hi Andrew,

Congrats!!

Want to add the cluster labs mailing list to your list of release announcement locations?

digimer

On 07/04/15 01:03 PM, Andrew Price wrote:
> Hi,
>
> I am happy to announce the 3.1.8 release of gfs2-utils. This release
> includes the following visible changes:
>
> * Performance improvements in fsck.gfs2, mkfs.gfs2 and gfs2_edit
>   savemeta.
> * Better checking of journals, the jindex, system inodes and inode
>   'goal' values in fsck.gfs2
> * gfs2_jadd and gfs2_grow are now separate programs instead of
>   symlinks to mkfs.gfs2.
> * Improved test suite and related documentation.
> * No longer clobbers the configure script's --sbindir option.
> * No longer depends on perl.
> * Various minor bug fixes and enhancements.
>
> See below for a complete list of changes.
The source tarball is > available from: > https://fedorahosted.org/released/gfs2-utils/gfs2-utils-3.1.8.tar.gz > > Please test, and report bugs against the gfs2-utils component of Fedora > rawhide: > > https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=gfs2-utils&version=rawhide > > > Regards, > Andy > > Changes since version 3.1.7: > > Abhi Das (6): > fsck.gfs2: fix broken i_goal values in inodes > gfs2_convert: use correct i_goal values instead of zeros for inodes > tests: test for incorrect inode i_goal values > mkfs.gfs2: addendum to fix broken i_goal values in inodes > gfs2_utils: more gfs2_convert i_goal fixes > gfs2-utils: more fsck.gfs2 i_goal fixes > > Andrew Price (58): > gfs2-utils tests: Build unit tests with consistent cpp flags > libgfs2: Move old rgrp layout functions into fsck.gfs2 > gfs2-utils build: Add test coverage option > fsck.gfs2: Fix memory leak in pass2 > gfs2_convert: Fix potential memory leaks in adjust_inode > gfs2_edit: Fix signed value used as array index in print_ld_blks > gfs2_edit: Set umask before calling mkstemp in savemetaopen() > gfs2_edit: Fix use-after-free in find_wrap_pt > libgfs2: Clean up broken rgrp length check > libgfs2: Remove superfluous NULL check from gfs2_rgrp_free > libgfs2: Fail fd comparison if the fds are negative > libgfs2: Fix check for O_RDONLY > fsck.gfs2: Remove dead code from scan_inode_list > mkfs.gfs2: Terminate lockproto and locktable strings explicitly > libgfs2: Add generic field assignment and print functions > gfs2_edit: Use metadata description to print and assign fields > gfs2l: Switch to lgfs2_field_assign > libgfs2: Remove device_name from struct gfs2_sbd > libgfs2: Remove path_name from struct gfs2_sbd > libgfs2: metafs_path improvements > gfs2_grow: Don't use PATH_MAX in main_grow > gfs2_jadd: Don't use fixed size buffers for paths > libgfs2: Remove orig_journals from struct gfs2_sbd > gfs2l: Check unchecked returns in openfs > gfs2-utils configure: Fix exit with failure condition > gfs2-utils configure: Remove checks for non-existent -W flags > gfs2_convert: Don't use a fixed sized buffer for device path > gfs2_edit: Add bounds checking for the journalN keyword > libgfs2: Make find_good_lh and jhead_scan static > Build gfs2_grow, gfs2_jadd and mkfs.gfs2 separately > gfs2-utils: Honour --sbindir > gfs2-utils configure: Use AC_HELP_STRING in help messages > fsck.gfs2: Improve reporting of pass timings > mkfs.gfs2: Revert default resource group size > gfs2-utils tests: Add keywords to tests > gfs2-utils tests: Shorten TESTSUITEFLAGS to TOPTS > gfs2-utils tests: Improve docs > gfs2-utils tests: Skip unit tests if check is not found > gfs2-utils tests: Document usage of convenience macros > fsck.gfs2: Fix 'initializer element is not constant' build error > fsck.gfs2: Simplify bad_journalname > gfs2-utils build: Add a configure script summary > mkfs.gfs2: Remove unused declarations > gfs2-utils/tests: Fix unit tests for older check libraries > fsck.gfs2: Fix memory leaks in pass1_process_rgrp > libgfs2: Use the correct parent for rgrp tree insertion > libgfs2: Remove some obsolete function declarations > gfs2-utils: Move metafs handling into gfs2/mkfs/ > gfs2_grow/jadd: Use a matching context mount option in > mount_gfs2_meta > gfs2_edit savemeta: Don't read rgrps twice > fsck.gfs2: Fetch directory inodes early in pass2() > libgfs2: Remove some unused data structures > gfs2-utils: Tidy up Makefile.am files > gfs2-utils build: Remove superfluous passive header checks > gfs2-utils: Consolidate some "bad 
constants" strings > gfs2-utils: Update translation template > libgfs2: Fix potential NULL deref in linked_leaf_search() > gfs2_grow: Put back the definition of FALLOC_FL_KEEP_SIZE > > Bob Peterson (15): > fsck.gfs2: Detect and correct corrupt journals > fsck.gfs2: Change basic dentry checks for too long of file names > fsck.gfs2: Print out block number when pass3 finds a bad directory > fsck.gfs2: Adjust when hash table is doubled > fsck.gfs2: Revise "undo" processing > fsck.gfs2: remove duplicate designation during undo > fsck.gfs2: Fix a use-after-free in pass2 > fsck.gfs2: fix double-free bug > fsck.gfs2: Reprocess nodes if anything changed > fsck.gfs2: Rebuild system files if they don't have the SYS bit set > fsck.gfs2: Check the integrity of the journal index > fsck.gfs2: rgrp block count reform > fsck.gfs2: Change block_map to match bitmap > fsck.gfs2: Fix journal sequence number reporting problem > fsck.gfs2: Fix coverity error in pass4.c > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From anprice at redhat.com Wed Apr 8 02:09:37 2015 From: anprice at redhat.com (Andrew Price) Date: Wed, 08 Apr 2015 03:09:37 +0100 Subject: [Linux-cluster] gfs2-utils 3.1.8 released In-Reply-To: <55247A19.1000206@alteeve.ca> References: <55240DED.5010608@redhat.com> <55247A19.1000206@alteeve.ca> Message-ID: <55248DE1.4040003@redhat.com> On 08/04/15 01:45, Digimer wrote: > Hi Andrew, > > Congrats!! > > Want to add the cluster labs mailing list to your list of release > announcement locations? > > digimer That's a great idea, I will. I haven't subscribed to the Cluster Labs list yet but I'm just about to :) Thanks, Andy > > On 07/04/15 01:03 PM, Andrew Price wrote: >> Hi, >> >> I am happy to announce the 3.1.8 release of gfs2-utils. This release >> includes the following visible changes: >> >> * Performance improvements in fsck.gfs2, mkfs.gfs2 and gfs2_edit >> savemeta. >> * Better checking of journals, the jindex, system inodes and inode >> 'goal' values in fsck.gfs2 >> * gfs2_jadd and gfs2_grow are now separate programs instead of >> symlinks to mkfs.gfs2. >> * Improved test suite and related documentation. >> * No longer clobbers the configure script's --sbindir option. >> * No longer depends on perl. >> * Various minor bug fixes and enhancements. >> >> See below for a complete list of changes. 
The source tarball is >> available from: >> https://fedorahosted.org/released/gfs2-utils/gfs2-utils-3.1.8.tar.gz >> >> Please test, and report bugs against the gfs2-utils component of Fedora >> rawhide: >> >> https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=gfs2-utils&version=rawhide >> >> >> Regards, >> Andy >> >> Changes since version 3.1.7: >> >> Abhi Das (6): >> fsck.gfs2: fix broken i_goal values in inodes >> gfs2_convert: use correct i_goal values instead of zeros for inodes >> tests: test for incorrect inode i_goal values >> mkfs.gfs2: addendum to fix broken i_goal values in inodes >> gfs2_utils: more gfs2_convert i_goal fixes >> gfs2-utils: more fsck.gfs2 i_goal fixes >> >> Andrew Price (58): >> gfs2-utils tests: Build unit tests with consistent cpp flags >> libgfs2: Move old rgrp layout functions into fsck.gfs2 >> gfs2-utils build: Add test coverage option >> fsck.gfs2: Fix memory leak in pass2 >> gfs2_convert: Fix potential memory leaks in adjust_inode >> gfs2_edit: Fix signed value used as array index in print_ld_blks >> gfs2_edit: Set umask before calling mkstemp in savemetaopen() >> gfs2_edit: Fix use-after-free in find_wrap_pt >> libgfs2: Clean up broken rgrp length check >> libgfs2: Remove superfluous NULL check from gfs2_rgrp_free >> libgfs2: Fail fd comparison if the fds are negative >> libgfs2: Fix check for O_RDONLY >> fsck.gfs2: Remove dead code from scan_inode_list >> mkfs.gfs2: Terminate lockproto and locktable strings explicitly >> libgfs2: Add generic field assignment and print functions >> gfs2_edit: Use metadata description to print and assign fields >> gfs2l: Switch to lgfs2_field_assign >> libgfs2: Remove device_name from struct gfs2_sbd >> libgfs2: Remove path_name from struct gfs2_sbd >> libgfs2: metafs_path improvements >> gfs2_grow: Don't use PATH_MAX in main_grow >> gfs2_jadd: Don't use fixed size buffers for paths >> libgfs2: Remove orig_journals from struct gfs2_sbd >> gfs2l: Check unchecked returns in openfs >> gfs2-utils configure: Fix exit with failure condition >> gfs2-utils configure: Remove checks for non-existent -W flags >> gfs2_convert: Don't use a fixed sized buffer for device path >> gfs2_edit: Add bounds checking for the journalN keyword >> libgfs2: Make find_good_lh and jhead_scan static >> Build gfs2_grow, gfs2_jadd and mkfs.gfs2 separately >> gfs2-utils: Honour --sbindir >> gfs2-utils configure: Use AC_HELP_STRING in help messages >> fsck.gfs2: Improve reporting of pass timings >> mkfs.gfs2: Revert default resource group size >> gfs2-utils tests: Add keywords to tests >> gfs2-utils tests: Shorten TESTSUITEFLAGS to TOPTS >> gfs2-utils tests: Improve docs >> gfs2-utils tests: Skip unit tests if check is not found >> gfs2-utils tests: Document usage of convenience macros >> fsck.gfs2: Fix 'initializer element is not constant' build error >> fsck.gfs2: Simplify bad_journalname >> gfs2-utils build: Add a configure script summary >> mkfs.gfs2: Remove unused declarations >> gfs2-utils/tests: Fix unit tests for older check libraries >> fsck.gfs2: Fix memory leaks in pass1_process_rgrp >> libgfs2: Use the correct parent for rgrp tree insertion >> libgfs2: Remove some obsolete function declarations >> gfs2-utils: Move metafs handling into gfs2/mkfs/ >> gfs2_grow/jadd: Use a matching context mount option in >> mount_gfs2_meta >> gfs2_edit savemeta: Don't read rgrps twice >> fsck.gfs2: Fetch directory inodes early in pass2() >> libgfs2: Remove some unused data structures >> gfs2-utils: Tidy up Makefile.am files >> gfs2-utils build: 
Remove superfluous passive header checks >> gfs2-utils: Consolidate some "bad constants" strings >> gfs2-utils: Update translation template >> libgfs2: Fix potential NULL deref in linked_leaf_search() >> gfs2_grow: Put back the definition of FALLOC_FL_KEEP_SIZE >> >> Bob Peterson (15): >> fsck.gfs2: Detect and correct corrupt journals >> fsck.gfs2: Change basic dentry checks for too long of file names >> fsck.gfs2: Print out block number when pass3 finds a bad directory >> fsck.gfs2: Adjust when hash table is doubled >> fsck.gfs2: Revise "undo" processing >> fsck.gfs2: remove duplicate designation during undo >> fsck.gfs2: Fix a use-after-free in pass2 >> fsck.gfs2: fix double-free bug >> fsck.gfs2: Reprocess nodes if anything changed >> fsck.gfs2: Rebuild system files if they don't have the SYS bit set >> fsck.gfs2: Check the integrity of the journal index >> fsck.gfs2: rgrp block count reform >> fsck.gfs2: Change block_map to match bitmap >> fsck.gfs2: Fix journal sequence number reporting problem >> fsck.gfs2: Fix coverity error in pass4.c >> > > From andrew at beekhof.net Mon Apr 13 03:19:37 2015 From: andrew at beekhof.net (Andrew Beekhof) Date: Mon, 13 Apr 2015 13:19:37 +1000 Subject: [Linux-cluster] [ClusterLabs] dlm_controld and fencing issue In-Reply-To: <87h9srlv48.fsf@hati.baby-gnu.org> References: <87h9srlv48.fsf@hati.baby-gnu.org> Message-ID: <1A2FDA6D-1295-448C-99D2-F8BF8EC5C5E1@beekhof.net> > On 1 Apr 2015, at 11:47 pm, Daniel Dehennin wrote: > > Hello, > > On a 4 nodes OpenNebula cluster, running Ubuntu Trusty 14.04.2, with: > > - corosync 2.3.3-1ubuntu1 > - pacemaker 1.1.10+git20130802-1ubuntu2.3 > - dlm 4.0.1-0ubuntu1 > > Here is the node list with their IDs, to follow the logs: > > - 1084811137 nebula1 > - 1084811138 nebula2 > - 1084811139 nebula3 > - 1084811140 nebula4 (the actual DC) > > I have an issue where fencing is working but dlm always wait for > fencing, I needed to manually run ?dlm_tool fence_ack 1084811138? this > morning, here are the logs: > > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759 > Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor > [...] 
> Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing > Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing > > > The stonith actually worked: > > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)' > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0) > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list > Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490 at nebula4.39eaf3a2: OK > > I attache the logs of the DC nebula4 around from 01:29:03, where > everything worked fine (Got 4 replies, expecting: 4) to a little bit > after. > > To me, it looks like: > > - dlm ask for fencing directly at 01:29:29, the node was fenced since it > had garbage in its /var/log/syslog exactely at 01:29.29, plus its > uptime, but did not get a good response > > - pacemaker fence nebula2 at 01:29:30 because it's not part of the > cluster anymore (since 01:29:26 [TOTEM ] ... Members left: 1084811138) > This fencing works. > > Do you have any idea? There were two important fixes in this area since 1.1.10 + David Vossel (1 year, 1 month ago) 054fedf: Fix: stonith_api_time_helper now returns when the most recent fencing operation completed + Andrew Beekhof (1 year, 1 month ago) d9921e5: Fix: Fencing: Pass the correct options when looking up the history by node name Whether 'pacemaker 1.1.10+git20130802-1ubuntu2.3? includes them is anybody?s guess. > > Regards. > -- > Daniel Dehennin > R?cup?rer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF > Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF > > Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command > Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply > Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 31 > Apr 1 01:29:03 nebula4 lvm[6759]: check_all_clvmds_running > Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3 > Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3 > Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3 > Apr 1 01:29:03 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3 > Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition... 
> Apr 1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13 > Apr 1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0 > Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43973, flags=0x0 () > Apr 1 01:29:03 nebula4 lvm[6759]: num_nodes = 4 > Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43973 > Apr 1 01:29:03 nebula4 lvm[6759]: Sending message to all cluster nodes > Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: local > Apr 1 01:29:03 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes > Apr 1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 4 > Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31 > Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811140. len 18 > Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e782: 0 bytes > Apr 1 01:29:03 nebula4 lvm[6759]: Got 2 replies, expecting: 4 > Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. len 18 > Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes > Apr 1 01:29:03 nebula4 lvm[6759]: Got 3 replies, expecting: 4 > Apr 1 01:29:03 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18 > Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes > Apr 1 01:29:03 nebula4 lvm[6759]: Got 4 replies, expecting: 4 > Apr 1 01:29:03 nebula4 lvm[6759]: Got post command condition... > Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command > Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply > Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 30 > Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition... > Apr 1 01:29:03 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-2' at 6 (client=0x218eab0) > Apr 1 01:29:03 nebula4 lvm[6759]: unlock_resource: V_vg-one-2 lockid: 1 > Apr 1 01:29:03 nebula4 lvm[6759]: Writing status 0 down pipe 13 > Apr 1 01:29:03 nebula4 lvm[6759]: Waiting to do post command - state = 0 > Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: distribute command: XID = 43974, flags=0x1 (LOCAL) > Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. 
client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43974 > Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: local > Apr 1 01:29:03 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-2', cmd = 0x6 LCK_VG (UNLOCK|VG), flags = 0x0 ( ), critical_section = 0 > Apr 1 01:29:03 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-2 > Apr 1 01:29:03 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes > Apr 1 01:29:03 nebula4 lvm[6759]: Got 1 replies, expecting: 1 > Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:03 nebula4 lvm[6759]: Got post command condition... > Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for next pre command > Apr 1 01:29:03 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:03 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:03 nebula4 lvm[6759]: Send local reply > Apr 1 01:29:03 nebula4 lvm[6759]: Read on local socket 5, len = 0 > Apr 1 01:29:03 nebula4 lvm[6759]: EOF on local socket: inprogress=0 > Apr 1 01:29:03 nebula4 lvm[6759]: Waiting for child thread > Apr 1 01:29:03 nebula4 lvm[6759]: Got pre command condition... > Apr 1 01:29:03 nebula4 lvm[6759]: Subthread finished > Apr 1 01:29:03 nebula4 lvm[6759]: Joined child thread > Apr 1 01:29:03 nebula4 lvm[6759]: ret == 0, errno = 0. removing client > Apr 1 01:29:03 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ebc0. client=0x218eab0, msg=(nil), len=0, csid=(nil), xid=43974 > Apr 1 01:29:03 nebula4 lvm[6759]: process_work_item: free fd -1 > Apr 1 01:29:03 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39602 on node 40a8e782 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44354 on node 40a8e781 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. 
len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39605 on node 40a8e782 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44357 on node 40a8e781 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 0. len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 39608 on node 40a8e782 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eab0. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44360 on node 40a8e781 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811138. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811138 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18 > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 0. len 31 > Apr 1 01:29:16 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218eea0. 
client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:16 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:16 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 44363 on node 40a8e781 > Apr 1 01:29:16 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:16 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:16 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811137. len 18 > Apr 1 01:29:23 nebula4 lvm[6759]: Got new connection on fd 5 > Apr 1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 30 > Apr 1 01:29:23 nebula4 lvm[6759]: creating pipe, [12, 13] > Apr 1 01:29:23 nebula4 lvm[6759]: Creating pre&post thread > Apr 1 01:29:23 nebula4 lvm[6759]: Created pre&post thread, state = 0 > Apr 1 01:29:23 nebula4 lvm[6759]: in sub thread: client = 0x218eab0 > Apr 1 01:29:23 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218eab0) > Apr 1 01:29:23 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3 > Apr 1 01:29:23 nebula4 lvm[6759]: lock_resource returning 0, lock_id=1 > Apr 1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13 > Apr 1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0 > Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43975, flags=0x1 (LOCAL) > Apr 1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x218eab0, msg=0x218ebc0, len=30, csid=(nil), xid=43975 > Apr 1 01:29:23 nebula4 lvm[6759]: process_work_item: local > Apr 1 01:29:23 nebula4 lvm[6759]: process_local_command: LOCK_VG (0x33) msg=0x218ed40, msglen =30, client=0x218eab0 > Apr 1 01:29:23 nebula4 lvm[6759]: do_lock_vg: resource 'V_vg-one-0', cmd = 0x1 LCK_VG (READ|VG), flags = 0x0 ( ), critical_section = 0 > Apr 1 01:29:23 nebula4 lvm[6759]: Invalidating cached metadata for VG vg-one-0 > Apr 1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes > Apr 1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 1 > Apr 1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:23 nebula4 lvm[6759]: Got post command condition... > Apr 1 01:29:23 nebula4 lvm[6759]: Waiting for next pre command > Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:23 nebula4 lvm[6759]: Send local reply > Apr 1 01:29:23 nebula4 lvm[6759]: Read on local socket 5, len = 31 > Apr 1 01:29:23 nebula4 lvm[6759]: check_all_clvmds_running > Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3 > Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3 > Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811138, state = 3 > Apr 1 01:29:23 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3 > Apr 1 01:29:23 nebula4 lvm[6759]: Got pre command condition... 
> Apr 1 01:29:23 nebula4 lvm[6759]: Writing status 0 down pipe 13 > Apr 1 01:29:23 nebula4 lvm[6759]: Waiting to do post command - state = 0 > Apr 1 01:29:23 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:29:23 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:29:23 nebula4 lvm[6759]: distribute command: XID = 43976, flags=0x0 () > Apr 1 01:29:23 nebula4 lvm[6759]: num_nodes = 4 > Apr 1 01:29:23 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218f100. client=0x218eab0, msg=0x218ebc0, len=31, csid=(nil), xid=43976 > Apr 1 01:29:23 nebula4 lvm[6759]: Sending message to all cluster nodes > Apr 1 01:29:23 nebula4 lvm[6759]: process_work_item: local > Apr 1 01:29:23 nebula4 lvm[6759]: process_local_command: SYNC_NAMES (0x2d) msg=0x218ed00, msglen =31, client=0x218eab0 > Apr 1 01:29:23 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:23 nebula4 lvm[6759]: Reply from node 40a8e784: 0 bytes > Apr 1 01:29:23 nebula4 lvm[6759]: Got 1 replies, expecting: 4 > Apr 1 01:29:23 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:26 nebula4 corosync[6411]: [TOTEM ] A processor failed, forming new configuration. > Apr 1 01:29:29 nebula4 corosync[6411]: [TOTEM ] A new membership (192.168.231.129:1204) was formed. Members left: 1084811138 > Apr 1 01:29:29 nebula4 lvm[6759]: confchg callback. 0 joined, 1 left, 3 members > Apr 1 01:29:29 nebula4 crmd[6490]: warning: match_down_event: No match for shutdown action on 1084811138 > Apr 1 01:29:29 nebula4 crmd[6490]: notice: peer_update_callback: Stonith/shutdown of nebula2 not matched > Apr 1 01:29:29 nebula4 crmd[6490]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ] > Apr 1 01:29:29 nebula4 corosync[6411]: [QUORUM] Members[3]: 1084811137 1084811139 1084811140 > Apr 1 01:29:29 nebula4 corosync[6411]: [MAIN ] Completed service synchronization, ready to provide service. > Apr 1 01:29:29 nebula4 crmd[6490]: notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member) > Apr 1 01:29:29 nebula4 crmd[6490]: warning: match_down_event: No match for shutdown action on 1084811138 > Apr 1 01:29:29 nebula4 pacemakerd[6483]: notice: crm_update_peer_state: pcmk_quorum_notification: Node nebula2[1084811138] - state is now lost (was member) > Apr 1 01:29:29 nebula4 crmd[6490]: notice: peer_update_callback: Stonith/shutdown of nebula2 not matched > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 0. len 31 > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811137. len 18 > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 0. len 31 > Apr 1 01:29:29 nebula4 lvm[6759]: add_to_lvmqueue: cmd=0x218ed00. client=0x6a1d60, msg=0x7fff260dfcac, len=31, csid=0x7fff260de67c, xid=0 > Apr 1 01:29:29 nebula4 lvm[6759]: process_work_item: remote > Apr 1 01:29:29 nebula4 lvm[6759]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x5000000 XID 43802 on node 40a8e783 > Apr 1 01:29:29 nebula4 lvm[6759]: Syncing device names > Apr 1 01:29:29 nebula4 lvm[6759]: LVM thread waiting for work > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811139 for 1084811140. 
len 18 > Apr 1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e783: 0 bytes > Apr 1 01:29:29 nebula4 lvm[6759]: Got 2 replies, expecting: 4 > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811140. len 18 > Apr 1 01:29:29 nebula4 lvm[6759]: Reply from node 40a8e781: 0 bytes > Apr 1 01:29:29 nebula4 lvm[6759]: Got 3 replies, expecting: 4 > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811137 for 1084811139. len 18 > Apr 1 01:29:29 nebula4 lvm[6759]: 1084811140 got message from nodeid 1084811140 for 1084811139. len 18 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811137 walltime 1427844569 local 50759 > Apr 1 01:29:29 nebula4 kernel: [50799.162381] dlm: closing connection to node 1084811138 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811139 walltime 1427844569 local 50759 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 pid 44527 nodedown time 1427844569 fence_all dlm_stonith > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence result 1084811138 pid 44527 result 1 exit status > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence status 1084811138 receive 1 from 1084811140 walltime 1427844569 local 50759 > Apr 1 01:29:29 nebula4 dlm_controld[6737]: 50759 fence request 1084811138 no actor > Apr 1 01:29:30 nebula4 pengine[6489]: warning: pe_fence_node: Node nebula2 will be fenced because the node is no longer part of the cluster > Apr 1 01:29:30 nebula4 pengine[6489]: warning: determine_online_status: Node nebula2 is unclean > Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula2: unknown error (1) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula2 after 1000000 failures (max=1000000) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_dlm:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_clvm:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_vg_one:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action p_fs_one-datastores:3_stop_0 on nebula2 
is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: custom_action: Action stonith-nebula1-IPMILAN_stop_0 on nebula2 is unrunnable (offline) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: stage6: Scheduling Node nebula2 for STONITH > Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_dlm:3#011(nebula2) > Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_clvm:3#011(nebula2) > Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_vg_one:3#011(nebula2) > Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Stop p_fs_one-datastores:3#011(nebula2) > Apr 1 01:29:30 nebula4 pengine[6489]: notice: LogActions: Move stonith-nebula1-IPMILAN#011(Started nebula2 -> nebula3) > Apr 1 01:29:30 nebula4 pengine[6489]: warning: process_pe_message: Calculated Transition 101: /var/lib/pacemaker/pengine/pe-warn-22.bz2 > Apr 1 01:29:30 nebula4 crmd[6490]: notice: te_fence_node: Executing reboot fencing operation (98) on nebula2 (timeout=30000) > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: handle_request: Client crmd.6490.2707e557 wants to fence (reboot) 'nebula2' with device '(any)' > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for nebula2: 39eaf3a2-d7e0-417d-8a01-d2f373973d6b (0) > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula1-IPMILAN can not fence nebula2: static-list > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula2-IPMILAN can fence nebula2: static-list > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-one-frontend can not fence nebula2: static-list > Apr 1 01:29:30 nebula4 stonith-ng[6486]: notice: can_fence_host_with_device: stonith-nebula3-IPMILAN can not fence nebula2: static-list > Apr 1 01:29:32 nebula4 stonith-ng[6486]: notice: remote_op_done: Operation reboot of nebula2 by nebula3 for crmd.6490 at nebula4.39eaf3a2: OK > Apr 1 01:29:32 nebula4 crmd[6490]: notice: tengine_stonith_callback: Stonith operation 2/98:101:0:28913388-04df-49cb-9927-362b21a74014: OK (0) > Apr 1 01:29:32 nebula4 crmd[6490]: notice: tengine_stonith_notify: Peer nebula2 was terminated (reboot) by nebula3 for nebula4: OK (ref=39eaf3a2-d7e0-417d-8a01-d2f373973d6b) by client crmd.6490 > Apr 1 01:29:32 nebula4 crmd[6490]: notice: te_rsc_command: Initiating action 91: start stonith-nebula1-IPMILAN_start_0 on nebula3 > Apr 1 01:29:33 nebula4 crmd[6490]: notice: run_graph: Transition 101 (Complete=13, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-22.bz2): Stopped > Apr 1 01:29:33 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1) > Apr 1 01:29:33 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1) > Apr 1 01:29:33 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000) > Apr 1 01:29:33 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000) > Apr 1 01:29:33 nebula4 pengine[6489]: notice: process_pe_message: Calculated Transition 102: /var/lib/pacemaker/pengine/pe-input-129.bz2 > Apr 1 01:29:33 nebula4 crmd[6490]: notice: te_rsc_command: Initiating action 88: monitor 
stonith-nebula1-IPMILAN_monitor_3600000 on nebula3 > Apr 1 01:29:34 nebula4 crmd[6490]: notice: run_graph: Transition 102 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-129.bz2): Complete > Apr 1 01:29:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] > Apr 1 01:30:01 nebula4 CRON[44640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi) > Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 datastores wait for fencing > Apr 1 01:30:25 nebula4 dlm_controld[6737]: 50815 clvmd wait for fencing > Apr 1 01:30:29 nebula4 lvm[6759]: Request timed-out (send: 1427844563, now: 1427844629) > Apr 1 01:30:29 nebula4 lvm[6759]: Request timed-out. padding > Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811137, state = 3 > Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e781 > Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811139, state = 3 > Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e783 > Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811138, state = 1 > Apr 1 01:30:29 nebula4 lvm[6759]: down_callback. node 1084811140, state = 3 > Apr 1 01:30:29 nebula4 lvm[6759]: Checking for a reply from 40a8e784 > Apr 1 01:30:29 nebula4 lvm[6759]: Got post command condition... > Apr 1 01:30:29 nebula4 lvm[6759]: Waiting for next pre command > Apr 1 01:30:29 nebula4 lvm[6759]: read on PIPE 12: 4 bytes: status: 0 > Apr 1 01:30:29 nebula4 lvm[6759]: background routine status was 0, sock_client=0x218eab0 > Apr 1 01:30:29 nebula4 lvm[6759]: Send local reply > Apr 1 01:30:29 nebula4 lvm[6759]: Read on local socket 5, len = 30 > Apr 1 01:30:29 nebula4 lvm[6759]: Got pre command condition... 
> Apr 1 01:30:29 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 6 (client=0x218eab0) > Apr 1 01:30:29 nebula4 lvm[6759]: unlock_resource: V_vg-one-0 lockid: 1 > Apr 1 01:40:01 nebula4 CRON[47640]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi) > Apr 1 01:44:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ] > Apr 1 01:44:34 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula3: unknown error (1) > Apr 1 01:44:34 nebula4 pengine[6489]: warning: unpack_rsc_op: Processing failed op start for stonith-nebula4-IPMILAN on nebula1: unknown error (1) > Apr 1 01:44:34 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula1 after 1000000 failures (max=1000000) > Apr 1 01:44:34 nebula4 pengine[6489]: warning: common_apply_stickiness: Forcing stonith-nebula4-IPMILAN away from nebula3 after 1000000 failures (max=1000000) > Apr 1 01:44:34 nebula4 pengine[6489]: notice: process_pe_message: Calculated Transition 103: /var/lib/pacemaker/pengine/pe-input-130.bz2 > Apr 1 01:44:34 nebula4 crmd[6490]: notice: run_graph: Transition 103 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-130.bz2): Complete > Apr 1 01:44:34 nebula4 crmd[6490]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] > Apr 1 01:45:01 nebula4 CRON[49089]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then munin-run apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then munin-run apt update 7200 12 >/dev/null; fi) > Apr 1 01:46:01 nebula4 CRON[570]: (root) CMD (if test -x /usr/sbin/apticron; then /usr/sbin/apticron --cron; else true; fi) > Apr 1 01:49:20 nebula4 lvm[6759]: Got new connection on fd 17 > Apr 1 01:49:20 nebula4 lvm[6759]: Read on local socket 17, len = 30 > Apr 1 01:49:20 nebula4 lvm[6759]: creating pipe, [18, 19] > Apr 1 01:49:20 nebula4 lvm[6759]: Creating pre&post thread > Apr 1 01:49:20 nebula4 lvm[6759]: Created pre&post thread, state = 0 > Apr 1 01:49:20 nebula4 lvm[6759]: in sub thread: client = 0x218f1f0 > Apr 1 01:49:20 nebula4 lvm[6759]: doing PRE command LOCK_VG 'V_vg-one-0' at 1 (client=0x218f1f0) > Apr 1 01:49:20 nebula4 lvm[6759]: lock_resource 'V_vg-one-0', flags=0, mode=3 > _______________________________________________ > Users mailing list: Users at clusterlabs.org > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From thorvald.hallvardsson at gmail.com Thu Apr 23 10:11:36 2015 From: thorvald.hallvardsson at gmail.com (Thorvald Hallvardsson) Date: Thu, 23 Apr 2015 11:11:36 +0100 Subject: [Linux-cluster] GFS2 over NFS4 Message-ID: Hi guys, I need some help and answers related to share GFS2 file system over NFS. I have read the RH documentation but still some things are a bit unclear to me. First of all I need to build a POC for the shared storage cluster which initially will contain 3 nodes in the storage cluster. 
This is all going to run as a VM environment on Hyper-V. Generally the idea is to share virtual VHDX across 3 nodes, put LVM and GFS2 on top of it and then share it via NFS to the clients. I have got the initial cluster built on Centos 7 using pacemaker. I generally followed RH docs to build it so I ended up with the simple GFS2 cluster and pacemaker managing fencing and floating VIP resource. Now I'm wondering about the NFS. RedHat documentation is a bit conflicting or rather unclear in some places and I found quite few manuals on the internet about similar configuration and generally some of them suggest to mount the NFS share on the clients with nolock option RH docs mention local flock and I got confused about what supposed to be where. Of course I don't know if my understanding is correct but the reason to "disable" NFS locking is because GFS2 is already doing it anyway via DLM so there is no need for NFS to do same thing what eventually mean that I will have some sort of double locking mechanism in place. So first question is where I suppose to setup locks or rather no locks and how the export should look like ? Second thing is I was thinking about going a step forward and use NFS4 for the exports. However from what I have read about NFS4 it does locking by default and there is no way to disable them. Does that mean NFS4 is not suitable in this case at all ? That's all for now. I appreciate your help. Thank you. TH -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Thu Apr 23 11:51:36 2015 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 23 Apr 2015 13:51:36 +0200 Subject: [Linux-cluster] GFS2 over NFS4 In-Reply-To: References: Message-ID: <5538DCC8.4000207@redhat.com> On 04/23/2015 12:11 PM, Thorvald Hallvardsson wrote: > Hi guys, > > I need some help and answers related to share GFS2 file system over NFS. > I have read the RH documentation but still some things are a bit unclear > to me. > > First of all I need to build a POC for the shared storage cluster which > initially will contain 3 nodes in the storage cluster. This is all going > to run as a VM environment on Hyper-V. Generally the idea is to share > virtual VHDX across 3 nodes, put LVM and GFS2 on top of it and then > share it via NFS to the clients. I have got the initial cluster built on > Centos 7 using pacemaker. I generally followed RH docs to build it so I > ended up with the simple GFS2 cluster and pacemaker managing fencing and > floating VIP resource. Interesting, what fencing solution did you use? Fabio > > Now I'm wondering about the NFS. RedHat documentation is a bit > conflicting or rather unclear in some places and I found quite few > manuals on the internet about similar configuration and generally some > of them suggest to mount the NFS share on the clients with nolock option > RH docs mention local flock and I got confused about what supposed to be > where. Of course I don't know if my understanding is correct but the > reason to "disable" NFS locking is because GFS2 is already doing it > anyway via DLM so there is no need for NFS to do same thing what > eventually mean that I will have some sort of double locking mechanism > in place. So first question is where I suppose to setup locks or rather > no locks and how the export should look like ? > > Second thing is I was thinking about going a step forward and use NFS4 > for the exports. 

From mij at irwan.name Thu Apr 23 13:05:57 2015
From: mij at irwan.name (Mohd Irwan Jamaluddin)
Date: Thu, 23 Apr 2015 21:05:57 +0800
Subject: [Linux-cluster] GFS2 over NFS4
In-Reply-To:
References:
Message-ID:

On Thu, Apr 23, 2015 at 6:11 PM, Thorvald Hallvardsson <
thorvald.hallvardsson at gmail.com> wrote:

> Hi guys,
>
> I need some help and answers related to share GFS2 file system over NFS. I
> have read the RH documentation but still some things are a bit unclear to
> me.
>
> First of all I need to build a POC for the shared storage cluster which
> initially will contain 3 nodes in the storage cluster. This is all going to
> run as a VM environment on Hyper-V. Generally the idea is to share virtual
> VHDX across 3 nodes, put LVM and GFS2 on top of it and then share it via
> NFS to the clients. I have got the initial cluster built on Centos 7 using
> pacemaker. I generally followed RH docs to build it so I ended up with the
> simple GFS2 cluster and pacemaker managing fencing and floating VIP
> resource.
>
> Now I'm wondering about the NFS. RedHat documentation is a bit conflicting
> or rather unclear in some places and I found quite few manuals on the
> internet about similar configuration and generally some of them suggest to
> mount the NFS share on the clients with nolock option RH docs mention local
> flock and I got confused about what supposed to be where. Of course I don't
> know if my understanding is correct but the reason to
> "disable" NFS locking
> is because GFS2 is already doing it anyway via DLM so there is no need for
> NFS to do same thing what eventually mean that I will have some sort of
> double locking mechanism in place. So first question is where I suppose to
> setup locks or rather no locks and how the export should look like ?
>
> Second thing is I was thinking about going a step forward and use NFS4 for
> the exports. However from what I have read about NFS4 it does locking by
> default and there is no way to disable them. Does that mean NFS4 is not
> suitable in this case at all ?
>
> That's all for now.
>
> I appreciate your help.
>
>
This is the latest KB regarding combination of GFS + NFS that I know of,

Does Red Hat recommend exporting GFS or GFS2 with NFS or Samba in a RHEL
Resilient Storage cluster, and how should I configure it if I do?
https://access.redhat.com/solutions/20327
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thorvald.hallvardsson at gmail.com Thu Apr 23 14:16:46 2015
From: thorvald.hallvardsson at gmail.com (Thorvald Hallvardsson)
Date: Thu, 23 Apr 2015 15:16:46 +0100
Subject: [Linux-cluster] GFS2 over NFS4
In-Reply-To:
References:
Message-ID:

Hi guys,

@Fabio I have just realised that I have no fencing device at all, as
STONITH is set to false, although some of my resources are set to fence
on failure :/. There is really no choice for Hyper-V unless I compile my
own version of libvirt :(.

@Mohd that is what I am actually trying to use. I found out that
localflocks needs to be used to mount GFS2 on the exporting nodes, and my
cluster is basically configured to meet all the requirements in that
document.

So, to be honest, the overall idea is a bit complex. I'm going to have
multiple nodes with a shared VHDX mounted on each node in the cluster.
However, each share will be allocated to a separate VIP, and each node
will export different resources. The resources are going to be linked to
the IPs, so all nodes in the cluster will be utilised, and at the same
time each node in the cluster will be able to take over all the
resources.

Maybe someone has different ideas?

Regards,
TH

On 23 April 2015 at 14:05, Mohd Irwan Jamaluddin wrote:

> On Thu, Apr 23, 2015 at 6:11 PM, Thorvald Hallvardsson <
> thorvald.hallvardsson at gmail.com> wrote:
>
>> Hi guys,
>>
>> I need some help and answers related to share GFS2 file system over NFS.
>> I have read the RH documentation but still some things are a bit unclear to
>> me.
>>
>> First of all I need to build a POC for the shared storage cluster which
>> initially will contain 3 nodes in the storage cluster. This is all going to
>> run as a VM environment on Hyper-V. Generally the idea is to share virtual
>> VHDX across 3 nodes, put LVM and GFS2 on top of it and then share it via
>> NFS to the clients. I have got the initial cluster built on Centos 7 using
>> pacemaker. I generally followed RH docs to build it so I ended up with the
>> simple GFS2 cluster and pacemaker managing fencing and floating VIP
>> resource.
>>
>> Now I'm wondering about the NFS. RedHat documentation is a bit
>> conflicting or rather unclear in some places and I found quite few manuals
>> on the internet about similar configuration and generally some of them
>> suggest to mount the NFS share on the clients with nolock option RH docs
>> mention local flock and I got confused about what supposed to be where. Of
>> course I don't know if my understanding is correct but the reason to
>> "disable" NFS locking is because GFS2 is already doing it anyway via DLM so
>> there is no need for NFS to do same thing what eventually mean that I will
>> have some sort of double locking mechanism in place. So first question is
>> where I suppose to setup locks or rather no locks and how the export should
>> look like ?
>>
>> Second thing is I was thinking about going a step forward and use NFS4
>> for the exports. However from what I have read about NFS4 it does locking
>> by default and there is no way to disable them. Does that mean NFS4 is not
>> suitable in this case at all ?
>>
>> That's all for now.
>>
>> I appreciate your help.
>>
>>
> This is the latest KB regarding combination of GFS + NFS that I know of,
>
> Does Red Hat recommend exporting GFS or GFS2 with NFS or Samba in a RHEL
> Resilient Storage cluster, and how should I configure it if I do?
> https://access.redhat.com/solutions/20327
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
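
The layout described above, one export with its own floating IP per node
and every node able to take over all of them, is commonly modelled in
Pacemaker as one group per export. A minimal sketch with pcs follows; the
resource names, addresses, directories and fsid values are illustrative
assumptions, and it presumes the GFS2 mounts and the NFS server daemons
are already managed by the cluster:

    # One floating IP and one export, grouped so they move together
    # (placeholder values throughout):
    pcs resource create share1-vip ocf:heartbeat:IPaddr2 \
        ip=192.168.100.101 cidr_netmask=24
    pcs resource create share1-export ocf:heartbeat:exportfs \
        directory=/export/share1 clientspec=192.168.100.0/24 \
        options=rw,sync fsid=101
    pcs resource group add share1-grp share1-export share1-vip

    # Repeat per share, then spread the groups with location preferences:
    pcs constraint location share1-grp prefers node1=100
    pcs constraint location share2-grp prefers node2=100
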

From jashokda at cisco.com Fri Apr 24 12:12:05 2015
From: jashokda at cisco.com (Jatin Davey)
Date: Fri, 24 Apr 2015 17:42:05 +0530
Subject: [Linux-cluster] Working of a two-node cluster
Message-ID: <553A3315.2050508@cisco.com>

Hi

I am using a two-node cluster on RHEL 6.5. I have a very fundamental
question.

For the two-node cluster to work, is it mandatory that both nodes are
"online" and communicating with each other?

What I can see is that if there is a communication failure between them,
then either both nodes are fenced or the cluster gets into a "stopped"
state (as seen in the output of the clustat command).

Apologies if my questions are naive. I am just starting to work with the
RHEL cluster add-on.

Thanks
Jatin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
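
For background on the two-node special case: a RHEL 6 cluster with only
two nodes is normally told explicitly that it may operate without a
quorum majority, using the cman two_node setting. The fragment below is a
generic illustration, not Jatin's configuration; the cluster and node
names are placeholders, and working fence devices still have to be
defined for each node so that a node which loses contact with its peer
gets fenced instead of services simply blocking:

    <cluster name="example2node" config_version="1">
      <!-- Let the cluster keep running with only one of the two nodes;
           expected_votes must be 1 when two_node is enabled. -->
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1.example.com" nodeid="1"/>
        <clusternode name="node2.example.com" nodeid="2"/>
      </clusternodes>
      <!-- fencedevices and per-node fence methods omitted from this sketch -->
    </cluster>
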

From emi2fast at gmail.com Fri Apr 24 12:31:02 2015
From: emi2fast at gmail.com (emmanuel segura)
Date: Fri, 24 Apr 2015 14:31:02 +0200
Subject: [Linux-cluster] Working of a two-node cluster
In-Reply-To: <553A3315.2050508@cisco.com>
References: <553A3315.2050508@cisco.com>
Message-ID:

Please share your cluster config; that way someone may be able to help
you.

2015-04-24 14:12 GMT+02:00 Jatin Davey :
> Hi
>
> I am using a two node cluster using RHEL 6.5. I have a very fundamental
> question.
>
> For the two node cluster to work , Is it mandatory that both the nodes are
> "online" and communicating with each other ?
>
> What i can see is that if there is communication failure between them then
> either both the nodes are fenced or the cluster gets into a "stopped" state
> (Seen from output of clustat command).
>
> Apologies if my questions are naive. I am just starting to work with RHEL
> cluster add-on.
>
> Thanks
> Jatin
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
this is my life and I live it for as long as God wills

From jashokda at cisco.com Fri Apr 24 12:53:16 2015
From: jashokda at cisco.com (Jatin Davey)
Date: Fri, 24 Apr 2015 18:23:16 +0530
Subject: [Linux-cluster] Working of a two-node cluster
In-Reply-To:
References: <553A3315.2050508@cisco.com>
Message-ID: <553A3CBC.2050909@cisco.com>

Here is my cluster.conf file
************************