Hi Dominic, Yes, the I/O errors belong only to the passive paths. <div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> ------------------------------ Message: 3 Date: Tue, 21 Jun 2011 18:22:49 +0530 From: dOminic <<a href="mailto:share2dom@gmail.com">share2dom@gmail.com</a>> To: linux clustering <<a href="mailto:linux-cluster@redhat.com">linux-cluster@redhat.com</a>> Subject: Re: [Linux-cluster] Cluster Failover Failed Message-ID: <BANLkTi=bAtD8BYp4_T5ksir=<a href="mailto:dRSAO2dq9Q@mail.gmail.com">dRSAO2dq9Q@mail.gmail.com</a>> Content-Type: text/plain; charset="iso-8859-1" Hi, Btw, how many HBAs are present in your box ? . Problem is with scsi3 only ?. Refer <a href="https://access.redhat.com/kb/docs/DOC-2991" target="_blank">https://access.redhat.com/kb/docs/DOC-2991</a> , then set the filter. Also, I would suggest you to open ticket with Linux vendor if IO errors are belongs to Active paths. Pointed IO errors are belongs to disk that in passive paths group ?. you can verify the same in multipath-ll output . regards, On Sun, Jun 19, 2011 at 10:03 PM, dOminic <<a href="mailto:share2dom@gmail.com">share2dom@gmail.com</a>> wrote: > Hi Balaji, > > Yes, the reported message is harmless ... However, you can try following > > 1) I would suggest you to set the filter setting in lvm.conf to properly > scan your mpath* devices and local disks. > 2) Enable blacklist section in multipath.conf eg: > > blacklist { > devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" > devnode "^hd[a-z]" > } > > # multipath -v2 > > Observe the box. Check whether that helps ... > > > Regards, > > > On Wed, Jun 15, 2011 at 12:16 AM, Balaji S <<a href="mailto:skjbalaji@gmail.com">skjbalaji@gmail.com</a>> wrote: > >> Hi, >> In my setup implemented 10 tow node cluster's which running mysql as >> cluster service, ipmi card as fencing device. 
>> >> In my /var/log/messages i am keep getting the errors like below, >> >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdm, sector 0 >> Jun 14 12:50:48 hostname kernel: sd 3:0:2:2: Device not ready: <6>: >> Current: sense key: Not Ready >> Jun 14 12:50:48 hostname kernel: Add. Sense: Logical unit not ready, >> manual intervention required >> Jun 14 12:50:48 hostname kernel: >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdn, sector 0 >> Jun 14 12:50:48 hostname kernel: sd 3:0:2:4: Device not ready: <6>: >> Current: sense key: Not Ready >> Jun 14 12:50:48 hostname kernel: Add. Sense: Logical unit not ready, >> manual intervention required >> Jun 14 12:50:48 hostname kernel: >> Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdp, sector 0 >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:1: Device not ready: <6>: >> Current: sense key: Not Ready >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, >> manual intervention required >> Jun 14 12:51:10 hostname kernel: >> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdc, sector 0 >> Jun 14 12:51:10 hostname kernel: printk: 3 messages suppressed. >> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdc, logical >> block 0 >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:2: Device not ready: <6>: >> Current: sense key: Not Ready >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, >> manual intervention required >> Jun 14 12:51:10 hostname kernel: >> Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdd, sector 0 >> Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdd, logical >> block 0 >> Jun 14 12:51:10 hostname kernel: sd 3:0:0:4: Device not ready: <6>: >> Current: sense key: Not Ready >> Jun 14 12:51:10 hostname kernel: Add. Sense: Logical unit not ready, >> manual intervention required >> >> >> when i am checking the multipath -ll , this all devices are in passive >> path. 
>> >> Environment : >> >> RHEL 5.4 & EMC SAN >> >> Please suggest how to overcome this issue. Support will be highly helpful. >> Thanks in Advance >> >> >> -- >> Thanks, >> BSK >> >> -- >> Linux-cluster mailing list >> <a href="mailto:Linux-cluster@redhat.com">Linux-cluster@redhat.com</a> >> <a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <<a href="https://www.redhat.com/archives/linux-cluster/attachments/20110621/e41e841c/attachment.html" target="_blank">https://www.redhat.com/archives/linux-cluster/attachments/20110621/e41e841c/attachment.html</a>> ------------------------------ Message: 4 Date: Tue, 21 Jun 2011 15:31:13 +0200 From: Miha Valencic <<a href="mailto:miha.valencic@gmail.com">miha.valencic@gmail.com</a>> To: linux clustering <<a href="mailto:linux-cluster@redhat.com">linux-cluster@redhat.com</a>> Subject: Re: [Linux-cluster] Troubleshooting service relocation Message-ID: <BANLkTi=eT93Bv3qeO0+t+EzZP=<a href="mailto:6yDYaV1Q@mail.gmail.com">6yDYaV1Q@mail.gmail.com</a>> Content-Type: text/plain; charset="utf-8" Michael, I've configured the logging on RM and am now waiting for it to switch nodes. Hopefully, I can see a reason why it is relocating. Thanks, Miha. On Sat, Jun 18, 2011 at 11:24 AM, Michael Pye <<a href="mailto:michael@ulimit.org">michael@ulimit.org</a>> wrote: > On 17/06/2011 09:13, Miha Valencic wrote: > > How can I turn on logging or what else can I check? > > Take a look at this knowledgebase article: > <a href="https://access.redhat.com/kb/docs/DOC-53500" target="_blank">https://access.redhat.com/kb/docs/DOC-53500</a> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <<a href="https://www.redhat.com/archives/linux-cluster/attachments/20110621/19a643fd/attachment.html" target="_blank">https://www.redhat.com/archives/linux-cluster/attachments/20110621/19a643fd/attachment.html</a>> ------------------------------ Message: 5 Date: Tue, 21 Jun 2011 09:57:38 -0400 From: "Nicolas Ross" <<a href="mailto:rossnick-lists@cybercat.ca">rossnick-lists@cybercat.ca</a>> To: "linux clustering" <<a href="mailto:linux-cluster@redhat.com">linux-cluster@redhat.com</a>> Subject: [Linux-cluster] GFS2 fatal: filesystem consistency error Message-ID: <AD364AF1E9D94C50B96231FB0320B1DE@versa> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original 8 node cluster, fiber channel hbas and disks access trough a qlogic fabric. I've got hit 3 times with this error on different nodes : GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency error GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267 GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, file = fs/gfs2/inode.c, line = 352 GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file system GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount GFS2: fsid=CyberCluster:GizServer.1: withdrawn Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T 2.6.32-131.2.1.el6.x86_64 #1 Call Trace: [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2] [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2] [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2] [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2] [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2] [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2] [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2] [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0 [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2] [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80 [<ffffffffa044cc4e>] ? 
gfs2_drop_inode+0x2e/0x30 [gfs2] [<ffffffff8118bf82>] ? iput+0x62/0x70 [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2] [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 [<ffffffff8108dd96>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 [<ffffffff8108dd00>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 no_formal_ino = 9582 no_addr = 6698267 i_disksize = 6838 blocks = 0 i_goal = 6698304 i_diskflags = 0x00000000 i_height = 1 i_depth = 0 i_entries = 0 i_eattr = 0 GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5 gdlm_unlock 5,66351b err=-22 Only, with different inodes each time. After that event, services running on that filesystem are marked failed and not moved over another node. Any access to that fs yields I/O error. Server needed to be rebooted to properly work again. I did ran a fsck last night on that filesystem, and it did find some errors, but nothing serious. Lots (realy lots) of those : Ondisk and fsck bitmaps differ at block 5771602 (0x581152) Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) Metadata type is 0 (free) Fix bitmap for block 5771602 (0x581152) ? (y/n) And after completing the fsck, I started back some services, and I got the same error on another filesystem that is practily empty and used for small utilities used troughout the cluster... What should I do to find the source of this problem ? 
------------------------------ Message: 6 Date: Tue, 21 Jun 2011 10:42:40 -0400 (EDT) From: Bob Peterson <<a href="mailto:rpeterso@redhat.com">rpeterso@redhat.com</a>> To: linux clustering <<a href="mailto:linux-cluster@redhat.com">linux-cluster@redhat.com</a>> Subject: Re: [Linux-cluster] GFS2 fatal: filesystem consistency error Message-ID: <<a href="mailto:1036238479.689034.1308667360488.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com">1036238479.689034.1308667360488.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com</a>> Content-Type: text/plain; charset=utf-8 ----- Original Message ----- | 8 node cluster, fiber channel hbas and disks access trough a qlogic | fabric. | | I've got hit 3 times with this error on different nodes : | | GFS2: fsid=CyberCluster:GizServer.1: fatal: filesystem consistency | error | GFS2: fsid=CyberCluster:GizServer.1: inode = 9582 6698267 | GFS2: fsid=CyberCluster:GizServer.1: function = gfs2_dinode_dealloc, | file = | fs/gfs2/inode.c, line = 352 | GFS2: fsid=CyberCluster:GizServer.1: about to withdraw this file | system | GFS2: fsid=CyberCluster:GizServer.1: telling LM to unmount | GFS2: fsid=CyberCluster:GizServer.1: withdrawn | Pid: 2659, comm: delete_workqueu Tainted: G W ---------------- T | 2.6.32-131.2.1.el6.x86_64 #1 | Call Trace: | [<ffffffffa044ffd2>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2] | [<ffffffffa0425209>] ? trunc_dealloc+0xa9/0x130 [gfs2] | [<ffffffffa04501dd>] ? gfs2_consist_inode_i+0x5d/0x60 [gfs2] | [<ffffffffa0435584>] ? gfs2_dinode_dealloc+0x64/0x210 [gfs2] | [<ffffffffa044e1da>] ? gfs2_delete_inode+0x1ba/0x280 [gfs2] | [<ffffffffa044e0ad>] ? gfs2_delete_inode+0x8d/0x280 [gfs2] | [<ffffffffa044e020>] ? gfs2_delete_inode+0x0/0x280 [gfs2] | [<ffffffff8118cfbe>] ? generic_delete_inode+0xde/0x1d0 | [<ffffffffa0432940>] ? delete_work_func+0x0/0x80 [gfs2] | [<ffffffff8118d115>] ? generic_drop_inode+0x65/0x80 | [<ffffffffa044cc4e>] ? gfs2_drop_inode+0x2e/0x30 [gfs2] | [<ffffffff8118bf82>] ? 
iput+0x62/0x70 | [<ffffffffa0432994>] ? delete_work_func+0x54/0x80 [gfs2] | [<ffffffff810887d0>] ? worker_thread+0x170/0x2a0 | [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40 | [<ffffffff81088660>] ? worker_thread+0x0/0x2a0 | [<ffffffff8108dd96>] ? kthread+0x96/0xa0 | [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 | [<ffffffff8108dd00>] ? kthread+0x0/0xa0 | [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 | no_formal_ino = 9582 | no_addr = 6698267 | i_disksize = 6838 | blocks = 0 | i_goal = 6698304 | i_diskflags = 0x00000000 | i_height = 1 | i_depth = 0 | i_entries = 0 | i_eattr = 0 | GFS2: fsid=CyberCluster:GizServer.1: gfs2_delete_inode: -5 | gdlm_unlock 5,66351b err=-22 | | | Only, with different inodes each time. | | After that event, services running on that filesystem are marked | failed and | not moved over another node. Any access to that fs yields I/O error. | Server | needed to be rebooted to properly work again. | | I did ran a fsck last night on that filesystem, and it did find some | errors, | but nothing serious. Lots (realy lots) of those : | | Ondisk and fsck bitmaps differ at block 5771602 (0x581152) | Ondisk status is 1 (Data) but FSCK thinks it should be 0 (Free) | Metadata type is 0 (free) | Fix bitmap for block 5771602 (0x581152) ? (y/n) | | And after completing the fsck, I started back some services, and I got | the | same error on another filesystem that is practily empty and used for | small | utilities used troughout the cluster... | | What should I do to find the source of this problem ? Hi, I believe this is a GFS2 bug we've already solved. Please contact Red Hat Support. 
Regards, Bob Peterson Red Hat File Systems ------------------------------ End of Linux-cluster Digest, Vol 86, Issue 19 </blockquote> </div> -- Thanks, Balaji S
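
P.S. For anyone who wants to repeat the check, here is a rough sketch of how the devices named in the kernel messages could be compared against the passive paths. It only parses the `end_request: I/O error, dev ...` lines quoted in the thread. The set of passive devices is an assumption: it is expected to be copied by hand from the `multipath -ll` output, since this sketch does not parse that output. The function names are made up for illustration.

```python
import re

# Matches kernel lines of the form seen in /var/log/messages in this thread:
#   "... kernel: end_request: I/O error, dev sdm, sector 0"
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector \d+")


def io_error_devices(log_lines):
    """Return the set of block device names that logged an end_request I/O error."""
    devices = set()
    for line in log_lines:
        match = IO_ERROR_RE.search(line)
        if match:
            devices.add(match.group(1))
    return devices


def errors_on_non_passive(log_lines, passive_devices):
    """Return error devices that are NOT in the hand-supplied passive set.

    passive_devices is an assumption: the operator lists the sdX names that
    multipath -ll shows in the passive path group.
    """
    return io_error_devices(log_lines) - set(passive_devices)


# Sample lines taken from the log excerpt quoted in this thread.
sample = [
    "Jun 14 12:50:48 hostname kernel: end_request: I/O error, dev sdm, sector 0",
    "Jun 14 12:50:48 hostname kernel: sd 3:0:2:2: Device not ready: <6>:",
    "Jun 14 12:51:10 hostname kernel: end_request: I/O error, dev sdc, sector 0",
    "Jun 14 12:51:10 hostname kernel: Buffer I/O error on device sdc, logical block 0",
]
```

If `errors_on_non_passive(...)` comes back empty, every device that logged an error is in the passive group, which matches what I see here. Anything left in the result would be an active-path device and worth a vendor ticket, as Dominic suggested.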