[rhelv6-list] Unsubscribe

Brandon Sawyers bsawyers at vt.edu
Fri Dec 22 16:30:52 UTC 2017


On Fri, Dec 22, 2017, 11:29 Brown, Hugh M <hugh-brown at uiowa.edu> wrote:

> Response at bottom
>
> -----Original Message-----
> From: rhelv6-list-bounces at redhat.com [mailto:rhelv6-list-bounces at redhat.com] On Behalf Of francis picabia
> Sent: Thursday, December 21, 2017 9:47 AM
> To: Red Hat Enterprise Linux 6 (Santiago) discussion mailing-list <rhelv6-list at redhat.com>
> Subject: Re: [rhelv6-list] fsck -n always showing errors
>
> Thanks for the replies...
>
>
> OK, I was expecting there must be some sort of false positive going on.
> For the system I listed here, those are not persistent errors.
>
> However, there is one which does show the same orphaned inode numbers
> on each run, so this is likely real.
>
> # fsck -n /var
> fsck from util-linux-ng 2.17.2
> e2fsck 1.41.12 (17-May-2010)
> Warning!  /dev/sda2 is mounted.
> Warning: skipping journal recovery because doing a read-only filesystem check.
> /dev/sda2 contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Deleted inode 1059654 has zero dtime.  Fix? no
>
> Inodes that were part of a corrupted orphan linked list found.  Fix? no
>
> Inode 1061014 was part of the orphaned inode list.  IGNORED.
> Inode 1061275 was part of the orphaned inode list.  IGNORED.
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: -124293 -130887 -4244999 -4285460 -4979711
> -4984408 -4989489 -7052754 -7052847 -7053693 -7069384 -7069539 -7069657
> -7069788 -7074507 -(7095835--7095839) -7096847 -7097195 -9626336
> Fix? no
>
> Free blocks count wrong (6918236, counted=5214069).
> Fix? no
>
> Inode bitmap differences:  -1059654 -1061014 -1061275
> Fix? no
>
> Free inodes count wrong (1966010, counted=1878618).
> Fix? no
>
>
> /dev/sda2: ********** WARNING: Filesystem still has errors **********
>
> /dev/sda2: 598086/2564096 files (1.5% non-contiguous), 3321764/10240000 blocks
>
>
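One way to separate the persistent findings from the noise you get when checking a mounted filesystem is to compare the inode numbers reported across several fsck -n runs. Below is a minimal Python sketch, assuming the output of each run has been saved as text. The function names are made up for illustration and are not part of e2fsprogs, and the regular expression only targets the "inode NNN" wording seen in the output above.

```python
import re

# Matches messages such as "Deleted inode 1059654 has zero dtime." and
# "Inode 1061014 was part of the orphaned inode list."
# It deliberately ignores summary lines like "Inode bitmap differences:".
INODE_RE = re.compile(r"\b[Ii]node (\d+)\b")


def flagged_inodes(fsck_output):
    """Return the set of inode numbers named in one fsck -n run."""
    return {int(number) for number in INODE_RE.findall(fsck_output)}


def persistent_inodes(runs):
    """Return the inode numbers that show up in every run.

    'runs' is a list of strings, one per saved fsck -n output.
    An empty list yields an empty set.
    """
    per_run = [flagged_inodes(text) for text in runs]
    if not per_run:
        return set()
    return set.intersection(*per_run)
```

Inodes that appear in every run are the ones worth treating as real. Anything that comes and goes between runs is more likely a side effect of the filesystem changing underneath the read-only check.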
> dmesg shows it had some scsi issues.  I suspect the scsi error
> is triggered by operation of VDP backup, which freezes the system
> for a second when completing the backup snapshot.
>
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e618c0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e61ac0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e614c0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e61cc0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e61dc0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e617c0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e616c0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e615c0
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036e613c0
> INFO: task jbd2/sda2-8:752 blocked for more than 120 seconds.
>       Not tainted 2.6.32-696.3.2.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> jbd2/sda2-8   D 0000000000000000     0   752      2 0x00000000
>  ffff880037ac7c20 0000000000000046 ffff880037ac7bd0 ffffffff813a27eb
>  ffff880037ac7b80 ffffffff81014b39 ffff880037ac7bd0 ffffffff810b2a4f
>  ffff880036c44138 0000000000000000 ffff880037a69068 ffff880037ac7fd8
> Call Trace:
>  [<ffffffff813a27eb>] ? scsi_request_fn+0xdb/0x750
>  [<ffffffff81014b39>] ? read_tsc+0x9/0x20
>  [<ffffffff810b2a4f>] ? ktime_get_ts+0xbf/0x100
>  [<ffffffff811d1400>] ? sync_buffer+0x0/0x50
>  [<ffffffff8154b0e3>] io_schedule+0x73/0xc0
>  [<ffffffff811d1440>] sync_buffer+0x40/0x50
>  [<ffffffff8154bbcf>] __wait_on_bit+0x5f/0x90
>  [<ffffffff811d1400>] ? sync_buffer+0x0/0x50
>  [<ffffffff8154bc78>] out_of_line_wait_on_bit+0x78/0x90
>  [<ffffffff810a69b0>] ? wake_bit_function+0x0/0x50
>  [<ffffffff810a67b7>] ? bit_waitqueue+0x17/0xd0
>  [<ffffffff811d13f6>] __wait_on_buffer+0x26/0x30
>  [<ffffffffa0180146>] jbd2_journal_commit_transaction+0xaa6/0x14f0 [jbd2]
>  [<ffffffff8108fbdb>] ? try_to_del_timer_sync+0x7b/0xe0
>  [<ffffffffa0185a68>] kjournald2+0xb8/0x220 [jbd2]
>  [<ffffffff810a6930>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffffa01859b0>] ? kjournald2+0x0/0x220 [jbd2]
>  [<ffffffff810a649e>] kthread+0x9e/0xc0
>  [<ffffffff8100c28a>] child_rip+0xa/0x20
>  [<ffffffff810a6400>] ? kthread+0x0/0xc0
>  [<ffffffff8100c280>] ? child_rip+0x0/0x20
> INFO: task master:1778 blocked for more than 120 seconds.
>       Not tainted 2.6.32-696.3.2.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> master        D 0000000000000000     0  1778      1 0x00000080
>  ffff8800ba0cb948 0000000000000082 0000000000000000 ffff88000003e460
>  00000037ffffffc8 0000004100000000 001744a7cc279bbf 0000000000000001
>  ffff8800ba0c8000 00000002863b16d4 ffff880037a55068 ffff8800ba0cbfd8
> Call Trace:
>  [<ffffffff811d1400>] ? sync_buffer+0x0/0x50
>  [<ffffffff8154b0e3>] io_schedule+0x73/0xc0
>  [<ffffffff811d1440>] sync_buffer+0x40/0x50
>  [<ffffffff8154b99a>] __wait_on_bit_lock+0x5a/0xc0
>  [<ffffffff811d1400>] ? sync_buffer+0x0/0x50
>  [<ffffffff8154ba78>] out_of_line_wait_on_bit_lock+0x78/0x90
>  [<ffffffff810a69b0>] ? wake_bit_function+0x0/0x50
>  [<ffffffff811d0999>] ? __find_get_block+0xa9/0x200
>  [<ffffffff811d15e6>] __lock_buffer+0x36/0x40
>  [<ffffffffa017f2bb>] do_get_write_access+0x48b/0x520 [jbd2]
>  [<ffffffffa017f4a1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
>  [<ffffffffa01cd4a8>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
>  [<ffffffffa01a6d63>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
>  [<ffffffffa01a6ddc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
>  [<ffffffffa017e3d5>] ? jbd2_journal_start+0xb5/0x100 [jbd2]
>  [<ffffffffa01a70d0>] ext4_dirty_inode+0x40/0x60 [ext4]
>  [<ffffffff811c69db>] __mark_inode_dirty+0x3b/0x1c0
>  [<ffffffff811b7102>] file_update_time+0xf2/0x170
>  [<ffffffff811a4f02>] pipe_write+0x312/0x6b0
>  [<ffffffff81199c2a>] do_sync_write+0xfa/0x140
>  [<ffffffff810a6930>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff8119f964>] ? cp_new_stat+0xe4/0x100
>  [<ffffffff81014b39>] ? read_tsc+0x9/0x20
>  [<ffffffff810b2a4f>] ? ktime_get_ts+0xbf/0x100
>  [<ffffffff8123ae06>] ? security_file_permission+0x16/0x20
>  [<ffffffff81199f28>] vfs_write+0xb8/0x1a0
>  [<ffffffff8119b416>] ? fget_light_pos+0x16/0x50
>  [<ffffffff8119aa61>] sys_write+0x51/0xb0
>  [<ffffffff810ee4ce>] ? __audit_syscall_exit+0x25e/0x290
>  [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
> INFO: task pickup:1236 blocked for more than 120 seconds.
>       Not tainted 2.6.32-696.3.2.el6.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> pickup        D 0000000000000001     0  1236   1778 0x00000080
>  ffff880024c6f968 0000000000000086 0000000000000000 ffffea00019e4120
>  ffff880024c6f8e8 ffffffff811456e0 001744a7cc27fe9e ffffea00019e4120
>  ffff8800117ab4a8 00000002863b1637 ffff88003738d068 ffff880024c6ffd8
> Call Trace:
>  [<ffffffff811456e0>] ? __lru_cache_add+0x40/0x90
>  [<ffffffff811d1400>] ? sync_buffer+0x0/0x50
>  [<ffffffff8154b0e3>] io_schedule+0x73/0xc0
>  [<ffffffff811d1440>] sync_buffer+0x40/0x50
>  [<ffffffff8154b99a>] __wait_on_bit_lock+0x5a/0xc0
>  [<ffffffff811d1400>] ? sync_buffer+0x0/0x50
>  [<ffffffff8154ba78>] out_of_line_wait_on_bit_lock+0x78/0x90
>  [<ffffffff810a69b0>] ? wake_bit_function+0x0/0x50
>  [<ffffffff811d0999>] ? __find_get_block+0xa9/0x200
>  [<ffffffff811d15e6>] __lock_buffer+0x36/0x40
>  [<ffffffffa017f2bb>] do_get_write_access+0x48b/0x520 [jbd2]
>  [<ffffffffa017f4a1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
>  [<ffffffffa01cd4a8>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
>  [<ffffffffa01a6d63>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
>  [<ffffffffa01a6ddc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
>  [<ffffffffa017e3d5>] ? jbd2_journal_start+0xb5/0x100 [jbd2]
>  [<ffffffffa01a70d0>] ext4_dirty_inode+0x40/0x60 [ext4]
>  [<ffffffff811c69db>] __mark_inode_dirty+0x3b/0x1c0
>  [<ffffffff811b7315>] touch_atime+0x195/0x1a0
>  [<ffffffff811a5684>] pipe_read+0x3e4/0x4d0
>  [<ffffffff81199d6a>] do_sync_read+0xfa/0x140
>  [<ffffffff811e2e80>] ? ep_send_events_proc+0x0/0x110
>  [<ffffffff810a6930>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff8123ae06>] ? security_file_permission+0x16/0x20
>  [<ffffffff8119a665>] vfs_read+0xb5/0x1a0
>  [<ffffffff8119b416>] ? fget_light_pos+0x16/0x50
>  [<ffffffff8119a9b1>] sys_read+0x51/0xb0
>  [<ffffffff810ee4ce>] ? __audit_syscall_exit+0x25e/0x290
>  [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
> sd 2:0:0:0: [sda] task abort on host 2, ffff880036d7a680
> sd 2:0:0:0: [sda] Failed to get completion for aborted cmd ffff880036d7a680
> sd 2:0:0:0: [sda] SCSI device reset on scsi2:0
>
>
> If I just repair the systems that have these errors in their runtime
> history, I should have any real concerns covered.
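To find which systems have this in their runtime history, a small script can scan saved kernel log text (for example the output of dmesg, or /var/log/messages) for the three message types quoted above. This is only a sketch under that assumption: the pattern names and the has_storage_stall helper are hypothetical, and the patterns match the exact wording from this thread, so logs that were re-wrapped by a mail client would need to be un-wrapped first.

```python
import re
from collections import Counter

# Message shapes copied from the dmesg excerpt in this thread.
PATTERNS = {
    "scsi_task_abort": re.compile(r"\[sd[a-z]+\] task abort on host \d+"),
    "scsi_device_reset": re.compile(r"\[sd[a-z]+\] SCSI device reset"),
    "hung_task": re.compile(
        r"INFO: task \S+:\d+ blocked for more than \d+ seconds"
    ),
}


def summarize(log_text):
    """Count how often each message type occurs in the log text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(log_text))
    return counts


def has_storage_stall(log_text):
    """Heuristic: True if any of the message types above is present."""
    return any(summarize(log_text).values())
```

A host where has_storage_stall returns True is only a candidate for a closer look and a manual fsck. It does not prove that the filesystem is damaged.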
>
>
> Thanks for the responses...
>
>
>
>
> I've never really had fsck fail to correct errors when run manually. I
> have had the touch /forcefsck && reboot option decide that a fix was too
> risky and refuse to do it. The manual run would then fix it. Typically
> booting single user mode was enough to sort it out. If the problem disk was
> the root fs, then rescue media was the solution.
>
> We did have an iSCSI array reboot which caused the filesystem to go
> read-only, and at the time we ran fsck -n to check for any errors. We did
> get a few errors of the type that you'd expect from a filesystem that is
> mounted, but not any inode or bitmap errors.
>
> We also had a Hyper-V VM get into a wedged state because the backup
> mechanism called the filesystem freeze (fsfreeze) and then the backup
> software crashed and never unfroze the filesystem. We had to update the
> backup software and the Hyper-V drivers for that.
>
> The only time I couldn't get fsck to behave was when a couple of systems
> had faulty RAM. In those cases the filesystem corruption was severe and it
> was easier to replace memory and reimage/restore from backups.
>
> So, I don't think fsck is showing false positives. You should be able to
> clear the errors with a manual fsck and I would definitely be concerned
> that a number of systems were showing fs errors.
>
> If you can't get the manual fsck to fix all of the errors, it might be
> worth opening a support ticket with Red Hat.
>
> Hugh
>