[linux-lvm] badblocks

Mon May 22 15:36:26 UTC 2023

 > graeme vetterlein <graeme.lvm at vetterlein.com> writes:
 >
 > > I have a desktop Linux box (Debian Sid) with a clutch of disks in it
 > > (4 or 5)  and have mostly defined each disk as a volume group.
 >
 > Why?  The purpose of a VG is to hold multiple PVs.

One or more, I believe :-) I am mainly interested in being able to move and
resize logical volumes, less in moving things between physical volumes (but
see later). BTW, one SATA port is wired to a "dock", so physical disks come
and go.

I was able to "replace" one broken 2TB drive and one working 2TB drive with
a new 4TB drive without any filesystem creation, copying etc., just using
LVM commands:

lvm> pvdisplay
..
   --- Physical volume ---
   PV Name               /dev/sdc1
   VG Name               SAMSUNG_2TB
..
   --- Physical volume ---
   PV Name               /dev/sdb5
   VG Name               real-vg
   --- Physical volume ---
   PV Name               /dev/sda1
   VG Name               TOSHIBA_2TB
   "/dev/sdd1" is a new physical volume of "<3.64 TiB"
   --- NEW Physical volume ---
   PV Name               /dev/sdd1
   VG Name

(sda is "bad" disk, sdd is "new" disk)
This is what I did...

    vgextend TOSHIBA_2TB /dev/sdd1      --- adds /dev/sdd1 into the existing VG
    pvmove /dev/sda1                    --- moves everything off sda1
    vgreduce TOSHIBA_2TB /dev/sda1      --- take sda1 out of the VG
    vgcfgbackup

    vgrename TOSHIBA_2TB BARRACUDA_4TB
    vgcfgbackup
..
    umount /dev/mapper/SAMSUNG_2TB-data
    umount /dev/mapper/SAMSUNG_2TB-vimage
    lvchange -an  SAMSUNG_2TB/data
    lvchange -an  SAMSUNG_2TB/vimage
    vgmerge  BARRACUDA_4TB SAMSUNG_2TB  --- I believe this puts everything into BARRACUDA_4TB (oddly right to left)
    pvmove /dev/sdc1                    --- moves everything off sdc1
    vgreduce BARRACUDA_4TB /dev/sdc1    --- nothing should be in sdc1, so drop it from the group

.. no need for renames, just update fstab

This was without taking the system down. I didn't fancy trying to do that
with fdisk(1) and dd(1)  :-)

 > > Now a key point is *some of the disks are NAS grade disks.* This means
 > > they do NOT reallocate bad sectors silently. They report IO errors and
 > > leave it to the OS (i.e. the old fashion way of doing this)
 >
 > Are you sure that you are not confusing the ERC feature here? That lets
 > the drive give up in a reasonable amount of time and report the (read)
 > error rather than keep trying.  Most often there is nothing wrong with
 > the disk physically and writing to the sector will succeed.  If it
 > doesn't, then the drive will remap it.

Reasonably sure. The disk is over 10 years old (SMART shows > 19000 hours in
use; it's powered on a few hours/day) and it only started getting errors at
around 19,000 hours. I get popup warnings almost every day now. SMART shows
it has NO reallocated sectors.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   140   140   054    Pre-fail  Offline      -       69
  3 Spin_Up_Time            0x0007   127   127   024    Pre-fail  Always       -       296 (Average 299)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3554
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       19080
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2959
192 Power-Off_Retract_Count 0x0032   097   097   000    Old_age   Always       -       3616
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       3616
194 Temperature_Celsius     0x0002   250   250   000    Old_age   Always       -       24 (Min/Max 13/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       57

Now, I know it's possible that these CRC errors are e.g. 'cable related', but
I've swapped the cable and moved SATA ports to no effect. In the end I
decided 10 years was enough and bought a new drive.

Notwithstanding this, it's still the case that I could, and will, get real
permanent errors on a disk. In fact I gather SSDs can suffer failure modes
that make a large group of sectors inaccessible (bad).

In the past a filesystem acted as if it was dealing with the real physical
sectors on disk (because it was!) and so could simply map these to an inode
holding bad blocks. Now, however, only the PV really has any knowledge of the
physical sectors, so it needs to do any such mapping. Consider the situation:

   A VG has 5 physical disks, 1 disk has a single bad block on it. If I
   resize and move around LVs and filesystems, that single bad block is
   going to crop up in various filesystems, causing issues all over the
   place.
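
To make the situation concrete: which LV is affected depends only on which
physical extent (PE) the bad sector falls into. A minimal sketch of that
arithmetic follows. The function name is made up for illustration, and the
defaults (data starting 2048 sectors into the PV, 8192-sector extents, i.e.
1 MiB and 4 MiB) are assumptions; the real values should come from
"pvs -o pv_name,pe_start,vg_extent_size --units s".

```sh
# Hypothetical helper: which physical extent (PE) of a PV holds a sector?
# All values are 512-byte sectors counted from the start of the PV device.
#   $1 = sector number of the bad block
#   $2 = pe_start, offset of the first extent (assumed default: 2048)
#   $3 = extent size                          (assumed default: 8192)
pe_for_sector() {
    sector=$1
    pe_start=${2:-2048}
    pe_size=${3:-8192}
    if [ "$sector" -lt "$pe_start" ]; then
        # Sectors before pe_start hold the LVM label/metadata, not LV data.
        echo "sector $sector lies before the first extent" >&2
        return 1
    fi
    # Integer division gives the zero-based PE index.
    echo $(( (sector - pe_start) / pe_size ))
}
```

"pvdisplay -m" on the PV then shows which LV currently owns that PE range,
and that answer changes every time an LV is moved or resized.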

Now this particular instance of the problem is only of academic interest
(the drive is replaced). However, I have several (QNAP) NAS boxes with many
NAS grade drives in them. Due to annoying bugs in QTS I plan to reinstall
these with Debian. I'm thinking I'll probably use LVM2 and raid striping (so
I will have VGs with many PVs in them :-) )

 > > Then *the penny dropped!* The only component that has a view on the
 > > real physical disk is lvm2 and in particular the PV ...so if anybody
 > > can mark(and avoid) badblocks it's the PV...so I should be looking for
 > > something akin to the -cc option of fsck , applied to a PV command?
 >
 > Theoretically yes, you can create a mapping to relocate the block
 > elsewhere,
Any hints? lvm2 commands? I can RTFM but a pointing finger would help.
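
As far as I can tell, such a mapping would have to be built below LVM with
device-mapper "linear" targets rather than with an lvm2 command. The sketch
below only prints a table; the function name, devices and offsets are
hypothetical, and whether LVM can then be pointed at the resulting mapped
device instead of the raw partition is an assumption I have not tested.

```sh
# Hypothetical helper: print a device-mapper table that passes DEV through
# unchanged except for LEN sectors starting at BAD, which are redirected to
# SPARE_DEV at offset SPARE_OFF. All numbers are 512-byte sectors.
# Table line format: <start> <length> linear <device> <offset>
remap_table() {
    dev=$1; size=$2; bad=$3; len=$4; spare_dev=$5; spare_off=$6
    tail_start=$(( bad + len ))
    if [ "$tail_start" -gt "$size" ]; then
        echo "bad range extends past end of device" >&2
        return 1
    fi
    # Segment before the bad range (skipped if the range starts at 0).
    if [ "$bad" -gt 0 ]; then
        echo "0 $bad linear $dev 0"
    fi
    # The bad range itself, served from the spare area.
    echo "$bad $len linear $spare_dev $spare_off"
    # Segment after the bad range (skipped if it reaches the end).
    if [ "$tail_start" -lt "$size" ]; then
        echo "$tail_start $(( size - tail_start )) linear $dev $tail_start"
    fi
}
```

Something like "remap_table /dev/sda1 $(blockdev --getsz /dev/sda1) 123456 8
/dev/sdd2 0 | dmsetup create sda1_remapped" would then create the device
(names and numbers invented for the example). The mapping is not persistent
and would have to be recreated at every boot.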

 > but there are no tools that I am aware of to help with this.
 > Again, are you sure there is actually something wrong with the disk?
 > Try writing to the "bad block" and see what happens.  Every time I have
 > done this in the last 20 years, the problem just goes away. Usually
 > without even triggering a reallocation.  This is because the data just
 > got a little scrambled even though there is nothing wrong with the
 > medium.

I've certainly met real bad blocks, though possibly not in the last 20 years.
When a disk cost more than I earned in a year it was worth the effort to
remap :-)
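
For anyone wanting to try the overwrite test suggested above, here is a
sketch that only prints the dd command rather than running it. The function
name is made up, the sector number is assumed to be relative to the device
given, and on a drive with 4096-byte physical sectors the whole aligned
group of 8 sectors probably needs rewriting (hence the optional count).
Running the printed command destroys whatever was in those sectors.

```sh
# Hypothetical helper: print (do NOT run) a dd command that overwrites
# COUNT 512-byte sectors of DEV with zeros, starting at SECTOR.
#   $1 = device, $2 = first sector, $3 = sector count (default 1)
rewrite_sector_cmd() {
    dev=$1
    sector=$2
    count=${3:-1}
    # oflag=direct bypasses the page cache so the write reaches the drive.
    echo "dd if=/dev/zero of=$dev bs=512 count=$count seek=$sector oflag=direct"
}
```

Comparing "smartctl -A" before and after (attributes 5 and 197 in
particular) should show whether the write triggered a reallocation.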