From bothie at gmx.de Sun Jun 1 20:45:59 2014
From: bothie at gmx.de (Bodo Thiesen)
Date: Sun, 1 Jun 2014 22:45:59 +0200
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140531185607.GA12748@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
Message-ID: <20140601224559.21fa0294@phenom>

* Keith Keller wrote:

Hello Keith

> Yes, I do have backups. :)
> # some 150k messages later
> # again over 100k lines of errors
> # again over 100k lines of errors
> Repeated attempts seem to get farther into repairs, but there's still a
> large number of repairs reported, which seems scary,

The number and the kind of errors suggest that there is just bogus data
on the file system, either due to hardware errors (i.e. misdirected
writes or writes of bogus data) or due to lower-level software problems
(e.g. in the LVM code).

Either way, I suggest you check whether you can manually save the files
modified since your last backup, and then just restore your backups.
After that, check that your manually saved files are clean before
incorporating them back into the new live file system.

With a backup and the current state of your file system, it's hardly
worth the effort to even try to get the errors fixed. You will invest
more time in fixing the errors than in just restoring the backup. And
even after fixing all the errors, many files may be corrupted anyway.
In my experience, most of the time, no data is better than bad data.

Regards, Bodo

From tytso at mit.edu Mon Jun 2 01:05:09 2014
From: tytso at mit.edu (Theodore Ts'o)
Date: Sun, 1 Jun 2014 21:05:09 -0400
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140531185607.GA12748@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
Message-ID: <20140602010509.GA8323@thunk.org>

Unfortunately, there have been a huge number of bug fixes for ext4's
online resize since 2.6.32 and 1.42.11. It's quite possible that you
hit one of them.

> The 51.8% seems very suspicious to me. A few weeks ago, I did an online
> resize2fs, and the original filesystem was about 52% the size of the new
> one (from 2.7TB to 5.3TB). The resize2fs didn't report any errors, and
> I haven't seen any physical errors in the logs, so this is the first
> indication I've had of a problem.

Well, actually it's not quite that simple. There are multiple passes in
e2fsck, and the first pass is estimated to be 70% of the total e2fsck
run. So the 51.8% reported by the progress bar means e2fsck had gotten
74% of the way through pass 1. That would mean it had gotten through
the inodes associated with roughly the first 3.9TB of the file system.

That being said, it's pretty clear that portions of the inode table and
the block group descriptors were badly corrupted. So I suspect there
isn't going to be much that can be done to repair the file system
completely. If there are specific files you need to recover, I'd
suggest trying to recover them first before trying to do anything else.
The good news is that around 75% of your files can probably be
recovered.
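(Spelling out the arithmetic: 0.518 / 0.70 = ~0.74, and 0.74 * 5.3TB =
~3.9TB.)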
- Ted

From kkeller at wombat.san-francisco.ca.us Mon Jun 2 02:43:12 2014
From: kkeller at wombat.san-francisco.ca.us (Keith Keller)
Date: Sun, 1 Jun 2014 19:43:12 -0700
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602010509.GA8323@thunk.org>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
Message-ID: <20140602024312.GA6290@wombat.san-francisco.ca.us>

Hi Bodo and Ted,

Thank you both for your responses; they confirm what I thought might be
the case. Knowing that, I can try to proceed with your suggestions. I
do have some followup questions for you:

On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
> Unfortunately, there have been a huge number of bug fixes for ext4's
> online resize since 2.6.32 and 1.42.11. It's quite possible that you
> hit one of them.

Would this scenario be explained by these bugs? I'd expect that if a
resize2fs failed, it would report a problem pretty quickly. (But
perhaps that's the nature of some of these bugs.)

> Well, actually it's not quite that simple. There are multiple passes
> in e2fsck, and the first pass is estimated to be 70% of the total
> e2fsck run. So the 51.8% reported by the progress bar means e2fsck
> had gotten 74% of the way through pass 1. That would mean it had
> gotten through the inodes associated with roughly the first 3.9TB of
> the file system.

Aha! Thanks for the clarification. That's certainly well more than the
original fs size.

> That being said, it's pretty clear that portions of the inode table
> and the block group descriptors were badly corrupted. So I suspect
> there isn't going to be much that can be done to repair the file
> system completely. If there are specific files you need to recover,
> I'd suggest trying to recover them first before trying to do anything
> else. The good news is that around 75% of your files can probably be
> recovered.

So, now when I try to mount, I get an error:

# mount -o ro -t ext4 /dev/mapper/vg1--sdb-lv_vz /vz/
mount: Stale NFS file handle

That's clearly a spurious error, so I checked dmesg:

# dmesg|tail
[159891.219387] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42252 failed (36703!=0)
[159891.219586] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42253 failed (51517!=0)
[159891.219786] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42254 failed (51954!=0)
[159891.220025] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42496 failed (37296!=0)
[159891.220225] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42497 failed (31921!=0)
[159891.220451] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42498 failed (2993!=0)
[159891.220650] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42499 failed (59056!=0)
[159891.220850] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42500 failed (28571!=22299)
[159891.225762] EXT4-fs (dm-0): get root inode failed
[159891.227436] EXT4-fs (dm-0): mount failed

and before that there are many other checksum failed errors. When I
try a rw mount I get these messages instead:

[160052.031554] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 0 failed (43864!=0)
[160052.031782] EXT4-fs (dm-0): group descriptors corrupted!

Are there any other options I can try to force the mount so I can try
to get to the changed files? If that'll be challenging, I'll just
sacrifice those files, but if it'd be relatively straightforward I'd
like to make the attempt.

Thanks again!
--keith

--
kkeller at wombat.san-francisco.ca.us

From kkeller at wombat.san-francisco.ca.us Mon Jun 2 02:56:30 2014
From: kkeller at wombat.san-francisco.ca.us (Keith Keller)
Date: Sun, 1 Jun 2014 19:56:30 -0700
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602024312.GA6290@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
Message-ID: <20140602025630.GA6532@wombat.san-francisco.ca.us>

Hi again all,

I apologize for not asking this in my first message; I just remembered
the question after sending.

On Sun, Jun 01, 2014 at 07:43:12PM -0700, Keith Keller wrote:
>
> On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
>> Unfortunately, there have been a huge number of bug fixes for ext4's
>> online resize since 2.6.32 and 1.42.11. It's quite possible that you
>> hit one of them.
>
> Would this scenario be explained by these bugs? I'd expect that if a
> resize2fs failed, it would report a problem pretty quickly. (But
> perhaps that's the nature of some of these bugs.)

I have a very similar second server which has undergone a similar chain
of events: an initial ~2.5TB fs followed by a resize later. I believe
that it has been fsck'd since the resize (but don't quote me on that).
Am I likely to run into this issue with this fs? And if I do, what
steps should I do differently (e.g., use the latest e2fsck right away;
don't e2fsck, get files off quickly, and mke2fs; something else)?

--keith

--
kkeller at wombat.san-francisco.ca.us

From tytso at mit.edu Mon Jun 2 03:24:51 2014
From: tytso at mit.edu (Theodore Ts'o)
Date: Sun, 1 Jun 2014 23:24:51 -0400
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602024312.GA6290@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
Message-ID: <20140602032451.GA14786@thunk.org>

On Sun, Jun 01, 2014 at 07:43:12PM -0700, Keith Keller wrote:
>
> That's clearly a spurious error, so I checked dmesg:
>
> [159891.225762] EXT4-fs (dm-0): get root inode failed
> [159891.227436] EXT4-fs (dm-0): mount failed

The "get root inode failed" is rather unfortunate. Try running "debugfs /dev/dm-0"

and then use the "stat /" command. You can use debugfs to look at the
file system and recover individual files without needing to mount it.
However, if the root directory has been compromised, that makes using
debugfs quite a bit more difficult.

You can look at inodes by inode number by surrounding them with angle
brackets. I.e., if you want to look at inode 12345, you could say
"stat <12345>", and if inode 12345 is a directory, you could list it
via "ls <12345>", etc. See the debugfs man page for more details.
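For instance, with made-up inode numbers, a recovery session might look
like this ("dump" copies an inode's contents out to a file on another
file system, and "rdump" does the same recursively for a directory;
debugfs opens the device read-only by default):

# debugfs /dev/dm-0
debugfs: stat <12345>
debugfs: ls <12345>
debugfs: dump <12346> /tmp/recovered-file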
- Ted

From kkeller at wombat.san-francisco.ca.us Mon Jun 2 03:54:24 2014
From: kkeller at wombat.san-francisco.ca.us (Keith Keller)
Date: Sun, 1 Jun 2014 20:54:24 -0700
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602032451.GA14786@thunk.org>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
 <20140602032451.GA14786@thunk.org>
Message-ID: <20140602035424.GA7214@wombat.san-francisco.ca.us>

On Sun, Jun 01, 2014 at 11:24:51PM -0400, Theodore Ts'o wrote:
>
> The "get root inode failed" is rather unfortunate.

Heh, I like your understatement. :) I think this helps answer part of
my questions in my second email: I should probably try to preserve
changes since the last backup before getting too deep into a tricky
e2fsck. At one point the fs was still mountable, so I could have tried
to copy files off first. (In a physical failure scenario that's exactly
what I'd have done, but I wasn't thinking of that in this case.)

> Try running "debugfs /dev/dm-0"
>
> and then use the "stat /" command.

No happiness:

# ./e2fsprogs-1.42.10/debugfs/debugfs /dev/dm-0
debugfs 1.42.10 (18-May-2014)
debugfs: stat /
stat: A block group is missing an inode table while reading inode 2

My hunch is that it would take a large and lucky effort to try to get
anything useful off this fs. Does that seem like a reasonable guess?

--keith

--
kkeller at wombat.san-francisco.ca.us

From tytso at mit.edu Mon Jun 2 11:30:25 2014
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 2 Jun 2014 07:30:25 -0400
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602035424.GA7214@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
 <20140602032451.GA14786@thunk.org>
 <20140602035424.GA7214@wombat.san-francisco.ca.us>
Message-ID: <20140602113025.GA14276@thunk.org>

On Sun, Jun 01, 2014 at 08:54:24PM -0700, Keith Keller wrote:
>
> Heh, I like your understatement. :) I think this helps answer part of
> my questions in my second email: I should probably try to preserve
> changes since the last backup before getting too deep into a tricky
> e2fsck. At one point the fs was still mountable, so I could have
> tried to copy files off first. (In a physical failure scenario that's
> exactly what I'd have done, but I wasn't thinking of that in this
> case.)

Yeah, it would have been nice to have preserved the outputs from the
earlier e2fsck runs, just so we could see what e2fsck did that
apparently ended up overwriting parts of the block group descriptors.

It might be possible to read the block group descriptors associated
with one of the backup superblocks to find the portion of the inode
table associated with inode #2 (i.e., try using "debugfs -s 32768
/dev/dm-0").

One of the things that might have detected the problem sooner, and
perhaps allowed you to recover more smoothly, would have been to run
e2fsck immediately after running resize2fs. With the vintage kernel
and e2fsprogs shipped with the version of CentOS you are apparently
using, online resizing is probably safer than off-line --- although if
you are using the 1.42.10 version of resize2fs and the 2.6.32-based
kernel, I'd suggest off-line resizes as the safer choice. And either
way, running e2fsck on the file system after the resize is probably a
good idea.

> My hunch is that it would take a large and lucky effort to try to get
> anything useful off this fs. Does that seem like a reasonable guess?

I'd try using the backup superblock approach, but if that doesn't
work, yes, that's probably a reasonable conclusion.
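(The backup superblock locations can be double-checked without touching
the file system, e.g. with "dumpe2fs /dev/dm-0 | grep -i backup" or
with a dry run of "mke2fs -n /dev/dm-0"; the -n makes mke2fs only print
what it would do, including the "Superblock backups stored on blocks:"
list, so take care not to drop it. Note also that debugfs's -s usually
has to be paired with -b, e.g. "debugfs -b 4096 -s 32768 /dev/dm-0",
assuming a 4k block size.)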
Regards,

- Ted

From sandeen at redhat.com Mon Jun 2 15:51:48 2014
From: sandeen at redhat.com (Eric Sandeen)
Date: Mon, 02 Jun 2014 10:51:48 -0500
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602024312.GA6290@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
Message-ID: <538C9D94.40302@redhat.com>

On 6/1/14, 9:43 PM, Keith Keller wrote:
> Hi Bodo and Ted,
>
> Thank you both for your responses; they confirm what I thought might
> be the case. Knowing that, I can try to proceed with your
> suggestions. I do have some followup questions for you:
>
> On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
>> Unfortunately, there have been a huge number of bug fixes for ext4's
>> online resize since 2.6.32 and 1.42.11. It's quite possible that you
>> hit one of them.
>
> Would this scenario be explained by these bugs? I'd expect that if a
> resize2fs failed, it would report a problem pretty quickly. (But
> perhaps that's the nature of some of these bugs.)

Well, for what it's worth, there have been several resize fixes shipped
in RHEL6/CentOS6, so it's not just vanilla 1.42.11 or 2.6.32. But we
walk a fine line between too much churn and risk, and fixing the
serious problems, so it's possible that you hit an unfixed case. I
think it's fairly hard to know without a reproducer.

Your corruption looks bad enough that I tend to agree with Bodo - that
it may be some more fundamental underlying storage problem. However,
some semi-recent fixes, for example:

    resize2fs: reserve all metadata blocks for flex_bg file systems

have yet to make it into RHEL6 (they will soon...)

-Eric

From kkeller at wombat.san-francisco.ca.us Mon Jun 2 17:22:53 2014
From: kkeller at wombat.san-francisco.ca.us (Keith Keller)
Date: Mon, 2 Jun 2014 10:22:53 -0700
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602113025.GA14276@thunk.org>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
 <20140602032451.GA14786@thunk.org>
 <20140602035424.GA7214@wombat.san-francisco.ca.us>
 <20140602113025.GA14276@thunk.org>
Message-ID: <20140602172253.GA19276@wombat.san-francisco.ca.us>

On Mon, Jun 02, 2014 at 07:30:25AM -0400, Theodore Ts'o wrote:
> Yeah, it would have been nice to have preserved the outputs from the
> earlier e2fsck runs, just so we could see what e2fsck did that
> apparently ended up overwriting parts of the block group descriptors.

I think I still have most of my e2fsck outputs. The only ones I don't
have, which might be the most helpful, are the initial one or two, when
I thought the recovery would be easy. I can post them somewhere if you
think it'd be helpful in tracking down a bug (they are probably too
long to post to the list).

> One of the things that might have detected the problem sooner, and
> perhaps allowed you to recover more smoothly, would have been to run
> e2fsck immediately after running resize2fs. With the vintage kernel
> and e2fsprogs shipped with the version of CentOS you are apparently
> using, online resizing is probably safer than off-line --- although
> if you are using the 1.42.10 version of resize2fs and the
> 2.6.32-based kernel, I'd suggest off-line resizes as the safer
> choice. And either way, running e2fsck on the file system after the
> resize is probably a good idea.
Well, if you're going to run e2fsck, you may as well do an offline
resize, since you have to umount the filesystem anyway.

Just to clarify, the kernel I'm running is actually an OpenVZ kernel,
so I wouldn't rely totally on the version number. I want to try to find
their changelogs to see what sort of bug fixes they've included in
their kernel. The resize2fs was done under an older OpenVZ kernel; the
current kernel is the latest OpenVZ kernel.

Ah, here is the latest changelog:

https://openvz.org/Download/kernel/rhel6/042stab090.2

The kernel which was running for the resize was this one (which is
definitely old and crufty):

https://openvz.org/Download/kernel/rhel6/042stab055.10

> I'd try using the backup superblock approach, but if that doesn't
> work, yes, that's probably a reasonable conclusion.

Great! Again, thank you so much for all your help.

--keith

--
kkeller at wombat.san-francisco.ca.us

From bothie at gmx.de Mon Jun 2 20:52:56 2014
From: bothie at gmx.de (Bodo Thiesen)
Date: Mon, 2 Jun 2014 22:52:56 +0200
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602025630.GA6532@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
 <20140602025630.GA6532@wombat.san-francisco.ca.us>
Message-ID: <20140602225256.2ca85e78@phenom>

* Keith Keller wrote:

Hi Keith

> I have a very similar second server which has undergone a similar
> chain of events: an initial ~2.5TB fs followed by a resize later. I
> believe that it has been fsck'd since the resize (but don't quote me
> on that). Am I likely to run into this issue with this fs? And if I
> do, what steps should I do differently (e.g., use the latest e2fsck
> right away; don't e2fsck, get files off quickly, and mke2fs;
> something else)?

umount and then e2fsck -f -n -C 0
(the -C 0 is only for the progress bar)
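Spelled out with a hypothetical device name, that is:

umount /dev/mapper/vg1-lv_data
e2fsck -f -n -C 0 /dev/mapper/vg1-lv_data

(-f forces a check even if the fs looks clean; -n opens the file system
read-only and answers "no" to every question, so this run changes
nothing.)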
If it reports the fs to be clean (a handful of errors like i_size wrong
or deleted inode has zero dtime, and stuff like that in low numbers, is
OK; to be sure, you might want to post that output here and ask before
removing the -n to fix those errors), you should be safe. If it reports
tons of errors, or invalid blocks, or checksum errors: mount -o ro,
back up everything, and then mke2fs.

Regards, Bodo

From bothie at gmx.de Mon Jun 2 21:04:52 2014
From: bothie at gmx.de (Bodo Thiesen)
Date: Mon, 2 Jun 2014 23:04:52 +0200
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602010509.GA8323@thunk.org>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
Message-ID: <20140602230452.135307cd@phenom>

* "Theodore Ts'o" wrote:

Hi Theodore.

> That being said, it's pretty clear that portions of the inode table
> and the block group descriptors were badly corrupted. [...]

Keith is not the first one with problems of this class, and he will
probably not be the last. He later told us that, at first, mounting the
file system still worked. And that actually means (taking low-level
software errors and hardware errors out of the equation) that e2fsck
itself created the current situation.

In my opinion, e2fsck has one major flaw which causes this sort of
trouble: e2fsck tries to fix the errors as it finds them. That's bad,
because at that point it's still unclear whether the problem can be
safely fixed. So, the thing e2fsck SHOULD do is:

1. Scan the file system for all errors, remember the errors, BUT DON'T
   TOUCH ANYTHING.

2. Once all errors (including differences in the allocation bitmaps)
   have been collected, first summarize them (like: 100 checksum
   errors, 56000 illegal pointers, etc.) and then ask what to do.

3. Actually fix the errors one by one, taking into account the
   calculated allocation bitmaps (instead of the ones stored in the
   file system). Some errors have to be fixed before others, resolving
   multiply-used clusters being the first kind to fix.

This would not only allow the user to cancel at that point with no
changes made to the file system yet; it would also allow e2fsck to make
sure that newly allocated clusters always go to clusters which are
actually not in use.

What do you think about this?

Regards, Bodo

From kkeller at wombat.san-francisco.ca.us Mon Jun 2 21:27:07 2014
From: kkeller at wombat.san-francisco.ca.us (Keith Keller)
Date: Mon, 2 Jun 2014 14:27:07 -0700
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602230452.135307cd@phenom>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602230452.135307cd@phenom>
Message-ID: <20140602212707.GA22690@wombat.san-francisco.ca.us>

Hi all,

On Mon, Jun 02, 2014 at 11:04:52PM +0200, Bodo Thiesen wrote:
>
> Keith is not the first one with problems of this class, and he will
> probably not be the last. He later told us that, at first, mounting
> the file system still worked.

Is there any value, for the purpose of debugging this issue, in keeping
this broken filesystem available? I would like at this point to destroy
the filesystem and start my restores, but if it'd be helpful to try to
get information from the existing fs I can keep it around a while
longer. (I do still have whatever fsck logs I saved, and can make them
available whether I destroy the fs or not.)

--keith

--
kkeller at wombat.san-francisco.ca.us

From ibaldo at adinet.com.uy Fri Jun 6 23:57:09 2014
From: ibaldo at adinet.com.uy (Ivan Baldo)
Date: Fri, 06 Jun 2014 20:57:09 -0300
Subject: Recommended minimal amount of free space to keep?
Message-ID: <53925555.5070104@adinet.com.uy>

Hello.

So, LVM is cool, having different partitions for different stuff is
cool, and of course Ext4 is cool and *reliable*.

So, we create some logical partitions and put ext4 on them, reserving
LVM space for growing those partitions or even making new ones later.

The thing is, I would like to keep every filesystem as small as it can
be, but without degrading the performance too much. I guess that having
a filesystem 99% full will create too much fragmentation and many other
issues, but having them only 30% full seems like a waste. Currently I
try to keep them at 70% utilization, but I have not based that on
anything, just a guess.

So, what hysteresis percentages do you recommend? For example, when
they get 70% full, grow them so that they end up 50% full? Other
values?

Thanks for the hints!

Good day everyone.

--
Ivan Baldo - ibaldo at adinet.com.uy - http://ibaldo.codigolibre.net/
From Montevideo, Uruguay, at the south of South America.
Freelance programmer and GNU/Linux system administrator, hire me!
Alternatives: ibaldo at codigolibre.net - http://go.to/ibaldo

From kkeller at wombat.san-francisco.ca.us Sat Jun 7 04:04:55 2014
From: kkeller at wombat.san-francisco.ca.us (Keith Keller)
Date: Fri, 6 Jun 2014 21:04:55 -0700
Subject: [resolved] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602225256.2ca85e78@phenom>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602024312.GA6290@wombat.san-francisco.ca.us>
 <20140602025630.GA6532@wombat.san-francisco.ca.us>
 <20140602225256.2ca85e78@phenom>
Message-ID: <20140607040455.GA4562@wombat.san-francisco.ca.us>

On Mon, Jun 02, 2014 at 10:52:56PM +0200, Bodo Thiesen wrote:
>
> umount and then e2fsck -f -n -C 0
> (the -C 0 is only for the progress bar)

I was able to run this on my other ext4 fs, and it came back clean.
That resize2fs occurred under OpenVZ kernel 2.6.32-042stab075. So if
the problem was with a bug in ext? resizing, it may have been patched
between stab055 and stab075. (It's still not clear that it was a bug;
it could still be a hardware issue that I haven't seen errors for, or
perhaps an LVM bug that was fixed independently of the OpenVZ kernels.)

Finally, once again, thank you all for your help.

--keith

--
kkeller at wombat.san-francisco.ca.us

From bothie at gmx.de Sat Jun 7 19:03:56 2014
From: bothie at gmx.de (Bodo Thiesen)
Date: Sat, 7 Jun 2014 21:03:56 +0200
Subject: [long] major problems on fs; e2fsck running out of memory
In-Reply-To: <20140602212707.GA22690@wombat.san-francisco.ca.us>
References: <20140531185607.GA12748@wombat.san-francisco.ca.us>
 <20140602010509.GA8323@thunk.org>
 <20140602230452.135307cd@phenom>
 <20140602212707.GA22690@wombat.san-francisco.ca.us>
Message-ID: <20140607210356.441295db@phenom>

* Keith Keller wrote:

> Is there any value, for the purpose of debugging this issue, in
> keeping this broken filesystem available?

I don't believe so. I have my own broken fs here (totally different
story - and I knew I could trust neither the kernel nor e2fsck - so I
used dm-cow for all testing).

> I would like at this point to destroy the filesystem and start my
> restores, but if it'd be helpful to try to get information from the
> existing fs I can keep it around a while longer. (I do still have
> whatever fsck logs I saved, and can make them available whether I
> destroy the fs or not.)

If you want to keep the important fs structures, run e2image. No need
to keep the entire file system. But since Ted didn't respond, I assume
there is no need to do even that.
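(For reference, the basic e2image invocation, with a made-up output
path, would be "e2image /dev/dm-0 /some/other/fs/dm-0.e2i". It saves
metadata such as superblocks, group descriptors, inode tables, and
directory blocks, but no file data; the image file should live on a
different file system than the one being imaged.)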
Regards, Bodo

From adilger at dilger.ca Sun Jun 8 01:47:27 2014
From: adilger at dilger.ca (Andreas Dilger)
Date: Sat, 7 Jun 2014 19:47:27 -0600
Subject: Recommended minimal amount of free space to keep?
In-Reply-To: <53925555.5070104@adinet.com.uy>
References: <53925555.5070104@adinet.com.uy>
Message-ID: <1EF4C142-F8C8-43AB-A780-228CC4467B69@dilger.ca>

You will get better long-term performance if you don't fill the
filesystems more than about 70% full. Above 90% you are permanently
fragmenting the filesystem (this depends on total size and workload as
well).

There is the secondary issue that mke2fs can lay out the filesystem
better if it is done right at the start, rather than resize2fs doing it
in small increments. I think as long as you are growing each filesystem
in chunks of, say, 16GB or more, the performance should stay
reasonable.

In many cases, you can have a rough guess at how much space you will
need in the filesystems, so it doesn't make sense to keep huge amounts
of free space around.
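As a rough sketch of the grow-at-70%, back-to-50% scheme Ivan described
(the mount point and LV name are made up, and the script only prints
the commands instead of running them):

#!/bin/sh
# Report a grow command when a filesystem crosses 70% usage,
# sized so that usage drops back to about 50%.
FS=/srv/data                # hypothetical mount point
LV=/dev/vg1/lv_data         # hypothetical logical volume
set -- $(df -P "$FS" | awk 'NR==2 {gsub(/%/, ""); print $2, $3, $5}')
total_kb=$1 used_kb=$2 pct=$3
if [ "$pct" -gt 70 ]; then
    target_kb=$((used_kb * 2))          # used space = 50% of new size
    grow_kb=$((target_kb - total_kb))
    echo "lvextend -L +${grow_kb}K $LV && resize2fs $LV"
fi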
Cheers, Andreas

> On Jun 6, 2014, at 17:57, Ivan Baldo wrote:
>
> Hello.
>
> So, LVM is cool, having different partitions for different stuff is
> cool, and of course Ext4 is cool and *reliable*.
>
> So, we create some logical partitions and put ext4 on them, reserving
> LVM space for growing those partitions or even making new ones later.
>
> The thing is, I would like to keep every filesystem as small as it
> can be, but without degrading the performance too much. I guess that
> having a filesystem 99% full will create too much fragmentation and
> many other issues, but having them only 30% full seems like a waste.
> Currently I try to keep them at 70% utilization, but I have not based
> that on anything, just a guess.
>
> So, what hysteresis percentages do you recommend? For example, when
> they get 70% full, grow them so that they end up 50% full? Other
> values?
>
> Thanks for the hints!
>
> Good day everyone.
>
> --
> Ivan Baldo - ibaldo at adinet.com.uy - http://ibaldo.codigolibre.net/
> From Montevideo, Uruguay, at the south of South America.
> Freelance programmer and GNU/Linux system administrator, hire me!
> Alternatives: ibaldo at codigolibre.net - http://go.to/ibaldo
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

From ibaldo at adinet.com.uy Mon Jun 9 14:14:47 2014
From: ibaldo at adinet.com.uy (Ivan Baldo)
Date: Mon, 09 Jun 2014 11:14:47 -0300
Subject: Recommended minimal amount of free space to keep?
In-Reply-To: <1EF4C142-F8C8-43AB-A780-228CC4467B69@dilger.ca>
References: <53925555.5070104@adinet.com.uy>
 <1EF4C142-F8C8-43AB-A780-228CC4467B69@dilger.ca>
Message-ID: <5395C157.1060200@adinet.com.uy>

Hello.

Thanks a lot for the information and hints! I think I read somewhere a
long time ago that 70% utilization is a good limit, but I couldn't find
it on the net now; that's why I asked here. Thanks for confirming that.

So, it is a percentage thing, not an absolute amount of free space,
right? If we have a 1000G filesystem, then we should have at least 300G
free to avoid fragmentation in the long term? Or for a filesystem that
size, could we have just 50G free and get away with it long term?

Maybe these hints could be in the ext4 manpage or maybe in the wiki.

Again: thanks!!!

Bye.

On 07/06/14 22:47, Andreas Dilger wrote:
> You will get better long-term performance if you don't fill the
> filesystems more than about 70% full. Above 90% you are permanently
> fragmenting the filesystem (this depends on total size and workload
> as well).
>
> There is the secondary issue that mke2fs can lay out the filesystem
> better if it is done right at the start, rather than resize2fs doing
> it in small increments. I think as long as you are growing each
> filesystem in chunks of, say, 16GB or more, the performance should
> stay reasonable.
>
> In many cases, you can have a rough guess at how much space you will
> need in the filesystems, so it doesn't make sense to keep huge
> amounts of free space around.
>
> Cheers, Andreas
>
>> On Jun 6, 2014, at 17:57, Ivan Baldo wrote:
>>
>> Hello.
>>
>> So, LVM is cool, having different partitions for different stuff is
>> cool, and of course Ext4 is cool and *reliable*.
>>
>> So, we create some logical partitions and put ext4 on them,
>> reserving LVM space for growing those partitions or even making new
>> ones later.
>>
>> The thing is, I would like to keep every filesystem as small as it
>> can be, but without degrading the performance too much. I guess that
>> having a filesystem 99% full will create too much fragmentation and
>> many other issues, but having them only 30% full seems like a waste.
>> Currently I try to keep them at 70% utilization, but I have not
>> based that on anything, just a guess.
>>
>> So, what hysteresis percentages do you recommend? For example, when
>> they get 70% full, grow them so that they end up 50% full? Other
>> values?
>>
>> Thanks for the hints!
>>
>> Good day everyone.
>>
>> --
>> Ivan Baldo - ibaldo at adinet.com.uy - http://ibaldo.codigolibre.net/
>> From Montevideo, Uruguay, at the south of South America.
>> Freelance programmer and GNU/Linux system administrator, hire me!
>> Alternatives: ibaldo at codigolibre.net - http://go.to/ibaldo