From markjballard at googlemail.com Tue Sep 3 08:46:39 2013 From: markjballard at googlemail.com (Mark Ballard) Date: Tue, 3 Sep 2013 09:46:39 +0100 Subject: ext3 / ext4 on USB flash drive? In-Reply-To: <20130830133224.GB27699@thunk.org> References: <20130829154657.GC30918@thunk.org> <521F7747.2000003@redhat.com> <20130830133224.GB27699@thunk.org> Message-ID: From the little I have heard about control systems for cars, which was some years ago, they were blockhead proprietary. The analogy would only work if computing were customarily black-box technology, which it isn't. I'd be surprised if there were any branded flash drives that contained less than their advertised amount of storage. That leaves the question of what is going on under the hood in what is probably the vast majority of devices where the flash isn't fraudulent, and whether my system handles it correctly. My system leaves me with no idea of either (though I hold out hope for some tools I bookmarked recently). Reference to forums and specialist websites gives genuine cause for doubt. Yet I thought it was usual for system software to have a good angle on how its hardware was constructed and what it was doing. I thought they worked in symbiosis, and that this was maintained by mutual necessity. I thought the symbiosis was kept unassailably whole by a common purpose: the user. What you say implies that this symbiosis has been broken by the commercial greed of flash manufacturers. Or that it has been broken by neglect on their part, or laziness, or some other cause of a fissure in industry relations. Whatever the reason, it raises another question, and that is what must be done so that I can simply format my USB without a concern and get back to my work. > even for a non-fradulent USB stick or SD card, there is no single way to measure "FTL quality". ... there are some things (such as the erase block size) which would be useful for tuning file system performance. And the technical people I've talked to at various Flash manufacturers all agree it's pointless to hide this information, but the product managers tend to be the roadblock. This is perhaps telling. One would imagine the USB Industry Forum meeting the Association of (File) System Software Scribes or whatever at routine collegiate meetings in Las Vegas hotels, and so on. Which flash manufacturers have refused to collaborate? Why has the fabled industry forum failed? mb. From tytso at mit.edu Tue Sep 3 12:23:56 2013 From: tytso at mit.edu (Theodore Ts'o) Date: Tue, 3 Sep 2013 08:23:56 -0400 Subject: ext3 / ext4 on USB flash drive? In-Reply-To: References: <20130829154657.GC30918@thunk.org> <521F7747.2000003@redhat.com> <20130830133224.GB27699@thunk.org> Message-ID: <20130903122356.GC15457@thunk.org> On Tue, Sep 03, 2013 at 09:46:39AM +0100, Mark Ballard wrote: > isn't. I'd be surprised if there were any branded flash drives that > contained less than their advertised amount of storage. The vast majority of flash sold, especially the cheap-grade flash (i.e., SD Cards and USB sticks) sold through retail channels, is probably unbranded. Because it's cheaper, and for most users, (a) price is a feature, and (b) they are only using flash as a temporary transport medium (e.g., here, let me give you my slide presentation; can I borrow a USB stick?), and (c) they are much more likely to lose said flash device before it goes bad, or even gets 100% filled. It's for the same reason that the quality of experience in airplane travel has degraded so badly. 
The market has spoken; and consumers have said, at least by their actions, that price is more important than anything else. > Whatever the reason, it raises another question, and that is what must > be done so that I can simply format my USB without a concern and get > back to my work. Buy high-quality flash which has been explicitly reviewed by a source you trust. There isn't much else you can really do.... > This is perhaps telling. One would imagine the USB Industry Forum > meeting the Association of (File) System Software Scribes or whatever > at routine collegiate meetings in Las Vegas hotels, and so on. > > Which flash manufacturers have refused to collaborate? Why has the > fabled industry forum failed? They are collaborating --- with the mass buyers of their flash. If you are purchasing flash by the millions, then you can get all of this information (under NDA), and you can dictate the quality of the flash which is appropriate for your use case. This even afflicted Microsoft's Windows Phone, where they had some manufacturers provide an SD Card slot. This meant that end users could replace the carefully tested-and-selected-for-performance SD cards which were shipped with their phones with crap sold at the checkout counter, and since the phone's root file system was stored on the SD card, performance went into the crapper, and guess who the customers blamed? Not the flash manufacturer, and not the handset manufacturer for including a removable SD-card slot instead of using a fixed eMMC flash device, but Microsoft. As a result, many handset manufacturers these days do *not* have an SD card slot, and if they do, they don't allow the root file system to be stored on the SD card, and the SD card can only be used for auxiliary or media storage (for which even really crappy flash is generally good enough). So the market is working; it's just working for the most common use case, and the most common desire of the customers who are doing the buying. And that means there will be high-quality stuff that costs $$$, and really cheap stuff where you get what you pay for, and hardware manufacturers who buy flash devices by the million-unit order will get better deals, and all of the low-level information under NDA. All hail the free market.... as my libertarian friends would say, "Huge success". - Ted From prichard at med.wayne.edu Sat Sep 7 02:46:35 2013 From: prichard at med.wayne.edu (Richards, Paul Franklin) Date: Sat, 7 Sep 2013 02:46:35 +0000 Subject: Strange fsck.ext3 behavior - infinite loop In-Reply-To: <20130830182302.GE30385@thunk.org> References: <90D96432E685D84E9A62AF83962A689737E7AA68@MED-CORE07A.med.wayne.edu> <4764C2B2-63C9-4FC5-A99B-3D8BEB004995@dilger.ca>, <20130830182302.GE30385@thunk.org> Message-ID: <90D96432E685D84E9A62AF83962A689737E7EE25@MED-CORE07A.med.wayne.edu> It appears that the RAID has hardware problems, as three of the disks are being detected as "unhealthy". Thank you all for your help! ________________________________________ From: Theodore Ts'o [tytso at mit.edu] Sent: Friday, August 30, 2013 2:23 PM To: Andreas Dilger Cc: Richards, Paul Franklin; ext3-users at redhat.com Subject: Re: Strange fsck.ext3 behavior - infinite loop On Fri, Aug 30, 2013 at 12:07:22PM -0600, Andreas Dilger wrote: > > > [root at myhost /]# mkfs.ext3 /dev/sda1 > > mke2fs 1.35 (28-Feb-2004) > > First thing I would suggest is to update to a newer version of e2fsprogs, since this one is 9+ years old and that is a lot of > water under the bridge. 
That's definitely good advice, but even with e2fsprogs 1.35, if e2fsck -f is finding errors when run immediately after running mke2fs, it would make me suspect the storage device. Are you sure the RAID controller (is this a hardware RAID, or software RAID?) is working correctly? - Ted From be.nicolas.michel at gmail.com Mon Sep 16 10:16:17 2013 From: be.nicolas.michel at gmail.com (Nicolas Michel) Date: Mon, 16 Sep 2013 12:16:17 +0200 Subject: Numbers behind "df" and "tune2fs" Message-ID: Hello guys, I have some difficulty understanding what the numbers behind "df" and tune2fs really are. You'll find the output of tune2fs and df below, on which my maths are based. Here are my maths: A tune2fs on an ext3 FS tells me the FS is 3284992 blocks large. It also tells me that the size of one block is 4096 (bytes, if I'm not wrong?). So my maths tell me that the disk is 3284992 * 4096 = 13455327232 bytes, or 13455327232 / 1024 / 1024 / 1024 = 12.53 GB. A df --block-size=1 on the same FS tells me the disk is 13243846656 bytes, which is 211480576 bytes smaller than what tune2fs tells me. In gigabytes, it means: * for df, the disk is 12.33 GB * for tune2fs, the disk is 12.53 GB I thought that maybe df is only taking into account the real blocks available for users. So I tried to remove the reserved blocks and the reserved GDT blocks: (3284992 - 164249 - 801) * 4096 = 12779282432, or in GB: 12779282432 / 1024 / 1024 / 1024 = 11.90 GB ... My last thought was that the "Reserved block count" in tune2fs covers not only the blocks reserved for root (which is 5% by default on my system) but also all the other blocks reserved for the FS's internal usage. So: (3284992 - 164249) * 4096 = 12782563328, or in GB: 11.90 GB (the difference is not significant with a precision of two decimals). So I'm lost ... Does anyone have an explanation? I would really, really be grateful. 
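For clarity, here is the same arithmetic as a small Python sketch (every figure in it is taken from the tune2fs and df output below; nothing else is assumed):

GiB = 1024 ** 3
block_count  = 3284992       # tune2fs: Block count
block_size   = 4096          # tune2fs: Block size, in bytes
reserved     = 164249        # tune2fs: Reserved block count (the root 5%)
reserved_gdt = 801           # tune2fs: Reserved GDT blocks
df_total     = 13243846656   # df --block-size=1: total size, in bytes

fs_bytes = block_count * block_size
print(fs_bytes, round(fs_bytes / GiB, 2))    # 13455327232  12.53
print(df_total, round(df_total / GiB, 2))    # 13243846656  12.33
print(fs_bytes - df_total)                   # 211480576 bytes unaccounted for

# My two guesses, both of which end up smaller than what df reports:
guess1 = (block_count - reserved - reserved_gdt) * block_size
guess2 = (block_count - reserved) * block_size
print(round(guess1 / GiB, 2), round(guess2 / GiB, 2))   # 11.9  11.9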
Nicolas ------------------------------ --------- Here is the output of df and tune2fs : $ tune2fs -l /dev/mapper/datavg-datalogslv tune2fs 1.41.9 (22-Aug-2009) Filesystem volume name: Last mounted on: Filesystem UUID: 4e5bea3e-3e61-4fc8-9676-e5177522911c Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file Filesystem flags: unsigned_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 822544 Block count: 3284992 Reserved block count: 164249 Free blocks: 3109325 Free inodes: 822348 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 801 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8144 Inode blocks per group: 509 Filesystem created: Wed Aug 28 08:30:10 2013 Last mount time: Wed Sep 11 17:16:56 2013 Last write time: Thu Sep 12 09:38:02 2013 Mount count: 18 Maximum mount count: 27 Last checked: Wed Aug 28 08:30:10 2013 Check interval: 15552000 (6 months) Next check after: Mon Feb 24 07:30:10 2014 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: ad2251a9-ac33-4e5e-b933-af49cb4f2bb3 Journal backup: inode blocks $ df --block-size=1 /dev/mapper/datavg-datalogslv Filesystem 1B-blocks Used Available Use% Mounted on /dev/mapper/datavg-datalogslv 13243846656 563843072 12007239680 5% /logs -- Nicolas MICHEL From sandeen at redhat.com Mon Sep 16 14:39:23 2013 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 16 Sep 2013 09:39:23 -0500 Subject: Numbers behind "df" and "tune2fs" In-Reply-To: References: Message-ID: <5237181B.1070109@redhat.com> On 9/16/13 5:16 AM, Nicolas Michel wrote: > Hello guys, > > I have some difficulties to understand what really are the numbers > behing "df" and tune2fs. You'll find the output of tune2fs and df > below, on which my maths are based. > > Here are my maths: > > A tune2fs on an ext3 FS tell me the FS size is 3284992 block large. It > also tell me that the size of one block is 4096 (bytes if I'm not > wrong?). So my maths tell me that the disk is 3284992 * 4096 = > 13455327232 bytes or 13455327232 / 1024 /1024 /1024 = 12.53 GB. > > A df --block-size=1 on the same FS tell me the disk is 13243846656 > which is 211480576 bytes smaller than what tune2fs tell me. By default, df on extN assumes that metadata used by the filesystem was never available for your use and is not part of the filesystem space. Documentation/filesystems/ext3.txt says: bsddf (*) Make 'df' act like BSD. minixdf Make 'df' act like Minix. which is pretty unhelpful I suppose. ;) The mount man page is a little more helpful: bsddf|minixdf Set the behaviour for the statfs system call. The minixdf behaviour is to return in the f_blocks field the total number of blocks of the filesystem, while the bsddf behaviour (which is the default) is to subtract the overhead blocks used by the ext2 filesystem and not available for file storage. You're seeing the latter behavior. if you mount with -o minixdf you should see what you expect. (Too bad there's no "linuxdf?") :) > In gigabytes, it means: > * for df, the disk is 12.33 GB > * for tune2fs, the disk is 12.53 GB > > I thought that maybe df is only taking into account the real blocks > available for users. 
So I tried to remove the reserved blocks and the > GDT blocks: > (3284992 - 164249 - 801) * 4096 = 12779282432 > or in GB : 12779282432 / 1024 / 1024 / 1024 = 11.90 Gb ... you're on the right track, but you forgot the journal space, all the preallocated inode table blocks, etc. -Eric > My last thought was that "Reserved block" in tune2fs was not only the > reserved blocks for root (which is 5% per default on my system) but > take into account all other reserved blocks fo the fs internal usage. > So: > (3284992 - 164249) * 4096 = 12782563328 > In GB : 11.90 Gb (the difference is not significative with a precision of 2. > > So I'm lost ... > > Is someone have an explanation? I would really really be grateful. > Nicolas > > ------------------------------ > --------- > > Here is the output of df and tune2fs : > > $ tune2fs -l /dev/mapper/datavg-datalogslv > tune2fs 1.41.9 (22-Aug-2009) > Filesystem volume name: > Last mounted on: > Filesystem UUID: 4e5bea3e-3e61-4fc8-9676-e5177522911c > Filesystem magic number: 0xEF53 > Filesystem revision #: 1 (dynamic) > Filesystem features: has_journal ext_attr resize_inode dir_index > filetype needs_recovery sparse_super large_file > Filesystem flags: unsigned_directory_hash > Default mount options: (none) > Filesystem state: clean > Errors behavior: Continue > Filesystem OS type: Linux > Inode count: 822544 > Block count: 3284992 > Reserved block count: 164249 > Free blocks: 3109325 > Free inodes: 822348 > First block: 0 > Block size: 4096 > Fragment size: 4096 > Reserved GDT blocks: 801 > Blocks per group: 32768 > Fragments per group: 32768 > Inodes per group: 8144 > Inode blocks per group: 509 > Filesystem created: Wed Aug 28 08:30:10 2013 > Last mount time: Wed Sep 11 17:16:56 2013 > Last write time: Thu Sep 12 09:38:02 2013 > Mount count: 18 > Maximum mount count: 27 > Last checked: Wed Aug 28 08:30:10 2013 > Check interval: 15552000 (6 months) > Next check after: Mon Feb 24 07:30:10 2014 > Reserved blocks uid: 0 (user root) > Reserved blocks gid: 0 (group root) > First inode: 11 > Inode size: 256 > Required extra isize: 28 > Desired extra isize: 28 > Journal inode: 8 > Default directory hash: half_md4 > Directory Hash Seed: ad2251a9-ac33-4e5e-b933-af49cb4f2bb3 > Journal backup: inode blocks > > $ df --block-size=1 /dev/mapper/datavg-datalogslv > Filesystem 1B-blocks Used Available Use% Mounted on > /dev/mapper/datavg-datalogslv 13243846656 563843072 12007239680 5% /logs > > From be.nicolas.michel at gmail.com Mon Sep 16 14:44:59 2013 From: be.nicolas.michel at gmail.com (Nicolas Michel) Date: Mon, 16 Sep 2013 16:44:59 +0200 Subject: Numbers behind "df" and "tune2fs" In-Reply-To: <5237181B.1070109@redhat.com> References: <5237181B.1070109@redhat.com> Message-ID: Thanks for you help. I also tried adding some other informations as you suggest: I can also take into account: - "Reserved block count: XXXXXXX" from tune2fs that gives me the number of blocks reserved for root - Reserved GDT blocks: XXX But I didn't thought about the FS journal. How can I gather information about it? (it's size and any other information?) 2013/9/16 Eric Sandeen : > On 9/16/13 5:16 AM, Nicolas Michel wrote: >> Hello guys, >> >> I have some difficulties to understand what really are the numbers >> behing "df" and tune2fs. You'll find the output of tune2fs and df >> below, on which my maths are based. >> >> Here are my maths: >> >> A tune2fs on an ext3 FS tell me the FS size is 3284992 block large. 
It >> also tell me that the size of one block is 4096 (bytes if I'm not >> wrong?). So my maths tell me that the disk is 3284992 * 4096 = >> 13455327232 bytes or 13455327232 / 1024 /1024 /1024 = 12.53 GB. >> >> A df --block-size=1 on the same FS tell me the disk is 13243846656 >> which is 211480576 bytes smaller than what tune2fs tell me. > > By default, df on extN assumes that metadata used by the filesystem > was never available for your use and is not part of the filesystem > space. > > Documentation/filesystems/ext3.txt says: > > bsddf (*) Make 'df' act like BSD. > minixdf Make 'df' act like Minix. > > which is pretty unhelpful I suppose. ;) > > The mount man page is a little more helpful: > > bsddf|minixdf > Set the behaviour for the statfs system call. The minixdf > behaviour is to return in the f_blocks field the total number > of blocks of the filesystem, while the bsddf behaviour (which > is the default) is to subtract the overhead blocks used by the > ext2 filesystem and not available for file storage. > > You're seeing the latter behavior. if you mount with -o minixdf you should > see what you expect. (Too bad there's no "linuxdf?") :) > >> In gigabytes, it means: >> * for df, the disk is 12.33 GB >> * for tune2fs, the disk is 12.53 GB >> >> I thought that maybe df is only taking into account the real blocks >> available for users. So I tried to remove the reserved blocks and the >> GDT blocks: >> (3284992 - 164249 - 801) * 4096 = 12779282432 >> or in GB : 12779282432 / 1024 / 1024 / 1024 = 11.90 Gb ... > > you're on the right track, but you forgot the journal space, all the > preallocated inode table blocks, etc. > > -Eric > >> My last thought was that "Reserved block" in tune2fs was not only the >> reserved blocks for root (which is 5% per default on my system) but >> take into account all other reserved blocks fo the fs internal usage. >> So: >> (3284992 - 164249) * 4096 = 12782563328 >> In GB : 11.90 Gb (the difference is not significative with a precision of 2. >> >> So I'm lost ... >> >> Is someone have an explanation? I would really really be grateful. 
>> Nicolas >> >> ------------------------------ >> --------- >> >> Here is the output of df and tune2fs : >> >> $ tune2fs -l /dev/mapper/datavg-datalogslv >> tune2fs 1.41.9 (22-Aug-2009) >> Filesystem volume name: >> Last mounted on: >> Filesystem UUID: 4e5bea3e-3e61-4fc8-9676-e5177522911c >> Filesystem magic number: 0xEF53 >> Filesystem revision #: 1 (dynamic) >> Filesystem features: has_journal ext_attr resize_inode dir_index >> filetype needs_recovery sparse_super large_file >> Filesystem flags: unsigned_directory_hash >> Default mount options: (none) >> Filesystem state: clean >> Errors behavior: Continue >> Filesystem OS type: Linux >> Inode count: 822544 >> Block count: 3284992 >> Reserved block count: 164249 >> Free blocks: 3109325 >> Free inodes: 822348 >> First block: 0 >> Block size: 4096 >> Fragment size: 4096 >> Reserved GDT blocks: 801 >> Blocks per group: 32768 >> Fragments per group: 32768 >> Inodes per group: 8144 >> Inode blocks per group: 509 >> Filesystem created: Wed Aug 28 08:30:10 2013 >> Last mount time: Wed Sep 11 17:16:56 2013 >> Last write time: Thu Sep 12 09:38:02 2013 >> Mount count: 18 >> Maximum mount count: 27 >> Last checked: Wed Aug 28 08:30:10 2013 >> Check interval: 15552000 (6 months) >> Next check after: Mon Feb 24 07:30:10 2014 >> Reserved blocks uid: 0 (user root) >> Reserved blocks gid: 0 (group root) >> First inode: 11 >> Inode size: 256 >> Required extra isize: 28 >> Desired extra isize: 28 >> Journal inode: 8 >> Default directory hash: half_md4 >> Directory Hash Seed: ad2251a9-ac33-4e5e-b933-af49cb4f2bb3 >> Journal backup: inode blocks >> >> $ df --block-size=1 /dev/mapper/datavg-datalogslv >> Filesystem 1B-blocks Used Available Use% Mounted on >> /dev/mapper/datavg-datalogslv 13243846656 563843072 12007239680 5% /logs >> >> > -- Nicolas MICHEL From sandeen at redhat.com Mon Sep 16 16:25:37 2013 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 16 Sep 2013 11:25:37 -0500 Subject: Numbers behind "df" and "tune2fs" In-Reply-To: References: <5237181B.1070109@redhat.com> Message-ID: <52373101.3060802@redhat.com> On 9/16/13 9:44 AM, Nicolas Michel wrote: > Thanks for you help. I also tried adding some other informations as you suggest: > I can also take into account: > - "Reserved block count: XXXXXXX" from tune2fs that gives me the > number of blocks reserved for root > - Reserved GDT blocks: XXX > > But I didn't thought about the FS journal. How can I gather > information about it? (it's size and any other information?) # dumpe2fs /dev/$YOUR_DEVICE | grep Journal dumpe2fs 1.41.12 (17-May-2010) Journal inode: 8 Journal backup: inode blocks Journal features: journal_incompat_revoke Journal size: 128M Journal length: 32768 But you also need to take into account inode tables, inode allocation bitmaps, block allocation bitmaps ... -Eric > 2013/9/16 Eric Sandeen : >> On 9/16/13 5:16 AM, Nicolas Michel wrote: >>> Hello guys, >>> >>> I have some difficulties to understand what really are the numbers >>> behing "df" and tune2fs. You'll find the output of tune2fs and df >>> below, on which my maths are based. >>> >>> Here are my maths: >>> >>> A tune2fs on an ext3 FS tell me the FS size is 3284992 block large. It >>> also tell me that the size of one block is 4096 (bytes if I'm not >>> wrong?). So my maths tell me that the disk is 3284992 * 4096 = >>> 13455327232 bytes or 13455327232 / 1024 /1024 /1024 = 12.53 GB. >>> >>> A df --block-size=1 on the same FS tell me the disk is 13243846656 >>> which is 211480576 bytes smaller than what tune2fs tell me. 
>> >> By default, df on extN assumes that metadata used by the filesystem >> was never available for your use and is not part of the filesystem >> space. >> >> Documentation/filesystems/ext3.txt says: >> >> bsddf (*) Make 'df' act like BSD. >> minixdf Make 'df' act like Minix. >> >> which is pretty unhelpful I suppose. ;) >> >> The mount man page is a little more helpful: >> >> bsddf|minixdf >> Set the behaviour for the statfs system call. The minixdf >> behaviour is to return in the f_blocks field the total number >> of blocks of the filesystem, while the bsddf behaviour (which >> is the default) is to subtract the overhead blocks used by the >> ext2 filesystem and not available for file storage. >> >> You're seeing the latter behavior. if you mount with -o minixdf you should >> see what you expect. (Too bad there's no "linuxdf?") :) >> >>> In gigabytes, it means: >>> * for df, the disk is 12.33 GB >>> * for tune2fs, the disk is 12.53 GB >>> >>> I thought that maybe df is only taking into account the real blocks >>> available for users. So I tried to remove the reserved blocks and the >>> GDT blocks: >>> (3284992 - 164249 - 801) * 4096 = 12779282432 >>> or in GB : 12779282432 / 1024 / 1024 / 1024 = 11.90 Gb ... >> >> you're on the right track, but you forgot the journal space, all the >> preallocated inode table blocks, etc. >> >> -Eric >> >>> My last thought was that "Reserved block" in tune2fs was not only the >>> reserved blocks for root (which is 5% per default on my system) but >>> take into account all other reserved blocks fo the fs internal usage. >>> So: >>> (3284992 - 164249) * 4096 = 12782563328 >>> In GB : 11.90 Gb (the difference is not significative with a precision of 2. >>> >>> So I'm lost ... >>> >>> Is someone have an explanation? I would really really be grateful. 
>>> Nicolas >>> >>> ------------------------------ >>> --------- >>> >>> Here is the output of df and tune2fs : >>> >>> $ tune2fs -l /dev/mapper/datavg-datalogslv >>> tune2fs 1.41.9 (22-Aug-2009) >>> Filesystem volume name: >>> Last mounted on: >>> Filesystem UUID: 4e5bea3e-3e61-4fc8-9676-e5177522911c >>> Filesystem magic number: 0xEF53 >>> Filesystem revision #: 1 (dynamic) >>> Filesystem features: has_journal ext_attr resize_inode dir_index >>> filetype needs_recovery sparse_super large_file >>> Filesystem flags: unsigned_directory_hash >>> Default mount options: (none) >>> Filesystem state: clean >>> Errors behavior: Continue >>> Filesystem OS type: Linux >>> Inode count: 822544 >>> Block count: 3284992 >>> Reserved block count: 164249 >>> Free blocks: 3109325 >>> Free inodes: 822348 >>> First block: 0 >>> Block size: 4096 >>> Fragment size: 4096 >>> Reserved GDT blocks: 801 >>> Blocks per group: 32768 >>> Fragments per group: 32768 >>> Inodes per group: 8144 >>> Inode blocks per group: 509 >>> Filesystem created: Wed Aug 28 08:30:10 2013 >>> Last mount time: Wed Sep 11 17:16:56 2013 >>> Last write time: Thu Sep 12 09:38:02 2013 >>> Mount count: 18 >>> Maximum mount count: 27 >>> Last checked: Wed Aug 28 08:30:10 2013 >>> Check interval: 15552000 (6 months) >>> Next check after: Mon Feb 24 07:30:10 2014 >>> Reserved blocks uid: 0 (user root) >>> Reserved blocks gid: 0 (group root) >>> First inode: 11 >>> Inode size: 256 >>> Required extra isize: 28 >>> Desired extra isize: 28 >>> Journal inode: 8 >>> Default directory hash: half_md4 >>> Directory Hash Seed: ad2251a9-ac33-4e5e-b933-af49cb4f2bb3 >>> Journal backup: inode blocks >>> >>> $ df --block-size=1 /dev/mapper/datavg-datalogslv >>> Filesystem 1B-blocks Used Available Use% Mounted on >>> /dev/mapper/datavg-datalogslv 13243846656 563843072 12007239680 5% /logs >>> >>> >> > > > From be.nicolas.michel at gmail.com Tue Sep 17 06:14:07 2013 From: be.nicolas.michel at gmail.com (Nicolas Michel) Date: Tue, 17 Sep 2013 08:14:07 +0200 Subject: Numbers behind "df" and "tune2fs" In-Reply-To: <52373101.3060802@redhat.com> References: <5237181B.1070109@redhat.com> <52373101.3060802@redhat.com> Message-ID: OK. Thanks for the journal information. I thought tune2fs -l and dumpe2fs were the same. In reality it's almost the same but not entirely ^^ I hear you about all the internal mecanisms that make the FS working or give it some features, and I do understand that it takes some place on the disk. However what I don't understand is why the number given in the "available column" is called "available" if it's not really the case and we have to remove some other thousand/million of bytes for some internal mecanisms. In other words I don't understand why the "used" percentage given by df does not reflects the values given by itself in the other columns. I can live with it but I really would like to understand why things are what they are. Is there an historic reason? Or maybe a technical reason that makes thoses numbers some added values? The least would be to have the df algorithms documented somewhere? A document that explains intentions and how the values are obtained. The same for tune2fs and dumpe2fs (what really means the given numbers?) 2013/9/16 Eric Sandeen : > On 9/16/13 9:44 AM, Nicolas Michel wrote: >> Thanks for you help. 
I also tried adding some other informations as you suggest: >> I can also take into account: >> - "Reserved block count: XXXXXXX" from tune2fs that gives me the >> number of blocks reserved for root >> - Reserved GDT blocks: XXX >> >> But I didn't thought about the FS journal. How can I gather >> information about it? (it's size and any other information?) > > # dumpe2fs /dev/$YOUR_DEVICE | grep Journal > dumpe2fs 1.41.12 (17-May-2010) > Journal inode: 8 > Journal backup: inode blocks > Journal features: journal_incompat_revoke > Journal size: 128M > Journal length: 32768 > > But you also need to take into account inode tables, inode > allocation bitmaps, block allocation bitmaps ... > > -Eric > >> 2013/9/16 Eric Sandeen : >>> On 9/16/13 5:16 AM, Nicolas Michel wrote: >>>> Hello guys, >>>> >>>> I have some difficulties to understand what really are the numbers >>>> behing "df" and tune2fs. You'll find the output of tune2fs and df >>>> below, on which my maths are based. >>>> >>>> Here are my maths: >>>> >>>> A tune2fs on an ext3 FS tell me the FS size is 3284992 block large. It >>>> also tell me that the size of one block is 4096 (bytes if I'm not >>>> wrong?). So my maths tell me that the disk is 3284992 * 4096 = >>>> 13455327232 bytes or 13455327232 / 1024 /1024 /1024 = 12.53 GB. >>>> >>>> A df --block-size=1 on the same FS tell me the disk is 13243846656 >>>> which is 211480576 bytes smaller than what tune2fs tell me. >>> >>> By default, df on extN assumes that metadata used by the filesystem >>> was never available for your use and is not part of the filesystem >>> space. >>> >>> Documentation/filesystems/ext3.txt says: >>> >>> bsddf (*) Make 'df' act like BSD. >>> minixdf Make 'df' act like Minix. >>> >>> which is pretty unhelpful I suppose. ;) >>> >>> The mount man page is a little more helpful: >>> >>> bsddf|minixdf >>> Set the behaviour for the statfs system call. The minixdf >>> behaviour is to return in the f_blocks field the total number >>> of blocks of the filesystem, while the bsddf behaviour (which >>> is the default) is to subtract the overhead blocks used by the >>> ext2 filesystem and not available for file storage. >>> >>> You're seeing the latter behavior. if you mount with -o minixdf you should >>> see what you expect. (Too bad there's no "linuxdf?") :) >>> >>>> In gigabytes, it means: >>>> * for df, the disk is 12.33 GB >>>> * for tune2fs, the disk is 12.53 GB >>>> >>>> I thought that maybe df is only taking into account the real blocks >>>> available for users. So I tried to remove the reserved blocks and the >>>> GDT blocks: >>>> (3284992 - 164249 - 801) * 4096 = 12779282432 >>>> or in GB : 12779282432 / 1024 / 1024 / 1024 = 11.90 Gb ... >>> >>> you're on the right track, but you forgot the journal space, all the >>> preallocated inode table blocks, etc. >>> >>> -Eric >>> >>>> My last thought was that "Reserved block" in tune2fs was not only the >>>> reserved blocks for root (which is 5% per default on my system) but >>>> take into account all other reserved blocks fo the fs internal usage. >>>> So: >>>> (3284992 - 164249) * 4096 = 12782563328 >>>> In GB : 11.90 Gb (the difference is not significative with a precision of 2. >>>> >>>> So I'm lost ... >>>> >>>> Is someone have an explanation? I would really really be grateful. 
>>>> Nicolas >>>> >>>> ------------------------------ >>>> --------- >>>> >>>> Here is the output of df and tune2fs : >>>> >>>> $ tune2fs -l /dev/mapper/datavg-datalogslv >>>> tune2fs 1.41.9 (22-Aug-2009) >>>> Filesystem volume name: >>>> Last mounted on: >>>> Filesystem UUID: 4e5bea3e-3e61-4fc8-9676-e5177522911c >>>> Filesystem magic number: 0xEF53 >>>> Filesystem revision #: 1 (dynamic) >>>> Filesystem features: has_journal ext_attr resize_inode dir_index >>>> filetype needs_recovery sparse_super large_file >>>> Filesystem flags: unsigned_directory_hash >>>> Default mount options: (none) >>>> Filesystem state: clean >>>> Errors behavior: Continue >>>> Filesystem OS type: Linux >>>> Inode count: 822544 >>>> Block count: 3284992 >>>> Reserved block count: 164249 >>>> Free blocks: 3109325 >>>> Free inodes: 822348 >>>> First block: 0 >>>> Block size: 4096 >>>> Fragment size: 4096 >>>> Reserved GDT blocks: 801 >>>> Blocks per group: 32768 >>>> Fragments per group: 32768 >>>> Inodes per group: 8144 >>>> Inode blocks per group: 509 >>>> Filesystem created: Wed Aug 28 08:30:10 2013 >>>> Last mount time: Wed Sep 11 17:16:56 2013 >>>> Last write time: Thu Sep 12 09:38:02 2013 >>>> Mount count: 18 >>>> Maximum mount count: 27 >>>> Last checked: Wed Aug 28 08:30:10 2013 >>>> Check interval: 15552000 (6 months) >>>> Next check after: Mon Feb 24 07:30:10 2014 >>>> Reserved blocks uid: 0 (user root) >>>> Reserved blocks gid: 0 (group root) >>>> First inode: 11 >>>> Inode size: 256 >>>> Required extra isize: 28 >>>> Desired extra isize: 28 >>>> Journal inode: 8 >>>> Default directory hash: half_md4 >>>> Directory Hash Seed: ad2251a9-ac33-4e5e-b933-af49cb4f2bb3 >>>> Journal backup: inode blocks >>>> >>>> $ df --block-size=1 /dev/mapper/datavg-datalogslv >>>> Filesystem 1B-blocks Used Available Use% Mounted on >>>> /dev/mapper/datavg-datalogslv 13243846656 563843072 12007239680 5% /logs >>>> >>>> >>> >> >> >> > -- Nicolas MICHEL From be.nicolas.michel at gmail.com Tue Sep 17 06:34:26 2013 From: be.nicolas.michel at gmail.com (Nicolas Michel) Date: Tue, 17 Sep 2013 08:34:26 +0200 Subject: Numbers behind "df" and "tune2fs" In-Reply-To: References: <5237181B.1070109@redhat.com> <52373101.3060802@redhat.com> Message-ID: In fact the thing I really want to achieve is to be able to find the values and the algorithm that enable me to reproduce the percentage given by df (and to understand deeply what it means). Why do I need it? Because I'm trying to write some script to do capacity planning and space problem forecast. Currently I don't really know which values I should use to do it. (I could use the percentage given by df but it lacks some precisions to make usefull forecasts) 2013/9/17 Nicolas Michel : > OK. Thanks for the journal information. I thought tune2fs -l and > dumpe2fs were the same. In reality it's almost the same but not > entirely ^^ > > I hear you about all the internal mecanisms that make the FS working > or give it some features, and I do understand that it takes some place > on the disk. However what I don't understand is why the number given > in the "available column" is called "available" if it's not really the > case and we have to remove some other thousand/million of bytes for > some internal mecanisms. > > In other words I don't understand why the "used" percentage given by > df does not reflects the values given by itself in the other columns. > > I can live with it but I really would like to understand why things > are what they are. Is there an historic reason? 
Or maybe a technical > reason that makes thoses numbers some added values? > > The least would be to have the df algorithms documented somewhere? A > document that explains intentions and how the values are obtained. > The same for tune2fs and dumpe2fs (what really means the given numbers?) > > 2013/9/16 Eric Sandeen : >> On 9/16/13 9:44 AM, Nicolas Michel wrote: >>> Thanks for you help. I also tried adding some other informations as you suggest: >>> I can also take into account: >>> - "Reserved block count: XXXXXXX" from tune2fs that gives me the >>> number of blocks reserved for root >>> - Reserved GDT blocks: XXX >>> >>> But I didn't thought about the FS journal. How can I gather >>> information about it? (it's size and any other information?) >> >> # dumpe2fs /dev/$YOUR_DEVICE | grep Journal >> dumpe2fs 1.41.12 (17-May-2010) >> Journal inode: 8 >> Journal backup: inode blocks >> Journal features: journal_incompat_revoke >> Journal size: 128M >> Journal length: 32768 >> >> But you also need to take into account inode tables, inode >> allocation bitmaps, block allocation bitmaps ... >> >> -Eric >> >>> 2013/9/16 Eric Sandeen : >>>> On 9/16/13 5:16 AM, Nicolas Michel wrote: >>>>> Hello guys, >>>>> >>>>> I have some difficulties to understand what really are the numbers >>>>> behing "df" and tune2fs. You'll find the output of tune2fs and df >>>>> below, on which my maths are based. >>>>> >>>>> Here are my maths: >>>>> >>>>> A tune2fs on an ext3 FS tell me the FS size is 3284992 block large. It >>>>> also tell me that the size of one block is 4096 (bytes if I'm not >>>>> wrong?). So my maths tell me that the disk is 3284992 * 4096 = >>>>> 13455327232 bytes or 13455327232 / 1024 /1024 /1024 = 12.53 GB. >>>>> >>>>> A df --block-size=1 on the same FS tell me the disk is 13243846656 >>>>> which is 211480576 bytes smaller than what tune2fs tell me. >>>> >>>> By default, df on extN assumes that metadata used by the filesystem >>>> was never available for your use and is not part of the filesystem >>>> space. >>>> >>>> Documentation/filesystems/ext3.txt says: >>>> >>>> bsddf (*) Make 'df' act like BSD. >>>> minixdf Make 'df' act like Minix. >>>> >>>> which is pretty unhelpful I suppose. ;) >>>> >>>> The mount man page is a little more helpful: >>>> >>>> bsddf|minixdf >>>> Set the behaviour for the statfs system call. The minixdf >>>> behaviour is to return in the f_blocks field the total number >>>> of blocks of the filesystem, while the bsddf behaviour (which >>>> is the default) is to subtract the overhead blocks used by the >>>> ext2 filesystem and not available for file storage. >>>> >>>> You're seeing the latter behavior. if you mount with -o minixdf you should >>>> see what you expect. (Too bad there's no "linuxdf?") :) >>>> >>>>> In gigabytes, it means: >>>>> * for df, the disk is 12.33 GB >>>>> * for tune2fs, the disk is 12.53 GB >>>>> >>>>> I thought that maybe df is only taking into account the real blocks >>>>> available for users. So I tried to remove the reserved blocks and the >>>>> GDT blocks: >>>>> (3284992 - 164249 - 801) * 4096 = 12779282432 >>>>> or in GB : 12779282432 / 1024 / 1024 / 1024 = 11.90 Gb ... >>>> >>>> you're on the right track, but you forgot the journal space, all the >>>> preallocated inode table blocks, etc. >>>> >>>> -Eric >>>> >>>>> My last thought was that "Reserved block" in tune2fs was not only the >>>>> reserved blocks for root (which is 5% per default on my system) but >>>>> take into account all other reserved blocks fo the fs internal usage. 
>>>>> So: >>>>> (3284992 - 164249) * 4096 = 12782563328 >>>>> In GB : 11.90 Gb (the difference is not significative with a precision of 2. >>>>> >>>>> So I'm lost ... >>>>> >>>>> Is someone have an explanation? I would really really be grateful. >>>>> Nicolas >>>>> >>>>> ------------------------------ >>>>> --------- >>>>> >>>>> Here is the output of df and tune2fs : >>>>> >>>>> $ tune2fs -l /dev/mapper/datavg-datalogslv >>>>> tune2fs 1.41.9 (22-Aug-2009) >>>>> Filesystem volume name: >>>>> Last mounted on: >>>>> Filesystem UUID: 4e5bea3e-3e61-4fc8-9676-e5177522911c >>>>> Filesystem magic number: 0xEF53 >>>>> Filesystem revision #: 1 (dynamic) >>>>> Filesystem features: has_journal ext_attr resize_inode dir_index >>>>> filetype needs_recovery sparse_super large_file >>>>> Filesystem flags: unsigned_directory_hash >>>>> Default mount options: (none) >>>>> Filesystem state: clean >>>>> Errors behavior: Continue >>>>> Filesystem OS type: Linux >>>>> Inode count: 822544 >>>>> Block count: 3284992 >>>>> Reserved block count: 164249 >>>>> Free blocks: 3109325 >>>>> Free inodes: 822348 >>>>> First block: 0 >>>>> Block size: 4096 >>>>> Fragment size: 4096 >>>>> Reserved GDT blocks: 801 >>>>> Blocks per group: 32768 >>>>> Fragments per group: 32768 >>>>> Inodes per group: 8144 >>>>> Inode blocks per group: 509 >>>>> Filesystem created: Wed Aug 28 08:30:10 2013 >>>>> Last mount time: Wed Sep 11 17:16:56 2013 >>>>> Last write time: Thu Sep 12 09:38:02 2013 >>>>> Mount count: 18 >>>>> Maximum mount count: 27 >>>>> Last checked: Wed Aug 28 08:30:10 2013 >>>>> Check interval: 15552000 (6 months) >>>>> Next check after: Mon Feb 24 07:30:10 2014 >>>>> Reserved blocks uid: 0 (user root) >>>>> Reserved blocks gid: 0 (group root) >>>>> First inode: 11 >>>>> Inode size: 256 >>>>> Required extra isize: 28 >>>>> Desired extra isize: 28 >>>>> Journal inode: 8 >>>>> Default directory hash: half_md4 >>>>> Directory Hash Seed: ad2251a9-ac33-4e5e-b933-af49cb4f2bb3 >>>>> Journal backup: inode blocks >>>>> >>>>> $ df --block-size=1 /dev/mapper/datavg-datalogslv >>>>> Filesystem 1B-blocks Used Available Use% Mounted on >>>>> /dev/mapper/datavg-datalogslv 13243846656 563843072 12007239680 5% /logs >>>>> >>>>> >>>> >>> >>> >>> >> > > > > -- > Nicolas MICHEL -- Nicolas MICHEL From sandeen at redhat.com Wed Sep 18 15:25:48 2013 From: sandeen at redhat.com (Eric Sandeen) Date: Wed, 18 Sep 2013 10:25:48 -0500 Subject: Numbers behind "df" and "tune2fs" In-Reply-To: References: <5237181B.1070109@redhat.com> <52373101.3060802@redhat.com> Message-ID: <5239C5FC.8010405@redhat.com> On 9/17/13 1:34 AM, Nicolas Michel wrote: > In fact the thing I really want to achieve is to be able to find the > values and the algorithm that enable me to reproduce the percentage > given by df (and to understand deeply what it means). > > Why do I need it? Because I'm trying to write some script to do > capacity planning and space problem forecast. Currently I don't really > know which values I should use to do it. (I could use the percentage > given by df but it lacks some precisions to make usefull forecasts) > If you want "the truth" just mount -o minixdf, tune2fs to 0 blocks reserved, and you'll get the actual number of blocks contained in the filesystem, the actual number of blocks used, and the actual blocks free. Why extN made it so complicated, I don't really know. If you want to see how the sausage is made, look at ext3_statfs() for all the hairy calculations. (ext4_statfs() is even more complex). 
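Very roughly, for a filesystem like the one quoted above, the overhead those calculations subtract is the per-group metadata (block bitmap, inode bitmap, inode table) plus the superblock and group-descriptor backups in the sparse_super groups; the journal and the reserved GDT blocks tend to be charged as "used" space instead. A back-of-the-envelope Python sketch of that accounting, using only the tune2fs -l figures quoted earlier -- this is an approximation of the idea, not the actual statfs code, and different kernel versions count these things differently:

import math

block_count      = 3284992   # tune2fs: Block count
block_size       = 4096      # tune2fs: Block size
blocks_per_group = 32768     # tune2fs: Blocks per group
inode_blocks_pg  = 509       # tune2fs: Inode blocks per group

groups = math.ceil(block_count / blocks_per_group)       # 101 groups
per_group_meta = groups * (inode_blocks_pg + 2)           # inode table + 2 bitmaps per group

# With sparse_super, only groups 0, 1 and the powers of 3, 5 and 7
# carry superblock + group-descriptor backups (the descriptors fit in
# a single 4k block for a filesystem this small).
def has_backup(g):
    if g in (0, 1):
        return True
    for p in (3, 5, 7):
        n = p
        while n < g:
            n *= p
        if n == g:
            return True
    return False

backups  = sum(2 for g in range(groups) if has_backup(g))   # sb + descriptor block each
overhead = per_group_meta + backups                          # in filesystem blocks
print((block_count - overhead) * block_size)                 # close to df's 13243846656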
Until recently, it was all complicated enough that even the kernel code got it wrong. ;)

0875a2b448fcaba67010850cf9649293a5ef653d ext4: include journal blocks in df overhead calcs
b72f78cb63fb595af63fc781dced0a6fd354e572 ext4: fix overhead calculations in ext4_stats, again
952fc18ef9ec707ebdc16c0786ec360295e5ff15 ext4: fix overhead calculation used by ext4_statfs()
...

-Eric
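P.S. For the capacity-planning script itself there is no need to redo any of this accounting by hand: df only reports what the statfs()/statvfs() call returns, so a script can read the same numbers directly. A minimal Python sketch (the mount point is just an example; the Use% formula is the one GNU coreutils df appears to use):

import os

def df_like(path):
    st = os.statvfs(path)
    # f_blocks / f_bfree / f_bavail are counted in units of f_frsize bytes.
    total = st.f_blocks * st.f_frsize
    free  = st.f_bfree  * st.f_frsize   # includes the root-reserved blocks
    avail = st.f_bavail * st.f_frsize   # what non-root users can still write
    used  = total - free
    # df computes Use% against used + avail rather than against total,
    # which is one reason the columns never quite add up by eye.
    pct = 100.0 * used / (used + avail) if (used + avail) else 0.0
    return total, used, avail, pct

print(df_like("/logs"))   # the mount point from the thread, as an example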