[edk2-devel] RFC: EXT4 filesystem driver

Marvin Häuser mhaeuser at posteo.de
Thu Jul 22 16:07:56 UTC 2021


On 22.07.21 17:58, Andrew Fish wrote:
>
>
>> On Jul 22, 2021, at 3:24 AM, Marvin Häuser <mhaeuser at posteo.de 
>> <mailto:mhaeuser at posteo.de>> wrote:
>>
>> On 22.07.21 01:12, Pedro Falcato wrote:
>>> EXT4 (fourth extended filesystem) is a filesystem developed for Linux
>>> that has been in wide use (desktops, servers, smartphones) since 2008.
>>>
>>> The Ext4Pkg implements the Simple File System Protocol for a partition
>>> that is formatted with the EXT4 file system. This allows UEFI Drivers,
>>> UEFI Applications, UEFI OS Loaders, and the UEFI Shell to access files
>>> on an EXT4 partition and supports booting a UEFI OS Loader from an
>>> EXT4 partition.
>>> This project is one of the TianoCore Google Summer of Code projects.
>>>
>>> Right now, Ext4Pkg only contains a single member, Ext4Dxe, which is a
>>> UEFI driver that consumes Block I/O, Disk I/O and (optionally) Disk
>>> I/O 2 Protocols, and produces the Simple File System protocol. It
>>> allows mounting ext4 filesystems exclusively.
>>>
>>> Brief overhead of EXT4:
>>> Layout of an EXT2/3/4 filesystem:
>>> (note: this driver has been developed using
>>> https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html 
>>> <https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html> as
>>> documentation).
>>>
>>> An ext2/3/4 filesystem (here on out referred to as simply an ext4 
>>> filesystem,
>>> due to the similarities) is composed of various concepts:
>>>
>>> 1) Superblock
>>> The superblock is the structure near (1024 bytes offset from the start)
>>> the start of the partition, and describes the filesystem in general.
>>> Here, we get to know the size of the filesystem's blocks, which features
>>> it supports or not, whether it's been cleanly unmounted, how many blocks
>>> we have, etc.
>>>
>>> 2) Block groups
>>> EXT4 filesystems are divided into block groups, and each block group 
>>> covers
>>> s_blocks_per_group(8 * Block Size) blocks. Each block group has an
>>> associated block group descriptor; these are present directly after the
>>> superblock. Each block group descriptor contains the location of the
>>> inode table, and the inode and block bitmaps (note these bitmaps are 
>>> only
>>> a block long, which gets us the 8 * Block Size formula covered 
>>> previously).
>>>
>>> 3) Blocks
>>> The ext4 filesystem is divided into blocks, of size s_log_block_size 
>>> ^ 1024.
>>> Blocks can be allocated using individual block groups's bitmaps. Note
>>> that block 0 is invalid and its presence on extents/block tables means
>>> it's part of a file hole, and that particular location must be read as
>>> a block full of zeros.
>>>
>>> 4) Inodes
>>> The ext4 filesystem divides files/directories into inodes (originally
>>> index nodes). Each file/socket/symlink/directory/etc (here on out 
>>> referred
>>> to as a file, since there is no distinction under the ext4 
>>> filesystem) is
>>> stored as a /nameless/ inode, that is stored in some block group's inode
>>> table. Each inode has s_inode_size size (or GOOD_OLD_INODE_SIZE if it's
>>> an old filesystem), and holds various metadata about the file. Since the
>>> largest inode structure right now is ~160 bytes, the rest of the inode
>>> contains inline extended attributes. Inodes' data is stored using either
>>> data blocks (under ext2/3) or extents (under ext4).
>>>
>>> 5) Extents
>>> Ext4 inodes store data in extents. These let N contiguous logical blocks
>>> that are represented by N contiguous physical blocks be represented by a
>>> single extent structure, which minimizes filesystem metadata bloat and
>>> speeds up block mapping (particularly due to the fact that high-quality
>>> ext4 implementations like linux's try /really/ hard to make the file
>>> contiguous, so it's common to have files with almost 0 fragmentation).
>>> Inodes that use extents store them in a tree, and the top of the tree
>>> is stored on i_data. The tree's leaves always start with an
>>> EXT4_EXTENT_HEADER and contain EXT4_EXTENT_INDEX on eh_depth != 0 and
>>> EXT4_EXTENT on eh_depth = 0; these entries are always sorted by logical
>>> block.
>>>
>>> 6) Directories
>>> Ext4 directories are files that store name -> inode mappings for the
>>> logical directory; this is where files get their names, which means ext4
>>> inodes do not themselves have names, since they can be linked (present)
>>> multiple times with different names. Directories can store entries 
>>> in two
>>> different ways:
>>> 1) Classical linear directories: They store entries as a mostly-linked
>>> mostly-list of EXT4_DIR_ENTRY.
>>> 2) Hash tree directories: These are used for larger directories, with
>>> hundreds of entries, and are designed in a backwards-compatible way.
>>> These are not yet implemented in the Ext4Dxe driver.
>>>
>>> 7) Journal
>>> Ext3/4 filesystems have a journal to help protect the filesystem against
>>> system crashes. This is not yet implemented in Ext4Dxe but is described
>>> in detail in the Linux kernel's documentation.
>>>
>>> The EDK2 implementation of ext4 is based only on the public 
>>> documentation
>>> available at 
>>> https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html 
>>> <https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html>
>>> and
>>> the FreeBSD ext2fs driver (available at
>>> https://github.com/freebsd/freebsd-src/tree/main/sys/fs/ext2fs 
>>> <https://github.com/freebsd/freebsd-src/tree/main/sys/fs/ext2fs>,
>>> BSD-2-Clause-FreeBSD licensed). It is licensed as
>>> SPDX-License-Identifier: BSD-2-Clause-Patent.
>>>
>>> After a brief discussion with the community, the proposed package
>>> location is edk2-platform/Features/Ext4Pkg
>>> (relevant discussion: https://edk2.groups.io/g/devel/topic/83060185 
>>> <https://edk2.groups.io/g/devel/topic/83060185>).
>>>
>>> I was the main contributor and I would like to maintain the package in
>>> the future, if possible.
>>
>> While I personally don't like it's outside of the EDK II core, I kind 
>> of get it. However I would strongly suggest to choose a more general 
>> package name, like "LinuxFsPkg", or "NixFsPkg", or maybe even just 
>> "FileSystemPkg" (and move FAT over some day?). Imagine someone wants 
>> to import BTRFS next year, should it really be "BtrfsPkg"? I 
>> understand it follows the "FatPkg" convention, but I feel like people 
>> forget FatPkg was special as to its awkward license before Microsoft 
>> allowed a change a few years ago. Maintainers.txt already has the 
>> concept of different Reviewers per subfolder, maybe it could be 
>> extended a little to have a common EDK II contributor to officially 
>> maintain the package, but have you be a Maintainer or something like 
>> a Reviewer+ to your driver? Or you could maintain the entire package 
>> of course.
>>
>
> Marvin,
>
> Good point that the FatPkg was more about license boundary than 
> anything else, so I’m not opposed to a more generic package name.
>
>>> Current limitations:
>>> 1) The Ext4Dxe driver is, at the moment, read-only.
>>> 2) The Ext4Dxe driver at the moment cannot mount older (ext2/3)
>>> filesystems. Ensuring compatibility with
>>> those may not be a bad idea.
>>>
>>> I intend to test the package using the UEFI SCTs present in edk2-test,
>>> and implement any other needed unit tests myself using the already
>>> available unit test framework. I also intend to (privately) fuzz the
>>> UEFI driver with bad/unusual disk images, to improve the security and
>>> reliability of the driver.
>>>
>>> In the future, ext4 write support should be added so edk2 has a
>>> fully-featured RW ext4 driver. There could also be a focus on
>>> supporting the older ext4-like filesystems, as I mentioned in the
>>> limitations, but that is open for discussion.
>>
>> I may be alone, but I strongly object. One of our projects (OpenCore) 
>> has a disgusting way of writing files because the FAT32 driver in 
>> Aptio IV firmwares may corrupt the filesystem when resizing files. To 
>> be honest, it may corrupt with other usages too and we never noticed, 
>> because basically we wrote the code to require the least amount of 
>> (complex) FS operations.
>>
>> The issue with EDK II is, there is a lot of own code and a lot of 
>> users, but little testing. By that I do not mean that developers do 
>> not test their code, but that nobody sits down and performs all sorts 
>> of FS manipulations in all sorts of orders and closely observes the 
>> result for regression-testing. Users will not really test it either, 
>> as UEFI to them should just somehow boot to Windows. If at least the 
>> code was shared with a codebase that is known-trusted (e.g. the Linux 
>> kernel itself), that'd be much easier to trust, but realistically 
>> this is not going to happen.
>> My point is, if a company like AMI cannot guarantee writing does not 
>> damage the FS for a very simple FS, how do you plan to guarantee 
>> yours doesn't for a much more complex FS? I'd rather have only one 
>> simple FS type that supports writing for most use-cases (e.g. logging).
>>
>> At the very least I would beg you to have a PCD to turn write support 
>> off - if it will be off by default, that'd be great of course. :)
>> Was there any discussion yet as to why write support is needed in the 
>> first place you could point me to?
>>
>
> I think having a default PCD option of read only is a good idea.
>
> EFI on Mac carries HFS+ and APFS EFI file system drivers and both of 
> those are read only for safety, security, and to avoid the need to 
> validate them. So I think some products may want to have the option to 
> ship read only versions of the file system.
>
> Seems like having EFI base file system tests would be useful. I’d 
> imaging with OVMF it would be possible to implement a very robust test 
> infrastructure. Seems like the hard bits would be generating the test 
> cases and figuring out how to validate the tests did the correct 
> thing. I’m guess the OS based file system drivers are robust and try 
> to work around bugs gracefully? Maybe there is a way to turn on OS 
> logging, or even run an OS based fsck on the volume after the tests 
> complete. Regardless this seems like an interesting project, maybe we 
> can add it to next years GSoC?

Great idea, maybe it could be added to that wiki list of topic 
suggestions (preferably close to the top)? :)

Best regards,
Marvin

>
> Thanks,
>
> Andrew Fish
>
>> Thanks for your work!
>>
>> Best regards,
>> Marvin
>>
>>> The driver's handling of unclean unmounting through forced shutdown 
>>> is unclear.
>>> Is there a position in edk2 on how to handle such cases? I don't think
>>> FAT32 has a "this filesystem is/was dirty" and even though it seems to
>>> me that stopping a system from booting/opening the partition because
>>> "we may find some tiny irregularities" is not the best course of
>>> action, I can't find a clear answer.
>>>
>>> The driver also had to add implementations of CRC32C and CRC16, and
>>> after talking with my mentor we quickly reached the conclusion that
>>> these may be good candidates for inclusion in MdePkg. We also
>>> discussed moving the Ucs2 <-> Utf8 conversion library in RedfishPkg
>>> (BaseUcs2Utf8Lib) into MdePkg as well. Any comments?
>>>
>>> Feel free to ask any questions you may find relevant.
>>>
>>> Best Regards,
>>>
>>> Pedro Falcato
>>>
>>>
>>>
>>>
>>
>>
>>
>> 
>



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#78099): https://edk2.groups.io/g/devel/message/78099
Mute This Topic: https://groups.io/mt/84368561/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-






More information about the edk2-devel-archive mailing list