question?

James Wilkinson james at westexe.demon.co.uk
Fri Oct 1 16:33:38 UTC 2004


Dan Ladd wrote:
> Yeah this is the only email address i could find because I had a
> question about the fedora project.  I didn't know where to direct it. 
> So here goes... I was wondering about the WinFS file management
> system.   If something like that (meaning the built in database
> system) was going to be included in fedora or other RedHat operating
> systems or if something like is already in an operating system with
> RedHat?  It seems like it is old technology because the AS/400 or now
> the iSeries has the built in database system to locate physical files
> that are stored in logical files.   If you could direct me to where i
> can have that answered that would be awesome.

Oohh. *BIG* question.

Background: Unix invented much of the "everything is a file, stored in a
file tree, and the OS just sees a normal file as a collection of bits"
philosophy that is now more or less standard.  Meanwhile, the relational
database became the standard way of storing smaller, structured data.

*Lots* of computer scientists have been wondering whether this split
between ways of storing data is ideal. So there has been a lot of work
done looking at the best ways to store general-purpose files in a
database.

It turns out that this is a Hard Problem. Storing files as opaque
binary objects in a database isn't a problem: a lot of modern
filesystems effectively do this. The question is whether we can take
anything else from the database world.

Here you should understand that there is no agreed vision of how things
should work. This is the main point I want to make. So you will have to
work out what you want from a database filesystem, and see what provides
it.

The two big problems come under the headings of writing and reading.
Writing is relatively easy, since you can define the problem: It Would Be
Nice If Linux allowed multiple updates to one file or to many files to be
treated as a transaction. Even there, there is the unfortunate detail
of getting transactions to span filesystems.
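For a single file, user space can already get part of the way there. The
sketch below is only an illustration in Python, not a transaction API that
Linux or Fedora provides: atomic_write is a name I've made up, and it
relies on rename being atomic within one filesystem. It does nothing for
updates that span several files or filesystems.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` with `data` (bytes), all-or-nothing.

    Hypothetical helper for illustration; not part of any standard API.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # because a rename cannot be atomic across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk first
        os.replace(tmp_path, path)  # readers see old or new, never a mix
    except BaseException:
        # "Roll back": throw away the partial temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The requirement that the temporary file sits next to the target is the
same cross-filesystem detail mentioned above, in miniature.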

Reading is more of a problem. Many file formats keep internal metadata
(author, image size, artist, etc.), and there is a demand to keep more
data against files (e.g. Access Control Lists). Many people think that
there should be a better way of finding all recordings of Vaughan
Williams' works than finding all MP3s, all OGGs, all Real Media, etc.
and running format-specific query programs against each file. (One still
has a problem if someone entered "Old Hundredth, arr. RWV", but that
can be ignored in the first few versions...) Maybe icons for a file
should be stored against the file.
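To make the wish concrete, here is a rough Python sketch of a
format-independent query. Everything in it is invented for illustration:
the CATALOGUE records, the file paths and the find function are
hypothetical, and a real system would have to fill the catalogue using
format-specific extractors.

```python
# Hypothetical catalogue. In a real system each record would be produced by
# a format-specific extractor (one for MP3, one for Ogg, and so on).
CATALOGUE = [
    {"path": "music/tallis.mp3", "format": "mp3", "composer": "Vaughan Williams"},
    {"path": "music/lark.ogg", "format": "ogg", "composer": "Vaughan Williams"},
    {"path": "music/planets.mp3", "format": "mp3", "composer": "Holst"},
    {"path": "music/hundredth.ogg", "format": "ogg", "composer": "arr. RWV"},
]


def find(catalogue, **wanted):
    """Return the paths whose metadata equals every key=value pair given."""
    return [
        record["path"]
        for record in catalogue
        if all(record.get(key) == value for key, value in wanted.items())
    ]


# One query, regardless of file format.
matches = find(CATALOGUE, composer="Vaughan Williams")
```

The query finds the MP3 and the Ogg file in one go, but misses the
"arr. RWV" entry: exact matching does nothing about inconsistent data
entry.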

This is the metadata problem, or rather, series of problems. One is,
simply, how do you present metadata under Unix-like systems? Solaris
has a special system call and program to access the metadata; Hans
Reiser is proposing to allow you to access each file as a directory
with the metadata available underneath (obviously, this isn't practical
with real directories).
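A toy model of the file-as-directory idea, in Python, might look like
this. It is my own sketch and not how Reiser 4 is implemented: the FILES
table, the file name and the read function are all hypothetical, and the
namespace is flat (no real directories).

```python
# Hypothetical in-memory "filesystem": each file has contents plus metadata.
FILES = {
    "report.xml": {
        "data": b"<report/>",
        "meta": {"author": "james", "type": "text/xml"},
    },
}


def read(path):
    """Resolve 'name' to the file contents and 'name/key' to one metadata value."""
    name, _, key = path.partition("/")
    entry = FILES[name]
    if not key:
        return entry["data"]           # plain file access
    return entry["meta"][key].encode()  # file treated as a directory
```

So read("report.xml") gives the contents and read("report.xml/author")
gives the author. For a real directory, "dir/author" could equally mean a
child called "author", which is presumably why the trick doesn't carry
over to directories.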

The other big problem is how much metadata should move around with a
file. It's obvious that you want to be able to export files in an
existing format, which will drop any metadata that isn't already in the
format.  (You still need to support existing filesystems, for example on
CDs).

Then when you're copying things around, some metadata (user to last
modify) should change, others (user to first modify) shouldn't. This
means that something is going to have to know a *lot* about the way that
metadata works, which means you are going to have a lot of per-filetype
programming and/or a lot of rigidity.
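As a sketch of what "knowing a lot about metadata" means in practice,
here is a small Python example with a per-field copy policy. The field
names, the COPY_POLICY table and the decision to drop unknown fields are
all assumptions of mine, not taken from any existing filesystem.

```python
import copy

# Hypothetical policy table: what happens to each metadata field on copy.
COPY_POLICY = {
    "first_modified_by": "keep",    # history travels with the file
    "last_modified_by": "update",   # the copier becomes the last modifier
    "icon": "keep",
}


def copy_metadata(meta, current_user):
    """Return the metadata a copy of the file should carry."""
    result = {}
    for key, value in meta.items():
        # Assumption: fields the policy doesn't know about are dropped.
        action = COPY_POLICY.get(key, "drop")
        if action == "keep":
            result[key] = copy.deepcopy(value)
        elif action == "update":
            result[key] = current_user
    return result
```

Every new kind of metadata needs a new row in the table, and anything the
table doesn't know about is lost: that is the per-filetype programming
and the rigidity, in about twenty lines.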

The main contender to "solve" both of these problems is the Reiser 4
filesystem, which has two main drawbacks:

 * It's very new, not fully debugged, and has a number of security and
   reliability problems.

 * It stores metadata to a file by treating the file as a directory, and
   putting metadata as pseudo-files in that directory. That changes the
   way users and programs think about files, and will invalidate a lot
   of assumptions.

See http://lwn.net/Articles/14035/ .

Other contenders include user-space plugins to the Gnome or KDE virtual
filesystems. These can be reasonably taught "this is an MP3, this is an
XML document", and retrieve the metadata on demand.

It still isn't clear how best to make this visible to non-technical
end-users. It's largely those people who *aren't* happy with shell
scripting who would most benefit from easy ways to look for files with
Vaughan Williams recordings. (Those who can script will probably have the
sense to put RVW in the pathname somewhere, and can use custom tools).

On top of this, maybe some files should be word-indexed (though it
doesn't make sense to do this for Ogg files). Microsoft's Find Fast has
long done this in userspace with a separate database, and this does seem
much better than putting the support in the kernel.
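A userspace word index with a separate database can be very small. The
Python sketch below uses SQLite purely as an example; it is not how Find
Fast works internally (I don't know that), and build_index, search and
the sample documents are names and data I've invented.

```python
import re
import sqlite3


def build_index(documents):
    """Build an in-memory word index from a mapping of path -> text."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE words (word TEXT, path TEXT, PRIMARY KEY (word, path))"
    )
    for path, text in documents.items():
        # Crude tokeniser: lower-case runs of letters and digits.
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        db.executemany(
            "INSERT INTO words VALUES (?, ?)", [(word, path) for word in words]
        )
    db.commit()
    return db


def search(db, word):
    """Return the paths of all documents containing `word`, sorted."""
    rows = db.execute(
        "SELECT path FROM words WHERE word = ? ORDER BY path", (word.lower(),)
    )
    return [path for (path,) in rows]
```

For example, after db = build_index({"a.txt": "The Lark Ascending",
"b.txt": "Lark Rise"}), calling search(db, "lark") returns both paths.
The index lives entirely outside the filesystem, so nothing in the kernel
has to change.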

I don't know much about OS/400: it always sounded as though they
implemented the database first, and then created the entire OS and
related applications around the database. They had the advantage that
everything knew it was going to be working with a database, and
programs on OS/400 probably really want a database backend anyway.

Linux doesn't have that, and is a much more general purpose OS.

I've also come across http://lwn.net/Articles/56923/ : you might want to
read that, too.

Note that WinFS itself appears to be delayed until the end of the decade.

Sorry for the length of this e-mail: there's more I could say, but won't.

James.
-- 
E-mail address: james | Examiner: How does an AC motor start?
@westexe.demon.co.uk  | Student: vrrrrrrrrrrRrRRRRRRR...
                      | Examiner: Stop! Stop!
                      | Student: RRRRRRRmmmmm.



