[Linux-cluster] gfs2 v. zfs?

Rafa Grimán rafagriman at gmail.com
Mon Jan 24 22:20:23 UTC 2011


Hi :)


On Monday 24 January 2011 22:37 Wendy Cheng wrote
> I would love to get an education here. From usage model point of view,
> what is the difference between a "parallel file system" and a "cluster
> file system" ? i.e., when to use a parallel file system and when to
> use a cluster file system ?


Please don't top post :)

A parallel file system is a file system in which:

	- metadata and data servers are separated

	- a file's data is distributed/striped among the data servers
	  (each data server has its own storage)

Because of the second point, a file is read or written in parallel, so you
get higher bandwidth. Each data server/node serves a chunk of each file. This
is similar to RAID 0 spread across many servers.
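
To make the RAID 0 comparison concrete, here is a minimal sketch in Python of
round-robin striping. It is not how any particular parallel file system lays
out data. The names locate and split_request, the stripe size and the server
count are all made up for illustration.

```python
def locate(offset, stripe_size, num_servers):
    """Map a byte offset in a file to (server index, offset on that server).

    Assumes plain round-robin striping, as in RAID 0: stripe 0 goes to
    server 0, stripe 1 to server 1, and so on, wrapping around.
    """
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers
    local_stripe = stripe_index // num_servers
    return server, local_stripe * stripe_size + offset % stripe_size


def split_request(offset, length, stripe_size, num_servers):
    """Split one read/write into per-server pieces (server, local_offset, size)."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe boundary inside a single piece.
        size = min(stripe_size - offset % stripe_size, end - offset)
        server, local = locate(offset, stripe_size, num_servers)
        pieces.append((server, local, size))
        offset += size
    return pieces
```

With a 1 KiB stripe and 4 servers, split_request(0, 4096, 1024, 4) yields one
piece per server, so the four servers can work on the same request at once.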

Metadata is stored on a metadata server so it doesn't "get in the way" ;) That
is, the client node asks the metadata server where the file's chunks are. The
metadata server sends the client a list of the data nodes that hold the chunks,
and then the client talks directly to the data nodes without having to talk
to the metadata server again. Obviously, this is oversimplified ;)
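
The lookup-then-read flow above can be sketched as a toy, in-memory model in
Python. Everything here (MetadataServer, DataNode, write_file, read_file) is
a hypothetical name for illustration, not the API of Lustre, PVFS or any
other real file system, and real ones also deal with locking, caching and
failures.

```python
from concurrent.futures import ThreadPoolExecutor


class DataNode:
    """Holds chunks in its own storage (here just a dict)."""
    def __init__(self):
        self.chunks = {}

    def put(self, key, data):
        self.chunks[key] = data

    def get(self, key):
        return self.chunks[key]


class MetadataServer:
    """Knows where each file's chunks live, but stores no file data."""
    def __init__(self):
        self.layouts = {}  # path -> ordered list of (node_id, chunk_key)

    def set_layout(self, path, layout):
        self.layouts[path] = layout

    def lookup(self, path):
        return list(self.layouts[path])


def write_file(mds, nodes, path, data, chunk_size):
    """Stripe data round-robin over the nodes and record the layout."""
    layout = []
    for i in range(0, len(data), chunk_size):
        index = i // chunk_size
        node_id = index % len(nodes)
        key = (path, index)
        nodes[node_id].put(key, data[i:i + chunk_size])
        layout.append((node_id, key))
    mds.set_layout(path, layout)


def read_file(mds, nodes, path):
    """One metadata lookup, then fetch all chunks from the data nodes."""
    layout = mds.lookup(path)  # the only call to the metadata server
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # map() returns the chunks in layout order, fetched concurrently.
        parts = pool.map(lambda loc: nodes[loc[0]].get(loc[1]), layout)
        return b"".join(parts)
```

For example, after write_file(mds, nodes, "/f", b"abcdefghij", 4) with three
nodes, read_file(mds, nodes, "/f") asks the metadata server once and then
pulls the three chunks straight from the data nodes.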

Also, take into account that metadata is IOPS intensive while data is
bandwidth/throughput intensive. If you separate the two ... you can tune
each storage subsystem to get the best performance for IOPS or for bandwidth.

Parallel file systems are useful for high bandwidth/throughput systems (HPC).

In clustered file systems:

	- metadata and data servers aren't usually separated (in CXFS they are)

	- a file's data is not striped among the data servers since there is a
	  single storage array

Because of the second point, a file is not read/written in parallel. One file
is served by one node/server. This means you can have 2 nodes serving 2 files
at the same time, but each node serves a whole file, not chunks of the same
file.

Clustered file systems are useful for active/active HA/load-balancing
configurations.

This is a very simplified explanation of both. For more in-depth explanations
check Google ;) Look for GPFS, PVFS, Lustre, PanFS (Panasas), CXFS, GFS,
OCFS2, ...

HTH

   Rafa


> On Mon, Jan 24, 2011 at 1:10 PM, Rafa Grimán <rafagriman at gmail.com> wrote:
> > Hi :)
> > 
> > On Monday 24 January 2011 21:25 Wendy Cheng wrote
> > 
> >> Sometime ago, the following was advertised:
> >> 
> >> "ZFS is not a native cluster, distributed, or parallel file system and
> >> cannot provide concurrent access from multiple hosts as ZFS is a local
> >> file system. Sun's Lustre distributed filesystem will adapt ZFS as
> >> back-end storage for both data and metadata in version 3.0, which is
> >> scheduled to be released in 2010."
> >> 
> >> You can google "Lustre" to see whether their plan (built Lustre on top
> >> of ZFS) is panned out.
> > 
> > But Lustre isn't a clustered filesystem, it's a parallel filesystem.
> > Similar to pNFS, PanFS, ... Comparing GFS to Lustre wouldn't be quite
> > right.
> > 
> >    Rafa


-- 
"We cannot treat computers as Humans. Computers need love."

Happily using KDE 4.5.4 :)



