[Linux-cluster] iSCSI GFS

Gordan Bobic gordan at bobich.net
Mon Jan 28 08:07:20 UTC 2008


isplist at logicore.net wrote:
>> It's not virtualization. It is equivalent to mounting an NFS share, and
>> then exporting it again from the machine that mounted it.
> 
> Ok, so a single machine with all the storage attached to it. Won't that 
> bog it down big time pretty quickly? 

Not if it can handle the I/O. You just need enough CPU and enough bonded 
gigabit ethernet NICs in it. At the end of the day, a SAN appliance is 
just a PC with a few NICs and a bunch of disks in it, and one of those 
can handle quite a few machines using it simultaneously.
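
For example, a minimal bonding setup on a RHEL-style box might look 
something like this (interface names, mode and addresses are just 
examples; adjust to your switch and hardware):

  # /etc/modprobe.conf -- load the bonding driver (802.3ad needs switch support)
  alias bond0 bonding
  options bond0 mode=802.3ad miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.1.10
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for each slave NIC)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none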

> So, if this is correct, I can see how I could export everything from that 
> one machine, but wouldn't the overall I/O be unreal? 

How much I/O do you actually need? If you have 10 disk nodes, each with 
a 1Gb NIC, then you could just have a couple of 10Gb NICs in the 
aggregator (one on the client side, one on the disk node side), and 
you'll get no bottleneck. In reality, you can overbook it quite a lot 
unless all the machines are going flat out all the time. Caching on the 
aggregator and the client nodes will also help reduce the I/O on the 
disk node side.
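
Back-of-the-envelope, assuming every disk node can saturate its link at 
once (the worst case):

  10 disk nodes x 1Gb/s            = 10Gb/s peak on the disk side
  1 x 10Gb NIC on the disk side    = 10Gb/s -> no bottleneck
  1 x 10Gb NIC on the client side  = 10Gb/s -> no bottleneck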

> How would this machine be turned into an aggregator? Would it handle knowing 
> where everything is or would servers still need to know which share to connect 
> to in order to get the needed data?

The disk nodes export their space via iSCSI as volumes. The aggregator 
connects to each of those iSCSI volumes as normal SCSI device nodes, and 
creates a virtual software RAID stripe over them. It then exports this 
back out via iSCSI. All the client nodes then connect to the single big 
iSCSI node that is the aggregator.
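
As a rough sketch of that pipeline with open-iscsi, mdadm and iSCSI 
Enterprise Target (all the IQNs, addresses and device names below are 
made up for illustration):

  # On the aggregator: import each disk node's volume over iSCSI
  iscsiadm -m discovery -t sendtargets -p 192.168.1.101
  iscsiadm -m node -T iqn.2008-01.net.example:disk01 -p 192.168.1.101 --login
  # ...repeat for disk02..disk10; they show up as /dev/sdb, /dev/sdc, ...

  # Stripe the imported volumes into one software RAID device
  mdadm --create /dev/md0 --level=5 --raid-devices=10 /dev/sd[b-k]

  # Re-export the big volume via the iSCSI target, /etc/ietd.conf:
  Target iqn.2008-01.net.example:aggregator.bigvol
      Lun 0 Path=/dev/md0,Type=blockio

The client nodes then log into iqn.2008-01.net.example:aggregator.bigvol 
exactly as they would into any other iSCSI SAN.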

> I also happen to have a BlueArc i7500 machine which can offer up NFS shares. I 
> didn't want to use anything like that because I've read too many messages about 
> NFS not being a good protocol to grow on. Do you disagree?

NFS can give considerably better performance than GFS under some 
circumstances. If you don't need POSIX-compliant file locking, you may 
find that NFS works better for your application. You'll just have to try 
it and see. There is no reason the aggregator box couldn't export an NFS 
share of the aggregated space (i.e. be a NAS rather than a SAN).
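
Something along these lines on the aggregator would do it (the 
filesystem, paths and client subnet are just examples):

  mkfs.ext3 /dev/md0
  mkdir -p /export/bigvol
  mount /dev/md0 /export/bigvol

  # /etc/exports
  /export/bigvol 192.168.1.0/24(rw,sync,no_root_squash)

  exportfs -ra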

>> Exactly. You have a machine that pretends to be a SAN when it in fact
>> has no space on it. Instead, it connects to all the individual storage
>> nodes, mounts their volumes, merges them into one big volume, and then
>> presents that one big volume via iSCSI.
> 
> Ok, I like it :). I don't get how I aggregate it all into a single volume, 
> guess I've not played with software RAID which expands to different storage 
> devices and volumes. I get the idea though.
> 
> For hardware, would this aggregator need massive resources in terms of CPU or 
> memory? I have IBM's which have 8-way CPU's and can have up to 64GB of memory. 

I suspect that's possibly overkill. It's NIC I/O you'll need more than 
anything. Jumbo frames, as big as your hardware can handle, will also help.
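
For instance (9000 bytes is a common ceiling, but check what your NICs 
and switches actually support):

  ifconfig bond0 mtu 9000
  # or make it persistent by adding MTU=9000 to the ifcfg-bond0 file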

> Would the aggregator be a potential cluster candidate perhaps? Might it be 
> possible to run a cluster of them to be safe and to offload? 

There is no reason why the aggregator couldn't mind its own exports and 
run as one of the client cluster nodes.

> This is interesting. I can see that if I could get to VM/shareroot and 
> something like this, I would have something quite nice going.
> 
>> It's a central connection point AND a router, only it isn't just
>> straight routing, because the data is RAID striped for redundancy.
> 
> Right, I just don't yet get how the aggregator handles all of that I/O. Or 
> perhaps it just tells the servers which storage device to connect to so that 
> it doesn't actually have to take on all of the I/O?

No, it handles all of the I/O through itself. The client nodes don't 
connect to the disk nodes directly, ever.

Gordan
