[Linux-cluster] Storage Cluster Newbie Questions - any help with answers greatly appreciated!

Michael @ Professional Edge LLC m3 at professionaledgellc.com
Thu Mar 4 17:26:35 UTC 2010


Hello Kaloyan,

Thank you for the thoughts.

You are correct - when I said "Active / Passive" I simply meant that I 
have no need for "Active / Active" - and a floating IP on the NFS share 
is exactly what I had in mind.

The software RAID - of any type: RAID1, 5, 6, etc. - is the issue.  From 
what I have read, mdadm is not cluster aware, and since all disks are 
seen by all RHEL nodes - as Leo mentioned - I need some way to keep the 
kernel from detecting and attempting to assemble all of the available 
software RAIDs, which is a major problem.  This is why I was asking 
whether CLVM w/mirroring would be a better method, although since it was 
only introduced in RHEL 5.3 I am a bit leery.
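
If I do stay with mdadm, my understanding is that I would have to keep 
the arrays out of any automatic scan and only assemble them by hand (or 
from whatever script the cluster service runs) on the node that 
currently owns the share.  A rough, untested sketch - all device names 
below are made up:

  # /etc/mdadm.conf on both nodes: scan only the multipath names and
  # list no ARRAY lines, so nothing gets assembled automatically at boot
  DEVICE /dev/mapper/mpath*

  # on whichever node currently owns the NFS service:
  mdadm --assemble /dev/md0 /dev/mapper/mpath0 /dev/mapper/mpath14
  # ...and before the other node takes over:
  mdadm --stop /dev/md0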

Sorry for being confusing - yes, the linux machines will have a 
completely different filesystem share than the windows machines.  My 
original thought was to do "node#1 primary nfs share (floating ip#1) to 
linux machines w/node#2 backup" - and then "node#2 primary nfs or samba 
share (floating ip#2) to windows machines w/node#1 backup".
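
Concretely, what I was picturing in cluster.conf is something along 
these lines - untested, the node names and addresses are made up, and 
I've left out the fs/nfsexport/nfsclient children:

  <rm>
    <failoverdomains>
      <failoverdomain name="prefer-node1" ordered="1" restricted="0">
        <failoverdomainnode name="node1.example.com" priority="1"/>
        <failoverdomainnode name="node2.example.com" priority="2"/>
      </failoverdomain>
      <failoverdomain name="prefer-node2" ordered="1" restricted="0">
        <failoverdomainnode name="node1.example.com" priority="2"/>
        <failoverdomainnode name="node2.example.com" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <service name="nfs-linux" domain="prefer-node1" autostart="1" recovery="relocate">
      <ip address="192.168.1.10" monitor_link="1"/>
      <!-- fs + nfsexport + nfsclient resources for the linux share -->
    </service>
    <service name="share-windows" domain="prefer-node2" autostart="1" recovery="relocate">
      <ip address="192.168.1.11" monitor_link="1"/>
      <!-- fs + nfsexport (or smb) resources for the windows share -->
    </service>
  </rm>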

Any more thoughts you have would be appreciated, as my original plan of 
mdadm w/HA-LVM doesn't seem very feasible so far.

-Michael

Kaloyan Kovachev wrote, On 3/4/2010 8:52 AM:
> Hi,
>
> On Wed, 03 Mar 2010 11:16:07 -0800, Michael @ Professional Edge LLC wrote
>    
>> Hail Linux Cluster gurus,
>>
>> I have researched myself into a corner and am looking for advice.  I've
>> never been a "clustered storage guy", so I apologize for the potentially
>> naive set of questions.  ( I am savvy on most other aspects of networks,
>> hardware, OS's etc... but not storage systems).
>>
>> I've been handed ( 2 ) x86-64 boxes w/2 local disks each; and ( 2 )
>> FC-AL disk shelves w/14 disks each; and told to make a mini NAS/SAN (NFS
>> required, GFS optional).  If I can get this working reliably then there
>> appear to be about another ( 10 ) FC-AL shelves and a couple of Fiber
>> Switches laying around that will be handed to me.
>>
>> NFS filesystems will be mounted by several (less than 6) linux machines,
>> and a few (less than 4) windows machines [[ microsoft nfs client ]] -
>> all more or less doing web server type activities (so lots of reads from
>> a shared filesystem - log files not on NFS so no issue with high IO
>> writes).  I'm locked into NFS v3 for various reasons.  Optionally the
>> linux machines can be clustered and GFS'd instead - but I would still
>> need to come up with a solution for the windows machines - so a NAS
>> solution is still required even if I do GFS to the linux boxes.
>>
>> Active / Passive on the NFS is fine.
>>      
> Why not start NFS/Samba on both machines with only the IP floating between
> them then?
>
>    
>> * Each of the ( 2 ) x86-64 machines has a QLogic dual-port HBA, with
>> one fiber direct-connected to each shelf (no fiber switches yet - but
>> will have them later if I can make this all work); I've loaded RHEL 5.4
>> x86-64.
>>
>> * Each of the ( 2 ) RHEL 5.4 boxes uses the 2 local disks w/onboard
>> fake RAID1 = /dev/sda - basic install, so /boot plus LVM for the rest -
>> nothing special here (skipped mdadm basically for the simplicity of a
>> single /dev/sda)
>>
>> * Each of the ( 2 ) RHEL 5.4 boxes can see all the disks on both
>> shelves, and since I don't have fiber switches yet, at the moment there
>> is only 1 path to each disk; however, since I assume I will figure out
>> a way to make this all work, I have enabled multipath, and therefore I
>> have consistent names for all 28 disks.
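
For reference, the "consistent names" are just multipath aliases - 
something like the following in /etc/multipath.conf, with placeholder 
WWIDs:

  multipaths {
          multipath {
                  wwid   36005076801234567890123456789abc1
                  alias  shelf1disk1
          }
          multipath {
                  wwid   36005076801234567890123456789abc2
                  alias  shelf2disk1
          }
          # ...one stanza per disk, 28 in all
  }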
>>
>> Here's my dilemma.  How do I best add redundancy to the disks,
>> removing as many single points of failure as possible while preserving
>> as much diskspace as possible?
>>
>> My initial thought was to take "shelf1:disk1 and shelf2:disk1" and put
>> them into a software RAID1 with mdadm, then put the resulting /dev/md0
>> into LVM.  When I need more diskspace, I create "shelf1:disk2 and
>> shelf2:disk2" as another software RAID1, add the new /dev/md1 to the
>> LVM, and expand the FS.  This handles a couple of things in my mind:
>>
>> 1. Each shelf is really an FC-AL loop, so it's possible that a single
>> disk going nuts could flood the loop and all the disks in that shelf go
>> poof until the controller sorts itself out and/or the bad disk is
>> removed.
>>
>> 2. Efficiency - I retain 50% of the storage capacity after redundancy
>> if I can do the "shelf1:disk1 + shelf2:disk1" mirrors; plus all the
>> bandwidth used is spread across the 2 HBA fibers and nothing goes over
>> the TCP network.  Conversely, DRBD doesn't excite me much, as I would
>> then have to do RAID within each shelf (probably still with mdadm) and
>> add TCP (ethernet) based RAID1 between the nodes on top of that - and
>> when all is said and done I would only have 25% of the storage capacity
>> still available after redundancy.
>>
>> 3. It is easy to add more diskspace, as each new mirror (software
>> RAID1) can just be added to the existing LVM.
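
For reference, the commands I had in mind for that were roughly the 
following - untested, and the volume/device names are just placeholders:

  # mirror disk1 from each shelf, then hand the md device to LVM
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/shelf1disk1 /dev/mapper/shelf2disk1
  pvcreate /dev/md0
  vgcreate vg_share /dev/md0

  # later, to grow: build the next mirror pair and extend
  mdadm --create /dev/md1 --level=1 --raid-devices=2 \
        /dev/mapper/shelf1disk2 /dev/mapper/shelf2disk2
  pvcreate /dev/md1
  vgextend vg_share /dev/md1
  lvextend -L +500G /dev/vg_share/lv_nfs
  resize2fs /dev/vg_share/lv_nfs    # ext3 grows online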
>>
>>      
> You may create RAID1 (between the two shelves) over RAID6 (on the disks
> from the same shelf), so you will lose only 2 more disks per shelf,
> leaving about 40% of the storage space, but it will be more stable and
> faster. Or several RAID6 arrays with 2+2 disks from each shelf - again
> 50% of the storage space, but better performance with the same chance
> of data loss as with several RAID1s ... the resulting mdX devices you
> may add to LVM and use the logical volumes
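
Just to make sure I follow the layering you are suggesting - something 
like this (untested, reusing the multipath aliases from above)?

  # one RAID6 per shelf (14 disks, 2 of them parity)...
  mdadm --create /dev/md10 --level=6 --raid-devices=14 /dev/mapper/shelf1disk{1..14}
  mdadm --create /dev/md11 --level=6 --raid-devices=14 /dev/mapper/shelf2disk{1..14}
  # ...then a RAID1 across the two shelves, and LVM on top of the mirror
  mdadm --create /dev/md20 --level=1 --raid-devices=2 /dev/md10 /dev/md11
  pvcreate /dev/md20
  vgcreate vg_share /dev/md20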
>
>    
>>    From what I can find messing with Luci (Conga), though, I don't see
>> any resource scripts listed for mdadm (on RHEL 5.4) - so would my idea
>> even work?  (I have found some posts asking for an mdadm resource
>> script, but I've seen no responses.)  I also see that as of RHEL 5.3
>> LVM has mirrors that can be clustered - is this the right answer?  I've
>> done a ton of reading, but everything I've dug up so far either assumes
>> that the fiber devices are presented by a SAN that does the redundancy
>> before the RHEL box sees the disk... or... covers the case where fiber
>> is not in the picture at all and a bunch of locally attached hosts
>> present their storage over TCP (ethernet) - I've found almost nothing
>> on my situation...
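
For the record, the CLVM-mirror alternative I keep circling back to 
would look roughly like this - untested, the clvmd/cmirror services 
would have to be running on both nodes, and every name below is made up:

  lvmconf --enable-cluster          # sets locking_type = 3 for clvmd
  vgcreate -c y vg_clu /dev/mapper/shelf1disk1 /dev/mapper/shelf2disk1 \
           /dev/mapper/shelf1disk2
  # legs on different shelves, mirror log on a third disk
  lvcreate -m 1 -L 200G -n lv_nfs vg_clu \
           /dev/mapper/shelf1disk1 /dev/mapper/shelf2disk1 /dev/mapper/shelf1disk2
  mkfs.ext3 /dev/vg_clu/lv_nfs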
>>
>> So... here I am... :-)  I really just have 2 nodes, which can both see
>> a bunch of disks (JBOD), and I want to present them to multiple hosts
>> via NFS (required) or GFS (to the linux boxes only).
>>
>>      
> If the Windows and Linux data are on different volumes, it is better to
> leave the GFS partition(s) available only via iSCSI to the linux nodes
> participating in the cluster and not to mount it/them locally for the
> NFS/Samba shares; but if the data should be the same, you may even go
> Active/Active with GFS over iSCSI [over CLVM and/or] [over DRBD] over
> RAID and use NFS/Samba over GFS as a service in the cluster. It all
> depends on how the data will be used from the storage.
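
That makes sense.  If I do go the iSCSI route, am I right that the 
export from the storage nodes would look roughly like this with 
scsi-target-utils?  (The target name and logical volume below are just 
placeholders.)

  tgtadm --lld iscsi --op new --mode target --tid 1 \
         --targetname iqn.2010-03.com.example:gfs1
  tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
         --backing-store /dev/vg_share/lv_gfs
  tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL
  # (the linux cluster nodes would then log in with iscsiadm and put GFS
  # on top)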
>
>    
>> All ideas - are greatly appreciated!
>>
>> -Michael
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>      
>    



