[Linux-cluster] Storage Cluster Newbie Questions - any help with answers greatly appreciated!

Michael @ Professional Edge LLC m3 at professionaledgellc.com
Thu Mar 4 04:27:36 UTC 2010


Hello again Leo,

I'm not exactly sure I follow you on this one.

# ps -efaww | grep md0
root      2629    75  0 18:30 ?        00:00:02 [md0_raid1]

# ps -efaww | grep 75
root        75     1  0 18:29 ?        00:00:00 [kthread]

# dmesg | grep "md:" | less
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=28
md: bitmap version 4.39
md: Autodetecting RAID arrays.

So, what you are describing below seems to imply that you want me to write
a script that detects and kills the md kernel threads?  Would that even
work?  As shown below, I assume I have misunderstood, as this does not seem
possible:

root      2629    75  0 18:30 ?        00:00:02 [md0_raid1]
# kill 2629
# ps -efaww | grep md0
root      2629    75  0 18:30 ?        00:00:02 [md0_raid1]
# kill -9 2629
# ps -efaww | grep md0
root      2629    75  0 18:30 ?        00:00:02 [md0_raid1]
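
If the point is just to get the array inactive so the other node can take
over the disks, then I assume whatever the cluster script does has to go
through mdadm rather than kill - i.e. roughly something like:

# mdadm --stop /dev/md0
# mdadm --assemble /dev/md0 /dev/mapper/mpathX /dev/mapper/mpathY

...stop on the node giving up the service, assemble on the node taking it
over.  (The mpathX/mpathY names are just placeholders for my multipath
devices - I haven't actually tried this yet.)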


Wouldn't it be more appropriate to pull md out of the compiled kernel
entirely, build a custom kernel with it as a loadable module instead, and
then have the cluster script load the module?
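
Or, if I am overcomplicating this, do you simply mean that I should write
the cluster md resource script myself, along the lines of the rough,
untested sketch below?  (Device names are placeholders for my multipath
devices, and the real thing would obviously need the extensive error
checking you mention - plus a stop-only copy linked in very early at boot
as the S00 script.)

#!/bin/sh
# Hypothetical cluster resource script for one md mirror built from one
# disk in each FC-AL shelf.  Device names are illustrative only.
MD_DEV=/dev/md0
MEMBERS="/dev/mapper/mpathX /dev/mapper/mpathY"

case "$1" in
  start)
    # assemble the mirror on the node that owns the service
    mdadm --assemble $MD_DEV $MEMBERS || exit 1
    ;;
  stop)
    # stop the array so the other node can assemble it safely
    mdadm --stop $MD_DEV || exit 1
    ;;
  status)
    # exits non-zero if the array is degraded, failed, or missing
    mdadm --detail --test $MD_DEV > /dev/null 2>&1 || exit 1
    ;;
  *)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac
exit 0

Does that look like roughly the right direction?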

-Michael

Leo Pleiman wrote, On 3/3/2010 1:09 PM:
> If you remove it from the initrd then it won't be available to the cluster software. At boot time both nodes will try to start the md devices. You'll need to add an init script to stop the md devices very early in the boot process, an S00 script would be appropriate. That will ensure that when your cluster resource script starts the md devices it will have exclusive use of the FCAL drives. You'll need to make sure your cluster md resource script has extensive error checking. When the service is migrated between nodes the md device must be stopped before allowing the second node to start the md devices.
>
> Leo J Pleiman
> Senior Consultant
> Red Hat Consulting Services
> 410.688.3873
>
> "Red Hat Ranked as #1 Software Vendor for Fifth Time in CIO Insight Study"
>
> ----- Original Message -----
> From: "Michael @ Professional Edge LLC"<m3 at professionaledgellc.com>
> To: "Leo Pleiman"<lpleiman at redhat.com>
> Cc: "linux clustering"<linux-cluster at redhat.com>
> Sent: Wednesday, March 3, 2010 3:37:09 PM GMT -05:00 US/Canada Eastern
> Subject: Re: [Linux-cluster] Storage Cluster Newbie Questions - any help with answers greatly appreciated!
>
> Leo,
>
> Thanks for the quick response.  It's nice to know my initial thought was
> close to what you are recommending as well.
>
> As for the details (disable MD startup): wouldn't this mean I'd need to
> rebuild with something like "mkinitrd --omit-raid-modules"?  But if I did
> that, wouldn't it prevent the onboard (fake raid1) /dev/sda from loading?
> Or did you just mean to make sure that nothing is in /etc/fstab?
>
> -Michael
>
> Leo Pleiman wrote, On 3/3/2010 11:45 AM:
>    
>> One solution would be to build the two machines as a two node cluster. Qdisk is normally recommended but two node clusters are supported without it. Use the cluster resource manager to control md management of the FCAL drives. You'll need to disable md startup and create a custom script to allow the cluster resource manager to start and stop the md devices. I would build three cluster resources: md disk, nfs, and samba, making nfs and samba dependent on md disk. You'll need a failover domain consisting of both machines. When you're done you'll have one machine doing nothing but waiting for the services on the other machine to fail over. Personally, I would build each FCAL shelf as a RAID5 or RAID6 array, presenting md0 and md1 to LVM. I wouldn't use RAID1 unless you have very little confidence in your hardware. GFS can't be used with software raid, and since you won't be using GFS you won't need a fence device.
>>
>> Leo J Pleiman
>> Senior Consultant
>> Red Hat Consulting Services
>> 410.688.3873
>>
>> "Red Hat Ranked as #1 Software Vendor for Fifth Time in CIO Insight Study"
>>
>> ----- Original Message -----
>> From: "Michael @ Professional Edge LLC"<m3 at professionaledgellc.com>
>> To: linux-cluster at redhat.com
>> Sent: Wednesday, March 3, 2010 2:16:07 PM GMT -05:00 US/Canada Eastern
>> Subject: [Linux-cluster] Storage Cluster Newbie Questions - any help with answers greatly appreciated!
>>
>> Hail Linux Cluster gurus,
>>
>> I have researched myself into a corner and am looking for advice.  I've
>> never been a "clustered storage guy", so I apologize for the potentially
>> naive set of questions.  ( I am savvy on most other aspects of networks,
>> hardware, OS's etc... but not storage systems).
>>
>> I've been handed ( 2 ) x86-64 boxes w/2 local disks each; and ( 2 )
>> FC-AL disk shelves w/14 disks each; and told to make a mini NAS/SAN (NFS
>> required, GFS optional).  If I can get this working reliably then there
>> appear to be about another ( 10 ) FC-AL shelves and a couple of Fiber
>> Switches laying around that will be handed to me.
>>
>> NFS filesystems will be mounted by several (less than 6) linux machines,
>> and a few (less than 4) windows machines [[ microsoft nfs client ]] -
>> all more or less doing web server type activities (so lots of reads from
>> a shared filesystem - log files not on NFS so no issue with high IO
>> writes).  I'm locked into NFS v3 for various reasons.  Optionally the
>> linux machines can be clustered and GFS'd instead - but I would still
>> need to come up with a solution for the windows machines - so a NAS
>> solution is still required even if I do GFS to the linux boxes.
>>
>> Active / Passive on the NFS is fine.
>>
>> * Each of the ( 2 ) x86-64 machines has a Qlogic dual HBA, with one
>> fiber direct-connected to each shelf (no fiber switches yet, but I will
>> have them later if I can make this all work); I've loaded RHEL 5.4 x86-64.
>>
>> * Each of the ( 2 ) RHEL 5.4 boxes - used the 2 local disks w/onboard
>> fake raid1 = /dev/sda - basic install so /boot and LVM for the rest -
>> nothing special here (didn't do mdadm basically for simplicity of /dev/sda)
>>
>> * Each of the ( 2 ) RHEL 5.4 boxes can see all the disks on both shelves
>> - and since I don't have Fiber Switches yet - at the moment there is
>> only 1 path to each disk; however as I assume I will figure out a method
>> to make this work - I have enabled multipath - and therefore I have
>> consistent names to 28 disks.
>>
>> Here's my dilemma.  How do I best add redundancy to the disks, removing
>> as many single points of failure as I can while preserving as much
>> diskspace as possible?
>>
>> My initial thought was to take "shelf1:disk1 and shelf2:disk1" and put
>> them into a software raid1 with mdadm, then put the resulting /dev/md0
>> into LVM.  When I need more diskspace, I then create "shelf1:disk2 and
>> shelf2:disk2" as another software raid1 and add the new "/dev/md1" into
>> the LVM and expand the FS.  This handles a couple of things in my mind:
>>
>> 1. Each shelf is really a FC-AL so it's possible that a single disk
>> going nuts could flood the FC-AL and all the disks in that shelf go poof
>> until the controller can figure itself out and/or the bad disk is removed.
>>
>> 2. Efficient: I am retaining 50% of storage capacity after redundancy if
>> I can do the "shelf1:disk1 + shelf2:disk2" mirrors; plus all bandwidth
>> used is spread across the 2 HBA fibers and nothing goes over the TCP
>> network.  Conversely, DRBD doesn't excite me much, as I then have to do
>> both raid in the shelf (probably still with mdadm) and then add TCP
>> (ethernet) based RAID1 between the nodes; when all is said and done,
>> I only have 25% of storage capacity still available after redundancy.
>>
>> 3. It is easy to add more diskspace, as each new mirror (software raid1)
>> can just be added to an existing LVM.
>>
>>    From what I can find messing with Luci (Conga), though, I don't see
>> any resource scripts listed for "mdadm" (on RHEL 5.4) - so would my idea
>> even work (I have found some posts asking for an mdadm resource script
>> but have seen no response)?  I also see that as of RHEL 5.3, LVM has
>> mirrors that can be clustered - is this the right answer?  I've done a
>> ton of reading, but everything I've dug up so far assumes either that the
>> fiber devices are being presented by a SAN that is doing the redundancy
>> before the RHEL box sees the disk, or that fiber is not in the picture
>> and a bunch of locally attached hosts are presenting storage onto the TCP
>> (ethernet) network - but I've found hardly anything on my situation...
>>
>> So... here I am... :-)  I really just have 2 nodes that can both see a
>> bunch of disks (JBOD), and I want to present them to multiple hosts via
>> NFS (required) or GFS (to linux boxes only).
>>
>> All ideas - are greatly appreciated!
>>
>> -Michael
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>>      





