[Linux-cluster] Storage Cluster Newbie Questions - any help with answers greatly appreciated!

Andreas Pfaffeneder apfaffeneder at pfaffeneder.org
Mon Mar 15 14:52:59 UTC 2010


Hi Michael,

A way to prevent both systems from thinking they're responsible for the
FC devices is to use LVM for building a host-based mirror, together with
LVM volume_list filtering and tags:

- Set up your RHEL cluster
- Modify /etc/lvm/lvm.conf on all systems of the cluster: volume_list =
["local-vg-if-used-to-store-/","@local_hostname-as-in-cluster.conf"]
- Rebuild the initrd, reboot
--> LVM now only activates local LVs and LVs carrying the local hostname as
an LVM tag
- Create a mirrored LV with lvcreate --addtag or lvchange --addtag, so the
LV will be active on only one system at a time (see the sketch below)
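
A minimal sketch of those two steps (VG name and hostname are placeholders):

    # /etc/lvm/lvm.conf on every node - allow the local root VG plus any LV
    # tagged with this node's name as it appears in cluster.conf:
    volume_list = [ "vg_local", "@node1.example.com" ]

    # rebuild the initrd so the filter is in effect from early boot:
    mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)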

RH-Cluster supports floating LVs on its own; just add the LV + FS as a
resource.
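
For example, something along these lines in cluster.conf (service name,
devices, mount point and fs type are placeholders):

    <service name="nfs_data" autostart="1">
      <lvm name="ha_lvm" vg_name="vg_san" lv_name="lv_data"/>
      <fs name="data_fs" device="/dev/vg_san/lv_data"
          mountpoint="/export/data" fstype="ext3" force_unmount="1"/>
      <ip address="192.168.1.50" monitor_link="1"/>
    </service>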

If you're not using a LUN for mirror-logging, you're moving the LVs from
one system to the other at the cost of one full rebuild of the host-based
mirror.
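
Roughly (sizes, VG/LV names and multipath devices are just examples):

    # 2-leg mirror with one PV from each shelf, tagged for node1; with
    # --mirrorlog core the mirror log lives in memory, hence the full
    # resync whenever the mirror is brought up on the other node
    lvcreate -L 200G -m 1 --mirrorlog core -n lv_data vg_san \
        /dev/mapper/mpath0 /dev/mapper/mpath14
    lvchange --addtag node1.example.com /dev/vg_san/lv_data

    # moving it to the other node by hand: deactivate, retag, activate
    lvchange -an /dev/vg_san/lv_data
    lvchange --deltag node1.example.com /dev/vg_san/lv_data
    lvchange --addtag node2.example.com /dev/vg_san/lv_data
    lvchange -ay /dev/vg_san/lv_data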

Regards
Andreas

On Sun, 14 Mar 2010 22:28:20 -0700, "Michael @ Professional Edge LLC"
<m3 at professionaledgellc.com> wrote:
> Kaloyan,
> 
> I agree - disabling the qla2xxx driver (Qlogic HBA) from starting at
> boot would be the simple method of handling the issue.  Then I just put
> all the commands to load the driver, multipath, mdadm, etc... inside
> cluster scripts.
> 
> Amusingly, it seems I am missing something very basic, as I can't seem
> to figure out how to keep the qla2xxx driver from loading.
> 
> Do you happen to know the syntax to make the qla2xxx driver not load at
> boot automatically?
> 
> I've been messing with /etc/modprobe.conf and mkinitrd, but no
> combination has resulted in the qla2xxx driver being properly disabled
> during boot.  I did accomplish making one of my nodes unable to mount its
> root partition - but I don't consider that success. :-)
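> 
> Is something like the following even the right direction (the alias name
> is just my guess)?
> 
>     # /etc/modprobe.conf: comment out any "alias scsi_hostadapterN qla2xxx"
>     # line, then keep the module from being loaded automatically:
>     blacklist qla2xxx
>     install qla2xxx /bin/true
> 
>     # rebuild the initrd so it no longer pulls the driver in:
>     mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
> 
>     # later, from the cluster script, load it explicitly:
>     modprobe --ignore-install qla2xxx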
> 
> 
> As for your 2nd idea: I have seen folks doing something similar when the
> disks are local to the node.  But in my case all nodes can already see
> all LUNs, so I don't really have any need to do an iSCSI export -
> appreciate the thought though.
> 
> -Michael
> 
> 
> Kaloyan Kovachev wrote, On 3/4/2010 10:28 AM:
>> On Thu, 04 Mar 2010 09:26:35 -0800, Michael @ Professional Edge LLC wrote:
>>
>>> Hello Kaloyan,
>>>
>>> Thank you for the thoughts.
>>>
>>> You are correct: when I said "Active / Passive" I simply meant that I
>>> had no need for "Active / Active" - and a floating IP on the NFS share
>>> would be exactly what I had in mind.
>>>
>>> The software RAID - of any type: RAID1, 5, 6, etc. - is the issue.  From
>>> what I have read, mdadm is not cluster aware... and since all disks are
>>> seen by all RHEL nodes, some method to keep the kernel from detecting
>>> and attempting to assemble all the available software RAIDs - as Leo
>>> mentioned - is a major problem.  This is why I was asking if perhaps
>>> CLVM w/mirroring would be a better method.  Although since it was just
>>> introduced in RHEL 5.3, I am a bit leery.
>>>
>>>
>> I am not familiar with FC, so I may be completely wrong here, but if you
>> do not start multipath or load your HBA drivers at boot, how will the
>> software RAID on the FC disks start at all?
>>
>> Even if it does start, you may still issue 'mdadm --stop /dev/mdX' in
>> S00, as Leo suggested, and assemble the array again as a cluster service
>> later.
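>>
>> roughly (device names are just examples):
>>
>>     # /etc/rc.d/rc3.d/S00stopmd - stop anything the initscripts assembled
>>     mdadm --stop /dev/md0
>>
>>     # and in the cluster service start script, put it back together:
>>     mdadm --assemble /dev/md0 /dev/mapper/mpath0 /dev/mapper/mpath14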
>>
>>
>>> Sorry for being confusing - yes, the linux machines will have a
>>> completely different filesystem share than the windows machines.  My
>>> original thought was I would do "node#1 primary nfs share (floating
>>> ip#1) to linux machines w/node#2 backup" - and then "node#2 primary nfs
>>> or samba share (floating ip#2) to windows machines w/node#1 backup".
>>>
>>> Any more thoughts you have would be appreciated, as my original plan
>>> with mdadm w/HA-LVM doesn't seem very feasible so far.
>>>
>>>
>> Then there are two services, each with its own RAID array and IP, but
>> basically the same.
>>
>> Another idea... I am not using it in production, but I had good results
>> (in testing) with a (small) software RAID5 array across 3 nodes: a local
>> device on each node, exported via iSCSI, with software RAID5 built over
>> the imported ones, which is then used by LVM.  Weird, but it worked, and
>> the only problem was that on every reboot of any node the RAID was
>> rebuilt - which won't happen in your case, as you will see all the disks
>> in sync (after the initial sync, done on only one of them)... you may
>> give it a try.
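>>
>> the test setup was roughly along these lines (from memory - target and
>> device names are just examples):
>>
>>     # on each node: export the local device via iSCSI (scsi-target-utils)
>>     tgtadm --lld iscsi --op new --mode target --tid 1 \
>>         -T iqn.2010-03.local:node1.disk
>>     tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
>>     tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
>>
>>     # import the other nodes' exports, build RAID5 over them, LVM on top
>>     iscsiadm -m discovery -t sendtargets -p node2
>>     iscsiadm -m node -p node2 --login
>>     mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdc /dev/sdd /dev/sde
>>     pvcreate /dev/md0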
>>
>>
>>> -Michael
>>>
>>> Kaloyan Kovachev wrote, On 3/4/2010 8:52 AM:
>>>
>>>> Hi,
>>>>
>>>> On Wed, 03 Mar 2010 11:16:07 -0800, Michael @ Professional Edge LLC wrote:
>>>>
>>>>
>>>>> Hail Linux Cluster gurus,
>>>>>
>>>>> I have researched myself into a corner and am looking for advice.  I've
>>>>> never been a "clustered storage guy", so I apologize for the potentially
>>>>> naive set of questions.  (I am savvy on most other aspects of networks,
>>>>> hardware, OS's etc... but not storage systems.)
>>>>>
>>>>> I've been handed ( 2 ) x86-64 boxes w/2 local disks each; and ( 2 )
>>>>> FC-AL disk shelves w/14 disks each; and told to make a mini NAS/SAN (NFS
>>>>> required, GFS optional).  If I can get this working reliably then there
>>>>> appear to be about another ( 10 ) FC-AL shelves and a couple of Fiber
>>>>> Switches laying around that will be handed to me.
>>>>>
>>>>> NFS filesystems will be mounted by several (less than 6) linux machines,
>>>>> and a few (less than 4) windows machines [[ microsoft nfs client ]] -
>>>>> all more or less doing web server type activities (so lots of reads from
>>>>> a shared filesystem - log files not on NFS so no issue with high IO
>>>>> writes).  I'm locked into NFS v3 for various reasons.  Optionally the
>>>>> linux machines can be clustered and GFS'd instead - but I would still
>>>>> need to come up with a solution for the windows machines - so a NAS
>>>>> solution is still required even if I do GFS to the linux boxes.
>>>>>
>>>>> Active / Passive on the NFS is fine.
>>>>>
>>>>>
>>>> Why not start NFS/Samba on both machines with only the IP floating
>>>> between them, then?
>>>>
>>>>
>>>>
>>>>> * Each of the ( 2 ) x86-64 machines has a Qlogic dual HBA, 1 fiber
>>>>> direct-connected to each shelf (no fiber switches yet - but will have
>>>>> them later if I can make this all work); I've loaded RHEL 5.4 x86-64.
>>>>>
>>>>> * Each of the ( 2 ) RHEL 5.4 boxes used the 2 local disks w/onboard
>>>>> fake raid1 = /dev/sda - basic install, so /boot and LVM for the rest -
>>>>> nothing special here (didn't do mdadm, basically for simplicity of
>>>>> /dev/sda).
>>>>>
>>>>> * Each of the ( 2 ) RHEL 5.4 boxes can see all the disks on both shelves
>>>>> - and since I don't have Fiber Switches yet, at the moment there is only
>>>>> 1 path to each disk; however, as I assume I will figure out a method to
>>>>> make this work, I have enabled multipath - and therefore I have
>>>>> consistent names for the 28 disks.
>>>>>
>>>>> Here's my dilemma.  How do I best add redundancy to the disks, removing
>>>>> as many single points of failure, and preserving as much diskspace as
>>>>> possible?
>>>>>
>>>>> My initial thought was to take "shelf1:disk1 and shelf2:disk1" and put
>>>>> them into a software raid1 with mdadm, then put the resulting /dev/md0
>>>>> into an LVM.  When I need more diskspace, I just create "shelf1:disk2
>>>>> and shelf2:disk2" as another software raid1, then add the new "/dev/md1"
>>>>> into the LVM and expand the FS.  This handles a couple of things in my
>>>>> mind (rough commands sketched after the list):
>>>>>
>>>>> 1. Each shelf is really a FC-AL, so it's possible that a single disk
>>>>> going nuts could flood the FC-AL and all the disks in that shelf go poof
>>>>> until the controller can figure itself out and/or the bad disk is
>>>>> removed.
>>>>>
>>>>> 2. Efficient: I am retaining 50% storage capacity after redundancy if I
>>>>> can do the "shelf1:diskN + shelf2:diskN" mirrors; plus all bandwidth
>>>>> used is spread across the 2 HBA fibers and nothing goes over the TCP
>>>>> network.  Conversely, DRBD doesn't excite me much, as I then have to do
>>>>> both RAID in the shelf (probably still with mdadm) and add TCP
>>>>> (ethernet) based RAID1 between the nodes - and when all is said and done
>>>>> I only have 25% of storage capacity still available after redundancy.
>>>>>
>>>>> 3. Easy to add more diskspace, as each new mirror (software raid1) can
>>>>> just be added to the existing LVM.
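>>>>>
>>>>> Rough commands I have in mind (multipath names and sizes are just
>>>>> examples):
>>>>>
>>>>>     # mirror disk1 of shelf1 against disk1 of shelf2, then LVM on top
>>>>>     mdadm --create /dev/md0 --level=1 --raid-devices=2 \
>>>>>         /dev/mapper/mpath0 /dev/mapper/mpath14
>>>>>     pvcreate /dev/md0
>>>>>     vgcreate vg_data /dev/md0
>>>>>     lvcreate -L 100G -n lv_export vg_data
>>>>>
>>>>>     # growing later: next cross-shelf mirror, then extend the VG/LV/FS
>>>>>     mdadm --create /dev/md1 --level=1 --raid-devices=2 \
>>>>>         /dev/mapper/mpath1 /dev/mapper/mpath15
>>>>>     pvcreate /dev/md1
>>>>>     vgextend vg_data /dev/md1
>>>>>     lvextend -L +100G /dev/vg_data/lv_export
>>>>>     resize2fs /dev/vg_data/lv_export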
>>>>>
>>>>>
>>>>>
>>>> You may create RAID1 (between the two shelves) over RAID6 (on the disks
>>>> from the same shelf), so you will lose only 2 more disks per shelf, or
>>>> about 40% storage space left, but more stable and faster.  Or several
>>>> RAID6 arrays with 2+2 disks from each shelf - again 50% storage space,
>>>> but better performance, with the same chance of data loss as with
>>>> several RAID1s... the resulting mdX you may add to LVM and use the
>>>> logical volumes (roughly as sketched below).
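>>>>
>>>> e.g. (the device names are just placeholders for your multipath aliases):
>>>>
>>>>     # RAID6 inside each shelf (12 of 14 disks usable per shelf)...
>>>>     mdadm --create /dev/md1 --level=6 --raid-devices=14 /dev/mapper/shelf1_d{1..14}
>>>>     mdadm --create /dev/md2 --level=6 --raid-devices=14 /dev/mapper/shelf2_d{1..14}
>>>>     # ...then RAID1 between the shelves and LVM on top of the mirror
>>>>     mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/md1 /dev/md2
>>>>     pvcreate /dev/md3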
>>>>
>>>>
>>>>
>>>>> From what I can find messing with Luci (Conga), though, I don't see any
>>>>> resource scripts listed for "mdadm" (on RHEL 5.4) - so would my idea
>>>>> even work (I have found some posts asking for an mdadm resource script,
>>>>> but I've seen no response)?  I also see that as of RHEL 5.3, LVM has
>>>>> mirrors that can be clustered - is this the right answer?  I've done a
>>>>> ton of reading, but everything I've dug up so far assumes that the fiber
>>>>> devices are being presented by a SAN that is doing the redundancy before
>>>>> the RHEL box sees the disk... or... there are a ton of examples where
>>>>> fiber is not in the picture and there are a bunch of locally attached
>>>>> hosts presenting storage onto the TCP (ethernet) network - but I've
>>>>> found nearly nothing on my situation...
>>>>>
>>>>> So... here I am... :-)  I really just have 2 nodes - who can both see a
>>>>> bunch of disks (JBOD) - and I want to present them to multiple hosts via
>>>>> NFS (required) or GFS (to linux boxes only).
>>>>>
>>>>>
>>>>>
>>>> If the Windows and Linux data are different volumes, it is better to
>>>> leave the GFS partition(s) available only via iSCSI to the linux nodes
>>>> participating in the cluster and not to mount it/them locally for the
>>>> NFS/Samba shares; but if the data should be the same, you may even go
>>>> Active/Active with GFS over iSCSI [over CLVM and/or] [over DRBD] over
>>>> RAID, and use NFS/Samba over GFS as a service in the cluster (roughly as
>>>> in the sketch below).  It all depends on how the data will be used from
>>>> the storage.
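>>>>
>>>> e.g. a service stanza in cluster.conf roughly like this (names and
>>>> addresses are made up):
>>>>
>>>>     <service name="nfs_gfs" autostart="1">
>>>>       <ip address="192.168.1.50" monitor_link="1"/>
>>>>       <clusterfs name="gfs_data" device="/dev/vg_san/lv_gfs"
>>>>                  mountpoint="/export/data" fstype="gfs">
>>>>         <nfsexport name="exports">
>>>>           <nfsclient name="webservers" target="192.168.1.0/24"
>>>>                      options="rw,no_root_squash"/>
>>>>         </nfsexport>
>>>>       </clusterfs>
>>>>     </service>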
>>>>
>>>>
>>>>
>>>>> All ideas - are greatly appreciated!
>>>>>
>>>>> -Michael
>>>>>
>>>>>
>>>>>
>>>>
>>
> 
> 



