[Linux-cluster] Storage Cluster Newbie Questions - any help with answers greatly appreciated!

Michael @ Professional Edge LLC m3 at professionaledgellc.com
Mon Mar 15 22:04:57 UTC 2010


OK... I don't feel quite as silly now... as I had roasted one of my test 
machines (it was unable to mount /) and had to do a full rebuild from scratch.

Well... it seems I only had 2 of the 3 changes in place.

For anyone who ever needs to stop their QLogic fibre card driver from 
auto-loading in the future - make all 3 of the following changes.

1. Modify /etc/modprobe.conf and comment out the alias line so it reads 
"#alias scsi_hostadapter2 qla2xxx"
2. Rebuild the initrd - I use "mkinitrd -f /boot/initrd-$(uname 
-r).img $(uname -r)"
3. Add "blacklist qla2xxx" to /etc/modprobe.d/blacklist

-Michael

Michael @ Professional Edge LLC wrote, On 3/15/2010 11:05 AM:
> DOH... ok I feel stupid... :-)
>
> Well, 'blacklist scsi_transport_fc' didn't work... but 'blacklist 
> qla2xxx' works fine.
>
> Amazing how all the docs I found said to use "modprobe -r qla2xxx" kind 
> of stuff... and all I really had to do was add the silly thing to the 
> blacklist... :-)
>
> OK... at least I can proceed with my testing again.  Thank You!
>
> -Michael
>
> Kaloyan Kovachev wrote, On 3/15/2010 1:41 AM:
>> Hello,
>>
>> On Sun, 14 Mar 2010 22:28:20 -0700, Michael @ Professional Edge LLC 
>> wrote
>>> Kaloyan,
>>>
>>> I agree - disabling the qla2xxx driver (Qlogic HBA) from starting at
>>> boot would be the simple method of handling the issue.  Then I just put
>>> all the commands to load the driver, multipath, mdadm, etc... inside
>>> cluster scripts.
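>>>
>>> (Roughly, the start side of such a cluster script would just be the
>>> following - only a sketch, and "vg_nas" is a made-up VG name:)
>>>
>>>     modprobe qla2xxx          # bring up the HBA
>>>     multipath                 # build the multipath maps
>>>     mdadm --assemble --scan   # assemble the shared raid arrays
>>>     vgchange -ay vg_nas       # activate the LVM volume group on top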
>>>
>>> Amusingly it seems I am missing something very basic - as I can't seem
>>> to figure out how to not load the qla2xxx driver.
>>>
>>> Do you happen to know the syntax to make the qla2xxx driver not load at
>>> boot automatically?
>>>
>>> I've been messing with /etc/modprobe.conf - and mkinitrd - but no
>>> combination has resulted in qla2xxx being properly disabled during
>>> boot - I did accomplish making one of my nodes unable to mount its
>>> root partition - but I don't consider that success. :-)
>>>
>> Is 'blacklist scsi_transport_fc' not enough? What other modules are 
>> loaded? If you blacklist the one that most others depend on, they 
>> should not load.
>>
>>> As for your 2nd idea; I have seen folks doing something similar when
>>> the disks are local to the node.  But in my case all nodes can
>>> already see all LUNs - so I don't really have any need to do an
>>> iSCSI export - appreciate the thought though.
>>>
>> The idea was actually not to export them, but to run mdadm 
>> simultaneously on both nodes. But the problem is when just one of the 
>> nodes loses its link to just one of the arrays.
>>
>>> -Michael
>>>
>>> Kaloyan Kovachev wrote, On 3/4/2010 10:28 AM:
>>>> On Thu, 04 Mar 2010 09:26:35 -0800, Michael @ Professional Edge LLC 
>>>> wrote
>>>>
>>>>> Hello Kaloyan,
>>>>>
>>>>> Thank you for the thoughts.
>>>>>
>>>>> You are correct - when I said "Active / Passive" I simply meant 
>>>>> that I have no need for "Active / Active" - and a floating IP on 
>>>>> the NFS share would be exactly what I had in mind.
>>>>>
>>>>> The software raid - of any type, raid1,5,6 etc... - is the issue.  
>>>>> From what I have read, mdadm is not cluster aware... and since all 
>>>>> disks are seen by all RHEL nodes - as Leo mentioned, some method to 
>>>>> keep the kernel from detecting and attempting to assemble all the 
>>>>> available software raids is a major problem.  This is why I was 
>>>>> asking if perhaps CLVM w/mirroring would be a better method.  
>>>>> Although since it was just introduced in RHEL 5.3 - I am a bit leery.
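>>>>>
>>>>> (If CLVM mirroring does turn out to be the way to go, I assume it
>>>>> would look roughly like this - untested on my side, the
>>>>> /dev/mapper names are just placeholders, and it needs clvmd and
>>>>> cmirror running on the nodes:)
>>>>>
>>>>>     pvcreate /dev/mapper/shelf1disk1 /dev/mapper/shelf2disk1
>>>>>     vgcreate -c y vg_nas /dev/mapper/shelf1disk1 /dev/mapper/shelf2disk1
>>>>>     # -m 1 = one mirror copy; core log avoids needing a third disk
>>>>>     lvcreate -m 1 --mirrorlog core -L 100G -n lv_share vg_nas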
>>>>>
>>>>>
>>>> I am not familiar with FC, so maybe completely wrong here, but if 
>>>> you do not start multipath and load your HBA drivers on boot, how 
>>>> will the software raid on the FC disks start at all?
>>>>
>>>> Even if it starts, you may still issue 'mdadm --stop /dev/mdX' in an 
>>>> S00 init script as suggested by Leo and assemble it again as a 
>>>> cluster service later.
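>>>>
>>>> Something like this (just a rough sketch, not tested here - the md
>>>> and multipath device names are only examples):
>>>>
>>>>     # /etc/rc3.d/S00stopmd - tear down anything auto-assembled at boot
>>>>     mdadm --stop /dev/md0
>>>>
>>>>     # later, from the cluster service start script, bring it back
>>>>     mdadm --assemble /dev/md0 /dev/mapper/mpath0 /dev/mapper/mpath1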
>>>>
>>>>
>>>>> Sorry for being confusing - yes, the linux machines will have a
>>>>> completely different filesystem share than the windows machines.  My
>>>>> original thought was I would do "node#1 primary nfs share (floating
>>>>> ip#1) to linux machines w/node#2 backup" - and then "node#2 primary 
>>>>> nfs or samba share (floating ip#2) to windows machines w/node#1 
>>>>> backup".
>>>>>
>>>>> Any more thoughts you have would be appreciated... as my original 
>>>>> plan with MDADM w/HA-LVM so far doesn't seem very feasible.
>>>>>
>>>>>
>>>> Then there are two services, each with its own raid array and IP, 
>>>> but it is basically the same setup.
>>>>
>>>> Another idea ... I am not using it in production, but I had good 
>>>> results (testing) with a (small) software raid5 array across 3 
>>>> nodes ... a local device on each node exported via iSCSI, software 
>>>> RAID5 over the imported ones, and LVM on top of that. Weird, but it 
>>>> worked, and the only problem was that on every reboot of any node 
>>>> the raid was rebuilt - which won't happen in your case, as you will 
>>>> see all the disks in sync (after the initial sync done on only one 
>>>> of them) ... you may give it a try.
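>>>>
>>>> Roughly what that test looked like, from memory (sketch only - the
>>>> IQN, IP and device names are examples):
>>>>
>>>>     # on each node: export the local disk with scsi-target-utils
>>>>     tgtadm --lld iscsi --op new --mode target --tid 1 \
>>>>         -T iqn.2010-03.test:node1-disk
>>>>     tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
>>>>     tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
>>>>
>>>>     # on the assembling node: import the other nodes' exports
>>>>     iscsiadm -m discovery -t sendtargets -p 192.168.0.2
>>>>     iscsiadm -m node -p 192.168.0.2 -l
>>>>
>>>>     # raid5 over the imported disks, then hand it to LVM
>>>>     mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdc /dev/sdd /dev/sde
>>>>     pvcreate /dev/md0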
>>>>
>>>>
>>>>> -Michael
>>>>>
>>>>> Kaloyan Kovachev wrote, On 3/4/2010 8:52 AM:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, 03 Mar 2010 11:16:07 -0800, Michael @ Professional Edge 
>>>>>> LLC wrote
>>>>>>
>>>>>>
>>>>>>> Hail Linux Cluster gurus,
>>>>>>>
>>>>>>> I have researched myself into a corner and am looking for 
>>>>>>> advice.  I've
>>>>>>> never been a "clustered storage guy", so I apologize for the 
>>>>>>> potentially
>>>>>>> naive set of questions.  ( I am savvy on most other aspects of 
>>>>>>> networks,
>>>>>>> hardware, OS's etc... but not storage systems).
>>>>>>>
>>>>>>> I've been handed ( 2 ) x86-64 boxes w/2 local disks each; and ( 2 )
>>>>>>> FC-AL disk shelves w/14 disks each; and told to make a mini 
>>>>>>> NAS/SAN (NFS
>>>>>>> required, GFS optional).  If I can get this working reliably 
>>>>>>> then there
>>>>>>> appear to be about another ( 10 ) FC-AL shelves and a couple of 
>>>>>>> Fiber
>>>>>>> Switches lying around that will be handed to me.
>>>>>>>
>>>>>>> NFS filesystems will be mounted by several (less than 6) linux 
>>>>>>> machines,
>>>>>>> and a few (less than 4) windows machines [[ microsoft nfs client 
>>>>>>> ]] -
>>>>>>> all more or less doing web server type activities (so lots of 
>>>>>>> reads from
>>>>>>> a shared filesystem - log files not on NFS so no issue with high IO
>>>>>>> writes).  I'm locked into NFS v3 for various reasons.  
>>>>>>> Optionally the
>>>>>>> linux machines can be clustered and GFS'd instead - but I would 
>>>>>>> still
>>>>>>> need to come up with a solution for the windows machines - so a NAS
>>>>>>> solution is still required even if I do GFS to the linux boxes.
>>>>>>>
>>>>>>> Active / Passive on the NFS is fine.
>>>>>>>
>>>>>>>
>>>>>> Why not start NFS/Samba on both machines with only the IP 
>>>>>> floating between
>>>>>> them then?
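>>>>>>
>>>>>> Just to illustrate the idea - this is all the "failover" really is,
>>>>>> shown by hand (rgmanager does it for you; the address and interface
>>>>>> are only examples):
>>>>>>
>>>>>>     # NFS/Samba already run on both nodes; on the node taking over:
>>>>>>     ip addr add 192.168.1.100/24 dev eth0
>>>>>>     # and on the node giving up the service:
>>>>>>     ip addr del 192.168.1.100/24 dev eth0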
>>>>>>
>>>>>>
>>>>>>
>>>>>>> * Each of the ( 2 ) x86-64 machines has a Qlogic dual-port HBA, 
>>>>>>> 1 fiber direct-connected to each shelf (no fiber switches yet - 
>>>>>>> but will have them later if I can make this all work); I've 
>>>>>>> loaded RHEL 5.4 x86-64.
>>>>>>>
>>>>>>> * Each of the ( 2 ) RHEL 5.4 boxes - used the 2 local disks 
>>>>>>> w/onboard
>>>>>>> fake raid1 = /dev/sda - basic install so /boot and LVM for the 
>>>>>>> rest -
>>>>>>> nothing special here (didn't do mdadm basically for simplicity 
>>>>>>> of /dev/sda)
>>>>>>>
>>>>>>> * Each of the ( 2 ) RHEL 5.4 boxes can see all the disks on both 
>>>>>>> shelves - and since I don't have Fiber Switches yet, at the 
>>>>>>> moment there is only 1 path to each disk; however, as I assume I 
>>>>>>> will figure out a method to make this work, I have enabled 
>>>>>>> multipath - and therefore I have consistent names for all 28 
>>>>>>> disks.
>>>>>>>
>>>>>>> Here's my dilemma.  How do I best add Redundancy to the Disks, 
>>>>>>> removing
>>>>>>> as many single points of failure, and preserving as much 
>>>>>>> diskspace as
>>>>>>> possible?
>>>>>>>
>>>>>>> My initial thought was to take "shelf1:disk1 and shelf2:disk1" 
>>>>>>> and put them into a software raid1 with mdadm, then put the 
>>>>>>> resulting /dev/md0 into a LVM.  When I need more diskspace, I 
>>>>>>> just create "shelf1:disk2 and shelf2:disk2" as another software 
>>>>>>> raid1, add the new "/dev/md1" into the LVM, and expand the FS. 
>>>>>>> This handles a couple things in my mind:
>>>>>>>
>>>>>>> 1. Each shelf is really a FC-AL so it's possible that a single disk
>>>>>>> going nuts could flood the FC-AL and all the disks in that shelf 
>>>>>>> go poof
>>>>>>> until the controller can figure itself out and/or the bad disk 
>>>>>>> is removed.
>>>>>>>
>>>>>>> 2. Efficiency - I am retaining 50% storage capacity after 
>>>>>>> redundancy if I can do the "shelf1:disk1 + shelf2:disk2" mirrors; 
>>>>>>> plus all bandwidth used is spread across the 2 HBA fibers and 
>>>>>>> nothing goes over the TCP network.  Conversely, DRBD doesn't 
>>>>>>> excite me much - as I then have to do both raid in the shelf 
>>>>>>> (probably still with MDADM) and then add TCP (ethernet) based 
>>>>>>> RAID1 between the nodes - and when all is said and done I only 
>>>>>>> have 25% of storage capacity still available after redundancy.
>>>>>>>
>>>>>>> 3. It is easy to add more diskspace - each new mirror (software 
>>>>>>> raid1) can just be added to the existing LVM.
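>>>>>>>
>>>>>>> Command-wise, roughly what I have in mind (just a sketch - the
>>>>>>> multipath names and VG/LV names are made up):
>>>>>>>
>>>>>>>     # mirror disk1 of each shelf, then put the md under LVM
>>>>>>>     mdadm --create /dev/md0 --level=1 --raid-devices=2 \
>>>>>>>         /dev/mapper/shelf1disk1 /dev/mapper/shelf2disk1
>>>>>>>     pvcreate /dev/md0
>>>>>>>     vgcreate vg_nas /dev/md0
>>>>>>>     lvcreate -n lv_share -l 100%FREE vg_nas
>>>>>>>
>>>>>>>     # later, to grow: build the next mirror and extend VG/LV/FS
>>>>>>>     mdadm --create /dev/md1 --level=1 --raid-devices=2 \
>>>>>>>         /dev/mapper/shelf1disk2 /dev/mapper/shelf2disk2
>>>>>>>     vgextend vg_nas /dev/md1
>>>>>>>     lvextend -l +100%FREE /dev/vg_nas/lv_share
>>>>>>>     resize2fs /dev/vg_nas/lv_share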
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> You may create RAID1 (between the two shelves) over RAID6 (on the 
>>>>>> disks from the same shelf), so you will lose only 2 more disks per 
>>>>>> shelf, or about 40% storage space left, but more stable and 
>>>>>> faster. Or several RAID6 arrays with 2+2 disks from each shelf - 
>>>>>> again 50% storage space, but better performance with the same 
>>>>>> chance of data loss as with several RAID1 ... the resulting mdX 
>>>>>> you may add to LVM and use the logical volumes.
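>>>>>>
>>>>>> Something along these lines (sketch only - the "shelfNdiskN"
>>>>>> multipath names are placeholders):
>>>>>>
>>>>>>     # RAID6 inside each shelf (14 disks), then RAID1 across shelves
>>>>>>     mdadm --create /dev/md1 --level=6 --raid-devices=14 /dev/mapper/shelf1disk*
>>>>>>     mdadm --create /dev/md2 --level=6 --raid-devices=14 /dev/mapper/shelf2disk*
>>>>>>     mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/md1 /dev/md2
>>>>>>
>>>>>>     # or: 2+2 disks from each shelf in a 4-disk RAID6 per chunk of space
>>>>>>     mdadm --create /dev/md4 --level=6 --raid-devices=4 \
>>>>>>         /dev/mapper/shelf1disk1 /dev/mapper/shelf1disk2 \
>>>>>>         /dev/mapper/shelf2disk1 /dev/mapper/shelf2disk2
>>>>>>
>>>>>>     pvcreate /dev/md3   # then vgcreate/lvcreate as usual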
>>>>>>
>>>>>>
>>>>>>
>>>>>>> From what I can find messing with Luci (Conga), though, I don't 
>>>>>>> see any resource scripts listed for "mdadm" (on RHEL 5.4) - so 
>>>>>>> would my idea even work (I have found some posts asking for a 
>>>>>>> mdadm resource script but I've seen no response)?  I also see 
>>>>>>> that with RHEL 5.3 LVM has mirrors that can be clustered now - is 
>>>>>>> this the right answer?  I've done a ton of reading, but 
>>>>>>> everything I've dug up so far assumes that the fiber devices are 
>>>>>>> being presented by a SAN that is doing the redundancy before the 
>>>>>>> RHEL box sees the disk... or... there are a ton of examples where 
>>>>>>> fiber is not in the picture and there are a bunch of locally 
>>>>>>> attached hosts presenting storage onto the TCP (ethernet) network 
>>>>>>> - but I've found nearly nothing on my situation...
>>>>>>>
>>>>>>> So... here I am... :-)  I really just have 2 nodes - who can 
>>>>>>> both see -
>>>>>>> a bunch of disks (JBOD) and I want to present them to multiple 
>>>>>>> hosts via
>>>>>>> NFS (required) or GFS (to linux boxes only).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> If the Windows and Linux data are different volumes, it is better 
>>>>>> to leave the GFS partition(s) available only via iSCSI to the 
>>>>>> linux nodes participating in the cluster and not to mount it/them 
>>>>>> locally for the NFS/Samba shares, but if the data should be the 
>>>>>> same you may even go Active/Active with GFS over iSCSI [over CLVM 
>>>>>> and/or] [over DRBD] over RAID and use NFS/Samba over GFS as a 
>>>>>> service in the cluster. It all depends on how the data will be 
>>>>>> used from the storage.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> All ideas - are greatly appreciated!
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>




