[Linux-cluster] GFS + CORAID Performance Problem

bigendian at gmail.com
Tue Dec 12 16:55:34 UTC 2006


Hi Jayson,

I just plugged the hosts directly into the Coraid for troubleshooting
purposes.  This is how Coraid setup the machines in their published
benchmarks, so I figured it would be safe.  I actually have two Asante
IC36524 switches dedicated for Coraid storage with the intention of having
redundant paths.  My dual-port PCIe Ethernet cards didn't arrive until
yesterday, so I only had a single port on each host to connect to each
switch.  This isolated one of the hosts to the bad port on the SR1520.  I
should have found this sooner.
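
For what it's worth, now that the dual-port cards are here, the idea is to
let the aoe driver spread AoE traffic across both storage NICs.  If I'm
reading the driver documentation right, the stock aoe module takes an
aoe_iflist parameter naming the interfaces it may use for AoE; a minimal
sketch (the interface names are just placeholders for this example):

    # restrict AoE traffic to the two dedicated storage interfaces
    modprobe aoe aoe_iflist="eth1 eth2"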

On a separate note, I have modified the fence_vixel script to fence nodes
through the Asante switches by shutting down the appropriate switch ports.
These Asante switches present what appears to be a cloned Cisco IOS
interface, so the script should work, or at least come close, on any
Ethernet switch with an IOS-style telnet interface.  It works from the
command line, but I haven't yet tested it in the cluster through a real
fence operation.  I'd be happy to share it if it would be helpful.
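
In case it's useful as a starting point before I post the actual script,
here is a rough sketch of the idea in Python rather than the agent itself.
It is untested as written; the switch address, passwords, and interface
name are placeholders, and a proper agent would also read its options as
key=value lines on stdin, which is how fenced invokes fence agents:

    #!/usr/bin/env python
    # Sketch only: shut down one switch port over an IOS-style telnet
    # interface to cut a fenced node off the storage network.
    import socket
    import sys
    import time

    def send_cmd(sock, cmd, wait=1.0):
        # Send one command line, give the switch a moment, then drain
        # whatever it echoed back (banner, prompt, command echo).
        sock.sendall((cmd + "\r\n").encode("ascii"))
        time.sleep(wait)
        sock.setblocking(False)
        try:
            return sock.recv(4096).decode("ascii", "replace")
        except OSError:
            return ""
        finally:
            sock.setblocking(True)

    def fence_port(switch, password, port):
        # Log in, enter configuration mode, and shut the port down.
        sock = socket.create_connection((switch, 23), timeout=10)
        time.sleep(1.0)                      # wait out the login banner
        send_cmd(sock, password)             # line (login) password
        send_cmd(sock, "enable")
        send_cmd(sock, password)             # enable password, assumed the same
        send_cmd(sock, "configure terminal")
        send_cmd(sock, "interface " + port)
        send_cmd(sock, "shutdown")
        send_cmd(sock, "end")
        sock.close()

    if __name__ == "__main__":
        # usage: fence_asante.py <switch-ip> <password> <interface>
        fence_port(sys.argv[1], sys.argv[2], sys.argv[3])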

How are you fencing your cluster nodes?  I specify the Vixel fence in the
configuration GUI since I can't find an easy way to add a custom fence
agent.
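
The only workaround I can see is to skip the GUI and wire the agent in by
editing /etc/cluster/cluster.conf directly, since my understanding is that
fenced simply runs whatever executable the device's agent attribute names.
Something like the fragment below, with every name invented for the
example (I haven't tried this end to end yet):

    <clusternode name="node1" nodeid="1" votes="1">
            <fence>
                    <method name="1">
                            <device name="asante1" port="Fa0/12"/>
                    </method>
            </fence>
    </clusternode>

    <fencedevices>
            <fencedevice name="asante1" agent="fence_asante"
                         ipaddr="192.168.1.2" passwd="secret"/>
    </fencedevices>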

What is the benefit of using qdisk?

Thanks,
Tom


On 12/12/06, Jayson Vantuyl <jvantuyl at engineyard.com> wrote:
>
> > My thanks to Jayson and especially Wendy for providing so much help with
> > this issue.  With a little help from Coraid, I've traced the performance
> > issues down to one of the two ports on the Coraid device.  In the end, I
> > was able to move the performance problem from one of my two hosts to the
> > other just by swapping ports.  I'll follow up with Coraid to see if I
> > have a hardware problem.
>
> I don't know what Coraid recommends, but I usually recommend not plugging
> devices directly into the ports.  I vastly prefer having a good gigabit
> switch there instead.  Here at Engine Yard, we actually have two switches
> that provide redundancy across either port.  The current AoE driver is good
> enough to spread the load across both networks when you have two independent
> paths, so you also get better performance.  We put separate network cards in
> each of our servers so that the failure of an individual card isn't an issue
> (and AoE should handle this well, as long as the driver doesn't crash in
> that state).
>
> In terms of clustering, having redundant networks is very
> handy--especially if you use a qdisk.
>
> > It's really nice to have such a great level of community support.  Wendy,
> > I'd be happy to share the particulars of my deployment once I get things
> > stabilized.
>
> I'd be interested too.
>
> --
> Jayson Vantuyl
> Systems Architect
> Engine Yard
> jvantuyl at engineyard.com