Disk IO issues

Mike McGrath mmcgrath at redhat.com
Thu Jan 1 00:45:18 UTC 2009


On Wed, 31 Dec 2008, Greg Swift wrote:

> On Wed, Dec 31, 2008 at 17:35, Mike McGrath <mmcgrath at redhat.com> wrote:
>       On Wed, 31 Dec 2008, Corey Chandler wrote:
>
>       > Mike McGrath wrote:
>       > > Let's pool some knowledge together because at this point, I'm missing
>       > > something.
>       > >
>       > > I've been doing all measurements with sar, since bonnie, etc., cause builds to
>       > > time out.
>       > >
>       > > Problem: We're seeing slower than normal disk IO.  At least I think we
>       > > are.  This is a PERC5/E and MD1000 array.
>       > >
>       >
>       > 1. Are we sure the array hasn't lost a drive?
>
> I can't physically look at the drive (they're a couple hundred miles away)
> but we've seen no reports of it (via the drac anyway).  I'll have to get
> the raid software on there to be sure.  I'd think a degraded raid
> array would affect both direct block access and file level access.
>
> > 2. What's your scheduler set to?  CFQ tends not to work well in many applications
> > where the deadline scheduler works better...
> >
>
> I'd tried other schedulers earlier but they didn't seem to make much of a
> difference.  Even so, I'll get deadline set up and take a look.
>
> At least we've got the dd and cat problem figured out.  Now to figure out
> why there's such a discrepancy between file level reads and block level
> reads.  Anyone else have an array of this type and size to run those tests
> on?  I'd be curious to see what others are getting.
>
>
> We are working on a rhel3 to 5 migration at my job.  We have 2 primary filesystems: one holds large database files and the
> other holds lots of small documents.  As we were testing backup software for rhel5, we noticed a 60% decrease in speed moving
> from rhel3 to rhel5 with the same filesystem, but only on the document filesystem; the db filesystem was perfectly
> snappy.
>
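
On the scheduler question above: the switch itself should just be the usual
sysfs knob, something like this (sdb is only a placeholder for whatever
device the MD1000 array shows up as):

  # the active scheduler is the one shown in brackets
  cat /sys/block/sdb/queue/scheduler
  # switch this device to deadline until the next reboot
  echo deadline > /sys/block/sdb/queue/scheduler

To make it permanent it can go on the kernel command line as
elevator=deadline.  I'll let it run for a while and compare the sar numbers.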

Our files are some smaller logs, but mostly rpms.
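
For anyone who does want to compare numbers, what I mean by block level vs
file level reads is roughly the following (the device and path are just
examples, and dropping the cache needs root):

  # flush the page cache so neither run is served from memory
  echo 3 > /proc/sys/vm/drop_caches
  # block level: sequential read straight off the array device
  dd if=/dev/sdb of=/dev/null bs=1M count=4096
  echo 3 > /proc/sys/vm/drop_caches
  # file level: pull roughly the same volume back through the filesystem
  time tar cf - /mnt/koji/packages/nagios | wc -c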

> After a lot of troubleshooting it was deemed to be related to the dir_index btree hash.  The path was too long before
> there was a difference in the names of the files, making the index incredibly slow.  Removing dir_index recovered a bit
> of the difference, but didn't resolve the issue.  A quick rename of one of the base directories recovered almost the
> entire 60%.
>

I'd be curious to hear more about this.  How long was your path?  Our
paths aren't short but I don't think they'd be approaching any limits.
For example:

/mnt/koji/packages/nagios/3.0.5/1.fc11/x86_64/nagios-3.0.5-1.fc11.x86_64.rpm
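
If it does turn out to be dir_index related on our side, my understanding is
that checking for it and backing it out would look roughly like this (the
device is just an example, and the e2fsck pass wants the filesystem
unmounted):

  # see whether dir_index shows up in the feature list
  tune2fs -l /dev/sdb1 | grep -i features
  # turn the feature off, then rebuild the existing directory indexes
  tune2fs -O ^dir_index /dev/sdb1
  e2fsck -fD /dev/sdb1

I'd rather understand the path length angle first before touching the
filesystem, though.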

> Thought I'd at least throw it out there.  Although I'm not sure it's the exact issue, it doesn't hurt to have it
> floating in the background.
>

thanks.

	-Mike

