Memory Performance Issue with Fedora Core 2 Kernels
James Foris
james.foris at med.ge.com
Mon Jul 26 19:22:23 UTC 2004
I am forwarding this message for a co-worker; his email to "fedora-list" keeps
getting bounced. Having said that, I have worked with him on this issue and
will be able to answer questions/describe the issues well enough for anyone
who is kind enough to reply.
What he tried to send follows below:
-------------------------------------------------------------------------------------------------------------------------
I have run into an issue with memory bandwidth using the Fedora Core
2 kernels and I need help. I don't know what is wrong, but something
killed performance of my custom driver when I ported it from RedHat 7.3
to Fedora Core 2. I believe I have narrowed it down to the kernel.
My driver requires a large amount of contiguous physical memory for
DMA from a PCI device. I use the 'mem=YYY' command line parameter to
reserve the top of physical RAM for my driver. Then I allow mapping
via mmap() calls to user space. The user space app then uses this
pointer to save the data to disk.
Normally the user space app writes to disk using the mmap()'d pointer as
the source. With the new kernels these writes are far too slow
(around 20 MB/s). Even when the write goes to /dev/shm, the speed is
limited to around 20 MB/s. A memcpy from the mmap()'d memory seems to
have no such slowdown.
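
For anyone who wants to reproduce the memcpy side of this comparison, here is a minimal sketch of how the copy rate could be measured. The function name memcpy_throughput_mb_s is hypothetical and not part of the original driver or test program. It takes any readable pointer, so an ordinary buffer can stand in for the mmap()'d device memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from the original test program).
 * Copies len bytes out of src into a scratch buffer and returns the rate
 * in MB/s (1 MB = 2^20 bytes), or -1.0 if the scratch buffer cannot be
 * allocated.
 */
double memcpy_throughput_mb_s(const void *src, size_t len)
{
    assert(src != NULL && len > 0);

    unsigned char *dst = malloc(len);
    if (dst == NULL)
        return -1.0;
    memset(dst, 0, len);    /* touch the pages so page faults are not timed */

    struct timeval t1, t2;
    gettimeofday(&t1, NULL);
    memcpy(dst, src, len);
    gettimeofday(&t2, NULL);

    /* Read a byte back through a volatile pointer to discourage the
     * compiler from dropping the copy. */
    volatile unsigned char sink = ((volatile unsigned char *)dst)[len - 1];
    (void)sink;
    free(dst);

    double elapsed = (double)(t2.tv_sec - t1.tv_sec)
                   + (double)(t2.tv_usec - t1.tv_usec) / 1e6;
    if (elapsed <= 0.0)
        elapsed = 1e-6;     /* guard against timer resolution */
    return ((double)len / (1024.0 * 1024.0)) / elapsed;
}
```

Calling it with the pointer returned by mmap() and a length of 0x2000000 would give a number that can be compared directly against the write() rate.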
This driver has been in use for some time on a RedHat 7.3 (2.4) kernel
with no issues. To narrow the problem down, I removed all code that talks
to the HW and created a driver that only maps host memory. The pattern
I use is shown below. It is almost identical to the code in the kernel
mem driver (...drivers/char/mem.c).
    dev_mmap(...)
    {
        ...
        u32 remap_addr = num_physpages*PAGE_SIZE; // Top of memory
        ...
        vma->vm_flags |= VM_IO;
        vma->vm_flags |= VM_RESERVED;
        status = remap_page_range(
            vma,
            vma->vm_start,
            remap_addr,
            vma->vm_end - vma->vm_start,
            vma->vm_page_prot );
        if( status )
            return -EAGAIN;
        ...
    }
I created a test program that opens the device, calls mmap() to get a
pointer, then saves 32 MB to /dev/shm and times it with the wall clock, as follows:
dev_fd = open("/dev/mydevice",O_RDWR,0);
shm_fd = open("/dev/shm/foo.dat",O_WRONLY|O_TRUNC|O_CREAT,0666);
void *devptr = mmap(0,0x2000000,PROT_READ,MAP_SHARED,dev_fd,0);
msync(devptr,num_bytes,MS_SYNC|MS_INVALIDATE);
double t1 = /* time in seconds using gettimeofday() */
int n = write(shm_fd,devptr,0x2000000);
double t2 = /* time in seconds using gettimeofday() */
/* check for errors */
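
The timing pattern above could be fleshed out roughly as follows. This is only a sketch under assumptions: the names now_seconds and write_throughput_mb_s are hypothetical, and the source pointer is passed in so that any readable buffer can be substituted for the device mapping when /dev/mydevice is not available. It loops on write() because a single call may transfer fewer bytes than requested.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wall-clock time in seconds, using gettimeofday() as in the test above. */
double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical helper (not from the original test program).
 * Writes len bytes from src to the file at path and returns the rate in
 * MB/s (1 MB = 2^20 bytes), or -1.0 on any error.
 */
double write_throughput_mb_s(const void *src, size_t len, const char *path)
{
    assert(src != NULL && len > 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1.0;

    const char *p = src;
    size_t left = len;
    double t1 = now_seconds();
    while (left > 0) {
        ssize_t n = write(fd, p, left);    /* may be a short write */
        if (n <= 0) {
            close(fd);
            return -1.0;
        }
        p += n;
        left -= (size_t)n;
    }
    double t2 = now_seconds();
    close(fd);

    double elapsed = t2 - t1;
    if (elapsed <= 0.0)
        elapsed = 1e-6;                    /* guard against timer resolution */
    return ((double)len / (1024.0 * 1024.0)) / elapsed;
}
```

With the device mapped, the call would look like write_throughput_mb_s(devptr, 0x2000000, "/dev/shm/foo.dat").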
I have tried this on several platforms and kernels and the results vary,
but the common denominator seems to be:
Fedora kernel + 32-bit Intel = poor performance (see below)
Processor  Kernel              Chipset      Arch    Results
Opteron    2.6.5-1.358smp      AMD          64-bit  Pass
Opteron    2.6.7-1.492smp      AMD          32-bit  Fail
Xeon       2.6.7-1.492smp      Intel E7505  32-bit  Fail
Xeon       2.6.6-1.435.2.3     Intel E7505  32-bit  Fail
Xeon       2.6.6-1.435.2.3smp  Intel E7505  32-bit  Fail
Xeon       2.4.18-24smp        Intel E7505  32-bit  Pass
Xeon       2.4.18-24smp        Intel E7501  32-bit  Pass
P4         2.6.7 (kernel.org)  Via ???      32-bit  Pass
Notes:
* The Fails are always around 20 MB/s
* When it passes, the performance depends on the chipset (e.g. 700+ MB/s)
* The E7505 is hosted in an HP xw8000.
* The E7501 is hosted on an Intel SE7501WV2 motherboard.
* The P4 is my home PC, which is a VIA chipset - don't ask me which.
Any help is appreciated.
Thanks,
John Fusco
-------------------------------------------------------------------------------------------------------------------------
Does anyone have any ideas where to begin with this one? And is there some
other list that this question should be passed to?
Thanks,
Jim Foris