From wolfgang+gnus20050908 at dailyplanet.dontspam.wsrcc.com Thu Sep 8 23:15:39 2005 From: wolfgang+gnus20050908 at dailyplanet.dontspam.wsrcc.com (Wolfgang S. Rupprecht) Date: Thu, 08 Sep 2005 16:15:39 -0700 Subject: generating 32-bit RPM's Message-ID: <87fysfywv8.fsf@bonnet.wsrcc.com>

Is it possible to generate i386 binary RPM's on an x86_64 system? I tried the obvious --target switch to buildrpm and that did make an appropriately named RPM, but it still appeared to be built with 64-bit code and linked with the 64-bit libs.

-wolfgang -- Wolfgang S. Rupprecht http://www.wsrcc.com/wolfgang/ Microsoft Vista - because "Virus Installer" was too long.

From berryja at gmail.com Fri Sep 9 04:14:14 2005 From: berryja at gmail.com (Jonathan Berry) Date: Thu, 8 Sep 2005 23:14:14 -0500 Subject: generating 32-bit RPM's In-Reply-To: <87fysfywv8.fsf@bonnet.wsrcc.com> References: <87fysfywv8.fsf@bonnet.wsrcc.com> Message-ID: <8767947e05090821141905438b@mail.gmail.com>

On 9/8/05, Wolfgang S. Rupprecht wrote: > > Is it possible to generate i386 binary RPM's on an x86_64 system? > > I tried the obvious --target switch to buildrpm and that did make an > appropriately named RPM, but it still appeared to be built with 64-bit > code and linked with the 64-bit libs.

Hi Wolfgang,

The --target switch should be the way to do it. Are you sure that what was made is 64-bit? I think it might be possible that a particular RPM is not setup right (with the spec file) to build 32-bit on a 64-bit machine. I think all the --target switch does is change which sections of the spec file rpmbuild looks at. I'm not extremely knowledgeable of rpmbuild, so I could be wrong. I have used it some so what I say is what I have gathered when I used it. My first step would be to check the spec file. Make sure under the i386 headings that it adds the -m32 compile switch and links to /lib and /usr/lib and not their 64 counterparts. What package are you trying to build?

Jonathan

> Microsoft Vista - because "Virus Installer" was too long.

Ouch :).

From joshua at iwsp.com Fri Sep 9 15:46:20 2005 From: joshua at iwsp.com (Joshua Jensen) Date: Fri, 9 Sep 2005 11:46:20 -0400 Subject: generating 32-bit RPM's In-Reply-To: <8767947e05090821141905438b@mail.gmail.com> References: <87fysfywv8.fsf@bonnet.wsrcc.com> <8767947e05090821141905438b@mail.gmail.com> Message-ID: <20050909154619.GC29987@iwsp.com>

When compiling the kernel on 32bit platforms, you can specify --target of say i686 which activates %ifarch sections inside of the specfile. However, all that really does is use different "-m" compiler switches for CPU optimization. So first off, you have to have the %ifarch stuff defined in your package's specfile (almost no RPMs do), and you would have to have a *cross compiler* installed on your x86_64 platform to produce 32bit binaries. I'm not a compiler wizard, but from my understanding the 64 bit gcc that ships with Red Hat on x86_64 platforms only targets x86_64 CPUs. I don't know that -m32 does all that you need.

Joshua

On Thu, Sep 08, 2005 at 11:14:14PM -0500, Jonathan Berry wrote: > On 9/8/05, Wolfgang S. Rupprecht wrote: > > > > Is it possible to generate i386 binary RPM's on an x86_64 system? > > > > I tried the obvious --target switch to buildrpm and that did make an > appropriately named RPM, but it still appeared to be built with 64-bit > code and linked with the 64-bit libs. > > Hi Wolfgang, > > The --target switch should be the way to do it. Are you sure that > what was made is 64-bit?
I think it might be possible that a > particular RPM is not setup right (with the spec file) to build 32-bit > on a 64-bit machine. I think all the --target switch does is change > which sections of the spec file rpmbuild looks at. I'm not extremely > knowledgeable of rpmbuild, so I could be wrong. I have used it some > so what I say is what I have gathered when I used it. My first step > would be to check the spec file. Make sure under the i386 headings > that it adds the -m32 compile switch and links to /lib and /usr/lib > and not their 64 counterparts. What package are you trying to build? > > Jonathan > > > Microsoft Vista - because "Virus Installer" was too long. > > Ouch :). > > -- > amd64-list mailing list > amd64-list at redhat.com > https://www.redhat.com/mailman/listinfo/amd64-list -- Joshua Jensen joshua at iwsp.com "If God didn't want us to eat animals, why did he make them out of meat?" From berryja at gmail.com Fri Sep 9 16:52:20 2005 From: berryja at gmail.com (Jonathan Berry) Date: Fri, 9 Sep 2005 11:52:20 -0500 Subject: generating 32-bit RPM's In-Reply-To: <20050909154619.GC29987@iwsp.com> References: <87fysfywv8.fsf@bonnet.wsrcc.com> <8767947e05090821141905438b@mail.gmail.com> <20050909154619.GC29987@iwsp.com> Message-ID: <8767947e05090909524c951f03@mail.gmail.com> On 9/9/05, Joshua Jensen wrote: > When compiling the kernel on 32bit platforms, you can specify --target > of say i686 which activates %ifarch sections inside of the specfile. Agreed. > However, all that really does is use different "-m" compiler switches > for CPU optimization. So first off, you have to have the %ifarch stuff > defined in your package's specfile (almost no RPMs do), and you would I don't know about the "almost no RPMs do." Almost all RPMs are built for multiple architectures. > have to have a *cross compiler* installed on your x86_64 platform. I > don't know that -m32 does all that you need. But, 32-bit and 64-bit are both x86. It's not like he's trying to compile for SPARC or PowerPC here, which *would* need a cross-compiler. See below. > produce 32bit binaries. I'm not a compiler wizard, but from my > understanding the 64 bit gcc that ships with Red Hat on x86_64 platforms > only targets x86_64 CPUs. > > Joshua Well, from the gcc man page: -m32 -m64 Generate code for a 32-bit or 64-bit environment. The 32-bit envi- ronment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture. which sounds a lot like -m32 makes gcc compile 32-bit programs. Of course, an experiment is worth a thousand man-page words :), so I'll try this out sometime and see what happens. If you are linking to libraries, you will of course need to also make sure you link to 32-bit libs or else ld (the linker) won't like you too well :). Jonathan From joshua at iwsp.com Fri Sep 9 18:08:30 2005 From: joshua at iwsp.com (Joshua Jensen) Date: Fri, 9 Sep 2005 14:08:30 -0400 Subject: generating 32-bit RPM's In-Reply-To: <8767947e05090909524c951f03@mail.gmail.com> References: <87fysfywv8.fsf@bonnet.wsrcc.com> <8767947e05090821141905438b@mail.gmail.com> <20050909154619.GC29987@iwsp.com> <8767947e05090909524c951f03@mail.gmail.com> Message-ID: <20050909180830.GA31426@iwsp.com> On Fri, Sep 09, 2005 at 11:52:20AM -0500, Jonathan Berry wrote: > > I don't know about the "almost no RPMs do." Almost all RPMs are built > for multiple architectures. 
You don't need multiple %ifarch statements to build a package for both arches... so long as you are building them *natively on those platforms*. You need nothing special to get gcc to assume 64bits on a 64 bit platform. You need lots of special considerations though to recompile for a platform that *isn't* native. I've worked with a ton of packages (even recompiled every single RPM in RHEL3), and besides openssl, the kernel, and the glibc packages, there is nothing there to support --target. > > have to have a *cross compiler* installed on your x86_64 platform. I > > don't know that -m32 does all that you need. > > But, 32-bit and 64-bit are both x86. It's not like he's trying to > compile for SPARC or PowerPC here, which *would* need a > cross-compiler. See below. No they aren't. A 32 bit binary is very very different than a 64bit one. Sure, the machine code from IA32 looks more similar to x86-64 code than say 64bit PPC, but it isn't the same and they should be considered completely seperate archs. -- Joshua Jensen joshua at iwsp.com "If God didn't want us to eat animals, why did he make them out of meat?" From jch at scalix.com Fri Sep 9 21:02:06 2005 From: jch at scalix.com (John Haxby) Date: Fri, 9 Sep 2005 22:02:06 +0100 Subject: generating 32-bit RPM's In-Reply-To: <20050909180830.GA31426@iwsp.com> References: <87fysfywv8.fsf@bonnet.wsrcc.com> Message-ID: <4321F84E.8040002@scalix.com> Joshua Jensen wrote: >>But, 32-bit and 64-bit are both x86. It's not like he's trying to >>compile for SPARC or PowerPC here, which *would* need a >>cross-compiler. See below. >> >> > >No they aren't. A 32 bit binary is very very different than a 64bit >one. Sure, the machine code from IA32 looks more similar to x86-64 code >than say 64bit PPC, but it isn't the same and they should be considered >completely seperate archs. > > > Well, that's a bit strong. I can compile and run programs on an i386 machine and copy those same compiled programs to an x86_64 machine and they run just fine. Similarly, If I "gcc -m32" something on the x86_64 machine it runs on the i386 machine. The only thing that won't work is a "gcc -m64" program running on an i386. They may be different architectures, but, really and truly you can regard the x86_64 as a strict superset of i386 from an application programming point of view. Compare that with, say, SPARC64 vs i386. There's nothing I can compile on either platform that will run on both. Apart from python and java maybe :-) jch From wolfgang+gnus20050915 at dailyplanet.dontspam.wsrcc.com Thu Sep 15 19:17:35 2005 From: wolfgang+gnus20050915 at dailyplanet.dontspam.wsrcc.com (Wolfgang S. Rupprecht) Date: Thu, 15 Sep 2005 12:17:35 -0700 Subject: generating 32-bit RPM's References: <87fysfywv8.fsf@bonnet.wsrcc.com> <8767947e05090821141905438b@mail.gmail.com> <20050909154619.GC29987@iwsp.com> <8767947e05090909524c951f03@mail.gmail.com> <20050909180830.GA31426@iwsp.com> Message-ID: <87irx2b0og.fsf@bonnet.wsrcc.com> Joshua Jensen writes: > On Fri, Sep 09, 2005 at 11:52:20AM -0500, Jonathan Berry wrote: >> >> I don't know about the "almost no RPMs do." Almost all RPMs are built >> for multiple architectures. > > You don't need multiple %ifarch statements to build a package for both > arches... so long as you are building them *natively on those > platforms*. You need nothing special to get gcc to assume 64bits on a > 64 bit platform. You need lots of special considerations though to > recompile for a platform that *isn't* native. 
I've worked with a ton of > packages (even recompiled every single RPM in RHEL3), and besides > openssl, the kernel, and the glibc packages, there is nothing there to > support --target. > >> > have to have a *cross compiler* installed on your x86_64 platform. I >> > don't know that -m32 does all that you need. >> >> But, 32-bit and 64-bit are both x86. It's not like he's trying to >> compile for SPARC or PowerPC here, which *would* need a >> cross-compiler. See below. > > No they aren't. A 32 bit binary is very very different than a 64bit > one. Sure, the machine code from IA32 looks more similar to x86-64 code > than say 64bit PPC, but it isn't the same and they should be considered > completely seperate archs. [ Sorry for the late reply. I was off camping for a few days. -wsr ] I see now I wasn't very clear in my question. Sorry about that. I am writing my own simple helloworld.spcc file and trying to build a helloworld-0.1-1.i386.rpm and helloworld-0.1-1.x86_64.rpm RPM for both architectures while running on a machine with x86_64 installed. I was hoping that "rpmbuild --target i386" would take care of all the "behind the scenes" work of throwing the right compiler and linker switches and modifying whatever else needed a bit of tweaking. I am getting the impression that it isn't that simple and I need to throw all those switches myself from inside the *.spec file. I guess I need to look at openssl and glibc to see what they do. Is there a better rpm spec file to use as a model? -wolfgang -- Wolfgang S. Rupprecht http://www.wsrcc.com/wolfgang/ From jdepaul at techemail.com Fri Sep 16 12:44:26 2005 From: jdepaul at techemail.com (Jason DePaul) Date: Fri, 16 Sep 2005 05:44:26 -0700 (PDT) Subject: HP DL 585 Dual Core - Kernel Panic Message-ID: <20050916054426.55D9C90F@dm20.mta.everyone.net> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jdreese at bucknell.edu Mon Sep 19 13:04:14 2005 From: jdreese at bucknell.edu (Jeremy Dreese) Date: Mon, 19 Sep 2005 09:04:14 -0400 Subject: HP DL 585 Dual Core - Kernel Panic Message-ID: <432EB74E.6040208@bucknell.edu> On the following page: http://h18004.www1.hp.com/products/servers/linux/dl585-drivers-cert.html you'll note that the minimum supported RedHat updates is "Red Hat Enterprise Linux 3 Update 5 (dual core processors)." And yes, I'm pretty certain that AMD64 *dual-core* support was added in update 5 (and RHEL 4 update 1). And just from personal experience, we have RHEL 3 U5 (kernel-2.4.21-32.0.1.EL) running fine on a DL 385 with AMD64 dual-core processors. I hope this helps. -- Jeremy Dreese Engineering Computing Systems Integrator College of Engineering Bucknell University voice: (570) 577-3714 fax: (570) 577-3579 email: jdreese at bucknell.edu From dave_atkinson at blueyonder.co.uk Tue Sep 20 10:12:42 2005 From: dave_atkinson at blueyonder.co.uk (Dave Atkinson) Date: Tue, 20 Sep 2005 11:12:42 +0100 Subject: generating 32-bit RPM's Message-ID: <1127211162.3137.55.camel@phineas.mndnet> Can I stick my 2p's worth in? Sorry for v. late reply, I've only just discovered this list... I too have been considering this problem recently. I wanted to install 32-bit mplayer so's I could use the [ahem] other codecs... couldn't get yum to install 32-bit mplayer (if anyone can explain to me a simple way of installing an arbitary 32-bit pkg on x86_64..? thx ;). Couldn't find one, so I thought I'd compile the pkgs myself, using $ rpmbuild --target i386 .... 
as I'd had no problems building SMP i686 and non-SMP i386 and i686 pkgs on my old dual PIII... I have tried to understand what rpmbuild does from looking at the srcs and the rc/macro files that get sourced (rpmbuild --showrc, or more accurately, strace -e open rpmbuild --showrc | awk 'prog to dig out filenames'*). The --target option sets the %_target_cpu, %_target_os and % target_platform macros. This I _believe_ affects the setting of the % _optflags, so any RPM that uses %configure should get the right compiler flags. What gets screwed on x86_64 is that the %_lib and %_libdir macros remains set to lib64 regardless of the setting of --target. So I'm wondering if the fix is a patch to rpm, either to a) post-process (hack) the above macros based on the host and target arch at runtime, (bad idea) or b) add a new conditional %if(|n)targetarch and ship a rpmrc file which sets the macros correctly at runtime (better idea) or c) have rpm set %_lib, %_libdir and anything else that may be required internally based on target arch (previous idea is better?) d) something I haven't thought of ;) What have I missed? ;) Dave A. * awk '/^open\("\/(etc|usr\/lib)\/rpm\/.*(rc|macro).*/ && $0 ! ~ /ENOENT/{tmp=substr($1,7);gsub(/".*/,"",tmp);print tmp}' From D.Mierzejewski at icm.edu.pl Tue Sep 20 10:23:27 2005 From: D.Mierzejewski at icm.edu.pl (Dominik 'Rathann' Mierzejewski) Date: Tue, 20 Sep 2005 12:23:27 +0200 Subject: generating 32-bit RPM's In-Reply-To: <1127211162.3137.55.camel@phineas.mndnet> References: <1127211162.3137.55.camel@phineas.mndnet> Message-ID: <20050920102327.GB21821@ws-gradcol1.icm.edu.pl> On Tue, Sep 20, 2005 at 11:12:42AM +0100, Dave Atkinson wrote: [...] > So I'm wondering if the fix is a patch to rpm, either to > a) post-process (hack) the above macros based on the host and target > arch at runtime, (bad idea) or > b) add a new conditional %if(|n)targetarch and ship a rpmrc file which > sets the macros correctly at runtime (better idea) or > c) have rpm set %_lib, %_libdir and anything else that may be required > internally based on target arch (previous idea is better?) > d) something I haven't thought of ;) > > What have I missed? ;) man setarch And on sparc, it's man sparc32, just in case anyone is having problems with AuroraLinux. ;) Regards, R. -- Dominik 'Rathann' Mierzejewski Interdisciplinary Centre for Mathematical and Computational Modelling Warsaw University | http://www.icm.edu.pl | tel. +48 (22) 5540810 From dave_atkinson at blueyonder.co.uk Tue Sep 20 11:42:26 2005 From: dave_atkinson at blueyonder.co.uk (Dave Atkinson) Date: Tue, 20 Sep 2005 12:42:26 +0100 Subject: generating 32-bit RPM's In-Reply-To: <20050920102327.GB21821@ws-gradcol1.icm.edu.pl> References: <1127211162.3137.55.camel@phineas.mndnet> <20050920102327.GB21821@ws-gradcol1.icm.edu.pl> Message-ID: <1127216546.3137.69.camel@phineas.mndnet> On Tue, 2005-09-20 at 12:23 +0200, Dominik 'Rathann' Mierzejewski wrote: > On Tue, Sep 20, 2005 at 11:12:42AM +0100, Dave Atkinson wrote: > [...] > > So I'm wondering if the fix is a patch to rpm, either to > > a) post-process (hack) the above macros based on the host and target > > arch at runtime, (bad idea) or > > b) add a new conditional %if(|n)targetarch and ship a rpmrc file which > > sets the macros correctly at runtime (better idea) or > > c) have rpm set %_lib, %_libdir and anything else that may be required > > internally based on target arch (previous idea is better?) > > d) something I haven't thought of ;) > > > > What have I missed? 
;) > > man setarch > And on sparc, it's man sparc32, just in case anyone is having problems > with AuroraLinux. ;) Thanks, but I did. Using $ setarch i386 rpmbuild --target i386 ... gives me packages that contain files with /usr/lib64/... in their path. >From the man page changes the output of uname. If you look in /usr/lib/rpm/*-linux/macros there will be a line /usr/lib/rpm/x86_64-linux/macros %_lib lib64 /usr/lib/rpm/i386-linux/macros %_lib lib /usr/lib/rpm/noarch-linux/macros %_lib lib64 It looks like this file gets sourced based on the host arch, not the target arch, resulting in the above problem... Regards, Dave From jakub at redhat.com Tue Sep 20 11:45:38 2005 From: jakub at redhat.com (Jakub Jelinek) Date: Tue, 20 Sep 2005 07:45:38 -0400 Subject: generating 32-bit RPM's In-Reply-To: <1127216546.3137.69.camel@phineas.mndnet> References: <1127211162.3137.55.camel@phineas.mndnet> <20050920102327.GB21821@ws-gradcol1.icm.edu.pl> <1127216546.3137.69.camel@phineas.mndnet> Message-ID: <20050920114537.GI1020@devserv.devel.redhat.com> On Tue, Sep 20, 2005 at 12:42:26PM +0100, Dave Atkinson wrote: > Thanks, but I did. Using > > $ setarch i386 rpmbuild --target i386 ... > > gives me packages that contain files with /usr/lib64/... in their path. > >From the man page changes the output of uname. If you look > in /usr/lib/rpm/*-linux/macros there will be a line > > /usr/lib/rpm/x86_64-linux/macros > %_lib lib64 > /usr/lib/rpm/i386-linux/macros > %_lib lib > /usr/lib/rpm/noarch-linux/macros > %_lib lib64 > > It looks like this file gets sourced based on the host arch, not the > target arch, resulting in the above problem... No, it is target arch. But if you have /etc/rpm/platform file, that unfortunately overrides it. Just rm -f /etc/rpm/platform and it will DTRT. Jakub From Philip.R.Schaffner at nasa.gov Wed Sep 21 12:07:59 2005 From: Philip.R.Schaffner at nasa.gov (Phil Schaffner) Date: Wed, 21 Sep 2005 08:07:59 -0400 Subject: generating 32-bit RPM's In-Reply-To: <1127211162.3137.55.camel@phineas.mndnet> References: <1127211162.3137.55.camel@phineas.mndnet> Message-ID: <1127304479.17121.63.camel@wx1.larc.nasa.gov> On Tue, 2005-09-20 at 11:12 +0100, Dave Atkinson wrote: ... > I too have been considering this problem recently. I wanted to > install > 32-bit mplayer so's I could use the [ahem] other codecs... couldn't > get > yum to install 32-bit mplayer (if anyone can explain to me a simple > way > of installing an arbitary 32-bit pkg on x86_64..? thx ;). > ... Not exactly simple, but I manage it by creating i386 repo definitions that are disabled by default and enabling them with "yum --enablerepo=reponame update " (or "install"). Works OK once you get the right set of 32-bit packages installed for browsers, etc. 
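[ A hedged illustration of that disabled-repo approach, using the repo names from the definitions shown further down; firefox.i386 is only an example of a package that exists solely in the i386 tree, and the .i386 suffix tells yum explicitly to pull the 32-bit build:

$ yum --enablerepo=base-i386 --enablerepo=update-i386 install firefox.i386 ]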
Here's my current set of CentOS4 browser-related packages for a dual Opteron: acroread-7.0.0-2.rf.i386 firefox-1.0.6-1.4.2.centos4.i386 mozilla-1.7.10-1.4.2.centos4.i386 mozilla-acroread-7.0.0-2.rf.i386 mozilla-chat-1.7.10-1.4.2.centos4.i386 mozilla-devel-1.7.10-1.4.2.centos4.i386 mozilla-dom-inspector-1.7.10-1.4.2.centos4.i386 mozilla-js-debugger-1.7.10-1.4.2.centos4.i386 mozilla-mail-1.7.10-1.4.2.centos4.i386 mozilla-nspr-1.7.10-1.4.2.centos4.i386 mozilla-nspr-1.7.10-1.4.2.centos4.x86_64 mozilla-nspr-devel-1.7.10-1.4.2.centos4.i386 mozilla-nss-1.7.10-1.4.2.centos4.i386 mozilla-nss-1.7.10-1.4.2.centos4.x86_64 mozilla-nss-devel-1.7.10-1.4.2.centos4.i386 mozilla-nss-devel-1.7.10-1.4.2.centos4.x86_64 Here's a sample set of repo definitions: [prs at wx1 yum.repos.d]$ cat CentOS-Base-i386.repo [base-i386] name=CentOS-$releasever - Base baseurl=http://mirror.centos.org/centos/$releasever/os/i386/ gpgcheck=1 enabled=0 #released updates [update-i386] name=CentOS-$releasever - Updates baseurl=http://mirror.centos.org/centos/$releasever/updates/i386/ gpgcheck=1 enabled=0 #packages used/produced in the build but not released [addons-i386] name=CentOS-$releasever - Addons baseurl=http://mirror.centos.org/centos/$releasever/addons/i386/ gpgcheck=1 enabled=0 #additional packages that may be useful [extras-i386] name=CentOS-$releasever - Extras baseurl=http://mirror.centos.org/centos/$releasever/extras/i386/ gpgcheck=1 enabled=0 #additional packages that extend functionality of existing packages [centosplus-i386] name=CentOS-$releasever - Plus baseurl=http://mirror.centos.org/centos/$releasever/centosplus/i386/ gpgcheck=1 enabled=0 #contrib - packages by Centos Users [contrib-i386] name=CentOS-$releasever - Contrib baseurl=http://mirror.centos.org/centos/$releasever/contrib/i386/ gpgcheck=1 enabled=0 #packages in testing [testing-i386] name=CentOS-$releasever - Testing baseurl=http://mirror.centos.org/centos/$releasever/testing/i386/ gpgcheck=1 enabled=0 Phil From dave_atkinson at blueyonder.co.uk Wed Sep 21 16:59:08 2005 From: dave_atkinson at blueyonder.co.uk (Dave Atkinson) Date: Wed, 21 Sep 2005 17:59:08 +0100 Subject: generating 32-bit RPM's In-Reply-To: <1127304479.17121.63.camel@wx1.larc.nasa.gov> References: <1127211162.3137.55.camel@phineas.mndnet> <1127304479.17121.63.camel@wx1.larc.nasa.gov> Message-ID: <1127321949.3137.105.camel@phineas.mndnet> On Wed, 2005-09-21 at 08:07 -0400, Phil Schaffner wrote: > On Tue, 2005-09-20 at 11:12 +0100, Dave Atkinson wrote: > ... > > yum to install 32-bit mplayer (if anyone can explain to me a simple > > way > > of installing an arbitary 32-bit pkg on x86_64..? thx ;). > > ... > > Not exactly simple, but I manage it by creating i386 repo definitions > that are disabled by default and enabling them with > "yum --enablerepo=reponame update " (or "install"). Works OK > once you get the right set of 32-bit packages installed for browsers, > etc. > ... Thanks for that, although it's dawned on me that $ yum clean all $ setarch i386 yum install $ yum clean all is probably what I need. Seems to work, too... 
;) From joshua at iwsp.com Wed Sep 21 19:31:16 2005 From: joshua at iwsp.com (Joshua Jensen) Date: Wed, 21 Sep 2005 15:31:16 -0400 Subject: generating 32-bit RPM's In-Reply-To: <1127321949.3137.105.camel@phineas.mndnet> References: <1127211162.3137.55.camel@phineas.mndnet> <1127304479.17121.63.camel@wx1.larc.nasa.gov> <1127321949.3137.105.camel@phineas.mndnet> Message-ID: <20050921193116.GD19038@iwsp.com> On Wed, Sep 21, 2005 at 05:59:08PM +0100, Dave Atkinson wrote: > $ yum clean all > $ setarch i386 yum install > $ yum clean all If I understand what you are trying to do, yum already does this: yum install package1.i386 package2.i386 -- Joshua Jensen joshua at iwsp.com "If God didn't want us to eat animals, why did he make them out of meat?" From berryja at gmail.com Thu Sep 22 00:51:34 2005 From: berryja at gmail.com (Jonathan Berry) Date: Wed, 21 Sep 2005 19:51:34 -0500 Subject: generating 32-bit RPM's In-Reply-To: <20050921193116.GD19038@iwsp.com> References: <1127211162.3137.55.camel@phineas.mndnet> <1127304479.17121.63.camel@wx1.larc.nasa.gov> <1127321949.3137.105.camel@phineas.mndnet> <20050921193116.GD19038@iwsp.com> Message-ID: <8767947e05092117513610a988@mail.gmail.com> On 9/21/05, Joshua Jensen wrote: > On Wed, Sep 21, 2005 at 05:59:08PM +0100, Dave Atkinson wrote: > > > $ yum clean all > > $ setarch i386 yum install > > $ yum clean all > > If I understand what you are trying to do, yum already does this: > > yum install package1.i386 package2.i386 > Yes, but that only works if package1.i386 and package2.i386 are in the x86_64 tree. Which is not the case for some programs of which you might want to install a 32-bit version. For instance, Firefox and Mozilla are only available as 64-bit packages in the x86_64 tree. I personally use the extra i386 repo approach with the arch hardcoded that Phil presented earlier. Jonathan From b.j.smith at ieee.org Fri Sep 23 02:14:48 2005 From: b.j.smith at ieee.org (Bryan J. Smith) Date: Thu, 22 Sep 2005 19:14:48 -0700 (PDT) Subject: Fedora SMP dual core, dual AMD 64 processor system In-Reply-To: <20050923004557.GA20195@cse.ucdavis.edu> Message-ID: <20050923021448.59142.qmail@web34107.mail.mud.yahoo.com> Bill Broadley wrote: > Here are my results: > Version @version@ ------Sequential Output------ > --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- > --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP > K/sec %CP /sec %CP > fileserver 16G 250195 39 75476 19 > 141364 23 328.1 1 I'm confused, which numbers are which (for software, for hardware, etc...)? And what was the _exact_ command you ran? It has a lot to do with everything. > If you want to convince me of the "obvious" performance > superiority of hardware RAID-5 please post performance > numbers. The next time I roll out a DL365 or DL585 for a client, I will. > The above numbers are from a dual opteron + pci-e system > with 8 SATA drives, linux, software RAID, and a Areca > controller. With this configuration I managed 250MB/sec > writes, and 140MB/sec or so reads. Which is which? And how did you thread to simulate multiple clients? The more clients and I/O queues, the better many hardware RAID solution do. Furthermore, I'm kinda scratching my head how you could get higher RAID-5 performance in writes than reads? That would seem rather impossible. _Unless_ buffer is not being flushed. Then the lack latency in synchronous DRAM writes are vastly improved over the latency in a DRAM read (regardless if it is synchronous or not). 
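[ Aside: a hedged example of the sort of bonnie++ invocation being asked about here -- the file size is set to roughly four times RAM so the page cache cannot supply the reads, and the directory and user are hypothetical:

$ bonnie++ -d /mnt/raid -s 16384 -n 0 -u nobody

-s gives the working-file size in MB, -n 0 skips the small-file creation tests, and -u names the unprivileged user to run as when started as root. ]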
> I've not read anything like this, instead all of my data > indicates just the opposite. Can you provide data to the > contrary? No, because I don't have a 3Ware system in front of myself other than an older dual-P3/ServerWorks system. I also deploy RAID-10 more than RAID-5. I will get you numbers. But I'm still trying to figure out what numbers are what? Which was the software RAID? Which was the hardware RAID? And what was your exact command (options and all)? Did you run multiple clients/operations? The more independent operations you through at a hardware RAID controller, the more they queue, the more differential they introduce. > I never used the word bliss or mentioned LVM. Okay, fair enough. > Right, 3ware can do it, but there's a custom interface > specific to 3ware. Yes, and it's well known -- has been for 6+ years. > My point is that software raid standardizes all the > tools, is flexible, and doesn't make you learn a set > of vendor specific tools. First off, there is no "learning curve" to 3Ware's tools. Secondly, 3Ware does have some interfaces into SMART and other capabilities now. Granted, they are not yet supported by 3Ware itself, but the work is happening. 3Ware sends many standard kernel messages, they do a lot of GPL work. But given the hardware already sends these to both the kernel _and_ its own management software, as well as a NVRAM that logs _all_ events (regardless of the OS'), that's a nice option IMHO. > This lets you migrate RAIDs between linux boxes without > hunting for another RAID controller of the same brand. I've heard this argument again and again. When someone can demonstrate 6+ years of Linux MD compatibility, I will believe them. So far, I haven't seen it myself. MD has changed several times. > Monitor then without custom binaries, 3Ware sends standard kernel messages. You _can_ trap those like any other disk controller or syslog approach. Don't knock 3Ware because they give you an additional option. > serial connections, ??? Are you're thinking of external subsystems ??? >From the standpoint of 3Ware v. software RAID, same difference, the driver/controller is local, so the local kernel sees _all_ messages. > or web interfaces. 3Ware offers both CLI and web interfaces. > You don't have to rebuild a kernel with the right drivers, Considering 3Ware releases GPL drivers in the _stock_ kernel, I _rarely_ run into this. At the most, I do a make, then copy the module, depmod -a and create a new initrd. Done. > and download binaries from the vendor website. *NO*BINARY*DRIVERS* 100% GPL drivers/source. 3Ware isn't "FRAID." You only need the CLI/3DM _if_ you want their integrated monitoring. You _can_ trap syslog messages from the kernel driver like anything else. > Note avoiding the posting of numbers #1. The studies are out there my friend. But the next time I have a modern Opteron with a 3Ware card, I'll send them to you. > Please post them, preferably on a file big enough to make > sure that the file is coming through the RAID controller > and not the file cache. I'm not sure that's what you did. From your benchmark where the RAID-5 write performance was better than RAID-5 read performance, I can only assume you benefitted from the fact that write SDRAM latency is better than read SDRAM. Otherwise, to disk, it's _worse_. > There's 250MB/sec posted above. The I/O interconnect is > 8.0 GB/sec why would it be useless? Because you're making _redundant_ trips. If you have hardware RAID, you go directly from memory to I/O. 
If you have software RAID, you push _all_ data from memory to CPU first. > So? The CPU<->CPU interconnect is 8.0GB/sec. The I/O bus > from the cpu is 8.0GB/sec, and pci-e is 2GB/sec (4x), Actually it's 1GBps (4x) one-way. > 4GB/sec (8x) or 8GB/sec (16x). 250-500MB/sec creates > this huge problem exactly why? You're pushing from memory to CPU first, instead of directly to memory mapped I/O. You're basically turning your direct memory access (DMA) disk transfer into one long, huge programmed I/O (PIO) disk transfer. If your CPU/interconnect is doing _nothing_ else, your CPU might be able to handle it, no issue. That's why generic mainboards are commonly used for dedicated storage systems. But when you're doing additional, user/service I/O, you don't want to tie up your interconnect with that additional overhead. It takes away from what your system _could_ be doing in actually servicing user requests. > Notice avoid the posting of numbers #2. Again, there are some great studies out there. I will get you numbers when I have a system to play with. > Yes, I have many fileserver in production using it. And it probably works fine. But you're taking throughput away from services. > I'm not learning this stuff from reading, I'm learning it > from doing. "read up on things" is not making a > particularly strong case for your point of view. There are several case studies out there from many organizations where user/services are pushing a lot of I/O. If your system isn't servicing much I/O, and the disk is just for local processing, then the impact is less. But when your storage processing is contending for the same interconnect as user/service processing (such as a file server), every redundant transfer you have due to software RAID takes away from what you could be servicing users with. > Actually I do know quite a bit on this subject. Please > post numbers to support your conclusions. I will. Please post your _exact_setup_, including commands, client/processes, etc... > More hand waving. So you are right because you built too > many file and database servers? Can you at least understand what I mean by the fact that software RAID-5 turns your disk transfer into a PIO transfer, instead of a DMA transfer? And do you understand why we use network ASICs instead of CPUs for networking equipment? Same difference, the PC interconnect is not designed for throwing around raw data. It is not an I/O processor and interconnect. > Bring me up to date, show me actual performance numbers > representing this advantage you attribute to these > hardware raid setups your discussing. > Quite a few, mainly clusters, webservers, mailservers, and > fileserver's. I thought this was a technical discussion > and not a pissing contest. I just don't understand how you could not understand that any additional overhead you place on the interconnect in doing a programmed I/O transfer for just storage takes away from user/service transfer usage. If you have web servers and mail servers, which are more about CPU processing, you're probably not having too much user/service I/O contending with the software RAID PIO. But on database server and NFS fileservers, it clearly makes a massive dent. > So avoid all this handwaving and claims of superiority > please provide a reference or actual performance numbers. Will do, personally when I have the next opportunity to benchmark. But I'll need your _exact_ commands, how you emulated multiple operations, etc... -- Bryan J. 
Smith | Sent from Yahoo Mail mailto:b.j.smith at ieee.org | (please excuse any http://thebs413.blogspot.com/ | missing headers) From b.j.smith at ieee.org Fri Sep 23 15:08:45 2005 From: b.j.smith at ieee.org (Bryan J. Smith) Date: Fri, 23 Sep 2005 08:08:45 -0700 (PDT) Subject: Fedora SMP dual core, dual AMD 64 processor system In-Reply-To: <20050923070555.GA21239@cse.ucdavis.edu> Message-ID: <20050923150846.60941.qmail@web34114.mail.mud.yahoo.com> Bill Broadley wrote: > Queue where? Linux? RAID driver? RAID hardware? RAID hardware. In a true, intelligent RAID card, queuing is done via the on-board uC/ASIC controller. In reality, true, intelligent RAID cards have rather "dumb" block drivers (other than management/reporting features) since the "intelligence" is on-card. > I don't follow this line of reasoning. Instead of relying on the kernel to schedule I/O, the hardware itself schedules I/O. The kernel merely passes on requests, and doesn't get caught up with all the overhead, which is the responsibility of the I/O processor on the RAID card. In fact, this is definitely an area where FRAID (fake RAID) hardware is at its absolute worst. An OS knows how to queue better for itself than a FRAID (software driver). But at the same time, an intelligent RAID controller is closer to the hardware so it can schedule it far better/more optimal than the OS can logically too. Understand I use OS software RAID (MD/LVM) for RAID-0, I love it. But when it comes to RAID-1 (and 1e/10) and RAID-5, I then rely on the hardware. > In any case name a workload, post numbers, and I'll > replicate so we can compare. I promise I will. I just took a job about 5 weeks ago that is permanent, and I'm doing more engineering again than IT. I should have done some benchmarks months ago, but I'm typically a consultant that designs and brings in a solution and my benchmarking is probably too application-specific. > My main point is that even for the ideal bandwidth case (a > large sequential read or write) that software RAID does > not cause any bottlenecks, everything involved is mostly > idle (memory bus, cpu, hypertransport, and I/O bus). How can you measure I/O bus in Linux? You can't. You can only measure the I/O the CPU is servicing, which is not actual. I don't dispute that the Opteron can handle the PIO required for today's advanced storage I/O done in software. I just said the transfer load, especially using the Opterons as I/O Processors doing programmed I/O, takes away from other transfer operations it _might_ be doing if its servicing user capabilities. PC processors and interconnects will always be grossly inefficient compared to dedicated I/O Processors and interconnects. I think Intel has (and this is one of the few times I agree with Intel) the right idea in putting the I/O Processor in the I/O controller. Although the most idea solution is to put it on the card itself. Especially during a failed drive, when you are constantly reading in disk data over the much slower PCI-X interconnect. At those times, you really could use a 2-4GBps of local interconnect handling that -- instead of pushing all the way up through the I/O to memory to CPU, just to get the data. In fact, this is one area where the Acera really _tanks_ compared to the 3Ware cards. > In either case (hardware or software) I'd expect multiple > sequential or random streams to have lower throughput in > both cases leaving even more of the I/O, cpu, and related > idle. CPU processing idle is one thing. 
XOR operations don't even dent a CPU's processing capability. The problem is that CPUs are designed for computation, not pushing data around. Their interconnects are designed to have a good balance between processing and data movement. If you're jamming your CPU with LOAD/MOV operations just for storage, then you're turning it into an I/O processor -- something it's not designed for, and it ends up doing Programmed I/O. I/O Processors are designed for less processing, more data movement -- including simplistic, virtually "in-line" data movement operations like XORs and compares using ASICs and other peripherals outside the core. They use far, far less clock cycles -- typically 1:1 to their external bus, without the traditional fetch-decode-execute-etc... > My point is that any linux/MD/software RAID in the world > uses the same tools, interfaces, drivers. And if you haven't followed it, as these are more "standardized" in the Linux world, 3Ware has been adding support for them. > So tuning various parameters, recovery, monitoring, and > migration is the same. The approach, yes. But for the hardware, it varies. So you don't get away from having to tune. But instead of tuning on individual disks, you now tune on the card itself. One thing I've learn to trust explicitly is 3Ware's ability to handle even the most problematic ATA drives. In software RAID, you often have a tri-fecta clusterfsck between 1) the ATA drive vendor's Integrated Drive Electronics (IDE), 2) the ATA channel vendor's registers/bus control and 3) the OS driver that supposively gets the two to talk correctly. With 3Ware -- both the uC/ASIC-firware and the ATA channel registers/bus control are 3Ware's -- which just leaves 3Ware to deal with the IDE of the ATA drive itself. I've yet to have ATA bus timeouts, resets, etc... in 6+ years of 3Ware devices. Now some would argue SCSI, and I would agree, SCSI is less headache. And the new crop of Serial Attached SCSI (SAS) solutions are very capable. In fact, many SAS controllers are coming with hardware RAID-0, 1, 1e or 10 for "free." They also do SATA for "free" too. > A RAID volume can be migrated across machines without > problem, worst case (which I've not seen) you'd > have to run a different kernel. And I understand this argument, but I've yet it match the 6+ year history of 3Ware upgradability. As long as the firmware is the same or newer, you're set. Other vendors have various records. Adaptec has a _poor_ one, and they _destroyed_ DPT's when they took them over (not that DPT offered anything good, they were all old i960 designs). LSI Logic has varied, with any StrongARM or newer (now XScale) having great records as well. BTW, except for RAID-5, there is DM manager support for 3Ware volumes on regular ATA channels. And there were early solutions as well. > Nothing that can't be done onsite. I've done lots of > late 2.2, 2.4, and 2.6 migrations and upgrades without > issue. Although I suspect my 2.2 setup was using > backported MD drivers (which redhat did). The 2.0 -> 2.2 > migration is a bit further back them I'd trust my memory. Well, I've been using 3Ware since 1999. > If you don't have a spare hardware raid card, recovery is > very tough. Again, except for RAID-5, not so. Most 3Ware volumes are readable by kernel 2.4+ MD and newer 2.6 DM code. I tend to stick with RAID-10 for performance. > Even if you do getting that card working on a new machine > can be fairly difficult. ??? Please explain ??? I've plopped in 3Ware cards without issue. 
The only issue I had with older cards was the 3.3V v 5V, but the newer 7000+ don't have that issue, they are universal (PCI 5V, PCI64 3.3/5V, PCI-X 3.3V). > I.e. finding a kernel+initrd that will load the hardware > RAID driver before you can mount the RAID. You obviously haven't used 3Ware. ;-> GPL driver in stock kernel since 2.2.15 (yes, 2.2). Same 3w-xxxx driver is used for _all_ products until the latest 9000 series (3w-9xxx) which adds DRAM. The core logic of the 3Ware AccelerATA through Escalade 8000 is all the same -- ASIC+SRAM design. There was a slight redesign for RAID-5 in the 7000+ series. > I like that I can take 4-8 drives in a RAID volume and plug > them into external or internal arrays on various > architectures (alpha, itanium, opteron, and IA32) and just > have it work without tracking down which RAID controller is > in which. 3Ware doesn't have Alpha or Itanium support, no. But the MD/DM drivers can read 3Ware RAID-0, 1 and 10 volumes. > BTW, I'm agree 3ware cards are reliable, functional, and > work well. In hardware RAID or software RAID mode. Actually, in software RAID mode with exception of RAID-0, it kinda defeats the purpose. In fact, the #1 complaint I've seen on 3Ware cards is when people use them with software RAID for the hot-swap capability, leaving the discs in JBOD mode. It was _only_ until recently that the kernel added hotplug capability, so you should _never_ use 3Ware cards with JBOD (instead of RAID) and attempt hot-swap if you are doing software RAID. 3Ware gets a "bad rap" for people who do _not_ understand the limitations of hot-swap in all but the latest kernels. The 3Ware design "hides" the raw disks from the OS so it _can_ provide hot-swap _regardless_ of kernel capability, but _only_ when you provide it a redundant volume managed by the 3Ware card itself. > Sure, and each RAID controller sends different messages. Again, 3Ware is following a lot of the standard messages being standardized in newer LVM/MD development, including SMART messages. > So you need to very carefully filter for each controller > and each message they could send. I see a trend here. You're talking "in general." I'm saying I _agree_ with you on "most" hardware RAID vendors. But for companies like 3Ware and select LSI Logic (SA/XScale solutions), I strong _disagree_. > mdadm, /proc/mdstat, diff, SMTP, and cron are all you need > to manage, watch, and receive status reports on any linux > MD raid on the planet. And the 3Ware /proc interface provides a superset of capabilities, with 3Ware adding GPL code to many of these projects to interface into them. > Sure 3ware has the functionality, if you jump through the > customized hopes to get it. By "jump through" I strongly _disagree_. It's _cake_ to setup. The lack of standard approaches in Linux MD/LVM management until just recently is part of the reason I don't like it. But as many things are being standardized (such as mdadm), 3Ware is moving to support them. 3Ware thinks of Linux _first_, unlike most other vendors. > Or if you say.. want to manage the RAID? The newer mdadm developments underway. > 5 or 10 years ago I'd agree. More recently I've seeing an > increasing number of people concluding that software RAID > is faster in most cases. Software RAID-0, yes. And given a _poor_ RAID-5 solution -- even pre-9000 series 3Ware products, I'd agree that an Opteron doing software RAID gave you more throughput. But for RAID-10, I'll stick with my 3Ware. 
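[ Footnote to the mdadm/cron monitoring mentioned above -- a minimal hedged sketch; the mail address and 30-minute polling interval are arbitrary:

$ cat /proc/mdstat
$ mdadm --monitor --scan --mail=root --delay=1800 --daemonise

The first command shows current array state; the second watches the arrays listed in /etc/mdadm.conf and mails when an event such as a failed or degraded disk is seen. ]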
And for RAID-5, I like the new 9000 series -- _especially_ during a failed disk/rebuild. That's when you're really killing your disk with software RAID-5. > I'm certainly open to data points to support either > conclusion. Er, no. If it was filecache it would be > much faster, 16GB is plenty large to mostly flush the > cache of a 4GB of ram machine. I just don't see how your write speed is 2x the read. It doesn't make sense. > Er, so you have a 6.4 GB/sec interface to memory (actually > 2), 8.0 GB/sec hypertransport, and 4GB/sec pci-e. Which > one is the bottleneck for this 250MB/sec stream? The problem is that you're going from memory to CPU, then back to memory, before you even commit to I/O. You're _not_ getting anywhere 8GBps from the memory to CPU, because the CPU is engaged in traditional LOAD/MOV operations (even if the XOR takes only a few cycles), which cost dozens upon dozens of cycles in the entire fetch-decode-execute-etc... cycle. A well-designed hardware RAID card does these in-line with the data-write with an ASIC XOR. 0 wait state, non-blocking I/O. The system merely commits from memory directly to storage controller, and that's it. Again, it's like using a PC as a layer-3 switch versus a device with a layer-3 switch fabric. The PC is going to incur massive overhead to do what a switch fabric does non-blocking (sub-10ms). > Mine is 8x, and why not count both ways? PCI-x is 1GB/sec > total (read or write). PCI-e 8x is 2GB/sec read and 2GB/sec > write. All communications use both sides (the request and > the answer), even reads cause disk writes (updating file > timestamps), writes cause reads (to calculate the new > checksum). They do _not_ happen simultenously _unless_ you have a hardware RAID card. The PC operation is buffered. > 250 MB/sec streams + overhead for checksums leaves all > involved busses mostly idle. Not when you are failed/rebuilding in RAID-5. First you have to read from the storage to memory, then memory to CPU for PIO storage operations, then back to memory and finally back to storage. Those operations do not happen simultaneously. Even when just doing normal writes, it's buffered I/O, as the data stream is jammed, waiting on the CPU to go through the traditional fetch-decode-execute-etc... operation just to do an XOR (the actual instruction is not the bottleneck). > Er, more like, read 7 chunks of data, calculate 8th block > of checksum data then setup a DMA to write all 8 blocks. > MD is just as capable of setting up a DMA as the RAID card. Not true! A CPU is _not_ an I/O processor with XOR ASICs designed to calculate in-line. Again, 3Ware calls its solution a "Storage Switch" for a reason. It's the same reason you don't use a PC as a network switch. > Er, and what exactly else should the fileserver be doing > besides, er serving files? Serving out GigE? That's only > another 100MB/sec. Not for some of us. ;-> > The hypertransport, CPU, and PCI-e are still mostly idle. So you say. Unfortunately, you can't track this in the Linux kernel. But you can in the Solaris kernel. > Don't forget the read overhead of software RAID is ZERO. Agreed. That's why 99% of software RAID benchmarks only show read performance. In that case -- other than heavy I/O queuing -- software RAID typically _wins_. No argument from me there. > The write overhead (depending on write size) can be as > little as something like 1/8th. Not true. You have to push _every_single_byte_ up through the CPU interconnect and run a SIMD instruction which does LOAD/MOV in a traditional CPU design. 
A "storage switch" does XORs directly in the datapath. > So a 100MB write could be as little as an extra 12MB. A 100MB RAID-5 write pushes 100MB through the CPU's interconnect, _period_. It might only generate an extra 12MB overall in the write, but you can _not_ avoid pushing that data through. That path is _not_ traversed in hardware RAID. The hardware storage controller takes the 100MB, and a well-designed card does non-blocking I/O with XOR calculations on-the-fly in virtually real-time. > On these 4-8GB/sec busses an extra 12.5 % is not a big deal. You obviously don't understand the dataflow. You are pushing 100% to the CPU, then an additional 12.5% out. > You seem to claim there are all these studies supporting > hardware RAIDs performance superiority. Maybe you could > share some. Not hardware RAID in general, just 3Ware and select LSI Logic solutions. I'll send you some links on 3Ware when I have time. I apologize but I'm spending 15 hours/day working right now (not including my 2+ hours travel time a day) just supporting Katrina and planning Rita recovery efforts. I work for the company that provides emergency communications where none are available. > I'll start: > http://www.chemistry.wustl.edu/~gelb/castle_raid.html > All below are the 8GB filesize numbers on a 1GB ram machine > (I.e. not effected by the file cache much.) Note the date: 2004 Mar! The 9000 series just came out. What firmware was used for the 9000 series with how many volumes? There were well-know issues with early 9000 series firmware and multiple volumes. BTW, I also hope you noted the RAID-10 performance. But even then, it's pretty crappy. I have a dual-P3 at home with an old 3Ware Escalade 7800 that breaks 100MBps writes with RAID-10 at 8GiB bonnie tests. I don't see how their newer system could be slower. I assume they are running an early 9000 series firmware. > So Hardware RAID manages 20MB/sec write 50MB/sec read using > the 3ware 8500. > Software raid is 52-76MB/sec write and 120-229MB/sec reads. And what firmware was used? Also, I noted this statement: "This suggests that using two 4-drive hardware RAID cards and striping them via software might be competitive with the all-software solution above, but it would depend very much on the performance of the RAID cards." > You have measured this, or it's just a theory you have? > Have you quantified it? Yes. At the very high-end, 4x GbE and 2x 8506 cards spread over 4 PCI-X channels using RAID-0 across the 8506 volumes. I was serving over 500MBps via NFS consistently to over 25 clients simultaneously. > Having significantly higher I/O bandwidth to draw from > leaves many advantages. You keep missing the fact that you're stuffing it into the CPU, which can't work as fast as an ASIC. > With a CPU that can do the xor calculations on the order > of 7GB/sec most parts of the system ??? I'd _really_ like to know how you came up with that number! I don't see someone being able to stuff 7GBps through a CPU with even a SIMD operation. > (besides the disks) > are mostly idle even when sustaining these 140-250MB/sec > data rates. Yes, that would suggest the bottleneck is the CPU doing PIO. > Which interconnect are you talking about exactly? Each > opteron has 3 8GB/sec hypertransports. With today's > kernels they are mostly idle. You can_not_ read this with the Linux kernel. The Linux kernel _only_ shows you the time the CPU is doing I/O, not the actual I/O throughput/latency/usage. 
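[ Aside on the xor-throughput figure being argued here: when the md RAID-5 code loads, the kernel benchmarks its software checksum routines and logs the per-routine MB/sec it measured, so the host-side number can be read straight from the boot log. The exact wording of those lines differs between 2.4 and 2.6 kernels, e.g.

$ dmesg | grep -iE 'raid5|xor' ]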
> Shared memory happens over one link, but the newer kernels > keep most memory traffic local to the CPU the process is > running on (in most cases). > The other hypertransport links are mostly idle a 250MB/sec > RAID doesn't have much effect. Sure the 6.4 GB/sec memory > busses can be very heavily used by many usage patterns, but > 250MB/sec isn't going to impact those very much. Understand that when you use _any_ time in an operation, you deduct that _time_. I.e., if you use 50% of a 250MBps bus, you do _not_ deduct 250MBps from the next bus, but you deduct 50% of the available bandwidth from that next bus. So if it took you 50% of your I/O bus to read the data into memory, then the memory to CPU operation do _not_ happen at the theoretical maximum of the memory throughput, but only 50% left. At this point, I think I'd have to draw some timing and state diagrams to explain this. You seem to be missing it. -- Bryan J. Smith | Sent from Yahoo Mail mailto:b.j.smith at ieee.org | (please excuse any http://thebs413.blogspot.com/ | missing headers) From hahn at physics.mcmaster.ca Sat Sep 24 20:38:17 2005 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Sat, 24 Sep 2005 16:38:17 -0400 (EDT) Subject: Fedora SMP dual core, dual AMD 64 processor system In-Reply-To: <20050923150846.60941.qmail@web34114.mail.mud.yahoo.com> Message-ID: > Instead of relying on the kernel to schedule I/O, the > hardware itself schedules I/O. The kernel merely passes on > requests, and doesn't get caught up with all the overhead, of course, the "overhead" in scheduling a disk is rather trivial. > better for itself than a FRAID (software driver). But at the > same time, an intelligent RAID controller is closer to the > hardware so it can schedule it far better/more optimal than > the OS can logically too. this is not very true. for instance, a serious server today will have many GB of pages cached from disk. that's significantly larger than any HW raid I've seen (typically in the ~256M range.) it's also not true that the HW controller has much more knowlege of the disk hardware. both the host and HW controller have to guess about where the actual head is, and have to guess about how tracks are laid out. but this is not very hard: ignoring remapped sectors, seek distance is monotonic with block distance. that means that no one except the disk itself can really know that two blocks are on the same cylinder, but if two block addresses are "close", you can guess that they are. simply establishing monotonicity is the crux of disk scheduling. however, the other part of disk scheduling is "meta-request" info, such as which processes a req belongs to, whether it's synchronous, merely readahead/writebehind, etc. here's where the host has a real advantage - it knows about more requests, and more about them. > > My main point is that even for the ideal bandwidth case (a > > large sequential read or write) that software RAID does > > not cause any bottlenecks, everything involved is mostly > > idle (memory bus, cpu, hypertransport, and I/O bus). > > How can you measure I/O bus in Linux? You can't. You can > only measure the I/O the CPU is servicing, which is not > actual. non-sequitur. Bill rightly points out that ~300 MB/s of IO, which is pretty decent, does not come close to saturating a modern platform. this is true by inspection. > I don't dispute that the Opteron can handle the PIO required > for today's advanced storage I/O done in software. 
I just > said the transfer load, especially using the Opterons as I/O > Processors doing programmed I/O, takes away from other > transfer operations it _might_ be doing if its servicing user > capabilities. sure, but so what? so SW raid will need to transfer a few extra chunks over some of the 8GB/s HT channels, among some of the 26 GB/s of memory bandwidth available. why do you think that a few hundred MB/s out of many GB/s is going to make a difference? > Especially during a failed drive, when you are constantly > reading in disk data over the much slower PCI-X interconnect. > At those times, you really could use a 2-4GBps of local > interconnect handling that -- instead of pushing all the way > up through the I/O to memory to CPU, just to get the data. how strange! what on earth do you think you can do with disks at 4 GB/s? or are you worried about streaming reads from ~50 disks at once? > The problem is that CPUs are designed for computation, not > pushing data around. what a strange idea! easily most of what most computers do is just dumb pushing around of data. there's very little computation in most web/db serving, for instance, very little in any desktop app. > have a good balance between processing and data movement. If > you're jamming your CPU with LOAD/MOV operations just for > storage, then you're turning it into an I/O processor -- your whole critique seems to be aesthetic - that the noble CPU should not be doing lowly xors. even if the CPU already has dedicated prefetch engines to help with this sort of thing, and can have multiple 128b xors in the pipe at once. > something it's not designed for, and it ends up doing > Programmed I/O. that's just plain weird. I haven't had a computer that did PIO for probably a decade. if you're just saying that SW raid>1 is like PIO in that the CPU touches data, well, OK, but what's so bad with that? the data rates are basically trivial, and does the server actually have something better to do with its cycles? > I just don't see how your write speed is 2x the read. It > doesn't make sense. you misread the columns. > > Er, and what exactly else should the fileserver be doing > > besides, er serving files? Serving out GigE? That's only > > another 100MB/sec. > > Not for some of us. ;-> hmm. the fastest IO clusters I know of are Luster+HSI (Quadrics or IB usually). servers typically manage about 300 MB/s each. > > So a 100MB write could be as little as an extra 12MB. > > A 100MB RAID-5 write pushes 100MB through the CPU's > interconnect, _period_. It might only generate an extra 12MB but who gives a damn? 100 MB approximately 2 second*disk, but about .025 second*cpu. in other words, 8 disks will take about .5 seconds to transfer 100MB (ignoring seeks), but the CPU will take about 1/20 that to process it. > That path is _not_ traversed in hardware RAID. The hardware duh. everyone knows that HW raid avoids passing the raw blocks through the host cpu. really, *everyone*. trust me. > > On these 4-8GB/sec busses an extra 12.5 % is not a big > deal. > > You obviously don't understand the dataflow. You are pushing > 100% to the CPU, then an additional 12.5% out. 100% of the cpu for a small fraction of the time. > > You have measured this, or it's just a theory you have? > > Have you quantified it? > > Yes. At the very high-end, 4x GbE and 2x 8506 cards spread > over 4 PCI-X channels using RAID-0 across the 8506 volumes. > I was serving over 500MBps via NFS consistently to over 25 > clients simultaneously. 
> > You have measured this, or it's just a theory you have?
> > Have you quantified it?
>
> Yes.  At the very high-end, 4x GbE and 2x 8506 cards spread
> over 4 PCI-X channels using RAID-0 across the 8506 volumes.
> I was serving over 500MBps via NFS consistently to over 25
> clients simultaneously.

those numbers are pretty odd - just getting 500 MB/s over 4x Gb is
pretty unusual.  or do you mean that the 25 clients saw an aggregate
500 MB/s (which would be explained by client-side caching)?

> > Having significantly higher I/O bandwidth to draw from
> > leaves many advantages.
>
> You keep missing the fact that you're stuffing it into the
> CPU, which can't work as fast as an ASIC.

and you're missing the fact that disks are slow, therefore disk IO is
slow, and only amounts to a small fraction of a modest CPU's
capability.

> > With a CPU that can do the xor calculations on the order
> > of 7GB/sec most parts of the system
>
> ???  I'd _really_ like to know how you came up with that
> number!  I don't see someone being able to stuff 7GBps
> through a CPU with even a SIMD operation.

xor is not computationally harder than copying data, and yes, just
look at stream+openmp to see 7GB/s on a system.  (and system is the
right target here, not cpu.)

> Understand that when you use _any_ time in an operation, you
> deduct that _time_.  I.e., if you use 50% of a 250MBps bus,
> you do _not_ deduct 250MBps from the next bus, but you deduct
> 50% of the available bandwidth from that next bus.

but your numbers are wrong.  MD uses ~10% of many-GB/s buses.  we're
not talking about dual-P3's with 64x33 PCI any more.

> So if it took you 50% of your I/O bus to read the data into
> memory, then the memory-to-CPU operation does _not_ happen at

but it doesn't.  the 8 GB/s HT channel is *not* 50% committed to a
measly 250 MB/s.

> At this point, I think I'd have to draw some timing and state
> diagrams to explain this.  You seem to be missing it.

you underestimate your partners in dialog.  not really a good thing.

From b.j.smith at ieee.org  Sun Sep 25 08:25:15 2005
From: b.j.smith at ieee.org (Bryan J. Smith)
Date: Sun, 25 Sep 2005 03:25:15 -0500
Subject: Fedora SMP dual core, dual AMD 64 processor system
In-Reply-To: 
References: 
Message-ID: <1127636716.4778.18.camel@bert64.oviedo.smithconcepts.com>

On Sat, 2005-09-24 at 16:38 -0400, Mark Hahn wrote:
> sure, but so what?  so SW raid will need to transfer a few extra
> chunks over some of the 8GB/s HT channels, among some of the
> 26 GB/s of memory bandwidth available.

I didn't know Opterons could distribute the I/O operations across
CPUs to use all that memory bandwidth.  You're throwing around
theoretical and aggregate maximums like that's actually what you're
going to get.

> why do you think that a few hundred MB/s out of many GB/s is going
> to make a difference?

Yes, when you're moving those MBs through the CPU and turning it all
into one huge Programmed I/O (PIO) operation.  You're not going to
get GBps in transfers.  You're losing valuable chunks of time and,
therefore, cutting cycles away from transfer time as well.

Again, if it takes up 50% of your cycles just to push hundreds of MBs
of data through the CPU, then you have reduced all that theoretical
bandwidth significantly.

> how strange!  what on earth do you think you can do with disks
> at 4 GB/s?  or are you worried about streaming reads from ~50
> disks at once?

But you're not getting 4GBps.  At this point, I need to bow out of
this discussion.  It's obvious that you believe you can push 8GB of
data through your CPU in a second.

-- 
Bryan J. Smith    b.j.smith at ieee.org    http://thebs413.blogspot.com
----------------------------------------------------------------------
The best things in life are NOT free - which is why life is easiest if
you save all the bills until you can share them with the perfect woman

From hahn at physics.mcmaster.ca  Sun Sep 25 14:22:33 2005
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Sun, 25 Sep 2005 10:22:33 -0400 (EDT)
Subject: Fedora SMP dual core, dual AMD 64 processor system
In-Reply-To: <1127636716.4778.18.camel@bert64.oviedo.smithconcepts.com>
Message-ID: 

> > sure, but so what?  so SW raid will need to transfer a few extra
> > chunks over some of the 8GB/s HT channels, among some of the
> > 26 GB/s of memory bandwidth available.
>
> I didn't know Opterons could distribute the I/O operations across
> CPUs to use all that memory bandwidth.  You're throwing around
> theoretical and aggregate maximums like that's actually what you're
> going to get.

opterons can certainly have peripherals attached to multiple CPUs;
this is normal for nvidia-based systems, and certainly possible with
the AMD chipset.

> > why do you think that a few hundred MB/s out of many GB/s is going
> > to make a difference?
>
> Yes, when you're moving those MBs through the CPU and turning it all
> into one huge Programmed I/O (PIO) operation.  You're not going to get

a stream of very brief block memory operations.

> GBps in transfers.  You're losing valuable chunks of time and,
> therefore, cutting cycles away from transfer time as well.

again, since the host can do the xor at several GB/s, a trickle of
them at 250 MB/s just doesn't amount to much.

> Again, if it takes up 50% of your cycles just to push hundreds of MBs

again, 5% is about right, for an otherwise idle cpu.

> > how strange!  what on earth do you think you can do with disks
> > at 4 GB/s?  or are you worried about streaming reads from ~50
> > disks at once?
>
> But you're not getting 4GBps.  At this point, I need to bow out of
> this discussion.  It's obvious that you believe you can push 8GB of
> data through your CPU in a second.

Stream says it only takes 3 cpus to push 8GB/s; that's triad, which
is more CPU-intense than R5.  that's also not counting write-allocate
traffic, or using prefetchnta.  similarly 4GB/s is correct, even low
for a dual.

From arijit_engg2001 at yahoo.com  Fri Sep 30 11:39:21 2005
From: arijit_engg2001 at yahoo.com (Arijit Das)
Date: Fri, 30 Sep 2005 04:39:21 -0700 (PDT)
Subject: Strange Virtual Mem Mapping...
Message-ID: <20050930113921.73129.qmail@web33305.mail.mud.yahoo.com>

I have RH3.0 installed on an AMD64 machine.

In this system, when I look at the virtual address space mappings of
a process (say a sleep process), I see quite a few strange memory
region mappings which are neither readable, nor writable, nor
executable, and all of them are private (i.e. unshared).  Check this:

1024 ---p /lib64/tls/libc-2.3.2.so
1024 ---p /lib64/tls/libm-2.3.2.so
1024 ---p /lib64/tls/librtkaio-2.3.2.so
1024 ---p /lib64/tls/libpthread-0.60.so

On the other hand, when I look at the same info on a RH7.2 system, I
don't see anything like that...

Question: How do I make sense of an unreadable/unwritable/unexecutable
privately mapped memory region?  What is its usage?  It looks like a
100% waste of the process's address space.  How is the process using
it?  Any idea... anybody?

You can find the sample "sleep" commands below.
Thanks,
Arijit

RH3.0 on AMD64:
=============
vgamd126:arijit>sleep 400 &
[1] 19916
vgamd126:arijit>pmap 19916
Size (KB)  Perm  Associated files (if any)
========== ====  =============================================
16         r-xp  /bin/sleep
4          rw-p  /bin/sleep
132        rwxp
1108       r-xp  /lib64/ld-2.3.2.so
4          rw-p  /lib64/ld-2.3.2.so
4          rw-p
540        r-xp  /lib64/tls/libm-2.3.2.so
1024       ---p  /lib64/tls/libm-2.3.2.so
4          rw-p  /lib64/tls/libm-2.3.2.so
4          rw-p
36         r-xp  /lib64/tls/librtkaio-2.3.2.so
1024       ---p  /lib64/tls/librtkaio-2.3.2.so
4          rw-p  /lib64/tls/librtkaio-2.3.2.so
64         rw-p
1260       r-xp  /lib64/tls/libc-2.3.2.so
1024       ---p  /lib64/tls/libc-2.3.2.so
20         rw-p  /lib64/tls/libc-2.3.2.so
16         rw-p
60         r-xp  /lib64/tls/libpthread-0.60.so
1024       ---p  /lib64/tls/libpthread-0.60.so
4          rw-p  /lib64/tls/libpthread-0.60.so
20         rw-p
31396      r--p  /usr/lib/locale/locale-archive
24         rw-p
Total Virtual Memory = 38816 KB
vgamd126:arijit>

RH7.2 on i686
==========
eurika120:arijit>sleep 400 &
[1] 11065
eurika120:arijit>pmap 11065
Size (KB)  Perm  Associated files (if any)
========== ====  =============================================
12         r-xp  /bin/sleep
4          rw-p  /bin/sleep
8          rwxp
88         r-xp  /lib/ld-2.2.4.so
4          rw-p  /lib/ld-2.2.4.so
4          r--p  /usr/lib/locale/en_US/LC_IDENTIFICATION
4          r--p  /usr/lib/locale/en_US/LC_MEASUREMENT
4          r--p  /usr/lib/locale/en_US/LC_TELEPHONE
4          r--p  /usr/lib/locale/en_US/LC_ADDRESS
4          r--p  /usr/lib/locale/en_US/LC_NAME
4          r--p  /usr/lib/locale/en_US/LC_PAPER
4          r--p  /usr/lib/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES
4          r--p  /usr/lib/locale/en_US/LC_MONETARY
24         r--p  /usr/lib/locale/en_US/LC_COLLATE
4          r--p  /usr/lib/locale/en_US/LC_TIME
4          r--p  /usr/lib/locale/en_US/LC_NUMERIC
4          rw-p
136        r-xp  /lib/i686/libm-2.2.4.so
4          rw-p  /lib/i686/libm-2.2.4.so
28         r-xp  /lib/librt-2.2.4.so
4          rw-p  /lib/librt-2.2.4.so
40         rw-p
1224       r-xp  /lib/i686/libc-2.2.4.so
20         rw-p  /lib/i686/libc-2.2.4.so
16         rw-p
52         r-xp  /lib/i686/libpthread-0.9.so
32         rw-p  /lib/i686/libpthread-0.9.so
172        r--p  /usr/lib/locale/en_US/LC_CTYPE
24         rwxp
Total Virtual Memory = 1936 KB
eurika120:arijit>

From arijit_engg2001 at yahoo.com  Fri Sep 30 12:02:31 2005
From: arijit_engg2001 at yahoo.com (Arijit Das)
Date: Fri, 30 Sep 2005 05:02:31 -0700 (PDT)
Subject: RH30: Virtual Mem shot by locale-archive
Message-ID: <20050930120231.96032.qmail@web33307.mail.mud.yahoo.com>

I have RH3.0 installed on an AMD64 machine.

In this system, when I look at the virtual address space mappings of
a process (say a sleep process), I find that almost 80% of its
virtual address space has been taken by a private copy of
/usr/lib/locale/locale-archive, which is mapped in by default.
Check this:

31396 KB   r--p  /usr/lib/locale/locale-archive
Total Virtual Memory = 38816 KB

On the other hand, when I look at the same info on a RH7.2 system, I
see that only a small set of essential locale files has been mapped,
whose total size is around 236KB (way smaller than on RH3.0)...
Check this:

4          r--p  /usr/lib/locale/en_US/LC_IDENTIFICATION
4          r--p  /usr/lib/locale/en_US/LC_MEASUREMENT
4          r--p  /usr/lib/locale/en_US/LC_TELEPHONE
4          r--p  /usr/lib/locale/en_US/LC_ADDRESS
4          r--p  /usr/lib/locale/en_US/LC_NAME
4          r--p  /usr/lib/locale/en_US/LC_PAPER
4          r--p  /usr/lib/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES
4          r--p  /usr/lib/locale/en_US/LC_MONETARY
24         r--p  /usr/lib/locale/en_US/LC_COLLATE
4          r--p  /usr/lib/locale/en_US/LC_TIME
4          r--p  /usr/lib/locale/en_US/LC_NUMERIC
172        r--p  /usr/lib/locale/en_US/LC_CTYPE

This seems like a huge memory requirement for each small process
executed on the RH3.0 system, and hence it shoots up the memory
requirement of the entire system, because the mapped region
/usr/lib/locale/locale-archive is privately mapped.

Question:
1) Is there any way by which I can instruct my RH3.0 system not to
map the huge locale-archive file by default?  Rather, it should map
the few small locale files, as mapped on the RH7.2 system.
2) If the answer to my previous question is yes (it is possible),
then what will be the impact of doing that?

You can find the sample "sleep" commands below.

Thanks,
Arijit

RH3.0 on AMD64:
=============
vgamd126:arijit>sleep 400 &
[1] 19916
vgamd126:arijit>pmap 19916
Size (KB)  Perm  Associated files (if any)
========== ====  =============================================
16         r-xp  /bin/sleep
4          rw-p  /bin/sleep
132        rwxp
1108       r-xp  /lib64/ld-2.3.2.so
4          rw-p  /lib64/ld-2.3.2.so
4          rw-p
540        r-xp  /lib64/tls/libm-2.3.2.so
1024       ---p  /lib64/tls/libm-2.3.2.so
4          rw-p  /lib64/tls/libm-2.3.2.so
4          rw-p
36         r-xp  /lib64/tls/librtkaio-2.3.2.so
1024       ---p  /lib64/tls/librtkaio-2.3.2.so
4          rw-p  /lib64/tls/librtkaio-2.3.2.so
64         rw-p
1260       r-xp  /lib64/tls/libc-2.3.2.so
1024       ---p  /lib64/tls/libc-2.3.2.so
20         rw-p  /lib64/tls/libc-2.3.2.so
16         rw-p
60         r-xp  /lib64/tls/libpthread-0.60.so
1024       ---p  /lib64/tls/libpthread-0.60.so
4          rw-p  /lib64/tls/libpthread-0.60.so
20         rw-p
31396      r--p  /usr/lib/locale/locale-archive
24         rw-p
Total Virtual Memory = 38816 KB
vgamd126:arijit>

RH7.2 on i686
==========
eurika120:arijit>sleep 400 &
[1] 11065
eurika120:arijit>pmap 11065
Size (KB)  Perm  Associated files (if any)
========== ====  =============================================
12         r-xp  /bin/sleep
4          rw-p  /bin/sleep
8          rwxp
88         r-xp  /lib/ld-2.2.4.so
4          rw-p  /lib/ld-2.2.4.so
4          r--p  /usr/lib/locale/en_US/LC_IDENTIFICATION
4          r--p  /usr/lib/locale/en_US/LC_MEASUREMENT
4          r--p  /usr/lib/locale/en_US/LC_TELEPHONE
4          r--p  /usr/lib/locale/en_US/LC_ADDRESS
4          r--p  /usr/lib/locale/en_US/LC_NAME
4          r--p  /usr/lib/locale/en_US/LC_PAPER
4          r--p  /usr/lib/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES
4          r--p  /usr/lib/locale/en_US/LC_MONETARY
24         r--p  /usr/lib/locale/en_US/LC_COLLATE
4          r--p  /usr/lib/locale/en_US/LC_TIME
4          r--p  /usr/lib/locale/en_US/LC_NUMERIC
4          rw-p
136        r-xp  /lib/i686/libm-2.2.4.so
4          rw-p  /lib/i686/libm-2.2.4.so
28         r-xp  /lib/librt-2.2.4.so
4          rw-p  /lib/librt-2.2.4.so
40         rw-p
1224       r-xp  /lib/i686/libc-2.2.4.so
20         rw-p  /lib/i686/libc-2.2.4.so
16         rw-p
52         r-xp  /lib/i686/libpthread-0.9.so
32         rw-p  /lib/i686/libpthread-0.9.so
172        r--p  /usr/lib/locale/en_US/LC_CTYPE
24         rwxp
Total Virtual Memory = 1936 KB
eurika120:arijit>
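[Aside on the question in the "Strange Virtual Mem Mapping" mail
above: the 1024 KB "---p" regions are, in essence, the no-access gap
the 64-bit runtime linker leaves between a library's text and data
segments, which are laid out on large alignment boundaries.  The
range is mapped with no permissions so the address space stays
reserved and any stray access faults, but it is never backed by
physical pages.  The sketch below, which is not from the thread,
creates the same kind of mapping by hand so it can be inspected with
pmap; the 1 MB size is chosen only to match the output above.

    /* protnone.c - reserve address space with a no-permission
     * private mapping, the same kind of region that shows up as
     * "---p" in pmap.  The 1 MB size only mirrors the output above. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        unsigned long len = 1024 * 1024;
        void *gap = mmap(NULL, len, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (gap == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("pid %d: reserved %lu KB at %p with no permissions\n",
               (int)getpid(), len / 1024, gap);
        /* Dereferencing 'gap' here would SIGSEGV: the region costs
         * address space only, never physical memory. */
        pause();   /* park so "pmap <pid>" can be run from a shell */
        return 0;
    }

Run it, then run "pmap <pid>" from another shell: a 1024 KB anonymous
"---p" line should appear while resident memory stays unchanged.  The
library gaps in the sleep output behave the same way, inflating
virtual size without wasting RAM.]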
From arjanv at redhat.com  Fri Sep 30 13:29:59 2005
From: arjanv at redhat.com (Arjan van de Ven)
Date: Fri, 30 Sep 2005 15:29:59 +0200
Subject: RH30: Virtual Mem shot by locale-archive
In-Reply-To: <20050930120231.96032.qmail@web33307.mail.mud.yahoo.com>
References: <20050930120231.96032.qmail@web33307.mail.mud.yahoo.com>
Message-ID: <1128086999.3012.12.camel@laptopd505.fenrus.org>

On Fri, 2005-09-30 at 05:02 -0700, Arijit Das wrote:
> I have RH3.0 installed on an AMD64 machine.
>
> In this system, when I look at the virtual address space mappings of a

> This seems like a huge memory requirement for each small process

*virtual* memory.  Of which an amd64 has a LOT (unlike a 32-bit
system).

> executed on the RH3.0 system, and hence it shoots up the memory
> requirement of the entire system, because the mapped region
> /usr/lib/locale/locale-archive is privately mapped.

but.. it's not used for the languages you don't use.  It's not in
actual physical memory.

Also, why do you think it's privately mapped?  Unless it's written
to, it's Copy-On-Write for sure..
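[Aside illustrating the reply above: a large private, read-only
mapping such as locale-archive raises only the process's *virtual*
size; resident memory grows page by page, and only for the pages
actually read.  The sketch below is not from the thread, and the
locale-archive path is just an example target (any large file would
do).  It prints VmSize and VmRSS from /proc/self/status before the
mapping, after the mapping, and after touching a single page.

    /* mapcost.c - show that a big MAP_PRIVATE, read-only mapping
     * costs virtual address space, not resident memory, until pages
     * are read.  The file below is only an example; any large file
     * works. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    static void show(const char *tag)
    {
        char line[128];
        FILE *f = fopen("/proc/self/status", "r");

        if (f == NULL)
            return;
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "VmSize", 6) == 0 ||
                strncmp(line, "VmRSS", 5) == 0)
                printf("%-11s %s", tag, line);
        fclose(f);
    }

    int main(void)
    {
        const char *path = "/usr/lib/locale/locale-archive"; /* example */
        struct stat st;
        volatile char sink = 0;
        char *map;
        int fd = open(path, O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(path);
            return 1;
        }
        show("before map:");
        map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        show("after map:");   /* VmSize jumps by the file size,
                                * VmRSS does not                  */
        sink += map[0];        /* fault in a single page           */
        show("after read:");   /* VmRSS grows by a page or so,
                                * not by ~30 MB                    */
        return 0;
    }

The same reasoning applies to the 31396 KB r--p locale-archive line
in the pmap output: it accounts for most of the 38816 KB of "Total
Virtual Memory", but because the mapping is read-only the
copy-on-write case never even triggers, and the resident cost is
limited to the locale data glibc actually reads.]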