From Axel.Thimm at ATrpms.net  Tue Jul 13 15:26:11 2004
From: Axel.Thimm at ATrpms.net (Axel Thimm)
Date: Tue, 13 Jul 2004 17:26:11 +0200
Subject: [LATE POST-ANNOUNCE] ATrpms for FC2/x86_64
Message-ID: <20040713152611.GC24735@neu.nirvana>

Hi,

this is no longer hot off the press, but still kind of a well kept
secret:

Courtesy of Heiko Appel, who made a Dual-Opteron box available for
ATrpms, almost all ATrpms packages could be rebuilt for x86_64 by the
end of June.

The few packages refusing to build natively on x86_64 are those
requiring MMX (jpeg-mmx) or i386 assembly (clisp). Also check the note
on apt below.

The packages are not in a separate distribution; for manual browsing
they sit side by side with the i386 packages:

http://ATrpms.net/dist/fc2/

apt and yum repositories are similar to the i386 hierarchy:

# ATrpms for Fedora Core 2
# Possible sections: at-stable, at-good, at-testing, at-bleeding
rpm     http://apt.physik.fu-berlin.de fedora/2/en/x86_64 at-stable
#rpm-src http://apt.physik.fu-berlin.de fedora/2/en/x86_64 at-stable

[at-stable]
name=ATrpms for Fedora Core $releasever stable
baseurl=http://apt.physik.fu-berlin.de/fedora/$releasever/en/$basearch/at-stable

Note on apt: There are currently unresolved issues with biarch
setups. If you don't need the grub rpm, you can remove all i386 rpms
and apt will work. I tried an alternative approach using
AllowDuplicates entries, but this did not work (AllowDuplicates on
glibc makes the i386 provides invisible to apt, which then detects
broken dependencies in the grub rpm).

Until apt goes biarch, use yum or up2date! :)

Feedback on the ATrpms lists (http://lists.atrpms.net/), or the common
bug tracker (http://bugzilla.atrpms.net/), as well as PM is welcome :)

Enjoy! :)
-- 
Axel.Thimm at ATrpms.net

From Axel.Thimm at ATrpms.net  Tue Jul 13 15:40:22 2004
From: Axel.Thimm at ATrpms.net (Axel Thimm)
Date: Tue, 13 Jul 2004 17:40:22 +0200
Subject: nvidia-graphics6106 drivers rpm rebuilt for inclusion of x86_64.
Message-ID: <20040713154022.GD24735@neu.nirvana>

Well, I could swear that the 6106 driver for x86_64 wasn't there a
couple of days ago, even if it says it was released two weeks ago.

Anyway, this is the first "common" release of the nvidia driver for
both i386 and x86_64, so I merged the src.rpms and rebuilt new rpms
for both archs (otherwise I would have to use release tag tricks to
make the src.rpms for the different archs distinguishable).

I don't have x86_64 hardware with nvidia graphics on it, so I'd
welcome any feedback on the x86_64 drivers. Thanks!

Enjoy!
-- 
Axel.Thimm at ATrpms.net

From suneel_kumi at yahoo.co.in  Wed Jul 14 05:38:23 2004
From: suneel_kumi at yahoo.co.in (=?iso-8859-1?q?sunil=20kumar?=)
Date: Wed, 14 Jul 2004 06:38:23 +0100 (BST)
Subject: amd64-list Digest, Vol 5, Issue 1
In-Reply-To: <20040713160002.BAFE173EFA@hormel.redhat.com>
Message-ID: <20040714053823.2242.qmail@web8311.mail.in.yahoo.com>

Plzz unsubscribe from the list. my mail box can't handle the frequency
of ur mail. Do it earliest.

cheers,
suneel.

From kajyuneko at excite.co.jp  Thu Jul 15 15:11:23 2004
From: kajyuneko at excite.co.jp (kajyuneko at excite.co.jp)
Date: 16 Jul 2004 00:11:23 +0900
Subject: FedoraCore3 test1
Message-ID: <20040715151123.52319.qmail@asp111.mail.excite.co.jp>

Hello,

I am a Japanese Fedora Core 2 Development user.
I cannot install Fedora Core 3 Test3: the Anaconda installer does not run.

My PC spec:
Athlon 64 3200+
AOpen AK86-L
NVIDIA GeForce 5900XT

How can I install Fedora Core 3 Test3?

I am sorry that my English is poor.

From kernel at linuxfarms.com  Fri Jul 16 03:47:17 2004
From: kernel at linuxfarms.com (Arthur Perry)
Date: Fri, 16 Jul 2004 03:47:17 -0000
Subject: GART Error 11
In-Reply-To: 
References: 
Message-ID: 

Hi Saurabh,

I am working on this issue as we speak.
It is interesting that your machine crashes entirely with iommu disabled.

I am starting to think there is more to this than just the kernel
misreporting other hardware errors (being improperly decoded as GART
errors).
On my machine, I am actually getting GART errors on 3 out of 4 CPUs when I
use RedHat's 2.4.21-9EL kernel. This same kernel when rebuilt from source,
however, will not produce GART errors when built without AGP support.

Here is my Extended error code (bits 19-16 on 0:[18,19,1b]:3 at offset 0x44):
0101 = GART error

So, this is not a translation issue on my side.

Can you do this for me?

pcitweak -r 0:18:3 0x44
and
pcitweak -r 0:19:3 0x44

Thanks!

Arthur Perry
Lead Linux Developer / Linux Systems Architect
Validation, CSU Celestica
Sair/Linux Gnu Certified Professional
Providing professional Linux solutions for 7+ years

On Tue, 1 Jun 2004, Saurabh Barve wrote:

> Hi,
>
> I know this has been posted before on this list, but the solution
> suggested does not seem to work for me.
>
> I have a dual Opteron system with 8 GB of RAM. I am running RHEL 3.0 AS on
> it. The kernel version is 2.4.21-4.ELsmp. The motherboard I am using is
> the Tyan Thunder K8S Pro - 2882 motherboard.
>
> I am getting the following error every two minutes or so:
>
> GART error 11
> Lost an northbridge error
> NB error address some-hex-number
> Error uncorrected
>
> I checked the various postings on the list, and someone suggested that
> passing iommu=off option to the kernel solved the problem for him.
> However, when I tried that, it got the kernel to panic. I read somewhere
> that a newer kernel would fix these 'bugs' in the default RHEL kernel.
> However, I am using the onboard SATA controller for my hard disks. This
> requires binary drivers from Tyan. I already downloaded a newer kernel,
> however, it breaks the drivers, so I can't boot into the new kernel.
>
> Here is my output from lspci:
>
> 00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
> 00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
> 00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
> 00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
> 00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
> 00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
> 00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
> 02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
> 02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
> 03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
> 03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
> 03:05.0 Unknown mass storage controller: Silicon Image, Inc. (formerly CMD Technology Inc) Silicon Image SiI 3114 SATARaid Controller (rev 02)
> 03:06.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
> 03:08.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 10)
>
> The dmesg output was too large to include inline, so I am attaching it as
> a text file.
>
> I tried passing the following options to the kernel:
>
> iommu=noagp
> iommu=noforce
> iommu=off (results in kernel-panic)
> mce=off
> mce=0
>
> I tried all the above in various combinations, but none of them worked.
> The machine doesn't crash, and everything else seems to work fine, but I'd
> like to get rid of these errors.
>
> There are some snippets from the dmesg output that I found to be of
> interest:
>
> ------------------------------------------------------------
> Linux agpgart interface v0.99 (c) Jeff Hartmann
> agpgart: Maximum main memory to use for agp memory: 7956M
> agpgart: no supported devices found.
> PCI-DMA: Disabling AGP.
> PCI-DMA: aperture base @ 10000000 size 65536 KB
> PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> -----------------------------------------------------------
>
> -----------------------------------------------------------
>
> GART error 11
> Lost an northbridge error
> NB error address 00000000fbfe4398
> Error uncorrected
> Northbridge status a40000000005001b
>
> ----------------------------------------------------------
>
>
> Any suggestions?
>
> Thanks,
> Saurabh.
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

From kernel at linuxfarms.com  Fri Jul 16 03:47:17 2004
From: kernel at linuxfarms.com (Arthur Perry)
Date: Fri, 16 Jul 2004 03:47:17 -0000
Subject: GART Error 11
In-Reply-To: 
References: 
Message-ID: 

Hi Saurabh,

I almost forgot.
Can you also tell me which AMD CPUs you are using?
Preferably by number if you know (starts with OSA I believe), or at least
the CPU speed.
Thanks!

Arthur Perry
Lead Linux Developer / Linux Systems Architect
Validation, CSU Celestica
Sair/Linux Gnu Certified Professional
Providing professional Linux solutions for 7+ years
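
For readers following the decoding: the dmesg snippet in this thread prints "Northbridge status a40000000005001b", and the extended error code is described as bits 19-16 of the status word, with 0101 meaning GART error. A minimal sketch of that bit extraction, assuming the printed value is the high 32-bit word followed by the low 32-bit word; the function names are illustrative and not taken from any kernel source:

```python
# Sketch only: split the 64-bit northbridge status printed in dmesg into
# two 32-bit words and pull out the extended error code (bits 19-16 of
# the low word). The names below are hypothetical helpers.

def split_status(status_hex):
    """Return (high, low) 32-bit words of a 16-digit hex status string."""
    value = int(status_hex, 16)
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


def extended_error_code(status_low):
    """Return bits 19-16 of the low status word."""
    return (status_low >> 16) & 0xF


high, low = split_status("a40000000005001b")
ext = extended_error_code(low)
print(f"high=0x{high:08x} low=0x{low:08x} ext={ext:04b}")
# prints: high=0xa4000000 low=0x0005001b ext=0101
```

The extracted value 0101 agrees with the "GART error" reading given in the thread.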

From sa at atmos.colostate.edu  Fri Jul 16 03:47:30 2004
From: sa at atmos.colostate.edu (Saurabh Barve)
Date: Fri, 16 Jul 2004 03:47:30 -0000
Subject: GART Error 11
In-Reply-To: 
Message-ID: 

Arthur,

I'll list all the information I have right off the bat:

AMD Opteron Model 246, 1 MB L2 Cache 64-bit processor

Model               : AMD Opteron Model 246
Core                : Hammer
Operating Frequency : 2 GHz
Cache               : L1/128K, L2/1024K
Socket              : Socket 940

Is that info enough?

I just remembered looking at /var/log/dmesg again. There was a line that
said that IOMMU was not enabled in my BIOS, and that I should enable it.
However, I can't see any option in my BIOS for enabling/disabling IOMMU.

Thanks,
Saurabh.

-- 
===============================================================================
Saurabh Barve                              Phone:
System Administrator/Data Specialist       970-491-7714 (voice)
Montgomery Research Group,                 970-491-8449 (Fax)
Atmospheric Sciences Department,           Fort Collins, Colorado
Colorado State University

Mail : sa at atmos.colostate.edu
Web  : http://fjortoft.atmos.colostate.edu/~sa

From sa at atmos.colostate.edu  Fri Jul 16 03:47:30 2004
From: sa at atmos.colostate.edu (Saurabh Barve)
Date: Fri, 16 Jul 2004 03:47:30 -0000
Subject: GART Error 11
In-Reply-To: 
Message-ID: 

Arthur,

Here are the results that I got:

> Can you do this for me?
>
> pcitweak -r 0:18:3 0x44

0x02400040

> and
> pcitweak -r 0:19:3 0x44

0x02400040

Hope this helps,
Saurabh.
> >
>

--
===============================================================================
Saurabh Barve                            Phone:
System Administrator/Data Specialist     970-491-7714 (voice)
Montgomery Research Group,               970-491-8449 (Fax)
Atmospheric Sciences Department,
Fort Collins, Colorado
Colorado State University

Mail : sa at atmos.colostate.edu
Web  : http://fjortoft.atmos.colostate.edu/~sa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From kernel at linuxfarms.com Fri Jul 16 03:50:55 2004
From: kernel at linuxfarms.com (Arthur Perry)
Date: Fri, 16 Jul 2004 03:50:55 -0000
Subject: GART Error 11
In-Reply-To:
References:
Message-ID:

Hello,

Oops. Sorry I have made a mistake in all of my statements below.
It was after 5pm yesterday, and it was a long day...

It's not offset 0x44 that we are interested in.
My listings were at offset 0x48, which is MCA NB Status Low Register.
Sorry, did not mean to confuse anybody.

So Saurabh, can you please do this again with the corrected lines?

pcitweak -r 0:18:3 0x48
and
pcitweak -r 0:19:3 0x48

While you are at it, can you send us status high as well?

pcitweak -r 0:18:3 0x4c
and
pcitweak -r 0:19:3 0x4c

Thanks, and sorry about the confusion.

Arthur Perry


On Tue, 1 Jun 2004, Saurabh Barve wrote:

> Arthur,
>
> Here are the results that I got
>
> > Can you do this for me?
> >
> > pcitweak -r 0:18:3 0x44
>
> 0x02400040
>
> > and
> > pcitweak -r 0:19:3 0x44
>
> 0x02400040
>
> Hope this helps,
> Saurabh.
>
> >
> > Thanks!
> > > > > > Arthur Perry > > Lead Linux Developer / Linux Systems Architect > > Validation, CSU Celestica > > Sair/Linux Gnu Certified Professional > > Providing professional Linux solutions for 7+ years > > > > On Tue, 1 Jun 2004, Saurabh Barve wrote: > > > > > Hi, > > > > > > I know this has been posted before on this list, but the solution > > > suggested does not seem to work for me. > > > > > > I have a dual opteron system with 8 GB of RAM. I am running RHEL 3.0 AS on > > > it. The kernel version is 2.4.21-4.ELsmp. The motherboard I am using is > > > the Tyan Thunder K8S Pro - 2882 motherboard. > > > > > > I am getting the following error every two minutes or so: > > > > > > GART error 11 > > > Lost an northbridge error > > > NB error address some-hex-number > > > Error uncorrected > > > > > > I checked the various postings on the list, and someone suggested that > > > passing iommu=off option to the kernel solved the problem for him. > > > However, when I tried that, it got the kernel to panic. I read somewhere > > > that a newer kernel would fix these 'bugs' in the default RHEL kernel. > > > However, I am using the onboard SATA controller for my hard disks. This > > > requires binary drivers from Tyan. I already downloaded a newer kernel, > > > however, it breaks the drivers, so I can't boot into the new kernel. 
> > >
> >
>
> --
> ===============================================================================
> Saurabh Barve                            Phone:
> System Administrator/Data Specialist     970-491-7714 (voice)
> Montgomery Research Group,               970-491-8449 (Fax)
> Atmospheric Sciences Department,
> Fort Collins, Colorado
> Colorado State University
>
> Mail : sa at atmos.colostate.edu
> Web  : http://fjortoft.atmos.colostate.edu/~sa
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From sa at atmos.colostate.edu Fri Jul 16 03:51:22 2004
From: sa at atmos.colostate.edu (Saurabh Barve)
Date: Fri, 16 Jul 2004 03:51:22 -0000
Subject: GART Error 11
In-Reply-To:
Message-ID:

Sorry about the delay in my reply. Just got in to work!
Here is the output:

> pcitweak -r 0:18:3 0x48

0x0005001B

> and
> pcitweak -r 0:19:3 0x48

0x00000000

> While you are at it, can you send us status high as well?
>
> pcitweak -r 0:18:3 0x4c

0xA4000000

> and
> pcitweak -r 0:19:3 0x4c

0x00000000

I don't know if this would help, but below is a part of my cronwatch log:

--------------------- Init Begin ------------------------

**Unmatched Entries**
Trying to re-exec init
Trying to re-exec init

---------------------- Init End -------------------------


--------------------- Kernel Begin ------------------------


WARNING: Kernel Errors Present
   uteval-0098: *** Error: Method executio...: 4Time(s)
   psparse-1121: *** Error: Method executio...: 8Time(s)
   Error uncorrected...: 538Time(s)
   GART error 11...: 538Time(s)
   Lost an northbridge error...: 538Time(s)
   NB error address 00000000...: 538Time(s)

---------------------- Kernel End -------------------------


--------------------- ModProbe Begin ------------------------


Can't locate these modules:
   char-major-10-134: 4 Time(s)
   sound-service-0-3: 6 Time(s)
   xp0: 3 Time(s)
   sound-slot-0: 6 Time(s)
   char-major-188: 15 Time(s)

---------------------- ModProbe End -------------------------


Thanks,
Saurabh.

--
===============================================================================
Saurabh Barve                            Phone:
System Administrator/Data Specialist     970-491-7714 (voice)
Montgomery Research Group,               970-491-8449 (Fax)
Atmospheric Sciences Department,
Fort Collins, Colorado
Colorado State University

Mail : sa at atmos.colostate.edu
Web  : http://fjortoft.atmos.colostate.edu/~sa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From kernel at linuxfarms.com Fri Jul 16 03:51:44 2004
From: kernel at linuxfarms.com (Arthur Perry)
Date: Fri, 16 Jul 2004 03:51:44 -0000
Subject: GART Error 11
In-Reply-To:
References:
Message-ID:

Hi Saurabh,

Thanks. It looks like you also have true GART errors as reported by hardware, on CPU0.
So our common failure mode here is actual GART errors and not something else being reported as a GART error because of erroneous kernel translation.

It's possible that we are using a device driver somewhere that is misbehaving, which is using the GART or IOMMU improperly somehow, or my guess is that it may be the actual AGP device driver used by RedHat.
i.e., they may not have patched in the most recent version, which may contain a lot of fixes.

Thanks for your feedback.

As for making your messages go away, I would tell you to disable the GartTableWalk in MCE, but that does not seem to work on my machine.
I'll let you know what does work without turning off Northbridge MC* entirely once I discover it.

-Arthur Perry



On Wed, 2 Jun 2004, Saurabh Barve wrote:

> Sorry about the delay in my reply. Just got in to work!
> Here is the output:
>
> > pcitweak -r 0:18:3 0x48
>
> 0x0005001B
>
> > and
> > pcitweak -r 0:19:3 0x48
>
> 0x00000000
>
> > While you are at it, can you send us status high as well?
>
> --
> ===============================================================================
> Saurabh Barve                            Phone:
> System Administrator/Data Specialist     970-491-7714 (voice)
> Montgomery Research Group,               970-491-8449 (Fax)
> Atmospheric Sciences Department,
> Fort Collins, Colorado
> Colorado State University
>
> Mail : sa at atmos.colostate.edu
> Web  : http://fjortoft.atmos.colostate.edu/~sa
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From kernel at linuxfarms.com Fri Jul 16 03:51:53 2004
From: kernel at linuxfarms.com (Arthur Perry)
Date: Fri, 16 Jul 2004 03:51:53 -0000
Subject: GART Error 11
In-Reply-To:
References:
Message-ID:

Or actually, I should say, you "most likely" have this as well, since I asked you to gather the information through the more quirky interface.
The bits for this error case match perfectly, so I'd say it's probably a good bet.

Arthur Perry


On Wed, 2 Jun 2004, Arthur Perry wrote:

> Hi Saurabh,
>
> Thanks. It looks like you also have true GART errors as reported by hardware, on CPU0.
> So our common failure mode here is actual GART errors and not something else being reported as a GART error because of erroneous kernel translation.
>
> It's possible that we are using a device driver somewhere that is misbehaving, which is using the GART or IOMMU improperly somehow, or my guess is that it may be the actual AGP device driver used by RedHat.
> i.e., they may not have patched in the most recent version, which may contain a lot of fixes.
>
> Thanks for your feedback.
> >
> > --
> > ===============================================================================
> > Saurabh Barve                            Phone:
> > System Administrator/Data Specialist     970-491-7714 (voice)
> > Montgomery Research Group,               970-491-8449 (Fax)
> > Atmospheric Sciences Department,
> > Fort Collins, Colorado
> > Colorado State University
> >
> > Mail : sa at atmos.colostate.edu
> > Web  : http://fjortoft.atmos.colostate.edu/~sa
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
> --
> amd64-list mailing list
> amd64-list at redhat.com
> https://www.redhat.com/mailman/listinfo/amd64-list
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From sa at atmos.colostate.edu Fri Jul 16 03:51:58 2004
From: sa at atmos.colostate.edu (Saurabh Barve)
Date: Fri, 16 Jul 2004 03:51:58 -0000
Subject: GART Error 11
In-Reply-To:
Message-ID:

Thanks Arthur!

The machine seems to work except for the errors. Is there a way to update
the drivers in the OS without having to upgrade the kernel? I guess we'll
first have to find out which driver is misbehaving!!

I'll try the 'mce=off' and 'iommu=off' options again. I'll keep you
posted.

Thanks again,
Saurabh.


On Wed, 2 Jun 2004, Arthur Perry wrote:

> Hi Saurabh,
>
> Thanks. It looks like you also have true GART errors as reported by hardware, on CPU0.
> So our common failure mode here is actual GART errors and not something else being reported as a GART error because of erroneous kernel translation.
> > It's possible that we are using a device driver somewhere that is misbehaving, which is using the GART or IOMMU improperly somehow, or my guess is that is may be the actual AGP device driver used by RedHat. > ie, they may have not patched in the most recent version that may contain a lot of fixes. > > Thanks for your feedback. > > As of making your messages go away, I would tell you to disable the GartTableWalk in MCE, but that does not seem to work on my machine. > I'll let you know what does work without turning off Northbridge MC* entirely once I discover it. > > -Arthur Perry > > > > On Wed, 2 Jun 2004, Saurabh Barve wrote: > > > Sorry about the delay in my reply. Just got in to work! > > Here is the output: > > > > > pcitweak -r 0:18:3 0x48 > > > > 0x0005001B > > > > > and > > > pcitweak -r 0:19:3 0x48 > > > > 0x00000000 > > > > > While you are at it, can you send us status high as well? > > > > > > pcitweak -r 0:18:3 0x4c > > > > 0xA4000000 > > > > > and > > > pcitweak -r 0:19:3 0x4c > > > > 0x00000000 > > > > I don't know if this would help, but below is a part of my cronwatch log: > > > > --------------------- Init Begin ------------------------ > > > > **Unmatched Entries** > > Trying to re-exec init > > Trying to re-exec init > > > > ---------------------- Init End ------------------------- > > > > > > --------------------- Kernel Begin ------------------------ > > > > > > WARNING: Kernel Errors Present > > uteval-0098: *** Error: Method executio...: 4Time(s) > > psparse-1121: *** Error: Method executio...: 8Time(s) > > Error uncorrected...: 538Time(s) > > GART error 11...: 538Time(s) > > Lost an northbridge error...: 538Time(s) > > NB error address 00000000...: 538Time(s) > > > > ---------------------- Kernel End ------------------------- > > > > > > --------------------- ModProbe Begin ------------------------ > > > > > > Can't locate these modules: > > char-major-10-134: 4 Time(s) > > sound-service-0-3: 6 Time(s) > > xp0: 3 Time(s) > > 
sound-slot-0: 6 Time(s)
> >    char-major-188: 15 Time(s)
> >
> > ---------------------- ModProbe End -------------------------
> >
> >
> > Thanks,
> > Saurabh.
> >
> > --
> > ===============================================================================
> > Saurabh Barve                            Phone:
> > System Administrator/Data Specialist     970-491-7714 (voice)
> > Montgomery Research Group,               970-491-8449 (Fax)
> > Atmospheric Sciences Department,
> > Fort Collins, Colorado
> > Colorado State University
> >
> > Mail : sa at atmos.colostate.edu
> > Web  : http://fjortoft.atmos.colostate.edu/~sa
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>

--
=============================================================================
Saurabh Barve                            Phone:
System Administrator/Data Specialist     970-491-7714 (voice)
Montgomery Research Group,               970-491-8449 (Fax)
Atmospheric Sciences Department,
Fort Collins, Colorado
Colorado State University

Mail : sa at atmos.colostate.edu
Web  : http://fjortoft.atmos.colostate.edu/~sa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From sa at atmos.colostate.edu Tue Jul 20 21:42:52 2004
From: sa at atmos.colostate.edu (Saurabh Barve)
Date: Tue, 20 Jul 2004 15:42:52 -0600
Subject: GART error
Message-ID: <40FD91DC.9090803@atmos.colostate.edu>

Hi,

This is a follow up to my previous message.

I have a dual opteron system with 8 GB of RAM. I am running RHEL 3.0 AS on
it. The original kernel version was 2.4.21-4.ELsmp. The motherboard I am
using is the Tyan Thunder K8S Pro - 2882 motherboard.

I was getting the following error every two minutes or so:

GART error 11
Lost an northbridge error
NB error address some-hex-number
Error uncorrected

In the hope of resolving this problem, I recently upgraded my system to
the 2.4.21-15.ELsmp #1 SMP kernel. However, I keep getting the following
error:

CPU 0: Silent Northbridge MCE
Northbridge status a40000000005001b
    GART TLB error generic level generic
    extended error gart error
    link number 0
    error address valid
    error uncorrected
    previous error lost
    error address 00000000fafe2950

With the upgraded kernel though, the error message does not appear on the
terminal. There is, however, a little catch. The default login screen for
the system is text-based (as opposed to a graphic login screen). When the
text login screen is displayed, the error message appears continuously on
the screen, similar to the previous error. However, the machine does not
beep continuously like it used to earlier. But once the X windows session
is started, the error messages no longer appear on the screen like they
used to earlier. So, if I changed the login screen to a graphical screen,
the message would get totally suppressed. However, the system logs these
messages, so I need to get rid of them.

I wanted to upgrade my kernel to 2.6.x version since I read that the new
kernels have fixes for these kinds of GART errors. I read somewhere that
this was a hardware error. However, I have checked my hardware, and I
don't seem to have any bad RAM or anything.

Also, does anybody know whether upgrading to the 2.6.x kernel would
invalidate the license for RHEL that I bought from Red Hat?

Thanks.

Regards,
Saurabh.
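
[Archive note] The two pcitweak reads earlier in the thread (0x4c = status high, 0x48 = status low) can be joined into the 64-bit "Northbridge status" value the kernel prints. Below is a rough sketch of that decoding, not the kernel's own code. The function name and the tables are made up for illustration. The bit positions are an assumption taken from the generic x86 machine-check status layout (valid = bit 63, overflow = 62, uncorrected = 61, address valid = 58), so the field names need not match the strings a particular kernel prints.

```python
# Sketch only: decode the two 32-bit northbridge status words read with
# pcitweak in this thread. The bit positions are assumptions taken from the
# generic x86 machine-check status layout, not from this kernel's source.

TRANSACTION = ["instruction", "data", "generic", "reserved"]  # TT, bits 3:2
CACHE_LEVEL = ["0", "1", "2", "generic"]                      # LL, bits 1:0
EXTENDED_ERRORS = {0x5: "GART error"}  # only the code seen in this thread


def decode_nb_status(status_high, status_low):
    """Return a dict describing a 64-bit status built from two 32-bit reads."""
    status = (status_high << 32) | status_low
    error_code = status_low & 0xFFFF       # bits 15:0
    extended = (status_low >> 16) & 0xF    # bits 19:16
    return {
        "status": "%016x" % status,
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "address_valid": bool((status >> 58) & 1),
        "extended_error": EXTENDED_ERRORS.get(extended, "unknown (%d)" % extended),
        # TLB-class error codes have the form 0000 0000 0001 TTLL.
        "tlb_error": (error_code >> 4) == 0x001,
        "transaction": TRANSACTION[(error_code >> 2) & 3],
        "cache_level": CACHE_LEVEL[error_code & 3],
    }


if __name__ == "__main__":
    # Offsets 0x4c (high) and 0x48 (low) of device 0:18:3, as posted above.
    for name, value in decode_nb_status(0xA4000000, 0x0005001B).items():
        print("%-15s %s" % (name, value))
```

With the values posted for 0:18:3, the combined status comes out as a40000000005001b, and the low word decodes to a TLB-class error with a generic transaction and generic level, which agrees with the "GART TLB error generic level generic" line above. The second node (0:19:3) read zero in both words, so the valid bit is clear there.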

--
=============================================================================
Saurabh Barve                            Phone:
System Administrator/Data Specialist     970-491-7714 (voice)
Montgomery Research Group,               970-491-8449 (Fax)
Atmospheric Sciences Department,
Fort Collins, Colorado
Colorado State University

Mail : sa at atmos.colostate.edu
Web  : http://fjortoft.atmos.colostate.edu/~sa

From jeff.johnson at wsm.com Thu Jul 22 18:49:20 2004
From: jeff.johnson at wsm.com (Jeff Johnson)
Date: Thu, 22 Jul 2004 11:49:20 -0700
Subject: heavy i/o wait and system hangs dual opteron and pci raid cards
Message-ID: <41000C30.3040507@wsm.com>

Greetings,

Has anyone observed heavy i/o waits, sometimes leading to unrecoverable
hangs, when using pci raid cards (3-Ware, Adaptec, LSI) on dual opteron
motherboards?

I am testing a dual O248 system with a 3-Ware 9500-12 as the boot device.
I have seen 3Ware and other raid cards (Adaptec, LSI) show heavy i/o wait,
but in this case, since / and swap are on the 3Ware, the i/o wait
eventually cripples the system because access to swap and binaries is
hindered.

I have tried 'noapic pci=noapic noacpi' for boot args but it doesn't seem
to do much. If I run iozone on the box ('iozone -a -s 1g') I can cripple
it. Trying to log in to vc2 fails since the system cannot access the login
binary and the process cannot read passwd, shadow, etc.

I am running a 2.4.21-15.ELsmp kernel. I have tried more recent ones but
haven't seen any notable improvement.

Any ideas, comments or suggestions are appreciated.

Jeff

--
Best Regards,

Jeff Johnson
Vice President
Engineering/Technology
Western Scientific, Inc
jeff at wsm.com
http://www.wsm.com

9445 Farnham Street - San Diego, CA 92123
Tel 800.443.6699 +001.858.565.6699
Fax +001.858.565.6938

"Rome did not create a great Empire by holding meetings. They did it by
killing all those who opposed them."