RHEL 5.5 Oracle RAC cluster rebooted due to processor hung!!

Georgios Magklaras georgios at biotek.uio.no
Thu Jun 21 08:04:38 UTC 2012


On 06/18/2012 08:44 AM, raj sourabh wrote:
> Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955
> Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: making interface eth2 the new active one.
> Jun 10 19:22:34 prddbs02 kernel: device eth2 entered promiscuous mode
> Jun
Before the soft lockup, what exactly caused the NETDEV WATCHDOG to
lose eth0?
For the __smp_call_function_many lockup, there were many fixes between
5.5 and 5.6 in relation to multipath and other third-party drivers
that caused similar lockups. (Why are you on 5.5 and not at least 5.6,
and which kernel are you running?)
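
To line up the order of events before the lockup, something along these lines might help. It is only a rough sketch: the function name `timeline`, the pattern names and the `SAMPLE` list are made up for illustration, and the regular expressions assume the syslog layout seen in the quoted messages (one entry per line).

```python
import re

# Leading syslog stamp, host, then the rest of the message.
# Assumes the classic "Mon DD HH:MM:SS host ..." layout from the quoted log.
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")

# Events of interest; the keys are illustrative labels, not kernel terms.
PATTERNS = {
    "watchdog": re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out"),
    "bond_down": re.compile(
        r"bonding: (\w+): link status definitely down for interface (\w+)"
    ),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s"),
}


def timeline(lines):
    """Return (stamp, host, kind, details) for every matching log line, in order."""
    events = []
    for line in lines:
        m = STAMP.match(line.strip())
        if not m:
            continue  # not a syslog-style line
        stamp, host, rest = m.groups()
        for kind, pattern in PATTERNS.items():
            hit = pattern.search(rest)
            if hit:
                events.append((stamp, host, kind, hit.groups()))
    return events


# A few lines copied from the quoted messages, one entry per line.
SAMPLE = [
    "Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955",
    "Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it",
    "Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]",
]

for event in timeline(SAMPLE):
    print(event)
```

Passing an open file such as /var/log/messages instead of `SAMPLE` should work the same way, since the function only iterates over lines.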

Best regards,

-- 
George Magklaras PhD
RHCE no: 805008309135525

Senior Systems Engineer/IT Manager
Biotechnology Center of Oslo and
the Norwegian Center for Molecular Medicine
EMBnet TMPC Chair

http://folk.uio.no/georgios



> 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]
> Jun 10 19:22:46 prddbs02 kernel: CPU 2:
> Jun 10 19:22:46 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
> Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
> Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
> Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
> Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
> Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
> Jun 10 19:22:46 prddbs02 kernel: RAX: 0000000000000006 RBX: 0000000000000007 RCX: 0000000000000000
> Jun 10 19:22:46 prddbs02 kernel: RDX: 00000000000000ff RSI: 00000000000000ff RDI: 00000000000000c0
> Jun 10 19:22:46 prddbs02 kernel: RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000038
> Jun 10 19:22:46 prddbs02 kernel: R10: ffff8108e79a5b98 R11: 0000000000000000 R12: ffffffff80143e16
> Jun 10 19:22:46 prddbs02 kernel: R13: 0000000000000003 R14: ffff810366ec2c58 R15: ffff81093da13340
> Jun 10 19:22:46 prddbs02 kernel: FS: 000000004189d940(0063) GS:ffff81012071cec0(0000) knlGS:0000000000000000
> Jun
...
> Thanks for any help in advance :)
>
> Regards,
> Raj




