Does Red Hat Linux log all hardware events/issues/errors in /var/log/mcelog?

ajay raghuraj ajay.raghuraj at gmail.com
Wed Mar 14 01:55:10 UTC 2012


Also, if you are using RHEL 5 or 6, you might want to check under /sys or use
the systool command to extract information about the HBAs.
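
For example, when the HBA driver registers with the kernel's FC transport
class, each port shows up under /sys/class/fc_host. Here is a rough sketch
that prints one line per port. The function name fc_port_report is made up
for illustration, and the optional directory argument is only there so it can
be pointed at a test tree. `systool -c fc_host -v` shows the same attributes
in more detail.

```shell
# Print "<host> <WWPN> <port state> <speed>" for every FC HBA port.
# Assumes the driver exposes the standard fc_host sysfs attributes.
fc_port_report() {
    fc_root="${1:-/sys/class/fc_host}"
    fc_found=0
    for fc_host in "$fc_root"/host*; do
        # Skip the unexpanded glob when no FC hosts are present
        [ -d "$fc_host" ] || continue
        fc_found=1
        printf '%s %s %s %s\n' \
            "$(basename "$fc_host")" \
            "$(cat "$fc_host/port_name" 2>/dev/null)" \
            "$(cat "$fc_host/port_state" 2>/dev/null)" \
            "$(cat "$fc_host/speed" 2>/dev/null)"
    done
    if [ "$fc_found" -eq 0 ]; then
        echo "no fc_host entries under $fc_root" >&2
        return 1
    fi
}
```

Run it with no arguments on a real host. A port state other than Online is
worth a closer look.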

- Ajay
 On Mar 14, 2012 9:51 AM, "ajay raghuraj" <ajay.raghuraj at gmail.com> wrote:

> Hi
>
> Try ethtool or peek into /proc to check the network interface stats. You
> could also use the HBA vendor's management software, installed on the OS,
> to check the hardware.
>
> Example: if the HBA is from Emulex, you could use the HBAnyware CLI
> commands to run a health check.
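
As a sketch of the /proc approach: /proc/net/dev carries per-interface error
and drop counters, and a small awk wrapper can pull them out. The function
name nic_error_report is hypothetical, and the optional file argument exists
only for testing. `ethtool -S eth0` gives driver-level counters and
`ethtool eth0` shows link status (eth0 is an assumed interface name).

```shell
# Print RX/TX error and drop counters for each interface in /proc/net/dev.
nic_error_report() {
    awk 'NR > 2 {
        # Older kernels can print "eth0:12345" with no space after the colon
        sub(/:/, " ")
        # Now $1 = interface, $2..$9 = receive fields, $10..$17 = transmit fields
        printf "%s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
    }' "${1:-/proc/net/dev}"
}
```

Counters that keep climbing between two runs usually point at a cabling,
switch port or NIC problem rather than a one-off event.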
>
> - Ajay
> On Mar 13, 2012 9:39 PM, "unix syzadmin" <unixsyzadmin at gmail.com> wrote:
> >
> > Thanks.
> > I have downloaded and installed OpenManage from Dell.
> > The following commands report whether the system components are healthy:
> > omreport chassis - health of all main components of the system chassis
> > omreport chassis processors - cpu health
> > omreport chassis memory - memory health
> > omreport chassis pwrsupplies - power supply health
> > omreport storage controller - raid controller health
> >
> > However, this leaves out the integrated NIC ports and the HBAs.
> > What Linux / Dell OpenManage commands can be used to confirm that those
> > are healthy as well?
> >
> > Thanks,
> >
> >
> > On Mon, Mar 12, 2012 at 9:00 PM, Paul Tader <ptader at linuxscope.com> wrote:
> >
> > > On 3/12/12 5:28 PM, unix syzadmin wrote:
> > >
> > >> Hi,
> > >>
> > > >> We run Red Hat Linux on Intel hardware (mostly Dell, lately Dell R710s).
> > > >> We want to be able to catch any hardware issues when they occur so we
> > > >> can act on them as quickly as possible.
> > > >>
> > > >> My understanding is that all hardware events/issues/errors are logged
> > > >> in /var/log/mcelog (Machine Check Events log). Is this correct? I
> > > >> can't stress this enough: does it log all hardware issues
> > > >> (CPU, memory, disk, Ethernet, fibre/HBA, etc.)?
> > >>
> > >> Thanks,
> > >>
> > >
> > > > I've used mcelog to catch some CPU events, but I think you might want
> > > > to check out Dell's OpenManage client. It will report/monitor a lot
> > > > more information.
> > >
> > >
> > > > http://linux.dell.com/wiki/index.php/Repository/OMSA
> > >
> > >
> > > To install:
> > >
> > > > # wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
> > > # yum install srvadmin-base
> > > # yum install srvadmin-storageservices
> > >
> > > (logout / login for environment variables to take effect)
> > >
> > > > # /opt/dell/srvadmin/sbin/srvadmin-services.sh start
> > > ...
> > >
> > > # omreport chassis
> > > Health
> > >
> > > Main System Chassis
> > >
> > > SEVERITY : COMPONENT
> > > Ok       : Fans
> > > Ok       : Intrusion
> > > Ok       : Memory
> > > Ok       : Power Supplies
> > > Ok       : Processors
> > > Ok       : Temperatures
> > > Ok       : Voltages
> > > Ok       : Hardware Log
> > > Ok       : Batteries
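
To catch problems quickly, the severity table above can be checked from cron.
This is a minimal sketch that assumes the "SEVERITY : COMPONENT" layout shown
in this output. The function name omreport_not_ok is invented for
illustration. It reads omreport output on stdin, prints every component whose
severity is not "Ok", and returns non-zero if it found any.

```shell
# List components whose severity is anything other than "Ok".
# Exit status: 0 if everything is Ok, 1 otherwise.
omreport_not_ok() {
    awk -F ' *: *' '
        # Rows after this header are "<severity> : <component>"
        /^SEVERITY/ { in_table = 1; next }
        in_table && NF == 2 && $1 != "Ok" { print $2 ": " $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}
```

Typical use would be something like `omreport chassis | omreport_not_ok`,
with the non-zero exit status triggering a mail or a monitoring alert.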
> > >
> > > # omreport chassis temps
> > > Temperature Probes Information
> > >
> > > ------------------------------**------
> > > Main System Chassis Temperatures: Ok
> > > ------------------------------**------
> > >
> > > Index                     : 0
> > > Status                    : Ok
> > > Probe Name                : System Board Ambient Temp
> > > Reading                   : 20.0 C
> > > Minimum Warning Threshold : 8.0 C
> > > Maximum Warning Threshold : 42.0 C
> > > Minimum Failure Threshold : 3.0 C
> > > Maximum Failure Threshold : 47.0 C
> > >
> > > # omreport storage pdisk controller=0
> > >
> > > List of Physical Disks on Controller SAS 6/iR Integrated (Embedded)
> > >
> > > Controller SAS 6/iR Integrated (Embedded)
> > > ID                        : 0:0:0
> > > Status                    : Ok
> > > Name                      : Physical Disk 0:0:0
> > > State                     : Online
> > > Failure Predicted         : No
> > > Certified                 : Not Applicable
> > > Encryption Capable        : No
> > > Secured                   : Not Applicable
> > > Progress                  : Not Applicable
> > > Bus Protocol              : SAS
> > > Media                     : HDD
> > > Capacity                  : 67.75 GB (72746008576 bytes)
> > > Used RAID Disk Space      : 67.75 GB (72746008576 bytes)
> > > Available RAID Disk Space : 0.00 GB (0 bytes)
> > > Hot Spare                 : No
> > > Vendor ID                 : DELL
> > > Product ID                : ST973402SS
> > > Revision                  : S229
> > >
> > > <snip>
> > >
> > > You get the idea.
> > >
> > > --
> > > redhat-list mailing list
> > > > unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> > > > https://www.redhat.com/mailman/listinfo/redhat-list
> > >
>
