Does Red Hat Linux log all hardware events/issues/errors in /var/log/mcelog?

unix syzadmin unixsyzadmin at gmail.com
Tue Mar 13 13:36:05 UTC 2012


Thanks.
I have downloaded and installed OpenManage from Dell.
The following commands report whether the system components are healthy:
omreport chassis - health of all main components of the system chassis
omreport chassis processors - CPU health
omreport chassis memory - memory health
omreport chassis pwrsupplies - power supply health
omreport storage controller - RAID controller health
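
To act on these reports automatically rather than reading them by eye, the output could be parsed by a small script. The following is only a minimal sketch, assuming the "SEVERITY : COMPONENT" table layout shown in the quoted `omreport chassis` output further down; the function names are hypothetical and not part of OpenManage.

```python
import subprocess


def non_ok_components(report_text):
    """Return (severity, component) pairs from the table whose severity is not 'Ok'."""
    problems = []
    in_table = False
    for line in report_text.splitlines():
        left, sep, right = line.partition(":")
        if not sep:
            continue  # blank lines and headings carry no "severity : component" pair
        severity, component = left.strip(), right.strip()
        if (severity, component) == ("SEVERITY", "COMPONENT"):
            in_table = True  # rows after this header belong to the health table
            continue
        if in_table and severity != "Ok":
            problems.append((severity, component))
    return problems


def chassis_problems():
    """Run `omreport chassis` (assumed to be on PATH) and return its non-Ok rows."""
    result = subprocess.run(
        ["omreport", "chassis"], capture_output=True, text=True, check=True
    )
    return non_ok_components(result.stdout)
```

Calling chassis_problems() from cron and mailing any non-empty result would be one way to get notified; an empty list means every listed component reported Ok.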

However, this leaves out the integrated NIC ports and the HBAs.
What Linux or Dell OpenManage commands can be used to confirm that those are
healthy as well?

Thanks,


On Mon, Mar 12, 2012 at 9:00 PM, Paul Tader <ptader at linuxscope.com> wrote:

> On 3/12/12 5:28 PM, unix syzadmin wrote:
>
>> Hi,
>>
>> We run Red Hat Linux on Intel hardware (mostly Dell, lately Dell R710s).
>> We want to be able to catch any hardware issues when they occur to act on
>> them as quickly as possible.
>>
>> My understanding is that all hardware events/issues/errors are logged in
>> /var/log/mcelog (Machine Check Events log).  Is this correct?  I can't
>> stress this enough: does it log all hardware issues
>> (CPU, memory, disk, Ethernet, fibre/HBA, etc.)?
>>
>> Thanks,
>>
>
> I've used mcelog to catch some CPU events, but I think you might want to
> check out Dell's OpenManage client.  It reports and monitors a lot more
> information.
>
>
> http://linux.dell.com/wiki/index.php/Repository/OMSA
>
>
> To install:
>
> # wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
> # yum install srvadmin-base
> # yum install srvadmin-storageservices
>
> (log out and back in for the environment variables to take effect)
>
> # /opt/dell/srvadmin/sbin/srvadmin-services.sh start
> ...
>
> # omreport chassis
> Health
>
> Main System Chassis
>
> SEVERITY : COMPONENT
> Ok       : Fans
> Ok       : Intrusion
> Ok       : Memory
> Ok       : Power Supplies
> Ok       : Processors
> Ok       : Temperatures
> Ok       : Voltages
> Ok       : Hardware Log
> Ok       : Batteries
>
> # omreport chassis temps
> Temperature Probes Information
>
> ------------------------------------
> Main System Chassis Temperatures: Ok
> ------------------------------------
>
> Index                     : 0
> Status                    : Ok
> Probe Name                : System Board Ambient Temp
> Reading                   : 20.0 C
> Minimum Warning Threshold : 8.0 C
> Maximum Warning Threshold : 42.0 C
> Minimum Failure Threshold : 3.0 C
> Maximum Failure Threshold : 47.0 C
>
> # omreport storage pdisk controller=0
>
> List of Physical Disks on Controller SAS 6/iR Integrated (Embedded)
>
> Controller SAS 6/iR Integrated (Embedded)
> ID                        : 0:0:0
> Status                    : Ok
> Name                      : Physical Disk 0:0:0
> State                     : Online
> Failure Predicted         : No
> Certified                 : Not Applicable
> Encryption Capable        : No
> Secured                   : Not Applicable
> Progress                  : Not Applicable
> Bus Protocol              : SAS
> Media                     : HDD
> Capacity                  : 67.75 GB (72746008576 bytes)
> Used RAID Disk Space      : 67.75 GB (72746008576 bytes)
> Available RAID Disk Space : 0.00 GB (0 bytes)
> Hot Spare                 : No
> Vendor ID                 : DELL
> Product ID                : ST973402SS
> Revision                  : S229
>
> <snip>
>
> You get the idea.
>