High IO issue

jose nuno neto jose.neto at liber4e.com
Wed Aug 31 15:40:27 UTC 2011


Hello

Over the last few days we had some crashes on a Red Hat system related to
high I/O. I don't see an evident cause, so perhaps someone on the list can
give some pointers on where to look.


We have several Sybase databases running, plus a database backup process.
It seems there were heavy I/O operations and Sybase crashed.
We have atop running, where we can see Sybase using RAM and then dying;
sar also shows heavy I/O.
No swap was used. I searched the logs for OOM kills but didn't find any.
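
For reference, a log search of this kind can be sketched in Python as below. This is only a rough sketch: the match patterns are an assumption about how the kernel words OOM-killer messages (the wording varies between kernel versions), and /var/log/messages is assumed as the log path. The names find_oom_lines and scan_file are made up for the example.

```python
import re

# Assumed patterns: typical fragments of kernel OOM-killer messages.
# The exact wording depends on the kernel version.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|oom_kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_file(path="/var/log/messages"):
    """Scan one log file; the default path is an assumption."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Calling scan_file() on each rotated log covering the time of the crash would list any matching lines; an empty result only means these particular patterns did not match.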

The machine was very unresponsive when the issue occurred. We have Red Hat
Cluster running, and the cluster token from this node was lost during this
time.

I'd like to know whether something killed the application, and how to
fine-tune memory/disk access to prevent this.

Output from atop and sar follows.

Thanks in advance

28 Aug

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
04:10:01 AM         1       656      6.36      7.08      4.59
04:20:01 AM         3       656      6.59      6.47      5.52
04:30:01 AM         3       658      6.03      6.37      5.96
04:40:01 AM         3       658      8.30      8.03      6.96
04:50:01 AM         3       665      9.74      8.11      7.42
05:00:01 AM         3       664      4.44      6.53      7.28
05:10:02 AM         2       664      6.45      6.85      6.99


12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:50:01 AM   6874984  17786660     72.12    303028   6609768   8193016         0      0.00         0
04:00:01 AM   6867236  17794408     72.15    309484   6610364   8193016         0      0.00         0
04:10:01 AM   3970096  20691548     83.90    358904   9419084   8193016         0      0.00         0
04:20:01 AM   3074460  21587184     87.53    369916  10284048   8193016         0      0.00         0
04:30:01 AM   1690900  22970744     93.14    381072  11656316   8193016         0      0.00         0
04:40:01 AM    711644  23950000     97.11    389804  12620860   8193016         0      0.00         0
04:50:01 AM     88152  24573492     99.64    229812  13402336   8192772       244      0.00       244
05:00:01 AM     76592  24585052     99.69    178196  13474908   8192772       244      0.00         0
05:10:02 AM     80212  24581432     99.67    174196  13468696   8192772       244      0.00         0
05:20:01 AM   6772944  17888700     72.54     74380   7027120   8192772       244      0.00         0
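
To separate page cache from memory held by processes in the table above, a small Python sketch can subtract kbbuffers and kbcached from kbmemused. This assumes that kbmemused in this sysstat version counts buffers and cache as used memory; the helper name used_excluding_cache_kb is made up for the example, and the three rows are copied from the sar output above.

```python
# (timestamp, kbmemused, kbbuffers, kbcached) copied from the sar rows above
ROWS = [
    ("03:50:01 AM", 17786660, 303028, 6609768),
    ("04:50:01 AM", 24573492, 229812, 13402336),
    ("05:20:01 AM", 17888700, 74380, 7027120),
]


def used_excluding_cache_kb(kbmemused, kbbuffers, kbcached):
    """Memory in use once buffers and page cache are subtracted (kB).

    Assumes kbmemused includes buffers and cache.
    """
    return kbmemused - kbbuffers - kbcached


for ts, used, buffers, cached in ROWS:
    app_kb = used_excluding_cache_kb(used, buffers, cached)
    print(f"{ts}  used minus buffers/cache: {app_kb / 1024 / 1024:.2f} GiB")
```

Under that assumption the figure stays near 10.3 to 10.4 GiB for all three samples, so the rise in %memused between 04:00 and 04:50 appears to be mostly page cache. On its own this does not show what made Sybase die.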

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
03:50:01 AM       all      2.29      0.00      2.73      0.88      0.00     94.10
04:00:01 AM       all      2.31      0.00      2.77      0.93      0.00     93.99
04:10:01 AM       all      5.46      0.00      5.57     10.47      0.00     78.49
04:20:01 AM       all      7.18      0.00      6.33      6.76      0.00     79.73
04:30:01 AM       all      5.88      0.00      5.90      6.87      0.00     81.35
04:40:01 AM       all      5.16      0.00      5.77      8.09      0.00     80.98
04:50:01 AM       all      4.88      0.00      5.35      7.82      0.00     81.95
05:00:01 AM       all      4.59      0.00      5.29      7.95      0.00     82.17
05:10:02 AM       all      4.99      0.00      5.64      7.09      0.00     82.28
05:20:01 AM       all      4.36      0.00      4.50      6.52      0.00     84.62
05:30:01 AM       all      0.01      0.00      0.02      0.13      0.00     99.84

tps         Total number of transfers per second issued to physical
            devices. A transfer is an I/O request to a physical device.
rtps        Total number of read requests per second issued to physical
            devices.
wtps        Total number of write requests per second issued to physical
            devices.
bread/s     Total amount of data read from the devices, in blocks per
            second. A block is 512 bytes.
bwrtn/s     Total amount of data written to the devices, in blocks per
            second.
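
Since bread/s and bwrtn/s are counted in 512-byte blocks, the figures are easier to read once converted to MiB/s. The Python sketch below does that conversion for two rows taken from the sar table that follows; the function name blocks_per_sec_to_mib is made up for the example.

```python
BLOCK_SIZE_BYTES = 512  # block size given in the bread/s description


def blocks_per_sec_to_mib(blocks_per_sec):
    """Convert a sar block rate (512-byte blocks/s) to MiB/s."""
    return blocks_per_sec * BLOCK_SIZE_BYTES / (1024 * 1024)


# (bread/s, bwrtn/s) for two samples from the sar I/O table
SAMPLES = {
    "01:50:01 AM": (141.97, 310002.60),
    "04:30:01 AM": (233305.68, 174777.83),
}

for ts, (bread, bwrtn) in SAMPLES.items():
    print(
        f"{ts}  read {blocks_per_sec_to_mib(bread):7.2f} MiB/s"
        f"  write {blocks_per_sec_to_mib(bwrtn):7.2f} MiB/s"
    )
```

By this conversion the 01:50 sample comes to roughly 151 MiB/s written, averaged over the ten-minute interval.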


12:00:01 AM       tps      rtps      wtps   bread/s   bwrtn/s
01:10:01 AM   7996.72    321.85   7674.87   9120.26 187734.23
01:20:01 AM   7260.18    239.13   7021.05   7151.23 129573.36
01:30:01 AM  10541.89    333.87  10208.02   3724.24 260710.61
01:40:01 AM   8298.28     59.59   8238.69    949.20 200802.65
01:50:01 AM  12443.27     30.84  12412.43    141.97 310002.60
02:00:01 AM  10878.28    134.28  10744.00   3120.10 252048.75
02:10:01 AM   7191.81    245.43   6946.39   4632.28 155588.73
03:40:01 AM   9971.54      2.17   9969.37      9.80 194855.04
03:50:01 AM   9615.50      3.80   9611.70     16.28 195641.46
04:00:01 AM  10615.95      2.11  10613.84      9.04 201471.62
04:10:01 AM  10579.62   3619.13   6960.49 130869.08 143336.99
04:20:01 AM  10598.33   2381.38   8216.94  98763.15 176191.20
04:30:01 AM  11677.21   3535.63   8141.58 233305.68 174777.83
04:40:01 AM  10890.26   3150.32   7739.95 112731.87 170365.89
04:50:01 AM  12778.49   4054.31   8724.18 137983.84 184082.99
05:00:01 AM  13507.21   3830.58   9676.63 104143.93 203367.17
05:10:02 AM  12670.16   2266.25  10403.91 103696.19 210170.15
05:20:01 AM 5380199.36 6489408.45 6061020.12 6428917.69 3338784.18
05:30:01 AM     33.70      0.11     33.59      1.55    373.81


ATOP - dc2-x6270-m    2011/08/28 05:15:02    --x---    5m0s elapsed
PRC | sys    6m29s  | user   5m58s  |               | #proc    525  | #trun      4  | #tslpi   649 |  #tslpu     1 |  #zombie    0 |  clones 5264 |               |  #exit   3749 |
CPU | sys     128%  | user    120%  | irq       4%  |               | idle   1968%  | wait    179% |               |  steal     0% |  guest     0% | curf 1.71GHz |  curscal  58% |
CPL | avg1    6.29  | avg5    7.04  |               | avg15   7.07  |               |              |  csw  1976989 |  intr 1392566 |               |               |  numcpu    24 |
MEM | tot    23.5G  | free   78.8M  | cache  12.8G  | dirty   0.3M  | buff  170.0M  |              |  slab  242.7M |               |               |               |               |
SWP | tot     7.8G  | free    7.8G  |               |               |               |              |               |               |               | vmcom   7.2G |  vmlim  14.6G |
PAG | scan  974496  |               | stall      0  |               |               |              |               |  swin       0 |               |               |  swout      0 |
LVM | b_yorick_dat  | busy     98%  | read  162041  | write    390  | KiB/r     17  |              |  KiB/w      3 |  MBr/s   8.97 |  MBw/s   0.00 |  avq     6.69 |  avio 1.81 ms |
LVM | -dc2-tier1-d  | busy     90%  | read       2  | write 315874  | KiB/r      4  |              |  KiB/w     16 |  MBr/s   0.00 |  MBw/s  16.65 |  avq    10.59 |  avio 0.85 ms |
LVM | c2-tier1-dp2  | busy     82%  | read       2  | write 200145  | KiB/r      4  |              |  KiB/w      2 |  MBr/s   0.00 |  MBw/s   1.68 |  avq    11.35 |  avio 1.22 ms |
LVM | c1-tier1-bp2  | busy     56%  | read   86300  | write   9191  | KiB/r     16  |              |  KiB/w     13 |  MBr/s   4.56 |  MBw/s   0.41 |  avq     8.12 |  avio 1.75 ms |
LVM | -dc1-tier1-b  | busy     56%  | read   86300  | write   9191  | KiB/r     16  |              |  KiB/w     13 |  MBr/s   4.56 |  MBw/s   0.41 |  avq     8.12 |  avio 1.75 ms |
MDD |          md2  | busy      0%  | read      16  | write   3115  | KiB/r      5  |              |  KiB/w      4 |  MBr/s   0.00 |  MBw/s   0.04 |  avq     0.00 |  avio 0.00 ms |
DSK |          sdp  | busy     45%  | read       2  | write 150283  | KiB/r      4  |              |  KiB/w     16 |  MBr/s   0.00 |  MBw/s   8.26 |  avq     9.92 |  avio 0.90 ms |
DSK |          sdf  | busy     45%  | read       0  | write 150323  | KiB/r      0  |              |  KiB/w     17 |  MBr/s   0.00 |  MBw/s   8.39 |  avq     9.47 |  avio 0.90 ms |
DSK |          sds  | busy     28%  | read   37748  | write   1237  | KiB/r     19  |              |  KiB/w     42 |  MBr/s   2.36 |  MBw/s   0.17 |  avq     5.44 |  avio 2.18 ms |
NET | transport     | tcpi  263485  | tcpo  109968  | udpi    1031  | udpo     679  | tcpao     13 |  tcppo      2 |  tcprs     10 |  tcpie      0 | tcpor      0 |  udpip      0 |
NET | network       | ipi   269371  | ipo   110682  | ipfrw      0  | deliv 269302  |              |               |               |               |  icmpi     48 |  icmpo     25 |
NET | eth1     27%  | pcki  257856  | pcko  681449  | si  460 Kbps  | so   27 Mbps  | coll       0 |  mlti       4 |  erri       0 |  erro       0 | drpi       0 |  drpo       0 |
NET | eth5      0%  | pcki   15501  | pcko       0  | si   26 Kbps  | so    0 Kbps  | coll       0 |  mlti       0 |  erri   30598 |  erro       0 | drpi       0 |  drpo       0 |
NET | eth3      0%  | pcki   69250  | pcko   29053  | si 1354 Kbps  | so   42 Kbps  | coll       0 |  mlti      59 |  erri       0 |  erro       0 | drpi       0 |  drpo       0 |

  PID  RUID    EUID    THR  SYSCPU  USRCPU   VGROW   RGROW   RDDSK   WRDSK  ST  EXC  S  CPUNR   CPU  CMD         1/95
 5362  sybase  sybase    7   3m25s  94.35s      0K      0K  924.9M  119.3M  --    -  R      8  100%  dataserver
 6172  sybase  sybase    7   2m39s   2m09s      0K      0K    248K  474.2M  --    -  S      1   96%  dataserver
13489  sybase  sybase    1   1.92s   1m45s      0K      0K      0K      0K  --    -  D     19   36%  sybmultbuf
 6970  sybase  sybase    7  17.89s  27.88s      0K  23164K     26K  26340K  --    -  S     22   15%  dataserver
 3215  root    root      1   2.36s   0.00s      0K      0K      0K      0K  --    -  S      9    1%  kmirrord
13490  sybase  sybase    1   0.97s   0.43s  -70.0M    -56K    1.7G      0K  --    -  S      7    0%  sybmultbuf
 4861  sybase  sybase    7   0.81s   0.44s      0K      0K      0K     54K  --    -  S      4    0%  dataserver
 3256  root    root      1   0.88s   0.00s      0K      0K      0K      0K  --    -  S      9    0%  kmirrord





