High IO issue
jose nuno neto
jose.neto at liber4e.com
Wed Aug 31 15:40:27 UTC 2011
Hello,
Over the last few days we had some crashes on a Red Hat system related to
high I/O. I don't see an evident cause, so perhaps someone on the list can
give some pointers on where to look.
We have several Sybase databases running, plus a database backup process.
It seems there were heavy I/O operations and Sybase crashed.
We have atop running, where we can see Sybase using RAM and then dying;
sar also shows heavy I/O.
No swap was used. I searched the logs for OOM kills but didn't find any.
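A log search for OOM kills like the one described above could be sketched as follows. This is only a rough illustration: the message patterns are assumptions about what the kernel typically logs (the wording differs between kernel versions), and the default path `/var/log/messages` is an assumption for a Red Hat system. The names `find_oom_lines` and `scan_log` are hypothetical helpers, not part of any tool.

```python
import re

# Assumed patterns: wording commonly seen in kernel OOM-killer messages.
# The exact text varies by kernel version, so adjust as needed.
OOM_PATTERNS = re.compile(
    r"(invoked oom-killer|Out of memory: Kill(ed)? process)",
    re.IGNORECASE,
)


def find_oom_lines(lines):
    """Return the lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERNS.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan one log file; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

Calling `scan_log()` usually needs root, and rotated logs (for example `/var/log/messages.1`) would have to be scanned separately. An empty result only means these patterns did not match, not that nothing killed the process.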
The machine was very unresponsive when the issue occurred. We have Red Hat
Cluster running, and the cluster token was lost from this node during this
time.
I'd like to know whether something killed the app, and how to fine-tune
memory/disk access to prevent this.
Output from atop and sar follows.
Thanks in advance
28 Aug
12:00:01 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
04:10:01 AM 1 656 6.36 7.08 4.59
04:20:01 AM 3 656 6.59 6.47 5.52
04:30:01 AM 3 658 6.03 6.37 5.96
04:40:01 AM 3 658 8.30 8.03 6.96
04:50:01 AM 3 665 9.74 8.11 7.42
05:00:01 AM 3 664 4.44 6.53 7.28
05:10:02 AM 2 664 6.45 6.85 6.99
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
03:50:01 AM   6874984  17786660    72.12    303028  6609768   8193016         0     0.00        0
04:00:01 AM   6867236  17794408    72.15    309484  6610364   8193016         0     0.00        0
04:10:01 AM   3970096  20691548    83.90    358904  9419084   8193016         0     0.00        0
04:20:01 AM   3074460  21587184    87.53    369916 10284048   8193016         0     0.00        0
04:30:01 AM   1690900  22970744    93.14    381072 11656316   8193016         0     0.00        0
04:40:01 AM    711644  23950000    97.11    389804 12620860   8193016         0     0.00        0
04:50:01 AM     88152  24573492    99.64    229812 13402336   8192772       244     0.00      244
05:00:01 AM     76592  24585052    99.69    178196 13474908   8192772       244     0.00        0
05:10:02 AM     80212  24581432    99.67    174196 13468696   8192772       244     0.00        0
05:20:01 AM   6772944  17888700    72.54     74380  7027120   8192772       244     0.00        0
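Since %memused climbs to over 99% while kbcached grows at the same time, it may help to separate page cache from the rest. The sketch below subtracts buffers and cache from kbmemused for three rows copied from the table above. It assumes that kbmemused counts buffers and cache as used memory (total minus free), which is how sar of this era is generally understood to report it; `app_used_kb` is a hypothetical helper name.

```python
# (time, kbmemfree, kbmemused, kbbuffers, kbcached) copied from the sar output.
SAMPLES = [
    ("04:00:01", 6867236, 17794408, 309484, 6610364),
    ("04:50:01", 88152, 24573492, 229812, 13402336),
    ("05:20:01", 6772944, 17888700, 74380, 7027120),
]


def app_used_kb(kbmemused, kbbuffers, kbcached):
    """Used memory excluding buffers and page cache, in kB.

    Assumption: kbmemused includes buffers and cache.
    """
    return kbmemused - kbbuffers - kbcached


for stamp, free, used, buffers, cached in SAMPLES:
    total = free + used
    app = app_used_kb(used, buffers, cached)
    print(
        f"{stamp}  non-cache {app / 1024**2:5.1f} GiB  "
        f"cache {cached / 1024**2:5.1f} GiB  "
        f"({100 * app / total:.1f}% of RAM outside buffers/cache)"
    )
```

Under that assumption, the non-cache figure stays between about 10.8 and 10.9 million kB in all three rows while kbcached roughly doubles, which would suggest the growth in %memused is mostly page cache rather than process memory.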
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
03:50:01 AM all  2.29  0.00    2.73    0.88   0.00 94.10
04:00:01 AM all  2.31  0.00    2.77    0.93   0.00 93.99
04:10:01 AM all  5.46  0.00    5.57   10.47   0.00 78.49
04:20:01 AM all  7.18  0.00    6.33    6.76   0.00 79.73
04:30:01 AM all  5.88  0.00    5.90    6.87   0.00 81.35
04:40:01 AM all  5.16  0.00    5.77    8.09   0.00 80.98
04:50:01 AM all  4.88  0.00    5.35    7.82   0.00 81.95
05:00:01 AM all  4.59  0.00    5.29    7.95   0.00 82.17
05:10:02 AM all  4.99  0.00    5.64    7.09   0.00 82.28
05:20:01 AM all  4.36  0.00    4.50    6.52   0.00 84.62
05:30:01 AM all  0.01  0.00    0.02    0.13   0.00 99.84
tps Total number of transfers per second that were issued to
physical devices. A transfer is an I/O request to a physical device.
rtps Total number of read requests per second issued to physical
devices.
wtps Total number of write requests per second issued to physical
devices.
bread/s Total amount of data read from the devices in blocks per
second. A block is 512 bytes.
bwrtn/s Total amount of data written to devices in blocks per second.
12:00:01 AM tps rtps wtps bread/s bwrtn/s
01:10:01 AM 7996.72 321.85 7674.87 9120.26 187734.23
01:20:01 AM 7260.18 239.13 7021.05 7151.23 129573.36
01:30:01 AM 10541.89 333.87 10208.02 3724.24 260710.61
01:40:01 AM 8298.28 59.59 8238.69 949.20 200802.65
01:50:01 AM 12443.27 30.84 12412.43 141.97 310002.60
02:00:01 AM 10878.28 134.28 10744.00 3120.10 252048.75
02:10:01 AM 7191.81 245.43 6946.39 4632.28 155588.73
03:40:01 AM 9971.54 2.17 9969.37 9.80 194855.04
03:50:01 AM 9615.50 3.80 9611.70 16.28 195641.46
04:00:01 AM 10615.95 2.11 10613.84 9.04 201471.62
04:10:01 AM 10579.62 3619.13 6960.49 130869.08 143336.99
04:20:01 AM 10598.33 2381.38 8216.94 98763.15 176191.20
04:30:01 AM 11677.21 3535.63 8141.58 233305.68 174777.83
04:40:01 AM 10890.26 3150.32 7739.95 112731.87 170365.89
04:50:01 AM 12778.49 4054.31 8724.18 137983.84 184082.99
05:00:01 AM 13507.21 3830.58 9676.63 104143.93 203367.17
05:10:02 AM 12670.16 2266.25 10403.91 103696.19 210170.15
05:20:01 AM 5380199.36 6489408.45 6061020.12 6428917.69 3338784.18
05:30:01 AM 33.70 0.11 33.59 1.55 373.81
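Because bread/s and bwrtn/s are given in 512-byte blocks, the throughput is easier to judge after converting to MiB/s. A small sketch of that arithmetic, using three rows copied from the table above (`blocks_to_mib_per_s` is a hypothetical helper name):

```python
BLOCK_BYTES = 512  # sar reports bread/s and bwrtn/s in 512-byte blocks


def blocks_to_mib_per_s(blocks_per_s):
    """Convert a sar blocks-per-second figure to MiB per second."""
    return blocks_per_s * BLOCK_BYTES / (1024 * 1024)


# (time, bread/s, bwrtn/s) copied from the sar output.
ROWS = [
    ("04:00:01", 9.04, 201471.62),
    ("04:30:01", 233305.68, 174777.83),
    ("05:00:01", 104143.93, 203367.17),
]

for stamp, bread, bwrtn in ROWS:
    print(
        f"{stamp}  read {blocks_to_mib_per_s(bread):7.2f} MiB/s  "
        f"write {blocks_to_mib_per_s(bwrtn):7.2f} MiB/s"
    )
```

For example, the 04:00:01 write rate of 201471.62 blocks/s works out to roughly 98 MiB/s.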
ATOP - dc2-x6270-m 2011/08/28 05:15:02 --x--- 5m0s elapsed
PRC | sys 6m29s | user 5m58s | #proc 525 | #trun 4 | #tslpi 649 | #tslpu 1 | #zombie 0 | clones 5264 | #exit 3749 |
CPU | sys 128% | user 120% | irq 4% | idle 1968% | wait 179% | steal 0% | guest 0% | curf 1.71GHz | curscal 58% |
CPL | avg1 6.29 | avg5 7.04 | avg15 7.07 | csw 1976989 | intr 1392566 | numcpu 24 |
MEM | tot 23.5G | free 78.8M | cache 12.8G | dirty 0.3M | buff 170.0M | slab 242.7M |
SWP | tot 7.8G | free 7.8G | vmcom 7.2G | vmlim 14.6G |
PAG | scan 974496 | stall 0 | swin 0 | swout 0 |
LVM | b_yorick_dat | busy 98% | read 162041 | write 390 | KiB/r 17 | KiB/w 3 | MBr/s 8.97 | MBw/s 0.00 | avq 6.69 | avio 1.81 ms |
LVM | -dc2-tier1-d | busy 90% | read 2 | write 315874 | KiB/r 4 | KiB/w 16 | MBr/s 0.00 | MBw/s 16.65 | avq 10.59 | avio 0.85 ms |
LVM | c2-tier1-dp2 | busy 82% | read 2 | write 200145 | KiB/r 4 | KiB/w 2 | MBr/s 0.00 | MBw/s 1.68 | avq 11.35 | avio 1.22 ms |
LVM | c1-tier1-bp2 | busy 56% | read 86300 | write 9191 | KiB/r 16 | KiB/w 13 | MBr/s 4.56 | MBw/s 0.41 | avq 8.12 | avio 1.75 ms |
LVM | -dc1-tier1-b | busy 56% | read 86300 | write 9191 | KiB/r 16 | KiB/w 13 | MBr/s 4.56 | MBw/s 0.41 | avq 8.12 | avio 1.75 ms |
MDD | md2 | busy 0% | read 16 | write 3115 | KiB/r 5 | KiB/w 4 | MBr/s 0.00 | MBw/s 0.04 | avq 0.00 | avio 0.00 ms |
DSK | sdp | busy 45% | read 2 | write 150283 | KiB/r 4 | KiB/w 16 | MBr/s 0.00 | MBw/s 8.26 | avq 9.92 | avio 0.90 ms |
DSK | sdf | busy 45% | read 0 | write 150323 | KiB/r 0 | KiB/w 17 | MBr/s 0.00 | MBw/s 8.39 | avq 9.47 | avio 0.90 ms |
DSK | sds | busy 28% | read 37748 | write 1237 | KiB/r 19 | KiB/w 42 | MBr/s 2.36 | MBw/s 0.17 | avq 5.44 | avio 2.18 ms |
NET | transport | tcpi 263485 | tcpo 109968 | udpi 1031 | udpo 679 | tcpao 13 | tcppo 2 | tcprs 10 | tcpie 0 | tcpor 0 | udpip 0 |
NET | network | ipi 269371 | ipo 110682 | ipfrw 0 | deliv 269302 | icmpi 48 | icmpo 25 |
NET | eth1 27% | pcki 257856 | pcko 681449 | si 460 Kbps | so 27 Mbps | coll 0 | mlti 4 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | eth5 0% | pcki 15501 | pcko 0 | si 26 Kbps | so 0 Kbps | coll 0 | mlti 0 | erri 30598 | erro 0 | drpi 0 | drpo 0 |
NET | eth3 0% | pcki 69250 | pcko 29053 | si 1354 Kbps | so 42 Kbps | coll 0 | mlti 59 | erri 0 | erro 0 | drpi 0 | drpo 0 |
PID   RUID   EUID   THR SYSCPU USRCPU  VGROW  RGROW  RDDSK  WRDSK ST EXC S CPUNR  CPU CMD 1/95
5362  sybase sybase   7  3m25s 94.35s     0K     0K 924.9M 119.3M --   - R     8 100% dataserver
6172  sybase sybase   7  2m39s  2m09s     0K     0K   248K 474.2M --   - S     1  96% dataserver
13489 sybase sybase   1  1.92s  1m45s     0K     0K     0K     0K --   - D    19  36% sybmultbuf
6970  sybase sybase   7 17.89s 27.88s     0K 23164K    26K 26340K --   - S    22  15% dataserver
3215  root   root     1  2.36s  0.00s     0K     0K     0K     0K --   - S     9   1% kmirrord
13490 sybase sybase   1  0.97s  0.43s -70.0M   -56K   1.7G     0K --   - S     7   0% sybmultbuf
4861  sybase sybase   7  0.81s  0.44s     0K     0K     0K    54K --   - S     4   0% dataserver
3256  root   root     1  0.88s  0.00s     0K     0K     0K     0K --   - S     9   0% kmirrord
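The atop CPU line sums its percentages over all CPUs, so with "numcpu 24" the full scale is 2400% rather than 100%. To compare it with sar's machine-wide %iowait, the figures can be divided by the CPU count. A minimal sketch of that arithmetic, with values copied from the CPU and CPL lines above (`per_cpu_share` is a hypothetical helper name):

```python
NUM_CPU = 24  # "numcpu 24" from the CPL line

# Percentages from the atop CPU line, summed over all CPUs.
ATOP_CPU = {"sys": 128, "user": 120, "irq": 4, "idle": 1968, "wait": 179}


def per_cpu_share(percentages, num_cpu):
    """Scale atop's summed percentages to a 0-100 machine-wide range."""
    return {name: value / num_cpu for name, value in percentages.items()}


shares = per_cpu_share(ATOP_CPU, NUM_CPU)
for name, value in shares.items():
    print(f"{name:5s} {value:6.2f}%")
```

On this scale the wait figure is about 7.5%, in line with the roughly 7% %iowait that sar reports for the same period.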