Options to stop processes that can't be killed -9 other than reboot

sunhux G sunhux at gmail.com
Tue Sep 13 14:01:51 UTC 2011


Hi Trever,


It just happened again on another Linux media server, and the process
looks like it's in "D" state (uninterruptible sleep):


# ps axfu | grep 7892
root     17369  0.0  0.0  5024  640 pts/2    S+   20:39   0:00
             \_ grep 7892
root      7892  0.0  0.1  6316 2808 ?        Ds   Sep12   0:13
/opt/omni/lbin/vbda -bmaname HP:Ultrium 4-SCSI_3 -type 2 -start
1315817521 -level 0 -access 1 0 -protection 2 2332800 -load 1.000000
-name hostname.ss.de [/] -ma hostname.ss.de 22000 -id 1315817438
-volume / -profile -trees / -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
hostname.ss.de:/ // hostname.ss.de [/] -no_aligned
root     15604  0.0  0.0  5480  560 ?        D    15:05   0:00 lsof -a -p 7892
root     15636  0.0  0.0  5400  560 ?        D    15:10   0:00 lsof -a -p 7892
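
A minimal sketch for listing every process in "D" state together with its wait channel, rather than grepping for one PID. The function names are made up for illustration, and it assumes a procps-style ps that accepts the wchan:32 column width:

```shell
#!/bin/sh
# Keep only rows whose STAT column starts with "D" (uninterruptible sleep).
# Expects "PID STAT WCHAN COMMAND..." on stdin, with a header line first.
d_state_only() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# List all D-state processes with the kernel function they are sleeping in.
# wchan:32 widens the WCHAN column so the symbol name is not cut short.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | d_state_only
}

# Usage:
#   list_d_state
```

The WCHAN column only names the kernel function the task is waiting in; it is a hint about which driver or filesystem is involved, not a way to release the process.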


# lsof -a -p 7892
< above command just pauses/hangs there;  it's been over 2 hrs already >

# strace -p7892
Process 7892 attached - interrupt to quit
 < it pauses there & Ctrl-C did not yield any response >
(I had to run 'pkill -9 strace' to exit it)
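
Since strace and lsof both hang on this PID, a possible alternative is to read the process's /proc entries directly. ps was still able to read them above, so these reads will probably return too, though that is not guaranteed. This is only a sketch: proc_wait_info is a hypothetical helper name, and its optional second argument (an alternate /proc root) is an assumption added for illustration:

```shell
#!/bin/sh
# proc_wait_info PID [PROC_ROOT]
# Print the state and wait channel of PID using plain file reads only.
# PROC_ROOT defaults to /proc; the second argument is a hypothetical
# convenience for pointing the function at a copy of the tree.
proc_wait_info() {
    pid=$1
    root=${2:-/proc}
    if [ ! -d "$root/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # The "State:" line looks like:  State:  D (disk sleep)
    state=$(sed -n 's/^State:[[:space:]]*//p' "$root/$pid/status")
    # wchan holds the kernel symbol the task sleeps in (may be 0 or empty)
    wchan=$(cat "$root/$pid/wchan" 2>/dev/null)
    printf 'state: %s\n' "$state"
    printf 'wchan: %s\n' "$wchan"
}

# Usage (as root):
#   proc_wait_info 7892
#   echo w > /proc/sysrq-trigger   # dump blocked (D state) tasks to the kernel log
#   dmesg | tail -n 100
```

The SysRq "w" dump writes a kernel stack trace for each blocked task to the kernel log, which usually shows the call the process is stuck in.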


# kill -9 7892
(& it's still there, as shown below:)

# ps axfu |grep bma | grep omni |grep 7892
root      7892  0.0  0.1  6316 2808 ?        Ds   Sep12   0:13
/opt/omni/lbin/vbda -bmaname HP:Ultrium 4-SCSI_3 -type 2 -start
1315817521 -level 0 -access 1 0 -protection 2 2332800 -load 1.000000
-name hostname.ss.de [/] -ma hostname.ss.de 22000 -id 1315817438
-volume / -profile -trees / -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy  2 -no_nthlink -archattr -share_info -objname 02
hostname.ss.de:/ // hostname.ss.de [/] -no_aligned


There are several CLOSE_WAIT sessions that have been there for hours.
I also logged in to the remote server cellsvmgr (a Win 2003 server) & ran
"netstat -ano" to look up the PIDs behind those sessions' ports (e.g. 3215,
3453, 2578), but none of them were listed. The local view:

# lsof -i :5555
COMMAND   PID USER   FD   TYPE  DEVICE SIZE NODE NAME
vbda     1235 root    0u  IPv4 1824346       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:4033 (ESTABLISHED)
vbda     1235 root    1u  IPv4 1824346       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:4033 (ESTABLISHED)
vbda     3998 root    0u  IPv4 1942430       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:3366 (ESTABLISHED)
vbda     3998 root    1u  IPv4 1942430       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:3366 (ESTABLISHED)
vbda     4757 root    0u  IPv4 1832833       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2798 (ESTABLISHED)
vbda     4757 root    1u  IPv4 1832833       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2798 (ESTABLISHED)
vbda     7892 root    0u  IPv4 1950188       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
vbda     7892 root    1u  IPv4 1950188       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
vbda     9475 root    0u  IPv4 1955789       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:3215 (CLOSE_WAIT)
vbda     9475 root    1u  IPv4 1955789       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:3215 (CLOSE_WAIT)
vbda    10177 root    0u  IPv4 1956998       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:3453 (CLOSE_WAIT)
vbda    10177 root    1u  IPv4 1956998       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:3453 (CLOSE_WAIT)
fsbrda  14948 root    0u  IPv4 1969734       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2578 (CLOSE_WAIT)
fsbrda  14948 root    1u  IPv4 1969734       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2578 (CLOSE_WAIT)
xinetd  15500 root    5u  IPv4 1971072       TCP *:omni (LISTEN)
#
#
# lsof -i :2356
COMMAND  PID USER   FD   TYPE  DEVICE SIZE NODE NAME
vbda    7892 root    0u  IPv4 1950188       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
vbda    7892 root    1u  IPv4 1950188       TCP
hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
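
To tally the sessions per TCP state instead of reading the listing by eye, something like the sketch below may help. tcp_state_counts is a hypothetical helper name, and it assumes each lsof record is on a single line ending in "(STATE)", as lsof prints it in a terminal (the listing above is wrapped by the mail client):

```shell
#!/bin/sh
# Read `lsof -i` output on stdin and print "STATE COUNT" per TCP state.
tcp_state_counts() {
    grep -o '([A-Z][A-Z0-9_]*)$' |   # pull out the trailing "(STATE)"
        tr -d '()' |                 # drop the parentheses
        LC_ALL=C sort |              # group identical states together
        uniq -c |                    # count each group
        awk '{print $2, $1}'         # reorder to "STATE COUNT"
}

# Usage:
#   lsof -nP -i :5555 | tcp_state_counts
```

CLOSE_WAIT means the remote end has already closed its side and the local process has not yet closed the socket, which would be consistent with netstat on the remote server no longer showing those ports.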



