Question regarding pthread_cancel and pthread_cond_timedwait
Eric Bruno
eric at ebruno.org
Fri Mar 25 07:03:36 UTC 2005
We have a threading library which has been in production for six years
and currently runs on Solaris 2.6-2.9 SPARC, Solaris 2.7-2.10 x86,
HP-UX 11.00, Tru64 5.1(a,b), AIX 4.3.x and AIX 5.x.
Within the current process the library starts 5-8 threads and the
operation runs to completion (with or without error). Depending on
what happened during processing, the threads either complete on their
own or are canceled and then complete.
At some later time this is repeated, N times in all, without the main
process exiting. The threads are NOT detached.
The problem occurs on Fedora Core 3. If a thread has already exited and
we then call pthread_cancel with its thread id
( version info
getconf GNU_LIBPTHREAD_VERSION
NPTL 2.3.4
>uname -a
Linux irl-73-26 2.6.10-1.770_FC3 #1 Thu Feb 24 14:00:06 EST 2005 i686 i686 i386 GNU/Linux
)
the application segfaults. Is this the expected behavior?
I am also getting a segfault when pthread_cond_timedwait is called; I
am still determining the exact state when the segfault occurs. The
back trace shows:
#0 0x005c57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00839dbc in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib/tls/libpthread.so.0
The directory listing shows:
ls -l /lib/tls/
total 1936
drwxr-xr-x 2 root root 4096 Mar 23 04:03 i486
drwxr-xr-x 2 root root 4096 Mar 23 04:03 i586
drwxr-xr-x 2 root root 4096 Mar 23 04:03 i686
-rwxr-xr-x 1 root root 1524828 Dec 21 02:04 libc-2.3.4.so
lrwxrwxrwx 1 root root 13 Mar 22 18:42 libc.so.6 -> libc-2.3.4.so
-rwxr-xr-x 1 root root 215272 Dec 21 02:04 libm-2.3.4.so
lrwxrwxrwx 1 root root 13 Mar 22 18:42 libm.so.6 -> libm-2.3.4.so
-rwxr-xr-x 1 root root 108560 Dec 21 02:04 libpthread-2.3.4.so
lrwxrwxrwx 1 root root 19 Mar 22 18:42 libpthread.so.0 -> libpthread-2.3.4.so
-rwxr-xr-x 1 root root 50984 Dec 21 02:04 librt-2.3.4.so
lrwxrwxrwx 1 root root 14 Mar 22 18:42 librt.so.1 -> librt-2.3.4.so
-rwxr-xr-x 1 root root 32308 Dec 21 02:04 libthread_db-1.0.so
lrwxrwxrwx 1 root root 19 Mar 22 18:42 libthread_db.so.1 -> libthread_db-1.0.so
Is this what NPTL on Fedora Core 3 does TODAY, or is there a problem
in the sequence in which our code releases mutexes or condition
variables that would cause this behavior on Fedora Core 3?
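
For comparison, here is a minimal sketch of a timed wait that follows
the sequence POSIX describes: lock the mutex, wait in a loop on a
predicate with an absolute deadline, and register a cleanup handler,
because pthread_cond_timedwait is a cancellation point and the mutex is
held again when cancellation cleanup runs. The names gate, gate_wait
and gate_open are hypothetical and are not taken from the library
described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical example type: a flag protected by a mutex/condvar pair. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             ready;
};

/* Cleanup handler: releases the mutex, also if the waiter is cancelled. */
static void gate_unlock(void *p)
{
    pthread_mutex_unlock(&((struct gate *)p)->lock);
}

/* Wait at most `ms` milliseconds for the flag. Returns 0 once the flag
 * is set, otherwise the error from pthread_cond_timedwait (normally
 * ETIMEDOUT). */
int gate_wait(struct gate *g, long ms)
{
    struct timespec deadline;
    int rc = 0, ready = 0;

    assert(ms >= 0);

    /* The timeout is an ABSOLUTE time, by default on CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&g->lock);
    pthread_cleanup_push(gate_unlock, g);
    /* Wakeups can be spurious, so re-test the predicate every time. */
    while (!g->ready && rc == 0)
        rc = pthread_cond_timedwait(&g->cond, &g->lock, &deadline);
    ready = g->ready;
    pthread_cleanup_pop(1);     /* pops the handler and unlocks */

    return ready ? 0 : rc;
}

/* Set the flag and wake every waiter. */
void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = 1;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}
```

A waiter that is cancelled inside gate_wait leaves the mutex unlocked,
so the mutex and condition variable can still be destroyed safely once
the thread has been joined.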
We maintain internal thread exit status, so I can skip cancelling the
threads which have successfully exited. We normally just cancel
everything we started, as a big hammer to make sure every thread shuts
down and exits. We can make the abort function a bit smarter if need
be, since it has access to our internal thread status.
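
A "smarter" abort along those lines could look roughly like the sketch
below. It assumes each worker records its own exit in a small status
structure and that the controlling thread joins every worker exactly
once and never touches the thread id afterwards. The struct layout and
all function names are hypothetical, not the library's real ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one worker thread. */
struct worker {
    pthread_t       tid;
    pthread_mutex_t lock;       /* protects `done` */
    int             done;       /* set by the worker as it exits */
    int             joinable;   /* touched only by the controlling thread */
    void          (*job)(void *);
    void           *arg;
};

/* Cleanup handler: records the exit, whether normal or by cancellation. */
static void mark_done(void *p)
{
    struct worker *w = p;
    pthread_mutex_lock(&w->lock);
    w->done = 1;
    pthread_mutex_unlock(&w->lock);
}

static void *worker_entry(void *p)
{
    struct worker *w = p;
    pthread_cleanup_push(mark_done, w);
    w->job(w->arg);
    pthread_cleanup_pop(1);     /* run mark_done on normal return too */
    return NULL;
}

int worker_start(struct worker *w, void (*job)(void *), void *arg)
{
    int rc;
    assert(job != NULL);
    pthread_mutex_init(&w->lock, NULL);
    w->done = 0;
    w->job  = job;
    w->arg  = arg;
    rc = pthread_create(&w->tid, NULL, worker_entry, w);
    w->joinable = (rc == 0);
    if (rc != 0)
        pthread_mutex_destroy(&w->lock);
    return rc;
}

/* Only valid between a successful worker_start and worker_abort. */
int worker_is_done(struct worker *w)
{
    int d;
    pthread_mutex_lock(&w->lock);
    d = w->done;
    pthread_mutex_unlock(&w->lock);
    return d;
}

/* Returns 1 if a cancel request was sent, 0 if none was needed. */
int worker_abort(struct worker *w)
{
    int sent = 0;
    if (!w->joinable)
        return 0;               /* never started, or already joined */
    if (!worker_is_done(w))
        sent = (pthread_cancel(w->tid) == 0);
    pthread_join(w->tid, NULL);
    w->joinable = 0;            /* the id must not be used after the join */
    pthread_mutex_destroy(&w->lock);
    return sent;
}

/* Two toy jobs for trying it out. */
void job_quick(void *arg) { (void)arg; }
void job_block(void *arg) { (void)arg; for (;;) sleep(1); }
```

The point of the joinable flag is that pthread_cancel is only ever
called on a thread that has not been joined yet, which is the only
window in which the thread id is still guaranteed to be valid.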
On the OSes I mentioned above, pthread_cancel returns 0 on success. On
failure:
On HP-UX 11.00 pthread_cancel returns the value ESRCH; errno is NOT set.
Solaris SPARC and x86 behave the same as HP-UX 11.00.
AIX behaves the same as HP-UX and Solaris.
On Tru64 pthread_cancel returns EINVAL or ESRCH; errno is not set.
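
The return-value convention above (error number returned directly,
errno untouched) is the general pthreads convention, so a caller would
check it roughly as in this sketch. The helper names cancel_status,
cancel_and_log and nap_forever are hypothetical. Note that an
implementation is not obliged to return ESRCH for an id whose lifetime
has ended (for example after a join), so this check is no substitute
for tracking which threads have been joined.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Describe a pthread_cancel() return value. */
const char *cancel_status(int rc)
{
    switch (rc) {
    case 0:      return "cancel request queued";
    case ESRCH:  return "no such thread";
    case EINVAL: return "invalid thread id";
    default:     return strerror(rc);
    }
}

/* Cancel `tid` and log any failure. The caller must pass an id that
 * has not been joined yet. */
int cancel_and_log(pthread_t tid, FILE *log)
{
    int rc;
    assert(log != NULL);
    rc = pthread_cancel(tid);   /* the error comes back here, not in errno */
    if (rc != 0)
        fprintf(log, "pthread_cancel: %s (%d)\n", cancel_status(rc), rc);
    return rc;
}

/* Toy thread body: blocks in a cancellation point until cancelled. */
void *nap_forever(void *unused)
{
    (void)unused;
    for (;;)
        sleep(1);
    return NULL;
}
```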
Eric Bruno.