PATCH: SSL.SysCallError fix for plague-0.5.0

Dan Williams dcbw at redhat.com
Tue Oct 31 16:45:56 UTC 2006


On Fri, 2006-10-27 at 00:32 -0400, Joe Todaro wrote:
> 
> Hi, 
> 
> Has anyone ever seen this error before in their *plague-0.5.0* build
> environment?   It surfaced last week shortly after we started
> stress-testing our buildsystem.   In fact, there were three such
> errors in all, which I will post separately to avoid any confusion.
> This is one of three.   It was triggered when we requested status
> about a job we killed before it actually got handed-off to archjobs. 
> 
> ====== THE ERROR ======
> Request to enqueue 'stacker' tag 'stacker-1_3-5' for target
> 'oc-rhel4-dev' (user 'jtodaro at pok.ibm.com') 
> 66 (stacker): Starting tag 'stacker-1_3-5' on target 'oc-rhel4-dev' 
> 66 (stacker): Requesting depsolve... 
> 66 (stacker): Starting depsolve for arches: ['i686']. 
> 66 (stacker): Finished depsolve (successful), requesting archjobs. 
> 66 (stacker/i686): https://lnxbuild1.pok.ibm.com.:8888 - UID is
> 9adf56cdd15bfae2388966b08837250d3bf6772c 
> ---------------------------------------- 
> Exception happened during processing of request from ('10.63.82.73',
> 49136) 
> Traceback (most recent call last): 
>   File "/usr/lib64/python2.3/SocketServer.py", line 463, in
> process_request_thread 
>     self.finish_request(request, client_address) 
>   File "/usr/lib64/python2.3/SocketServer.py", line 254, in
> finish_request 
>     self.RequestHandlerClass(request, client_address, self) 
>   File "/usr/lib64/python2.3/SocketServer.py", line 521, in __init__ 
>     self.handle() 
>   File "/usr/lib64/python2.3/BaseHTTPServer.py", line 324, in handle 
>     self.handle_one_request() 
>   File "/usr/lib64/python2.3/BaseHTTPServer.py", line 307, in
> handle_one_request 
>     self.raw_requestline = self.rfile.readline() 
>   File "/usr/lib64/python2.3/socket.py", line 338, in readline 
>     data = self._sock.recv(self._rbufsize) 
>   File "/usr/lib/python2.3/site-packages/plague/SSLConnection.py",
> line 142, in recv 
>     return con.recv(bufsize, flags) 
> SysCallError: (-1, 'Unexpected EOF') 
> ---------------------------------------- 
> 
> ====== OUR FIX ======  
> We added lines 147-148 to the *recv* method of the
> */usr/lib/python2.3/site-packages/plague/SSLConnection.py* module.
> Here's the patch: 
> 
> 
> So, could someone please review the above fix? We want to make sure it
> won't come back to *bite* us later on, or possibly even be *masking* a
> larger problem.   Thank you. 

This one makes me a bit nervous.  The SSL stuff is pretty fragile, since
SSL adds yet another protocol layer on top of TCP/IP, with its own
handshakes and state that can go wrong.

The traceback here shouldn't really have an effect, since it just
terminates the current thread, and plague's state machine is built to be
resilient to dropped and dead connection threads.  I'd like to hide the
traceback (or at least just print a one-line message) but that's not
possible since plague code isn't anywhere in the traceback and therefore
would require more subclassing.
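
As a rough sketch of the kind of subclassing this would take: the standard
library server classes pass unhandled handler exceptions to handle_error(),
so a server subclass can override it to print one line instead of the full
traceback. The names QuietServerMixin, QuietThreadingTCPServer and
one_line_error are hypothetical and not part of plague, and the example uses
the Python 3 module name socketserver (it is SocketServer on Python 2.3).

```python
import socketserver
import sys


def one_line_error(client_address, exc):
    """Format a dropped-connection error as a single log line."""
    return "Connection from %s:%s dropped: %s: %s" % (
        client_address[0], client_address[1], type(exc).__name__, exc)


class QuietServerMixin:
    """Hypothetical mixin: replace the default traceback dump with one line."""

    def handle_error(self, request, client_address):
        # handle_error() is called while the exception is still being
        # handled, so sys.exc_info() describes what killed the handler.
        exc = sys.exc_info()[1]
        sys.stderr.write(one_line_error(client_address, exc) + "\n")


class QuietThreadingTCPServer(QuietServerMixin, socketserver.ThreadingTCPServer):
    """Threaded TCP server that logs handler failures on one line."""
```

The mixin only changes how the failure is reported; the connection thread
still ends, as it does today.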

Furthermore, it technically is an error (that the other side closed the
socket prematurely or something broke the connection) but one that we
should ignore and retry, which plague will do.
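
A minimal sketch of what "ignore" could look like at the socket layer,
assuming the wrapper's recv just reports end-of-stream when the peer drops
the connection. This is not the patch from the original message, which is
not reproduced here. SysCallError below is a local stand-in for pyOpenSSL's
OpenSSL.SSL.SysCallError so the sketch runs without pyOpenSSL installed, and
recv_or_eof and FakeConnection are hypothetical names.

```python
class SysCallError(Exception):
    """Stand-in for OpenSSL.SSL.SysCallError (assumption for this sketch)."""


def recv_or_eof(con, bufsize, flags=0):
    """Return received data, or b'' if the peer closed the connection early."""
    try:
        return con.recv(bufsize, flags)
    except SysCallError as e:
        # Swallow only the specific "Unexpected EOF" case seen in the
        # traceback; any other low-level failure is re-raised unchanged.
        if e.args == (-1, 'Unexpected EOF'):
            return b''
        raise


class FakeConnection:
    """Hypothetical connection whose recv fails like the one in the traceback."""

    def recv(self, bufsize, flags=0):
        raise SysCallError(-1, 'Unexpected EOF')


# The dropped connection now looks like a normal end of stream.
data = recv_or_eof(FakeConnection(), 1024)
```

With an empty result the caller's readline() should see an ordinary closed
connection rather than an exception, which is also why this could mask a
real problem if the EOFs turn out to be frequent.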

However, if this fix seems to work OK for you for a while, I'd be
interested in revisiting the issue.

Dan

> -Joe 
> --
> Fedora-buildsys-list mailing list
> Fedora-buildsys-list at redhat.com
> https://www.redhat.com/mailman/listinfo/fedora-buildsys-list



