From grendel at caudium.net Tue Jul 20 12:29:47 2004
From: grendel at caudium.net (Marek Habersack)
Date: Tue, 20 Jul 2004 14:29:47 +0200
Subject: Userspace module return value?
Message-ID: <20040720122947.GA5516@beowulf.thanes.org>

Hello list,

I have a problem with a userspace module I have written for Tux. Tux is
configured to forward all the connections to that module which, in turn,
decides whether the visitor is authorized to access the server and, if
yes, how to handle the connection - whether to have tux send a static
file or the backend handle a dynamic resource. Everything works fine
except for one "detail" - once every few requests, the following happens:

Jul 20 06:28:47 localhost kernel: Possibly unexpected TUX-thread exit(0) at c0117da3?
Jul 20 06:28:47 localhost kernel: TUX: thread 0 stopping ...
Jul 20 06:28:47 localhost kernel: TUX: thread 0 stopped.

I have determined that this happens when I return a wrong value from the
handle_events function. The problem is, I have no idea what a "wrong" or
"good" value to return from the function is. The first version of the
module was returning whatever it got from the tux() syscall, but the
above problem was appearing even then. It seems to happen only after the
_last_ resource pertaining to a single request is sent (e.g. I am serving
a static HTML page with several images; when the last image request is
satisfied, the module exits in the way shown in the log excerpt above).
It's definitely a case of an incorrect return value from the
handle_events function - but I can't find any documentation on what the
return value means in this case.

I have found a solution which takes care of the problem, but feels like
a hack - I'm returning TUX_RETURN_USERSPACE_REQUEST every time except
when closing the connection, when I return the value returned from
tux(TUX_ACTION_FINISH_CLOSE_REQ, req). That seems to work so far, but is
it the right solution? Could anybody, please, shed some light on the
handle_events return value?

thanks,

marek

From mingo at elte.hu Tue Jul 20 13:09:25 2004
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 20 Jul 2004 15:09:25 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720122947.GA5516@beowulf.thanes.org>
References: <20040720122947.GA5516@beowulf.thanes.org>
Message-ID: <20040720130925.GA32198@elte.hu>

* Marek Habersack wrote:

> [...] I have found a solution which takes care of the problem, but
> feels like a hack - I'm returning TUX_RETURN_USERSPACE_REQUEST every
> time except when closing the connection, when I return the value
> returned from tux(TUX_ACTION_FINISH_CLOSE_REQ, req). That seems to
> work so far, but is it the right solution? Could anybody, please, shed
> some light on the handle_events return value?

you should return whatever the tux() call returns. Normally tux() would
give you a value of 0 (TUX_RETURN_USERSPACE_REQUEST), but occasionally
it could return TUX_RETURN_SIGNAL (when a signal is caught) or
TUX_RETURN_EXIT (when Tux is being shut down). These are the only values
that are returned currently.

To debug this, could you log all tux() return values that are not 0? To
do this just start up the Tux daemon manually without the 'daemon'
prefix. The simplest way is to modify /etc/rc.d/init.d/tux and remove
the 'daemon' word. This means 'service tux start' will 'hang' after
startup but you'll get all printfs to that console. You can then abort
the daemon via Ctrl-C and 'service tux stop'.

	Ingo

From grendel at caudium.net Tue Jul 20 13:43:36 2004
From: grendel at caudium.net (Marek Habersack)
Date: Tue, 20 Jul 2004 15:43:36 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720130925.GA32198@elte.hu>
References: <20040720122947.GA5516@beowulf.thanes.org> <20040720130925.GA32198@elte.hu>
Message-ID: <20040720134336.GB5516@beowulf.thanes.org>

On Tue, Jul 20, 2004 at 03:09:25PM +0200, Ingo Molnar scribbled:
>
> * Marek Habersack wrote:
>
> > [...] I have found a solution which takes care of the problem, but
> > feels like a hack - I'm returning TUX_RETURN_USERSPACE_REQUEST every
> > time except when closing the connection, when I return the value
> > returned from tux(TUX_ACTION_FINISH_CLOSE_REQ, req). That seems to
> > work so far, but is it the right solution? Could anybody, please, shed
> > some light on the handle_events return value?
>
> you should return whatever the tux() call returns. Normally tux() would
> give you a value of 0 (TUX_RETURN_USERSPACE_REQUEST), but occasionally
> it could return TUX_RETURN_SIGNAL (when a signal is caught) or
> TUX_RETURN_EXIT (when Tux is being shut down). These are the only values
> that are returned currently.
>
> To debug this, could you log all tux() return values that are not 0? To
> do this just start up the Tux daemon manually without the 'daemon'
> prefix. The simplest way is to modify /etc/rc.d/init.d/tux and remove
> the 'daemon' word. This means 'service tux start' will 'hang' after
> startup but you'll get all printfs to that console. You can then abort
> the daemon via Ctrl-C and 'service tux stop'.
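
[Illustration of the advice quoted above: a minimal sketch of a handler that makes one tux() call per invocation, hands the return value straight back, and logs anything that is not 0. It is not code from the thread. The reduced user_req_t, the stub tux(), the logged_tux() helper, every numeric constant other than 0, and the TUX_ACTION_SEND_OBJECT name are assumptions made so the fragment compiles without the Tux headers.]

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins so the sketch compiles without the Tux headers. Only the value
 * 0 for TUX_RETURN_USERSPACE_REQUEST is stated in the thread; every other
 * number below is a placeholder. */
enum { TUX_RETURN_USERSPACE_REQUEST = 0, TUX_RETURN_EXIT = 1, TUX_RETURN_SIGNAL = 2 };
enum { TUX_ACTION_GET_OBJECT = 1, TUX_ACTION_SEND_OBJECT = 2, TUX_ACTION_FINISH_CLOSE_REQ = 3 };

/* Reduced, hypothetical stand-in for the real request structure. */
typedef struct {
    int event;               /* phase this request is in */
    int error;               /* non-zero when the previous step went wrong */
    const char *objectname;
} user_req_t;

/* Hypothetical stub in place of the real tux() syscall wrapper. */
static int stub_rval = TUX_RETURN_USERSPACE_REQUEST;
static unsigned int stub_last_action;

static int tux(unsigned int action, user_req_t *req)
{
    (void)req;
    stub_last_action = action;
    return stub_rval;
}

/* Call tux() and log every return value that is not 0. */
static int logged_tux(unsigned int action, user_req_t *req)
{
    int rval;

    assert(req != NULL);
    errno = 0;
    rval = tux(action, req);
    if (rval < 0)
        fprintf(stderr, "tux(%u) returned %d, errno %d (%s)\n",
                action, rval, errno, strerror(errno));
    else if (rval != TUX_RETURN_USERSPACE_REQUEST)
        fprintf(stderr, "tux(%u) returned %d\n", action, rval);
    return rval;
}

/* One tux() call per invocation; its value goes straight back to the caller. */
int handle_events(user_req_t *req)
{
    switch (req->event) {
    case 0:
        req->event = 1;
        return logged_tux(TUX_ACTION_GET_OBJECT, req);
    case 1:
        req->event = 2;
        if (req->error)
            return logged_tux(TUX_ACTION_FINISH_CLOSE_REQ, req);
        return logged_tux(TUX_ACTION_SEND_OBJECT, req);
    default:
        return logged_tux(TUX_ACTION_FINISH_CLOSE_REQ, req);
    }
}
```

[The handler never looks at the request again after a tux() call in the same invocation; all per-request decisions are made on entry, from req->event and req->error of the request it was handed.]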

I am using syslog to debug the module; here's what gets output when I
return only the values received from tux() (this is the whole request
for the default Debian apache index.html - note that the graphics files
referenced from the index.html do not exist in the document root):

Jul 20 15:28:23 localhost pch_mod[1406]: req->objectname == index.html
Jul 20 15:28:23 localhost pch_mod[1406]: Validating session
Jul 20 15:28:23 localhost pch_mod[1406]: session_is_valid
Jul 20 15:28:23 localhost pch_mod[1406]: Static request, getting the object (index.html)
Jul 20 15:28:23 localhost pch_mod[1406]: Sending the object (index.html)
Jul 20 15:28:23 localhost pch_mod[1406]: normal: closing the connection
Jul 20 15:28:23 localhost pch_mod[1406]: req->objectname == icons/jhe061.png
Jul 20 15:28:23 localhost pch_mod[1406]: Validating session
Jul 20 15:28:23 localhost pch_mod[1406]: session_is_valid
Jul 20 15:28:23 localhost pch_mod[1406]: Static request, getting the object (icons/jhe061.png)
Jul 20 15:28:23 localhost pch_mod[1406]: content_type: req->objectname == icons/jhe061.png
Jul 20 15:28:23 localhost pch_mod[1406]: ext == png
Jul 20 15:28:23 localhost pch_mod[1406]: abort: closing the connection
Jul 20 15:28:23 localhost pch_mod[1406]: req->objectname == icons/apache_pb.png
Jul 20 15:28:23 localhost pch_mod[1406]: Validating session
Jul 20 15:28:23 localhost pch_mod[1406]: session_is_validicons/debian/openlogo-25.jpg
Jul 20 15:28:23 localhost pch_mod[1406]: Validating session
Jul 20 15:28:23 localhost pch_mod[1406]: session_is_valid
Jul 20 15:28:23 localhost pch_mod[1406]: Static request, getting the object (icons/debian/openlogo-25.jpg)
Jul 20 15:28:23 localhost pch_mod[1406]: content_type: req->objectname == icons/apache_pb.png
Jul 20 15:28:23 localhost pch_mod[1406]: ext == png
Jul 20 15:28:23 localhost pch_mod[1406]: abort: closing the connection
Jul 20 15:28:23 localhost pch_mod[1406]: Sending the object (icons/debian/openlogo-25.jpg)
Jul 20 15:28:23 localhost pch_mod[1406]: tux() returned -1
Jul 20 17:28:23 localhost kernel: Possibly unexpected TUX-thread exit(0) at c0117da3?
Jul 20 17:28:23 localhost kernel: TUX: thread 0 stopping ...
Jul 20 17:28:23 localhost kernel: TUX: thread 0 stopped.

Everything works fine until I do several quick Shift-Reloads from the
browser. You can see above that the requests are then overlapped - the
apache_pb.png and openlogo-25.jpg requests are interlaced. The latter
file should never cause the 'Sending the object
(icons/debian/openlogo-25.jpg)' message to appear, since tux returns an
error for it. Here's the code that handles the situation:

    if (rval == REQ_STATIC) {
        req->event = 1;
        do_syslog("Static request, getting the object (%s)", req->objectname);
        rval = tux(TUX_ACTION_GET_OBJECT, req);
        if (rval < 0 || req->error) {
            req->event = 2;
            if (content_type(req) == CONTENT_NOTIFY)
                return send_failure(req, LOG_ERR_OBJECT_NOT_FOUND);

            goto abort;
        }
        return rval;
    }

  abort:
    do_syslog("abort: closing the connection");
    return tux(TUX_ACTION_FINISH_CLOSE_REQ, req);

So, what seems to be happening is that there is a race condition
somewhere so that the req->priv data gets munged and the module code
gets confused. Am I supposed to do locking anywhere? Note that the above
does not happen when using my "hack" described in the previous mail.

The test server is running inside vmware, one thread, the 2.4.26-ow1
kernel with tux patch 2.4.23-A3 rediffed for the kernel and 3.2.16 tux
userland.

regards

marek

p.s. send_failure is not called in the case of .jpg or .png; it is
called for html/htm/xhtml/xhtm and uses TUX_ACTION_SEND_BUFFER to send
the message, returning the value received from tux().

From mingo at elte.hu Tue Jul 20 14:00:21 2004
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 20 Jul 2004 16:00:21 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720134336.GB5516@beowulf.thanes.org>
References: <20040720122947.GA5516@beowulf.thanes.org> <20040720130925.GA32198@elte.hu> <20040720134336.GB5516@beowulf.thanes.org>
Message-ID: <20040720140021.GB1267@elte.hu>

* Marek Habersack wrote:

> if (rval == REQ_STATIC) {
>     req->event = 1;
>     do_syslog("Static request, getting the object (%s)", req->objectname);
>     rval = tux(TUX_ACTION_GET_OBJECT, req);
>     if (rval < 0 || req->error) {
>         req->event = 2;
>         if (content_type(req) == CONTENT_NOTIFY)
>             return send_failure(req, LOG_ERR_OBJECT_NOT_FOUND);
>
>         goto abort;
>     }
>     return rval;
> }

This code doesn't handle events properly. When tux() returns there might
be another request active (with a different ->priv value) - you need to
return so that your event loop can be re-called with the proper request
pointer.

the req->event code can be used to distinguish between the various
phases a particular request is in. (you can also track your request's
state via the ->priv pointer)

demo2.c shows a 3-phase request. (for simplicity the demo code uses
write() but a truly atomic module should use TUX_ACTION_SEND_BUFFER to
write to the socket. A write(), if the send buffers are set to be small
on your system, might block your thread and hence all requests might be
blocked by the remote client.)

it all looks a bit complex but that is how event-based programming is ...

	Ingo

From grendel at caudium.net Tue Jul 20 14:19:01 2004
From: grendel at caudium.net (Marek Habersack)
Date: Tue, 20 Jul 2004 16:19:01 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720140021.GB1267@elte.hu>
References: <20040720122947.GA5516@beowulf.thanes.org> <20040720130925.GA32198@elte.hu> <20040720134336.GB5516@beowulf.thanes.org> <20040720140021.GB1267@elte.hu>
Message-ID: <20040720141901.GC5516@beowulf.thanes.org>

On Tue, Jul 20, 2004 at 04:00:21PM +0200, Ingo Molnar scribbled:
>
> * Marek Habersack wrote:
>
> > if (rval == REQ_STATIC) {
> >     req->event = 1;
> >     do_syslog("Static request, getting the object (%s)", req->objectname);
> >     rval = tux(TUX_ACTION_GET_OBJECT, req);
> >     if (rval < 0 || req->error) {
> >         req->event = 2;
> >         if (content_type(req) == CONTENT_NOTIFY)
> >             return send_failure(req, LOG_ERR_OBJECT_NOT_FOUND);
> >
> >         goto abort;
> >     }
> >     return rval;
> > }
>
> This code doesn't handle events properly. When tux() returns there might
> be another request active (with a different ->priv value) - you need to
> return so that your event loop can be re-called with the proper request
> pointer.

OK, I see the problem now. Above, I'm calling
tux(TUX_ACTION_GET_OBJECT, req) and if it fails I immediately call
either tux(TUX_ACTION_SEND_BUFFER, req) or
tux(TUX_ACTION_FINISH_CLOSE_REQ, req) (in the 'abort' label) - instead I
should return immediately after the TUX_ACTION_GET_OBJECT call fails and
send the buffer or close the connection only the next time handle_events
is called. Did I get it right?

thanks,

marek

From mingo at elte.hu Tue Jul 20 14:50:28 2004
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 20 Jul 2004 16:50:28 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720141901.GC5516@beowulf.thanes.org>
References: <20040720122947.GA5516@beowulf.thanes.org> <20040720130925.GA32198@elte.hu> <20040720134336.GB5516@beowulf.thanes.org> <20040720140021.GB1267@elte.hu> <20040720141901.GC5516@beowulf.thanes.org>
Message-ID: <20040720145028.GB3610@elte.hu>

* Marek Habersack wrote:

> > > if (rval == REQ_STATIC) {
> > >     req->event = 1;
> > >     do_syslog("Static request, getting the object (%s)", req->objectname);
> > >     rval = tux(TUX_ACTION_GET_OBJECT, req);
> > >     if (rval < 0 || req->error) {
> > >         req->event = 2;
> > >         if (content_type(req) == CONTENT_NOTIFY)
> > >             return send_failure(req, LOG_ERR_OBJECT_NOT_FOUND);
> > >
> > >         goto abort;
> > >     }
> > >     return rval;
> > > }
> >
> > This code doesn't handle events properly. When tux() returns there might
> > be another request active (with a different ->priv value) - you need to
> > return so that your event loop can be re-called with the proper request
> > pointer.
>
> OK, I see the problem now. Above, I'm calling
> tux(TUX_ACTION_GET_OBJECT, req) and if it fails I immediately call
> either tux(TUX_ACTION_SEND_BUFFER, req) or
> tux(TUX_ACTION_FINISH_CLOSE_REQ, req) (in the 'abort' label) - instead
> I should return immediately after the TUX_ACTION_GET_OBJECT call fails
> and send the buffer or close the connection only the next time
> handle_events is called. Did I get it right?

almost, with the following qualifications: the tux() call cannot 'fail'.

req->error after a tux() call might be for another request, in a
completely different state - that is not a result of the tux() call you
just did. So you should always return after doing a tux() call, and let
Tux restart your event loop.

(alternatively you could also loop yourself and only return if you see a
non-TUX_RETURN_USERSPACE_REQUEST return code from the tux() call - but
it's more readable to just return - the daemon will re-call your
module's event loop immediately.
Check out tux.c of the Tux userspace source code.)

generally i'd suggest to shape your event loop like the demo modules do:
switch(req->event), and try to return as early as possible when doing a
tux() call. Most demo modules do a 'return tux(...);' to finish an event
and drive the state-machine forward.

	Ingo

From grendel at caudium.net Tue Jul 20 15:07:11 2004
From: grendel at caudium.net (Marek Habersack)
Date: Tue, 20 Jul 2004 17:07:11 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720145028.GB3610@elte.hu>
References: <20040720122947.GA5516@beowulf.thanes.org> <20040720130925.GA32198@elte.hu> <20040720134336.GB5516@beowulf.thanes.org> <20040720140021.GB1267@elte.hu> <20040720141901.GC5516@beowulf.thanes.org> <20040720145028.GB3610@elte.hu>
Message-ID: <20040720150711.GD5516@beowulf.thanes.org>

On Tue, Jul 20, 2004 at 04:50:28PM +0200, Ingo Molnar scribbled:
>
> * Marek Habersack wrote:
>
> > > > if (rval == REQ_STATIC) {
> > > >     req->event = 1;
> > > >     do_syslog("Static request, getting the object (%s)", req->objectname);
> > > >     rval = tux(TUX_ACTION_GET_OBJECT, req);
> > > >     if (rval < 0 || req->error) {
> > > >         req->event = 2;
> > > >         if (content_type(req) == CONTENT_NOTIFY)
> > > >             return send_failure(req, LOG_ERR_OBJECT_NOT_FOUND);
> > > >
> > > >         goto abort;
> > > >     }
> > > >     return rval;
> > > > }
> > >
> > > This code doesn't handle events properly. When tux() returns there might
> > > be another request active (with a different ->priv value) - you need to
> > > return so that your event loop can be re-called with the proper request
> > > pointer.
> >
> > OK, I see the problem now.
> > Above, I'm calling
> > tux(TUX_ACTION_GET_OBJECT, req) and if it fails I immediately call
> > either tux(TUX_ACTION_SEND_BUFFER, req) or
> > tux(TUX_ACTION_FINISH_CLOSE_REQ, req) (in the 'abort' label) - instead
> > I should return immediately after the TUX_ACTION_GET_OBJECT call fails
> > and send the buffer or close the connection only the next time
> > handle_events is called. Did I get it right?
>
> almost, with the following qualifications: the tux() call cannot 'fail'.
>
> req->error after a tux() call might be for another request, in a
> completely different state - that is not a result of the tux() call you
> just did. So you should always return after doing a tux() call, and let
> Tux restart your event loop.

I see, one more question though:

Jul 20 16:45:47 localhost pch_mod[1750]: tux() returned -1
Jul 20 18:45:47 localhost kernel: Possibly unexpected TUX-thread exit(0) at c0117da3?
Jul 20 18:45:47 localhost kernel: TUX: thread 0 stopping ...
Jul 20 18:45:47 localhost kernel: TUX: thread 0 stopped.

What should I do when the tux() call itself returns -1? Do I return
TUX_RETURN_USERSPACE_REQUEST myself then since returning -1 seems to
shut the thread down?

thanks,

marek

From mingo at elte.hu Tue Jul 20 19:04:36 2004
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 20 Jul 2004 21:04:36 +0200
Subject: Userspace module return value?
In-Reply-To: <20040720150711.GD5516@beowulf.thanes.org>
References: <20040720122947.GA5516@beowulf.thanes.org> <20040720130925.GA32198@elte.hu> <20040720134336.GB5516@beowulf.thanes.org> <20040720140021.GB1267@elte.hu> <20040720141901.GC5516@beowulf.thanes.org> <20040720145028.GB3610@elte.hu> <20040720150711.GD5516@beowulf.thanes.org>
Message-ID: <20040720190436.GA7957@elte.hu>

* Marek Habersack wrote:

> Jul 20 16:45:47 localhost pch_mod[1750]: tux() returned -1
> Jul 20 18:45:47 localhost kernel: Possibly unexpected TUX-thread exit(0) at c0117da3?
> Jul 20 18:45:47 localhost kernel: TUX: thread 0 stopping ...
> Jul 20 18:45:47 localhost kernel: TUX: thread 0 stopped.
>
> What should I do when the tux() call itself returns -1? Do I return
> TUX_RETURN_USERSPACE_REQUEST myself then since returning -1 seems to
> shut the thread down?

if it returns -1 then that means that somehow the syscall was illegal -
e.g. a READ_OBJECT is done without first doing a successful GET_OBJECT.
(in this sense tux() indeed 'fails')

could you also print out errno? -1 is a generic 'system call failed'
value, errno will have the (per thread) value of the real reason.

(to see the precise reason for failure you'd have to compile with
CONFIG_TUX_DEBUG and enable Dprintk in /proc/sys/net/tux/, and look at
the large logs that get produced.)

	Ingo

From mcd at daviesinc.com Wed Jul 28 15:03:20 2004
From: mcd at daviesinc.com (Chris Davies)
Date: Wed, 28 Jul 2004 11:03:20 -0400
Subject: Tux for 2.6.7 (or 2.6.8)
Message-ID: <1091027000.23836.42.camel@mcdlp.pbi.daviesinc.com>

I notice on http://people.redhat.com/mingo/TUX-patches/ the latest 2.6
kernel supported is 2.6.5.

I tried applying the patches to 2.6.7, but they don't apply cleanly.

When 2.6.8 is released, will there be an updated patch file made
available?

Also, are there any 3.0 docs available or are we limited to RTFS? I've
seen a few new options show up in tux's /proc that look intriguing.
The latest docs I can find are for 2.2 here:
http://www.redhat.com/docs/manuals/tux/

Maybe I'm looking in the wrong place.

From williama_lovaton at coomeva.com.co Wed Jul 28 16:13:32 2004
From: williama_lovaton at coomeva.com.co (William Lovaton)
Date: 28 Jul 2004 11:13:32 -0500
Subject: Bug 125091 updated
Message-ID: <1091031212.32380.25.camel@localhost.localdomain>

Hi Ingo,

I updated this bug a few days ago but it seems bugzilla didn't send any
email.

http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=125091#c22

There is a new log too.

Regards,

-William

From mcd at daviesinc.com Fri Jul 30 14:49:02 2004
From: mcd at daviesinc.com (Chris Davies)
Date: Fri, 30 Jul 2004 10:49:02 -0400
Subject: Documentation error
Message-ID: <1091198942.1614.30.camel@mcdlp.pbi.daviesinc.com>

on page:
http://www.redhat.com/docs/manuals/tux/TUX-2.2-Manual/virtual-hosting.html

string_host_tail

The strip_host_tail tunable strips off hostname components, starting at
the end of the hostname. If the value is set to 0, this tunable is
disabled.

If the value is set to 1:
http://www.some.site.com/a.html => $DOCROOT/some.site/a.html

If the value is set to 2:
http://www.some.site.com/a.html => $DOCROOT/site/a.html

and so on...

------------------

However, looking at the code and doing some testing, it appears that the
example for value set to 2 is incorrect. In reality what I am seeing is:

If the value is set to 2:
http://www.some.site.com/a.html => $DOCROOT/some/a.html

however, in my current situation, the documentation would have done what
I wanted. :)

Also, the header says string_host_tail, but the description mentions
strip_host_tail. The header should say strip_host_tail.
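
[Illustration of the mapping reported above: a minimal sketch that drops a leading "www." and then removes n components from the end of the hostname. It reproduces the results in the report (1 gives some.site, 2 gives some) and is not Tux's kernel code. The function name strip_host_tail_sketch and the separate "www." handling are assumptions inferred from the manual's examples.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into out the directory name that results from stripping n trailing
 * hostname components. Illustrative only; not taken from the Tux sources. */
static void strip_host_tail_sketch(const char *host, int n,
                                   char *out, size_t outlen)
{
    assert(host != NULL && out != NULL && outlen > 0);

    /* Assumption: a leading "www." is dropped separately, as the manual's
     * examples suggest. */
    if (strncmp(host, "www.", 4) == 0)
        host += 4;

    snprintf(out, outlen, "%s", host);

    /* Cut at the last dot, n times: components go from the END of the name. */
    while (n-- > 0) {
        char *dot = strrchr(out, '.');
        if (dot == NULL)
            break;          /* a single component is left; keep it */
        *dot = '\0';
    }
}
```

[With "www.some.site.com" as input, n = 1 leaves "some.site" and n = 2 leaves "some", which is the observed behaviour; the manual's "site" for n = 2 would require stripping from the front instead.]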