From jkenisto at us.ibm.com Tue Jan 6 22:23:09 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Tue, 06 Jan 2009 14:23:09 -0800
Subject: newly created engine immediately notified of exec already in progress
In-Reply-To: <20081217092122.55879FC3D1@magilla.sf.frob.com>
References: <1229117046.3565.9.camel@dyn9047018139.beaverton.ibm.com>
 <20081217092122.55879FC3D1@magilla.sf.frob.com>
Message-ID: <1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>

On Wed, 2008-12-17 at 01:21 -0800, Roland McGrath wrote:
> > The current implementation is that if I create a new engine in response
> > to an exec (when called from some other engine's report_exec callback),
> > and set that engine's flags to be notified of execs, the new engine gets
> > notified of the exec that's already underway. This turns out to be
> > rather inconvenient for uprobes, but is it counterintuitive?
>
> To clarify, this is not specific to exec.
> Every kind of event callback
> constitutes what I call a "reporting pass", and they all behave the same.
> A normal reporting pass is the loop across all engines, in which interested
> ones get first a report_quiesce(eventbit) and then a report_event().
> A resume reporting pass is the similar loop where engines get either just
> report_quiesce(0) or just report_signal().
>
> The question is what happens in the current reporting pass when a callback
> attaches a new engine to current and sets its event mask to include the
> event that elicited this reporting pass.
>
> The current behavior is that the new engine goes immediately on the end of
> the list of engines to get callbacks, so the reporting pass already in
> progress will later get to all the new engines before it's done.
>
> The alternative behavior would be that any new engines attached after a
> reporting pass has begun will not be included in that pass. They will be
> included in the next reporting pass of any kind. A side effect is that if
> there was not going to be any other report before returning to user mode,
> there will be a resume reporting pass (that the new engine will see).
> That is the same effect of utrace_control(UTRACE_REPORT) being done when
> the utrace_attach_task() is done.
>
> Originally I had thought of the current behavior as being desirably
> consistent with the fact that an engine's report_quiesce(eventbit) callback
> can use utrace_set_events() on that same engine to enable/disable the
> immediately following report_event() callback in the very same step of the
> same reporting pass.
>
> But another way to look at it is that any utrace_attach_task() call from
> any other task behaves this (alternative) way. That is, if some reporting
> pass has already begun, the new engine is not included, but a UTRACE_REPORT
> is done instead to get the new engine fully signed on "soon". So it would
> be simply consistent for any attach made during a reporting pass
> (synchronously or asynchronously) not to take effect during that same pass.
>
> I was musing about adding a UTRACE_ATTACH_* flag bit to let you select the
> behavior. But that seems overly fiddly for no good reason.
>
> So I don't mind changing this as Jim prefers. The actual change is simple,
> just remove the "splice_attaching" case from utrace_attach_task.

Yes, I'd prefer that you make the requested change, if you haven't
already. Just before I went on vacation (about when you posted this), I
coded a tentative fix to uprobes to work with the existing utrace
behavior. It's about a 250-line patch, and I haven't tested it yet.
It'd be nice if I could drop that.

>
> Jim, can you look through the kerneldoc comments and the Documentation/
> files and cite any places where the description of this behavior now needs
> to be corrected or explained more clearly and explicitly?

1. On the "Events and Callbacks" page, paragraph 3 says:
"When a thread has an event, each engine gets a callback if it has set
the event flag for that event type."
Either here or at the end of that page, you could add something like:

[If you implement the requested behavior...]
In response to an event, one engine's callback may create a new engine
for the same task. This new engine will not be notified of the event
already in progress, even if you immediately set its event flag for that
type of event.

[If not...]
In response to an event of type UTRACE_EVENT(x), one engine's callback
may create a new engine for the same task. This new engine will be
appended to that task's list of engines; and if you set its event flag
for UTRACE_EVENT(x), it will be notified in turn of the event already in
progress.

2. Something similar could be added to the description of
utrace_set_events().

>
>
> Thanks,
> Roland

Thanks.
Jim

From dvlasenk at redhat.com Wed Jan 7 17:30:10 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 07 Jan 2009 18:30:10 +0100
Subject: [PATCH] make strace more fair wrt many traced processes
Message-ID: <1231349410.3464.7.camel@localhost>

Hi,

Attached little program many_looping_threads.c starts N threads,
and exits (terminating them all) as soon as they are all started.
Each thread runs infinite loop with getuid(). N is given as the 1st
command line parameter.

Ran standalone, it finishes ok, even with large number of threads (500).

Currently, strace -f fails miserably starting approximately with 5 threads.
After a few threads created, strace is flooded with syscall entry/exit
notifications from these threads, and the main thread (which wants to
create more threads) does not get a chance for its syscall start/stop
notifications to be delivered!

This patch fixes it. Run tested.
The gist of the patch is that we don't wait(2) for the *first* process
to stop/exit; we wait for them all (calling wait(2) in a loop, with
WNOHANG). Only when we have collected all such processes do we process
them and restart them. This ensures that one or a few fast
stopping/starting/stopping threads can't usurp strace's attention.
Slower threads will always get a chance to make at least some progress.

The patch needs some comment removal and re-indentation before it can be
applied to strace cvs, but otherwise seems to be ready.

I'd vote for a subsequent patch to split the trace() function into
"collect stopped tasks" and "process collected tasks" parts, without
changing the logic.
--
vda
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7.patch
Type: text/x-patch
Size: 4033 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: many_looping_threads.c
Type: text/x-csrc
Size: 692 bytes
Desc: not available
URL:

From roland at redhat.com Wed Jan 7 18:52:17 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 7 Jan 2009 10:52:17 -0800 (PST)
Subject: [PATCH] make strace more fair wrt many traced processes
In-Reply-To: Denys Vlasenko's message of Wednesday,
 7 January 2009 18:30:10 +0100 <1231349410.3464.7.camel@localhost>
References: <1231349410.3464.7.camel@localhost>
Message-ID: <20090107185217.87DC7FC3E0@magilla.sf.frob.com>

Wrong list.
I think you meant CC:

From dvlasenk at redhat.com Wed Jan 7 19:08:10 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 07 Jan 2009 20:08:10 +0100
Subject: [PATCH] make strace more fair wrt many traced processes
In-Reply-To: <20090107185217.87DC7FC3E0@magilla.sf.frob.com>
References: <1231349410.3464.7.camel@localhost>
 <20090107185217.87DC7FC3E0@magilla.sf.frob.com>
Message-ID: <1231355290.3464.12.camel@localhost>

On Wed, 2009-01-07 at 10:52 -0800, Roland McGrath wrote:
> Wrong list.
> I think you meant CC:

Absolutely. Just resent it there.
The corresponding bug is
https://bugzilla.redhat.com/show_bug.cgi?id=478419

What do you think about the patch in principle?
--
vda

From roland at redhat.com Wed Jan 7 20:57:11 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 7 Jan 2009 12:57:11 -0800 (PST)
Subject: newly created engine immediately notified of exec already in progress
In-Reply-To: Jim Keniston's message of Tuesday,
 6 January 2009 14:23:09 -0800 <1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>
References: <1229117046.3565.9.camel@dyn9047018139.beaverton.ibm.com>
 <20081217092122.55879FC3D1@magilla.sf.frob.com>
 <1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <20090107205711.B82E2FC3E0@magilla.sf.frob.com>

> Yes, I'd prefer that you make the requested change, if you haven't
> already. Just before I went on vacation (about when you posted this), I
> coded a tentative fix to uprobes to work with the existing utrace
> behavior. It's about a 250-line patch, and I haven't tested it yet.
> It'd be nice if I could drop that.

I made the change in the git tip (v2.6.28-7153-g87e13f4 from
v2.6.28-7151-gdaf4b80, produces 2.6-current/ patches).
(I haven't updated the 2.6.28 backport branch.)

> 1. On the "Events and Callbacks" page [...]

Please check the doc changes I made: one there, one in utrace_set_events.

Thanks,
Roland

From jkenisto at us.ibm.com Thu Jan 8 00:10:45 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 07 Jan 2009 16:10:45 -0800
Subject: newly created engine immediately notified of exec already in progress
In-Reply-To: <20090107205711.B82E2FC3E0@magilla.sf.frob.com>
References: <1229117046.3565.9.camel@dyn9047018139.beaverton.ibm.com>
 <20081217092122.55879FC3D1@magilla.sf.frob.com>
 <1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>
 <20090107205711.B82E2FC3E0@magilla.sf.frob.com>
Message-ID: <1231373445.8092.6.camel@dyn9047018139.beaverton.ibm.com>

On Wed, 2009-01-07 at 12:57 -0800, Roland McGrath wrote:
> > Yes, I'd prefer that you make the requested change, if you haven't
> > already. Just before I went on vacation (about when you posted this), I
> > coded a tentative fix to uprobes to work with the existing utrace
> > behavior. It's about a 250-line patch, and I haven't tested it yet.
> > It'd be nice if I could drop that.
>
> I made the change in the git tip (v2.6.28-7153-g87e13f4 from
> v2.6.28-7151-gdaf4b80, produces 2.6-current/ patches).
> (I haven't updated the 2.6.28 backport branch.)

OK, I'll retest with your change, and fix the patch for PR 7082.

>
> > 1. On the "Events and Callbacks" page [...]
>
> Please check the doc changes I made: one there, one in utrace_set_events.

Yes, very good.

>
>
> Thanks,
> Roland

Many thanks.
Jim

From fche at redhat.com Sun Jan 11 22:19:13 2009
From: fche at redhat.com (Frank Ch.
Eigler) Date: Sun, 11 Jan 2009 17:19:13 -0500 Subject: request for a mergeable tree Message-ID: <20090111221913.GD18407@redhat.com> Hi - Please consider switching some of the utrace git trees on git.kernel.org to merge- rather rebase-based ones. This should make it somewhat easier to develop stuff on top. - FChE From roland at redhat.com Mon Jan 12 02:27:19 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 11 Jan 2009 18:27:19 -0800 (PST) Subject: request for a mergeable tree In-Reply-To: Frank Ch. Eigler's message of Sunday, 11 January 2009 17:19:13 -0500 <20090111221913.GD18407@redhat.com> References: <20090111221913.GD18407@redhat.com> Message-ID: <20090112022719.DCD85FC3C8@magilla.sf.frob.com> Ok, no problem. I've switched my main development (back) to using normal git history-preserving branches for my incremental changes (not with any old history, though). The repo now has two main branches: utrace-ptrace aka master utrace The "utrace" branch does not have the CONFIG_UTRACE_PTRACE code. The "utrace-ptrace" (aka master) branch forks from "utrace" and adds that. In the next few days I will update my scripts to produce patches. For the moment, my latest code is in GIT but not in patch files yet. Thanks, Roland From roland at redhat.com Mon Jan 12 02:37:06 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 11 Jan 2009 18:37:06 -0800 (PST) Subject: utrace/ptrace mutual exclusion Message-ID: <20090112023706.87E81FC3C8@magilla.sf.frob.com> I've added a change (only in git) so that with CONFIG_UTRACE=y and CONFIG_UTRACE_PTRACE=n, ptrace and utrace are mutually exclusive on each task. The utrace_attach or PTRACE_ATTACH call fails with a characteristic EBUSY so that the failure looks new and unusual in an obvious way. It would be useful if people could try that configuration and see how annoying it is when e.g. using systemtap with utrace/uprobes stuff. 
It will make any "trace everything for a while" kinds of uses annoying,
since they will cause you to be unable to use strace or gdb while the
stap script is running (unless the debugging session is already going
first).

It's occurred to me that since the CONFIG_UTRACE_PTRACE code is so
abysmal, it might be easier and better to merge utrace upstream alone,
with the mutual exclusion safety feature, and whatever pure-utrace things
we have to merge. The proper ptrace cooperation is important, but the
mutual exclusion makes it a safe limitation rather than a destabilizer to
work on utrace things without it.

Anyway, it's worth figuring out how annoying this configuration is now
before trying to decide about that.

Thanks,
Roland

From srikar at linux.vnet.ibm.com Mon Jan 12 09:22:34 2009
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 12 Jan 2009 14:52:34 +0530
Subject: Utrace in -next tree?
In-Reply-To: <20081017200934.2DF601544CB@magilla.localdomain>
References: <20081017060455.GA2962@in.ibm.com> <20081017200934.2DF601544CB@magilla.localdomain>
Message-ID: <20090112092233.GC13305@linux.vnet.ibm.com>

* Roland McGrath [2008-10-17 13:09:34]:

> > What are your thoughts of getting utrace git tree into linux-next?
> > That way, utrace will have more extensive visibility and testing.
>
> I would certainly like to. I hope that after I next post the latest utrace
> patch series for more review, it will make sense to put it into linux-next.

Roland,

How about now getting utrace git tree into linux-next?
--
Thanks and Regards
Srikar

From sarangk4586 at gmail.com Tue Jan 13 04:42:54 2009
From: sarangk4586 at gmail.com (Sarang Kawale)
Date: Tue, 13 Jan 2009 10:12:54 +0530
Subject: which patch to use for 2.6.23!
Message-ID: <8a0fa7650901122042y1d2f025dt22bdbc4627fac21e@mail.gmail.com>

Hello all!

I am a newbie in Linux. I am trying to patch utrace on 2.6.23. I have the
following problems:

> I am using patch files from roland/utrace/old/2.6.23, but while applying
  the patch I get hunk failure messages for most of the files.
> Could you please tell me what the problem could be, and its solution?
> I am using the Ubuntu 8.04 distro. The create statements, e.g. create
  /include/linux/tracehook.h, do not get executed. After applying the
  patch I don't see any such file.

--
With Love,
Sarang
From wenji.huang at oracle.com Tue Jan 13 05:40:56 2009
From: wenji.huang at oracle.com (Wenji Huang)
Date: Tue, 13 Jan 2009 13:40:56 +0800
Subject: Analysis of SINGLESTEP
In-Reply-To: <20081219082938.A068EFC339@magilla.sf.frob.com>
References: <494A13F7.8080209@oracle.com> <20081219082938.A068EFC339@magilla.sf.frob.com>
Message-ID: <496C2968.2070309@oracle.com>

Roland McGrath wrote:
[...]
>
> What's supposed to happen is that ptrace_resume uses ptrace_set_action to
> store UTRACE_SINGLESTEP. It then actually passes UTRACE_REPORT or
> UTRACE_INTERRUPT to utrace_control (for the reasons explained in the
> comments in the code for each of those cases).
>
> The child should then get into either ptrace_report_quiesce or
> ptrace_report_signal (ptrace_resumed case). These both use
> ptrace_resume_action to extract what was saved by ptrace_set_action, which
> should still be UTRACE_SINGLESTEP. Then whichever of these callbacks it is
> should return that value, UTRACE_SINGLESTEP. It's that return value that
> is what should ensure that user_enable_single_step actually happens (in
> utrace.c:finish_resume_report).
>
> I'm not entirely sure I understood your description of what you see
> happening. But perhaps you can figure out exactly where it differs from
> what I've described that I think it should do.
>
>
> Thanks,
> Roland
>
Understood.
The test step-simple can pass on 2.6.29-rc1+utrace(11 Jan). Seems the
regression has been fixed.
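The hand-off quoted above (save the wanted action, kick the task with a
different action, then have the report callback return the saved one) can
be modelled in a few lines of user-space C. This is a toy sketch only: every
toy_ name and the ACT_ enum are invented here and merely mirror the roles of
ptrace_set_action, ptrace_resume_action, utrace_control and
finish_resume_report; it is not the kernel code, and which of REPORT or
INTERRUPT is used in which case is left to the caller because the thread
does not say.

```c
#include <assert.h>

/* Invented stand-ins for the utrace resume actions named in the thread. */
enum toy_action { ACT_RESUME, ACT_REPORT, ACT_INTERRUPT, ACT_SINGLESTEP };

/* Toy engine state: the action remembered between resume and report. */
struct toy_engine {
	enum toy_action saved;
};

/* Plays the role of ptrace_set_action: remember what the tracer wants. */
void toy_set_action(struct toy_engine *e, enum toy_action wanted)
{
	e->saved = wanted;
}

/* Plays the role of ptrace_resume_action: read the remembered action back. */
enum toy_action toy_resume_action(const struct toy_engine *e)
{
	return e->saved;
}

/*
 * Plays the role of ptrace_resume: store the wanted action, but hand only
 * the "kick" (REPORT or INTERRUPT, chosen by the caller) to the control
 * layer. The return value is what the control layer would be given.
 */
enum toy_action toy_resume(struct toy_engine *e, enum toy_action wanted,
			   enum toy_action kick)
{
	toy_set_action(e, wanted);
	return kick;
}

/* Plays the role of either report callback: return the saved action. */
enum toy_action toy_report_callback(const struct toy_engine *e)
{
	return toy_resume_action(e);
}

/* Plays the role of finish_resume_report: 1 if single-step gets enabled. */
int toy_finish_resume(enum toy_action returned)
{
	return returned == ACT_SINGLESTEP;
}
```

In this model, single-stepping is lost exactly when the callback returns
something other than the saved action, which is the kind of divergence the
quoted message suggests looking for.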
Regards,
Wenji

From dvlasenk at redhat.com Wed Jan 14 02:18:27 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 14 Jan 2009 03:18:27 +0100
Subject: Analysis of SINGLESTEP
In-Reply-To: <496C2968.2070309@oracle.com>
References: <494A13F7.8080209@oracle.com> <20081219082938.A068EFC339@magilla.sf.frob.com> <496C2968.2070309@oracle.com>
Message-ID: <1231899507.4285.2.camel@localhost>

On Tue, 2009-01-13 at 13:40 +0800, Wenji Huang wrote:
> Roland McGrath wrote:
> [...]
> >
> > What's supposed to happen is that ptrace_resume uses ptrace_set_action to
> > store UTRACE_SINGLESTEP. It then actually passes UTRACE_REPORT or
> > UTRACE_INTERRUPT to utrace_control (for the reasons explained in the
> > comments in the code for each of those cases).
> >
> > The child should then get into either ptrace_report_quiesce or
> > ptrace_report_signal (ptrace_resumed case). These both use
> > ptrace_resume_action to extract what was saved by ptrace_set_action, which
> > should still be UTRACE_SINGLESTEP. Then whichever of these callbacks it is
> > should return that value, UTRACE_SINGLESTEP. It's that return value that
> > is what should ensure that user_enable_single_step actually happens (in
> > utrace.c:finish_resume_report).
> >
> > I'm not entirely sure I understood your description of what you see
> > happening. But perhaps you can figure out exactly where it differs from
> > what I've described that I think it should do.
> >
> >
> > Thanks,
> > Roland
> >
> Understood.
> The test step-simple can pass on 2.6.29-rc1+utrace(11 Jan). Seems the
> regression has been fixed.

Yes. In my testing, latest Fedora kernels fixed ALL regressions
in utrace testsuite:

http://sourceware.org/systemtap/wiki/utrace/tests
(scroll down)

Fedora 9 (kernel 2.6.29-0.28.rc1.fc11.x86_64) x86_64:

SKIP: erestart-debugger powerpc-altivec ppc-dabr-race step-to-breakpoint
      user-area-access user-area-padding x86_64-gsbase

PASS: attach-into-signal attach-sigcont-wait attach-wait-on-stopped
      block-step clone-get-signal clone-multi-ptrace clone-ptrace
      detach-can-signal detach-parting-signal detach-stopped erestartsys
      event-exit-proc-environ event-exit-proc-maps
      late-ptrace-may-attach-check o_tracevfork o_tracevforkdone
      peekpokeusr ppc-ptrace-exec-full-regs ptrace-cont-sigstop-detach
      ptrace_event_clone ptrace-on-job-control-stopped reparent-zombie
      reparent-zombie-clone sa-resethand-on-cont-signal signal-loss
      step-into-handler step-jump-cont step-jump-cont-strict step-simple
      step-through-sigret stop-attach-then-wait syscall-reset
      tif-syscall-trace-after-detach tracer-lockup-on-sighandler-kill
      user-regs-peekpoke watchpoint x86_64-cs x86_64-ia32-gs

Notes: Kernel is from rawhide (note f11 in its name).
Many messages in kernel log, all like this:
"WARNING: at kernel/ptrace.c:534 ptrace_report_signal+0x182/0x2a9()"
Corresponding part of the source code:

	/*
	 * We're resuming. If there's no signal to deliver, just go.
	 * If we were given a signal, deliver it now.
	 */
	WARN_ON(task->last_siginfo != info);
	task->last_siginfo = NULL;
	if (!task->exit_code)
		return UTRACE_SIGNAL_REPORT | resume;

Not a single one in FAIL category.

Impressive. Thanks a lot Roland.
--
vda

From roland at redhat.com Wed Jan 14 02:29:03 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 13 Jan 2009 18:29:03 -0800 (PST)
Subject: Analysis of SINGLESTEP
In-Reply-To: Denys Vlasenko's message of Wednesday, 14 January 2009 03:18:27 +0100 <1231899507.4285.2.camel@localhost>
References: <494A13F7.8080209@oracle.com> <20081219082938.A068EFC339@magilla.sf.frob.com> <496C2968.2070309@oracle.com> <1231899507.4285.2.camel@localhost>
Message-ID: <20090114022903.486F2FC3DD@magilla.sf.frob.com>

> Yes. In my testing, latest Fedora kernels fixed ALL regressions
[...]
> Impressive. Thanks a lot Roland.

Don't be so impressed. ;-)
Last I checked, attach-into-signal failed some of the time.
i.e.

	while ./tests/attach-into-signal; do : ; done

won't go forever. Perhaps the test itself should do many iterations.

Thanks,
Roland

From roland at redhat.com Wed Jan 14 02:33:25 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 13 Jan 2009 18:33:25 -0800 (PST)
Subject: which patch to use for 2.6.23!
In-Reply-To: Sarang Kawale's message of Tuesday, 13 January 2009 10:12:54 +0530 <8a0fa7650901122042y1d2f025dt22bdbc4627fac21e@mail.gmail.com>
References: <8a0fa7650901122042y1d2f025dt22bdbc4627fac21e@mail.gmail.com>
Message-ID: <20090114023325.B3FD7FC3DD@magilla.sf.frob.com>

Sorry, I'm not maintaining any patches for kernels that old. In fact, the
only ones I'm really supporting at the moment are 2.6.28 and
2.6.29-rc1/current.
Thanks,
Roland

From dvlasenk at redhat.com Wed Jan 14 03:20:14 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 14 Jan 2009 04:20:14 +0100
Subject: Analysis of SINGLESTEP
In-Reply-To: <20090114022903.486F2FC3DD@magilla.sf.frob.com>
References: <494A13F7.8080209@oracle.com> <20081219082938.A068EFC339@magilla.sf.frob.com> <496C2968.2070309@oracle.com> <1231899507.4285.2.camel@localhost> <20090114022903.486F2FC3DD@magilla.sf.frob.com>
Message-ID: <1231903215.3704.0.camel@localhost>

On Tue, 2009-01-13 at 18:29 -0800, Roland McGrath wrote:
> > Yes. In my testing, latest Fedora kernels fixed ALL regressions
> [...]
> > Impressive. Thanks a lot Roland.
>
> Don't be so impressed. ;-)
> Last I checked, attach-into-signal failed some of the time.
> i.e.
>
> 	while ./tests/attach-into-signal; do : ; done
>
> won't go forever. Perhaps the test itself should do many iterations.

Indeed.

# while ./tests/attach-into-signal; do echo -n . ; done
.......................................attach-into-signal: attach-into-signal.c:161: reproduce: Unexpected error: No such process.
attach-into-signal: attach-into-signal.c:68: handler_fail: Assertion `0' failed.
/bin/bash: line 1: 8230 Aborted ./tests/attach-into-signal

--
vda
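The shell loop in this exchange can also be written as a small C harness
with an upper bound on the number of runs, in the spirit of the remark that
the test should do many iterations. This is a sketch under assumptions, not
part of the utrace test suite: run_once and count_passes are hypothetical
names, and the test binary is simply whatever path the caller passes in. It
uses only fork(2), execlp(3) and waitpid(2).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program once; return 1 if it exited with status 0, else 0. */
int run_once(const char *path)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return 0;		/* could not fork: count as a failure */
	if (pid == 0) {
		execlp(path, path, (char *)NULL);
		_exit(127);		/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return 0;
	/* A crash (e.g. SIGABRT from a failed assertion) is not WIFEXITED. */
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/*
 * Run the program up to max_iterations times, stopping at the first
 * failure. Returns the number of consecutive passing runs.
 */
int count_passes(const char *path, int max_iterations)
{
	int passes = 0;

	while (passes < max_iterations && run_once(path))
		passes++;
	return passes;
}
```

A call such as count_passes("./tests/attach-into-signal", 1000) returning
less than 1000 would correspond to the shell loop terminating as shown
above.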
From ananth at in.ibm.com Thu Jan 15 10:25:10 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Thu, 15 Jan 2009 15:55:10 +0530
Subject: build break with CONFIG_UTRACE_PTRACE=n
Message-ID: <20090115102510.GE3624@in.ibm.com>

Roland,

When CONFIG_UTRACE_PTRACE=n, the build breaks thus:

kernel/ptrace.c:87: error: redefinition of ‘utrace_engine_put’
include/linux/utrace.h:337: error: previous definition of ‘utrace_engine_put’ was here
make[1]: *** [kernel/ptrace.o] Error 1
make: *** [kernel] Error 2
make: *** Waiting for unfinished jobs....

---
Fix kernel build when CONFIG_UTRACE_PTRACE=n.

Signed-off-by: Ananth N Mavinakayanahalli

Index: utrace-15jan/kernel/ptrace.c
===================================================================
--- utrace-15jan.orig/kernel/ptrace.c	2009-01-12 07:40:20.000000000 +0530
+++ utrace-15jan/kernel/ptrace.c	2009-01-15 15:26:43.000000000 +0530
@@ -84,9 +84,11 @@
 	clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
 }
 
+#ifndef CONFIG_UTRACE
 static void utrace_engine_put(struct utrace_attached_engine *engine)
 {
 }
+#endif /* CONFIG_UTRACE */
 
 #else /* CONFIG_UTRACE_PTRACE */
 
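The redefinition error in the build break report is the usual result of a
.c file supplying a fallback stub that a header already defines under
another configuration. A minimal stand-alone sketch of the guard pattern
follows. The DEMO_ macros and demo_ names are hypothetical stand-ins for
CONFIG_UTRACE, CONFIG_UTRACE_PTRACE and utrace_engine_put, and it assumes
the header's version is a static inline, which the thread does not show.

```c
#include <assert.h>

/* Hypothetical switches: core enabled, ptrace integration disabled. */
#define DEMO_CONFIG_UTRACE 1
/* DEMO_CONFIG_UTRACE_PTRACE is deliberately left undefined (=n). */

struct demo_engine {
	int refs;
};

/* What a header might provide whenever the core is configured in. */
#ifdef DEMO_CONFIG_UTRACE
static inline void demo_engine_put(struct demo_engine *engine)
{
	engine->refs--;
}
#endif

/* What the .c file does when the ptrace integration is configured out. */
#ifndef DEMO_CONFIG_UTRACE_PTRACE
#ifndef DEMO_CONFIG_UTRACE
/*
 * Fallback stub, compiled only if the header did not define the function.
 * Without this inner guard the compiler reports a redefinition.
 */
static inline void demo_engine_put(struct demo_engine *engine)
{
	(void)engine;
}
#endif /* DEMO_CONFIG_UTRACE */
#endif /* DEMO_CONFIG_UTRACE_PTRACE */

/* A caller that works with whichever definition was selected. */
int demo_release(struct demo_engine *engine)
{
	demo_engine_put(engine);
	return engine->refs;
}
```

Removing the inner #ifndef in this sketch reproduces the same class of
error as the one quoted from kernel/ptrace.c.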
From roland at redhat.com Thu Jan 15 21:07:20 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 15 Jan 2009 13:07:20 -0800 (PST)
Subject: build break with CONFIG_UTRACE_PTRACE=n
In-Reply-To: Ananth N Mavinakayanahalli's message of Thursday, 15 January 2009 15:55:10 +0530 <20090115102510.GE3624@in.ibm.com>
References: <20090115102510.GE3624@in.ibm.com>
Message-ID: <20090115210720.7CDFEFC3DD@magilla.sf.frob.com>

Fixed, thanks.

Roland
From ananth at in.ibm.com Mon Jan 19 13:28:38 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Mon, 19 Jan 2009 18:58:38 +0530
Subject: [PATCH] Imbed struct utrace in task_struct
Message-ID: <20090119132838.GA3542@in.ibm.com>

Imbed struct utrace in task_struct.

One of the issues in debugging utrace problems is the involvement of RCU
for protecting struct utrace and the subtle races it introduces with
task_struct lifetimes. This patch will hopefully push utrace along further
on the path of upstream acceptance. If it's deemed necessary to put back
struct utrace under RCU, maybe that can be done after utrace stabilizes
without it.
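The difference between a separately allocated struct utrace reached through
a pointer and one embedded in the task can be illustrated in plain C. The
sketch below is a user-space toy: struct toy_task, struct toy_utrace and
toy_task_of are invented for this illustration, and only the two accessor
macros are modelled on the ones in the patch. The point is that an embedded
member has exactly the lifetime of its containing task, so there is no
separate pointer whose validity has to be protected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy version of the per-task tracing state. */
struct toy_utrace {
	unsigned long flags;
	unsigned int stopped:1;
};

/* Toy task with the tracing state embedded rather than pointed to. */
struct toy_task {
	int pid;
	struct toy_utrace utrace;	/* lives and dies with the task */
};

/* Accessors shaped like the macros in the patch. */
#define toy_task_utrace_flags(task)	((task)->utrace.flags)
#define toy_task_utrace_struct(task)	(&(task)->utrace)

/* With an embedded member, initialisation is just zeroing it in place. */
void toy_utrace_init_task(struct toy_task *task)
{
	memset(&task->utrace, 0, sizeof(task->utrace));
}

/*
 * Invented helper: recover the owning task from the embedded member,
 * which a pointer-based layout cannot do without a back pointer.
 */
struct toy_task *toy_task_of(struct toy_utrace *utrace)
{
	return (struct toy_task *)((char *)utrace -
				   offsetof(struct toy_task, utrace));
}
```

The cost of this layout is that every task carries the full structure even
when nothing is tracing it, which is a trade-off the thread does not
quantify.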
Tested on x86 (uni/smp) and powerpc -- patch applies on the current
utrace/utrace-ptrace branch.

With this patch, I haven't seen any WARN_ON(task->last_siginfo != info) on
x86; the frequency of its occurrence on powerpc has reduced considerably.
On one make check xcheck run, there were only two such backtraces, while
earlier there were many tens of them:

------------[ cut here ]------------
Badness at kernel/ptrace.c:530
NIP: c00000000007e2fc LR: c0000000000c0004 CTR: c00000000007e15c
REGS: c00000005681f800 TRAP: 0700 Tainted: G W (2.6.29-rc1-ut)
MSR: 8000000000029032 CR: 44002428 XER: 20000000
TASK = c000000056790000[23664] 'exe' THREAD: c00000005681c000 CPU: 1
NIP [c00000000007e2fc] .ptrace_report_signal+0x1a0/0x2d4
LR [c0000000000c0004] .utrace_get_signal+0x3b0/0x6cc
Call Trace:
[c00000005681fa80] [c000000000956790] klist_remove_waiters+0xf7a8/0x2f8b8 (unreliable)
[c00000005681fb30] [c0000000000c0004] .utrace_get_signal+0x3b0/0x6cc
[c00000005681fc20] [c000000000084a14] .get_signal_to_deliver+0x14c/0x368
[c00000005681fce0] [c000000000014ed4] .do_signal+0x7c/0x338
[c00000005681fe30] [c000000000008a80] do_work+0x24/0x28
Instruction dump:
f81a0020 e87e8008 4857fa59 60000000 2fbd0000 419e0034 e81b12a0 2fa00000
419e0028 7c00e278 3120ffff 7c090110 <0b000000> e93b0216 3b400000 fb5b12a0
------------------

Thanks to Alexey Dobriyan for his initial work way back in 2007.

There are no new regressions in the ptrace-utrace tests on x86. However,
on powerpc, two tests consistently fail with the patch (I haven't yet
tested whether they fail without it):

step-jump-cont: step-jump-cont.c:140: pokeuser: Assertion `l == 0' failed.
/bin/sh: line 4: 32479 Aborted ${dir}$tst
FAIL: step-jump-cont
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset

Signed-off-by: Ananth N Mavinakayanahalli
---
 include/linux/sched.h     |    4
 include/linux/tracehook.h |   16 -
 include/linux/utrace.h    |   69 ++++++--
 kernel/ptrace.c           |   11 +
 kernel/utrace.c           |  385 ++++++++++++----------------------------------
 5 files changed, 166 insertions(+), 319 deletions(-)

Index: utrace-19jan/include/linux/sched.h
===================================================================
--- utrace-19jan.orig/include/linux/sched.h
+++ utrace-19jan/include/linux/sched.h
@@ -88,6 +88,7 @@ struct sched_param {
 #include
 #include
 #include
+#include
 #include
 
@@ -1267,8 +1268,7 @@ struct task_struct {
 	seccomp_t seccomp;
 
 #ifdef CONFIG_UTRACE
-	struct utrace *utrace;
-	unsigned long utrace_flags;
+	struct utrace utrace;
 #endif
 
 /* Thread group tracking */
Index: utrace-19jan/include/linux/utrace.h
===================================================================
--- utrace-19jan.orig/include/linux/utrace.h
+++ utrace-19jan/include/linux/utrace.h
@@ -33,17 +33,62 @@
 #include
 #include
 #include
-#include
+#include
 
 struct linux_binprm;
+struct linux_binfmt;
 struct pt_regs;
-struct utrace;
+struct task_struct;
 struct user_regset;
 struct user_regset_view;
+struct seq_file;
+
+#define UTRACE_DEBUG 1
+/*
+ * Per-thread structure task_struct.utrace refers to.
+ *
+ * The two lists @attached and @attaching work together for smooth
+ * asynchronous attaching with low overhead. Modifying either list
+ * requires @lock. The @attaching list can be modified any time while
+ * holding @lock. New engines being attached always go on this list.
+ *
+ * The @attached list is what the task itself uses for its reporting
+ * loops. When the task itself is not quiescent, it can use the
+ * @attached list without taking any lock. Noone may modify the list
+ * when the task is not quiescent. When it is quiescent, that means
+ * that it won't run again without taking @lock itself before using
+ * the list.
+ *
+ * At each place where we know the task is quiescent (or it's current),
+ * while holding @lock, we call splice_attaching(), below. This moves
+ * the @attaching list members on to the end of the @attached list.
+ * Since this happens at the start of any reporting pass, any new
+ * engines attached asynchronously go on the stable @attached list
+ * in time to have their callbacks seen.
+ */
+struct utrace {
+	unsigned long flags;
+	struct task_struct *cloning;
+
+	struct list_head attached, attaching;
+	spinlock_t lock;
+#ifdef UTRACE_DEBUG
+	atomic_t check_dead;
+#endif
+
+	struct utrace_attached_engine *reporting;
+
+	unsigned int stopped:1;
+	unsigned int report:1;
+	unsigned int interrupt:1;
+	unsigned int signal_handler:1;
+	unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
+	unsigned int death:1; /* in utrace_report_death() now */
+	unsigned int reap:1; /* release_task() has run */
+};
 
 /*
  * Event bits passed to utrace_set_events().
- * These appear in &struct task_struct.@utrace_flags
+ * These appear in &struct task_struct.@utrace.flags
  * and &struct utrace_attached_engine.@flags.
  */
 enum utrace_events {
@@ -144,22 +189,10 @@ static inline void task_utrace_proc_stat
 
 #else /* CONFIG_UTRACE */
 
-static inline unsigned long task_utrace_flags(struct task_struct *task)
-{
-	return task->utrace_flags;
-}
-
-static inline struct utrace *task_utrace_struct(struct task_struct *task)
-{
-	return task->utrace;
-}
-
-static inline void utrace_init_task(struct task_struct *child)
-{
-	child->utrace_flags = 0;
-	child->utrace = NULL;
-}
+#define task_utrace_flags(task) ((task)->utrace.flags)
+#define task_utrace_struct(task) (&(task)->utrace)
 
+void utrace_init_task(struct task_struct *task);
 void task_utrace_proc_status(struct seq_file *m, struct task_struct *p);
 
 /**
Index: utrace-19jan/kernel/utrace.c
===================================================================
--- utrace-19jan.orig/kernel/utrace.c
+++ utrace-19jan/kernel/utrace.c
@@ -10,21 +10,20 @@
  * Red Hat Author: Roland McGrath.
  */
 
-#include
+#include
 #include
 #include
 #include
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
+#include
 
-#define UTRACE_DEBUG 1
 #ifdef UTRACE_DEBUG
 #define CHECK_INIT(p) atomic_set(&(p)->check_dead, 1)
 #define CHECK_DEAD(p) BUG_ON(!atomic_dec_and_test(&(p)->check_dead))
@@ -33,91 +32,25 @@
 #define CHECK_DEAD(p) do { } while (0)
 #endif
 
-/*
- * Per-thread structure task_struct.utrace points to.
- *
- * The task itself never has to worry about this going away after
- * some event is found set in task_struct.utrace_flags.
- * Once created, this pointer is changed only when the task is quiescent
- * (TASK_TRACED or TASK_STOPPED with the siglock held, or dead).
- *
- * For other parties, the pointer to this is protected by RCU and
- * task_lock. Since call_rcu is never used while the thread is alive and
- * using this struct utrace, we can overlay the RCU data structure used
- * only for a dead struct with some local state used only for a live utrace
- * on an active thread.
- * - * The two lists @attached and @attaching work together for smooth - * asynchronous attaching with low overhead. Modifying either list - * requires @lock. The @attaching list can be modified any time while - * holding @lock. New engines being attached always go on this list. - * - * The @attached list is what the task itself uses for its reporting - * loops. When the task itself is not quiescent, it can use the - * @attached list without taking any lock. Noone may modify the list - * when the task is not quiescent. When it is quiescent, that means - * that it won't run again without taking @lock itself before using - * the list. - * - * At each place where we know the task is quiescent (or it's current), - * while holding @lock, we call splice_attaching(), below. This moves - * the @attaching list members on to the end of the @attached list. - * Since this happens at the start of any reporting pass, any new - * engines attached asynchronously go on the stable @attached list - * in time to have their callbacks seen. 
- */ -struct utrace { - union { - struct rcu_head dead; - struct { - struct task_struct *cloning; - } live; - } u; - - struct list_head attached, attaching; - spinlock_t lock; -#ifdef UTRACE_DEBUG - atomic_t check_dead; -#endif - - struct utrace_attached_engine *reporting; - - unsigned int stopped:1; - unsigned int report:1; - unsigned int interrupt:1; - unsigned int signal_handler:1; - unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */ - unsigned int death:1; /* in utrace_report_death() now */ - unsigned int reap:1; /* release_task() has run */ -}; - -static struct kmem_cache *utrace_cachep; static struct kmem_cache *utrace_engine_cachep; static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */ static int __init utrace_init(void) { - utrace_cachep = KMEM_CACHE(utrace, SLAB_PANIC); utrace_engine_cachep = KMEM_CACHE(utrace_attached_engine, SLAB_PANIC); return 0; } module_init(utrace_init); -static void utrace_free(struct rcu_head *rhead) +void utrace_init_task(struct task_struct *task) { - struct utrace *utrace = container_of(rhead, struct utrace, u.dead); - kmem_cache_free(utrace_cachep, utrace); -} + struct utrace *utrace = task_utrace_struct(task); -/* - * Called with utrace locked. Clean it up and free it via RCU. - */ -static void rcu_utrace_free(struct utrace *utrace) - __releases(utrace->lock) -{ - CHECK_DEAD(utrace); - spin_unlock(&utrace->lock); - call_rcu(&utrace->u.dead, utrace_free); + utrace->flags = 0; + utrace->cloning = NULL; + INIT_LIST_HEAD(&utrace->attached); + INIT_LIST_HEAD(&utrace->attaching); + spin_lock_init(&utrace->lock); } /* @@ -202,8 +135,8 @@ static int utrace_first_engine(struct ta * report_clone hook has had a chance to run. 
*/ if (target->flags & PF_STARTING) { - utrace = current->utrace; - if (!utrace || utrace->u.live.cloning != target) { + utrace = task_utrace_struct(current); + if (utrace->cloning != target) { yield(); if (signal_pending(current)) return -ERESTARTNOINTR; @@ -211,14 +144,8 @@ static int utrace_first_engine(struct ta } } - utrace = kmem_cache_zalloc(utrace_cachep, GFP_KERNEL); - if (unlikely(!utrace)) - return -ENOMEM; - - INIT_LIST_HEAD(&utrace->attached); - INIT_LIST_HEAD(&utrace->attaching); + utrace = task_utrace_struct(target); list_add(&engine->entry, &utrace->attached); - spin_lock_init(&utrace->lock); CHECK_INIT(utrace); ret = -EAGAIN; @@ -226,9 +153,7 @@ static int utrace_first_engine(struct ta task_lock(target); if (exclude_utrace(target)) { ret = -EBUSY; - } else if (likely(!target->utrace)) { - rcu_assign_pointer(target->utrace, utrace); - + } else { /* * The task_lock protects us against another thread doing * the same thing. We might still be racing against @@ -246,30 +171,20 @@ static int utrace_first_engine(struct ta spin_unlock(&utrace->lock); return 0; } - - /* - * The target has already been through release_task. - * Our caller will restart and notice it's too late now. - */ - target->utrace = NULL; } /* - * Another engine attached first, so there is a struct already. - * A null return says to restart looking for the existing one. + * Another engine attached first. + * Restart looking for the existing one. */ task_unlock(target); spin_unlock(&utrace->lock); - kmem_cache_free(utrace_cachep, utrace); return ret; } /* - * Called with rcu_read_lock() held. - * Lock utrace and verify that it's still installed in target->utrace. - * If not, return -EAGAIN. - * Then enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE. + * Enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE. 
*/ static int utrace_second_engine(struct task_struct *target, struct utrace *utrace, @@ -282,13 +197,7 @@ static int utrace_second_engine(struct t spin_lock(&utrace->lock); - if (unlikely(rcu_dereference(target->utrace) != utrace)) { - /* - * We lost a race with other CPUs doing a sequence - * of detach and attach before we got in. - */ - ret = -EAGAIN; - } else if ((flags & UTRACE_ATTACH_EXCLUSIVE) && + if ((flags & UTRACE_ATTACH_EXCLUSIVE) && unlikely(matching_engine(utrace, flags, ops, data))) { ret = -EEXIST; } else { @@ -350,18 +259,15 @@ struct utrace_attached_engine *utrace_at { struct utrace *utrace; struct utrace_attached_engine *engine; - int ret; + int ret = 0; restart: - rcu_read_lock(); - utrace = rcu_dereference(target->utrace); - smp_rmb(); + utrace = task_utrace_struct(target); if (unlikely(target->exit_state == EXIT_DEAD)) { /* * The target has already been reaped. * Check this first; a race with reaping may lead to restart. */ - rcu_read_unlock(); if (!(flags & UTRACE_ATTACH_CREATE)) return ERR_PTR(-ENOENT); return ERR_PTR(-ESRCH); @@ -369,19 +275,14 @@ restart: if (!(flags & UTRACE_ATTACH_CREATE)) { engine = NULL; - if (utrace) { - spin_lock(&utrace->lock); - engine = matching_engine(utrace, flags, ops, data); - if (engine) - utrace_engine_get(engine); - spin_unlock(&utrace->lock); - } - rcu_read_unlock(); + spin_lock(&utrace->lock); + engine = matching_engine(utrace, flags, ops, data); + if (engine) + utrace_engine_get(engine); + spin_unlock(&utrace->lock); return engine ?: ERR_PTR(-ENOENT); } - rcu_read_unlock(); - if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops)) return ERR_PTR(-EINVAL); @@ -404,15 +305,12 @@ restart: engine->ops = ops; engine->data = data; - rcu_read_lock(); - utrace = rcu_dereference(target->utrace); - if (!utrace) { - rcu_read_unlock(); + if ((ret == 0) && (list_empty(&utrace->attached))) { + /* First time here, set engines up */ ret = utrace_first_engine(target, engine); } else { ret = 
utrace_second_engine(target, utrace, engine, flags, ops, data); - rcu_read_unlock(); } if (unlikely(ret)) { @@ -561,28 +459,23 @@ static bool utrace_stop(struct task_stru try_to_freeze(); killed = false; - rcu_read_lock(); - utrace = rcu_dereference(task->utrace); - if (utrace) { + /* + * utrace_wakeup() clears @utrace->stopped before waking us up. + * We're officially awake if it's clear. + */ + spin_lock(&utrace->lock); + if (unlikely(utrace->stopped)) { /* - * utrace_wakeup() clears @utrace->stopped before waking us up. - * We're officially awake if it's clear. + * If we're here with it still set, it must have been + * signal_wake_up() instead, waking us up for a SIGKILL. */ - spin_lock(&utrace->lock); - if (unlikely(utrace->stopped)) { - /* - * If we're here with it still set, it must have been - * signal_wake_up() instead, waking us up for a SIGKILL. - */ - spin_lock_irq(&task->sighand->siglock); - WARN_ON(!sigismember(&task->pending.signal, SIGKILL)); - spin_unlock_irq(&task->sighand->siglock); - utrace->stopped = 0; - killed = true; - } - spin_unlock(&utrace->lock); + spin_lock_irq(&task->sighand->siglock); + WARN_ON(!sigismember(&task->pending.signal, SIGKILL)); + spin_unlock_irq(&task->sighand->siglock); + utrace->stopped = 0; + killed = true; } - rcu_read_unlock(); + spin_unlock(&utrace->lock); /* * While we were in TASK_TRACED, complete_signal() considered @@ -619,6 +512,7 @@ static struct utrace *get_utrace_lock(st __acquires(utrace->lock) { struct utrace *utrace; + int ret = 0; /* * You must hold a ref to be making a call. A call from within @@ -650,7 +544,7 @@ static struct utrace *get_utrace_lock(st return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS); } - utrace = rcu_dereference(target->utrace); + utrace = task_utrace_struct(target); smp_rmb(); if (unlikely(!utrace) || unlikely(target->exit_state == EXIT_DEAD)) { /* @@ -659,24 +553,26 @@ static struct utrace *get_utrace_lock(st * have started. 
A call to this engine's report_reap * callback might already be in progress. */ - utrace = ERR_PTR(-ESRCH); + ret = -ESRCH; } else { spin_lock(&utrace->lock); - if (unlikely(rcu_dereference(target->utrace) != utrace) || - unlikely(!engine->ops) || + if (unlikely(!engine->ops) || unlikely(engine->ops == &utrace_detached_ops)) { /* * By the time we got the utrace lock, * it had been reaped or detached already. */ spin_unlock(&utrace->lock); - utrace = ERR_PTR(-ESRCH); + ret = -ESRCH; if (!attached && engine->ops == &utrace_detached_ops) - utrace = ERR_PTR(-ERESTARTSYS); + ret = -ERESTARTSYS; } } rcu_read_unlock(); + if (ret) + return ERR_PTR(ret); + return utrace; } @@ -732,8 +628,8 @@ restart: goto restart; } - rcu_utrace_free(utrace); /* Releases the lock. */ - + CHECK_DEAD(utrace); + spin_unlock(&utrace->lock); put_detached_list(&detached); } @@ -744,15 +640,7 @@ restart: */ void utrace_release_task(struct task_struct *target) { - struct utrace *utrace; - - task_lock(target); - utrace = rcu_dereference(target->utrace); - rcu_assign_pointer(target->utrace, NULL); - task_unlock(target); - - if (unlikely(!utrace)) - return; + struct utrace *utrace = task_utrace_struct(target); spin_lock(&utrace->lock); /* @@ -763,7 +651,7 @@ void utrace_release_task(struct task_str if (likely(!list_empty(&utrace->attached))) { utrace->reap = 1; - if (!(target->utrace_flags & DEATH_EVENTS)) { + if (!(utrace->flags & DEATH_EVENTS)) { utrace_reap(target, utrace); /* Unlocks and frees. 
*/ return; } @@ -853,7 +741,7 @@ int utrace_set_events(struct task_struct if (unlikely(IS_ERR(utrace))) return PTR_ERR(utrace); - old_utrace_flags = target->utrace_flags; + old_utrace_flags = utrace->flags; set_utrace_flags = events; old_flags = engine->flags; @@ -899,12 +787,12 @@ int utrace_set_events(struct task_struct spin_unlock(&utrace->lock); return -EALREADY; } - target->utrace_flags |= set_utrace_flags; + utrace->flags |= set_utrace_flags; read_unlock(&tasklist_lock); } engine->flags = events | (engine->flags & ENGINE_STOP); - target->utrace_flags |= set_utrace_flags; + utrace->flags |= set_utrace_flags; if ((set_utrace_flags & UTRACE_EVENT_SYSCALL) && !(old_utrace_flags & UTRACE_EVENT_SYSCALL)) @@ -961,7 +849,7 @@ static bool utrace_do_stop(struct task_s * through utrace_get_signal() before doing anything else. */ if (task_is_stopped(target) && - !(target->utrace_flags & UTRACE_EVENT(JCTL))) { + !(utrace->flags & UTRACE_EVENT(JCTL))) { utrace->stopped = 1; return true; } @@ -974,10 +862,10 @@ static bool utrace_do_stop(struct task_s * if it has already been through * utrace_report_death(), or never will. */ - if (!(target->utrace_flags & DEATH_EVENTS)) + if (!(utrace->flags & DEATH_EVENTS)) utrace->stopped = stopped = true; } else if (task_is_stopped(target)) { - if (!(target->utrace_flags & UTRACE_EVENT(JCTL))) + if (!(utrace->flags & UTRACE_EVENT(JCTL))) utrace->stopped = stopped = true; } else if (!utrace->report && !utrace->interrupt) { utrace->report = 1; @@ -1017,7 +905,7 @@ static void utrace_wakeup(struct task_st /* * This is called when there might be some detached engines on the list or - * some stale bits in @task->utrace_flags. Clean them up and recompute the + * some stale bits in @task->utrace.flags. Clean them up and recompute the * flags. 
* * @action is NULL when @task is stopped and @utrace->stopped is set; wake @@ -1064,7 +952,7 @@ static void utrace_reset(struct task_str clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE); } - task->utrace_flags = flags; + utrace->flags = flags; if (wake) utrace_wakeup(task, utrace); @@ -1075,21 +963,8 @@ static void utrace_reset(struct task_str if (flags) { spin_unlock(&utrace->lock); } else { - /* - * No more engines, clear out the utrace. Here we can race - * with utrace_release_task(). If it gets task_lock() - * first, then it cleans up this struct for us. - */ - - task_lock(task); - if (unlikely(task->utrace != utrace)) { - task_unlock(task); - spin_unlock(&utrace->lock); - } else { - rcu_assign_pointer(task->utrace, NULL); - task_unlock(task); - rcu_utrace_free(utrace); - } + CHECK_DEAD(utrace); + spin_unlock(&utrace->lock); if (action) *action = UTRACE_RESUME; @@ -1241,7 +1116,7 @@ int utrace_control(struct task_struct *t unlikely(utrace->reap)) { spin_unlock(&utrace->lock); return -ESRCH; - } else if (unlikely(target->utrace_flags & DEATH_EVENTS) || + } else if (unlikely(utrace->flags & DEATH_EVENTS) || unlikely(utrace->death)) { /* * We have already started the death report, or @@ -1464,7 +1339,7 @@ static void start_report(struct utrace * * returns from engine callbacks. If any engine's last callback used * UTRACE_STOP, we do UTRACE_REPORT here to ensure we stop before user * mode. If there were no callbacks made, it will recompute - * @task->utrace_flags to avoid another false-positive. + * @task->utrace.flags to avoid another false-positive. 
*/ static void finish_report(struct utrace_report *report, struct task_struct *task, struct utrace *utrace) @@ -1627,7 +1502,7 @@ void utrace_report_exec(struct linux_bin struct pt_regs *regs) { struct task_struct *task = current; - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); REPORT(task, utrace, &report, UTRACE_EVENT(EXEC), @@ -1641,7 +1516,7 @@ void utrace_report_exec(struct linux_bin bool utrace_report_syscall_entry(struct pt_regs *regs) { struct task_struct *task = current; - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); start_report(utrace); @@ -1684,7 +1559,7 @@ bool utrace_report_syscall_entry(struct void utrace_report_syscall_exit(struct pt_regs *regs) { struct task_struct *task = current; - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT), @@ -1700,23 +1575,23 @@ void utrace_report_syscall_exit(struct p void utrace_report_clone(unsigned long clone_flags, struct task_struct *child) { struct task_struct *task = current; - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); /* * We don't use the REPORT() macro here, because we need - * to clear utrace->u.live.cloning before finish_report(). + * to clear utrace->cloning before finish_report(). * After finish_report(), utrace can be a stale pointer * in cases when report.action is still UTRACE_RESUME. 
*/ start_report(utrace); - utrace->u.live.cloning = child; + utrace->cloning = child; REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(CLONE), report_clone, report.action, engine, task, clone_flags, child); - utrace->u.live.cloning = NULL; + utrace->cloning = NULL; finish_report(&report, task, utrace); /* @@ -1739,7 +1614,7 @@ void utrace_report_clone(unsigned long c */ void utrace_finish_vfork(struct task_struct *task) { - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); spin_lock(&utrace->lock); if (!utrace->vfork_stop) @@ -1757,7 +1632,7 @@ void utrace_finish_vfork(struct task_str void utrace_report_jctl(int notify, int what) { struct task_struct *task = current; - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); bool was_stopped = task_is_stopped(task); @@ -1768,29 +1643,17 @@ void utrace_report_jctl(int notify, int * * While in TASK_STOPPED, we can be considered safely * stopped by utrace_do_stop() and detached asynchronously. - * If we woke up and checked task->utrace_flags before that + * If we woke up and checked task->utrace.flags before that * was finished, we might be here with utrace already * removed or in the middle of being removed. * - * RCU makes it safe to get the utrace->lock even if it's - * being freed. Once we have that lock, either an external - * detach has finished and this struct has been freed, or - * else we know we are excluding any other detach attempt. - * * If we are indeed attached, then make sure we are no * longer considered stopped while we run callbacks. */ - rcu_read_lock(); - utrace = rcu_dereference(task->utrace); - if (unlikely(!utrace)) { - rcu_read_unlock(); - return; - } spin_lock(&utrace->lock); utrace->stopped = 0; utrace->report = 0; spin_unlock(&utrace->lock); - rcu_read_unlock(); REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), report_jctl, was_stopped ? 
CLD_STOPPED : CLD_CONTINUED, what); @@ -1825,7 +1688,7 @@ void utrace_report_jctl(int notify, int void utrace_report_exit(long *exit_code) { struct task_struct *task = current; - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); long orig_code = *exit_code; @@ -1935,7 +1798,7 @@ static void finish_resume_report(struct */ void utrace_resume(struct task_struct *task, struct pt_regs *regs) { - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); struct utrace_attached_engine *engine, *next; @@ -1987,13 +1850,13 @@ void utrace_resume(struct task_struct *t /* * Return true if current has forced signal_pending(). * - * This is called only when current->utrace_flags is nonzero, so we know + * This is called only when current->utrace.flags is nonzero, so we know * that current->utrace must be set. It's not inlined in tracehook.h * just so that struct utrace can stay opaque outside this file. */ bool utrace_interrupt_pending(void) { - return current->utrace->interrupt; + return current->utrace.interrupt; } /* @@ -2034,7 +1897,7 @@ int utrace_get_signal(struct task_struct __releases(task->sighand->siglock) __acquires(task->sighand->siglock) { - struct utrace *utrace; + struct utrace *utrace = task_utrace_struct(task); struct k_sigaction *ka; INIT_REPORT(report); struct utrace_attached_engine *engine, *next; @@ -2043,44 +1906,13 @@ int utrace_get_signal(struct task_struct u32 ret; int signr; - /* - * We could have been considered quiescent while we were in - * TASK_STOPPED, and detached asynchronously. If we woke up - * and checked task->utrace_flags before that was finished, - * we might be here with utrace already removed or in the - * middle of being removed. 
- */ - rcu_read_lock(); - utrace = rcu_dereference(task->utrace); - if (unlikely(!utrace)) { - rcu_read_unlock(); - return 0; - } - if (utrace->interrupt || utrace->report || utrace->signal_handler) { /* * We've been asked for an explicit report before we * even check for pending signals. */ - spin_unlock_irq(&task->sighand->siglock); - - /* - * RCU makes it safe to get the utrace->lock even if - * it's being freed. Once we have that lock, either an - * external detach has finished and this struct has been - * freed, or else we know we are excluding any other - * detach attempt. - */ spin_lock(&utrace->lock); - rcu_read_unlock(); - - if (unlikely(task->utrace != utrace)) { - spin_unlock(&utrace->lock); - cond_resched(); - return -1; - } - splice_attaching(utrace); if (unlikely(!utrace->interrupt) && unlikely(!utrace->report)) @@ -2123,12 +1955,11 @@ int utrace_get_signal(struct task_struct event = 0; ka = NULL; memset(return_ka, 0, sizeof *return_ka); - } else if ((task->utrace_flags & UTRACE_EVENT_SIGNAL_ALL) == 0) { + } else if ((utrace->flags & UTRACE_EVENT_SIGNAL_ALL) == 0) { /* * If noone is interested in intercepting signals, * let the caller just dequeue them normally. */ - rcu_read_unlock(); return 0; } else { if (unlikely(utrace->stopped)) { @@ -2147,17 +1978,9 @@ int utrace_get_signal(struct task_struct */ spin_unlock_irq(&task->sighand->siglock); spin_lock(&utrace->lock); - rcu_read_unlock(); - if (unlikely(task->utrace != utrace)) { - spin_unlock(&utrace->lock); - cond_resched(); - return -1; - } utrace->stopped = 0; spin_unlock(&utrace->lock); spin_lock_irq(&task->sighand->siglock); - } else { - rcu_read_unlock(); } /* @@ -2209,7 +2032,7 @@ int utrace_get_signal(struct task_struct * Now that we know what event type this signal is, * we can short-circuit if noone cares about those. 
*/ - if ((task->utrace_flags & (event | UTRACE_EVENT(QUIESCE))) == 0) + if ((utrace->flags & (event | UTRACE_EVENT(QUIESCE))) == 0) return signr; /* @@ -2398,7 +2221,7 @@ int utrace_get_signal(struct task_struct */ void utrace_signal_handler(struct task_struct *task, int stepping) { - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); spin_lock(&utrace->lock); @@ -2544,23 +2367,19 @@ EXPORT_SYMBOL_GPL(task_user_regset_view) */ struct task_struct *utrace_tracer_task(struct task_struct *target) { - struct utrace *utrace; + struct utrace *utrace = task_utrace_struct(target); struct task_struct *tracer = NULL; + struct list_head *pos, *next; + struct utrace_attached_engine *engine; + const struct utrace_engine_ops *ops; - utrace = rcu_dereference(target->utrace); - if (utrace != NULL) { - struct list_head *pos, *next; - struct utrace_attached_engine *engine; - const struct utrace_engine_ops *ops; - list_for_each_safe(pos, next, &utrace->attached) { - engine = list_entry(pos, struct utrace_attached_engine, - entry); - ops = rcu_dereference(engine->ops); - if (ops->tracer_task) { - tracer = (*ops->tracer_task)(engine, target); - if (tracer != NULL) - break; - } + list_for_each_safe(pos, next, &utrace->attached) { + engine = list_entry(pos, struct utrace_attached_engine, entry); + ops = rcu_dereference(engine->ops); + if (ops->tracer_task) { + tracer = (*ops->tracer_task)(engine, target); + if (tracer != NULL) + break; } } @@ -2573,7 +2392,7 @@ struct task_struct *utrace_tracer_task(s */ int utrace_unsafe_exec(struct task_struct *task) { - struct utrace *utrace = task->utrace; + struct utrace *utrace = task_utrace_struct(task); struct utrace_attached_engine *engine, *next; const struct utrace_engine_ops *ops; int unsafe = 0; @@ -2592,11 +2411,11 @@ int utrace_unsafe_exec(struct task_struc */ void task_utrace_proc_status(struct seq_file *m, struct task_struct *p) { - struct utrace *utrace = rcu_dereference(p->utrace); - if 
(unlikely(utrace)) - seq_printf(m, "Utrace: %lx%s%s%s\n", - p->utrace_flags, - utrace->stopped ? " (stopped)" : "", - utrace->report ? " (report)" : "", - utrace->interrupt ? " (interrupt)" : ""); + struct utrace *utrace = task_utrace_struct(p); + + seq_printf(m, "Utrace: %lx%s%s%s\n", + utrace->flags, + utrace->stopped ? " (stopped)" : "", + utrace->report ? " (report)" : "", + utrace->interrupt ? " (interrupt)" : ""); } Index: utrace-19jan/include/linux/tracehook.h =================================================================== --- utrace-19jan.orig/include/linux/tracehook.h +++ utrace-19jan/include/linux/tracehook.h @@ -370,8 +370,7 @@ static inline void tracehook_report_vfor static inline void tracehook_prepare_release_task(struct task_struct *task) { smp_mb(); - if (task_utrace_struct(task) != NULL) - utrace_release_task(task); + utrace_release_task(task); } /** @@ -385,21 +384,8 @@ static inline void tracehook_prepare_rel */ static inline void tracehook_finish_release_task(struct task_struct *task) { - int bad = 0; ptrace_release_task(task); BUG_ON(task->exit_state != EXIT_DEAD); - if (unlikely(task_utrace_struct(task) != NULL)) { - /* - * In a race condition, utrace_attach() will temporarily set - * it, but then check @task->exit_state and clear it. It does - * all this under task_lock(), so we take the lock to check - * that there is really a bug and not just that known race. 
-		 */
-		task_lock(task);
-		bad = unlikely(task_utrace_struct(task) != NULL);
-		task_unlock(task);
-	}
-	BUG_ON(bad);
 }
 
 /**
Index: utrace-19jan/kernel/ptrace.c
===================================================================
--- utrace-19jan.orig/kernel/ptrace.c
+++ utrace-19jan/kernel/ptrace.c
@@ -778,7 +778,16 @@ static inline bool exclude_ptrace(struct
  */
 static inline bool exclude_ptrace(struct task_struct *task)
 {
-	return unlikely(!!task_utrace_struct(task));
+	struct utrace *utrace = task_utrace_struct(task);
+
+	spin_lock(&utrace->lock);
+	if (list_empty(&utrace->attached) && list_empty(&utrace->attaching)) {
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
+	spin_unlock(&utrace->lock);
+	return true;
 }
 #endif

From roland at redhat.com Mon Jan 19 23:20:31 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon, 19 Jan 2009 15:20:31 -0800 (PST)
Subject: [PATCH] Imbed struct utrace in task_struct
In-Reply-To: Ananth N Mavinakayanahalli's message of Monday, 19 January 2009 18:58:38 +0530 <20090119132838.GA3542@in.ibm.com>
References: <20090119132838.GA3542@in.ibm.com>
Message-ID: <20090119232031.82675FC3C6@magilla.sf.frob.com>

Thanks for working on this, Ananth. (Btw, it's "embed.")

I think it would be less disruptive (and materially no different)
to leave utrace_flags as it is. That field is the one (and only)
that is used in hot paths (or used anywhere outside utrace.c).
It might in future get moved around to stay in a cache-hot part
of task_struct, for example.

The long comment above struct utrace is really all about implementation
details inside utrace.c and I don't think you should move that commentary
to the header file. Instead, put a comment saying that the contents of
struct utrace and their use is entirely private to kernel/utrace.c and it
is only defined in the header to make its size known for struct task_struct
layout (and init_task.h).

I committed some cosmetic changes that will make for a little less flutter
in your patch.
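
The embedding idea discussed in this thread can be illustrated outside the kernel. What follows is a minimal user-space sketch under stated assumptions, not code from the patch: struct task, struct tracer_state, struct list_node and every function name here are hypothetical stand-ins for task_struct, struct utrace and the kernel list helpers, and the spinlock the patch takes around the list checks is left out. It shows the per-task state living inside the task (so the accessor can never return NULL), the flags word kept as a separate field as suggested in the review, and an "is anything attached?" check built on empty lists rather than on a NULL pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular list head, modelled on the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Hypothetical stand-in for struct utrace.  In the design discussed in
 * the thread, its contents stay private to one file; the definition is
 * visible only so that the enclosing structure knows its size.
 */
struct tracer_state {
	struct list_node attached;
	struct list_node attaching;
};

/* Hypothetical stand-in for task_struct. */
struct task {
	unsigned long trace_flags;	/* hot-path field, kept separate */
	struct tracer_state tracer;	/* embedded, not a pointer */
};

/* With embedding, the accessor is plain address arithmetic. */
#define task_tracer_state(t) (&(t)->tracer)

static void tracer_init_task(struct task *t)
{
	struct tracer_state *ts = task_tracer_state(t);

	t->trace_flags = 0;
	list_init(&ts->attached);
	list_init(&ts->attaching);
}

/* "Attached" now means "some list is non-empty" (locking omitted). */
static bool task_has_engines(struct task *t)
{
	struct tracer_state *ts = task_tracer_state(t);

	return !list_is_empty(&ts->attached) || !list_is_empty(&ts->attaching);
}
```

The trade-off this sketch makes visible is the one the thread is about: every task pays for the embedded structure, but the allocation, RCU freeing and pointer-revalidation paths disappear.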
Thanks,
Roland

From dvlasenk at redhat.com Tue Jan 20 11:24:27 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Tue, 20 Jan 2009 12:24:27 +0100
Subject: Analysis of SINGLESTEP
In-Reply-To: <1231903215.3704.0.camel@localhost>
References: <494A13F7.8080209@oracle.com> <20081219082938.A068EFC339@magilla.sf.frob.com> <496C2968.2070309@oracle.com> <1231899507.4285.2.camel@localhost> <20090114022903.486F2FC3DD@magilla.sf.frob.com> <1231903215.3704.0.camel@localhost>
Message-ID: <1232450667.3797.8.camel@localhost>

Hi Roland,

On Wed, 2009-01-14 at 04:20 +0100, Denys Vlasenko wrote:
> On Tue, 2009-01-13 at 18:29 -0800, Roland McGrath wrote:
> > > Yes. In my testing, latest Fedora kernels fixed ALL regressions
> > [...]
> > > Impressive. Thanks a lot Roland.
> >
> > Don't be so impressed. ;-)
> > Last I checked, attach-into-signal failed some of the time.
> > i.e.
> >
> > while ./tests/attach-into-signal; do : ; done
> >
> > won't go forever. Perhaps the test itself should do many iterations.
>
> Indeed.
>
> # while ./tests/attach-into-signal; do echo -n . ; done
> .......................................attach-into-signal:
> attach-into-signal.c:161: reproduce: Unexpected error: No such process.
> attach-into-signal: attach-into-signal.c:68: handler_fail: Assertion `0'
> failed.
> /bin/bash: line 1: 8230
> Aborted ./tests/attach-into-signal

Forgot to email you last week: I modified this test to do more iterations,
and to be affected by $TESTTIME. With TESTTIME >= 60 it fails for me fairly
reliably.
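
The modified test is not shown in this thread, so the following is only a sketch of one way a test could iterate for a time budget taken from a TESTTIME-style environment variable. The helper names (parse_testtime, run_for_seconds, sample_iteration) and the fallback behaviour are assumptions made for illustration; the real test may parse and use the variable differently.

```c
#include <stdlib.h>
#include <time.h>

/*
 * Parse a TESTTIME-style value as a positive number of seconds.
 * Returns fallback when the value is missing, malformed or not positive.
 */
static long parse_testtime(const char *value, long fallback)
{
	char *end;
	long secs;

	if (value == NULL || *value == '\0')
		return fallback;
	secs = strtol(value, &end, 10);
	if (*end != '\0' || secs <= 0)
		return fallback;
	return secs;
}

/*
 * Call one_iteration() repeatedly until it reports failure (non-zero)
 * or the time budget runs out.  The body always runs at least once.
 * Returns the number of successful iterations, or -1 on the first failure.
 */
static long run_for_seconds(int (*one_iteration)(void), long secs)
{
	time_t deadline = time(NULL) + secs;
	long count = 0;

	do {
		if (one_iteration() != 0)
			return -1;
		count++;
	} while (time(NULL) < deadline);

	return count;
}

/* Placeholder for a single reproduction attempt; always succeeds here. */
static int sample_iteration(void)
{
	return 0;
}
```

A caller would typically write something like run_for_seconds(sample_iteration, parse_testtime(getenv("TESTTIME"), 1)), so that a plain run stays short while a longer budget gives an intermittent race more chances to show up.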
--
vda

From ananth at in.ibm.com Tue Jan 20 16:30:24 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 20 Jan 2009 22:00:24 +0530
Subject: [PATCH] Imbed struct utrace in task_struct
In-Reply-To: <20090119232031.82675FC3C6@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com>
Message-ID: <20090120163024.GA5289@in.ibm.com>

On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote:

> (Btw, it's "embed.")

Indeed :-)

> I think it would be less disruptive (and materially no different)
> to leave utrace_flags as it is. That field is the one (and only)
> that is used in hot paths (or used anywhere outside utrace.c).
> It might in future get moved around to stay in a cache-hot part
> of task_struct, for example.
>
> The long comment above struct utrace is really all about implementation
> details inside utrace.c and I don't think you should move that commentary
> to the header file. Instead, put a comment saying that the contents of
> struct utrace and their use is entirely private to kernel/utrace.c and it
> is only defined in the header to make its size known for struct task_struct
> layout (and init_task.h).

Agreed.

> I committed some cosmetic changes that will make for a little less flutter
> in your patch.

Thanks! Working on it at the moment. I was able to test the new patch on
powerpc without issues, but haven't been able to test it on x86
successfully yet. Will post the patch soon.

Ananth

From confirm-s2-ppk21cxft5ojs4eh33mdp3r2fckccjvk-utrace-devel=redhat.com at yahoogrupos.com.br Tue Jan 20 19:14:20 2009
From: confirm-s2-ppk21cxft5ojs4eh33mdp3r2fckccjvk-utrace-devel=redhat.com at yahoogrupos.com.br (Yahoo! Grupos)
Date: 20 Jan 2009 19:14:20 -0000
Subject: Confirma =?iso-8859-1?q?=E7=E3?= o de pedido para entrar no grupo de_amigo_para_amigo
Message-ID: <1232478860.16.72496.w124@yahoogrupos.com.br>

Hello
utrace-devel at redhat.com, Recebemos sua solicita??o para entrar no grupo de_amigo_para_amigo do Yahoo! Grupos, um servi?o de comunidades online gratuito e super f?cil de usar. Este pedido expirar? em 7 dias. PARA ENTRAR NESTE GRUPO: 1) V? para o site do Yahoo! Grupos clicando neste link: http://br.groups.yahoo.com/i?i=ppk21cxft5ojs4eh33mdp3r2fckccjvk&e=utrace-devel%40redhat%2Ecom (Se n?o funcionar, use os comandos para cortar e colar o link acima na barra de endere?o do seu navegador.) -OU- 2) RESPONDA a este e-mail clicando em "Responder" e depois em "Enviar", no seu programa de e-mail. Se voc? n?o fez esta solicita??o ou se n?o tem interesse em entrar no grupo de_amigo_para_amigo, por favor, ignore esta mensagem. Sauda??es, Atendimento ao usu?rio do Yahoo! Grupos O uso que voc? faz do Yahoo! Grupos est? sujeito aos http://br.yahoo.com/info/utos.html From bungapapan at gmail.com Wed Jan 21 00:48:36 2009 From: bungapapan at gmail.com (Bunga Papan Untuk Ucapan) Date: Wed, 21 Jan 2009 07:48:36 +0700 Subject: =?iso-8859-1?q?PERKENALAN_=3A_=91Sakura_Florist=94_=3D_menerima_?= =?iso-8859-1?q?pesanan_khusus_pembuatan_bunga_papan_?= Message-ID: <200901210047.n0L0kn5C032574@mx2.redhat.com> Menerima pesanan khusus pembuatan ?Bunga Papan?? untuk ucapan selamat pernikahan, ucapan belasungkawa, ucapan untuk peresmian usaha, ulang tahun, dll untuk daerah jabodetabek. Pesanan dari luar kota untuk relasi anda di Jakarta bisa menggunakan jasa kami. 
Harga mulai Rp.350.000,- Terimakasih, 021 93606390 0818745955 http://www.bungapapan.multiply.com/ email : bungapapan at gmail.com messenger : bungapapan at hotmail.com From asee at asee2009conference.org Wed Jan 21 00:55:42 2009 From: asee at asee2009conference.org (ASEE 2009) Date: Tue, 20 Jan 2009 16:55:42 -0800 Subject: Second CFP: American Society of Engineering Education Northeast Conference Message-ID: <200901210055.n0L0thC1028538@mx3.redhat.com> Dear Colleagues, If you received this email in error, please forward it to the appropriate department at your institution. If you wish to unsubscribe please follow the unsubscribe link at bottom of the email. Please do not reply to this message. If you need to contact us please email us at info at asee2009conference.org ********************************************************************* * American Society for Engineering Education * * ASEE Spring 2009 Northeast Conference * * * * * * University of Bridgeport * * * * * * http://www.asee2009conference.org * * * * * * April 3-4, 2009 * * * ********************************************************************* --------------------------------------------------------------------- CONFERENCE OVERVIEW --------------------------------------------------------------------- The Spring 2009 Northeast ASEE Conference will be held on April 3-4, 2009 at the University of Bridgeport, Bridgeport, Connecticut, U.S.A. This year's conference theme is: Engineering in the New Global Economy. In the coming years, our world will continue to face economical, environmental and energy related problems. How is Engineering and Engineering Technology Education responding to the needs of our society and the world? This will be the theme for an exhilarating and thought provoking weekend of professional workshops, presentations, and discussions at the University of Bridgeport. 
The ASEE Northeast Section is soliciting faculty papers, student papers and student posters which address the various challenges and paradigms in this technological world through research and instructional programs in Engineering and Engineering Technology education. There are three conference tracks: 1. Regular/ faculty papers 2. Student papers and 3. Student posters The deadline for abstract submission is February 27th, 2009. Prospective authors are invited to submit their abstracts online in Microsoft Word or Adobe PDF format through the conference website at http://www.asee2009conference.org Suggested conference topics are listed below. Other innovations in course and laboratory experiences and assessments are also most welcome for submission: ? Chemical and Biological Engineering ? Civil & Environmental Engineering ? Electrical & Computer Engineering ? Engineering Technology/ Community Colleges ? Industrial, Automation and Manufacturing Engineering ? Engineering Technology and Community Colleges ? Innovations In Engineering Education ? First Year Experiences ? K-12 Education (Engineering Curriculum Integration) ? Mechanical Engineering ? Computer Science and Information Technology ? Women in Engineering and Computer Science ? Robotics ? Service Learning ? Sustainability ? Design Projects ? Engineering and Technology in the Liberal Arts ? Systems Engineering ? Globalization ? Ethics ? Diversity In Engineering ? Multidisciplinary Research Paper and other Proposal Submissions ================= Prospective authors are invited to submit their abstracts online in Microsoft Word or Adobe PDF format through the website of the conference at http://www.asee2009conference.org. Proposals for special sessions, tutorials, worskshops and exhibitions are also weclcome. Please check the conference website regarding instructions for these proposal submissions. 
Important Dates =============== Abstracts due 27th February, 2009 Acceptance notification 6th March, 2009 Final manuscript & Registration due 20th March, 2009 ------------------------------------------------------------------------ Sarosh Patel ASEE NE 2009 Technical Support Team University of Bridgeport 221 University Avenue e-mail:info at asee2009conference.org Bridgeport, CT 06604, U.S.A. http://www.asee2009conference.org ------------------------------------------------------------------------ Click here on http://server1.streamsend.com/streamsend/unsubscribe.php?cd=3326&md=182&ud=0b3cfeb6dd47f09dcb3a2311bd8cb6b3 to update your profile or Unsubscribe From ananth at in.ibm.com Wed Jan 21 06:28:25 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Wed, 21 Jan 2009 11:58:25 +0530 Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: <20090119232031.82675FC3C6@magilla.sf.frob.com> References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> Message-ID: <20090121062825.GD3251@in.ibm.com> On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote: > Thanks for working on this, Ananth. (Btw, it's "embed.") > > I think it would be less disruptive (and materially no different) > to leave utrace_flags as it is. That field is the one (and only) > that is used in hot paths (or used anywhere outside utrace.c). > It might in future get moved around to stay in a cache-hot part > of task_struct, for example. > > The long comment above struct utrace is really all about implementation > details inside utrace.c and I don't think you should move that commentary > to the header file. Instead, put a comment saying that the contents of > struct utrace and their use is entirely private to kernel/utrace.c and it > is only defined in the header to make its size known for struct task_struct > layout (and init_task.h). > > I committed some cosmetic changes that will make for a little less flutter > in your patch. 
Here is V2 of the patch. Tested and works fine. Same two tests on powerpc fail, all tests pass on x86, while there are some occurances of the ptrace.c WARN_ON. Roland, I've tried to tweak the comments appropriately. Please feel free to modify them as you consider fit. Signed-off-by: Ananth N Mavinakayanahalli --- include/linux/sched.h | 3 include/linux/tracehook.h | 16 -- include/linux/utrace.h | 48 ++++-- kernel/ptrace.c | 11 + kernel/utrace.c | 331 +++++++++++----------------------------------- 5 files changed, 126 insertions(+), 283 deletions(-) Index: utrace-20jan/include/linux/sched.h =================================================================== --- utrace-20jan.orig/include/linux/sched.h +++ utrace-20jan/include/linux/sched.h @@ -88,6 +88,7 @@ struct sched_param { #include #include #include +#include #include @@ -1267,7 +1268,7 @@ struct task_struct { seccomp_t seccomp; #ifdef CONFIG_UTRACE - struct utrace *utrace; + struct utrace utrace; unsigned long utrace_flags; #endif Index: utrace-20jan/include/linux/utrace.h =================================================================== --- utrace-20jan.orig/include/linux/utrace.h +++ utrace-20jan/include/linux/utrace.h @@ -33,13 +33,37 @@ #include #include #include -#include +#include struct linux_binprm; +struct linux_binfmt; struct pt_regs; -struct utrace; +struct task_struct; struct user_regset; struct user_regset_view; +struct seq_file; + +/* + * Per-thread structure task_struct.utrace refers to. 
+ * + * The structure and its contents are private to kernel/utrace.c and is + * defined here only so its size is known for struct task_struct layout + */ +struct utrace { + struct task_struct *cloning; + struct list_head attached, attaching; + spinlock_t lock; + + struct utrace_attached_engine *reporting; + + unsigned int stopped:1; + unsigned int report:1; + unsigned int interrupt:1; + unsigned int signal_handler:1; + unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */ + unsigned int death:1; /* in utrace_report_death() now */ + unsigned int reap:1; /* release_task() has run */ +}; /* * Event bits passed to utrace_set_events(). @@ -133,7 +157,7 @@ static inline struct utrace *task_utrace { return NULL; } -static inline void utrace_init_task(struct task_struct *child) +static inline void utrace_init_task(struct task_struct *task) { } @@ -144,22 +168,10 @@ static inline void task_utrace_proc_stat #else /* CONFIG_UTRACE */ -static inline unsigned long task_utrace_flags(struct task_struct *task) -{ - return task->utrace_flags; -} - -static inline struct utrace *task_utrace_struct(struct task_struct *task) -{ - return task->utrace; -} - -static inline void utrace_init_task(struct task_struct *child) -{ - child->utrace_flags = 0; - child->utrace = NULL; -} +#define task_utrace_flags(task) ((task)->utrace_flags) +#define task_utrace_struct(task) (&(task)->utrace) +void utrace_init_task(struct task_struct *task); void task_utrace_proc_status(struct seq_file *m, struct task_struct *p); /** Index: utrace-20jan/kernel/utrace.c =================================================================== --- utrace-20jan.orig/kernel/utrace.c +++ utrace-20jan/kernel/utrace.c @@ -10,103 +10,56 @@ * Red Hat Author: Roland McGrath. */ -#include +#include #include #include #include #include #include -#include #include #include #include #include #include +#include /* - * Per-thread structure task_struct.utrace points to. 
+ * struct utrace, defined in utrace.h is private to this file. Its + * defined there just so struct task_struct knows its size. * - * The task itself never has to worry about this going away after - * some event is found set in task_struct.utrace_flags. - * Once created, this pointer is changed only when the task is quiescent - * (TASK_TRACED or TASK_STOPPED with the siglock held, or dead). - * - * For other parties, the pointer to this is protected by RCU and - * task_lock. Since call_rcu is never used while the thread is alive and - * using this struct utrace, we can overlay the RCU data structure used - * only for a dead struct with some local state used only for a live utrace - * on an active thread. - * - * The two lists @attached and @attaching work together for smooth - * asynchronous attaching with low overhead. Modifying either list - * requires @lock. The @attaching list can be modified any time while - * holding @lock. New engines being attached always go on this list. - * - * The @attached list is what the task itself uses for its reporting - * loops. When the task itself is not quiescent, it can use the - * @attached list without taking any lock. Noone may modify the list - * when the task is not quiescent. When it is quiescent, that means - * that it won't run again without taking @lock itself before using - * the list. + * The two lists @utrace->attached and @utrace->attaching work together + * for smooth asynchronous attaching with low overhead. Modifying + * either list requires @utrace->lock. The @utrace->attaching list + * can be modified any time while holding @utrace->lock. New engines + * being attached always go on this list. + * + * The @utrace->attached list is what the task itself uses for its + * reporting loops. When the task itself is not quiescent, it can + * use the @utrace->attached list without taking any lock. Noone + * may modify the list when the task is not quiescent. 
When it is + * quiescent, that means that it won't run again without taking + * @utrace->lock itself before using the list. * * At each place where we know the task is quiescent (or it's current), - * while holding @lock, we call splice_attaching(), below. This moves - * the @attaching list members on to the end of the @attached list. - * Since this happens at the start of any reporting pass, any new - * engines attached asynchronously go on the stable @attached list - * in time to have their callbacks seen. - */ -struct utrace { - union { - struct rcu_head dead; - struct { - struct task_struct *cloning; - } live; - } u; - - struct list_head attached, attaching; - spinlock_t lock; - - struct utrace_attached_engine *reporting; - - unsigned int stopped:1; - unsigned int report:1; - unsigned int interrupt:1; - unsigned int signal_handler:1; - unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */ - unsigned int death:1; /* in utrace_report_death() now */ - unsigned int reap:1; /* release_task() has run */ -}; + * while holding @utrace->lock, we call splice_attaching(), below. + * This moves the @utrace->attaching list members on to the end of + * the @utrace->attached list. Since this happens at the start of + * any reporting pass, any new engines attached asynchronously go + * on the stable @utrace->attached list in time to have their + * callbacks seen. + */ -static struct kmem_cache *utrace_cachep; static struct kmem_cache *utrace_engine_cachep; static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */ static int __init utrace_init(void) { - utrace_cachep = KMEM_CACHE(utrace, SLAB_PANIC); utrace_engine_cachep = KMEM_CACHE(utrace_attached_engine, SLAB_PANIC); return 0; } module_init(utrace_init); -static void utrace_free(struct rcu_head *rhead) -{ - struct utrace *utrace = container_of(rhead, struct utrace, u.dead); - kmem_cache_free(utrace_cachep, utrace); -} - -/* - * Called with utrace locked. Clean it up and free it via RCU. 
- */ -static void rcu_utrace_free(struct utrace *utrace) - __releases(utrace->lock) -{ - spin_unlock(&utrace->lock); - call_rcu(&utrace->u.dead, utrace_free); -} - /* * This is called with @utrace->lock held when the task is safely * quiescent, i.e. it won't consult utrace->attached without the lock. @@ -172,8 +125,12 @@ static inline bool exclude_utrace(struct /* * Initialize the struct, initially zero'd. */ -static inline void init_utrace_struct(struct utrace *utrace) +void utrace_init_task(struct task_struct *task) { + struct utrace *utrace = task_utrace_struct(task); + + task->utrace_flags = 0; + utrace->cloning = NULL; INIT_LIST_HEAD(&utrace->attached); INIT_LIST_HEAD(&utrace->attaching); spin_lock_init(&utrace->lock); @@ -181,8 +138,6 @@ static inline void init_utrace_struct(st /* * Called without locks. - * Allocate target->utrace and install engine in it. If we lose a race in - * setting it up, return -EAGAIN. This function mediates startup races. * The creating parent task has priority, and other callers will delay here * to let its call succeed and take the new utrace lock first. */ @@ -199,8 +154,8 @@ static int utrace_first_engine(struct ta * report_clone hook has had a chance to run. 
*/ if (target->flags & PF_STARTING) { - utrace = current->utrace; - if (!utrace || utrace->u.live.cloning != target) { + utrace = task_utrace_struct(current); + if (utrace->cloning != target) { yield(); if (signal_pending(current)) return -ERESTARTNOINTR; @@ -208,11 +163,7 @@ static int utrace_first_engine(struct ta } } - utrace = kmem_cache_zalloc(utrace_cachep, GFP_KERNEL); - if (unlikely(!utrace)) - return -ENOMEM; - init_utrace_struct(utrace); - + utrace = task_utrace_struct(target); list_add(&engine->entry, &utrace->attached); ret = -EAGAIN; @@ -220,9 +171,7 @@ static int utrace_first_engine(struct ta task_lock(target); if (exclude_utrace(target)) { ret = -EBUSY; - } else if (likely(!target->utrace)) { - rcu_assign_pointer(target->utrace, utrace); - + } else { /* * The task_lock protects us against another thread doing * the same thing. We might still be racing against @@ -240,30 +189,20 @@ static int utrace_first_engine(struct ta spin_unlock(&utrace->lock); return 0; } - - /* - * The target has already been through release_task. - * Our caller will restart and notice it's too late now. - */ - target->utrace = NULL; } /* - * Another engine attached first, so there is a struct already. - * A null return says to restart looking for the existing one. + * Another engine attached first. + * Restart looking for the existing one. */ task_unlock(target); spin_unlock(&utrace->lock); - kmem_cache_free(utrace_cachep, utrace); return ret; } /* - * Called with rcu_read_lock() held. - * Lock utrace and verify that it's still installed in target->utrace. - * If not, return -EAGAIN. - * Then enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE. + * Enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE. 
*/ static int utrace_second_engine(struct task_struct *target, struct utrace *utrace, @@ -276,13 +215,7 @@ static int utrace_second_engine(struct t spin_lock(&utrace->lock); - if (unlikely(rcu_dereference(target->utrace) != utrace)) { - /* - * We lost a race with other CPUs doing a sequence - * of detach and attach before we got in. - */ - ret = -EAGAIN; - } else if ((flags & UTRACE_ATTACH_EXCLUSIVE) && + if ((flags & UTRACE_ATTACH_EXCLUSIVE) && unlikely(matching_engine(utrace, flags, ops, data))) { ret = -EEXIST; } else { @@ -344,18 +277,15 @@ struct utrace_attached_engine *utrace_at { struct utrace *utrace; struct utrace_attached_engine *engine; - int ret; + int ret = 0; restart: - rcu_read_lock(); - utrace = rcu_dereference(target->utrace); - smp_rmb(); + utrace = task_utrace_struct(target); if (unlikely(target->exit_state == EXIT_DEAD)) { /* * The target has already been reaped. * Check this first; a race with reaping may lead to restart. */ - rcu_read_unlock(); if (!(flags & UTRACE_ATTACH_CREATE)) return ERR_PTR(-ENOENT); return ERR_PTR(-ESRCH); @@ -363,19 +293,14 @@ restart: if (!(flags & UTRACE_ATTACH_CREATE)) { engine = NULL; - if (utrace) { - spin_lock(&utrace->lock); - engine = matching_engine(utrace, flags, ops, data); - if (engine) - utrace_engine_get(engine); - spin_unlock(&utrace->lock); - } - rcu_read_unlock(); + spin_lock(&utrace->lock); + engine = matching_engine(utrace, flags, ops, data); + if (engine) + utrace_engine_get(engine); + spin_unlock(&utrace->lock); return engine ?: ERR_PTR(-ENOENT); } - rcu_read_unlock(); - if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops)) return ERR_PTR(-EINVAL); @@ -398,15 +323,12 @@ restart: engine->ops = ops; engine->data = data; - rcu_read_lock(); - utrace = rcu_dereference(target->utrace); - if (!utrace) { - rcu_read_unlock(); + if ((ret == 0) && (list_empty(&utrace->attached))) { + /* First time here, set engines up */ ret = utrace_first_engine(target, engine); } else { ret = 
utrace_second_engine(target, utrace, engine, flags, ops, data); - rcu_read_unlock(); } if (unlikely(ret)) { @@ -555,28 +477,23 @@ static bool utrace_stop(struct task_stru try_to_freeze(); killed = false; - rcu_read_lock(); - utrace = rcu_dereference(task->utrace); - if (utrace) { + /* + * utrace_wakeup() clears @utrace->stopped before waking us up. + * We're officially awake if it's clear. + */ + spin_lock(&utrace->lock); + if (unlikely(utrace->stopped)) { /* - * utrace_wakeup() clears @utrace->stopped before waking us up. - * We're officially awake if it's clear. + * If we're here with it still set, it must have been + * signal_wake_up() instead, waking us up for a SIGKILL. */ - spin_lock(&utrace->lock); - if (unlikely(utrace->stopped)) { - /* - * If we're here with it still set, it must have been - * signal_wake_up() instead, waking us up for a SIGKILL. - */ - spin_lock_irq(&task->sighand->siglock); - WARN_ON(!sigismember(&task->pending.signal, SIGKILL)); - spin_unlock_irq(&task->sighand->siglock); - utrace->stopped = 0; - killed = true; - } - spin_unlock(&utrace->lock); + spin_lock_irq(&task->sighand->siglock); + WARN_ON(!sigismember(&task->pending.signal, SIGKILL)); + spin_unlock_irq(&task->sighand->siglock); + utrace->stopped = 0; + killed = true; } - rcu_read_unlock(); + spin_unlock(&utrace->lock); /* * While we were in TASK_TRACED, complete_signal() considered @@ -613,6 +530,7 @@ static struct utrace *get_utrace_lock(st __acquires(utrace->lock) { struct utrace *utrace; + int ret = 0; /* * You must hold a ref to be making a call. A call from within @@ -644,33 +562,34 @@ static struct utrace *get_utrace_lock(st return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS); } - utrace = rcu_dereference(target->utrace); + utrace = task_utrace_struct(target); smp_rmb(); - if (unlikely(!utrace) || unlikely(target->exit_state == EXIT_DEAD)) { + if (unlikely(target->exit_state == EXIT_DEAD)) { /* * If all engines detached already, utrace is clear. 
* Otherwise, we're called after utrace_release_task might * have started. A call to this engine's report_reap * callback might already be in progress. */ - utrace = ERR_PTR(-ESRCH); + ret = -ESRCH; } else { spin_lock(&utrace->lock); - if (unlikely(rcu_dereference(target->utrace) != utrace) || - unlikely(!engine->ops) || + if (unlikely(!engine->ops) || unlikely(engine->ops == &utrace_detached_ops)) { /* * By the time we got the utrace lock, * it had been reaped or detached already. */ spin_unlock(&utrace->lock); - utrace = ERR_PTR(-ESRCH); + ret = -ESRCH; if (!attached && engine->ops == &utrace_detached_ops) - utrace = ERR_PTR(-ERESTARTSYS); + ret = -ERESTARTSYS; } } rcu_read_unlock(); + if (ret) + return ERR_PTR(ret); return utrace; } @@ -690,7 +609,7 @@ static void put_detached_list(struct lis /* * Called with utrace->lock held. - * Notify and clean up all engines, then free utrace. + * Notify and clean up all engines. */ static void utrace_reap(struct task_struct *target, struct utrace *utrace) __releases(utrace->lock) @@ -726,33 +645,23 @@ restart: goto restart; } - rcu_utrace_free(utrace); /* Releases the lock. */ - + spin_unlock(&utrace->lock); put_detached_list(&detached); } #define DEATH_EVENTS (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE)) /* - * Called by release_task. After this, target->utrace must be cleared. + * Called by release_task. */ void utrace_release_task(struct task_struct *target) { - struct utrace *utrace; - - task_lock(target); - utrace = rcu_dereference(target->utrace); - rcu_assign_pointer(target->utrace, NULL); - task_unlock(target); - - if (unlikely(!utrace)) - return; + struct utrace *utrace = task_utrace_struct(target); spin_lock(&utrace->lock); /* - * If the list is empty, utrace is already on its way to be freed. * We raced with detach and we won the task_lock race but lost the - * utrace->lock race. All we have to do is let RCU run. + * utrace->lock race. 
*/ if (likely(!list_empty(&utrace->attached))) { utrace->reap = 1; @@ -1066,25 +975,8 @@ static void utrace_reset(struct task_str /* * If any engines are left, we're done. */ - if (flags) { - spin_unlock(&utrace->lock); - } else { - /* - * No more engines, clear out the utrace. Here we can race - * with utrace_release_task(). If it gets task_lock() - * first, then it cleans up this struct for us. - */ - - task_lock(task); - if (unlikely(task->utrace != utrace)) { - task_unlock(task); - spin_unlock(&utrace->lock); - } else { - rcu_assign_pointer(task->utrace, NULL); - task_unlock(task); - rcu_utrace_free(utrace); - } - + spin_unlock(&utrace->lock); + if (!flags) { if (action) *action = UTRACE_RESUME; } @@ -1699,18 +1591,18 @@ void utrace_report_clone(unsigned long c /* * We don't use the REPORT() macro here, because we need - * to clear utrace->u.live.cloning before finish_report(). + * to clear utrace->cloning before finish_report(). * After finish_report(), utrace can be a stale pointer * in cases when report.action is still UTRACE_RESUME. */ start_report(utrace); - utrace->u.live.cloning = child; + utrace->cloning = child; REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(CLONE), report_clone, report.action, engine, task, clone_flags, child); - utrace->u.live.cloning = NULL; + utrace->cloning = NULL; finish_report(&report, task, utrace); /* @@ -1766,25 +1658,13 @@ void utrace_report_jctl(int notify, int * was finished, we might be here with utrace already * removed or in the middle of being removed. * - * RCU makes it safe to get the utrace->lock even if it's - * being freed. Once we have that lock, either an external - * detach has finished and this struct has been freed, or - * else we know we are excluding any other detach attempt. - * * If we are indeed attached, then make sure we are no * longer considered stopped while we run callbacks. 
*/ - rcu_read_lock(); - utrace = rcu_dereference(task->utrace); - if (unlikely(!utrace)) { - rcu_read_unlock(); - return; - } spin_lock(&utrace->lock); utrace->stopped = 0; utrace->report = 0; spin_unlock(&utrace->lock); - rcu_read_unlock(); REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); @@ -1987,7 +1867,7 @@ void utrace_resume(struct task_struct *t */ bool utrace_interrupt_pending(void) { - return current->utrace->interrupt; + return current->utrace.interrupt; } /* @@ -2028,7 +1908,7 @@ int utrace_get_signal(struct task_struct __releases(task->sighand->siglock) __acquires(task->sighand->siglock) { - struct utrace *utrace; + struct utrace *utrace = task_utrace_struct(task); struct k_sigaction *ka; INIT_REPORT(report); struct utrace_attached_engine *engine, *next; @@ -2037,44 +1917,13 @@ int utrace_get_signal(struct task_struct u32 ret; int signr; - /* - * We could have been considered quiescent while we were in - * TASK_STOPPED, and detached asynchronously. If we woke up - * and checked task->utrace_flags before that was finished, - * we might be here with utrace already removed or in the - * middle of being removed. - */ - rcu_read_lock(); - utrace = rcu_dereference(task->utrace); - if (unlikely(!utrace)) { - rcu_read_unlock(); - return 0; - } - if (utrace->interrupt || utrace->report || utrace->signal_handler) { /* * We've been asked for an explicit report before we * even check for pending signals. */ - spin_unlock_irq(&task->sighand->siglock); - - /* - * RCU makes it safe to get the utrace->lock even if - * it's being freed. Once we have that lock, either an - * external detach has finished and this struct has been - * freed, or else we know we are excluding any other - * detach attempt. 
- */ spin_lock(&utrace->lock); - rcu_read_unlock(); - - if (unlikely(task->utrace != utrace)) { - spin_unlock(&utrace->lock); - cond_resched(); - return -1; - } - splice_attaching(utrace); if (unlikely(!utrace->interrupt) && unlikely(!utrace->report)) @@ -2122,7 +1971,6 @@ int utrace_get_signal(struct task_struct * If noone is interested in intercepting signals, * let the caller just dequeue them normally. */ - rcu_read_unlock(); return 0; } else { if (unlikely(utrace->stopped)) { @@ -2141,17 +1989,9 @@ int utrace_get_signal(struct task_struct */ spin_unlock_irq(&task->sighand->siglock); spin_lock(&utrace->lock); - rcu_read_unlock(); - if (unlikely(task->utrace != utrace)) { - spin_unlock(&utrace->lock); - cond_resched(); - return -1; - } utrace->stopped = 0; spin_unlock(&utrace->lock); spin_lock_irq(&task->sighand->siglock); - } else { - rcu_read_unlock(); } /* @@ -2542,11 +2382,7 @@ struct task_struct *utrace_tracer_task(s struct utrace_attached_engine *engine; const struct utrace_engine_ops *ops; struct task_struct *tracer = NULL; - struct utrace *utrace; - - utrace = rcu_dereference(target->utrace); - if (!utrace) - return NULL; + struct utrace *utrace = task_utrace_struct(target); list_for_each_safe(pos, next, &utrace->attached) { engine = list_entry(pos, struct utrace_attached_engine, @@ -2587,9 +2423,8 @@ int utrace_unsafe_exec(struct task_struc */ void task_utrace_proc_status(struct seq_file *m, struct task_struct *p) { - struct utrace *utrace = rcu_dereference(p->utrace); - if (likely(!utrace)) - return; + struct utrace *utrace = task_utrace_struct(p); + seq_printf(m, "Utrace: %lx%s%s%s\n", p->utrace_flags, utrace->stopped ? 
" (stopped)" : "", Index: utrace-20jan/include/linux/tracehook.h =================================================================== --- utrace-20jan.orig/include/linux/tracehook.h +++ utrace-20jan/include/linux/tracehook.h @@ -370,8 +370,7 @@ static inline void tracehook_report_vfor static inline void tracehook_prepare_release_task(struct task_struct *task) { smp_mb(); - if (task_utrace_struct(task) != NULL) - utrace_release_task(task); + utrace_release_task(task); } /** @@ -385,21 +384,8 @@ static inline void tracehook_prepare_rel */ static inline void tracehook_finish_release_task(struct task_struct *task) { - int bad = 0; ptrace_release_task(task); BUG_ON(task->exit_state != EXIT_DEAD); - if (unlikely(task_utrace_struct(task) != NULL)) { - /* - * In a race condition, utrace_attach() will temporarily set - * it, but then check @task->exit_state and clear it. It does - * all this under task_lock(), so we take the lock to check - * that there is really a bug and not just that known race. - */ - task_lock(task); - bad = unlikely(task_utrace_struct(task) != NULL); - task_unlock(task); - } - BUG_ON(bad); } /** Index: utrace-20jan/kernel/ptrace.c =================================================================== --- utrace-20jan.orig/kernel/ptrace.c +++ utrace-20jan/kernel/ptrace.c @@ -778,7 +778,16 @@ static inline bool exclude_ptrace(struct */ static inline bool exclude_ptrace(struct task_struct *task) { - return unlikely(!!task_utrace_struct(task)); + struct utrace *utrace = task_utrace_struct(task); + + spin_lock(&utrace->lock); + if (list_empty(&utrace->attached) && list_empty(&utrace->attaching)) { + spin_unlock(&utrace->lock); + return false; + } + + spin_unlock(&utrace->lock); + return true; } #endif From de_amigo_para_amigo-owner at yahoogrupos.com.br Wed Jan 21 11:14:14 2009 From: de_amigo_para_amigo-owner at yahoogrupos.com.br (Moderador do grupo de_amigo_para_amigo) Date: 21 Jan 2009 11:14:14 -0000 Subject: Bem-vindo ao grupo de_amigo_para_amigo! 
From fche at redhat.com Tue Jan 27 19:54:26 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 27 Jan 2009 14:54:26 -0500
Subject: proof-of-concept, utrace->ftrace engine
Message-ID: <20090127195425.GF32568@redhat.com>

Hi -

Here's the start of a little ditty that ties process-related events,
as hooked by Roland McGrath's utrace code, into the ftrace
buffer/control widgetry. If nothing else, think of it as one
potential in-tree user of utrace.
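The trace listing in this message prints raw numbers for fork flags, exit codes and signal codes. Here is a small user-space sketch for decoding them, on the assumption (not stated in the message) that they follow the usual Linux encodings of that era: wait-status style exit codes, clone(2) flag bits with the exit signal in the low byte, and the kernel-internal si_code layout with a class number in the upper 16 bits. All DEMO_* and demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, matching Linux clone(2) flag definitions. */
#define DEMO_CSIGNAL              0x000000ffUL /* exit-signal mask */
#define DEMO_CLONE_CHILD_CLEARTID 0x00200000UL
#define DEMO_CLONE_CHILD_SETTID   0x01000000UL

/* "exit 256" style field: exit status sits in bits 8..15. */
static int demo_exit_status(long code)
{
	assert(code >= 0);
	return (int)((code >> 8) & 0xff);
}

/* "fork ... flags 0x..." field: low byte is the child's exit signal. */
static int demo_clone_exit_signal(unsigned long flags)
{
	return (int)(flags & DEMO_CSIGNAL);
}

/* Test a single clone flag bit. */
static bool demo_clone_has(unsigned long flags, unsigned long bit)
{
	return (flags & bit) != 0;
}

/* "signal ... code N" field: assumed class number in the upper 16 bits. */
static int demo_si_class(long si_code)
{
	return (int)(si_code >> 16);
}

/* ... and the reason within that class in the lower 16 bits. */
static int demo_si_reason(long si_code)
{
	return (int)(si_code & 0xffff);
}
```

Under those assumptions, "exit 256" is exit status 1 and "exit 512" is exit status 2; flags 0x1200011 carry exit signal 17 plus the CHILD_SETTID and CHILD_CLEARTID bits; and code 262145 (0x40001) splits into class 4, reason 1.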
Script started on Tue 27 Jan 2009 02:39:06 PM EST
[root@vm-fed10-64 tracing]# cat available_tracers
process wakeup irqsoff sysprof sched_switch nop
[root@vm-fed10-64 tracing]# echo process > current_tracer
[root@vm-fed10-64 tracing]# echo 500 > process_trace_uid_filter
[root@vm-fed10-64 tracing]# cat trace
# tracer: process
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
[root@vm-fed10-64 tracing]# su - fche
% vm-fed10-64 /home/fche [14:39:50] % pwd
/home/fche
% vm-fed10-64 /home/fche [14:39:52] % ls /tmp
firstbootX.log     pulse-PKdhtXMmr18n  stapbXg0xB  stapUniATd
foo                stap6cNJ5M          stapl9Ww2f  virtual-fche.4SkpzQ
kerneloops.pxnITL  stap9MajHI          stapT1LKnQ
% vm-fed10-64 /home/fche [14:39:59] % df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      13706328  11417980   2149176  85% /
/dev/sda1               194442     34259    150144  19% /boot
tmpfs                   382320         0    382320   0% /dev/shm
super:/home         1300999168 496440320 750835712  40% /home
% vm-fed10-64 /home/fche [14:40:03] % exit
Tue Jan 27 14:40:05 EST 2009
[root@vm-fed10-64 tracing]# cat trace
# tracer: process
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
zsh 2091 0 2616701.950948 exec
zsh 2091 0 2616701.966410 fork 2092 flags 0x1200011
whoami 2092 1 2616702.005276 exec
whoami 2092 0 2616702.008612 exit 0
zsh 2091 0 2616702.009193 signal 17 errno 0 code 262145
zsh 2091 0 2616702.011385 fork 2093 flags 0x1200011
mkdir 2093 1 2616702.013701 exec
mkdir 2093 0 2616702.017300 exit 0
zsh 2091 0 2616702.018133 signal 17 errno 0 code 262145
zsh 2091 0 2616702.018951 fork 2094 flags 0x1200011
whoami 2094 0 2616702.023867 exec
whoami 2094 0 2616702.026108 exit 0
zsh 2091 0 2616702.026567 signal 17 errno 0 code 262145
zsh 2091 0 2616702.027358 fork 2095 flags 0x1200011
mkdir 2095 1 2616702.029712 exec
mkdir 2095 1 2616702.031703 exit 0
zsh 2091 0 2616702.032275 signal 17 errno 0 code 262145
zsh 2091 0 2616702.035062 fork 2096 flags 0x1200011
zsh 2096 1 2616702.036457 exit 0
zsh 2091 0 2616702.037344 fork 2097 flags 0x1200011
zsh 2091 0 2616702.038959 signal 17 errno 0 code 262145
egrep 2097 1 2616702.039692 exec
egrep 2097 1 2616702.041620 exit 256
zsh 2091 0 2616702.042150 signal 17 errno 0 code 262145
zsh 2091 0 2616702.043095 fork 2098 flags 0x1200011
zsh 2098 1 2616702.044435 exit 0
zsh 2091 0 2616702.045329 fork 2099 flags 0x1200011
zsh 2091 0 2616702.046846 signal 17 errno 0 code 262145
egrep 2099 1 2616702.047646 exec
egrep 2099 1 2616702.049571 exit 0
zsh 2091 0 2616702.050141 signal 17 errno 0 code 262145
zsh 2091 0 2616702.051020 fork 2100 flags 0x1200011
zsh 2100 0 2616702.052046 exit 0
zsh 2091 0 2616702.053346 fork 2101 flags 0x1200011
zsh 2091 0 2616702.054672 signal 17 errno 0 code 262145
egrep 2101 1 2616702.055515 exec
egrep 2101 1 2616702.057346 exit 0
zsh 2091 0 2616702.057907 signal 17 errno 0 code 262145
zsh 2091 0 2616702.058982 fork 2102 flags 0x1200011
id 2102 1 2616702.064822 exec
id 2102 1 2616702.067609 exit 0
zsh 2091 0 2616702.068307 signal 17 errno 0 code 262145
zsh 2091 0 2616702.069246 fork 2103 flags 0x1200011
hostname 2103 0 2616702.072067 exec
hostname 2103 0 2616702.074154 exit 0
zsh 2091 0 2616702.074766 signal 17 errno 0 code 262145
zsh 2091 0 2616702.076529 fork 2104 flags 0x1200011
zsh 2104 1 2616702.077982 exit 0
zsh 2091 0 2616702.079742 fork 2105 flags 0x1200011
zsh 2091 0 2616702.081672 signal 17 errno 0 code 262145
grep 2105 1 2616702.082929 exec
grep 2105 0 2616702.087867 exec
grep 2105 0 2616702.089716 exit 256
zsh 2091 0 2616702.090205 signal 17 errno 0 code 262145
zsh 2091 0 2616702.092925 fork 2106 flags 0x1200011
tput 2106 1 2616702.099077 exec
tput 2106 1 2616702.100918 exit 0
zsh 2091 0 2616702.101588 signal 17 errno 0 code 262145
zsh 2091 0 2616702.102659 fork 2107 flags 0x1200011
dircolors 2107 1 2616702.108917 exec
dircolors 2107 1 2616702.110359 exit 0
zsh 2091 0 2616702.110997 signal 17 errno 0 code 262145
zsh 2091 0 2616702.134110 fork 2108 flags 0x1200011
egrep 2108 0 2616702.136910 exec
egrep 2108 0 2616702.138921 exit 256
zsh 2091 0 2616702.139430 signal 17 errno 0 code 262145
zsh 2091 0 2616702.141230 fork 2109 flags 0x1200011
zsh 2109 1 2616702.142714 exit 0
zsh 2091 0 2616702.143685 fork 2110 flags 0x1200011
zsh 2091 0 2616702.145204 signal 17 errno 0 code 262145
grep 2110 1 2616702.145974 exec
grep 2110 1 2616702.147934 exit 256
zsh 2091 0 2616702.150523 signal 17 errno 0 code 262145
zsh 2091 0 2616702.151842 fork 2111 flags 0x1200011
zsh 2111 1 2616702.153271 exit 0
zsh 2091 0 2616702.154703 fork 2112 flags 0x1200011
zsh 2091 0 2616702.156063 signal 17 errno 0 code 262145
grep 2112 1 2616702.157028 exec
grep 2112 1 2616702.158834 exit 256
zsh 2091 0 2616702.159476 signal 17 errno 0 code 262145
zsh 2091 0 2616702.160319 fork 2113 flags 0x1200011
id 2113 1 2616702.162848 exec
id 2113 1 2616702.165115 exit 0
zsh 2091 0 2616702.165872 signal 17 errno 0 code 262145
zsh 2091 0 2616702.168590 fork 2114 flags 0x1200011
consoletype 2114 1 2616702.171021 exec
consoletype 2114 1 2616702.171988 exit 512
zsh 2091 0 2616702.172443 signal 17 errno 0 code 262145
zsh 2091 0 2616702.181959 fork 2115 flags 0x1200011
whoami 2115 1 2616702.188936 exec
whoami 2115 1 2616702.191366 exit 0
zsh 2091 0 2616702.192051 signal 17 errno 0 code 262145
zsh 2091 0 2616702.194605 fork 2116 flags 0x1200011
mkdir 2116 0 2616702.197377 exec
mkdir 2116 0 2616702.199480 exit 0
zsh 2091 0 2616702.200084 signal 17 errno 0 code 262145
zsh 2091 0 2616702.201017 fork 2117 flags 0x1200011
whoami 2117 0 2616702.206033 exec
whoami 2117 0 2616702.208245 exit 0
zsh 2091 0 2616702.208888 signal 17 errno 0 code 262145
zsh 2091 0 2616702.209836 fork 2118 flags 0x1200011
mkdir 2118 0 2616702.212527 exec
mkdir 2118 0 2616702.214474 exit 0
zsh 2091 0 2616702.215117 signal 17 errno 0 code 262145
zsh 2091 0 2616702.217011 fork 2119 flags 0x1200011
stty 2119 0 2616702.220137 exec
stty 2119 0 2616702.223496 exit 0
zsh 2091 0 2616702.223977 signal 17 errno 0 code 262145
zsh 2091 0 2616702.229063 fork 2120 flags 0x1200011
mesg 2120 0 2616702.232073 exec
mesg 2120 0 2616702.233994 exit 0
zsh 2091 0 2616702.234454 signal 17 errno 0 code 262145
zsh 2091 0 2616711.333172 fork 2121 flags 0x1200011
ls 2121 0 2616711.336055 exec
ls 2121 0 2616711.356496 exit 0
zsh 2091 0 2616711.364547 signal 17 errno 0 code 262145
zsh 2091 0 2616714.474787 fork 2125 flags 0x1200011
df 2125 0 2616714.479280 exec
df 2125 0 2616714.483010 exit 0
zsh 2091 0 2616714.483701 signal 17 errno 0 code 262145
zsh 2091 0 2616716.594615 fork 2126 flags 0x1200011
clear 2126 0 2616716.598083 exec
clear 2126 0 2616716.599856 exit 0
zsh 2091 0 2616716.600439 signal 17 errno 0 code 262145
zsh 2091 0 2616716.601532 fork 2127 flags 0x1200011
date 2127 0 2616716.613852 exec
date 2127 0 2616716.619608 exit 0
zsh 2091 0 2616716.620334 signal 17 errno 0 code 262145
zsh 2091 0 2616716.632090 fork 2128 flags 0x1200011
clear 2128 0 2616716.634284 exec
clear 2128 0 2616716.636012 exit 0
zsh 2091 0 2616716.636775 signal 17 errno 0 code 262145
zsh 2091 0 2616716.637448 exit 0
[root@vm-fed10-64 tracing]# nop > current_tracer
[root@vm-fed10-64 tracing]# cat trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
[root@vm-fed10-64 tracing]# exit

Script done on Tue 27 Jan 2009 02:40:26 PM EST


diff --git a/include/linux/processtrace.h b/include/linux/processtrace.h
new file mode 100644
index 0000000..f902443
--- /dev/null
+++ b/include/linux/processtrace.h
@@ -0,0 +1,33 @@
+#ifndef PROCESSTRACE_H
+#define PROCESSTRACE_H
+
+#include
+#include
+
+struct process_trace_entry {
+	unsigned char opcode;	/* one of _UTRACE_EVENT_* */
+	char comm[TASK_COMM_LEN];	/* XXX: should be in/via trace_entry */
+	union {
+		struct {
+			pid_t child;
+			unsigned long flags;
+		} trace_clone;
+		struct {
+			long code;
+		} trace_exit;
+		struct {
+		} trace_exec;
+		struct {
+			int si_signo;
+			int si_errno;
+			int si_code;
+		} trace_signal;
+	};
+};
+
+/* in kernel/trace/trace_process.c */
+
+extern void enable_process_trace (void);
+extern void disable_process_trace (void);
+
+#endif /* PROCESSTRACE_H */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 33dbefd..9276863 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -119,6 +119,15 @@ config CONTEXT_SWITCH_TRACER
 	  This tracer gets called from the context switch and records
 	  all switching of tasks.

+config PROCESS_TRACER
+	bool "Trace process events via utrace"
+	depends on DEBUG_KERNEL
+	select TRACING
+	select UTRACE
+	help
+	  This tracer provides trace records from process events
+	  accessible to utrace: lifecycle, system calls, and signals.
+
 config BOOT_TRACER
 	bool "Trace boot initcalls"
 	depends on DEBUG_KERNEL
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c8228b1..b06a5d6 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -24,5 +24,6 @@ obj-$(CONFIG_NOP_TRACER) += trace_nop.o
 obj-$(CONFIG_STACK_TRACER) += trace_stack.o
 obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_BOOT_TRACER) += trace_boot.o
+obj-$(CONFIG_PROCESS_TRACER) += trace_process.o

 libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8465ad0..7c0cd57 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include

 enum trace_type {
@@ -22,6 +23,7 @@ enum trace_type {
 	TRACE_MMIO_RW,
 	TRACE_MMIO_MAP,
 	TRACE_BOOT,
+	TRACE_PROCESS,

 	__TRACE_LAST_TYPE
 };
@@ -117,6 +119,11 @@ struct trace_boot {
 	struct boot_trace initcall;
 };

+struct trace_process {
+	struct trace_entry ent;
+	struct process_trace_entry event;
+};
+
 /*
  * trace_flag_type is an enumeration that holds different
  * states when a trace occurs. These are:
@@ -219,6 +226,7 @@ extern void __ftrace_bad_type(void);
 		IF_ASSIGN(var, ent, struct trace_mmiotrace_map,		\
 			  TRACE_MMIO_MAP);				\
 		IF_ASSIGN(var, ent, struct trace_boot, TRACE_BOOT);	\
+		IF_ASSIGN(var, ent, struct trace_process, TRACE_PROCESS); \
 		__ftrace_bad_type();					\
 	} while (0)

diff --git a/kernel/trace/trace_process.c b/kernel/trace/trace_process.c
new file mode 100644
index 0000000..10c2c3c
--- /dev/null
+++ b/kernel/trace/trace_process.c
@@ -0,0 +1,440 @@
+/*
+ * utrace-based process event tracing
+ * Copyright (C) 2009 Red Hat Inc.
+ * By Frank Ch. Eigler
+ */
+
+#define DEBUG 1
+
+#include
+#include
+#include
+#include
+
+#include "trace.h"
+
+/* A process must match these filters in order to be traced. */
+static char trace_taskcomm_filter[TASK_COMM_LEN]; /* \0: unrestricted */
+static u32 trace_taskuid_filter = -1; /* -1: unrestricted */
+
+/* A process must be a direct child of given pid in order to be
+   followed. */
+static u32 process_follow_pid; /* 0: unrestricted/systemwide */
+
+/* XXX: lock the above? */
+
+
+/* trace data collection */
+
+static struct trace_array *process_trace_array;
+
+static void process_reset_data(struct trace_array *tr)
+{
+	int cpu;
+
+	pr_debug("in %s\n", __func__);
+	tr->time_start = ftrace_now(tr->cpu);
+	for_each_online_cpu(cpu)
+		tracing_reset(tr, cpu);
+}
+
+static void process_trace_init(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	process_trace_array = tr;
+	if (tr->ctrl) {
+		process_reset_data(tr);
+		enable_process_trace();
+	}
+}
+
+static void process_trace_reset(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	if (tr->ctrl)
+		disable_process_trace();
+	process_reset_data(tr);
+	process_trace_array = NULL;
+}
+
+static void process_trace_ctrl_update(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	if (tr->ctrl) {
+		process_reset_data(tr);
+		enable_process_trace();
+	} else {
+		disable_process_trace();
+	}
+}
+
+static void __trace_processtrace(struct trace_array *tr,
+				 struct trace_array_cpu *data,
+				 struct process_trace_entry *ent)
+{
+	struct ring_buffer_event *event;
+	struct trace_process *entry;
+	unsigned long irq_flags;
+
+	event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+					 &irq_flags);
+	if (!event)
+		return;
+	entry = ring_buffer_event_data(event);
+	tracing_generic_entry_update(&entry->ent, 0, preempt_count());
+	entry->ent.cpu = raw_smp_processor_id();
+	entry->ent.type = TRACE_PROCESS;
+	strlcpy (ent->comm, current->comm, TASK_COMM_LEN);
+	entry->event = *ent;
+	ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+	trace_wake_up();
+}
+
+void process_trace(struct process_trace_entry *ent)
+{
+	struct trace_array *tr = process_trace_array;
+	struct trace_array_cpu *data = tr->data[smp_processor_id()];
+
+	__trace_processtrace(tr, data, ent);
+}
+
+
+/* trace data rendering */
+
+static void process_pipe_open(struct trace_iterator *iter)
+{
+	struct trace_seq *s = &iter->seq;
+	pr_debug("in %s\n", __func__);
+	trace_seq_printf(s, "VERSION 200901\n");
+}
+
+static void process_close(struct trace_iterator *iter)
+{
+	iter->private = NULL;
+}
+
+static ssize_t process_read(struct trace_iterator *iter, struct file *filp,
+			    char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	ssize_t ret;
+	struct trace_seq *s = &iter->seq;
+	ret = trace_seq_to_user(s, ubuf, cnt);
+	return (ret == -EBUSY) ? 0 : ret;
+}
+
+static enum print_line_t process_print(struct trace_iterator *iter)
+{
+	struct trace_entry *entry = iter->ent;
+	struct trace_process *field;
+	struct trace_seq *s = &iter->seq;
+	unsigned long long t = ns2usecs(iter->ts);
+	unsigned long usec_rem = do_div(t, 1000000ULL);
+	unsigned secs = (unsigned long)t;
+	int ret = 1;
+
+	pr_debug("in %s\n", __func__);
+	trace_assign_type(field, entry);
+
+	/* XXX: If print_lat_fmt() were not static, we wouldn't have
+	   to duplicate this. */
+	trace_seq_printf(s, "%16s %5d %3d %9lu.%06ld ",
+			 field->event.comm,
+			 entry->pid, entry->cpu,
+			 secs,
+			 usec_rem);
+
+	switch (field->event.opcode) {
+	case _UTRACE_EVENT_CLONE:
+		ret = trace_seq_printf(s, "fork %d flags 0x%lx\n",
+				       field->event.trace_clone.child,
+				       field->event.trace_clone.flags);
+		break;
+	case _UTRACE_EVENT_EXEC:
+		ret = trace_seq_printf(s, "exec\n");
+		break;
+	case _UTRACE_EVENT_EXIT:
+		ret = trace_seq_printf(s, "exit %ld\n",
+				       field->event.trace_exit.code);
+		break;
+	case _UTRACE_EVENT_SIGNAL:
+		ret = trace_seq_printf(s, "signal %d errno %d code %d\n",
+				       field->event.trace_signal.si_signo,
+				       field->event.trace_signal.si_errno,
+				       field->event.trace_signal.si_code);
+		break;
+	default:
+		ret = trace_seq_printf(s, "process code %d?\n", field->event.opcode);
+		break;
+	}
+	if (ret)
+		return TRACE_TYPE_HANDLED;
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+
+static enum print_line_t process_print_line(struct trace_iterator *iter)
+{
+	switch (iter->ent->type) {
+	case TRACE_PROCESS:
+		return process_print(iter);
+	default:
+		return TRACE_TYPE_HANDLED; /* ignore unknown entries */
+	}
+}
+
+static struct tracer process_tracer __read_mostly =
+{
+	.name = "process",
+	.init = process_trace_init,
+	.reset = process_trace_reset,
+	.pipe_open = process_pipe_open,
+	.close = process_close,
+	.read = process_read,
+	.ctrl_update = process_trace_ctrl_update,
+	.print_line = process_print_line,
+};
+
+
+
+/* utrace backend */
+
+/* Should tracing apply to given task? Compare against filter
+   values. */
+static int trace_test (struct task_struct *tsk)
+{
+	if (trace_taskcomm_filter[0]
+	    && strcmp (trace_taskcomm_filter, tsk->comm))
+		return 0;
+	if (trace_taskuid_filter != (u32)-1
+	    && trace_taskuid_filter != task_uid (tsk))
+		return 0;
+
+	return 1;
+}
+
+
+static struct utrace_engine_ops process_trace_ops __read_mostly;
+
+static void process_trace_tryattach (struct task_struct *tsk)
+{
+	struct utrace_attached_engine *engine;
+
+	pr_debug("in %s\n", __func__);
+	engine = utrace_attach_task (tsk, UTRACE_ATTACH_CREATE,
+				     & process_trace_ops, NULL);
+	if (IS_ERR(engine) || (engine == NULL)) {
+		pr_warning ("utrace_attach_task %d (rc %p)\n",
+			    tsk->pid, engine);
+	} else {
+		int rc;
+
+		/* XXX: Why is this not implicit from the fields
+		   set in the process_trace_ops? */
+		rc = utrace_set_events (tsk, engine,
+					UTRACE_EVENT(CLONE) |
+					UTRACE_EVENT(EXEC) |
+					UTRACE_EVENT(SIGNAL) |
+					UTRACE_EVENT(EXIT));
+		if (rc == -EINPROGRESS)
+			rc = utrace_barrier (tsk, engine);
+		if (rc)
+			pr_warning ("utrace_set_events/barrier rc %d\n", rc);
+
+		utrace_engine_put (engine);
+		pr_debug("attached in %s to %s(%d)\n", __func__, tsk->comm, tsk->pid);
+	}
+}
+
+
+u32 process_trace_report_clone (enum utrace_resume_action action,
+				struct utrace_attached_engine *engine,
+				struct task_struct *parent,
+				unsigned long clone_flags,
+				struct task_struct *child)
+{
+	if (trace_test (parent)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_CLONE;
+		ent.trace_clone.child = child->pid;
+		ent.trace_clone.flags = clone_flags;
+		process_trace(& ent);
+	}
+
+	process_trace_tryattach (child);
+
+	return action;
+}
+
+
+u32 process_trace_report_exec (enum utrace_resume_action action,
+			       struct utrace_attached_engine *engine,
+			       struct task_struct *task,
+			       const struct linux_binfmt *fmt,
+			       const struct linux_binprm *bprm,
+			       struct pt_regs *regs)
+{
+	if (trace_test (task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_EXEC;
+		process_trace(& ent);
+	}
+
+	/* We're already attached; no need for a new tryattach. */
+
+	return action;
+}
+
+
+u32 process_trace_report_signal (u32 action,
+				 struct utrace_attached_engine *engine,
+				 struct task_struct *task,
+				 struct pt_regs *regs,
+				 siginfo_t *info,
+				 const struct k_sigaction *orig_ka,
+				 struct k_sigaction *return_ka)
+{
+	if (trace_test (task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SIGNAL;
+		ent.trace_signal.si_signo = info->si_signo;
+		ent.trace_signal.si_errno = info->si_errno;
+		ent.trace_signal.si_code = info->si_code;
+		process_trace(& ent);
+	}
+
+	/* We're already attached; no need for a new tryattach. */
+
+	return action;
+}
+
+
+u32 process_trace_report_exit (enum utrace_resume_action action,
+			       struct utrace_attached_engine *engine,
+			       struct task_struct *task,
+			       long orig_code, long *code)
+{
+	if (trace_test (task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_EXIT;
+		ent.trace_exit.code = orig_code;
+		process_trace(& ent);
+	}
+
+	/* There is no need to explicitly attach or detach here. */
+
+	return action;
+}
+
+
+void enable_process_trace () {
+	struct task_struct *grp, *tsk;
+
+	pr_debug("in %s\n", __func__);
+	rcu_read_lock();
+	do_each_thread(grp, tsk) {
+		struct mm_struct *mm;
+
+		/* Skip over kernel threads. */
+		mm = get_task_mm (tsk);
+		if (!mm)
+			continue;
+
+		if (process_follow_pid) {
+			if (tsk->tgid == process_follow_pid ||
+			    tsk->parent->tgid == process_follow_pid)
+				process_trace_tryattach (tsk);
+		} else {
+			process_trace_tryattach (tsk);
+		}
+	} while_each_thread(grp, tsk);
+	rcu_read_unlock();
+}
+
+void disable_process_trace () {
+	struct utrace_attached_engine *engine;
+	struct task_struct *grp, *tsk;
+	int rc;
+
+	pr_debug("in %s\n", __func__);
+	rcu_read_lock();
+	do_each_thread(grp, tsk) {
+		if (tsk->pid <= 1)
+			continue;
+
+		/* Find matching engine, if any. Returns -ENOENT for
+		   unattached threads. */
+		engine = utrace_attach_task (tsk, UTRACE_ATTACH_MATCH_OPS,
+					     & process_trace_ops, 0);
+		if (IS_ERR(engine)) {
+			if (PTR_ERR(engine) != -ENOENT)
+				pr_warning ("utrace_attach_task %d (rc %ld)\n",
+					    tsk->pid, -PTR_ERR(engine));
+		} else if (engine == NULL) {
+			pr_warning ("utrace_attach_task %d (null engine)\n",
+				    tsk->pid);
+		} else {
+			/* Found one of our own engines. Detach. */
+			rc = utrace_control (tsk, engine, UTRACE_DETACH);
+			switch (rc) {
+			case 0: /* success */
+				break;
+			case -ESRCH: /* REAP callback already begun */
+			case -EALREADY: /* DEATH callback already begun */
+				break;
+			default:
+				rc = -rc;
+				pr_warning ("utrace_detach %d (rc %d)\n",
+					    tsk->pid, rc);
+				break;
+			}
+			utrace_engine_put(engine);
+			pr_debug("detached in %s from %s(%d)\n", __func__, tsk->comm, tsk->pid);
+		}
+	} while_each_thread(grp, tsk);
+	rcu_read_unlock();
+}
+
+
+static struct utrace_engine_ops process_trace_ops __read_mostly = {
+	.report_clone = process_trace_report_clone,
+	.report_exec = process_trace_report_exec,
+	.report_exit = process_trace_report_exit,
+	.report_signal = process_trace_report_signal,
+};
+
+
+
+/* control interfaces */
+
+static struct debugfs_blob_wrapper trace_taskcomm_filter_blob = {
+	.data = trace_taskcomm_filter,
+	.size = sizeof (trace_taskcomm_filter),
+};
+
+static __init int init_process_trace(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	entry = debugfs_create_blob("process_trace_taskcomm_filter", 0644, d_tracer,
+				    & trace_taskcomm_filter_blob);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_taskcomm_filter' entry\n");
+
+	entry = debugfs_create_u32("process_trace_uid_filter", 0644, d_tracer,
+				   & trace_taskuid_filter);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_uid_filter' entry\n");
+
+	entry = debugfs_create_u32("process_follow_pid", 0644, d_tracer,
+				   & process_follow_pid);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_follow_pid' entry\n");
+
+	return register_tracer(&process_tracer);
+}
+
+device_initcall(init_process_trace);

From fche at redhat.com Wed Jan 28 00:43:32 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 27 Jan 2009 19:43:32 -0500
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090127224303.GB5850@nowhere> (Frederic Weisbecker's message of "Tue, 27 Jan 2009 23:43:05 +0100")
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
Message-ID:

Frederic Weisbecker writes:

> [...]
> Speaking about a global syscall tracer, I made a patch to trace only the syscalls
> with the function-graph-tracer.
> http://lkml.org/lkml/2008/12/30/267 This low-level part can easily
> be used by all tracers that would like to inspect syscalls.
> [...]
> Just a change is needed: Steven requested that the part inside
> syscall_trace_enter become a tracepoint, making it totally shareable
> between tracers and easy to turn on and off.

Alternately, you could just rely on utrace's hooks. They were thought
out more fully with respect to parameter access, manipulation, and
programmatic control befitting even a debugger.

- FChE

From fweisbec at gmail.com Wed Jan 28 13:58:28 2009
From: fweisbec at gmail.com (Frédéric Weisbecker)
Date: Wed, 28 Jan 2009 14:58:28 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To:
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
Message-ID:

2009/1/28 Frank Ch. Eigler :
> Frederic Weisbecker writes:
>
>> [...]
>> Speaking about a global syscall tracer, I made a patch to trace only the syscalls
>> with the function-graph-tracer.
>> http://lkml.org/lkml/2008/12/30/267 This low-level part can easily
>> be used by all tracers that would like to inspect syscalls.
>> [...]
>> Just a change is needed: Steven requested that the part inside
>> syscall_trace_enter become a tracepoint, making it totally shareable
>> between tracers and easy to turn on and off.
>
> Alternately, you could just rely on utrace's hooks. They were thought
> out more fully with respect to parameter access, manipulation, and
> programmatic control befitting even a debugger.
>
>
> - FChE
>

I don't know much it. But I will soon have some time to look at your
patch which uses ftrace from utrace.
Anyway, are there some plans about utrace to be merged? Unless I
couldn't be able to use
it...

From acme at redhat.com Wed Jan 28 14:29:28 2009
From: acme at redhat.com (Arnaldo Carvalho de Melo)
Date: Wed, 28 Jan 2009 12:29:28 -0200
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To:
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
Message-ID: <20090128142928.GF15877@ghostprotocols.net>

On Wed, Jan 28, 2009 at 02:58:28PM +0100, Frédéric Weisbecker wrote:
> 2009/1/28 Frank Ch. Eigler :
> > Frederic Weisbecker writes:
> >
> >> [...]
> >> Speaking about a global syscall tracer, I made a patch to trace only the syscalls
> >> with the function-graph-tracer.
> >> http://lkml.org/lkml/2008/12/30/267 This low-level part can easily
> >> be used by all tracers that would like to inspect syscalls.
> >> [...]
> >> Just a change is needed: Steven requested that the part inside
> >> syscall_trace_enter become a tracepoint, making it totally shareable
> >> between tracers and easy to turn on and off.
> >
> > Alternately, you could just rely on utrace's hooks. They were thought
> > out more fully with respect to parameter access, manipulation, and
> > programmatic control befitting even a debugger.
> >
> >
> > - FChE
> >
>
> I don't know much it. But I will soon have some time to look at your
> patch which uses ftrace from utrace.
> Anyway, are there some plans about utrace to be merged? Unless I
> couldn't be able to use
> it...

Well, one of the reasons for utrace not to be merged, IIRC, was that
there would be no users in-kernel. With Frank's ftrace plugin that is
not true anymore.
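
For anyone who wants to post-process the records that plugin emits, here is a minimal user-space sketch (not part of the posted patch) that splits one line of the process tracer's output into fields. It assumes the column layout printed by process_print() in trace_process.c, i.e. comm, pid, cpu, seconds.microseconds and then an event keyword, and it assumes the comm contains no whitespace. The names pt_kind, pt_record and pt_parse_line are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical types for this sketch; nothing here exists in the patch. */
enum pt_kind { PT_FORK, PT_EXEC, PT_EXIT, PT_SIGNAL, PT_UNKNOWN };

struct pt_record {
	char comm[17];		/* task name, at most 16 characters kept */
	int pid;
	int cpu;
	unsigned long secs;	/* timestamp, whole seconds */
	unsigned long usecs;	/* timestamp, microsecond remainder */
	enum pt_kind kind;
	long arg;		/* child pid, exit code or signal number */
};

/*
 * Parse one output line such as
 *   "zsh 2091 0 2616711.333172 fork 2121 flags 0x1200011".
 * Returns 0 on success, -1 if the line is not an event record
 * (for example the "VERSION 200901" banner).
 * Assumption: comm contains no whitespace.
 */
static int pt_parse_line(const char *line, struct pt_record *r)
{
	char event[16];
	int used = 0;

	memset(r, 0, sizeof(*r));
	if (sscanf(line, "%16s %d %d %lu.%6lu %15s%n",
		   r->comm, &r->pid, &r->cpu, &r->secs, &r->usecs,
		   event, &used) < 6)
		return -1;

	if (strcmp(event, "fork") == 0)
		r->kind = PT_FORK;
	else if (strcmp(event, "exec") == 0)
		r->kind = PT_EXEC;
	else if (strcmp(event, "exit") == 0)
		r->kind = PT_EXIT;
	else if (strcmp(event, "signal") == 0)
		r->kind = PT_SIGNAL;
	else
		r->kind = PT_UNKNOWN;

	/* fork, exit and signal are followed by a numeric argument. */
	if (r->kind == PT_FORK || r->kind == PT_EXIT || r->kind == PT_SIGNAL) {
		if (sscanf(line + used, "%ld", &r->arg) != 1)
			return -1;
	}
	return 0;
}
```

A caller would typically read lines with fgets() and skip those for which pt_parse_line() returns -1; only the first number after the keyword is kept, so the clone flags and the signal's errno/code fields are ignored in this sketch.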

- Arnaldo

From fweisbec at gmail.com Thu Jan 29 14:29:15 2009
From: fweisbec at gmail.com (Frédéric Weisbecker)
Date: Thu, 29 Jan 2009 15:29:15 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129140451.GM24391@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
Message-ID:

2009/1/29 Ingo Molnar :
>
> * Frederic Weisbecker wrote:
>
>> On Tue, Jan 27, 2009 at 11:43:03PM +0100, Frederic Weisbecker wrote:
>> > On Tue, Jan 27, 2009 at 12:08:04PM -0800, Kok, Auke wrote:
>> > >
>> > > This tracer monitors regular file open() syscalls. This is a fast
>> > > and low-overhead alternative to strace, and does not allow or
>> > > require to be attached to every process.
>> > >
>> > > The tracer only logs succesfull calls, as those are the only ones we
>> > > are currently interested in, and we can determine the absolute path
>> > > of these files as we log.
>> > >
>> > > Signed-off-by: Auke Kok
>> >
>> >
>> > Hi Auke,
>> >
>> > Speaking about a global syscall tracer, I made a patch to trace only the syscalls
>> > with the function-graph-tracer.
>> >
>> > http://lkml.org/lkml/2008/12/30/267
>> >
>> > Its approach and purpose is different than a tracer dedicated only to syscalls.
>> > The function graph tracer traces execution graph of the functions and is more about
>> > execution time spent and code flow whereas a syscall tracer can provide more specific
>> > informations about syscalls.
>> >
>> > So both are not overlaping.
>> >
>> > But the low level part of my patch creates a thread flag _TIF_SYSCALL_TRACE which triggers
>>
>> s/_TIF_SYSCALL_TRACE/_TIF_SYSCALL_FTRACE
>
>> > Once we have it, I think a syscall tracer can be fed with new syscalls
>> > events through several patch iterations, starting with the open and
>> > close one :-)
>> >
>> > Are you ok with that?
>> >
>> > Steven, Ingo, do you agree?
>
> yes. We definitely need this on the asm syscall level, to not contaminate
> hundreds of syscalls with tracepoints.
>
> Auke's sys_open() plugin would be a nice prototype for that concept - but
> in generally it would be useful to be able to augment kernel tracer output
> with all syscall events that occur.
>
> The output would be something like a slimmed-down strace, but for the
> whole kernel and not tied to ptrace semantics (which are crippling).
>
> Would you be interested in extending your syscall tracing concept with
> those bits and would you be interested in integrating Auke's plugin into
> that
>
> Ingo


Several people talked me about utrace and gave some examples about it in
this discussion. The Api is very convenient to fetch syscall numbers,
arguments and return values. And the hooks are done in the generic core
code, so it is arch independent.

The only drawback I can see is that it is not yet merged upstream, in
need of in-kernel users. If it only depends on this condition, we could
be these users...

What do you think?

From mingo at elte.hu Thu Jan 29 14:31:20 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Thu, 29 Jan 2009 15:31:20 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To:
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
Message-ID: <20090129143120.GS24391@elte.hu>

* Frédéric Weisbecker wrote:

> 2009/1/29 Ingo Molnar :
> >
> > * Frederic Weisbecker wrote:
> >
> >> On Tue, Jan 27, 2009 at 11:43:03PM +0100, Frederic Weisbecker wrote:
> >> > On Tue, Jan 27, 2009 at 12:08:04PM -0800, Kok, Auke wrote:
> >> > >
> >> > > This tracer monitors regular file open() syscalls. This is a fast
> >> > > and low-overhead alternative to strace, and does not allow or
> >> > > require to be attached to every process.
> >> > >
> >> > > The tracer only logs succesfull calls, as those are the only ones we
> >> > > are currently interested in, and we can determine the absolute path
> >> > > of these files as we log.
> >> > >
> >> > > Signed-off-by: Auke Kok
> >> >
> >> >
> >> > Hi Auke,
> >> >
> >> > Speaking about a global syscall tracer, I made a patch to trace only the syscalls
> >> > with the function-graph-tracer.
> >> >
> >> > http://lkml.org/lkml/2008/12/30/267
> >> >
> >> > Its approach and purpose is different than a tracer dedicated only to syscalls.
> >> > The function graph tracer traces execution graph of the functions and is more about
> >> > execution time spent and code flow whereas a syscall tracer can provide more specific
> >> > informations about syscalls.
> >> >
> >> > So both are not overlaping.
> >> >
> >> > But the low level part of my patch creates a thread flag _TIF_SYSCALL_TRACE which triggers
> >>
> >> s/_TIF_SYSCALL_TRACE/_TIF_SYSCALL_FTRACE
> >
> >> > Once we have it, I think a syscall tracer can be fed with new syscalls
> >> > events through several patch iterations, starting with the open and
> >> > close one :-)
> >> >
> >> > Are you ok with that?
> >> >
> >> > Steven, Ingo, do you agree?
> >
> > yes. We definitely need this on the asm syscall level, to not contaminate
> > hundreds of syscalls with tracepoints.
> >
> > Auke's sys_open() plugin would be a nice prototype for that concept - but
> > in generally it would be useful to be able to augment kernel tracer output
> > with all syscall events that occur.
> >
> > The output would be something like a slimmed-down strace, but for the
> > whole kernel and not tied to ptrace semantics (which are crippling).
> >
> > Would you be interested in extending your syscall tracing concept with
> > those bits and would you be interested in integrating Auke's plugin into
> > that
> >
> > Ingo
>
>
> Several people talked me about utrace and gave some examples about it in
> this discussion. The Api is very convenient to fetch syscall numbers,
> arguments and return values. And the hooks are done in the generic core
> code, so it is arch independent.
>
> The only drawback I can see is that it is not yet merged upstream, in
> need of in-kernel users. If it only depends on this condition, we could
> be these users...
>
> What do you think?

sure - how do the minimal bits/callbacks look like which enable syscall
tracing?

Ingo

From fweisbec at gmail.com Thu Jan 29 14:48:41 2009
From: fweisbec at gmail.com (Frédéric Weisbecker)
Date: Thu, 29 Jan 2009 15:48:41 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To:
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu> <20090129143120.GS24391@elte.hu>
Message-ID:

2009/1/29 Frédéric Weisbecker :
> 2009/1/29 Ingo Molnar :
>>>
>>> Several people talked me about utrace and gave some examples about it in
>>> this discussion. The Api is very convenient to fetch syscall numbers,
>>> arguments and return values. And the hooks are done in the generic core
>>> code, so it is arch independent.
>>>
>>> The only drawback I can see is that it is not yet merged upstream, in
>>> need of in-kernel users. If it only depends on this condition, we could
>>> be these users...
>>>
>>> What do you think?
>>
>> sure - how do the minimal bits/callbacks look like which enable syscall
>> tracing?
>>
>> Ingo
>
>
> There is a very straightforward example provided by Ananth in there:
> http://lkml.org/lkml/2009/1/28/59
>

One other drawback may be the fact that utrace will be traced by the
function tracers... adding some junk on their traces.
But I guess this is just a matter of some patches to make it not traced.

BTW, there is an interesting proof of concept there:
http://lkml.org/lkml/2009/1/27/294

From mingo at elte.hu Thu Jan 29 15:09:34 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Thu, 29 Jan 2009 16:09:34 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To:
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu> <20090129143120.GS24391@elte.hu>
Message-ID: <20090129150934.GF6512@elte.hu>

* Frédéric Weisbecker wrote:

> 2009/1/29 Ingo Molnar :
> >>
> >> Several people talked me about utrace and gave some examples about it in
> >> this discussion. The Api is very convenient to fetch syscall numbers,
> >> arguments and return values. And the hooks are done in the generic core
> >> code, so it is arch independent.
> >>
> >> The only drawback I can see is that it is not yet merged upstream, in
> >> need of in-kernel users. If it only depends on this condition, we could
> >> be these users...
> >>
> >> What do you think?
> >
> > sure - how do the minimal bits/callbacks look like which enable syscall
> > tracing?
> >
> > Ingo
>
>
> There is a very straightforward example provided by Ananth in there:
> http://lkml.org/lkml/2009/1/28/59

I mean, how does the infrastructure patch look like - what code does this
add to the kernel - just to get the syscall tracing bits. Lets get some
progress here - it's clear that tracing syscalls is good, we just need to
do it and look at actual patches.

Ingo

From fweisbec at gmail.com Thu Jan 29 14:40:55 2009
From: fweisbec at gmail.com (Frédéric Weisbecker)
Date: Thu, 29 Jan 2009 15:40:55 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129143120.GS24391@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu> <20090129143120.GS24391@elte.hu>
Message-ID:

2009/1/29 Ingo Molnar :
>>
>> Several people talked me about utrace and gave some examples about it in
>> this discussion. The Api is very convenient to fetch syscall numbers,
>> arguments and return values. And the hooks are done in the generic core
>> code, so it is arch independent.
>>
>> The only drawback I can see is that it is not yet merged upstream, in
>> need of in-kernel users. If it only depends on this condition, we could
>> be these users...
>>
>> What do you think?
>
> sure - how do the minimal bits/callbacks look like which enable syscall
> tracing?
>
> Ingo

There is a very straightforward example provided by Ananth in there:
http://lkml.org/lkml/2009/1/28/59

From fweisbec at gmail.com Thu Jan 29 15:17:54 2009
From: fweisbec at gmail.com (Frédéric Weisbecker)
Date: Thu, 29 Jan 2009 16:17:54 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129150934.GF6512@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu> <20090129143120.GS24391@elte.hu> <20090129150934.GF6512@elte.hu>
Message-ID:

2009/1/29 Ingo Molnar :
>
> * Frédéric Weisbecker wrote:
>
>> 2009/1/29 Ingo Molnar :
>> >>
>> >> Several people talked me about utrace and gave some examples about it in
>> >> this discussion. The Api is very convenient to fetch syscall numbers,
>> >> arguments and return values. And the hooks are done in the generic core
>> >> code, so it is arch independent.
>> >>
>> >> The only drawback I can see is that it is not yet merged upstream, in
>> >> need of in-kernel users. If it only depends on this condition, we could
>> >> be these users...
>> >>
>> >> What do you think?
>> >
>> > sure - how do the minimal bits/callbacks look like which enable syscall
>> > tracing?
>> >
>> > Ingo
>>
>>
>> There is a very straightforward example provided by Ananth in there:
>> http://lkml.org/lkml/2009/1/28/59
>
> I mean, how does the infrastructure patch look like - what code does this
> add to the kernel - just to get the syscall tracing bits. Lets get some
> progress here - it's clear that tracing syscalls is good, we just need to
> do it and look at actual patches.
>
> Ingo
>

The latest snapshot version I've found is here:
http://people.redhat.com/roland/utrace/2.6-current/utrace.patch

This is mostly independent core code and a good number of hooks inside
ptrace. But I don't know much about the overhead it potentially brings
on ptrace.

From fweisbec at gmail.com Thu Jan 29 15:34:46 2009
From: fweisbec at gmail.com (Frédéric Weisbecker)
Date: Thu, 29 Jan 2009 16:34:46 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129150934.GF6512@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu> <20090129143120.GS24391@elte.hu> <20090129150934.GF6512@elte.hu>
Message-ID:

2009/1/29 Ingo Molnar :
>
> * Frédéric Weisbecker wrote:
>
>> 2009/1/29 Ingo Molnar :
>> >>
>> >> Several people talked me about utrace and gave some examples about it in
>> >> this discussion. The Api is very convenient to fetch syscall numbers,
>> >> arguments and return values. And the hooks are done in the generic core
>> >> code, so it is arch independent.
>> >>
>> >> The only drawback I can see is that it is not yet merged upstream, in
>> >> need of in-kernel users. If it only depends on this condition, we could
>> >> be these users...
>> >>
>> >> What do you think?
>> >
>> > sure - how do the minimal bits/callbacks look like which enable syscall
>> > tracing?

I know you are talking about the only necessary bits from utrace to
have the syscalls tracing. But I can't answer you better than would
the utrace people. And actually I'm not sure the utrace bits for
syscall tracing can be isolated from the rest of its core.

Anyway, I will let the utrace guy answer to it :-)

>> There is a very straightforward example provided by Ananth in there:
>> http://lkml.org/lkml/2009/1/28/59
>
> I mean, how does the infrastructure patch look like - what code does this
> add to the kernel - just to get the syscall tracing bits. Lets get some
> progress here - it's clear that tracing syscalls is good, we just need to
> do it and look at actual patches.
>
> Ingo
>

From fche at redhat.com Thu Jan 29 15:53:42 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Thu, 29 Jan 2009 10:53:42 -0500
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To:
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere> <20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu> <20090129143120.GS24391@elte.hu> <20090129150934.GF6512@elte.hu>
Message-ID: <20090129155341.GB20679@redhat.com>

Hi -

On Thu, Jan 29, 2009 at 04:34:46PM +0100, Frédéric Weisbecker wrote:
> 2009/1/29 Ingo Molnar :
> [...]
> >> > sure - how do the minimal bits/callbacks look like which enable syscall
> >> > tracing?

> I know you are talking about the only necessary bits from utrace to
> have the syscalls tracing. But I can't answer you better than would
> the utrace people. And actually I'm not sure the utrace bits for
> syscall tracing can be isolated from the rest of its core.

My understanding is that the parts of utrace that remain out-of-tree
are relatively integrated, and just present the programmatic callback
API to the already merged "tracehook" layer.
- FChE

From ananth at in.ibm.com Thu Jan 29 16:32:34 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Thu, 29 Jan 2009 22:02:34 +0530
Subject: [PATCH] Track live engines and their refcounts
Message-ID: <20090129163234.GA26777@in.ibm.com>

Here is a patch that will track live engines and expose them via
debugfs. This will show if there are stale engines and their refcounts,
also to determine if there are any engine slab leaks.

This is just for debug purposes. Needs tweaking if this needs to be
part of the core patch (ifdefs, etc).

Applies atop the rcu removal patch sent last week:
https://www.redhat.com/archives/utrace-devel/2009-January/msg00075.html

Signed-off-by: Ananth N Mavinakayanahalli
---
 include/linux/utrace.h |    1
 kernel/utrace.c        |   99 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 1 deletion(-)

Index: utrace-20jan/include/linux/utrace.h
===================================================================
--- utrace-20jan.orig/include/linux/utrace.h
+++ utrace-20jan/include/linux/utrace.h
@@ -317,6 +317,7 @@ struct utrace_attached_engine {
 /* private: */
 	struct kref kref;
 	struct list_head entry;
+	struct list_head live;
 
 /* public: */
 	const struct utrace_engine_ops *ops;
Index: utrace-20jan/kernel/utrace.c
===================================================================
--- utrace-20jan.orig/kernel/utrace.c
+++ utrace-20jan/kernel/utrace.c
@@ -21,8 +21,10 @@
 #include
 #include
 #include
+#include
 #include
+#include
 
 /*
  * struct utrace, defined in utrace.h is private to this file. Its
@@ -50,11 +52,16 @@
  * callbacks seen.
  */
 
+static spinlock_t live_lock;
+static struct list_head live_engines;
+
 static struct kmem_cache *utrace_engine_cachep;
 static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */
 
 static int __init utrace_init(void)
 {
+	INIT_LIST_HEAD(&live_engines);
+	spin_lock_init(&live_lock);
 	utrace_engine_cachep = KMEM_CACHE(utrace_attached_engine, SLAB_PANIC);
 	return 0;
 }
@@ -79,6 +86,9 @@ void __utrace_engine_release(struct kref
 	struct utrace_attached_engine *engine =
 		container_of(kref, struct utrace_attached_engine, kref);
 	BUG_ON(!list_empty(&engine->entry));
+	spin_lock(&live_lock);
+	list_del(&engine->live);
+	spin_unlock(&live_lock);
 	kmem_cache_free(utrace_engine_cachep, engine);
 }
 EXPORT_SYMBOL_GPL(__utrace_engine_release);
@@ -322,6 +332,7 @@ restart:
 	engine->flags = 0;
 	engine->ops = ops;
 	engine->data = data;
+	INIT_LIST_HEAD(&engine->live);
 
 	if ((ret == 0) && (list_empty(&utrace->attached))) {
 		/* First time here, set engines up */
@@ -338,8 +349,12 @@ restart:
 			goto restart;
 		}
 		engine = ERR_PTR(ret);
+	} else {
+		/* Debugging... engine leaks */
+		spin_lock(&live_lock);
+		list_add(&engine->live, &live_engines);
+		spin_unlock(&live_lock);
 	}
-
 	return engine;
 }
 EXPORT_SYMBOL_GPL(utrace_attach_task);
@@ -2431,3 +2446,85 @@ void task_utrace_proc_status(struct seq_
 		   utrace->report ? " (report)" : "",
 		   utrace->interrupt ? " (interrupt)" : "");
 }
+
+#ifdef CONFIG_DEBUG_FS
+/* Similar what's in to net/core/sock.c */
+static void *ut_eng_seq_start(struct seq_file *s, loff_t *pos)
+{
+	rcu_read_lock();
+	spin_lock(&live_lock);
+
+	return seq_list_start_head(&live_engines, *pos);
+}
+
+static void *ut_eng_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &live_engines, pos);
+}
+
+static void ut_eng_seq_stop(struct seq_file *s, void *v)
+{
+	spin_unlock(&live_lock);
+	rcu_read_unlock();
+}
+
+void ut_eng_seq_printf(struct seq_file *seq,
+		struct utrace_attached_engine *engine)
+{
+	seq_printf(seq, "%p %d\n",
+		engine, atomic_read(&engine->kref.refcount));
+}
+
+static int ut_eng_seq_show(struct seq_file *seq, void *v)
+{
+	if (v == &live_engines)
+		seq_printf(seq, "engine ref_cnt\n");
+	else
+		ut_eng_seq_printf(seq, list_entry(v,
+					struct utrace_attached_engine,
+					live));
+	return 0;
+}
+
+static const struct seq_operations ut_eng_seq_ops = {
+	.start = ut_eng_seq_start,
+	.next = ut_eng_seq_next,
+	.stop = ut_eng_seq_stop,
+	.show = ut_eng_seq_show
+};
+
+static int utrace_eng_open(struct inode *inode, struct file *filp)
+{
+	return seq_open(filp, &ut_eng_seq_ops);
+}
+
+struct file_operations debugfs_utrace_ops = {
+	.open = utrace_eng_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
+static int debugfs_utrace_init(void)
+{
+	struct dentry *dir, *file;
+
+	dir = debugfs_create_dir("utrace", NULL);
+	if (!dir) {
+		printk(KERN_INFO "Unable to create utrace dir\n");
+		return -ENOMEM;
+	}
+
+	file = debugfs_create_file("engines", 0440, dir, NULL,
+			&debugfs_utrace_ops);
+	if (!file) {
+		printk(KERN_INFO "Unable to create engines file\n");
+		debugfs_remove(dir);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+late_initcall(debugfs_utrace_init);
+
+#endif /* CONFIG_DEBUG_FS */
From renzo at cs.unibo.it Wed Feb 4 11:35:07 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Wed, 4 Feb 2009 12:35:07 +0100
Subject: Utrace and process (partial) virtualization
Message-ID: <20090204113507.GE17452@cs.unibo.it>

Dear Roland and dear utrace developers,

I am already having some problems regarding utrace, and more
specifically the utrace interface for (partial) virtual machines and
(again) the support for utrace engines nesting.

I am writing my point of view here for a general discussion.

This is the summary:
1- Virtual Machines may need to change the system call
2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
   report_syscall_entry?
3- Nesting, is it really useful to run all the reports in a row
   and (eventually) stop at the end waiting for all the engines?
4- report_syscall_entry engines evaluation order should be reversed

----
1- This is the simplest suggestion/request.
Sometimes virtual machine engines need to change the system call
(e.g. the process calls a "creat", the kernel must run "open" instead).
I suggest to add some useful inline functions in
arch/*/include/asm/syscall.h:
	syscall_set_nr	// to set the system call number
	syscall_get_pc	// to get/set the program counter
	syscall_set_pc
	syscall_get_sp	// to get/set the stack pointer
	syscall_set_sp
These inline calls would help to create architecture independent
virtual machine engines.

Now the "hard" part:

2- Which is the scenario of virtual machines based on utrace?
In my mind there are two or three actors.
K- At the lowest layer there is the kernel providing utrace
M- There is a module which uses utrace and virtualizes something.
   M can do all the virtualization at kernel level but maybe it uses also:
U- A userland Virtual Machine Monitor.

So we have K, M and U.
When a virtualized process does a syscall, K calls the
report_syscall_entry function of M. If M is entirely at kernel level
it can decide whether to abort the syscall (setting
UTRACE_SYSCALL_ABORT) or not, but there is no (clean) way to forward
the request to U and wait for U's decision about the syscall.

SYSEMU can be implemented with utrace current interface as it aborts
*all* the syscalls. View-OS cannot use it. In fact km-view is a
userland VM which needs to decide which system calls must be skipped
and which executed.
It is not for View-OS only, whoever tries to implement similar features
will run into the same problem.
Maybe even VMMs entirely implemented in the kernel module need to delay
the decision about the action.

I think UTRACE_STOP has exactly this meaning: in Roland's ptrace
implementation UTRACE_STOP is used in this way. User-mode Linux running
on ptrace does change the registers of the process status while the
process is in STOP state.

I am currently trying to implement a new kmview module using
UTRACE_STOP. When I need to skip the syscall I change the syscall
(orig_ax in x86) number to -1 while the process is stopped.
Utrace believes that the syscall is *not* aborted then it passes
orig_ax (return ret ?: regs->orig_ax; in arch/x86/kernel/ptrace.c)
to the "entry_{32/64}.s" layer, causing the syscall to be skipped.
This is a dirty workaround.

I think that the specific actions (for syscalls, signals) should be
accepted during a utrace_control(..., UTRACE_RESUME).

In this way:
** K calls report_syscall_entry
** M sends the request to U and returns UTRACE_STOP.
   (M can then process requests for many other processes and many
   userland VMM)
** U receives the request, decides syscall abort or execute
** U sends its reply to M
** M calls utrace_control UTRACE_RESUME setting the action flag needed
   (e.g. UTRACE_SYSCALL_ABORT).

The same scenario can apply to userland management of signals, the VMM
or debugger could need to delay the decision among UTRACE_SIGNAL*
cases, and it is hard to keep the monitor inside the report_signal
upcall waiting to return a value. It would need another implementation
of some kind of process stop/quiescence inside the module.

3- Following the KMU schema above, let us now depict a scenario where
there are multiple M engines and multiple U VMMs on the same process.

If I have correctly understood the code, the current implementation
runs all the report upcalls in a row. If some of the report upcalls
return UTRACE_STOP, utrace waits for all the stopped engines to send a
UTRACE_RESUME.
(from utrace.c: If another engine is keeping @target stopped, then it
remains stopped until all engines let it resume.)

All the M engines may try to change the status of the process
concurrently, as each engine thinks the process has been stopped for
its management.

Maybe we have two different ideas of the STOP state and of process
virtualization.
For me a process in STOP state is blocked for inspection. During the
STOP state a module M can change the process status.
With "virtualized process" I mean a process that "sees" an environment
different from that provided by the hosting kernel.
A user-mode linux process is a virtualized process.

In my mind several engines working on a process implement several
layers of virtualization. The first engine provides the process a
modified virtual world. If a second engine gets loaded on the same
process, the first engine provides its modified world to the second
engine which implements a further virtualization for the process and
so on.

In this perspective I think that the useful sequence (for kernel
generated events) is:
	K calls the report upcall of the first engine
	if M returns UTRACE_STOP wait for UTRACE_RESUME from the first engine
	K calls the report upcall of the second engine
	if M returns UTRACE_STOP wait for UTRACE_RESUME from the second engine
and so on.

In this way each engine can safely change the state (based on its
virtual perspective of the world maybe provided by the previous engine)
and notify its action before the next engine starts working.
The next engine "sees" the world as it has been modified by the
previous one.

4- utrace_report_syscall_entry must scan the list of engines in the
reverse order (it is the only event type which is process generated).
From the idea of nested virtualization it follows that the process
request to run a system call must be processed by the outer (latest)
engine first and then down to the inner/first.

Utrace uses "list_for_each_entry_safe" for the list scan.
"list_for_each_entry_safe_reverse" does exist, maybe it can be used.
I haven't tested it yet.

Interested readers may refer also to my previous postings on the same
subject. (July 2008)

-------
Thank you if you have read up to here.
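
The ordering proposed in points 3 and 4 can be sketched as a small
user-space simulation. This is only an illustration under stated
assumptions: every sim_* name, the while_stopped hook and the syscall
numbers are hypothetical, and none of this is the utrace API. It shows
engines being reported one at a time, each finishing its stop before
the next one runs, with syscall entry walking the list from the last
attached engine to the first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real utrace definitions. */
enum sim_action { SIM_RESUME, SIM_STOP };

#define SIM_NR_OPEN  2		/* illustrative numbers only */
#define SIM_NR_CREAT 85

struct sim_task {
	int syscall_nr;		/* what the next engine (or kernel) will see */
	char order[64];		/* engine names, in the order they ran */
};

struct sim_engine {
	const char *name;
	/* report upcall: inspects the task and may ask for a stop */
	enum sim_action (*report)(struct sim_task *t);
	/* runs while the task is stopped for this engine alone */
	void (*while_stopped)(struct sim_task *t);
	int seen_nr;		/* syscall number this engine observed */
};

/* Report to one engine and finish its stop before returning. */
static void sim_report_one(struct sim_engine *e, struct sim_task *t)
{
	size_t room = sizeof(t->order) - strlen(t->order) - 1;

	assert(e != NULL && t != NULL);
	e->seen_nr = t->syscall_nr;
	if (t->order[0] != '\0' && room > 0) {
		strncat(t->order, ",", room);
		room--;
	}
	strncat(t->order, e->name, room);

	/* Stands in for "wait for UTRACE_RESUME from this engine". */
	if (e->report(t) == SIM_STOP && e->while_stopped != NULL)
		e->while_stopped(t);
}

/* Kernel-generated events: first attached engine first. */
static void sim_report_event(struct sim_engine **engs, int n,
			     struct sim_task *t)
{
	for (int i = 0; i < n; i++)
		sim_report_one(engs[i], t);
}

/* Syscall entry is process-generated: last attached (outer) first. */
static void sim_report_syscall_entry(struct sim_engine **engs, int n,
				     struct sim_task *t)
{
	for (int i = n - 1; i >= 0; i--)
		sim_report_one(engs[i], t);
}

static enum sim_action stop_on_creat(struct sim_task *t)
{
	return t->syscall_nr == SIM_NR_CREAT ? SIM_STOP : SIM_RESUME;
}

static void rewrite_to_open(struct sim_task *t)
{
	t->syscall_nr = SIM_NR_OPEN;
}

static enum sim_action never_stop(struct sim_task *t)
{
	(void)t;
	return SIM_RESUME;
}

/* Runs one simulated creat() entry through two nested engines and
 * returns the syscall number the inner engine observed. */
static int sim_demo(char *order_out, size_t len)
{
	struct sim_engine inner = { "inner", never_stop, NULL, -1 };
	struct sim_engine outer = { "outer", stop_on_creat,
				    rewrite_to_open, -1 };
	struct sim_engine *engs[] = { &inner, &outer };	/* attach order */
	struct sim_task t = { SIM_NR_CREAT, "" };

	sim_report_syscall_entry(engs, 2, &t);
	snprintf(order_out, len, "%s", t.order);
	return inner.seen_nr;
}
```

With inner attached first and outer attached last, sim_demo() reports
to outer first; outer stops on the simulated creat and rewrites it, so
inner only ever observes the open number. sim_report_event() is the
forward-order counterpart for kernel-generated events.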
ciao
renzo

From jkenisto at us.ibm.com Thu Feb 5 00:18:56 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 04 Feb 2009 16:18:56 -0800
Subject: instruction-analysis API(s)
Message-ID: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>

Hi, Roland. Back in a conference call in December, we discussed
approaches to refactoring utrace-related code such as uprobes, to
make some of the services provided there more generally available.
In particular, you suggested an "instruction analysis" service that
various subsystems could exploit -- kprobes and uprobes/ubp at first,
and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.

I decided to survey the kernel for subsystems that parse and/or analyze
CPU instructions. I hoped to review the various approaches -- perhaps
finding one that's widely accepted -- and to evaluate possible clients
for our instruction-analysis service.

The results were discouraging, as summarized below. I see no
promise of an architecture-agnostic instruction-analysis API.
Within each architecture, I think the best we could do would be an
(architecture-specific) instruction-parsing API. (And even within
an architecture, different subsystems look at different aspects of
an instruction.)

Srikar Dronamraju and I are exploring two different approaches to an
x86 instruction-parsing service. Since x86 kvm seems to have one of
the most systematic and thorough approaches, Srikar is prototyping a
generalization of kvm's x86_decode_insn() to make it support kprobes,
and eventually uprobes. (Note that kvm does NOT appear to be a good
starting place on powerpc and s390.) Approaching from the minimalist
side, I've implemented an x86 instruction-parsing API with just enough
smarts (so far) to support kprobes and uprobes.

We'd be interested to know whether these efforts are consistent
with what you have in mind.

See more details below.

Jim

Intro
-----
"Instruction analysis" refers to the analysis of a CPU instruction
in the kernel or a user program. Typically, the instruction must
be analyzed so that it can be properly emulated (in the case of
SSOL, by executing the same instruction at a different address),
or so a fault caused by the instruction can be properly handled.
There are other uses as well -- see below.

Possible Clients of an Instruction-Analysis Service
---------------------------------------------------
Where in the kernel is instruction analysis currently used?
- kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
- uprobes (ia64, powerpc, s390, x86)
- hypervisors:
	- kvm (ia64, powerpc, s390, x86)
	- powerpc Cell Beat hypervisor
- floating-point unit emulation (arm, s390, sparc, x86)
- exception handling:
	- page fault (powerpc, x86)
	- illegal instruction (s390)
	- unaligned trap (ia64)
	- vm86 fault (x86)
- disassembly (powerpc, s390)
- powerpc: xmon, code patching (for crash dump?)
- ia64: emulation of brl instruction
- x86: alternative-instruction patching (replacing instructions that are
  inappropriate for the CPU rev), fault injection
- djprobes (not in kernel, not sure of status)

Note: I looked in detail only at the architectures that implement
kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86.
And I note in passing that sh does a lot of instruction analysis --
as does mips -- but I skipped sh for now.

Note: Roland also listed gdb, perfmon, and ftrace as subsystems that
do instruction analysis. I think that oprofile has also been suggested.
- I haven't investigated gdb, but I have no reason to think Roland is
  wrong about it.
- I've looked briefly at the various components of perfmon[2] and
  oprofile, but I don't see any instruction analysis per se; and the
  perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any.
- Similarly, I don't see instruction analysis per se in ftrace.

Prospects/Problems
------------------
What are the prospects for adapting these various subsystems to use
a common instruction-analysis service? Typically, not very good.
Here are some of the problems:
- Different architectures have very different instruction-analysis
  needs.
- Different architectures have very different instruction formats and
  instruction attributes. Consequently, the opportunities for common
  code shared by multiple architectures are few.
- Different subsystems are interested in different instruction
  attributes, and/or classify instructions differently.
- Some subsystems are interested in only certain instructions.
- Some subsystems, such as fault handlers, want to maximize efficiency
  by examining as little of the instruction as possible; while others,
  such as *probes, take a more leisurely approach (e.g., reading enough
  bytes to capture the largest possible instruction, even if that means
  faulting in a page).

From mhiramat at redhat.com Fri Feb 6 20:49:12 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Fri, 06 Feb 2009 15:49:12 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <498CA248.2090708@redhat.com>

Hi Jim,

I'm also interested in the instruction decoder.
If you don't mind, could we share the API specification?
I'd like to port djprobe on it.

Thanks!

Jim Keniston wrote:
> Hi, Roland. Back in a conference call in December, we discussed
> approaches to refactoring utrace-related code such as uprobes, to
> make some of the services provided there more generally available.
> In particular, you suggested an "instruction analysis" service that
> various subsystems could exploit -- kprobes and uprobes/ubp at first,
> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
>
> I decided to survey the kernel for subsystems that parse and/or analyze
> CPU instructions. I hoped to review the various approaches -- perhaps
> finding one that's widely accepted -- and to evaluate possible clients
> for our instruction-analysis service.
>
> The results were discouraging, as summarized below. I see no
> promise of an architecture-agnostic instruction-analysis API.
> Within each architecture, I think the best we could do would be an
> (architecture-specific) instruction-parsing API. (And even within
> an architecture, different subsystems look at different aspects of
> an instruction.)
> [...]
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com

From jkenisto at us.ibm.com Fri Feb 6 23:58:58 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Fri, 06 Feb 2009 15:58:58 -0800
Subject: instruction-analysis API(s)
In-Reply-To: <498CA248.2090708@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
Message-ID: <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>

On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> Hi Jim,
>
> I'm also interested in the instruction decoder.
> If you don't mind, could we share the API specification?
> I'd like to port djprobe on it.

I'm enclosing the little x86 instruction-analysis prototype I hacked
together (insn_x86.*), along with a copy of systemtap's
runtime/uprobes2/uprobes_x86.c, which I modified to use it.

But again, we haven't really settled on an API. For example, my x86
prototype doesn't collect all the info that kvm needs. We're thinking
that adapting some existing code (like kvm in the x86 case) might be
more palatable to LKML.

Jim

>
> Thanks!
>
> Jim Keniston wrote:
> > Hi, Roland. Back in a conference call in December, we discussed
> > approaches to refactoring utrace-related code such as uprobes, to
> > make some of the services provided there more generally available.
> > In particular, you suggested an "instruction analysis" service that
> > various subsystems could exploit -- kprobes and uprobes/ubp at first,
> > and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
> > ...
> > Srikar Dronamraju and I are exploring two different approaches to an
> > x86 instruction-parsing service. Since x86 kvm seems to have one of
> > the most systematic and thorough approaches, Srikar is prototyping a
> > generalization of kvm's x86_decode_insn() to make it support kprobes,
> > and eventually uprobes. (Note that kvm does NOT appear to be a good
> > starting place on powerpc and s390.) Approaching from the minimalist
> > side, I've implemented an x86 instruction-parsing API with just enough
> > smarts (so far) to support kprobes and uprobes.
> >
> > We'd be interested to know whether these efforts are consistent
> > with what you have in mind.
> >
> > See more details below.
> >
> > Jim
...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insn_x86.c
Type: text/x-csrc
Size: 7705 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insn_x86.h
Type: text/x-chdr
Size: 3060 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uprobes_x86.c
Type: text/x-csrc
Size: 19871 bytes
Desc: not available
URL:

From mhiramat at redhat.com Sat Feb 7 00:40:43 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Fri, 06 Feb 2009 19:40:43 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <498CD88B.3070209@redhat.com>

Jim Keniston wrote:
[...]
> Possible Clients of an Instruction-Analysis Service
> ---------------------------------------------------
> Where in the kernel is instruction analysis currently used?

I think we also need to clarify why they need it (what
information/action they require), because it defines the API.

> - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
> - uprobes (ia64, powerpc, s390, x86)
> - djprobes (not in kernel, not sure of status)

They need 'static' analysis of instructions to get below parameters.
 - length
 - attribute (prefixes, etc)
 - type (jump/accumulation/memory access/flag change, etc)

> - hypervisors:
> - kvm (ia64, powerpc, s390, x86)
> - powerpc Cell Beat hypervisor
> - floating-point unit emulation (arm, s390, sparc, x86)

They need 'dynamic' instruction emulation.

> - exception handling:
> - page fault (powerpc, x86)
> - illegal instruction (s390)
> - unaligned trap (ia64)
> - vm86 fault (x86)

Depends on the case, however, some of them just need instruction
type and length, and these should be done very quickly.
So, they need a light-weight and specialized analyzer/emulator.

> - disassembly (powerpc, s390)
> - powerpc: xmon, code patching (for crash dump?)

Maybe, static analysis is enough?

> - ia64: emulation of brl instruction

Dynamic emulation.

> - x86: alternative-instruction patching (replacing instructions that are
> inappropriate for the CPU rev), fault injection

Static analysis.

[...]
> Prospects/Problems
> ------------------
> What are the prospects for adapting these various subsystems to use
> a common instruction-analysis service? Typically, not very good.
> Here are some of the problems:
> - Different architectures have very different instruction-analysis
> needs.

IMHO, we just need two types of interfaces: static analyzer
or dynamic emulator.

> - Different architectures have very different instruction formats and
> instruction attributes. Consequently, the opportunities for common
> code shared by multiple architectures are few.
> - Different subsystems are interested in different instruction
> attributes, and/or classify instructions differently.
> - Some subsystems are interested in only certain instructions.

Indeed. I think we don't need to care all of them at the start point.
Just starting simply and evolving code on upstream is my recommendation.
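
As a rough illustration of the 'static' interface described above
(length, attributes, type), here is a toy decoder. It is a sketch under
loud assumptions: all toy_* names are hypothetical, it assumes 32-bit
x86 code, and it knows only a handful of one-byte opcodes (push reg,
ret, int3, jmp rel8/rel32, call rel32). It is not the insn_x86
prototype mentioned in this thread, nor kvm's decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a toy, not a kernel API. */
enum toy_type { TOY_PUSH, TOY_JUMP, TOY_CALL, TOY_RET, TOY_TRAP };

struct toy_insn {
	size_t length;		/* total bytes, prefixes included */
	unsigned int prefixes;	/* number of legacy prefix bytes */
	enum toy_type type;	/* coarse classification */
};

#define TOY_MAX_INSN 15		/* x86 limit on instruction length */

static int toy_is_prefix(uint8_t b)
{
	switch (b) {
	case 0x66: case 0x67:				/* operand/address size */
	case 0xF0: case 0xF2: case 0xF3:		/* lock, repne, rep */
	case 0x26: case 0x2E: case 0x36: case 0x3E:	/* es, cs, ss, ds */
	case 0x64: case 0x65:				/* fs, gs */
		return 1;
	default:
		return 0;
	}
}

/*
 * Fills *out and returns 0 if the bytes start with an instruction this
 * toy knows; returns -1 (leaving *out untouched) otherwise.
 */
static int toy_decode(const uint8_t *buf, size_t len, struct toy_insn *out)
{
	size_t i = 0, nprefix, imm = 0;
	int opsize_override = 0;
	enum toy_type type;
	uint8_t op;

	assert(buf != NULL && out != NULL);

	/* Attribute pass: count legacy prefixes. */
	while (i < len && toy_is_prefix(buf[i])) {
		if (buf[i] == 0x66)
			opsize_override = 1;
		i++;
	}
	if (i >= len)
		return -1;	/* ran out of bytes before the opcode */
	nprefix = i;

	/* Type pass: classify the opcode and note its immediate size. */
	op = buf[i++];
	if (op >= 0x50 && op <= 0x57) {
		type = TOY_PUSH;		/* push reg */
	} else {
		switch (op) {
		case 0xC3: type = TOY_RET;  break;		/* ret */
		case 0xCC: type = TOY_TRAP; break;		/* int3 */
		case 0xEB: type = TOY_JUMP; imm = 1; break;	/* jmp rel8 */
		case 0xE9: type = TOY_JUMP; imm = 4; break;	/* jmp rel32 */
		case 0xE8: type = TOY_CALL; imm = 4; break;	/* call rel32 */
		default:
			return -1;	/* outside this toy's table */
		}
	}

	/* The 0x66-prefixed rel16 forms are not modelled here. */
	if (imm == 4 && opsize_override)
		return -1;

	/* Length pass: reject truncated or over-long encodings. */
	if (i + imm > len || i + imm > TOY_MAX_INSN)
		return -1;

	out->length = i + imm;
	out->prefixes = (unsigned int)nprefix;
	out->type = type;
	return 0;
}
```

A real analyzer would be table-driven and cover ModRM/SIB forms; the
point here is only the shape of the result that *probes-style clients
consume, as opposed to the step-by-step state an emulator would need.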
> - Some subsystems, such as fault handlers, want to maximize efficiency
> by examining as little of the instruction as possible; while others,
> such as *probes, take a more leisurely approach (e.g., reading enough
> bytes to capture the largest possible instruction, even if that means
> faulting in a page).

Indeed. I think those efficiency-required subsystems are so arch-dependent
that we can (just) share instruction bitmaps or provide special interfaces.

Thank you for your work!

--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhiramat at redhat.com

From renzo at cs.unibo.it Sat Feb 7 11:07:10 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Sat, 7 Feb 2009 12:07:10 +0100
Subject: utrace@FOSDEM
Message-ID: <20090207110710.GA11184@cs.unibo.it>

I am at FOSDEM in Brussels. (I'll give a talk tomorrow 11:00, not directly
related to utrace).
If there are other utrace developers here around we can meet in person
for some brainstorming....

renzo

From roland at redhat.com Mon Feb 9 07:22:18 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 8 Feb 2009 23:22:18 -0800 (PST)
Subject: proof-of-concept, utrace->ftrace engine
In-Reply-To: Frank Ch.
Eigler's message of Tuesday, 27 January 2009 14:54:26 -0500 <20090127195425.GF32568@redhat.com> References: <20090127195425.GF32568@redhat.com> Message-ID: <20090209072218.214FBFC35D@magilla.sf.frob.com> > Here's the start of a little ditty that ties process-related events as > hooked by the Roland McGrath's utrace code into the ftrace > buffer/control widgetry. If nothing else, think of it as one > potential in-tree user of utrace. Cool! I won't comment on the use of the tracer or its interface code, I'll leave that to others. (It's simplistic and kludgey, but I know that's what it's going for.) I'll just review your use of the utrace API. > +/* Should tracing apply to given task? Compare against filter > + values. */ > +static int trace_test (struct task_struct *tsk) > +{ > + if (trace_taskcomm_filter[0] > + && strcmp (trace_taskcomm_filter, tsk->comm)) > + return 0; Note that this is the most simple-minded approach for this. The ->comm value only changes at exec. So the "normal", slightly more sophisticated, way to approach this would be to check at attach time if ->comm matches. If so, enable full tracing. If not, enable only EXEC and CLONE events. In your report_exec callback, check ->comm to see if the task now should be filtered in or now should be filtered out, and call utrace_set_events with more or fewer bits set accordingly. You always need the report_clone callback to attach the new child so you can see when it execs; give the new child the thin or fat event mask as its parent has. This way, you don't go off the fast paths in signals, etc. when you are never going to care about those events. For a trivial hack like this one, you might not care. But for more serious use, you want to bother doing it the fancier way. If you added syscall tracing support, you probably would care about the overhead of enabling that on all the uninteresting tasks. 
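The "thin or fat event mask" idea described above can be sketched in plain C. This is only a user-space model under stated assumptions: the EV_* bits, THIN_MASK, FAT_MASK and the three helper names are hypothetical stand-ins, not utrace definitions. In a real engine the result would presumably be handed to utrace_set_events at attach time and again from the report_exec callback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the UTRACE_EVENT(...) bits an engine might use. */
enum {
	EV_EXEC   = 1u << 0,
	EV_CLONE  = 1u << 1,
	EV_SIGNAL = 1u << 2,
	EV_EXIT   = 1u << 3,
};

/* Thin mask: just enough to notice a ->comm change and to follow children. */
#define THIN_MASK (EV_EXEC | EV_CLONE)
/* Fat mask: everything the tracer reports for tasks that pass the filter. */
#define FAT_MASK  (THIN_MASK | EV_SIGNAL | EV_EXIT)

/* Nonzero when the task passes the comm filter; an empty filter matches all. */
int comm_matches(const char *filter, const char *comm)
{
	assert(filter != NULL && comm != NULL);
	return filter[0] == '\0' || strcmp(filter, comm) == 0;
}

/* Mask to request at attach time and again after every exec. */
unsigned int wanted_events(const char *filter, const char *comm)
{
	return comm_matches(filter, comm) ? FAT_MASK : THIN_MASK;
}

/* A freshly cloned child starts with the same mask as its parent. */
unsigned int child_events(unsigned int parent_mask)
{
	return parent_mask;
}
```

With this split, an uninteresting task only pays for exec and clone reports until an exec makes its ->comm match the filter.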
> + if (trace_taskuid_filter != (u32)-1 > + && trace_taskuid_filter != task_uid (tsk)) > + return 0; We don't have a utrace event for uid changes, so this one you do have to do "eagerly". (Some day in the future, we might well have an event for this so it can be treated intelligently on transitions as with exec as the "->comm change event".) > +static struct utrace_engine_ops process_trace_ops __read_mostly; This is normally const. utrace never touches it (all const pointers). You could change it yourself, but that would not be a normal way to do things. > + engine = utrace_attach_task (tsk, UTRACE_ATTACH_CREATE, > + & process_trace_ops, NULL); Given how you use UTRACE_ATTACH_MATCH_OPS to effect detach, you might want to use UTRACE_ATTACH_MATCH_OPS|UTRACE_ATTACH_EXCLUSIVE here. It's probably impossible to have another call than yours with the same ops pointer, but if not then it probably indicates that your later detach could well foul something up. > + /* XXX: Why is this not implicit from the fields > + set in the process_trace_ops? */ > + rc = utrace_set_events (tsk, engine, The same reason FWRITE on a struct file is not implicit from having a .write field set in your struct file_operations. Your ops struct says statically what your code is written to handle. An engine's event mask says what callbacks you want from that specific thread to that specific engine at the moment. > + UTRACE_EVENT(SIGNAL) | Note this means (exactly as documented): _UTRACE_EVENT_SIGNAL, /* Signal delivery will run a user handler. */ You might have had UTRACE_EVENT_SIGNAL_ALL in mind. That is the union of the five different kinds of SIGNAL* event. > +u32 process_trace_report_clone (enum utrace_resume_action action, [...] > + return action; > +} This is wrong. If you have nothing special you want to do (just observing, not perturbing), then "return UTRACE_RESUME;" is what you say. 
In report_signal, the non-utrace_resume_action part of the return value matters, so: return UTRACE_RESUME | utrace_signal_action(action); is what doesn't change anything there. As documented under 'struct utrace_engine_ops', the action argument is what other engines are causing to be done independent of what your engine does. The utrace_resume_action part of the return value is what *your engine* wants done, independent of what other engines say. Your choices might be informed by what other engines are doing in some cases, but it is not right to mimic what they said. If some other engine said UTRACE_STOP, then now you say UTRACE_STOP, but you'll never call utrace_control to resume, and the thread will be stopped forever. If he says UTRACE_STOP and you don't care, you say UTRACE_RESUME, and the thread stops (UTRACE_STOP < UTRACE_RESUME). When he calls utrace_control in the future, the thread resumes because there is no engine left whose last command was UTRACE_STOP. The non-utrace_resume_action part of the return value (only nonempty for SIGNAL* and SYSCALL* events) is different. Unlike utrace_resume_action, the different choices of different engines can't be combined into a "least common denominator". The choice of utrace_signal_action or utrace_syscall_action setting is what the user-visible disposition resolving the event will be; all the choices are mutually exclusive and their effects final. The last callback to run chooses the final answer. So each callback has to decide something. It gets the incoming choice in its action argument, either from the preceding callback or from the original normal default (what prevails in the absence of tracing). The idiom above passes through the incoming value to leave that choice alone. > + /* Skip over kernel threads. */ > + mm = get_task_mm (tsk); > + if (!mm) > + continue; This should just check PF_KTHREAD. (As it is, you leak an mm ref here.) 
Or just don't bother and handle utrace_attach returning ERR_PTR(-EPERM),
which it will for a kernel thread.

Thanks,
Roland

From mhiramat at redhat.com Mon Feb 9 23:05:56 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Mon, 09 Feb 2009 18:05:56 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <4990B6D4.2020907@redhat.com>

Jim Keniston wrote:
> On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
>> Hi Jim,
>>
>> I'm also interested in the instruction decoder.
>> If you don't mind, could we share the API specification?
>> I'd like to port djprobe on it.
> > I'm enclosing the little x86 instruction-analysis protoype I hacked > together (insn_x86.*), along with a copy of systemtap's > runtime/uprobes2/uprobes_x86.c, which I modified to use it. Hmm, actually, djprobe needs both of the length and the type of instructions, since it has to know how many bytes must be copied and be replaced by a long jump. > But again, we haven't really settled on an API. For example, my x86 > prototype doesn't collect all the info that kvm needs. We're thinking > that adapting some existing code (like kvm in the x86 case) might be > more palatable to LKML. Sure, since kvm and emulators have to fetch the values of src/dst for the emulation, they need actual register values. On the other hand, the disasm/*probe have to analysis code before hitting, so they don't know the actual value of the registers. So, I think we should split x86_decode_insn() into 2 parts, static analysis and emulation preparation. For example: 1) analyzing code statically (x86_analyze_insn) - just decoding an instruction - this phase may consist of several sub-functions. 2) preparing emulation (x86_evaluate_insn) - evaluating src/dst based on current(vcpu) registers 3) executing emulation (x86_emulate_insn) - emulating an analyzed instruction Thanks, > > Jim > >> Thanks! >> >> Jim Keniston wrote: >>> Hi, Roland. Back in a conference call in December, we discussed >>> approaches to refactoring utrace-related code such as uprobes, to >>> make some of the services provided there more generally available. >>> In particular, you suggested an "instruction analysis" service that >>> various subsystems could exploit -- kprobes and uprobes/ubp at first, >>> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes. >>> > ... >>> Srikar Dronamraju and I are exploring two different approaches to an >>> x86 instruction-parsing service. 
Since x86 kvm seems to have one of >>> the most systematic and thorough approaches, Srikar is prototyping a >>> generalization of kvm's x86_decode_insn() to make it support kprobes, >>> and eventually uprobes. (Note that kvm does NOT appear to be a good >>> starting place on powerpc and s390.) Approaching from the minimalist >>> side, I've implemented an x86 instruction-parsing API with just enough >>> smarts (so far) to support kprobes and uprobes. >>> >>> We'd be interested to know whether these efforts are consistent >>> with what you have in mind. >>> >>> See more details below. >>> >>> Jim > ... > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat at redhat.com From ananth at in.ibm.com Tue Feb 10 04:42:30 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 10 Feb 2009 10:12:30 +0530 Subject: instruction-analysis API(s) In-Reply-To: <4990B6D4.2020907@redhat.com> References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> Message-ID: <20090210044230.GB12811@in.ibm.com> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote: > Jim Keniston wrote: > > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote: > >> Hi Jim, > >> > >> I'm also interested in the instruction decoder. > >> If you don't mind, could we share the API specification? > >> I'd like to port djprobe on it. > > > > I'm enclosing the little x86 instruction-analysis protoype I hacked > > together (insn_x86.*), along with a copy of systemtap's > > runtime/uprobes2/uprobes_x86.c, which I modified to use it. > > Hmm, actually, djprobe needs both of the length and the type of > instructions, since it has to know how many bytes must be copied > and be replaced by a long jump. > > > But again, we haven't really settled on an API. 
For example, my x86
> > prototype doesn't collect all the info that kvm needs. We're thinking
> > that adapting some existing code (like kvm in the x86 case) might be
> > more palatable to LKML.
>
> Sure, since kvm and emulators have to fetch the values of src/dst
> for the emulation, they need actual register values. On the other hand,
> the disasm/*probe have to analysis code before hitting, so they
> don't know the actual value of the registers.
>
> So, I think we should split x86_decode_insn() into 2 parts, static
> analysis and emulation preparation.
>
> For example:
> 1) analyzing code statically (x86_analyze_insn)
> - just decoding an instruction
> - this phase may consist of several sub-functions.
>
> 2) preparing emulation (x86_evaluate_insn)
> - evaluating src/dst based on current(vcpu) registers
>
> 3) executing emulation (x86_emulate_insn)
> - emulating an analyzed instruction

Right, that surely sounds like the way to go. However, we've been
cautioned that the instruction emulation area of the kvm code is very
performance sensitive. But, there is no harm in prototyping the above
and then worrying about any optimizations so there isn't a performance
issue -- in any case, I guess [ku]probes are very infrequent users of
this compared to KVM.

Ananth

From renzo at cs.unibo.it Wed Feb 11 09:59:46 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Wed, 11 Feb 2009 10:59:46 +0100
Subject: UTRACE_STOP race condition?
Message-ID: <20090211095946.GA2597@cs.unibo.it>

Dear Roland and dear utrace developers,

please help me. Either I have not understood the meaning of UTRACE_STOP
or it is completely useless due to a race condition.

There are always two entities in a utrace interaction: the traced process
and the tracing module. When a traced event occurs in the traced process
the correspondent report function gets called in the module.

If the report function returns UTRACE_STOP the traced process stays in a
quiescent state and the module wakes it up by a
utrace_control(...,UTRACE_RESUME) call *later*.

This *later* is the problem.
If the module wakes the traced process too quickly, utrace has not yet put
it into a "stopped" state, therefore UTRACE_RESUME gets lost.
As a consequence, the execution is blocked. IMHO, given the current utrace code, there is no way to set up some kind of synchronization in the module to prevent this error. ------- For the sake of simplicity let us assume one engine attached to the traced process (the problem is the same for more engines). The point is: when a report function returns UTRACE_STOP and later calls utrace_control(...,UTRACE_RESUME) the traced process must not stop t=0: Before the report function calling loop utrace->stopped=0; (In start_report: BUG_ON(utrace->stopped);) t=1: REPORT FUNCTION CALL(no lock!): t=2: When the report function returns UTRACE_STOP In finish_callback: t=3: spin_lock(&utrace->lock); mark_engine_wants_stop(engine); spin_unlock(&utrace->lock); t=4: in utrace_stop(..): spin_lock(&utrace->lock); utrace->stopped=1; __set_current_state(TASK_TRACED); spin_unlock(&utrace->lock); schedule(); --> now the traced process is blocked. The module has "decided" UTRACE_STOP at t=1, then the module can call utrace_control(...,UTRACE_RESUME) at any t>1. If the resume call takes place before t=4 the request is lost and the race condition causes the traced process to stop anyway. In fact for 1stopped; ... and therefore it does nothing. /* * Let the thread resume running. If it's not stopped now, * there is nothing more we need to do. */ if (resume) utrace_reset(target, utrace, NULL); else spin_unlock(&utrace->lock); ----- There are two solutions: 1- (slow & dirty): some sort of synchronization: no ptrace_control (or ptrace_set_events) should take place during all the sequence including from the report function call to the utrace->stopped=1. 2- (the nice one): add another flag named ENGINE_RESUME (like ENGINE_STOP). 
that flag must be cleared before calling the report function:

t=0.5:	clear_engine_wants_resume(engine);

utrace_control(...,UTRACE_RESUME) should set the flag:

	spin_lock(&utrace->lock);
	mark_engine_wants_resume(engine);
	spin_unlock(&utrace->lock);

utrace_stop at t=4 (inside the lock) must check if the traced process
has been already resumed.

	spin_lock(&utrace->lock);
	spin_lock_irq(&task->sighand->siglock);
	/* final check: is really needed to stop? */
	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
			if (engine_wants_resume(engine))
				clear_engine_wants_stop(engine);
			else
				utrace->stopped = 1;
		}
	}
	if (unlikely(!utrace->stopped)) {
		spin_unlock_irq(&task->sighand->siglock);
		spin_unlock(&utrace->lock);
		return false;
	}

In this way the race condition should be eliminated.
(it was eliminated in my proof-of-concept utrace patched implementation)
If utrace_stop discovers that a resume request is already pending the
traced process is not blocked.

-----

Ptrace on utrace works because there is a workaround: the notification to
the ptracer is called from within the utrace_stop function *after
utrace->stopped has been set*. Ptrace would suffer from the same race
condition otherwise.

I am looking forward to hearing some comments on this.
From what I see, Kmview cannot be implemented on the current
utrace implementation.

renzo

From fche at redhat.com Wed Feb 11 14:45:15 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Wed, 11 Feb 2009 09:45:15 -0500
Subject: UTRACE_STOP race condition?
In-Reply-To: <20090211095946.GA2597@cs.unibo.it> (Renzo Davoli's message of "Wed, 11 Feb 2009 10:59:46 +0100")
References: <20090211095946.GA2597@cs.unibo.it>
Message-ID:

Renzo Davoli writes:

> [...]
> If the report function returns UTRACE_STOP the traced process stays in a
> quiescent state and the module wakes it up by a
> utrace_control(...,UTRACE_RESUME) call *later*.
> [...]
> If the module wakes the traced process too quickly, utrace has not yet put
> it into a "stopped" state, therefore UTRACE_RESUME gets lost.
> [...]
> The module has "decided" UTRACE_STOP at t=1, then the module can call
> utrace_control(...,UTRACE_RESUME) at any t>1. [...]

This may not answer your question, but I believe it is not proper
to make this call at any time t>1, only once you receive the quiesce
callback.

- FChE

From renzo at cs.unibo.it Wed Feb 11 17:02:15 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Wed, 11 Feb 2009 18:02:15 +0100
Subject: UTRACE_STOP race condition?
In-Reply-To:
References: <20090211095946.GA2597@cs.unibo.it>
Message-ID: <20090211170215.GA23914@cs.unibo.it>

On Wed, Feb 11, 2009 at 09:45:15AM -0500, Frank Ch. Eigler wrote:
> This may not answer your question, but I believe it is not proper to
> to make this call at any time t>1, only once you receive the quiesce
> callback.

Maybe I am wrong but the quiesce callback gets called *before* the other
report_* (say syscall_entry).
So when I capture UTRACE_QUIESCE, I got the report call before t=1.

Some communication from utrace to the module should happen *after*
utrace->stopped is set to 1 (something similar to the code Roland added
for ptrace).

----

Even if it worked this way (i.e.
return STOP and wait for report_quiesce, I think the race condition
there is in any case) the interface to the module would be horrible.
When the module receives a report callback, it returns UTRACE_STOP and then
it needs to use some data structure to wait for a report_quiesce to
restart the traced process.
With the idea of patch included in my previous mail there is no need of
such a complexity.

Thank you for taking part to this discussion

renzo

From renzo at cs.unibo.it Fri Feb 13 20:29:25 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Fri, 13 Feb 2009 21:29:25 +0100
Subject: [PATCH] UTRACE_STOP race condition?
In-Reply-To: <20090211095946.GA2597@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it>
Message-ID: <20090213202925.GE28685@cs.unibo.it>

Dear Roland, dear utrace developers,

I have now a complete patch that seems to be quite stable.
At least Kmview have passed through the tests without getting stuck
randomly for the race condition.
All the other comments about utrace&virtualization (see my message of
Feb 04) are still pending:
1- Virtual Machines may need to change the system call
2- UTRACE_SYSCALL_ABORT: is it really useful as a return value
for report_syscall_entry?
3- Nesting, is it really useful to run all the reports in a row and
(eventually) stop at the end waiting for all the engines?
4- report_syscall_entry engines evaluation order should be reversed

ciao
renzo

----
--- linux-2.6.29-rc4-utrace/kernel/utrace.c.mcgrath	2009-02-13 18:28:25.000000000 +0100
+++ linux-2.6.29-rc4-utrace/kernel/utrace.c	2009-02-13 19:14:18.000000000 +0100
@@ -491,6 +491,13 @@
 #define DEAD_FLAGS_MASK	(UTRACE_EVENT(REAP))
 #define LIVE_FLAGS_MASK	(~0UL)

+static void mark_engine_wants_stop(struct utrace_attached_engine *engine);
+static void clear_engine_wants_stop(struct utrace_attached_engine *engine);
+static bool engine_wants_stop(struct utrace_attached_engine *engine);
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine);
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine);
+static bool engine_wants_resume(struct utrace_attached_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -500,6 +507,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_attached_engine *engine, *next;

 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -521,6 +529,23 @@
 		return true;
 	}

+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	utrace->stopped = 1;
 	__set_current_state(TASK_TRACED);

@@ -784,6 +809,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))

 static void mark_engine_wants_stop(struct utrace_attached_engine *engine)
 {
@@ -800,6 +826,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }

+static void mark_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -1050,6 +1091,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1282,6 +1327,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;
Message-ID: <9855879333.20090214080616@dcinml.mj.pt> IImprove your love life with generic Viagra http://bellqehasy.by.ru/index.html These half dozen, and the rest would be along individual or thousands, he talks with superb i know, said brook. I'll ask. He's sure to remember. That it wasn't the money so much it was the feeling smelling of newly cut grass and flowers. Trees. -------------- next part -------------- An HTML attachment was scrubbed... URL: From renzo at cs.unibo.it Sat Feb 14 09:11:55 2009 From: renzo at cs.unibo.it (Renzo Davoli) Date: Sat, 14 Feb 2009 10:11:55 +0100 Subject: [PATCH] #2 UTRACE_STOP race condition & nesting In-Reply-To: <20090213202925.GE28685@cs.unibo.it> References: <20090211095946.GA2597@cs.unibo.it> <20090213202925.GE28685@cs.unibo.it> Message-ID: <20090214091155.GA3582@cs.unibo.it> Dear Roland, dear utrace developers, This is an updated patch. It solves the race condition + it gives a quick (a bit dirty) solution to issues 3&4. 3- Nesting, is it really useful to run all the reports in a row and (eventually) stop and the end waiting for all the engines? The patch waits for each engine to resume before notifying the next registered engine. 4- report_syscall_entry engines evaluation order should be reversed REPORT macros have an extra "reverse" argument. The macros append this string to the list_for_each_entry_safe function name. All the macro calls skip this argument except the one in report_syscall_entry where it is set to _reverse. With this patch it is possible to run nested kmview machines and ptrace works inside the virtual machines. This patch is "a bit dirty" because variables and sections of code needed to count and test the stopped engines are useless here: a task can be kept stopped for at most one engine at a time. This patch is a proof-of concept to show what I meant in my previous message. 
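
The "reverse" argument described under issue 4 relies on preprocessor token pasting with an argument that may be empty. The following is a minimal user-space sketch of that technique, not code from the patch: for_each_slot, for_each_slot_reverse, COPY_SLOTS, copy_forward and copy_backward are hypothetical names that stand in for list_for_each_entry_safe, list_for_each_entry_safe_reverse and the REPORT macros.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two kernel list iterators:
 * walk the indexes 0..n-1 forward, or n-1..0 for the _reverse one. */
#define for_each_slot(i, n)         for ((i) = 0; (i) < (n); (i)++)
#define for_each_slot_reverse(i, n) for ((i) = (n) - 1; (i) >= 0; (i)--)

/* `reverse` is pasted onto the iterator name. An empty argument leaves
 * the name unchanged; _reverse selects the backward iterator. */
#define COPY_SLOTS(reverse, src, n, dst)		\
	do {						\
		int i_, k_ = 0;				\
		for_each_slot ## reverse(i_, n)		\
			(dst)[k_++] = (src)[i_];	\
	} while (0)

/* Copy src to dst in visiting order, first slot first. */
void copy_forward(const int *src, int n, int *dst)
{
	assert(n >= 0);
	COPY_SLOTS(, src, n, dst);
}

/* Copy src to dst in visiting order, last slot first. */
void copy_backward(const int *src, int n, int *dst)
{
	assert(n >= 0);
	COPY_SLOTS(_reverse, src, n, dst);
}
```

Calling copy_forward() on {1, 2, 3} visits the slots in order, while copy_backward() visits them last to first, which is the effect the patch wants for the engines in report_syscall_entry. Empty macro arguments need C99 or GNU C.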

As for 1 & 2 (not included in this patch):

1- Virtual Machines may need to change the system call
This is just to simplify the implementation of arch-independent virtual
machines. I have kept the definitions of the missing functions in the
kmview module code.

2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
report_syscall_entry?
It is useless for kmview, as the decision to abort the system call is
taken while the process is stopped. I am currently setting the syscall
number to -1 to skip the syscall.

For the sake of completeness, there is another way to implement the partial
virtual machine stuff: introducing another "quiescence" state inside the
report upcalls.
I mean: when utrace calls a report function (say, for example,
report_syscall_entry), the function in the module puts the process in a
stopped state (maybe it sets TASK_TRACED and calls schedule()).
From utrace's point of view the report function does not return until all
the changes in the task state have been completed and the decision
UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken.
In this way UTRACE_STOP is never used, because the module has to implement
another feature similar to UTRACE_STOP on its own.
So what is UTRACE_STOP for?
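
For readers following the race-condition part of this thread, the stop/resume bookkeeping can be modelled outside the kernel. The sketch below is only an illustration under stated assumptions: the flag values are made up (the patch derives them from _UTRACE_NEVENTS), struct engine and must_stop() are hypothetical, and the locking and the utrace_detached_ops test done by the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this model. */
#define ENGINE_STOP   (1UL << 0)
#define ENGINE_RESUME (1UL << 1)

/* Hypothetical reduced engine: only the flags word matters here. */
struct engine {
	unsigned long flags;
};

/*
 * Model of the "final check" made just before the task blocks.
 * A resume request recorded after a stop request cancels that stop.
 * Returns true if at least one engine still wants the task stopped.
 */
bool must_stop(struct engine *engines, int n)
{
	bool stopped = false;
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		struct engine *e = &engines[i];

		if (!(e->flags & ENGINE_STOP))
			continue;
		if (e->flags & ENGINE_RESUME)
			e->flags &= ~(ENGINE_STOP | ENGINE_RESUME);
		else
			stopped = true;
	}
	return stopped;
}
```

With one engine that has both bits set, must_stop() clears them and reports that no stop is needed, which corresponds to the early "return false" path in the patched utrace_stop(). If any other engine has only ENGINE_STOP set, the task still has to block.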

ciao
renzo
----
--- linux-2.6.29-rc4-utrace/kernel/utrace.c.mcgrath	2009-02-13 18:28:25.000000000 +0100
+++ linux-2.6.29-rc4-utrace/kernel/utrace.c	2009-02-14 09:17:31.000000000 +0100
@@ -491,6 +491,13 @@
 #define DEAD_FLAGS_MASK	(UTRACE_EVENT(REAP))
 #define LIVE_FLAGS_MASK	(~0UL)
 
+static void mark_engine_wants_stop(struct utrace_attached_engine *engine);
+static void clear_engine_wants_stop(struct utrace_attached_engine *engine);
+static bool engine_wants_stop(struct utrace_attached_engine *engine);
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine);
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine);
+static bool engine_wants_resume(struct utrace_attached_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -500,6 +507,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_attached_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -521,6 +529,23 @@
 		return true;
 	}
 
+	/* final check: is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	utrace->stopped = 1;
 	__set_current_state(TASK_TRACED);
 
@@ -784,6 +809,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_attached_engine *engine)
 {
@@ -800,6 +826,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -1050,6 +1091,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1282,6 +1327,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;
@@ -1497,6 +1543,7 @@
 static bool finish_callback(struct utrace *utrace,
 			    struct utrace_report *report,
 			    struct utrace_attached_engine *engine,
+			    struct task_struct *task,
 			    u32 ret)
 {
 	enum utrace_resume_action action = utrace_resume_action(ret);
@@ -1529,6 +1576,7 @@
 			spin_lock(&utrace->lock);
 			mark_engine_wants_stop(engine);
 			spin_unlock(&utrace->lock);
+			utrace_stop(task, utrace);
 		}
 	} else if (engine_wants_stop(engine)) {
 		spin_lock(&utrace->lock);
@@ -1567,7 +1615,7 @@
 	ops = engine->ops;
 
 	if (want & UTRACE_EVENT(QUIESCE)) {
-		if (finish_callback(utrace, report, engine,
+		if (finish_callback(utrace, report, engine, task,
 				    (*ops->report_quiesce)(report->action,
 							   engine, task,
 							   event)))
@@ -1596,25 +1644,25 @@
  * @callback is the name of the member in the ops vector, and remaining
  * args are the extras it takes after the standard three args.
  */
-#define REPORT(task, utrace, report, event, callback, ...) \
+#define REPORT(reverse, task, utrace, report, event, callback, ...) \
 	do { \
 		start_report(utrace); \
-		REPORT_CALLBACKS(task, utrace, report, event, callback, \
+		REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, \
 				 (report)->action, engine, current, \
 				 ## __VA_ARGS__); \
 		finish_report(report, task, utrace); \
 	} while (0)
-#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...) \
+#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...) \
 	do { \
 		struct utrace_attached_engine *engine, *next; \
 		const struct utrace_engine_ops *ops; \
-		list_for_each_entry_safe(engine, next, \
+		list_for_each_entry_safe ## reverse(engine, next, \
 					 &utrace->attached, entry) { \
 			ops = start_callback(utrace, report, engine, task, \
 					     event); \
 			if (!ops) \
 				continue; \
-			finish_callback(utrace, report, engine, \
+			finish_callback(utrace, report, engine, task, \
 					(*ops->callback)(__VA_ARGS__)); \
 		} \
 	} while (0)
@@ -1629,7 +1677,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(EXEC),
 	       report_exec, fmt, bprm, regs);
 }
 
@@ -1644,7 +1692,7 @@
 	INIT_REPORT(report);
 
 	start_report(utrace);
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+	REPORT_CALLBACKS(_reverse,task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
 			 report_syscall_entry, report.result | report.action,
 			 engine, current, regs);
 	finish_report(&report, task, utrace);
@@ -1686,7 +1734,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
 	       report_syscall_exit, regs);
 }
 
@@ -1711,7 +1759,7 @@
 	start_report(utrace);
 	utrace->u.live.cloning = child;
 
-	REPORT_CALLBACKS(task, utrace, &report,
+	REPORT_CALLBACKS(,task, utrace, &report,
 			 UTRACE_EVENT(CLONE), report_clone,
 			 report.action, engine, task, clone_flags, child);
 
@@ -1791,7 +1839,7 @@
 	spin_unlock(&utrace->lock);
 	rcu_read_unlock();
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 
 	if (was_stopped && !task_is_stopped(task)) {
@@ -1828,7 +1876,7 @@
 	INIT_REPORT(report);
 	long orig_code = *exit_code;
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(EXIT),
 	       report_exit, orig_code, exit_code);
 
 	if (report.action == UTRACE_STOP)
@@ -1867,7 +1915,7 @@
 	utrace->interrupt = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH),
+	REPORT_CALLBACKS(,task, utrace, &report, UTRACE_EVENT(DEATH),
 			 report_death, engine, task, group_dead, signal);
 
 	spin_lock(&utrace->lock);
@@ -2259,7 +2307,7 @@
 			break;
 		}
 
-		finish_callback(utrace, &report, engine, ret);
+		finish_callback(utrace, &report, engine, task, ret);
 	}
 
 	/*
mar February 20 to stop this email in future email us at nomail at qualitymedlists.com From 5-captaincy at 3960.net Sun Feb 22 11:53:17 2009 From: 5-captaincy at 3960.net (Oneill R April) Date: Sun, 22 Feb 2009 15:53:17 +0400 Subject: Doctor Contact List in the USA Message-ID: <484291c6tuc0$g7372ak0$3365s4i0@Delldim5150 Special Package for this week Currently in Practice: Doctors in America 788,247 in total * 17,760 emails Featuring coverage for more than 30 specialties like Internal Medicine, Family Practice, Opthalmology, Anesthesiologists, Cardiologists and more Sort by over a dozen different fields US Pharmaceutical Company Executives List 47,000 personal emails and names of decision makers Contact List of US Hospitals Full data for all the major positions in more than 7k facilities Extensive Contact List of Dentists in the USA Virtually every dentist in the USA with full contact details US Chiropractor Contact List Over than 100k chiropractors practicing in the US This week's special price = $392 for everything send email to: Bernal at qualitymedlists.com exp. mar February 28 Send email to nomail at qualitymedlists.com for deleted status From fche at redhat.com Sun Feb 22 22:22:27 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Sun, 22 Feb 2009 17:22:27 -0500 Subject: utrace-based ftrace "process" engine, v2 In-Reply-To: <20090209072218.214FBFC35D@magilla.sf.frob.com> References: <20090127195425.GF32568@redhat.com> <20090209072218.214FBFC35D@magilla.sf.frob.com> Message-ID: <20090222222227.GB31207@redhat.com> Hi - This is v2 of the prototype utrace-ftrace interface. This code is based on Roland McGrath's utrace API, which provides programmatic hooks to the in-tree tracehook layer. This new patch interfaces many of those events to ftrace, as configured by a small number of debugfs controls. Here's the /debugfs/tracing/process_trace_README: process event tracer mini-HOWTO 1. Select process hierarchy to monitor. 
Other processes will be completely unaffected. Leave at 0 for system-wide tracing. # echo NNN > process_follow_pid 2. Determine which process event traces are potentially desired. syscall and signal tracing slow down monitored processes. # echo 0 > process_trace_{syscalls,signals,lifecycle} 3. Add any final uid- or taskcomm-based filtering. Non-matching processes will skip trace messages, but will still be slowed. # echo NNN > process_trace_uid_filter # -1: unrestricted # echo ls > process_trace_taskcomm_filter # empty: unrestricted 4. Start tracing. # echo process > current_tracer 5. Examine trace. # cat trace 6. Stop tracing. # echo nop > current_tracer Signed-off-By: Frank Ch. Eigler --- include/linux/processtrace.h | 41 +++ kernel/trace/Kconfig | 9 + kernel/trace/Makefile | 1 + kernel/trace/trace.h | 30 ++- kernel/trace/trace_process.c | 591 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 661 insertions(+), 11 deletions(-) diff --git a/include/linux/processtrace.h b/include/linux/processtrace.h new file mode 100644 index 0000000..f2b7d94 --- /dev/null +++ b/include/linux/processtrace.h @@ -0,0 +1,41 @@ +#ifndef PROCESSTRACE_H +#define PROCESSTRACE_H + +#include +#include + +struct process_trace_entry { + unsigned char opcode; /* one of _UTRACE_EVENT_* */ + char comm[TASK_COMM_LEN]; /* XXX: should be in/via trace_entry */ + union { + struct { + pid_t child; + unsigned long flags; + } trace_clone; + struct { + long code; + } trace_exit; + struct { + } trace_exec; + struct { + int si_signo; + int si_errno; + int si_code; + } trace_signal; + struct { + long callno; + unsigned long args[6]; + } trace_syscall_entry; + struct { + long rc; + long error; + } trace_syscall_exit; + }; +}; + +/* in kernel/trace/trace_process.c */ + +extern void enable_process_trace(void); +extern void disable_process_trace(void); + +#endif /* PROCESSTRACE_H */ diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index e2a4ff6..3ff727e 100644 --- a/kernel/trace/Kconfig +++ 
b/kernel/trace/Kconfig @@ -149,6 +149,15 @@ config CONTEXT_SWITCH_TRACER This tracer gets called from the context switch and records all switching of tasks. +config PROCESS_TRACER + bool "Trace process events via utrace" + depends on DEBUG_KERNEL + select TRACING + select UTRACE + help + This tracer provides trace records from process events + accessible to utrace: lifecycle, system calls, and signals. + config BOOT_TRACER bool "Trace boot initcalls" depends on DEBUG_KERNEL diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 349d5a9..a774db2 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -33,5 +33,6 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o obj-$(CONFIG_HW_BRANCH_TRACER) += trace_hw_branches.o obj-$(CONFIG_POWER_TRACER) += trace_power.o +obj-$(CONFIG_PROCESS_TRACER) += trace_process.o libftrace-y := ftrace.o diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4d3d381..b4ebccb 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -30,6 +31,7 @@ enum trace_type { TRACE_USER_STACK, TRACE_HW_BRANCHES, TRACE_POWER, + TRACE_PROCESS, __TRACE_LAST_TYPE }; @@ -38,7 +40,7 @@ enum trace_type { * The trace entry - the most basic unit of tracing. 
This is what * is printed in the end as a single line in the trace output, such as: * - * bash-15816 [01] 235.197585: idle_cpu <- irq_enter + * bash-15816 [01] 235.197585: idle_cpu <- irq_enter */ struct trace_entry { unsigned char type; @@ -153,7 +155,7 @@ struct trace_boot_ret { #define TRACE_FILE_SIZE 20 struct trace_branch { struct trace_entry ent; - unsigned line; + unsigned line; char func[TRACE_FUNC_SIZE+1]; char file[TRACE_FILE_SIZE+1]; char correct; @@ -170,11 +172,16 @@ struct trace_power { struct power_trace state_data; }; +struct trace_process { + struct trace_entry ent; + struct process_trace_entry event; +}; + /* * trace_flag_type is an enumeration that holds different * states when a trace occurs. These are: * IRQS_OFF - interrupts were disabled - * IRQS_NOSUPPORT - arch does not support irqs_disabled_flags + * IRQS_NOSUPPORT - arch does not support irqs_disabled_flags * NEED_RESCED - reschedule is requested * HARDIRQ - inside an interrupt handler * SOFTIRQ - inside a softirq handler @@ -279,7 +286,8 @@ extern void __ftrace_bad_type(void); IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry, \ TRACE_GRAPH_RET); \ IF_ASSIGN(var, ent, struct hw_branch_entry, TRACE_HW_BRANCHES);\ - IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \ + IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \ + IF_ASSIGN(var, ent, struct trace_process, TRACE_PROCESS); \ __ftrace_bad_type(); \ } while (0) @@ -297,8 +305,8 @@ enum print_line_t { * flags value in struct tracer_flags. 
*/ struct tracer_opt { - const char *name; /* Will appear on the trace_options file */ - u32 bit; /* Mask assigned in val field in tracer_flags */ + const char *name; /* Will appear on the trace_options file */ + u32 bit; /* Mask assigned in val field in tracer_flags */ }; /* @@ -307,7 +315,7 @@ struct tracer_opt { */ struct tracer_flags { u32 val; - struct tracer_opt *opts; + struct tracer_opt *opts; }; /* Makes more easy to define a tracer opt */ @@ -339,7 +347,7 @@ struct tracer { int (*set_flag)(u32 old_flags, u32 bit, int set); struct tracer *next; int print_max; - struct tracer_flags *flags; + struct tracer_flags *flags; }; struct trace_seq { @@ -561,7 +569,7 @@ static inline int ftrace_trace_task(struct task_struct *task) * positions into trace_flags that controls the output. * * NOTE: These bits must match the trace_options array in - * trace.c. + * trace.c. */ enum trace_iterator_flags { TRACE_ITER_PRINT_PARENT = 0x01, @@ -578,8 +586,8 @@ enum trace_iterator_flags { TRACE_ITER_PREEMPTONLY = 0x800, TRACE_ITER_BRANCH = 0x1000, TRACE_ITER_ANNOTATE = 0x2000, - TRACE_ITER_USERSTACKTRACE = 0x4000, - TRACE_ITER_SYM_USEROBJ = 0x8000, + TRACE_ITER_USERSTACKTRACE = 0x4000, + TRACE_ITER_SYM_USEROBJ = 0x8000, TRACE_ITER_PRINTK_MSGONLY = 0x10000 }; diff --git a/kernel/trace/trace_process.c b/kernel/trace/trace_process.c new file mode 100644 index 0000000..038ff36 --- /dev/null +++ b/kernel/trace/trace_process.c @@ -0,0 +1,591 @@ +/* + * utrace-based process event tracing + * Copyright (C) 2009 Red Hat Inc. + * By Frank Ch. Eigler + * + * Based on mmio ftrace engine by Pekka Paalanen + * and utrace-syscall-tracing prototype by Ananth Mavinakayanahalli + */ + +/* #define DEBUG 1 */ + +#include +#include +#include +#include +#include + +#include "trace.h" + +/* A process must match these filters in order to be traced. 
*/ +static char trace_taskcomm_filter[TASK_COMM_LEN]; /* \0: unrestricted */ +static u32 trace_taskuid_filter = -1; /* -1: unrestricted */ +static u32 trace_lifecycle_p = 1; +static u32 trace_syscalls_p = 1; +static u32 trace_signals_p = 1; + +/* A process must be a direct child of given pid in order to be + followed. */ +static u32 process_follow_pid; /* 0: unrestricted/systemwide */ + +/* XXX: lock the above? */ + + +/* trace data collection */ + +static struct trace_array *process_trace_array; + +static void process_reset_data(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + tracing_reset_online_cpus(tr); +} + +static int process_trace_init(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + process_trace_array = tr; + process_reset_data(tr); + enable_process_trace(); + return 0; +} + +static void process_trace_reset(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + disable_process_trace(); + process_reset_data(tr); + process_trace_array = NULL; +} + +static void process_trace_start(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + process_reset_data(tr); +} + +static void __trace_processtrace(struct trace_array *tr, + struct trace_array_cpu *data, + struct process_trace_entry *ent) +{ + struct ring_buffer_event *event; + struct trace_process *entry; + unsigned long irq_flags; + + event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry), + &irq_flags); + if (!event) + return; + entry = ring_buffer_event_data(event); + tracing_generic_entry_update(&entry->ent, 0, preempt_count()); + entry->ent.cpu = raw_smp_processor_id(); + entry->ent.type = TRACE_PROCESS; + strlcpy(ent->comm, current->comm, TASK_COMM_LEN); + entry->event = *ent; + ring_buffer_unlock_commit(tr->buffer, event, irq_flags); + + trace_wake_up(); +} + +void process_trace(struct process_trace_entry *ent) +{ + struct trace_array *tr = process_trace_array; + struct trace_array_cpu *data; + + preempt_disable(); + data = tr->data[smp_processor_id()]; + 
__trace_processtrace(tr, data, ent); + preempt_enable(); +} + + +/* trace data rendering */ + +static void process_pipe_open(struct trace_iterator *iter) +{ + struct trace_seq *s = &iter->seq; + pr_debug("in %s\n", __func__); + trace_seq_printf(s, "VERSION 200901\n"); +} + +static void process_close(struct trace_iterator *iter) +{ + iter->private = NULL; +} + +static ssize_t process_read(struct trace_iterator *iter, struct file *filp, + char __user *ubuf, size_t cnt, loff_t *ppos) +{ + ssize_t ret; + struct trace_seq *s = &iter->seq; + ret = trace_seq_to_user(s, ubuf, cnt); + return (ret == -EBUSY) ? 0 : ret; +} + +static enum print_line_t process_print(struct trace_iterator *iter) +{ + struct trace_entry *entry = iter->ent; + struct trace_process *field; + struct trace_seq *s = &iter->seq; + unsigned long long t = ns2usecs(iter->ts); + unsigned long usec_rem = do_div(t, 1000000ULL); + unsigned secs = (unsigned long)t; + int ret = 1; + + trace_assign_type(field, entry); + + /* XXX: If print_lat_fmt() were not static, we wouldn't have + to duplicate this. 
*/ + trace_seq_printf(s, "%16s %5d %3d %9lu.%06lu ", + field->event.comm, + entry->pid, entry->cpu, + (unsigned long)secs, + usec_rem); + + switch (field->event.opcode) { + case _UTRACE_EVENT_CLONE: + ret = trace_seq_printf(s, "fork %d flags 0x%lx\n", + field->event.trace_clone.child, + field->event.trace_clone.flags); + break; + case _UTRACE_EVENT_EXEC: + ret = trace_seq_printf(s, "exec\n"); + break; + case _UTRACE_EVENT_EXIT: + ret = trace_seq_printf(s, "exit %ld\n", + field->event.trace_exit.code); + break; + case _UTRACE_EVENT_SIGNAL: + ret = trace_seq_printf(s, "signal %d errno %d code 0x%x\n", + field->event.trace_signal.si_signo, + field->event.trace_signal.si_errno, + field->event.trace_signal.si_code); + break; + case _UTRACE_EVENT_SYSCALL_ENTRY: + ret = trace_seq_printf(s, "syscall %ld [0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx]\n", + field->event.trace_syscall_entry.callno, + field->event.trace_syscall_entry.args[0], + field->event.trace_syscall_entry.args[1], + field->event.trace_syscall_entry.args[2], + field->event.trace_syscall_entry.args[3], + field->event.trace_syscall_entry.args[4], + field->event.trace_syscall_entry.args[5]); + break; + case _UTRACE_EVENT_SYSCALL_EXIT: + ret = trace_seq_printf(s, "syscall rc %ld error %ld\n", + field->event.trace_syscall_exit.rc, + field->event.trace_syscall_exit.error); + break; + default: + ret = trace_seq_printf(s, "process code %d?\n", + field->event.opcode); + break; + } + if (ret) + return TRACE_TYPE_HANDLED; + return TRACE_TYPE_PARTIAL_LINE; /* trace_seq buffer was full */ +} + + +static enum print_line_t process_print_line(struct trace_iterator *iter) +{ + switch (iter->ent->type) { + case TRACE_PROCESS: + return process_print(iter); + default: + return TRACE_TYPE_HANDLED; /* ignore unknown entries */ + } +} + +static struct tracer process_tracer = { + .name = "process", + .init = process_trace_init, + .reset = process_trace_reset, + .start = process_trace_start, + .pipe_open = process_pipe_open, + .close = process_close, + .read = process_read, + .print_line =
process_print_line, +}; + + + +/* utrace backend */ + +/* Should tracing apply to given task? Compare against filter + values. */ +static int trace_test(struct task_struct *tsk) +{ + if (trace_taskcomm_filter[0] + && strncmp(trace_taskcomm_filter, tsk->comm, TASK_COMM_LEN)) + return 0; + + if (trace_taskuid_filter != (u32)-1 + && trace_taskuid_filter != task_uid(tsk)) + return 0; + + return 1; +} + + +static const struct utrace_engine_ops process_trace_ops; + +static void process_trace_tryattach(struct task_struct *tsk) +{ + struct utrace_attached_engine *engine; + + pr_debug("in %s\n", __func__); + engine = utrace_attach_task(tsk, + UTRACE_ATTACH_CREATE | + UTRACE_ATTACH_EXCLUSIVE, + &process_trace_ops, NULL); + if (IS_ERR(engine) || (engine == NULL)) { + pr_warning("utrace_attach_task %d (rc %p)\n", + tsk->pid, engine); + } else { + int rc; + + /* We always hook cost-free events. */ + unsigned long events = + UTRACE_EVENT(CLONE) | + UTRACE_EVENT(EXEC) | + UTRACE_EVENT(EXIT); + + /* Penalizing events are individually controlled, so that + utrace doesn't even take the monitored threads off their + fast paths, nor bother call our callbacks. 
*/ + if (trace_syscalls_p) + events |= UTRACE_EVENT_SYSCALL; + if (trace_signals_p) + events |= UTRACE_EVENT_SIGNAL_ALL; + + rc = utrace_set_events(tsk, engine, events); + if (rc == -EINPROGRESS) + rc = utrace_barrier(tsk, engine); + if (rc) + pr_warning("utrace_set_events/barrier rc %d\n", rc); + + utrace_engine_put(engine); + pr_debug("attached in %s to %s(%d)\n", __func__, + tsk->comm, tsk->pid); + } +} + + +u32 process_trace_report_clone(enum utrace_resume_action action, + struct utrace_attached_engine *engine, + struct task_struct *parent, + unsigned long clone_flags, + struct task_struct *child) +{ + if (trace_lifecycle_p && trace_test(parent)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_CLONE; + ent.trace_clone.child = child->pid; + ent.trace_clone.flags = clone_flags; + process_trace(&ent); + } + + process_trace_tryattach(child); + + return UTRACE_RESUME; +} + + +u32 process_trace_report_syscall_entry(u32 action, + struct utrace_attached_engine *engine, + struct task_struct *task, + struct pt_regs *regs) +{ + if (trace_syscalls_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_SYSCALL_ENTRY; + ent.trace_syscall_entry.callno = syscall_get_nr(task, regs); + syscall_get_arguments(task, regs, 0, 6, + ent.trace_syscall_entry.args); + process_trace(&ent); + } + + return UTRACE_RESUME; +} + + +u32 process_trace_report_syscall_exit(enum utrace_resume_action action, + struct utrace_attached_engine *engine, + struct task_struct *task, + struct pt_regs *regs) +{ + if (trace_syscalls_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_SYSCALL_EXIT; + ent.trace_syscall_exit.rc = syscall_get_return_value(task, regs); + ent.trace_syscall_exit.error = syscall_get_error(task, regs); + process_trace(&ent); + } + + return UTRACE_RESUME; +} + + +u32 process_trace_report_exec(enum utrace_resume_action action, + struct utrace_attached_engine *engine, + struct task_struct *task, + const 
struct linux_binfmt *fmt, + const struct linux_binprm *bprm, + struct pt_regs *regs) +{ + if (trace_lifecycle_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_EXEC; + process_trace(&ent); + } + + /* We're already attached; no need for a new tryattach. */ + + return UTRACE_RESUME; +} + + +u32 process_trace_report_signal(u32 action, + struct utrace_attached_engine *engine, + struct task_struct *task, + struct pt_regs *regs, + siginfo_t *info, + const struct k_sigaction *orig_ka, + struct k_sigaction *return_ka) +{ + if (trace_signals_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_SIGNAL; + ent.trace_signal.si_signo = info->si_signo; + ent.trace_signal.si_errno = info->si_errno; + ent.trace_signal.si_code = info->si_code; + process_trace(&ent); + } + + /* We're already attached, so no need for a new tryattach. */ + + return UTRACE_RESUME | utrace_signal_action(action); +} + + +u32 process_trace_report_exit(enum utrace_resume_action action, + struct utrace_attached_engine *engine, + struct task_struct *task, + long orig_code, long *code) +{ + if (trace_lifecycle_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_EXIT; + ent.trace_exit.code = orig_code; + process_trace(&ent); + } + + /* There is no need to explicitly attach or detach here. */ + + return UTRACE_RESUME; +} + + +void enable_process_trace() +{ + struct task_struct *grp, *tsk; + + pr_debug("in %s\n", __func__); + rcu_read_lock(); + do_each_thread(grp, tsk) { + /* Skip over kernel threads. 
*/ + if (tsk->flags & PF_KTHREAD) + continue; + + if (process_follow_pid) { + if (tsk->tgid == process_follow_pid || + tsk->parent->tgid == process_follow_pid) + process_trace_tryattach(tsk); + } else { + process_trace_tryattach(tsk); + } + } while_each_thread(grp, tsk); + rcu_read_unlock(); +} + +void disable_process_trace() +{ + struct utrace_attached_engine *engine; + struct task_struct *grp, *tsk; + int rc; + + pr_debug("in %s\n", __func__); + rcu_read_lock(); + do_each_thread(grp, tsk) { + /* Find matching engine, if any. Returns -ENOENT for + unattached threads. */ + engine = utrace_attach_task(tsk, UTRACE_ATTACH_MATCH_OPS, + &process_trace_ops, 0); + if (IS_ERR(engine)) { + if (PTR_ERR(engine) != -ENOENT) + pr_warning("utrace_attach_task %d (rc %ld)\n", + tsk->pid, -PTR_ERR(engine)); + } else if (engine == NULL) { + pr_warning("utrace_attach_task %d (null engine)\n", + tsk->pid); + } else { + /* Found one of our own engines. Detach. */ + rc = utrace_control(tsk, engine, UTRACE_DETACH); + switch (rc) { + case 0: /* success */ + break; + case -ESRCH: /* REAP callback already begun */ + case -EALREADY: /* DEATH callback already begun */ + break; + default: + rc = -rc; + pr_warning("utrace_detach %d (rc %d)\n", + tsk->pid, rc); + break; + } + utrace_engine_put(engine); + pr_debug("detached in %s from %s(%d)\n", __func__, + tsk->comm, tsk->pid); + } + } while_each_thread(grp, tsk); + rcu_read_unlock(); +} + + +static const struct utrace_engine_ops process_trace_ops = { + .report_clone = process_trace_report_clone, + .report_exec = process_trace_report_exec, + .report_exit = process_trace_report_exit, + .report_signal = process_trace_report_signal, + .report_syscall_entry = process_trace_report_syscall_entry, + .report_syscall_exit = process_trace_report_syscall_exit, +}; + + + +/* control interfaces */ + + +static ssize_t +trace_taskcomm_filter_read(struct file *filp, char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + return simple_read_from_buffer(ubuf, cnt, 
ppos, + trace_taskcomm_filter, TASK_COMM_LEN); +} + + +static ssize_t +trace_taskcomm_filter_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *fpos) +{ + char *end; + + if (cnt >= TASK_COMM_LEN) + cnt = TASK_COMM_LEN - 1; /* leave room for the terminating nil */ + + if (copy_from_user(trace_taskcomm_filter, ubuf, cnt)) + return -EFAULT; + + /* Cut from the first nil or newline. */ + trace_taskcomm_filter[cnt] = '\0'; + end = strchr(trace_taskcomm_filter, '\n'); + if (end) + *end = '\0'; + + *fpos += cnt; + return cnt; +} + + +static const struct file_operations trace_taskcomm_filter_fops = { + .open = tracing_open_generic, + .read = trace_taskcomm_filter_read, + .write = trace_taskcomm_filter_write, +}; + + + +static char README_text[] = + "process event tracer mini-HOWTO\n" + "\n" + "1. Select process hierarchy to monitor. Other processes will be\n" + " completely unaffected. Leave at 0 for system-wide tracing.\n" + "# echo NNN > process_follow_pid\n" + "\n" + "2. Determine which process event traces are potentially desired.\n" + " syscall and signal tracing slow down monitored processes.\n" + "# echo 0 > process_trace_{syscalls,signals,lifecycle}\n" + "\n" + "3. Add any final uid- or taskcomm-based filtering. Non-matching\n" + " processes will skip trace messages, but will still be slowed.\n" + "# echo NNN > process_trace_uid_filter # -1: unrestricted \n" + "# echo ls > process_trace_taskcomm_filter # empty: unrestricted\n" + "\n" + "4. Start tracing.\n" + "# echo process > current_tracer\n" + "\n" + "5. Examine trace.\n" + "# cat trace\n" + "\n" + "6.
Stop tracing.\n" + "# echo nop > current_tracer\n" + ; + +static struct debugfs_blob_wrapper README_blob = { + .data = README_text, + .size = sizeof(README_text), +}; + + +static __init int init_process_trace(void) +{ + struct dentry *d_tracer; + struct dentry *entry; + + d_tracer = tracing_init_dentry(); + + entry = debugfs_create_blob("process_trace_README", 0444, d_tracer, + &README_blob); + if (!entry) + pr_warning("Could not create debugfs 'process_trace_README' entry\n"); + + /* Control for scoping process following. */ + entry = debugfs_create_u32("process_follow_pid", 0644, d_tracer, + &process_follow_pid); + if (!entry) + pr_warning("Could not create debugfs 'process_follow_pid' entry\n"); + + /* Process-level filters */ + entry = debugfs_create_file("process_trace_taskcomm_filter", 0644, + d_tracer, NULL, &trace_taskcomm_filter_fops); + /* XXX: it'd be nice to have a read/write debugfs_create_blob. */ + if (!entry) + pr_warning("Could not create debugfs 'process_trace_taskcomm_filter' entry\n"); + + entry = debugfs_create_u32("process_trace_uid_filter", 0644, d_tracer, + &trace_taskuid_filter); + if (!entry) + pr_warning("Could not create debugfs 'process_trace_uid_filter' entry\n"); + + /* Event-level filters. 
*/ + entry = debugfs_create_u32("process_trace_lifecycle", 0644, d_tracer, + &trace_lifecycle_p); + if (!entry) + pr_warning("Could not create debugfs 'process_trace_lifecycle' entry\n"); + + entry = debugfs_create_u32("process_trace_syscalls", 0644, d_tracer, + &trace_syscalls_p); + if (!entry) + pr_warning("Could not create debugfs 'process_trace_syscalls' entry\n"); + + entry = debugfs_create_u32("process_trace_signals", 0644, d_tracer, + &trace_signals_p); + if (!entry) + pr_warning("Could not create debugfs 'process_trace_signals' entry\n"); + + return register_tracer(&process_tracer); +} + +device_initcall(init_process_trace); From ananth at in.ibm.com Mon Feb 23 07:47:17 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Mon, 23 Feb 2009 13:17:17 +0530 Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: <20090121062825.GD3251@in.ibm.com> References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> Message-ID: <20090223074717.GA3340@in.ibm.com> On Wed, Jan 21, 2009 at 11:58:25AM +0530, Ananth N Mavinakayanahalli wrote: > On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote: > > Thanks for working on this, Ananth. (Btw, it's "embed.") > > > > I think it would be less disruptive (and materially no different) > > to leave utrace_flags as it is. That field is the one (and only) > > that is used in hot paths (or used anywhere outside utrace.c). > > It might in future get moved around to stay in a cache-hot part > > of task_struct, for example. > > > > The long comment above struct utrace is really all about implementation > > details inside utrace.c and I don't think you should move that commentary > > to the header file. 
Instead, put a comment saying that the contents of
> > struct utrace and their use is entirely private to kernel/utrace.c and it
> > is only defined in the header to make its size known for struct task_struct
> > layout (and init_task.h).
> >
> > I committed some cosmetic changes that will make for a little less flutter
> > in your patch.
>
> Here is V2 of the patch. Tested and works fine. Same two tests on
> powerpc fail, all tests pass on x86, while there are some occurrences of
> the ptrace.c WARN_ON.
>
> Roland,
> I've tried to tweak the comments appropriately. Please feel free to
> modify them as you consider fit.

Roland,

Any updates on this and the utrace upstream integration front?

Ananth
From jkenisto at us.ibm.com Wed Feb 25 19:53:48 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 25 Feb 2009 11:53:48 -0800
Subject: instruction-analysis API(s)
In-Reply-To: <20090210044230.GB12811@in.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>
Message-ID: <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>

On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote:
> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
> > Jim Keniston wrote:
> > > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> > >> Hi Jim,
> > >>
> > >> I'm also interested in the instruction decoder.
> > >> If you don't mind, could we share the API specification?
> > >> I'd like to port djprobe on it.
> > >
> > > I'm enclosing the little x86 instruction-analysis prototype I hacked
> > > together (insn_x86.*), along with a copy of systemtap's
> > > runtime/uprobes2/uprobes_x86.c, which I modified to use it.
> >
> > Hmm, actually, djprobe needs both of the length and the type of
> > instructions, since it has to know how many bytes must be copied
> > and be replaced by a long jump.
> >
> > > But again, we haven't really settled on an API. For example, my x86
> > > prototype doesn't collect all the info that kvm needs. We're thinking
> > > that adapting some existing code (like kvm in the x86 case) might be
> > > more palatable to LKML.
> >
> > Sure, since kvm and emulators have to fetch the values of src/dst
> > for the emulation, they need actual register values. On the other hand,
> > the disasm/*probe have to analyze code before hitting, so they
> > don't know the actual value of the registers.
> >
> > So, I think we should split x86_decode_insn() into 2 parts, static
> > analysis and emulation preparation.
> >
> > For example:
> > 1) analyzing code statically (x86_analyze_insn)
> >    - just decoding an instruction
> >    - this phase may consist of several sub-functions.
> >
> > 2) preparing emulation (x86_evaluate_insn)
> >    - evaluating src/dst based on current(vcpu) registers
> >
> > 3) executing emulation (x86_emulate_insn)
> >    - emulating an analyzed instruction
>
> Right, that surely sounds like the way to go. However, we've been
> cautioned that the instruction emulation area of the kvm code is very
> performance sensitive. But, there is no harm in prototyping the above
> and then worrying about any optimizations so there isn't a performance
> issue -- in any case, I guess [ku]probes are very infrequent users of
> this compared to KVM.
>
> Ananth

Hi, Masami.

Ananth, Srikar, Maneesh, and I talked about this last night. While I
was on vacation, Srikar did further investigation into adapting x86
kvm's instruction analysis for more general use, and he's not optimistic.
For the short term, at least (i.e., between now and the Linux Foundation
Collaboration Summit in April), we're going to proceed based on the
prototype I developed.

As you noted, djprobes needs instruction lengths, and my prototype
doesn't provide that info. (Uprobes computes instruction lengths for
rip-relative x86_64 instructions, but that's only a subset of what you
need.) Are you interested in extending/enhancing my prototype to make
it useful for djprobes? If so, I'd be happy to consult.

Thanks.
Jim

From mhiramat at redhat.com Thu Feb 26 15:29:14 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Thu, 26 Feb 2009 10:29:14 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <49A6B54A.9050408@redhat.com>

Jim Keniston wrote:
> Hi, Masami.
>
> Ananth, Srikar, Maneesh, and I talked about this last night. While I
> was on vacation, Srikar did further investigation into adapting x86
> kvm's instruction analysis for more general use, and he's not optimistic.
> For the short term, at least (i.e., between now and the Linux Foundation
> Collaboration Summit in April), we're going to proceed based on the
> prototype I developed.
>
> As you noted, djprobes needs instruction lengths, and my prototype
> doesn't provide that info. (Uprobes computes instruction lengths for
> rip-relative x86_64 instructions, but that's only a subset of what you
> need.) Are you interested in extending/enhancing my prototype to make
> it useful for djprobes? If so, I'd be happy to consult.

Hi Jim,

Thank you for considering djprobe.
Actually, I'm developing insn_get_length() based on your prototype and
porting djprobe to it. After testing the code, I'd like to post the
insn_x86 code.

Thank you,

>
> Thanks.
> Jim
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com
From mhiramat at redhat.com Fri Feb 27 21:20:02 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Fri, 27 Feb 2009 16:20:02 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <49A85902.8000306@redhat.com>

Jim Keniston wrote:
> On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote:
>> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
>>> Jim Keniston wrote:
>>>> On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
>>>>> Hi Jim,
>>>>>
>>>>> I'm also interested in the instruction decoder.
>>>>> If you don't mind, could we share the API specification?
>>>>> I'd like to port djprobe on it.
>>>> I'm enclosing the little x86 instruction-analysis prototype I hacked
>>>> together (insn_x86.*), along with a copy of systemtap's
>>>> runtime/uprobes2/uprobes_x86.c, which I modified to use it.
>>> Hmm, actually, djprobe needs both of the length and the type of
>>> instructions, since it has to know how many bytes must be copied
>>> and be replaced by a long jump.
>>>
>>>> But again, we haven't really settled on an API. For example, my x86
>>>> prototype doesn't collect all the info that kvm needs. We're thinking
>>>> that adapting some existing code (like kvm in the x86 case) might be
>>>> more palatable to LKML.
>>> Sure, since kvm and emulators have to fetch the values of src/dst
>>> for the emulation, they need actual register values. On the other hand,
>>> the disasm/*probe have to analyze code before hitting, so they
>>> don't know the actual value of the registers.
>>> >>> So, I think we should split x86_decode_insn() into 2 parts, static >>> analysis and emulation preparation. >>> >>> For example: >>> 1) analyzing code statically (x86_analyze_insn) >>> - just decoding an instruction >>> - this phase may consist of several sub-functions. >>> >>> 2) preparing emulation (x86_evaluate_insn) >>> - evaluating src/dst based on current(vcpu) registers >>> >>> 3) executing emulation (x86_emulate_insn) >>> - emulating an analyzed instruction >> Right, that surely sounds like the way to go. However, we've been >> cautioned that the instruction emulation area of the kvm code is very >> performance sensitive. But, there is no harm in prototyping the above >> and then worrying about any optimizations so there isn't a performance >> issue -- in any case, I guess [ku]probes are very infrequent users of >> this compared to KVM. >> >> Ananth > > Hi, Masami. > > Ananth, Srikar, Maneesh, and I talked about this last night. While I > was on vacation, Srikar did further investigation into adapting x86 > kvm's instuction analysis for more general use, and he's not optimistic. > For the short term, at least (i.e., between now and the Linux Foundation > Collaboration Summit in April), we're going to proceed based on the > prototype I developed. > > As you noted, djprobes needs instruction lengths, and my prototype > doesn't provide that info. (Uprobes computes instruction lengths for > rip-relative x86_64 instructions, but that's only a subset of what you > need.) Are you interested in extending/enhancing my prototype to make > it useful for djprobes? If so, I'd be happy to consult. Here are a patch against your code and an example code for instruction length decoder. Curiously, KVM's instruction decoder does not completely cover all instructions(especially, Jcc/test...). I had to refer Intel manuals. Moreover, even with this patch, the decoder is incomplete. - this doesn't cover 3bytes opcode yet. 
- this doesn't decode sib, displacement and immediate. - might have some bugs :-( Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat at redhat.com -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: insn_x86.patch URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: insndec.c URL: From 2009-editions at canada-2009.com Sat Feb 28 09:05:08 2009 From: 2009-editions at canada-2009.com (Annuaire subventions 2009) Date: Sat, 28 Feb 2009 04:05:08 -0500 Subject: Available; canadian subsidies 2009 Message-ID: <12347743551abeca64e325c3800c7b56084dadaaf4@canada-2009.com> Canadian Subsidy directory (2009 EDITION) The new Subsidy Directory 2009 is now available, newly revised it is the most complete and affordable reference for anyone looking for financing. It is the perfect tool for new and existing businesses, individuals, foundations and associations. This Publication contains more than 3500 direct and indirect financial subsidies, grants and loans offered by government departments and agencies, foundations, associations and organizations. In this edition all programs are well described. 
From roland at redhat.com Mon Mar 2 12:07:54 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon, 2 Mar 2009 04:07:54 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Ananth N Mavinakayanahalli's message of Monday,
	23 February 2009 13:17:17 +0530 <20090223074717.GA3340@in.ibm.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
Message-ID: <20090302120754.9A64AFC3C6@magilla.sf.frob.com>

Hi, Ananth. Sorry everything has slid so long (again).
(I have far too many hats and the past month not so many brains!)

Here is my immediate agenda for utrace hacking:

* I have incorporated the "embed struct utrace" changes.

  I did various small bits of reorganization and cosmetic cleanup
  first to make the actual data structure change a smaller patch.
  Since things had changed around, I didn't actually use your patch.
  I just did it over myself, but I think it's nearly the same.

  After this change, we now need some fresh testing of things like Frank's
  ftrace widget and stap's utrace-using modes. (Nothing should have
  changed from the utrace API perspective.)

  I've created the new branch "utrace-indirect" with a revert of the
  change. I think this is really the better way to organize the data
  structures, as I've mentioned before. After we get an initial utrace
  merged in upstream, I intend to revive this branch and turn it into an
  incremental patch to (re-)improve the data structures later on. That's
  for later; for the time being, the branch will just sit idle.

* I've renamed "struct utrace_attached_engine" to "struct utrace_engine".
  This was a cosmetic suggestion in an earlier LKML review, and I could not
  really find any good reason to keep the longer name. We all seem to say
  "a utrace engine" in conversation when talking about this object.

  I added the UTRACE_API_VERSION macro to ease existing utrace-using code
  adapting to old/new names.

* I'll shortly scour the old review comments for more cosmetic things we
  might change.

* I would like to have a final "in-team" top-to-bottom review of the main
  utrace patch before sending to LKML. i.e. maybe by you, Frank, me, and Oleg.
  Each pair of eyeballs should:
  * make sure all barriers and other kinds of magic have adequate comments
    explaining why they are there and why they are correct
  * cite anything else that sticks out and maybe needs more comments
  * make sure all comments are accurate and understandable

* I want to resolve the UTRACE_STOP issues Renzo Davoli raised.
  (We don't have to get these API things perfect before posting upstream.
  I'm sure that once utrace is accepted on queue for merging, that later
  tweaks to its details will not meet particular resistance.) But if there
  are problems and changes we can identify and work out now, we might as
  well get that done before posting upstream.

* When we on the team think the utrace patch is ready to post, we need to
  do a coordinated post of Frank's ftrace widget. That is the first thing
  ready for upstream submission that uses utrace, and kernel people tell me
  they don't want to see utrace without also merging something that uses
  it. I don't really want to get involved with that widget's code myself
  (got my hands full in the utrace layer), so others on the team should
  back Frank up on the review, testing, and fixing of the ftrace widget.


Thanks,
Roland

From cmoller at redhat.com Mon Mar 2 15:08:01 2009
From: cmoller at redhat.com (Chris Moller)
Date: Mon, 02 Mar 2009 10:08:01 -0500
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090302120754.9A64AFC3C6@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID: <49ABF651.2060708@redhat.com>

Roland,

Is this going to make it into F11? Or is it too early to tell that yet?

Roland McGrath wrote:
> Hi, Ananth. Sorry everything has slid so long (again).
> (I have far too many hats and the past month not so many brains!)
>
> Here is my immediate agenda for utrace hacking:
>
>

--
Chris Moller

  I know that you believe you understand what you think I said, but
  I'm not sure you realize that what you heard is not what I meant.
      -- Robert McCloskey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 251 bytes
Desc: OpenPGP digital signature
URL: 

From roland at redhat.com Mon Mar 2 20:12:35 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon, 2 Mar 2009 12:12:35 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Chris Moller's message of Monday,
	2 March 2009 10:08:01 -0500 <49ABF651.2060708@redhat.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<49ABF651.2060708@redhat.com>
Message-ID: <20090302201235.36E4EFC3C6@magilla.sf.frob.com>

> Is this going to make it into F11? Or is it too early to tell that yet?

F11 will have the latest utrace code at the time F11 freezes, certainly. Thanks, Roland From ananth at in.ibm.com Tue Mar 3 07:51:29 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 3 Mar 2009 13:21:29 +0530 Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: <20090302120754.9A64AFC3C6@magilla.sf.frob.com> References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com> <20090302120754.9A64AFC3C6@magilla.sf.frob.com> Message-ID: <20090303075129.GD22517@in.ibm.com> On Mon, Mar 02, 2009 at 04:07:54AM -0800, Roland McGrath wrote: > Hi, Ananth. Sorry everything has slid so long (again). > (I have far too many hats and the past month not so many brains!) I understand. Thanks for the work, Roland. > Here is my immediate agenda for utrace hacking: > > * I have incorporated the "embed struct utrace" changes. > > I did various small bits of reorganization and cosmetic cleanup > first to make the actual data structure change a smaller patch. > Since things had changed around, I didn't actually use your patch. > I just did it over myself, but I think it's nearly the same. The changes look simple and straightforward. > After this change, we now need some fresh testing of things like Frank's > ftrace widget and stap's utrace-using modes. (Nothing should have > changed from the utrace API perspective.) There is at least one change from the earlier behaviour -- rather than utrace_attach_task() retrying by itself on a !parent attach, -EAGAIN is returned to the user. That may need changes to the utrace client side. > I've created the new branch "utrace-indirect" with a revert of the > change. I think this is really the better way to organize the data > structures, as I've mentioned before. 
After we get an initial utrace > merged in upstream, I intend to revive this branch and turn it into an > incremental patch to (re-)improve the data structures later on. That's > for later; for the time being, the branch will just sit idle. > > * I've renamed "struct utrace_attached_engine" to "struct utrace_engine". > This was a cosmetic suggestion in an earlier LKML review, and I could not > really find any good reason to keep the longer name. We all seem to say > "a utrace engine" in conversation when talking about this object. > > I added the UTRACE_API_VERSION macro to ease existing utrace-using code > adapting to old/new names. > > * I'll shortly scour the old review comments for more cosmetic things we > might change. > > * I would like to have a final "in-team" top-to-bottom review of the main > utrace patch before sending to LKML. i.e. maybe by you, Frank, me, and Oleg. > Each pair of eyeballs should: > * make sure all barriers and other kinds of magic have adequate comments > explaining why they are there and why they are correct > * cite anything else that sticks out and maybe needs more comments > * make sure all comments are accurate and understandable I have just started staring at the new code and will pitch in with my comments. > * I want to resolve the UTRACE_STOP issues Renzo Davoli raised. > (We don't have to get these API things perfect before posting upstream. > I'm sure that once utrace is accepted on queue for merging, that later > tweaks to its details will not meet particular resistance.) But if there > are problems and changes we can identify and work out now, we might as > well get that done before posting upstream. > > * When we on the team think the utrace patch is ready to post, we need to > do a coordinated post of Frank's ftrace widget. That is the first thing > ready for upstream submission that uses utrace, and kernel people tell me > they don't want to see utrace without also merging something that uses > it. 
I don't really want to get involved with that widget's code myself
> (got my hands full in the utrace layer), so others on the team should
> back Frank up on the review, testing, and fixing of the ftrace widget.

I've just started with implementing a non-disruptive application core
dump. It's probably too early to commit, but it could also be a
potential in-kernel user of utrace.

I've just started with quiescing all threads but need to plug in the
core-generating infrastructure for it. I am looking at the possibility of
tweaking do_coredump() to reuse it for this while the workhorse can just
be the binfmt->core_dump() itself. It's still in the early prototype
stage -- I'll post that when there is something more concrete.

Ideas/suggestions welcome!

Ananth

From srikar at linux.vnet.ibm.com Tue Mar 3 13:26:53 2009
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Tue, 3 Mar 2009 18:56:53 +0530
Subject: Running gdb and uprobes on the same program [ bug 9826 ]
Message-ID: <20090303132653.GA4464@linux.vnet.ibm.com>

Hi Roland,

Here is an analysis of bug 9826. Can you please let me know your
thoughts?

Summary of the problem:
Probing a program started by gdb causes the traced program to receive
thousands of SIGSEGV signals.

Consider two engines: the first engine (gdb) hasn't inserted any
breakpoints and the second engine (uprobes) has inserted one breakpoint.
On hitting a breakpoint, the first engine (gdb) sets a UTRACE_STOP action
while the second engine (uprobes) sets a UTRACE_SINGLESTEP action. The
second engine also shows interest in the "quiesce" event. The quiesce
handler would return UTRACE_SINGLESTEP if the quiesce were to happen after
the UTRACE_SINGLESTEP has been requested.

As expected this results in the traced program being stopped. Once the
traced process is resumed, the UTRACE_SINGLESTEP action seems to be
ignored. Is this expected?

1. How do we avoid singlestep from being ignored after resume?
2. Shouldn't gdb be interested only in breakpoint events that it has set
   earlier?
3. Is there a way for the engines to communicate to other engines that
   these engines and events are exclusively for itself and other engines
   need not bother?

This is on a Fedora 10 kernel.

Details:
1. stap -ve 'probe process("ls").function("main") { print("hello world\n") }'

2. (In another window) gdb /bin/ls

3. run at gdb prompt.

	A. uprobes has inserted one breakpoint.
	B. gdb has not inserted any breakpoints.
	C. Once the breakpoint gets hit:
		I. The ptrace engine (gdb), through its report_signal
		   callback (ptrace_report_signal()), sets the action to
		   UTRACE_STOP.

		II. The report_signal (uprobes) callback notices that the
		   breakpoint is of interest to it, sets the instruction
		   pointer to the SSOL area and requests UTRACE_SINGLESTEP.
		   It also shows interest in the quiesce event, and the
		   quiesce handler returns UTRACE_SINGLESTEP if the
		   singlestep operation is not complete.

	D. Since UTRACE_STOP is preferred over UTRACE_SINGLESTEP, the
	   traced program ("ls") is stopped and the gdb prompt comes up
	   with the message " ".

4. continue at gdb prompt
	A. uprobe_report_quiesce doesn't get called
	B. does a resume and not a singlestep.
	C. Can result in SIGSEGV/SIGILL.
	D. report_signal callback for both engines run but for a
	   different signal.
		I. gdb engine sets UTRACE_STOP.
		II. uprobe engines set UTRACE_RESUME as it is in a
		   different event (not a breakpoint or singlestep event).
	E. uprobes cannot complete singlestep and hence cannot change
	   the instruction pointer to the main instruction stream.
	F. traced program is stopped and gdb prompt comes up with
	   message " ".

5. repeat step 4.
	A. Same as in Step 4.
	B. process is in UTRACE_STOP hence has to be SIGKILLED.

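The interaction in steps 3 and 4 can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions, not the
utrace implementation: the enum, its ordering (lower value = more
restrictive) and the helper names combine() and action_name() are
invented for illustration. The one thing taken from the report above is
that a stop request is preferred over a single-step request.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy stand-ins for the resume actions discussed above.
 * Assumption of this model: a numerically smaller value is
 * more restrictive and therefore wins when requests are merged.
 */
enum action {
	ACT_STOP = 0,		/* stands in for UTRACE_STOP */
	ACT_SINGLESTEP = 1,	/* stands in for UTRACE_SINGLESTEP */
	ACT_RESUME = 2		/* stands in for UTRACE_RESUME */
};

/* Human-readable name, handy when printing the merged result. */
static const char *action_name(enum action a)
{
	switch (a) {
	case ACT_STOP:		return "stop";
	case ACT_SINGLESTEP:	return "singlestep";
	case ACT_RESUME:	return "resume";
	}
	return "unknown";
}

/*
 * Merge the requests of all engines attached to one task:
 * the most restrictive request is the one that takes effect.
 */
static enum action combine(const enum action *requests, size_t n)
{
	enum action result = ACT_RESUME;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		if (requests[i] < result)
			result = requests[i];
	return result;
}
```

With the requests from step 3 (stop from the gdb engine, single-step
from uprobes) combine() yields the stop, which matches 3.D. In step 4
only the resume from gdb is present, because nothing re-issues the
single-step once the quiesce handler is not called, so the merged
result is a plain resume and the out-of-line instruction is never
stepped.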
-- Thanks and Regards Srikar From srikar at linux.vnet.ibm.com Tue Mar 3 13:43:42 2009 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Tue, 3 Mar 2009 19:13:42 +0530 Subject: Running gdb and uprobes on the same program [ bug 9826 ] In-Reply-To: <20090303132653.GA4464@linux.vnet.ibm.com> References: <20090303132653.GA4464@linux.vnet.ibm.com> Message-ID: <20090303134342.GC26404@linux.vnet.ibm.com> * Srikar Dronamraju [2009-03-03 18:56:53]: > Hi Roland, > > Here is analysis of the bug 9826. Can you please let me know your > thoughts? > > Summary of the problem: > Probing a program started by gdb causes the traced program to receive > thousounds of SIGSEGV signals. > > Consider two engines, first engine(gdb) which hasn't inserted any > breakpoints and second engine(uprobes) has inserted one breakpoint. On > hitting a breakpoint,first engine(gdb) sets a UTRACE_STOP action while > the second engine (uprobes) sets a UTRACE_SINGLESTEP action. The second > engine also shows interest in "quiesce" event. The quiesce handler would > return UTRACE_SINGLESTEP if the quiesce were to happen after the > UTRACE_SINGLESTEP has been requested. > > As expected this results in the traced program being stopped. Once the > traced process is resumed, the UTRACE_SINGLESTEP action seems to be > ignored. Is this expected? > > 1. How do we avoid singlestep from being ignored after resume? > 2. Shouldn't gdb be interested only in breakpoint events that it has set > earlier? > 3. Is there a way for the engines to communicate to other engines that > these engines and events are exclusively for itself and other engines > need not bother? > > This is on a Fedora 10 kernel. > > Details: > 1. stap -ve 'probe process("ls").function("main") { print("hello world\n") }' > > 2. (In another window) gdb /bin/ls > > 3. run at gdb prompt. > > A. uprobes has inserted one breakpoint. > B. gdb has not inserted any breakpoints. > C. Once breakpoint gets hit. > I. 
ptrace engine (gdb) thro report_signal callback > (ptrace_report_signal()) (gdb) sets the action to > UTRACE_STOP. > > II. report_signal (uprobes) callback noticies that the > breakpoint is of its interest and sets the instruction > pointer to SSOL area and requests UTRACE_SINGLESTEP. It > also shows interest in quiesce event and the quiesce > handler returns UTRACE_SINGLESTEP if the singlestep > operation is not complete. > > > D. Since UTRACE_STOP is preferred over UTRACE_SINGLESTEP, the > traced program ("ls") is stopped and gdb prompt comes up. > with the message" "Program received signal SIGTRAP, Trace/breakpoint trap. 0x0000000000110020 in ?? () " > > 4. continue at gdb prompt > A. uprobe_report_quiesce doesn't get called > B. does a resume and not a singlestep. > C. Can result in SIGSEGV/SIGILL. > D. report_signal callback for both engines run but for a > different signal. > I. gdb engine sets UTRACE_STOP. > II. uprobe engines set UTRACE_RESUME as it is in a > different event (not a breakpoint or singlestep event). > E. uprobes cannot complete singlestep and hence cannot change > the instruction pointer to the main instruction stream. > F. traced program is stopped and gdb prompt comes up with > message " ". Program received signal SIGSEGV, Segmentation fault. 0x0000000000111000 in ?? () > > 5. repeat step 4. > A. Same as in Step 4. > B. process is in UTRACE_STOP hence has to be SIGKILLED. > However if we use ni instead of continue at step 4 and then use continue at step 5, the traced process runs to completion without any issues. It looks like on the latest utrace code, utrace and ptrace on the same task is disabled. -- Thanks and Regards Srikar From fche at redhat.com Tue Mar 3 15:47:37 2009 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Tue, 03 Mar 2009 10:47:37 -0500 Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: <20090302120754.9A64AFC3C6@magilla.sf.frob.com> (Roland McGrath's message of "Mon, 2 Mar 2009 04:07:54 -0800 (PST)") References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com> <20090302120754.9A64AFC3C6@magilla.sf.frob.com> Message-ID: roland wrote: > After this change, we now need some fresh testing of things like Frank's > ftrace widget and stap's utrace-using modes. (Nothing should have > changed from the utrace API perspective.) Righto. > * I've renamed "struct utrace_attached_engine" to "struct utrace_engine". > This was a cosmetic suggestion in an earlier LKML review, and I could not > really find any good reason to keep the longer name. We all seem to say > "a utrace engine" in conversation when talking about this object. > > I added the UTRACE_API_VERSION macro to ease existing utrace-using code > adapting to old/new names. After a corresponding s/// of the ftrace patch, the code appears to build fine. I'll add an uglier #ifdef to the systemtap runtime and will test the lot. > * I would like to have a final "in-team" top-to-bottom review of the main > utrace patch before sending to LKML. i.e. maybe by you, Frank, me, and Oleg. > [...] I'll try to review it today. > * When we on the team think the utrace patch is ready to post, we need to > do a coordinated post of Frank's ftrace widget. [...] Would you consider simply merging it into your git tree / patch suite? - FChE From oleg at redhat.com Tue Mar 3 20:09:07 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 3 Mar 2009 21:09:07 +0100 Subject: [PATCH] tracehooks: kill death_cookie Message-ID: <20090303200907.GA19207@redhat.com> If I understand correctly death_cookie was needed before "[PATCH] Embed struct utrace in task_struct - V2". 
tracehook_report_death() could race with utrace_release_task() which cleared ->utrace, that is why tracehook_notify_death() had to read task_utrace_struct() in advance and then pass this argument to utrace_report_death(). Looks like this is not needed any longer, kill this awful cookie. Signed-off-by: Oleg Nesterov --- xxx/include/linux/utrace.h~KILL_COOKIE 2009-03-03 18:11:47.000000000 +0100 +++ xxx/include/linux/utrace.h 2009-03-03 20:43:43.000000000 +0100 @@ -100,7 +100,7 @@ void utrace_finish_vfork(struct task_str __attribute__((weak)); void utrace_report_exit(long *exit_code) __attribute__((weak)); -void utrace_report_death(struct task_struct *, struct utrace *, bool, int) +void utrace_report_death(struct task_struct *, bool, int) __attribute__((weak)); void utrace_report_jctl(int notify, int type) __attribute__((weak)); --- xxx/include/linux/tracehook.h~KILL_COOKIE 2009-03-03 18:11:47.000000000 +0100 +++ xxx/include/linux/tracehook.h 2009-03-03 20:40:57.000000000 +0100 @@ -534,7 +534,6 @@ static inline int tracehook_notify_jctl( /** * tracehook_notify_death - task is dead, ready to notify parent * @task: @current task now exiting - * @death_cookie: value to pass to tracehook_report_death() * @group_dead: nonzero if this was the last thread in the group to die * * A return value >= 0 means call do_notify_parent() with that signal @@ -546,10 +545,8 @@ static inline int tracehook_notify_jctl( * Called with write_lock_irq(&tasklist_lock) held. */ static inline int tracehook_notify_death(struct task_struct *task, - void **death_cookie, int group_dead) + int group_dead) { - *death_cookie = task_utrace_struct(task); - if (task->exit_signal == -1) return task->ptrace ? 
SIGCHLD : DEATH_REAP; @@ -568,14 +565,12 @@ static inline int tracehook_notify_death * tracehook_report_death - task is dead and ready to be reaped * @task: @current task now exiting * @signal: return value from tracheook_notify_death() - * @death_cookie: value passed back from tracehook_notify_death() * @group_dead: nonzero if this was the last thread in the group to die * * Thread has just become a zombie or is about to self-reap. If positive, * @signal is the signal number just sent to the parent (usually %SIGCHLD). * If @signal is %DEATH_REAP, this thread will self-reap. If @signal is * %DEATH_DELAYED_GROUP_LEADER, this is a delayed_group_leader() zombie. - * The @death_cookie was passed back by tracehook_notify_death(). * * If normal reaping is not inhibited, @task->exit_state might be changing * in parallel. @@ -583,13 +578,12 @@ static inline int tracehook_notify_death * Called without locks. */ static inline void tracehook_report_death(struct task_struct *task, - int signal, void *death_cookie, - int group_dead) + int signal, int group_dead) { smp_mb(); if (task_utrace_flags(task) & (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE))) - utrace_report_death(task, death_cookie, group_dead, signal); + utrace_report_death(task, group_dead, signal); } #ifdef TIF_NOTIFY_RESUME --- xxx/kernel/exit.c~KILL_COOKIE 2009-03-03 18:11:47.000000000 +0100 +++ xxx/kernel/exit.c 2009-03-03 20:42:20.000000000 +0100 @@ -917,7 +917,6 @@ static void forget_original_parent(struc static void exit_notify(struct task_struct *tsk, int group_dead) { int signal; - void *cookie; /* * This does two things: @@ -954,7 +953,7 @@ static void exit_notify(struct task_stru !capable(CAP_KILL)) tsk->exit_signal = SIGCHLD; - signal = tracehook_notify_death(tsk, &cookie, group_dead); + signal = tracehook_notify_death(tsk, group_dead); if (signal >= 0) signal = do_notify_parent(tsk, signal); @@ -968,7 +967,7 @@ static void exit_notify(struct task_stru write_unlock_irq(&tasklist_lock); - 
tracehook_report_death(tsk, signal, cookie, group_dead); + tracehook_report_death(tsk, signal, group_dead); /* If the process is dead, release it - nobody will wait for it */ if (signal == DEATH_REAP) --- xxx/kernel/utrace.c~KILL_COOKIE 2009-03-03 18:11:47.000000000 +0100 +++ xxx/kernel/utrace.c 2009-03-03 20:46:09.000000000 +0100 @@ -1675,9 +1675,9 @@ void utrace_report_exit(long *exit_code) * For this reason, utrace_release_task checks for the event bits that get * us here, and delays its cleanup for us to do. */ -void utrace_report_death(struct task_struct *task, struct utrace *utrace, - bool group_dead, int signal) +void utrace_report_death(struct task_struct *task, bool group_dead, int signal) { + struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); BUG_ON(!task->exit_state); From oleg at redhat.com Tue Mar 3 22:09:43 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 3 Mar 2009 23:09:43 +0100 Subject: [PATCH] get_utrace_lock: kill the bogus engine->kref.refcount check Message-ID: <20090303220943.GA24533@redhat.com> When engine->kref.refcount becomes zero, engine is freed. No rcu, no other delays. This means that if we see .refcount < 1 we already have a bug: we are reading the freed (and perhaps unmapped) memory. Perhaps it makes sense to use BUG_ON() but "return -EINVAL" just hides the problem and looks misleading, kill this check. Also remove the comment, the comment above get_utrace_lock() explains that the caller has to hold a ref on the engine. Signed-off-by: Oleg Nesterov --- xxx/kernel/utrace.c~WRONG_REFCNT_CK 2009-03-03 20:46:09.000000000 +0100 +++ xxx/kernel/utrace.c 2009-03-03 22:30:05.000000000 +0100 @@ -479,14 +479,6 @@ static struct utrace *get_utrace_lock(st { struct utrace *utrace; - /* - * You must hold a ref to be making a call. A call from within - * a report_* callback in @target might only have the ref for - * being attached, not a second one of its own. 
- */ - if (unlikely(atomic_read(&engine->kref.refcount) < 1)) - return ERR_PTR(-EINVAL); - rcu_read_lock(); /* From roland at redhat.com Tue Mar 3 23:06:17 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 3 Mar 2009 15:06:17 -0800 (PST) Subject: [PATCH] get_utrace_lock: kill the bogus engine->kref.refcount check In-Reply-To: Oleg Nesterov's message of Tuesday, 3 March 2009 23:09:43 +0100 <20090303220943.GA24533@redhat.com> References: <20090303220943.GA24533@redhat.com> Message-ID: <20090303230617.3160AFC3C9@magilla.sf.frob.com> Ok, applied. I thought I'd seen that checking style in some other kref user and was copying its style (which is admittedly a dubious thing, since the free really has already happened), but I can't now find what I might have been thinking of. Thanks, Roland From roland at redhat.com Tue Mar 3 23:08:38 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 3 Mar 2009 15:08:38 -0800 (PST) Subject: [PATCH] tracehooks: kill death_cookie In-Reply-To: Oleg Nesterov's message of Tuesday, 3 March 2009 21:09:07 +0100 <20090303200907.GA19207@redhat.com> References: <20090303200907.GA19207@redhat.com> Message-ID: <20090303230838.476AEFC3C9@magilla.sf.frob.com> I would rather not touch the tracehook interfaces now. You are indeed right that the motivation for this had to do with the utrace-indirect code. As I've said, I do intend to resurrect that code and send it upstream later on. We can consider cleanups then. For now, let's not do anything preemptively that is likely to introduce a new need to touch non-utrace code again later. Thanks, Roland From roland at redhat.com Tue Mar 3 23:14:01 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 3 Mar 2009 15:14:01 -0800 (PST) Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: Frank Ch. 
Eigler's message of Tuesday, 3 March 2009 10:47:37 -0500 References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com> <20090302120754.9A64AFC3C6@magilla.sf.frob.com> Message-ID: <20090303231401.3376CFC3C9@magilla.sf.frob.com> > > * When we on the team think the utrace patch is ready to post, we need to > > do a coordinated post of Frank's ftrace widget. [...] > > Would you consider simply merging it into your git tree / patch suite? Sure. The way to do that is for you to publish a git repository that I can pull from. You can clone mine, and then make a new utrace-ftrace branch forking from the utrace branch. Tell me (e.g. use git-request-pull in email) when you have an update. Then I'll pull from you, and generate a patch for people.redhat.com/roland/utrace/2.6-current/ as I do for my branches. Thanks, Roland From jkenisto at us.ibm.com Wed Mar 4 01:15:13 2009 From: jkenisto at us.ibm.com (Jim Keniston) Date: Tue, 03 Mar 2009 17:15:13 -0800 Subject: instruction-analysis API(s) In-Reply-To: <49A85902.8000306@redhat.com> References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com> <49A85902.8000306@redhat.com> Message-ID: <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote: ... > > Here are a patch against your code and an example code for > instruction length decoder. > Curiously, KVM's instruction decoder does not completely > cover all instructions(especially, Jcc/test...). > I had to refer Intel manuals. > > Moreover, even with this patch, the decoder is incomplete. > - this doesn't cover 3bytes opcode yet. > - this doesn't decode sib, displacement and immediate. 
> - might have some bugs :-( > > > Thank you, Thanks for your work on this. Comments below. Jim > > plain text document attachment (insn_x86.patch) > Index: insn_x86.h > =================================================================== > --- insn_x86.h (revision 1510) > +++ insn_x86.h (working copy) > @@ -66,6 +66,10 @@ > struct insn_field displacement; > struct insn_field immediate; > > + u8 op_bytes; I'd probably use opnd_bytes and addr_bytes here, for clarity. (When I first saw "op", I thought "opcode".) Also, we should clarify that these are the EFFECTIVE lengths, not the lengths of the immediate and displacement fields in the instruction. > + u8 ad_bytes; > + u8 length; > + > const u8 *kaddr; /* kernel address of insn (copy) to analyze */ > const u8 *next_byte; > bool x86_64; > @@ -75,6 +79,7 @@ > extern void insn_get_prefixes(struct insn *insn); > extern void insn_get_opcode(struct insn *insn); > extern void insn_get_modrm(struct insn *insn); > +extern void insn_get_length(struct insn *insn); > > #ifdef CONFIG_X86_64 > extern bool insn_rip_relative(struct insn *insn); > Index: insn_x86.c > =================================================================== > --- insn_x86.c (revision 1510) > +++ insn_x86.c (working copy) > @@ -17,7 +17,7 @@ > * > * Copyright (C) IBM Corporation, 2002, 2004, 2009 > */ > - > +#include > #include > // #include > #include "insn_x86.h" > @@ -34,6 +34,11 @@ > insn->kaddr = kaddr; > insn->next_byte = kaddr; > insn->x86_64 = x86_64; > + insn->op_bytes = 4; > + if (x86_64) > + insn->ad_bytes = 8; > + else > + insn->ad_bytes = 4; > } > EXPORT_SYMBOL_GPL(insn_init); > > @@ -79,10 +84,51 @@ > break; > prefixes->value |= pfx; > } > + if (prefixes->value & X86_PFX_OPNDSZ) { > + /* oprand size switches 2/4 */ > + insn->op_bytes ^= 6; > + } > + if (prefixes->value & X86_PFX_ADDRSZ) { > + /* address size switches 2/4 or 4/8 */ > +#ifdef CONFIG_X86_64 > + if (insn->x86_64) > + insn->op_bytes ^= 12; > + else > +#endif > + insn->op_bytes ^= 6; 
This seems wrong. You're checking the address-size prefix, but adjusting the operand size. > + } > +#ifdef CONFIG_X86_64 > + if (prefixes->value & X86_PFX_REXW) > + insn->op_bytes = 8; > +#endif > prefixes->got = true; > } > EXPORT_SYMBOL_GPL(insn_get_prefixes); > > +static bool __insn_is_stack(struct insn *insn) It's not entirely clear to me what this function checks. (A more precise name might help.) You have pushes, pops, and calls here, but you also have some instructions that don't appear to affect the stack at all. And other push and pop instructions are missing. > +{ > + u8 reg; > + if (insn->opcode.nbytes == 2) > + return 0; The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9. Also, since the return value is bool, I'd prefer to see true/false rather than 1/0. > + > + switch(insn->opcode1) { > + case 0x68: > + case 0x6a: > + case 0x9c: > + case 0x9d: > + case 0xc5: 0xc5 = lds. Why lds? In general, it'd be nice to add a comment showing the mnemonic next to each hex value -- e.g., case 0x68: /* push */ > + case 0xe8: > + return 1; > + } Other related instructions: 9a, 1f, 07, 17, 8f. > + reg = ((*insn->next_byte) >> 3) & 7; > + if ((insn->opcode1 & 0xf0) == 0x50 || > + (insn->opcode1 == 0x1a && reg == 0) || The above line doesn't seem right. It catches things like sbb (%rax),%al . > + (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) { Looks like the interesting reg values are 2 (call), 3 (call), and 6 (push). 
> + return 1; > + } > + return 0; > +} > + > /** > * insn_get_opcode - collect opcode(s) > * @insn: &struct insn containing instruction > @@ -108,6 +154,8 @@ > opcode->nbytes = 1; > opcode->value = insn->opcode1; > opcode->got = true; > + if (insn->x86_64 && __insn_is_stack(insn)) > + insn->op_bytes = 8; > } > EXPORT_SYMBOL_GPL(insn_get_opcode); > > @@ -208,3 +256,115 @@ > } > EXPORT_SYMBOL_GPL(insn_rip_relative); > #endif > + > +/** > + * > + * insn_get_length() - Get the length of instruction > + * @insn: &struct insn containing instruction > + * > + * If necessary, first collects the instruction up to and including the > + * ModRM byte. > + */ As I mentioned in private email, you or I should probably refactor this into: - insn_get_sib() - insn_get_displacement() - insn_get_immediate() - insn_get_length() BTW, I'm going to have to change my definition of insn_field to accommodate the 8-byte fields that can be found in instructions like a0-a3 (8-byte displacement) and b8-bf (8-byte immediate). > +void insn_get_length(struct insn *insn) > +{ > + u8 modrm; > + u8 mod = 0, reg = 0, rm = 0, sib; > + const u8 *next_byte; > + if (insn->length) > + return; > + if (!insn->modrm.got) > + insn_get_modrm(insn); > + next_byte = insn->next_byte; This of course assumes that no fields have been fetched beyond the modrm field. > + > + if (insn->modrm.nbytes) { > + modrm = insn->modrm.value; > + mod = (modrm & 0xc0) >> 6; > + reg = (modrm & 0x38) >> 3; > + rm = (modrm & 0x07); Some comments here would really help -- e.g... /* Interpreting the modrm byte: mod = 00 - no displacement fields (exceptions below) mod = 01 - 1-byte displacement field mod = 10 - displacement field is 4 bytes, or 2 bytes if address size = 2 (0x67 prefix in 32-bit mode) mod = 11 - no memory operand If address size = 2... mod = 00, r/m = 110 - displacement field is 2 bytes If address size != 2... 
mod != 11, r/m = 100 - SIB byte exists mod = 00, SIB base field = 101 - displacement field is 4 bytes mod = 00, r/m = 101 - rip-relative addressing, displacement field is 4 bytes */ > + if (mod == 3) > + goto decode_src; > + if (insn->ad_bytes == 2) { > + if (mod == 1) > + next_byte++; > + else if (mod == 2) > + next_byte += 2; > + else if (rm == 6) > + next_byte += 2; > + } else { > + if (rm == 4) { > + sib = *(next_byte++); > + insn->sib.value = sib; > + insn->sib.nbytes = 1; > + insn->sib.got = 1; > + if ((sib & 7) == 5 && mod == 0) > + next_byte += 4; > + } > + if (mod == 1) > + next_byte++; > + else if (mod == 2) > + next_byte += 4; > + else if (rm == 5) > + next_byte += 4; > + } > + } else if (insn->opcode.nbytes == 1) > + if (0xa0 <= insn->opcode1 && insn->opcode1 < 0xa4) Add comment: /* Displacement = entire address - up to 8 bytes */ > + next_byte += insn->ad_bytes; > +decode_src: decode_src is a misnomer. Here we're decoding the immediate operand (which is always a source operand, but not the only kind). > + if (insn->opcode.nbytes == 1) { > + switch (insn->opcode1) { > + case 0x05: > + case 0x25: What about (hex) 15, 35, 01, 0d, 2d? > + case 0x3d: > + case 0x68: // pushl > + case 0x69: // imul > + case 0x9a: /* long call */ 0x9a (lcall) seems to have 6 bytes following the opcode, disassembled as 2 immediate operands. > + case 0xa9: // test > + case 0xc7: > + case 0xe8: > + case 0xe9: > + case 0xea: /* long jump */ Similarly, 0xea (ljmp) seems to have 6 bytes following the opcode, disassembled as 2 immediate operands. > + case 0x82: /* Group */ s/82/81/ here. > + goto imm_common; > + case 0x04: > + case 0x24: What about (hex) 14, 34, 0c, 1c, 2c? > + case 0x3c: > + case 0x6a: //pushb > + case 0x6b: //imul > + case 0xa8: //testb > + case 0xeb: > + case 0xc0: > + case 0xc1: > + case 0xc6: > + case 0x80: /* Group */ > + case 0x81: /* Group */ s/81/82/ here. 
> + case 0x83: /* Group */ > + goto immbyte_common; > + } > + if ((insn->opcode1 & 0xf8) == 0xb8 || I don't think this is right. b8-bf can have 8-byte immediate fields (with 0x48 prefix). > + (insn->opcode1 == 0xf7 && reg == 0 or reg == 1 > ) ) { > +imm_common: Jumping into the middle of an if block is ugly, and not necessary here. > + next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes; > + } else if ((insn->opcode1 & 0xf8) == 0xb0 || // > + (insn->opcode1 & 0xf0) == 0x70 || // Jcc > + (insn->opcode1 & 0xf8) == 0xe0 || // loop/in/out > + (insn->opcode1 == 0xf6 && reg == 0)) { > +immbyte_common: Jumping into the middle of an if block is ugly, and not necessary here. > + next_byte++; > + } 0xc8 and 0xcd are weird cases that we should handle . > + } else { > + switch (insn->opcode2) { Add 0x70. > + case 0xa4: > + case 0xac: > + case 0xba: > + case 0x0f: // 3dnow > + case 0x3a: // ssse3 > + next_byte++; > + break; > + default: > + if ((insn->opcode2 & 0xf0) == 0x80) > + next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes; > + } > + } > + insn->length = (u8)(next_byte - insn->kaddr); > +} > +EXPORT_SYMBOL_GPL(insn_get_length); > From oleg at redhat.com Wed Mar 4 21:27:35 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Wed, 4 Mar 2009 22:27:35 +0100 Subject: Q: utrace_attach_task && utrace_release_task In-Reply-To: <20090303230838.476AEFC3C9@magilla.sf.frob.com> References: <20090303200907.GA19207@redhat.com> <20090303230838.476AEFC3C9@magilla.sf.frob.com> Message-ID: <20090304212735.GA21703@redhat.com> On 03/03, Roland McGrath wrote: > > I would rather not touch the tracehook interfaces now. You are indeed > right that the motivation for this had to do with the utrace-indirect code. > As I've said, I do intend to resurrect that code and send it upstream later > on. We can consider cleanups then. For now, let's not do anything > preemptively that is likely to introduce a new need to touch non-utrace > code again later. OK, understand, thanks. 
A couple of questions...

utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean,
how can it help, we don't hold any locks, target can change its
->exit_state right after the check.

So, looks like we can attach to the EXIT_DEAD target. Is it safe?
The only in-kernel user of utrace is ptrace, in that case I _think_
we are safe, we should notice that the task is dead later, for example
in get_utrace_lock(), and do UTRACE_DETACH. But in general, is it OK?

Hmm... utrace_release_task() checks only ->attached, I can't understand
why it ignores ->attaching. Let's suppose we are doing PTRACE_ATTACH to
the exiting task, isn't it possible to leak the attached engine?

I don't understand why utrace_release_task() doesn't set ->reap = 1
unconditionally. In that case we could use this flag instead of
EXIT_DEAD to verify it is "safe" to attach or get_utrace_lock().

Back to utrace_attach_task(),

	static inline int utrace_attach_delay(struct task_struct *target)
	{
		if (target->flags & PF_STARTING) {
			struct utrace *utrace = task_utrace_struct(current);
			if (!utrace || utrace->cloning != target) {
				yield();
				if (signal_pending(current))
					return -ERESTARTNOINTR;
				return -EAGAIN;

Why does it call yield() before returning the error? This looks really
strange. And what is the point to check signal_pending() here?

(btw, "!utrace" above is not possible).

Oleg.
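
The pattern Oleg is asking about (an unlocked early check on the target's
state, followed by an attach that must not race with release) can be
modelled in user space. The following is only a sketch under stated
assumptions: toy_task, toy_init, toy_attach and toy_release are
hypothetical names, a pthread mutex stands in for the utrace lock, and the
dead flag stands in for a ->reap flag that release sets unconditionally. It
is not the utrace code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a traced task; not the kernel's task_struct. */
struct toy_task {
	pthread_mutex_t lock;	/* plays the role of the utrace lock */
	atomic_bool dead;	/* plays the role of a ->reap style flag */
	int attached;		/* number of attached engines */
};

static void toy_init(struct toy_task *t)
{
	pthread_mutex_init(&t->lock, NULL);
	atomic_init(&t->dead, false);
	t->attached = 0;
}

/*
 * Attach an engine.  The first test is only a cheap early exit: the task
 * can die right after it, so it proves nothing on its own.  The second
 * test is made under the lock and is the one that decides, because
 * toy_release() sets the flag under the same lock.
 */
static int toy_attach(struct toy_task *t)
{
	int ret = 0;

	if (atomic_load(&t->dead))	/* unsynchronized hint, may be stale */
		return -ESRCH;

	pthread_mutex_lock(&t->lock);
	if (atomic_load(&t->dead))	/* authoritative check */
		ret = -ESRCH;
	else
		t->attached++;
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Release path: mark the task dead unconditionally, under the lock. */
static void toy_release(struct toy_task *t)
{
	pthread_mutex_lock(&t->lock);
	atomic_store(&t->dead, true);
	pthread_mutex_unlock(&t->lock);
}
```

Dropping the first test changes nothing about correctness in this model; it
only saves taking the lock when the task is already known to be gone.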
From mhiramat at redhat.com Thu Mar 5 02:10:08 2009 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Wed, 04 Mar 2009 21:10:08 -0500 Subject: instruction-analysis API(s) In-Reply-To: <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com> <49A85902.8000306@redhat.com> <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> Message-ID: <49AF3480.1040804@redhat.com> Hi Jim, Jim Keniston wrote: > On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote: > ... >> Here are a patch against your code and an example code for >> instruction length decoder. >> Curiously, KVM's instruction decoder does not completely >> cover all instructions(especially, Jcc/test...). >> I had to refer Intel manuals. >> >> Moreover, even with this patch, the decoder is incomplete. >> - this doesn't cover 3bytes opcode yet. >> - this doesn't decode sib, displacement and immediate. >> - might have some bugs :-( >> >> >> Thank you, > > Thanks for your work on this. Comments below. Thank you very much for review! Actually, that code was based on KVM code, so I also found many opcodes were not supported. > As I mentioned in private email, you or I should probably refactor this > into: > - insn_get_sib() > - insn_get_displacement() > - insn_get_immediate() > - insn_get_length() Agreed, these should be supported. 
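
As a rough illustration of what the displacement step of such a split could
look like, here is a minimal sketch that follows the ModRM rules listed in
Jim's review. The helper names modrm_has_sib() and modrm_disp_bytes() and
their signatures are assumptions made for this example; they are not part
of the posted patch, and the sketch covers only the displacement length,
not the a0-a3 moffset forms or immediates.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does a SIB byte follow this ModRM byte?
 * addr_bytes is the effective address size (2, 4 or 8).
 */
static bool modrm_has_sib(uint8_t modrm, int addr_bytes)
{
	uint8_t mod = modrm >> 6;
	uint8_t rm = modrm & 0x07;

	/* 16-bit addressing has no SIB; mod == 3 has no memory operand. */
	return addr_bytes != 2 && mod != 3 && rm == 4;
}

/*
 * Hypothetical helper: size in bytes of the displacement field.
 * sib is only looked at when modrm_has_sib() would return true.
 */
static int modrm_disp_bytes(uint8_t modrm, uint8_t sib, int addr_bytes)
{
	uint8_t mod = modrm >> 6;
	uint8_t rm = modrm & 0x07;

	if (mod == 3)			/* register operand, no displacement */
		return 0;
	if (mod == 1)			/* always a 1-byte displacement */
		return 1;

	if (addr_bytes == 2) {
		if (mod == 2)
			return 2;
		return rm == 6 ? 2 : 0;	/* mod == 0: bare disp16 */
	}

	if (mod == 2)
		return 4;
	/* mod == 0 from here on */
	if (rm == 5)			/* disp32, or rip-relative in 64-bit mode */
		return 4;
	if (rm == 4 && (sib & 0x07) == 5)	/* SIB with no base register */
		return 4;
	return 0;
}
```

Keeping the SIB test separate from the length computation mirrors the
proposed insn_get_sib() / insn_get_displacement() split: the caller fetches
the SIB byte first if one is present, then asks for the displacement size.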
I also would like to change struct insn as below;

struct insn {
	struct insn_field prefixes;	/* prefixes.value is a bitmap */
	struct insn_field opcode;	/* opcode.bytes[n] == opcode_n */
	struct insn_field modrm;
	struct insn_field sib;
	struct insn_field displacement;
	union {
		struct insn_field immediate;
		struct insn_field moffset1;	/* for 64bit MOV */
		struct insn_field immediate1;	/* for 64bit imm or off16/32 */
	};
	union {
		struct insn_field moffset2;	/* for 64bit MOV */
		struct insn_field immediate2;	/* for 64bit imm or seg16 */
	};

	u8 opnd_bytes;
	u8 addr_bytes;
	u8 length;
	bool x86_64;

	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
	const u8 *next_byte;
};

opcode2 and opcode3 will be stored in opcode.value with opcode1.

Now, I'm updating my code. Would anyone also be working on it?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division e-mail: mhiramat at redhat.com From roland at redhat.com Thu Mar 5 20:10:12 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 5 Mar 2009 12:10:12 -0800 (PST) Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: Ananth N Mavinakayanahalli's message of Tuesday, 3 March 2009 13:21:29 +0530 <20090303075129.GD22517@in.ibm.com> References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com> <20090302120754.9A64AFC3C6@magilla.sf.frob.com> <20090303075129.GD22517@in.ibm.com> Message-ID: <20090305201012.A581DFC3BF@magilla.sf.frob.com> > There is at least one change from the earlier behaviour -- rather than > utrace_attach_task() retrying by itself on a !parent attach, -EAGAIN is > returned to the user. That may need changes to the utrace client side. Oops, that was not intentional. I've restored the old behavior. > I've just started with implementing a non-disruptive application core > dump. Its probably too early to commit, but it could also be a potential > in-kernel user of utrace. I've just started with quiescing all threads > but need to plug-in the core generating infrastructure for it. I am looking at > the possibility of tweaking do_coredump() to reuse it for this while the > workhorse can just be the binfmt->core_dump() itself. Its still in the > early prototype stage -- I'll post that when there is something more > concrete. Ideas/suggestions welcome! Oh yeah. I almost started on one of those a while back, and I have certainly put a lot of thought into the subject that we can discuss later. It is a bit of a can of worms in that the right long-run way to approach it will involve a bunch of refactoring. (That's why I haven't suggested it as a quick, clean, and self-contained demo of things utrace can do, like Frank's ftrace widget patch is. I also just hadn't thought about it in a while.) 
Please start a proper thread about that. Thanks, Roland From roland at redhat.com Thu Mar 5 20:27:08 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 5 Mar 2009 12:27:08 -0800 (PST) Subject: Q: utrace_attach_task && utrace_release_task In-Reply-To: Oleg Nesterov's message of Wednesday, 4 March 2009 22:27:35 +0100 <20090304212735.GA21703@redhat.com> References: <20090303200907.GA19207@redhat.com> <20090303230838.476AEFC3C9@magilla.sf.frob.com> <20090304212735.GA21703@redhat.com> Message-ID: <20090305202708.39216FC3BF@magilla.sf.frob.com> > utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean, > how can it help, we don't hold any locks, target can change its > ->exit_state right after the check. Good catch, thanks. This is a remnant of the utrace-indirect code, where utrace_first_engine() had an interlock with reap/release_task. (It's one of the several ways that arrangement is superior IMNSHO.) > I don't understand why utrace_release_task() doesn't set ->reap = 1 > unconditionally. In that case we could use this flag instead of > EXIT_DEAD to verify it is "safe" to attach or get_utrace_lock(). That's what I've made it do now. In the utrace-indirect setup, it was possible to avoid locks for the common case (nobody attached). > static inline int utrace_attach_delay(struct task_struct *target) [...] This is the same thing Ananth noticed. It was an unintended holdover from the utrace-indirect code organization. It's fixed now. 
Thanks,
Roland

From oleg at redhat.com Thu Mar 5 21:02:01 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 5 Mar 2009 22:02:01 +0100
Subject: Q: utrace_attach_task && utrace_release_task
In-Reply-To: <20090305202708.39216FC3BF@magilla.sf.frob.com>
References: <20090303200907.GA19207@redhat.com>
	<20090303230838.476AEFC3C9@magilla.sf.frob.com>
	<20090304212735.GA21703@redhat.com>
	<20090305202708.39216FC3BF@magilla.sf.frob.com>
Message-ID: <20090305210201.GA18181@redhat.com>

On 03/05, Roland McGrath wrote:
>
> > utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean,
> > how can it help, we don't hold any locks, target can change its
> > ->exit_state right after the check.
>
> Good catch, thanks. This is a remnant of the utrace-indirect code,
> where utrace_first_engine() had an interlock with reap/release_task.
> (It's one of the several ways that arrangement is superior IMNSHO.)
>
> > I don't understand why utrace_release_task() doesn't set ->reap = 1
> > unconditionally. In that case we could use this flag instead of
> > EXIT_DEAD to verify it is "safe" to attach or get_utrace_lock().
>
> That's what I've made it do now. In the utrace-indirect setup,
> it was possible to avoid locks for the common case (nobody attached).

Aha, I see the new patches...

What about get_utrace_lock()? Do we really need the EXIT_DEAD check?
And this check looks "racy" too.

> > static inline int utrace_attach_delay(struct task_struct *target)
> [...]
>
> This is the same thing Ananth noticed. It was an unintended holdover from
> the utrace-indirect code organization. It's fixed now.

Great, but utrace_attach_delay:

	if (signal_pending(current))
		return -ERESTARTNOINTR;

If utrace_attach_delay() fails, utrace_attach_task() returns this error.
This is right, but for example, prepare_ptrace_attach() will convert it
to EPERM?

Oleg.

From roland at redhat.com Thu Mar 5 21:52:46 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 5 Mar 2009 13:52:46 -0800 (PST)
Subject: Q: utrace_attach_task && utrace_release_task
In-Reply-To: Oleg Nesterov's message of Thursday, 5 March 2009 22:02:01 +0100 <20090305210201.GA18181@redhat.com>
References: <20090303200907.GA19207@redhat.com>
	<20090303230838.476AEFC3C9@magilla.sf.frob.com>
	<20090304212735.GA21703@redhat.com>
	<20090305202708.39216FC3BF@magilla.sf.frob.com>
	<20090305210201.GA18181@redhat.com>
Message-ID: <20090305215246.21006FC3BF@magilla.sf.frob.com>

> what about get_utrace_lock() ? Do we really need the EXIT_DEAD check?
> And this check looks "racy" too.

It is not strictly necessary any more, no. It now serves as an early
unsynchronized check before taking the utrace lock, rather than as a
reliable interlock. The same is now true of the check at the top of
utrace_attach_task. I'm not inclined to remove them. They don't hurt
now, and we'll need them back later to reimplement indirect struct utrace.

> If utrace_attach_delay() fails, utrace_attach_task() returns this error.
> This is right, but for example, prepare_ptrace_attach() will convert it
> to EPERM?

Good catch. But note that we are not really trying to review the
utrace-ptrace branch right now.


Thanks,
Roland

From jbaron at redhat.com Thu Mar 5 21:58:38 2009
From: jbaron at redhat.com (Jason Baron)
Date: Thu, 5 Mar 2009 21:58:38 +0000 (UTC)
Subject: [PATCH] Embed struct utrace in task_struct - V2
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID:

Roland McGrath <roland at redhat.com> writes:

>
> Hi, Ananth. Sorry everything has slid so long (again).
> (I have far too many hats and the past month not so many brains!)
> > Here is my immediate agenda for utrace hacking: > > * I have incorporated the "embed struct utrace" changes. > > I did various small bits of reorganization and cosmetic cleanup > first to make the actual data structure change a smaller patch. > Since things had changed around, I didn't actually use your patch. > I just did it over myself, but I think it's nearly the same. > > After this change, we now need some fresh testing of things like Frank's > ftrace widget and stap's utrace-using modes. (Nothing should have > changed from the utrace API perspective.) > > I've created the new branch "utrace-indirect" with a revert of the > change. I think this is really the better way to organize the data > structures, as I've mentioned before. After we get an initial utrace > merged in upstream, I intend to revive this branch and turn it into an > incremental patch to (re-)improve the data structures later on. That's > for later; for the time being, the branch will just sit idle. > > * I've renamed "struct utrace_attached_engine" to "struct utrace_engine". > This was a cosmetic suggestion in an earlier LKML review, and I could not > really find any good reason to keep the longer name. We all seem to say > "a utrace engine" in conversation when talking about this object. > > I added the UTRACE_API_VERSION macro to ease existing utrace-using code > adapting to old/new names. > > * I'll shortly scour the old review comments for more cosmetic things we > might change. > > * I would like to have a final "in-team" top-to-bottom review of the main > utrace patch before sending to LKML. i.e. maybe by you, Frank, me, and Oleg. 
> Each pair of eyeballs should:
> * make sure all barriers and other kinds of magic have adequate comments
>   explaining why they are there and why they are correct
> * cite anything else that sticks out and maybe needs more comments
> * make sure all comments are accurate and understandable
>

hi,

i've been looking at utrace.patch at:

http://people.redhat.com/roland/utrace/2.6-current/

hopefully, that's the latest one.

Anyways, i'm still looking it over, but one thing that sticks out for me
along these lines is the memory barriers and the usage of utrace->reporting.
It seems that this field is being used to exclude utrace_control when we are
in the middle of a callback. however, there aren't any comments about the
memory barriers and logic here, so it's hard for me to tell if it's correct...

thanks,

-Jason

From roland at redhat.com Thu Mar 5 22:09:22 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 5 Mar 2009 14:09:22 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Jason Baron's message of Thursday, 5 March 2009 21:58:38 +0000
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID: <20090305220922.F2C03FC3BF@magilla.sf.frob.com>

> i've been looking at utrace.patch at:
>
> http://people.redhat.com/roland/utrace/2.6-current/
>
> hopefully, that's the latest one.

Yes, it's updated frequently. The .id files tell you what git commit the
patch corresponds to, so we can be mutually clear in making references.
0ef2243a is the utrace branch head at the moment.

> Anyways, i'm still looking it over, but one thing that sticks out for me along
> these lines is the memory barriers and the usage of utrace->reporting. It seems
> that this field is being used to exclude utrace_control when we are in the middle
> of a callback.
> however, there aren't any comments about the memory barriers and
> logic here, so it's hard for me to tell if it's correct...

For some reason I felt sure I'd put some comments about that in a long time ago. But indeed I see they are not there. I'll write some up. This is exactly why I need you all doing this review!

Thanks very much,
Roland

From mhiramat at redhat.com Thu Mar 5 23:01:12 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Thu, 05 Mar 2009 18:01:12 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <49AF3480.1040804@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com> <49A85902.8000306@redhat.com> <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> <49AF3480.1040804@redhat.com>
Message-ID: <49B059B8.8090702@redhat.com>

Hi Jim and Sriker,

Here, I almost rewrote my patch.

Changelog:
- rewrite decoding logic based on Intel's manual.
- support insn_get_sib(), insn_get_displacement() and insn_get_immediate() too.
- support 3-byte opcodes and 64-bit immediates.
- introduce some bitmaps.

Thank you,

Masami Hiramatsu wrote:
> Hi Jim,
>
> Jim Keniston wrote:
>> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
>> ...
>>> Here are a patch against your code and example code for an
>>> instruction length decoder.
>>> Curiously, KVM's instruction decoder does not completely
>>> cover all instructions (especially, Jcc/test...).
>>> I had to refer to the Intel manuals.
>>>
>>> Moreover, even with this patch, the decoder is incomplete.
>>> - this doesn't cover 3-byte opcodes yet.
>>> - this doesn't decode sib, displacement and immediate.
>>> - might have some bugs :-(
>>>
>>>
>>> Thank you,
>> Thanks for your work on this. Comments below.
>
> Thank you very much for the review!
> > Actually, that code was based on KVM code, so I also found many > opcodes were not supported. > >> As I mentioned in private email, you or I should probably refactor this >> into: >> - insn_get_sib() >> - insn_get_displacement() >> - insn_get_immediate() >> - insn_get_length() > > Agreed, these should be supported. > > I also would like to change struct insn as below; > > struct insn { > struct insn_field prefixes; /* prefixes.value is a bitmap */ > struct insn_field opcode; /* opcode.bytes[n] == opcode_n */ > struct insn_field modrm; > struct insn_field sib; > struct insn_field displacement; > union { > struct insn_field immediate; > struct insn_field moffset1; /* for 64bit MOV */ > struct insn_field immediate1; /* for 64bit imm or off16/32 */ > }; > union { > struct insn_field moffset2; /* for 64bit MOV */ > struct insn_field immediate2; /* for 64bit imm or seg16 */ > }; > > u8 opnd_bytes; > u8 addr_bytes; > u8 length; > bool x86_64; > > const u8 *kaddr; /* kernel address of insn (copy) to analyze */ > const u8 *next_byte; > }; > > opcode2 and opcode3 will be stored in opcode.value with opcode1. > > Now, I'm updating my code. Would anyone also be working on it? > > Thank you, > >> Jim >> >>> plain text document attachment (insn_x86.patch) >>> Index: insn_x86.h >>> =================================================================== >>> --- insn_x86.h (revision 1510) >>> +++ insn_x86.h (working copy) >>> @@ -66,6 +66,10 @@ >>> struct insn_field displacement; >>> struct insn_field immediate; >>> >>> + u8 op_bytes; >> I'd probably use opnd_bytes and addr_bytes here, for clarity. (When I >> first saw "op", I thought "opcode".) Also, we should clarify that these >> are the EFFECTIVE lengths, not the lengths of the immediate and >> displacement fields in the instruction. 
>> >>> + u8 ad_bytes; >>> + u8 length; >>> + >>> const u8 *kaddr; /* kernel address of insn (copy) to analyze */ >>> const u8 *next_byte; >>> bool x86_64; >>> @@ -75,6 +79,7 @@ >>> extern void insn_get_prefixes(struct insn *insn); >>> extern void insn_get_opcode(struct insn *insn); >>> extern void insn_get_modrm(struct insn *insn); >>> +extern void insn_get_length(struct insn *insn); >>> >>> #ifdef CONFIG_X86_64 >>> extern bool insn_rip_relative(struct insn *insn); >>> Index: insn_x86.c >>> =================================================================== >>> --- insn_x86.c (revision 1510) >>> +++ insn_x86.c (working copy) >>> @@ -17,7 +17,7 @@ >>> * >>> * Copyright (C) IBM Corporation, 2002, 2004, 2009 >>> */ >>> - >>> +#include >>> #include >>> // #include >>> #include "insn_x86.h" >>> @@ -34,6 +34,11 @@ >>> insn->kaddr = kaddr; >>> insn->next_byte = kaddr; >>> insn->x86_64 = x86_64; >>> + insn->op_bytes = 4; >>> + if (x86_64) >>> + insn->ad_bytes = 8; >>> + else >>> + insn->ad_bytes = 4; >>> } >>> EXPORT_SYMBOL_GPL(insn_init); >>> >>> @@ -79,10 +84,51 @@ >>> break; >>> prefixes->value |= pfx; >>> } >>> + if (prefixes->value & X86_PFX_OPNDSZ) { >>> + /* oprand size switches 2/4 */ >>> + insn->op_bytes ^= 6; >>> + } >>> + if (prefixes->value & X86_PFX_ADDRSZ) { >>> + /* address size switches 2/4 or 4/8 */ >>> +#ifdef CONFIG_X86_64 >>> + if (insn->x86_64) >>> + insn->op_bytes ^= 12; >>> + else >>> +#endif >>> + insn->op_bytes ^= 6; >> This seems wrong. You're checking the address-size prefix, but >> adjusting the operand size. >> >>> + } >>> +#ifdef CONFIG_X86_64 >>> + if (prefixes->value & X86_PFX_REXW) >>> + insn->op_bytes = 8; >>> +#endif >>> prefixes->got = true; >>> } >>> EXPORT_SYMBOL_GPL(insn_get_prefixes); >>> >>> +static bool __insn_is_stack(struct insn *insn) >> It's not entirely clear to me what this function checks. (A more >> precise name might help.) 
You have pushes, pops, and calls here, but >> you also have some instructions that don't appear to affect the stack at >> all. And other push and pop instructions are missing. >> >>> +{ >>> + u8 reg; >>> + if (insn->opcode.nbytes == 2) >>> + return 0; >> The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9. >> >> Also, since the return value is bool, I'd prefer to see true/false >> rather than 1/0. >> >>> + >>> + switch(insn->opcode1) { >>> + case 0x68: >>> + case 0x6a: >>> + case 0x9c: >>> + case 0x9d: >>> + case 0xc5: >> 0xc5 = lds. Why lds? >> >> In general, it'd be nice to add a comment showing the mnemonic next to >> each hex value -- e.g., >> case 0x68: /* push */ >> >>> + case 0xe8: >>> + return 1; >>> + } >> Other related instructions: 9a, 1f, 07, 17, 8f. >> >>> + reg = ((*insn->next_byte) >> 3) & 7; >>> + if ((insn->opcode1 & 0xf0) == 0x50 || >>> + (insn->opcode1 == 0x1a && reg == 0) || >> The above line doesn't seem right. It catches things like >> sbb (%rax),%al . >> >>> + (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) { >> Looks like the interesting reg values are 2 (call), 3 (call), and 6 >> (push). >> >>> + return 1; >>> + } >>> + return 0; >>> +} >>> + >>> /** >>> * insn_get_opcode - collect opcode(s) >>> * @insn: &struct insn containing instruction >>> @@ -108,6 +154,8 @@ >>> opcode->nbytes = 1; >>> opcode->value = insn->opcode1; >>> opcode->got = true; >>> + if (insn->x86_64 && __insn_is_stack(insn)) >>> + insn->op_bytes = 8; >>> } >>> EXPORT_SYMBOL_GPL(insn_get_opcode); >>> >>> @@ -208,3 +256,115 @@ >>> } >>> EXPORT_SYMBOL_GPL(insn_rip_relative); >>> #endif >>> + >>> +/** >>> + * >>> + * insn_get_length() - Get the length of instruction >>> + * @insn: &struct insn containing instruction >>> + * >>> + * If necessary, first collects the instruction up to and including the >>> + * ModRM byte. 
>>> + */ >> As I mentioned in private email, you or I should probably refactor this >> into: >> - insn_get_sib() >> - insn_get_displacement() >> - insn_get_immediate() >> - insn_get_length() >> >> BTW, I'm going to have to change my definition of insn_field to >> accommodate the 8-byte fields that can be found in instructions like >> a0-a3 (8-byte displacement) and b8-bf (8-byte immediate). >> >>> +void insn_get_length(struct insn *insn) >>> +{ >>> + u8 modrm; >>> + u8 mod = 0, reg = 0, rm = 0, sib; >>> + const u8 *next_byte; >>> + if (insn->length) >>> + return; >>> + if (!insn->modrm.got) >>> + insn_get_modrm(insn); >>> + next_byte = insn->next_byte; >> This of course assumes that no fields have been fetched beyond the modrm >> field. >> >>> + >>> + if (insn->modrm.nbytes) { >>> + modrm = insn->modrm.value; >>> + mod = (modrm & 0xc0) >> 6; >>> + reg = (modrm & 0x38) >> 3; >>> + rm = (modrm & 0x07); >> Some comments here would really help -- e.g... >> /* >> Interpreting the modrm byte: >> mod = 00 - no displacement fields (exceptions below) >> mod = 01 - 1-byte displacement field >> mod = 10 - displacement field is 4 bytes, or 2 bytes if >> address size = 2 (0x67 prefix in 32-bit mode) >> mod = 11 - no memory operand >> >> If address size = 2... >> mod = 00, r/m = 110 - displacement field is 2 bytes >> >> If address size != 2... 
>> mod != 11, r/m = 100 - SIB byte exists >> mod = 00, SIB base field = 101 - displacement field is 4 bytes >> mod = 00, r/m = 101 - rip-relative addressing, displacement >> field is 4 bytes >> */ >> >>> + if (mod == 3) >>> + goto decode_src; >>> + if (insn->ad_bytes == 2) { >>> + if (mod == 1) >>> + next_byte++; >>> + else if (mod == 2) >>> + next_byte += 2; >>> + else if (rm == 6) >>> + next_byte += 2; >>> + } else { >>> + if (rm == 4) { >>> + sib = *(next_byte++); >>> + insn->sib.value = sib; >>> + insn->sib.nbytes = 1; >>> + insn->sib.got = 1; >>> + if ((sib & 7) == 5 && mod == 0) >>> + next_byte += 4; >>> + } >>> + if (mod == 1) >>> + next_byte++; >>> + else if (mod == 2) >>> + next_byte += 4; >>> + else if (rm == 5) >>> + next_byte += 4; >>> + } >>> + } else if (insn->opcode.nbytes == 1) >>> + if (0xa0 <= insn->opcode1 && insn->opcode1 < 0xa4) >> Add comment: >> /* Displacement = entire address - up to 8 bytes */ >> >>> + next_byte += insn->ad_bytes; >>> +decode_src: >> decode_src is a misnomer. Here we're decoding the immediate operand >> (which is always a source operand, but not the only kind). >> >>> + if (insn->opcode.nbytes == 1) { >>> + switch (insn->opcode1) { >>> + case 0x05: >>> + case 0x25: >> What about (hex) 15, 35, 01, 0d, 2d? >> >>> + case 0x3d: >>> + case 0x68: // pushl >>> + case 0x69: // imul >>> + case 0x9a: /* long call */ >> 0x9a (lcall) seems to have 6 bytes following the opcode, disassembled as >> 2 immediate operands. >> >>> + case 0xa9: // test >>> + case 0xc7: >>> + case 0xe8: >>> + case 0xe9: >>> + case 0xea: /* long jump */ >> Similarly, 0xea (ljmp) seems to have 6 bytes following the opcode, >> disassembled as 2 immediate operands. >> >>> + case 0x82: /* Group */ >> s/82/81/ here. >> >>> + goto imm_common; >>> + case 0x04: >>> + case 0x24: >> What about (hex) 14, 34, 0c, 1c, 2c? 
>> >>> + case 0x3c: >>> + case 0x6a: //pushb >>> + case 0x6b: //imul >>> + case 0xa8: //testb >>> + case 0xeb: >>> + case 0xc0: >>> + case 0xc1: >>> + case 0xc6: >>> + case 0x80: /* Group */ >>> + case 0x81: /* Group */ >> s/81/82/ here. >> >>> + case 0x83: /* Group */ >>> + goto immbyte_common; >>> + } >>> + if ((insn->opcode1 & 0xf8) == 0xb8 || >> I don't think this is right. b8-bf can have 8-byte immediate fields >> (with 0x48 prefix). >> >>> + (insn->opcode1 == 0xf7 && reg == 0 >> or reg == 1 >> >>> ) ) { >>> +imm_common: >> Jumping into the middle of an if block is ugly, and not necessary here. >> >>> + next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes; >>> + } else if ((insn->opcode1 & 0xf8) == 0xb0 || // >>> + (insn->opcode1 & 0xf0) == 0x70 || // Jcc >>> + (insn->opcode1 & 0xf8) == 0xe0 || // loop/in/out >>> + (insn->opcode1 == 0xf6 && reg == 0)) { >>> +immbyte_common: >> Jumping into the middle of an if block is ugly, and not necessary here. >> >>> + next_byte++; >>> + } >> 0xc8 and 0xcd are weird cases that we should handle . >> >>> + } else { >>> + switch (insn->opcode2) { >> Add 0x70. >> >>> + case 0xa4: >>> + case 0xac: >>> + case 0xba: >>> + case 0x0f: // 3dnow >>> + case 0x3a: // ssse3 >>> + next_byte++; >>> + break; >>> + default: >>> + if ((insn->opcode2 & 0xf0) == 0x80) >>> + next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes; >>> + } >>> + } >>> + insn->length = (u8)(next_byte - insn->kaddr); >>> +} >>> +EXPORT_SYMBOL_GPL(insn_get_length); >>> > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat at redhat.com -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: insn_x86.patch URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: insndec.c URL: From renzo at cs.unibo.it Fri Mar 6 10:35:44 2009 From: renzo at cs.unibo.it (Renzo Davoli) Date: Fri, 6 Mar 2009 11:35:44 +0100 Subject: [PATCH] UTRACE_STOP race condition? In-Reply-To: <20090213202925.GE28685@cs.unibo.it> References: <20090211095946.GA2597@cs.unibo.it> <20090213202925.GE28685@cs.unibo.it> Message-ID: <20090306103544.GH28098@cs.unibo.it> Dear Roland, dear utrace developers, I have updated my patch #1 (it solves the race condition on utrace_stop but not the nesting issue) for the latest version of utrace. renzo On Fri, Feb 13, 2009 at 09:29:25PM +0100, Renzo Davoli wrote: > I have now a complete patch that seems to be quite stable. > At least Kmview have passed through the tests without getting stuck randomly for the race condition. > --- --- kernel/utrace.c.mcgrath 2009-03-05 15:09:57.000000000 +0100 +++ kernel/utrace.c 2009-03-06 11:20:48.000000000 +0100 @@ -369,6 +369,13 @@ return killed; } +static void mark_engine_wants_stop(struct utrace_engine *engine); +static void clear_engine_wants_stop(struct utrace_engine *engine); +static bool engine_wants_stop(struct utrace_engine *engine); +static void mark_engine_wants_resume(struct utrace_engine *engine); +static void clear_engine_wants_resume(struct utrace_engine *engine); +static bool engine_wants_resume(struct utrace_engine *engine); + /* * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up. * @task == current, @utrace == current->utrace, which is not locked. @@ -378,6 +385,7 @@ static bool utrace_stop(struct task_struct *task, struct utrace *utrace) { bool killed; + struct utrace_engine *engine, *next; /* * @utrace->stopped is the flag that says we are safely @@ -399,7 +407,23 @@ return true; } - utrace->stopped = 1; + /* final check: it is really needed to stop? 
*/ + list_for_each_entry_safe(engine, next, &utrace->attached, entry) { + if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) { + if (engine_wants_resume(engine)) { + clear_engine_wants_stop(engine); + clear_engine_wants_resume(engine); + } + else + utrace->stopped = 1; + } + } + if (unlikely(!utrace->stopped)) { + spin_unlock_irq(&task->sighand->siglock); + spin_unlock(&utrace->lock); + return false; + } + __set_current_state(TASK_TRACED); /* @@ -625,6 +649,7 @@ * to record whether the engine is keeping the target thread stopped. */ #define ENGINE_STOP (1UL << _UTRACE_NEVENTS) +#define ENGINE_RESUME (1UL << (_UTRACE_NEVENTS+1)) static void mark_engine_wants_stop(struct utrace_engine *engine) { @@ -641,6 +666,21 @@ return (engine->flags & ENGINE_STOP) != 0; } +static void mark_engine_wants_resume(struct utrace_engine *engine) +{ + engine->flags |= ENGINE_RESUME; +} + +static void clear_engine_wants_resume(struct utrace_engine *engine) +{ + engine->flags &= ~ENGINE_RESUME; +} + +static bool engine_wants_resume(struct utrace_engine *engine) +{ + return (engine->flags & ENGINE_RESUME) != 0; +} + /** * utrace_set_events - choose which event reports a tracing engine gets * @target: thread to affect @@ -891,6 +931,10 @@ list_move(&engine->entry, &detached); } else { flags |= engine->flags | UTRACE_EVENT(REAP); + if (engine_wants_resume(engine)) { + clear_engine_wants_stop(engine); + clear_engine_wants_resume(engine); + } wake = wake && !engine_wants_stop(engine); } } @@ -1110,6 +1154,7 @@ * There might not be another report before it just * resumes, so make sure single-step is not left set. 
*/ + mark_engine_wants_resume(engine); if (likely(resume)) user_disable_single_step(target); break;

From renzo at cs.unibo.it Fri Mar 6 11:03:31 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Fri, 6 Mar 2009 12:03:31 +0100
Subject: [PATCH] #2 UTRACE_STOP race condition & nesting
In-Reply-To: <20090214091155.GA3582@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it> <20090213202925.GE28685@cs.unibo.it> <20090214091155.GA3582@cs.unibo.it>
Message-ID: <20090306110331.GI28098@cs.unibo.it>

Dear Roland, dear utrace developers,

I have also updated the second patch (which includes the first). This patch fixes the utrace_stop race condition and implements a consistent model of tracing engine nesting.

renzo

On Sat, Feb 14, 2009 at 10:11:55AM +0100, Renzo Davoli wrote:
>
> This is an updated patch. It solves the race condition + it gives a quick (a bit dirty)
> solution to issues 3&4.
> 3- Nesting, is it really useful to run all the reports in a row and
> (eventually) stop at the end waiting for all the engines?
> The patch waits for each engine to resume before notifying the next registered engine.
> 4- report_syscall_entry engines evaluation order should be reversed
> REPORT macros have an extra "reverse" argument. The macros append this string to the
> list_for_each_entry_safe function name. All the macro calls skip this argument except
> the one in report_syscall_entry where it is set to _reverse.
>
> With this patch it is possible to run nested kmview machines and ptrace works inside
> the virtual machines.
>
> This patch is "a bit dirty" because variables and sections of code needed to count and test
> the stopped engines are useless here: a task can be kept stopped for at most one engine at
> a time.
>
> This patch is a proof-of-concept to show what I meant in my previous message.
>
> For what concerns 1&2 (not included in this patch):
> 1- Virtual Machines may need to change the system call
> This is just to simplify the implementation of arch.
independent virtual machine. > I have kept the definition of missing functions in the kmview module code. > 2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for > report_syscall_entry? > It is useless for kmview as the decision of aborting the system call is taken while > the process is stopped, I am currently setting the syscall number to -1 to skip the syscall. > > For the sake of completeness there is another way to implement the partial virtual machine > stuff by introducing another "quiescence" state inside the report upcalls. > I mean: when utrace calls a report function (say for example report_syscall_entry), the function > in the module puts the process in a stopped state (maybe its TASK_TRACED and calls the schedule). > >From utrace's point of view the report function does not return until all the changes in > the task state have been completed and the decision UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken. > In this way UTRACE_STOP is never used because the module has to implement another feature > similar to UTRACE_STOP on its own. So what is UTRACE_STOP for? > > ciao > renzo --- --- kernel/utrace.c.mcgrath 2009-03-05 15:09:57.000000000 +0100 +++ kernel/utrace.c 2009-03-06 11:49:15.000000000 +0100 @@ -369,6 +369,13 @@ return killed; } +static void mark_engine_wants_stop(struct utrace_engine *engine); +static void clear_engine_wants_stop(struct utrace_engine *engine); +static bool engine_wants_stop(struct utrace_engine *engine); +static void mark_engine_wants_resume(struct utrace_engine *engine); +static void clear_engine_wants_resume(struct utrace_engine *engine); +static bool engine_wants_resume(struct utrace_engine *engine); + /* * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up. * @task == current, @utrace == current->utrace, which is not locked. 
@@ -378,6 +385,7 @@ static bool utrace_stop(struct task_struct *task, struct utrace *utrace) { bool killed; + struct utrace_engine *engine, *next; /* * @utrace->stopped is the flag that says we are safely @@ -399,7 +407,23 @@ return true; } - utrace->stopped = 1; + /* final check: is really needed to stop? */ + list_for_each_entry_safe(engine, next, &utrace->attached, entry) { + if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) { + if (engine_wants_resume(engine)) { + clear_engine_wants_stop(engine); + clear_engine_wants_resume(engine); + } + else + utrace->stopped = 1; + } + } + if (unlikely(!utrace->stopped)) { + spin_unlock_irq(&task->sighand->siglock); + spin_unlock(&utrace->lock); + return false; + } + __set_current_state(TASK_TRACED); /* @@ -625,6 +649,7 @@ * to record whether the engine is keeping the target thread stopped. */ #define ENGINE_STOP (1UL << _UTRACE_NEVENTS) +#define ENGINE_RESUME (1UL << (_UTRACE_NEVENTS+1)) static void mark_engine_wants_stop(struct utrace_engine *engine) { @@ -641,6 +666,21 @@ return (engine->flags & ENGINE_STOP) != 0; } +static void mark_engine_wants_resume(struct utrace_engine *engine) +{ + engine->flags |= ENGINE_RESUME; +} + +static void clear_engine_wants_resume(struct utrace_engine *engine) +{ + engine->flags &= ~ENGINE_RESUME; +} + +static bool engine_wants_resume(struct utrace_engine *engine) +{ + return (engine->flags & ENGINE_RESUME) != 0; +} + /** * utrace_set_events - choose which event reports a tracing engine gets * @target: thread to affect @@ -891,6 +931,10 @@ list_move(&engine->entry, &detached); } else { flags |= engine->flags | UTRACE_EVENT(REAP); + if (engine_wants_resume(engine)) { + clear_engine_wants_stop(engine); + clear_engine_wants_resume(engine); + } wake = wake && !engine_wants_stop(engine); } } @@ -1110,6 +1154,7 @@ * There might not be another report before it just * resumes, so make sure single-step is not left set. 
*/ + mark_engine_wants_resume(engine); if (likely(resume)) user_disable_single_step(target); break; @@ -1326,6 +1371,7 @@ static bool finish_callback(struct utrace *utrace, struct utrace_report *report, struct utrace_engine *engine, + struct task_struct *task, u32 ret) { enum utrace_resume_action action = utrace_resume_action(ret); @@ -1347,6 +1393,7 @@ spin_lock(&utrace->lock); mark_engine_wants_stop(engine); spin_unlock(&utrace->lock); + utrace_stop(task, utrace); } } else if (engine_wants_stop(engine)) { spin_lock(&utrace->lock); @@ -1401,7 +1448,7 @@ ops = engine->ops; if (want & UTRACE_EVENT(QUIESCE)) { - if (finish_callback(utrace, report, engine, + if (finish_callback(utrace, report, engine, task, (*ops->report_quiesce)(report->action, engine, task, event))) @@ -1430,25 +1477,25 @@ * @callback is the name of the member in the ops vector, and remaining * args are the extras it takes after the standard three args. */ -#define REPORT(task, utrace, report, event, callback, ...) \ +#define REPORT(reverse, task, utrace, report, event, callback, ...) \ do { \ start_report(utrace); \ - REPORT_CALLBACKS(task, utrace, report, event, callback, \ + REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, \ (report)->action, engine, current, \ ## __VA_ARGS__); \ finish_report(report, task, utrace); \ } while (0) -#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...) \ +#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...) 
\ do { \ struct utrace_engine *engine, *next; \ const struct utrace_engine_ops *ops; \ - list_for_each_entry_safe(engine, next, \ + list_for_each_entry_safe ## reverse(engine, next, \ &utrace->attached, entry) { \ ops = start_callback(utrace, report, engine, task, \ event); \ if (!ops) \ continue; \ - finish_callback(utrace, report, engine, \ + finish_callback(utrace, report, engine, task, \ (*ops->callback)(__VA_ARGS__)); \ } \ } while (0) @@ -1463,7 +1510,7 @@ struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); - REPORT(task, utrace, &report, UTRACE_EVENT(EXEC), + REPORT(,task, utrace, &report, UTRACE_EVENT(EXEC), report_exec, fmt, bprm, regs); } @@ -1478,7 +1525,7 @@ INIT_REPORT(report); start_report(utrace); - REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY), + REPORT_CALLBACKS(_reverse,task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY), report_syscall_entry, report.result | report.action, engine, current, regs); finish_report(&report, task, utrace); @@ -1520,7 +1567,7 @@ struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); - REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT), + REPORT(,task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT), report_syscall_exit, regs); } @@ -1536,7 +1583,7 @@ struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); - REPORT(task, utrace, &report, UTRACE_EVENT(CLONE), + REPORT(,task, utrace, &report, UTRACE_EVENT(CLONE), report_clone, clone_flags, child); /* @@ -1600,7 +1647,7 @@ utrace->report = 0; spin_unlock(&utrace->lock); - REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), + REPORT(,task, utrace, &report, UTRACE_EVENT(JCTL), report_jctl, was_stopped ? 
CLD_STOPPED : CLD_CONTINUED, what); if (was_stopped && !task_is_stopped(task)) { @@ -1637,7 +1684,7 @@ INIT_REPORT(report); long orig_code = *exit_code; - REPORT(task, utrace, &report, UTRACE_EVENT(EXIT), + REPORT(,task, utrace, &report, UTRACE_EVENT(EXIT), report_exit, orig_code, exit_code); if (report.action == UTRACE_STOP) @@ -1676,7 +1723,7 @@ utrace->interrupt = 0; spin_unlock(&utrace->lock); - REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH), + REPORT_CALLBACKS(,task, utrace, &report, UTRACE_EVENT(DEATH), report_death, engine, task, group_dead, signal); spin_lock(&utrace->lock); @@ -2018,7 +2065,7 @@ break; } - finish_callback(utrace, &report, engine, ret); + finish_callback(utrace, &report, engine, task, ret); } /* From ananth at in.ibm.com Fri Mar 6 15:41:34 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 6 Mar 2009 21:11:34 +0530 Subject: [BUG] utrace_attach_task() never returns when called from the report_clone callback Message-ID: <20090306154134.GB15133@in.ibm.com> Roland, With the current utrace/master tree, I am seeing that utrace_attach_task() never returns when invoked from the clone callback. The same module works fine with prior utrace (rcu as well as with my embed version). The testcase is simple: a. attach an engine to attachstop-mt (from the gdb testsuite) _before_ it calls pthread_create. b. Watch for CLONE_THREAD and try to attach a utrace engine to the new thread. The utrace_attach_task() call never returns. 
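For readers following step (b), the check on the clone flags can be sketched in user space as below. This is only an illustration and not part of the posted test module: the helper name creates_thread() is hypothetical, and the fallback #defines simply repeat the standard Linux flag values in case the C library headers hide them.

```c
/* User-space illustration of the CLONE_THREAD test from step (b).
 * creates_thread() is a hypothetical helper, not a utrace API. */
#define _GNU_SOURCE
#include <sched.h>     /* CLONE_* flags when GNU extensions are visible */
#include <stdbool.h>

/* Fallbacks with the standard Linux values, used only if <sched.h>
 * did not expose the flags. */
#ifndef CLONE_VM
#define CLONE_VM      0x00000100
#endif
#ifndef CLONE_SIGHAND
#define CLONE_SIGHAND 0x00000800
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD  0x00010000
#endif

/* True when the clone request adds a thread to the caller's thread group,
 * as opposed to creating a separate process (fork/vfork style). */
static bool creates_thread(unsigned long clone_flags)
{
	return (clone_flags & CLONE_THREAD) != 0;
}
```

A report_clone callback receives the same flag word in its clone_flags argument, so a module can use this kind of test to decide whether the child is a new thread of the traced process before it tries to attach an engine to it.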
If the utrace module is unloaded, the kernel barfs with the following innocuous information: BUG: unable to handle kernel paging request at fffffffffffffdff IP: [] 0xffffffffa012009a PGD 203067 PUD 204067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/devices/pci0000:01/0000:01:01.1/irq CPU 6 Modules linked in: [last unloaded: utrace_quiesce_threads] Pid: 6203, comm: attachstop-mt Not tainted 2.6.29-rc7-ut #1 eserver xSeries 366-[88632RA]- RIP: 0010:[] [] 0xffffffffa012009a RSP: 0018:ffff8801d34ebe10 EFLAGS: 00010246 RAX: fffffffffffffdff RBX: ffff8801f11a36c0 RCX: 00000000c0000100 RDX: 0000000000000000 RSI: ffff8801dd0507f8 RDI: ffff88022daf4500 RBP: 00000000fffffff4 R08: ffff8801d34ea000 R09: ffff88022f2596a0 R10: ffff8800280b1600 R11: 0000000000000018 R12: ffff8801d34f1860 R13: ffff8802210dd300 R14: ffff8801dd07e2c0 R15: 00000000003d0f00 FS: 00007f58c8d286e0(0000) GS:ffff88022f18e5c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: fffffffffffffdff CR3: 00000002029bd000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process attachstop-mt (pid: 6203, threadinfo ffff8801d34ea000, task ffff8801d3512440) Stack: 00000000003d0f00 ffff8801d34f1860 ffff8802210dd300 ffff8801d3512440 ffff8801d34ebe70 ffffffffa012028d ffff8801dd050618 ffff8801d35129e0 ffff8801d35129d8 ffffffff80260480 0000000000000000 ffff8801d34f1860 Call Trace: [] ? utrace_report_clone+0x95/0xfc [] ? do_fork+0x20b/0x2f3 [] ? do_page_fault+0x3c7/0x74e [] ? stub_clone+0x13/0x20 [] ? system_call_fastpath+0x16/0x1b Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 RIP [] 0xffffffffa012009a RSP CR2: fffffffffffffdff ---[ end trace 96bb7eb644ab73a4 ]--- I have verified that the earlier version of utrace works just fine. 
In the earlier case, the engine would go directly on to the attached list if the calling task was the creator of the new thread. This has changed with the new implementation. I haven't yet zeroed in on what exact change caused this problem. Ananth From fche at redhat.com Fri Mar 6 15:42:46 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Fri, 6 Mar 2009 10:42:46 -0500 Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: <20090303231401.3376CFC3C9@magilla.sf.frob.com> References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com> <20090302120754.9A64AFC3C6@magilla.sf.frob.com> <20090303231401.3376CFC3C9@magilla.sf.frob.com> Message-ID: <20090306154246.GE32581@redhat.com> Hi - On Tue, Mar 03, 2009 at 03:14:01PM -0800, Roland McGrath wrote: > > > * When we on the team think the utrace patch is ready to post, we need to > > > do a coordinated post of Frank's ftrace widget. [...] > > > > Would you consider simply merging it into your git tree / patch suite? > > Sure. The way to do that is for you to publish a git repository that I can > pull from. [...] OK: The following changes since commit 0ef2243aeae481f1c0f1edd23a8259bd20331b00: Roland McGrath (1): Merge remote branch 'upstream/HEAD' of /home/roland/redhat/linux/2.6/ into utrace are available in the git repository at: http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace Frank Ch. 
Eigler (1): utrace-based ftrace "process" engine, v2 include/linux/processtrace.h | 41 +++ kernel/trace/Kconfig | 9 + kernel/trace/Makefile | 1 + kernel/trace/trace.h | 30 ++- kernel/trace/trace_process.c | 591 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 661 insertions(+), 11 deletions(-) create mode 100644 include/linux/processtrace.h create mode 100644 kernel/trace/trace_process.c - FChE From roland at redhat.com Fri Mar 6 20:49:46 2009 From: roland at redhat.com (Roland McGrath) Date: Fri, 6 Mar 2009 12:49:46 -0800 (PST) Subject: [PATCH] Embed struct utrace in task_struct - V2 In-Reply-To: Frank Ch. Eigler's message of Friday, 6 March 2009 10:42:46 -0500 <20090306154246.GE32581@redhat.com> References: <20090119132838.GA3542@in.ibm.com> <20090119232031.82675FC3C6@magilla.sf.frob.com> <20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com> <20090302120754.9A64AFC3C6@magilla.sf.frob.com> <20090303231401.3376CFC3C9@magilla.sf.frob.com> <20090306154246.GE32581@redhat.com> Message-ID: <20090306204946.38DEBFC3BF@magilla.sf.frob.com> > http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace > > Frank Ch. Eigler (1): > utrace-based ftrace "process" engine, v2 Thanks, Frank. Your branch is now in my repo and its patch generated in 2.6-current/. I'll pull periodically, or let me know if my repo lags behind yours in future. Thanks, Roland From roland at redhat.com Fri Mar 6 20:52:34 2009 From: roland at redhat.com (Roland McGrath) Date: Fri, 6 Mar 2009 12:52:34 -0800 (PST) Subject: [BUG] utrace_attach_task() never returns when called from the report_clone callback In-Reply-To: Ananth N Mavinakayanahalli's message of Friday, 6 March 2009 21:11:34 +0530 <20090306154134.GB15133@in.ibm.com> References: <20090306154134.GB15133@in.ibm.com> Message-ID: <20090306205234.0A759FC3BF@magilla.sf.frob.com> > With the current utrace/master tree, I am seeing that utrace_attach_task() > never returns when invoked from the clone callback. 
The same module > works fine with prior utrace (rcu as well as with my embed version). I changed the utrace_attach_delay() logic recently. That is probably it. Please post your test case. Thanks, Roland From ananth at in.ibm.com Sat Mar 7 01:44:50 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Sat, 7 Mar 2009 07:14:50 +0530 Subject: [BUG] utrace_attach_task() never returns when called from the report_clone callback In-Reply-To: <20090306205234.0A759FC3BF@magilla.sf.frob.com> References: <20090306154134.GB15133@in.ibm.com> <20090306205234.0A759FC3BF@magilla.sf.frob.com> Message-ID: <20090307014449.GC15133@in.ibm.com> On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote: > > With the current utrace/master tree, I am seeing that utrace_attach_task() > > never returns when invoked from the clone callback. The same module > > works fine with prior utrace (rcu as well as with my embed version). > > I changed the utrace_attach_delay() logic recently. That is probably it. Right, reverting dd30e86355e fixes the problem. > Please post your test case. Here it is -- does nothing much really :) I used this module in conjunction with attachstop_mt with an engine attaching to it before the pthread_create(). --- #include #include #include MODULE_DESCRIPTION("Utrace tests"); MODULE_LICENSE("GPL"); static int target_pid; module_param_named(pid, target_pid, int, 0); /* Structure for all threads of a process having the same utrace ops */ struct proc_utrace { struct task_struct *tgid_task; /* list of task_utrace structs */ struct list_head list; unsigned int num_threads; }; struct task_utrace { struct list_head list; struct task_struct *task; /* TODO: Get rid of this and use MATCHING_OPS on task? 
*/ struct utrace_engine *engine; }; static const struct utrace_engine_ops ut_ops; static struct task_utrace *get_task_ut(struct task_struct *task, struct proc_utrace *proc_ut) { struct task_utrace *task_ut, *temp; list_for_each_entry_safe(task_ut, temp, &proc_ut->list, list) { if (task_ut->task == task) return task_ut; } return NULL; } static int cleanup_proc_ut(struct proc_utrace *proc_ut) { int ret = 0; struct task_utrace *task_ut, *temp; printk(KERN_INFO "Cleanup_proc_ut\n"); if (proc_ut == NULL) return 0; if (list_empty(&proc_ut->list)) goto out; /* walk proc_ut->list and free task_ut */ list_for_each_entry_safe(task_ut, temp, &proc_ut->list, list) { if (task_ut->engine) { printk(KERN_INFO "Calling detach for %d\n", task_pid_nr(task_ut->task)); ret = utrace_control(task_ut->task, task_ut->engine, UTRACE_DETACH); if (ret) printk(KERN_INFO "utrace_detach returned %d\n", ret); printk(KERN_INFO "Detached engine for %d\n", task_pid_nr(task_ut->task)); } list_del(&task_ut->list); kfree(task_ut); } out: kfree(proc_ut); return ret; } static int setup_task_ut(struct task_struct *t, struct proc_utrace *proc_ut) { struct task_utrace *task_ut; int ret = 0; if (!t || !proc_ut) return -EINVAL; printk(KERN_INFO "setup_task_ut: attaching for task %d\n", task_pid_nr(t)); task_ut = kzalloc(sizeof(*task_ut), GFP_KERNEL); if (!task_ut) return -ENOMEM; INIT_LIST_HEAD(&task_ut->list); task_ut->task = t; list_add_tail(&task_ut->list, &proc_ut->list); /* * The utrace engine's *data will point to proc_ut. */ printk(KERN_INFO "Before utrace_attach_task: %d\n", task_pid_nr(t)); task_ut->engine = utrace_attach_task(t, UTRACE_ATTACH_CREATE, &ut_ops, proc_ut); printk(KERN_INFO "After utrace_attach_task: %d, engine = %p\n", task_pid_nr(t), task_ut->engine); if (IS_ERR(task_ut->engine)) { printk(KERN_ERR "utrace_attach_task returned %d\n", (int)PTR_ERR(task_ut->engine)); task_ut->engine = NULL; ret = -ESRCH; goto out; } printk(KERN_INFO "utrace_attach_task: SUCCESS! 
- engine = %p\n", task_ut->engine); if (utrace_set_events(t, task_ut->engine, UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT))) { ret = -ESRCH; } proc_ut->num_threads++; out: return ret; } static u32 ut_quiesce(enum utrace_resume_action action, struct utrace_engine *engine, struct task_struct *task, unsigned long event) { printk(KERN_INFO "In quiesce callback: tid = %d\n", task_pid_nr(task)); return UTRACE_RESUME; } /* clone handler -- handle thread spawns and forks */ static u32 ut_clone(enum utrace_resume_action action, struct utrace_engine *engine, struct task_struct *parent, unsigned long clone_flags, struct task_struct *child) { struct proc_utrace *proc_ut = (struct proc_utrace *)engine->data; printk(KERN_INFO "In clone callback: parent = %d, child = %d\n", task_pid_nr(parent), task_pid_nr(child)); if (clone_flags & CLONE_THREAD) { /* New thread in the same process */ printk(KERN_INFO "New thread - tid = %d\n", task_pid_nr(child)); if (setup_task_ut(child, proc_ut)) { printk(KERN_INFO "ut_clone - calling cleanup_proc_ut\n"); cleanup_proc_ut(proc_ut); goto out; } } out: return UTRACE_RESUME; } static u32 ut_exit(enum utrace_resume_action action, struct utrace_engine *engine, struct task_struct *task, long orig_code, long *code) { struct task_utrace *task_ut; struct proc_utrace *proc_ut = (struct proc_utrace *)engine->data; printk(KERN_INFO "In exit callback: tid = %d\n", task_pid_nr(task)); /* One task dying */ task_ut = get_task_ut(task, proc_ut); if (task_ut) { proc_ut->num_threads--; list_del(&task_ut->list); kfree(task_ut); /* If we are the last task, cleanup! 
*/ if (unlikely(list_empty(&proc_ut->list))) { printk(KERN_INFO "ut_exit - calling cleanup_proc_ut\n"); cleanup_proc_ut(proc_ut); } } printk(KERN_INFO "Detaching %d\n", task_pid_nr(task)); return UTRACE_DETACH; } static const struct utrace_engine_ops ut_ops = { .report_clone = ut_clone, /* new thread */ .report_quiesce = ut_quiesce, .report_exit = ut_exit, /* thread exit */ }; /* Engine attach -- for all threads of the process */ static struct proc_utrace *attach_utrace_engines(struct pid *pid) { int ret = 0; struct task_struct *t; struct proc_utrace *proc_ut; struct task_utrace *task_ut; struct utrace_engine *engine; if (!pid) { ret = -EINVAL; goto out; } /* * We already hold a ref to the pid here */ engine = utrace_attach_pid(pid, UTRACE_ATTACH_MATCH_OPS, &ut_ops, 0); if (IS_ERR(engine)) { if (PTR_ERR(engine) != -ENOENT) { printk(KERN_INFO "Engine already attached?\n"); goto out; } } proc_ut = kzalloc(sizeof(*proc_ut), GFP_KERNEL); if (!proc_ut) return ERR_PTR(-ENOMEM); t = proc_ut->tgid_task = pid_task(pid, PIDTYPE_PID); INIT_LIST_HEAD(&proc_ut->list); rcu_read_lock(); do { ret = setup_task_ut(t, proc_ut); printk(KERN_INFO "setup_task_ut returned %d\n", ret); if (ret) goto err_task_ut; task_ut = get_task_ut(t, proc_ut); ret = utrace_control(t, task_ut->engine, UTRACE_STOP); if (ret == 0) printk(KERN_INFO "Task %d is quiescent\n", task_pid_nr(t)); else if (ret == -EINPROGRESS) printk(KERN_INFO "Task %d is on its way to quiesce\n", task_pid_nr(t)); else { printk(KERN_ERR "utrace_control returned %d\n", ret); goto err_task_ut; } ret = 0; t = next_thread(t); } while (t != proc_ut->tgid_task); rcu_read_unlock(); return proc_ut; err_task_ut: rcu_read_unlock(); printk(KERN_INFO "attach_utrace_engines - calling cleanup_proc_ut\n"); ret = cleanup_proc_ut(proc_ut); out: return ERR_PTR(ret); } static int __init utrace_init(void) { int ret = 0; struct proc_utrace *proc_ut = NULL; struct pid *pid; pid = find_get_pid(target_pid); if (pid == NULL) { printk(KERN_ERR "Cannot 
find PID %d\n", target_pid); ret = -ESRCH; goto out; } /* attach an engine for each thread */ proc_ut = attach_utrace_engines(pid); if (IS_ERR(proc_ut)) { ret = (int)PTR_ERR(proc_ut); printk(KERN_ERR "utrace_attach_engines returned %d\n", ret); goto out; } out: put_pid(pid); return ret; } static void __exit utrace_exit(void) { int ret = 0; struct pid *pid; struct utrace_engine *engine; struct proc_utrace *proc_ut; pid = find_get_pid(target_pid); if (pid == NULL) { printk(KERN_ERR "Cannot find PID %d\n", target_pid); return; } printk(KERN_INFO "In module_exit for pid = %d\n", pid_vnr(pid)); engine = utrace_attach_pid(pid, UTRACE_ATTACH_MATCH_OPS, &ut_ops, 0); if (IS_ERR(engine)) printk(KERN_ERR "Can't find self: %ld\n", PTR_ERR(engine)); else if (engine == NULL) printk(KERN_ERR "Can't find self: no match\n"); else { printk(KERN_INFO "Trying to detach\n"); proc_ut = (struct proc_utrace *)engine->data; ret = cleanup_proc_ut(proc_ut); if (ret) printk(KERN_ERR "cleanup_proc_ut returned %d\n", ret); } put_pid(pid); } module_init(utrace_init); module_exit(utrace_exit); From ananth at in.ibm.com Sat Mar 7 02:07:02 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Sat, 7 Mar 2009 07:37:02 +0530 Subject: [BUG] utrace_attach_task() never returns when called from the report_clone callback In-Reply-To: <20090306205234.0A759FC3BF@magilla.sf.frob.com> References: <20090306154134.GB15133@in.ibm.com> <20090306205234.0A759FC3BF@magilla.sf.frob.com> Message-ID: <20090307020702.GD15133@in.ibm.com> On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote: > > With the current utrace/master tree, I am seeing that utrace_attach_task() > > never returns when invoked from the clone callback. The same module > > works fine with prior utrace (rcu as well as with my embed version). > > I changed the utrace_attach_delay() logic recently. That is probably it. > Please post your test case. 
The issue is that target->real_parent == current->real_parent and not current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the do-while. Ananth From jkenisto at us.ibm.com Sat Mar 7 07:55:00 2009 From: jkenisto at us.ibm.com (Jim Keniston) Date: Sat, 7 Mar 2009 02:55:00 -0500 Subject: instruction-analysis API(s) In-Reply-To: <49B059B8.8090702@redhat.com> References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com> <49A85902.8000306@redhat.com> <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> <49AF3480.1040804@redhat.com> <49B059B8.8090702@redhat.com> Message-ID: <20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com> Quoting Masami Hiramatsu : > Hi Jim and Sriker, > > Here, I almost rewrote my patch. > > Changelog: > - rewrite decoding logic based on Intel's manual. > - support insn_get_sib(),insn_get_displacement() > and insn_get_immediate() too. > - support 3 bytes opcode and 64bit immediate. > - introduce some bitmaps. > > Thank you, Well, I didn't do much of a code review -- it looks like you addressed all my concerns -- but as I mentioned on IRC, I hacked together a test rig whereby you can disassemble a designated ELF file (e.g., vmlinux, libc, libm) and then compare insn_get_length()'s results with objdump's results. The comment in distill.awk shows how to use objdump, awk, and test_get_len together. I also hacked up insn_x86.h and insn_x86.c to work in user space. Most of that is accomplished via insn_x86_user.h, but it certainly isn't necessary to do it that way. In particular, __u8, __s8, __u16, etc. are versions of u8, s8, u16, etc. that can be used in both kernel and user code, so maybe we should switch to those. I tested with vmlinux, libc, and libm on both an i686 system and an x86_64 system.
I found and fixed a few bugs. Here are the ones that come to mind (all fixed): - shrd/shld, which we discussed - missing support for weird nops with modrm bytes (0f 1f ...). - neglected to include the REX prefix in prefixes.nbytes - missing static decl in an inline function in insn_x86.h There are some other cases where insn_get_length() doesn't match up with the disassembly, but I don't consider them bugs: - 0x9b is an instruction (fwait), but the disassembler treats it as a prefix. For example 9b df ... can be disassembled as fstsw ... // wait, then store status word or fwait // wait fnstsw ... // store status word without waiting Perhaps it's relevant to investigate whether a single-step of 9b df ... would execute just the fwait or the whole fstsw. Anyway, this explains the "failures" of finit and fstsw that I mentioned to you. I also saw this with fstcw and fclex. - Illegal instruction sequences, such as an x86_64 instruction that starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these when disassembling data. I just filtered out (via egrep) instructions whose disassembly starts with "rex" or includes "(bad)". We could address the above by filtering them out in distill.awk or test_get_len.c. I think we're clean otherwise. There's a little more housecleaning to do -- e.g., adding Hitachi (?) copyright to IBM copyright, discarding insn_field_exists() and insn_extract_reg(), putting this all in git somewhere. But not tonight. Pull all the attached files into a directory and have a go -- e.g., $ make $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64] Jim -------------- next part -------------- # Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len # Distills the disassembly as follows: # - Removes all lines except the disassembled instructions. # - For instructions that exceed 1 line (7 bytes), crams all the hex bytes # into a single line. 
BEGIN { prev_addr = "" prev_hex = "" prev_mnemonic = "" } /^ *[0-9a-f]+:/ { if (split($0, field, "\t") < 3) { # This is a continuation of the same insn. prev_hex = prev_hex field[2] } else { if (prev_addr != "") printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic prev_addr = field[1] prev_hex = field[2] prev_mnemonic = field[3] } } END { if (prev_addr != "") printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic } -------------- next part -------------- /* * x86 instruction analysis * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ #ifdef KERNEL #include #include #else #include #endif // #include #include "insn_x86.h" MODULE_LICENSE("GPL"); // for test /** * insn_init() - initialize struct insn * @insn: &struct insn to be initialized * @kaddr: address (in kernel memory) of instruction (or copy thereof) * @x86_64: true for 64-bit kernel or 64-bit app */ void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64) { memset(insn, 0, sizeof(*insn)); insn->kaddr = kaddr; insn->next_byte = kaddr; insn->x86_64 = x86_64; insn->opnd_bytes = 4; if (x86_64) insn->addr_bytes = 8; else insn->addr_bytes = 4; } EXPORT_SYMBOL_GPL(insn_init); /** * insn_get_prefixes - scan x86 instruction prefix bytes * @insn: &struct insn containing instruction * * Populates the @insn->prefixes bitmap, and updates @insn->next_byte * to point to the (first) opcode. No effect if @insn->prefixes.got * is already true. */ void insn_get_prefixes(struct insn *insn) { u32 pfx; struct insn_field *prefixes = &insn->prefixes; if (prefixes->got) return; for (;; insn->next_byte++, prefixes->nbytes++) { u8 b = *(insn->next_byte); #ifdef CONFIG_X86_64 if ((b & 0xf0) == 0x40 && insn->x86_64) { prefixes->value |= X86_PFX_REX; prefixes->value |= (b & 0x0f) * X86_PFX_REX_BASE; /* REX prefix is always last. 
*/ insn->next_byte++; prefixes->nbytes++; break; } #endif switch (b) { case 0x26: pfx = X86_PFX_ES; break; case 0x2E: pfx = X86_PFX_CS; break; case 0x36: pfx = X86_PFX_SS; break; case 0x3E: pfx = X86_PFX_DS; break; case 0x64: pfx = X86_PFX_FS; break; case 0x65: pfx = X86_PFX_GS; break; case 0x66: pfx = X86_PFX_OPNDSZ; break; case 0x67: pfx = X86_PFX_ADDRSZ; break; case 0xF0: pfx = X86_PFX_LOCK; break; case 0xF2: pfx = X86_PFX_REPNE; break; case 0xF3: pfx = X86_PFX_REPE; break; default: pfx = 0x0; break; } if (!pfx) break; prefixes->value |= pfx; } if (prefixes->value & X86_PFX_OPNDSZ) { /* operand size switches 2/4 */ insn->opnd_bytes ^= 6; } if (prefixes->value & X86_PFX_ADDRSZ) { /* address size switches 2/4 or 4/8 */ #ifdef CONFIG_X86_64 if (insn->x86_64) insn->addr_bytes ^= 12; else #endif insn->addr_bytes ^= 6; } #ifdef CONFIG_X86_64 if (prefixes->value & X86_PFX_REXW) insn->opnd_bytes = 8; #endif prefixes->got = true; } EXPORT_SYMBOL_GPL(insn_get_prefixes); /** * insn_get_opcode - collect opcode(s) * @insn: &struct insn containing instruction * * Populates @insn->opcode1 (and @insn->opcode2, if it's a 2-byte opcode) * and updates @insn->next_byte to point past the opcode byte(s). * If necessary, first collects any preceding (prefix) bytes. * Sets @insn->opcode.value = opcode1. No effect if @insn->opcode.got * is already true.
*/ void insn_get_opcode(struct insn *insn) { struct insn_field *opcode = &insn->opcode; if (opcode->got) return; if (!insn->prefixes.got) insn_get_prefixes(insn); OPCODE1(insn) = *insn->next_byte++; if (OPCODE1(insn) == 0x0f) { OPCODE2(insn) = *insn->next_byte++; if (OPCODE2(insn) == 0x38 || OPCODE2(insn) == 0x3a) { OPCODE3(insn) = *insn->next_byte++; opcode->nbytes = 3; } else opcode->nbytes = 2; } else opcode->nbytes = 1; opcode->got = true; } EXPORT_SYMBOL_GPL(insn_get_opcode); const u32 onebyte_has_modrm[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ----------------------------------------------- */ W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 0f */ W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 1f */ W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 2f */ W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 3f */ W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */ W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 5f */ W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 6f */ W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 7f */ W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 8f */ W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */ W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* af */ W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */ W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* ef */ W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* ff */ /* ----------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; const u32 twobyte_has_modrm[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ----------------------------------------------- */ W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ W(0x10, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 1f */ W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ /* ----------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; #ifdef CONFIG_X86_64 const u32 onebyte_force_64[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ----------------------------------------------- */ W(0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 0f */ W(0x10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ W(0x20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 2f */ W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */ W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ W(0x60, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0) | /* 6f */ W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 7f */ W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) | /* 8f */ W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */ W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* af */ W(0xb0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */ W(0xc0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0) | /* cf */ W(0xd0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* df */ W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0) | /* ef */ W(0xf0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) /* ff */ /* ----------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; /* force 64 or default 64 bits operand opcodes */ static bool __operand_64(struct insn *insn) { u8 reg = MODRM_REG(insn); if (insn->opcode.nbytes == 1) { if (test_bit(OPCODE1(insn), (const unsigned long*) onebyte_force_64) || (OPCODE1(insn) == 0xff && (reg == 2 || reg == 4 || reg == 6))) return true; } return false; } #endif /** * insn_get_modrm - collect ModRM byte, if any * @insn: &struct insn containing instruction * * Populates @insn->modrm and updates @insn->next_byte to point past the * ModRM byte, if any. If necessary, first collects the preceding bytes * (prefixes and opcode(s)). No effect if @insn->modrm.got is already true. */ void insn_get_modrm(struct insn *insn) { struct insn_field *modrm = &insn->modrm; if (modrm->got) return; if (!insn->opcode.got) insn_get_opcode(insn); switch (insn->opcode.nbytes) { case 1: modrm->nbytes = test_bit(OPCODE1(insn), (const unsigned long*) onebyte_has_modrm); break; case 2: modrm->nbytes = test_bit(OPCODE2(insn), (const unsigned long*) twobyte_has_modrm); break; case 3: /* Three bytes opcodes always have modrm */ modrm->nbytes = 1; break; } if (modrm->nbytes) modrm->value = *(insn->next_byte++); #ifdef CONFIG_X86_64 if (insn->x86_64 && __operand_64(insn)) insn->opnd_bytes = 8; #endif modrm->got = true; } EXPORT_SYMBOL_GPL(insn_get_modrm); #ifdef CONFIG_X86_64 /** * insn_rip_relative() - Does instruction use RIP-relative addressing mode? * @insn: &struct insn containing instruction * * If necessary, first collects the instruction up to and including the * ModRM byte. No effect if @insn->x86_64 is false. 
*/ bool insn_rip_relative(struct insn *insn) { struct insn_field *modrm = &insn->modrm; if (!insn->x86_64) return false; if (!modrm->got) insn_get_modrm(insn); /* * For rip-relative instructions, the mod field (top 2 bits) * is zero and the r/m field (bottom 3 bits) is 0x5. */ return (insn_field_exists(modrm) && (modrm->value & 0xc7) == 0x5); } EXPORT_SYMBOL_GPL(insn_rip_relative); #endif /** * * insn_get_sib() - Get the SIB byte of instruction * @insn: &struct insn containing instruction * * If necessary, first collects the instruction up to and including the * ModRM byte. */ void insn_get_sib(struct insn *insn) { if (insn->sib.got) return; if (!insn->modrm.got) insn_get_modrm(insn); if (insn->modrm.nbytes) if (insn->addr_bytes != 2 && MODRM_MOD(insn) != 3 && MODRM_RM(insn) == 4) { insn->sib.value = *(insn->next_byte++); insn->sib.nbytes = 1; } insn->sib.got = true; } EXPORT_SYMBOL_GPL(insn_get_sib); #define get_next(t, insn) \ ({t r; r = *(t *)insn->next_byte; insn->next_byte += sizeof(t); r;}) /** * * insn_get_displacement() - Get the displacement of instruction * @insn: &struct insn containing instruction * * If necessary, first collects the instruction up to and including the * SIB byte. * Displacement value is sign-expanded. */ void insn_get_displacement(struct insn *insn) { u8 mod; if (insn->displacement.got) return; if (!insn->sib.got) insn_get_sib(insn); if (insn->modrm.nbytes) { /* * Interpreting the modrm byte: * mod = 00 - no displacement fields (exceptions below) * mod = 01 - 1-byte displacement field * mod = 10 - displacement field is 4 bytes, or 2 bytes if * address size = 2 (0x67 prefix in 32-bit mode) * mod = 11 - no memory operand * * If address size = 2... * mod = 00, r/m = 110 - displacement field is 2 bytes * * If address size != 2... 
* mod != 11, r/m = 100 - SIB byte exists * mod = 00, SIB base = 101 - displacement field is 4 bytes * mod = 00, r/m = 101 - rip-relative addressing, displacement * field is 4 bytes */ mod = MODRM_MOD(insn); if (mod == 3) goto out; if (mod == 1) { insn->displacement.value = *((s8 *)insn->next_byte++); insn->displacement.nbytes = 1; } else if (insn->addr_bytes == 2) { if ((mod == 0 && MODRM_RM(insn) == 6) || mod == 2) { insn->displacement.value = get_next(s16, insn); insn->displacement.nbytes = 2; } } else { if ((mod == 0 && MODRM_RM(insn) == 5) || mod == 2 || (mod == 0 && SIB_BASE(insn) == 5)) { insn->displacement.value = get_next(s32, insn); insn->displacement.nbytes = 4; } } } out: insn->displacement.got = true; } EXPORT_SYMBOL_GPL(insn_get_displacement); const u32 onebyte_has_immb[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ----------------------------------------------- */ W(0x00, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) | /* 0f */ W(0x10, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) , /* 1f */ W(0x20, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) | /* 2f */ W(0x30, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) , /* 3f */ W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */ W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 5f */ W(0x60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0) | /* 6f */ W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 7f */ W(0x80, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */ W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0) | /* af */ W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */ W(0xc0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0) | /* cf */ W(0xd0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* df */ W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0) | /* ef */ W(0xf0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) /* ff */ /* 
----------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; const u32 onebyte_has_imm[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ----------------------------------------------- */ W(0x00, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) | /* 0f */ W(0x10, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) , /* 1f */ W(0x20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) | /* 2f */ W(0x30, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) , /* 3f */ W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */ W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 5f */ W(0x60, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0) | /* 6f */ W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 7f */ W(0x80, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */ W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0) | /* af */ W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */ W(0xc0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ W(0xd0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* df */ W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0) | /* ef */ W(0xf0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) /* ff */ /* ----------------------------------------------- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; /* Decode moffset16/32/64 */ static void __get_moffset(struct insn *insn) { switch (insn->addr_bytes) { case 2: insn->moffset1.value = get_next(s16, insn); insn->moffset1.nbytes = 2; break; case 4: insn->moffset1.value = get_next(s32, insn); insn->moffset1.nbytes = 4; break; case 8: insn->moffset1.value = get_next(s32, insn); insn->moffset1.nbytes = 4; insn->moffset2.value = get_next(s32, insn); insn->moffset2.nbytes = 4; break; } insn->moffset1.got = insn->moffset2.got = true; } /* Decode imm(Iz) */ static void __get_imm(struct insn *insn) { switch (insn->opnd_bytes) { case 2: insn->immediate.value = 
get_next(s16, insn); insn->immediate.nbytes = 2; break; case 4: case 8: insn->immediate.value = get_next(s32, insn); insn->immediate.nbytes = 4; break; } } /* Decode imm64(Iv) */ static void __get_imm64(struct insn *insn) { switch (insn->opnd_bytes) { case 2: insn->immediate1.value = get_next(s16, insn); insn->immediate1.nbytes = 2; break; case 4: insn->immediate1.value = get_next(s32, insn); insn->immediate1.nbytes = 4; break; case 8: insn->immediate1.value = get_next(s32, insn); insn->immediate1.nbytes = 4; insn->immediate2.value = get_next(s32, insn); insn->immediate2.nbytes = 4; break; } insn->immediate1.got = insn->immediate2.got = true; } /* Decode ptr16:16/32(AP) */ static void __get_immptr(struct insn *insn) { switch (insn->opnd_bytes) { case 2: insn->immediate1.value = get_next(s16, insn); insn->immediate1.nbytes = 2; break; case 4: insn->immediate1.value = get_next(s32, insn); insn->immediate1.nbytes = 4; break; case 8: /* ptr16:64 is not supported (no segment) */ WARN_ON(1); return; } insn->immediate2.value = get_next(u16, insn); insn->immediate2.nbytes = 2; insn->immediate1.got = insn->immediate2.got = true; } /** * * insn_get_immediate() - Get the immediates of instruction * @insn: &struct insn containing instruction * * If necessary, first collects the instruction up to and including the * displacement bytes. * Basically, most of immediates are sign-expanded. 
An unsigned value can be * obtained by bit masking with ((1 << (nbytes * 8)) - 1) */ void insn_get_immediate(struct insn *insn) { u8 opcode; if (insn->immediate.got) return; if (!insn->displacement.got) insn_get_displacement(insn); if (insn->opcode.nbytes == 1) { opcode = OPCODE1(insn); if (opcode >= 0xa0 && opcode <= 0xa3) { /* direct moffset mov */ __get_moffset(insn); } else if (test_bit(opcode, (const unsigned long *)onebyte_has_immb) || (opcode == 0xf6 && MODRM_REG(insn) == 0)) { insn->immediate.value = get_next(s8, insn); insn->immediate.nbytes = 1; } else if (test_bit(opcode, (const unsigned long *)onebyte_has_imm) || (opcode == 0xf7 && MODRM_REG(insn) == 0)) { __get_imm(insn); } else if (0xb8 <= opcode && opcode <= 0xbf /* mov immv */) { __get_imm64(insn); } else if (opcode == 0xea /* jmp far seg:offs */) { __get_immptr(insn); } else if (opcode == 0xc2 /* retn immw */ || opcode == 0xca /* retf immw */) { insn->immediate.value = get_next(u16, insn); insn->immediate.nbytes = 2; } else if (opcode == 0xc8 /* enter immw, immb */) { insn->immediate1.value = get_next(u16, insn); insn->immediate1.nbytes = 2; insn->immediate2.value = get_next(u8, insn); insn->immediate2.nbytes = 1; } } else if (insn->opcode.nbytes == 2) { opcode = OPCODE2(insn); if ((opcode & 0xf0) == 0x80 /* Jcc imm32 */) { __get_imm(insn); } else switch(opcode) { case 0x70: /* pshuf* %1, %2, immb */ case 0x71: /* Group12 %1, immb */ case 0x72: /* Group13 %1, immb */ case 0x73: /* Group14 %1, immb */ case 0xa4: /* shld %1, %2, immb */ case 0xac: /* shrd %1, %2, immb */ case 0xba: /* Group8 %1, immb */ case 0xc2: /* cmpps %1, %2, immb */ case 0xc4: /* pinsw %1, %2, immb */ case 0xc5: /* pextrw %1, %2, immb */ case 0xc6: /* shufps/d %1, %2, immb */ insn->immediate.value = get_next(u8, insn); insn->immediate.nbytes = 1; break; default: break; } } else if (OPCODE3(insn) == 0x0f /* palignr %1, %2, immb */) { insn->immediate.value = get_next(u8, insn); insn->immediate.nbytes = 1; } insn->immediate.got = true; }
EXPORT_SYMBOL_GPL(insn_get_immediate); /** * * insn_get_length() - Get the length of instruction * @insn: &struct insn containing instruction * * If necessary, first collects the instruction up to and including the * immediates bytes. */ void insn_get_length(struct insn *insn) { if (insn->length) return; if (!insn->immediate.got) insn_get_immediate(insn); insn->length = (u8)((unsigned long)insn->next_byte - (unsigned long)insn->kaddr); } EXPORT_SYMBOL_GPL(insn_get_length); -------------- next part -------------- #ifndef _ASM_X86_INSN_H #define _ASM_X86_INSN_H /* * x86 instruction analysis * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* * Copyright (C) IBM Corporation, 2009 */ #ifdef KERNEL #include #else #include "insn_x86_user.h" #endif #undef W #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\ (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \ (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \ (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) | \ (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf)) \ << (row % 32)) /* legacy instruction prefixes */ #define X86_PFX_OPNDSZ 0x1 /* 0x66 */ #define X86_PFX_ADDRSZ 0x2 /* 0x67 */ #define X86_PFX_CS 0x4 /* 0x2E */ #define X86_PFX_DS 0x8 /* 0x3E */ #define X86_PFX_ES 0x10 /* 0x26 */ #define X86_PFX_FS 0x20 /* 0x64 */ #define X86_PFX_GS 0x40 /* 0x65 */ #define X86_PFX_SS 0x80 /* 0x36 */ #define X86_PFX_LOCK 0x100 /* 0xF0 */ #define X86_PFX_REPE 0x200 /* 0xF3 */ #define X86_PFX_REPNE 0x400 /* 0xF2 */ /* REX prefix */ #define X86_PFX_REX 0x800 /* 0x4X */ /* REX prefix dissected */ #define X86_PFX_REX_BASE 0x1000 #define X86_PFX_REXB 0x1000 /* 0x41 bit */ #define X86_PFX_REXX 0x2000 /* 0x42 bit */ #define X86_PFX_REXR 0x4000 /* 0x44 bit */ #define X86_PFX_REXW 0x8000 /* 0x48 bit */ struct insn_field { union { s32 value; u8 bytes[4]; }; bool got; /* true if we've run insn_get_xxx() for this field */ u8 nbytes; }; struct insn { struct insn_field prefixes; /* prefixes.value is a bitmap */ struct insn_field opcode; /* * opcode.bytes[0]: opcode1 * opcode.bytes[1]: opcode2 * opcode.bytes[2]: opcode3 */ struct insn_field modrm; struct insn_field sib; struct insn_field displacement; union { struct insn_field immediate; struct insn_field moffset1; /* for 64bit MOV */ struct insn_field immediate1; /* for 64bit imm or off16/32 */ }; union { struct insn_field moffset2; /* for 64bit MOV */ struct insn_field immediate2; /* for 64bit imm or seg16 */ }; u8 opnd_bytes; u8 addr_bytes; u8 length; bool x86_64; const u8 *kaddr; /* kernel address of insn (copy) to analyze */ const u8 
*next_byte; }; #define OPCODE1(insn) ((insn)->opcode.bytes[0]) #define OPCODE2(insn) ((insn)->opcode.bytes[1]) #define OPCODE3(insn) ((insn)->opcode.bytes[2]) #define MODRM_MOD(insn) (((insn)->modrm.value & 0xc0) >> 6) #define MODRM_REG(insn) (((insn)->modrm.value & 0x38) >> 3) #define MODRM_RM(insn) ((insn)->modrm.value & 0x07) #define SIB_SCALE(insn) (((insn)->sib.value & 0xc0) >> 6) #define SIB_INDEX(insn) (((insn)->sib.value & 0x38) >> 3) #define SIB_BASE(insn) ((insn)->sib.value & 0x07) #define MOFFSET64(insn) (((u64)((insn)->moffset2.value) << 32) | \ (u32)((insn)->moffset1.value)) #define IMMEDIATE64(insn) (((u64)((insn)->immediate2.value) << 32) | \ (u32)((insn)->immediate1.value)) extern void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64); extern void insn_get_prefixes(struct insn *insn); extern void insn_get_opcode(struct insn *insn); extern void insn_get_modrm(struct insn *insn); extern void insn_get_sib(struct insn *insn); extern void insn_get_displacement(struct insn *insn); extern void insn_get_immediate(struct insn *insn); extern void insn_get_length(struct insn *insn); #ifdef CONFIG_X86_64 extern bool insn_rip_relative(struct insn *insn); #else static bool insn_rip_relative(struct insn *insn) { return false; } #endif static inline bool insn_field_exists(const struct insn_field *field) { return (field->nbytes > 0); } static inline u8 insn_extract_reg(int modrm) { return (modrm >> 3) & 0x7; } #endif /* _ASM_X86_INSN_H */ -------------- next part -------------- #ifndef __INSN_X86_USER_H #define __INSN_X86_USER_H /* * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Copyright (C) IBM Corporation, 2009 */ #ifdef __x86_64__ #define CONFIG_X86_64 #else #define CONFIG_X86_32 #endif typedef unsigned char u8; typedef unsigned short u16; typedef unsigned int u32; typedef unsigned long long u64; typedef signed char s8; typedef short s16; typedef int s32; typedef long long s64; typedef enum bool { false, true } bool; /* any harmless file-scope decl */ #define NOP_DECL struct __nop #define EXPORT_SYMBOL_GPL(symbol) NOP_DECL #define MODULE_LICENSE(gpl) NOP_DECL #define WARN_ON(cond) do{}while(0) #define BITS_PER_LONG (8*sizeof(long)) /* from arch/x86/include/asm/bitops.h */ static inline int test_bit(int nr, const volatile unsigned long *addr) { return ((1UL << (nr % BITS_PER_LONG)) & (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0; } #endif /* __INSN_X86_USER_H */ -------------- next part -------------- test_get_len: test_get_len.c insn_x86.c insn_x86.h insn_x86_user.h $(CC) -g test_get_len.c insn_x86.c -o test_get_len clean: rm -f *.o clobber: clean rm -f test_get_len -------------- next part -------------- #include #include #include #include #include "insn_x86.h" /* * Test of instruction analysis in general and insn_get_length() in * particular. See if insn_get_length() and the disassembler agree * on the length of each instruction in an elf disassembly. 
* * usage: test_get_len [x86_64] < distilled_disassembly */ const char *prog; static void usage() { fprintf(stderr, "usage: %s [x86_64] < distilled_disassembly\n", prog); exit(1); } static void malformed_line(const char *line, int line_nr) { fprintf(stderr, "%s: malformed line %d:\n%s", prog, line_nr, line); exit(3); } main(int argc, char **argv) { char line[200]; unsigned char insn_buf[16]; struct insn insn; bool x86_64 = false; int errors = 0, insns = 0; #define MAX_ERRORS 10 prog = argv[0]; if (argc == 2) { if (!strcmp(argv[1], "x86_64")) x86_64 = true; else usage(); } else if (argc > 2) usage(); while (fgets(line, 200, stdin)) { char copy[200], *s, *tab1, *tab2; int nb = 0; unsigned b; insns++; memset(insn_buf, 0, 16); strcpy(copy, line); tab1 = strchr(copy, '\t'); if (!tab1) malformed_line(line, insns); s = tab1 + 1; s += strspn(s, " "); tab2 = strchr(s, '\t'); if (!tab2) malformed_line(line, insns); *tab2 = '\0'; // so characters beyond tab2 aren't examined while (s < tab2) { if (sscanf(s, "%x", &b) == 1) { insn_buf[nb++] = (unsigned char) b; s += 3; } else break; } insn_init(&insn, insn_buf, x86_64); insn_get_length(&insn); if (insn.length != nb) { fprintf(stderr, "%s", line); fprintf(stderr, "objdump says %d bytes, but " "insn_get_length() says %d\n", nb, insn.length); if (++errors > MAX_ERRORS) { fprintf(stderr, "Stopping after %d errors " "and %d instructions.\n", MAX_ERRORS, insns); exit(2); } } } exit(0); } From ananth at in.ibm.com Sat Mar 7 11:57:35 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Sat, 7 Mar 2009 17:27:35 +0530 Subject: [PATCH] Fix utrace_attach_delay() to work correctly with cloned threads In-Reply-To: <20090306205234.0A759FC3BF@magilla.sf.frob.com> References: <20090306154134.GB15133@in.ibm.com> <20090306205234.0A759FC3BF@magilla.sf.frob.com> Message-ID: <20090307115735.GE15133@in.ibm.com> On a CLONE_THREAD, target->real_parent == current->real_parent and not current. New threads would loop forever here. 
Fix utrace_attach_delay() to work correctly with new threads. Signed-off-by: Ananth N Mavinakayanahalli --- kernel/utrace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: utrace-6mar/kernel/utrace.c =================================================================== --- utrace-6mar.orig/kernel/utrace.c +++ utrace-6mar/kernel/utrace.c @@ -123,12 +123,15 @@ static inline bool exclude_utrace(struct */ static inline int utrace_attach_delay(struct task_struct *target) { - if ((target->flags & PF_STARTING) && target->real_parent != current) - do { - schedule_timeout_interruptible(1); - if (signal_pending(current)) - return -ERESTARTNOINTR; - } while (target->flags & PF_STARTING); + if ((target->flags & PF_STARTING) && target->real_parent != current) { + if (target->real_parent != current->real_parent) { + do { + schedule_timeout_interruptible(1); + if (signal_pending(current)) + return -ERESTARTNOINTR; + } while (target->flags & PF_STARTING); + } + } return 0; }
From oleg at redhat.com Sat Mar 7 23:03:30 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Sun, 8 Mar 2009 00:03:30 +0100 Subject: [PATCH] Fix utrace_attach_delay() to work correctly with cloned threads Message-ID: <20090307230330.GA26139@redhat.com> Ananth N Mavinakayanahalli wrote: > > --- utrace-6mar.orig/kernel/utrace.c > +++ utrace-6mar/kernel/utrace.c > @@ -123,12 +123,15 @@ static inline bool exclude_utrace(struct > */ > static inline int utrace_attach_delay(struct task_struct *target) > { > - if ((target->flags & PF_STARTING) && target->real_parent != current) > - do { > - schedule_timeout_interruptible(1); > - if (signal_pending(current)) > - return -ERESTARTNOINTR; > - } while (target->flags & PF_STARTING); > + if ((target->flags & PF_STARTING) && target->real_parent != current) { > + if (target->real_parent != current->real_parent) { But target->real_parent == current->real_parent doesn't mean current is a creator? It is possible that current's ->real_parent does fork(). And even with CLONE_THREAD, this doesn't mean we are creator, but the commment says "The creator gets the first chance to attach".
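To make the objection above concrete, here is a small user-space model. It is not kernel code: struct task, model_clone() and MODEL_CLONE_SIBLING are hypothetical stand-ins. It assumes the usual clone behaviour in which a child created with CLONE_PARENT or CLONE_THREAD inherits its creator's real_parent, and it shows that the real_parent comparison is true both for the creating thread and for a sibling that created nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only real_parent is modelled. */
struct task {
	struct task *real_parent;
};

/* Illustrative flag, not a kernel value: child becomes creator's sibling. */
enum { MODEL_CLONE_SIBLING = 1 };

/* Assumed clone rule: with the sibling flag the child inherits the
 * creator's real_parent, otherwise the creator itself is the parent. */
void model_clone(struct task *creator, struct task *child, int flags)
{
	child->real_parent = (flags & MODEL_CLONE_SIBLING)
		? creator->real_parent : creator;
}

/* The comparison the proposed patch uses to skip the delay loop. */
bool same_real_parent(const struct task *cur, const struct task *target)
{
	return target->real_parent == cur->real_parent;
}

/* Returns true when the comparison holds for a task that "cur" created
 * AND for a task that "cur" did not create, i.e. it is ambiguous. */
bool check_is_ambiguous(void)
{
	struct task init = { NULL };
	struct task parent, cur, thread, forked;

	model_clone(&init, &parent, 0);
	model_clone(&parent, &cur, 0);     /* parent forks cur */
	model_clone(&parent, &forked, 0);  /* parent forks again; cur is not the creator */
	model_clone(&cur, &thread, MODEL_CLONE_SIBLING); /* cur creates a thread */

	return same_real_parent(&cur, &thread) &&
	       same_real_parent(&cur, &forked);
}
```

In this model check_is_ambiguous() returns true, which is the case described above: a shared real_parent alone does not identify the creator.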
Perhaps we can introduce a new UTRACE_ATTACH_XXX flag; it would be used when utrace_attach_task() is called from ->report_clone(), and then something like --- kernel/utrace.c +++ kernel/utrace.c @@ -130,12 +130,11 @@ static inline bool exclude_utrace(struct */ static inline int utrace_attach_delay(struct task_struct *target) { - if ((target->flags & PF_STARTING) && target->real_parent != current) - do { - schedule_timeout_interruptible(1); - if (signal_pending(current)) - return -ERESTARTNOINTR; - } while (target->flags & PF_STARTING); + while (unlikely(target->flags & PF_STARTING)) { + schedule_timeout_interruptible(1); + if (signal_pending(current)) + return -ERESTARTNOINTR; + } return 0; } @@ -267,7 +266,8 @@ struct utrace_engine *utrace_attach_task engine->ops = ops; engine->data = data; - ret = utrace_attach_delay(target); + if (!(flags & UTRACE_ATTACH_XXX)) + ret = utrace_attach_delay(target); if (likely(!ret)) ret = utrace_add_engine(target, utrace, engine, flags, ops, data); When ->report_clone() is called, current == creator always. Yes, this is ugly, I agree. We can also add "struct task_struct *creator" to "struct utrace". It would be set by tracehook_finish_clone/utrace_init_task, and cleared by the tracehook_report_clone() path. In that case we do not need PF_STARTING. But this blows task_struct... Oleg. From ananth at in.ibm.com Sun Mar 8 14:53:54 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Sun, 8 Mar 2009 20:23:54 +0530 Subject: [PATCH] Fix utrace_attach_delay() to work correctly with cloned threads In-Reply-To: <20090307230330.GA26139@redhat.com> References: <20090307230330.GA26139@redhat.com> Message-ID: <20090308145354.GA4600@in.ibm.com> On Sun, Mar 08, 2009 at 12:03:30AM +0100, Oleg Nesterov wrote: > Ananth N Mavinakayanahalli wrote: ... > We can also add "struct task_struct *creator" to "struct utrace". It is > be set by tracehook_finish_clone/utrace_init_task, and it is cleared by > tracehook_report_clone() path.
In that case we do not need PF_STARTING. > But this blows task_struct... But just by one pointer size. Perhaps reverting commit dd30e86355 would suffice? Ananth From roland at redhat.com Mon Mar 9 18:23:51 2009 From: roland at redhat.com (Roland McGrath) Date: Mon, 9 Mar 2009 11:23:51 -0700 (PDT) Subject: [BUG] utrace_attach_task() never returns when called from the report_clone callback In-Reply-To: Ananth N Mavinakayanahalli's message of Saturday, 7 March 2009 07:37:02 +0530 <20090307020702.GD15133@in.ibm.com> References: <20090306154134.GB15133@in.ibm.com> <20090306205234.0A759FC3BF@magilla.sf.frob.com> <20090307020702.GD15133@in.ibm.com> Message-ID: <20090309182351.1FA6FFC3C7@magilla.sf.frob.com> > The issue is that target->real_parent == current->real_parent and not > current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the > do-while. Oops! I knew it felt too easy to remove the utrace->cloning field. If a little cleverness sufficed then I would have done it that way in the first place. I've restored the old mechanism. Thanks, Roland
From ananth at in.ibm.com Tue Mar 10 10:59:22 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 10 Mar 2009 16:29:22 +0530 Subject: [BUG] utrace_attach_task() never returns when called from the report_clone callback In-Reply-To: <20090309182351.1FA6FFC3C7@magilla.sf.frob.com> References: <20090306154134.GB15133@in.ibm.com> <20090306205234.0A759FC3BF@magilla.sf.frob.com> <20090307020702.GD15133@in.ibm.com> <20090309182351.1FA6FFC3C7@magilla.sf.frob.com> Message-ID: <20090310105922.GF4600@in.ibm.com> On Mon, Mar 09, 2009 at 11:23:51AM -0700, Roland McGrath wrote: > > The issue is that target->real_parent == current->real_parent and not > > current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the > > do-while. > > Oops! I knew it felt too easy to remove the utrace->cloning field. If a > little cleverness sufficed then I would have done it that way in the first > place. I've restored the old mechanism. Thanks! The interface now works as expected. Ananth
From vfalico at redhat.com Tue Mar 10 16:33:51 2009 From: vfalico at redhat.com (Veaceslav Falico) Date: Tue, 10 Mar 2009 17:33:51 +0100 Subject: [PATCH] utrace_add_engine: add missing 'else' after 'if (utrace->reap)' Message-ID: <1236702831.8714.33.camel@darkmag.usersys.redhat.com> In function utrace_add_engine is a missing else while verifying if utrace_release_task was already called, which can lead to adding to a reaping utrace engine.
Signed-off-by: Veaceslav Falico --- diff --git a/kernel/utrace.c b/kernel/utrace.c index 906145e..8fc1867 100644 --- a/kernel/utrace.c +++ b/kernel/utrace.c @@ -153,7 +153,7 @@ static int utrace_add_engine(struct task_struct *target, * Already entered utrace_release_task(), cannot attach now. */ ret = -ESRCH; - } if ((flags & UTRACE_ATTACH_EXCLUSIVE) && + } else if ((flags & UTRACE_ATTACH_EXCLUSIVE) && unlikely(matching_engine(utrace, flags, ops, data))) { ret = -EEXIST; } else { From oleg at redhat.com Tue Mar 10 16:45:36 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 10 Mar 2009 17:45:36 +0100 Subject: [PATCH] utrace_stop: trivial, kill the unnecessary assignment Message-ID: <20090310164536.GA32196@redhat.com> Kill the unneeded "killed = false", the next line overwrites "killed". Signed-off-by: Oleg Nesterov --- xxx/kernel/utrace.c~DEAD_LINE 2009-03-09 21:41:04.000000000 +0100 +++ xxx/kernel/utrace.c 2009-03-10 17:42:02.000000000 +0100 @@ -440,7 +440,6 @@ static bool utrace_stop(struct task_stru */ try_to_freeze(); - killed = false; killed = finish_utrace_stop(task, utrace); /* From oleg at redhat.com Tue Mar 10 18:23:27 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 10 Mar 2009 19:23:27 +0100 Subject: Q: REPORT_CALLBACKS()->list_for_each_entry_safe() - why _safe? Message-ID: <20090310182327.GA3826@redhat.com> REPORT_CALLBACKS/utrace_resume/etc use list_for_each_entry_safe(). Why we can't just use list_for_each_entry() ? Perhaps I misread utrace.c, but I can't see how engine can be unlinked under us. Afaics, nobody except us (finish_report->utrace_reset) can unlink the detached engines, even if we race with UTRACE_DETACH. And we can't race with utrace_release_task(). No? OTOH. If I am wrong, and UTRACE_DETACH can unlink _any_ engine from ->attached list while we are doing list_for_each_entry_safe(), then we can crash, and I can't see how _safe can help. Confused. Oleg. 
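For reference, the difference between the plain and the "_safe" list walks can be sketched in user space. This is a minimal imitation, not the kernel's list.h: struct engine, entry_of() and the walk_* helpers are hypothetical names. The "_safe" form saves the successor before the loop body runs, so the body itself may unlink the current entry; it gives no protection if some other party unlinks the saved successor, which is the second concern raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space imitation of a circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

struct engine {
	int id;
	struct list_head entry;
};

/* Recover the containing engine from its embedded list node. */
#define entry_of(ptr) \
	((struct engine *)((char *)(ptr) - offsetof(struct engine, entry)))

void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Unlink the node and clear its pointers so stale use is obvious. */
void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = NULL;
}

/* "_safe"-style walk: the successor is read before the body runs, so
 * the body may unlink the current entry. Returns entries visited. */
int walk_safe_and_detach(struct list_head *head, int detach_id)
{
	struct list_head *pos, *n;
	int visited = 0;

	for (pos = head->next, n = pos->next; pos != head;
	     pos = n, n = pos->next) {
		visited++;
		if (entry_of(pos)->id == detach_id)
			list_del(pos);
	}
	return visited;
}

/* Plain walk: correct only if nothing unlinks "pos" during the body. */
int walk_plain(struct list_head *head)
{
	struct list_head *pos;
	int visited = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		visited++;
	return visited;
}

/* Build a three-entry list, detach id 1 during a safe walk, and report
 * how many entries each walk saw. */
int demo_visits(int *remaining)
{
	struct list_head head;
	struct engine e[3];
	int i, visited;

	list_init(&head);
	for (i = 0; i < 3; i++) {
		e[i].id = i;
		list_add_tail(&e[i].entry, &head);
	}
	visited = walk_safe_and_detach(&head, 1);
	*remaining = walk_plain(&head);
	return visited;
}
```

If the loop body never unlinks anything, as argued in the message above, the plain walk is sufficient.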
From mhiramat at redhat.com Tue Mar 10 19:57:11 2009 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Tue, 10 Mar 2009 15:57:11 -0400 Subject: instruction-analysis API(s) In-Reply-To: <20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com> References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com> <49A85902.8000306@redhat.com> <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> <49AF3480.1040804@redhat.com> <49B059B8.8090702@redhat.com> <20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com> Message-ID: <49B6C617.1090602@redhat.com> Hi Jim, Jim Keniston wrote: > Quoting Masami Hiramatsu : > >> Hi Jim and Sriker, >> >> Here, I almost rewrote my patch. >> >> Changelog: >> - rewrite decoding logic based on Intel' manual. >> - supoort insn_get_sib(),insn_get_displacement() >> and insn_get_immediate() too. >> - support 3 bytes opcode and 64bit immediate. >> - introduce some bitmaps. >> >> Thank you, > > Well, I didn't do much of a code review -- it looks like you addressed > all my concerns -- but as I mentioned on IRC, I hacked together a test > rig whereby you can disassemble a designated elf file (e.g., vmlinux, > libc, libm) and then compare insn_get_length()'s results with objdump's > results. The comment in distill.awk shows how to use objdump, awk, and > test_get_len together. Thank you for review and test! > I also hacked up insn_x86.h and insn_x86.c to work in user space. Most > of that is accomplished via insn_x86_user.h, but it certainly isn't > necessary to do it that way. In particular, __u8, __s8, __u16, etc. are > versions of u8, s8, u16, etc. that can be used in both kernel and user > code, so maybe we should switch to those. > > I tested with vmlinux, libc, and libm on both an i686 system and an > x86_64 system. 
I found and fixed a few bugs. Here are the ones that > come to mind (all fixed): > - shrd/shld, which we discussed > - missing support for weird nops with modrm bytes (0f 1f ...). > - neglected to include the REX prefix in prefixes.nbytes > - missing static decl in an inline function in insn_x86.h Thank you for fixing it. BTW, it might have to support vm86 mode(especially, for user code). > There are some other cases where insn_get_length() doesn't match up with > the disassembly, but I don't consider them bugs: > - 0x9b is an instruction (fwait), but the disassembler treats it as a > prefix. For example 9b df ... can be disassembled as > fstsw ... // wait, then store status word > or > fwait // wait > fnstsw ... // store status word without waiting > Perhaps it's relevant to investigate whether a single-step of 9b df ... > would execute just the fwait or the whole fstsw. Anyway, this explains > the "failures" of finit and fstsw that I mentioned to you. I also saw > this with fstcw and fclex. FYI, there is a single wait/fwait instruction described at Intel software developers manual vol.2B p.399. > - Illegal instruction sequences, such as an x86_64 instruction that > starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these > when disassembling data. I just filtered out (via egrep) instructions > whose disassembly starts with "rex" or includes "(bad)". Sure, I think insn_* should return -EINVAL or set insn.invalid = 1 if we found those invalid ops. E.g. kernel use BUG() macro, it adds some raw numbers after ud2, in that case, those raw numbers might be decoded as an illegal instruction. > We could address the above by filtering them out in distill.awk or > test_get_len.c. I think we're clean otherwise. > > There's a little more housecleaning to do -- e.g., adding Hitachi (?) > copyright to IBM copyright, discarding insn_field_exists() and > insn_extract_reg(), putting this all in git somewhere. But not tonight. 
> > Pull all the attached files into a directory and have a go -- e.g., > $ make > $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64] > > Jim > -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhiramat at redhat.com From roland at redhat.com Tue Mar 10 21:18:31 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 10 Mar 2009 14:18:31 -0700 (PDT) Subject: [PATCH] utrace_add_engine: add missing 'else' after 'if (utrace->reap)' In-Reply-To: Veaceslav Falico's message of Tuesday, 10 March 2009 17:33:51 +0100 <1236702831.8714.33.camel@darkmag.usersys.redhat.com> References: <1236702831.8714.33.camel@darkmag.usersys.redhat.com> Message-ID: <20090310211831.2FE22FC3B6@magilla.sf.frob.com> Good catch! Applied. Thanks, Roland From oleg at redhat.com Tue Mar 10 21:22:47 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 10 Mar 2009 22:22:47 +0100 Subject: Q: ->attaching && REPORT_CALLBACKS() Message-ID: <20090310212247.GA12258@redhat.com> Despite the fat comment in utrace_add_engine() I can't really understand the meaning of ->attaching list. The comment: * When target == current, it would be safe just to call * splice_attaching() right here. But if we're inside a * callback, just to clarify, "inside a callback" means inside utrace_report_xxx(), not only inside utrace_engine_ops->report_xxx(), right? that would mean the new engine also gets * notified about the event that precipitated its own * creation. engine->flags == 0, so it should not be notified until the caller does utrace_set_events() later, right? This is not what the user wants. It it not clear to me why the user doesn't want this. I understand this as follows. If we add the new engine to the ->attached list, and if the target is inside a callback, the target can later race with (say) utrace_set_events(). 
The target can see "engine->flags & event" and call start_callback/finish_callback before utrace_set_events() completes. Is this correct? I guess no. Because the "race" above can happen even if we use ->attaching. utrace_add_engine() can happen after we already entered utrace_report_xxx(), but before it does start_report(). Could you clarify? Another question. In any case I don't understand why do we really need two lists. Let's suppose we implement the new trivial helper, list_for_each_entry_xxx(pos, head, tail, member) it stops when "pos" reaches "tail", not "head". Then REPORT_CALLBACKS() can just read "tail = utrace->attached->prev" (under ->lock, or utrace_add_engine() can use list_add_rcu) before list_for_each_entry_xxx. This way we can kill ->attaching, no? Oleg. From roland at redhat.com Tue Mar 10 21:57:57 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 10 Mar 2009 14:57:57 -0700 (PDT) Subject: Q: REPORT_CALLBACKS()->list_for_each_entry_safe() - why _safe? In-Reply-To: Oleg Nesterov's message of Tuesday, 10 March 2009 19:23:27 +0100 <20090310182327.GA3826@redhat.com> References: <20090310182327.GA3826@redhat.com> Message-ID: <20090310215757.1D3BCFC3B6@magilla.sf.frob.com> You are right. I think that in some past version of the code, some utrace calls made on current from inside a callback could change the list. But now it's only possible in utrace_reset, so the list can never change from a callback. I changed the code. Thanks, Roland From roland at redhat.com Wed Mar 11 00:11:37 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 10 Mar 2009 17:11:37 -0700 (PDT) Subject: Q: ->attaching && REPORT_CALLBACKS() In-Reply-To: Oleg Nesterov's message of Tuesday, 10 March 2009 22:22:47 +0100 <20090310212247.GA12258@redhat.com> References: <20090310212247.GA12258@redhat.com> Message-ID: <20090311001137.2D625FC3B6@magilla.sf.frob.com> > The comment: > > * When target == current, it would be safe just to call > * splice_attaching() right here. 
But if we're inside a > * callback, > > just to clarify, "inside a callback" means inside utrace_report_xxx(), > not only inside utrace_engine_ops->report_xxx(), right? Certainly what I mean when I say "a callback" is one of the functions whose pointer lives in struct utrace_engine_ops. But I don't see how the distinction you make could even be meaningful here. A utrace_attach_task() call "inside utrace_report_foo()" could only possibly mean one made by a ->report_foo() function utrace_report_foo() calls, since obviously there are no hard-wired utrace_attach_task() calls in utrace.c itself. > that would mean the new engine also gets > * notified about the event that precipitated its own > * creation. > > engine->flags == 0, so it should not be notified until the caller > does utrace_set_events() later, right? Right. The case in question is a callback doing: new_engine = utrace_attach_task(current, ...); utrace_set_events(new_engine, ); Then new_engine would get the "this event" callback at the end of the very same reporting loop containing its creation. (This is what happened before I changed the code as the comment describes.) > This is not what the user wants. > > It it not clear to me why the user doesn't want this. Jim Keniston is the user who doesn't want this. https://www.redhat.com/archives/utrace-devel/2008-December/msg00051.html > I understand this as follows. If we add the new engine to the ->attached > list, and if the target is inside a callback, the target can later race > with (say) utrace_set_events(). The target can see "engine->flags & event" > and call start_callback/finish_callback before utrace_set_events() completes. It's not a race question. There are no guarantees for such races. (That is, utrace_set_events() calls on target!=current make no guarantee about reporting an event that might already have started. Only if the target was already stopped by your engine when you made the call can you be sure that no such event can be in progress.) 
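The synchronous case described above can be illustrated with a toy model. This is only a sketch under stated assumptions: struct tracer, attach(), splice() and report_event() are hypothetical names, arrays stand in for the kernel lists, and event masks are left out entirely. It shows the effect of queueing a newly attached engine on a separate "attaching" list that is merged only at the start of the next reporting pass, so the new engine does not see the event whose callback created it.

```c
#include <assert.h>

#define MAX_ENGINES 8

struct engine;

/* Toy model: arrays stand in for the attached and attaching lists. */
struct tracer {
	struct engine *attached[MAX_ENGINES];   /* walked by reporting passes */
	int n_attached;
	struct engine *attaching[MAX_ENGINES];  /* queued, not yet reported to */
	int n_attaching;
};

struct engine {
	int calls;  /* number of events this engine has been shown */
	void (*report)(struct tracer *t, struct engine *self);
};

/* New engines are only queued; they never join a pass in progress. */
void attach(struct tracer *t, struct engine *e)
{
	t->attaching[t->n_attaching++] = e;
}

/* Merge queued engines into the main list (stand-in for a splice step). */
void splice(struct tracer *t)
{
	int i;

	for (i = 0; i < t->n_attaching; i++)
		t->attached[t->n_attached++] = t->attaching[i];
	t->n_attaching = 0;
}

/* One reporting pass: merge first, then call every attached engine. */
void report_event(struct tracer *t)
{
	int i;

	splice(t);
	for (i = 0; i < t->n_attached; i++) {
		struct engine *e = t->attached[i];

		e->calls++;
		if (e->report)
			e->report(t, e);
	}
}

static struct engine child_engine;

/* Callback that attaches a second engine while handling its first event. */
void spawn_child(struct tracer *t, struct engine *self)
{
	if (self->calls == 1)
		attach(t, &child_engine);
}

/* Run the given number of events and return how many the child saw. */
int child_calls_after(int events)
{
	struct tracer t = { {0}, 0, {0}, 0 };
	struct engine parent = { 0, spawn_child };
	int i;

	child_engine.calls = 0;
	child_engine.report = 0;
	attach(&t, &parent);
	for (i = 0; i < events; i++)
		report_event(&t);
	return child_engine.calls;
}
```

With a single list and an append inside the callback, the loop in report_event() would reach the new engine in the same pass; the separate queue is what prevents that in this model.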
The scenario we are talking about here is fully synchronous. The target itself is inside a callback, calling utrace_set_events() on itself. > Another question. In any case I don't understand why do we really need > two lists. We want that in the common case a reporting pass takes no locks. The ->attached list is never touched when the target is not quiescent. (The target uses the lock to synchronize when it transitions between being quiescent and not.) Once you believe the quiescence logic, this makes it easy to be confident about the unlocked use of that list in reporting passes. It's used in totally vanilla ways, and modified in totally vanilla ways. > Let's suppose we implement the new trivial helper, > > list_for_each_entry_xxx(pos, head, tail, member) > > it stops when "pos" reaches "tail", not "head". Then REPORT_CALLBACKS() > can just read "tail = utrace->attached->prev" (under ->lock, or > utrace_add_engine() can use list_add_rcu) before list_for_each_entry_xxx. > > This way we can kill ->attaching, no? This is a lot like what the old utrace code did before the introduction of the two lists (when the engine struct was managed using RCU). This is just an optimization over what we have now. It saves the ->attaching space (i.e. two words in struct utrace), the splice_attaching() logic (pretty cheap), and the sometimes-superfluous resume report after an attach. The cost for this is some very touchy fine-grained complexity in convincing ourselves (and reviewers) that the list traversal and modification is always correct. I've already implied that anything taking any locks for every vanilla reporting pass is a non-starter. I'm asserting preemptive optimization here because it's the case that is most important to optimize. The overhead of a reporting pass applies to situations like every system call entry, with an engine callback that quickly filters out the vast majority of calls (i.e. "if (regs->foo != __NR_bar) return;" or something about that cheap). 
So we think about the reporting pass overhead as something that might be done a million times a second, and accordingly think carefully about that hot path. In contrast, we are talking here about optimizing attach, and saving a couple of words of data structure space that will already be cache-hot. We are not actually following any RCU rules at all, so to use list_add_tail_rcu would really just mean that we are relying on our own fancy special list mutation scheme and proving/documenting that it is correct. It just happens to have the same implementation details as list_add_tail_rcu, and we must either copy those innards and document why they are right in our uses, or document how the list_add_tail_rcu innards happen to match what is right for our uses and keep track of any future implementation changes in rculist.h that might diverge from what we rely on. That proof and documentation entails hairy logic about SMP ordering and memory barriers and so forth. Frankly, that all seems like much more touchy hair than the utrace-indirect logic (for less benefit), and we've already decided to avoid that for the first cut because LKML reviewers found it too hairy to contemplate. Next, consider that e.g. Renzo Davoli has proposed reversing the engine order used for certain reporting passes (syscall entry vs exit having inverse order). (I'm not discussing the merits of that change, it's just an example.) Right now, a change like that would be a simple choice about the desireable API, with no implementation complexity to worry about at all, just s/list_for_each/&_reverse/. In the same vein, we contemplate for the future having engines on a priority list or some other means to add new engines somewhere other than at the end of the list. When we've resolved what interfaces we want for that, it will be straightforward to implement whatever it is using normal list.h calls (or even to use a different kind of list data structure). 
Many choices like those are likely to conflict with what clever SMP-safe list magic we can do now if we start on that sort of optimization now. I could easily be quite wrong about the performance trade-offs of a lock vs splice_attaching() and cache effects, etc. But before we get to worrying about that performance in great detail, the complexity argument stands. This is an optimization to consider later on, both after the upstream review has accepted the simpler code into the kernel to begin with, and after we have gotten a more mature set of uses of the API and refined the details of the API semantics on merits broader than such micro-optimization. Thanks, Roland From jkenisto at us.ibm.com Wed Mar 11 19:44:07 2009 From: jkenisto at us.ibm.com (Jim Keniston) Date: Wed, 11 Mar 2009 12:44:07 -0700 Subject: instruction-analysis API(s) In-Reply-To: <49B6C617.1090602@redhat.com> References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com> <498CA248.2090708@redhat.com> <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com> <4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com> <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com> <49A85902.8000306@redhat.com> <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com> <49AF3480.1040804@redhat.com> <49B059B8.8090702@redhat.com> <20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com> <49B6C617.1090602@redhat.com> Message-ID: <1236800647.4965.46.camel@dyn9047018139.beaverton.ibm.com> On Tue, 2009-03-10 at 15:57 -0400, Masami Hiramatsu wrote: > Hi Jim, ... > > > > I tested with vmlinux, libc, and libm on both an i686 system and an > > x86_64 system. ... > > Thank you for fixing it. > BTW, it might have to support vm86 mode(especially, for user code). I have a vague idea of what vm86 mode is, but I don't really understand what the implications are for instruction analysis or probing. 
My understanding is that its use is rare (e.g., for DOS emulators), so it hasn't been a requirement for uprobes so far.

>
> > There are some other cases where insn_get_length() doesn't match up with
> > the disassembly, but I don't consider them bugs:
> > - 0x9b is an instruction (fwait), but the disassembler treats it as a
> > prefix. For example 9b df ... can be disassembled as
> >	fstsw ...	// wait, then store status word
> > or
> >	fwait		// wait
> >	fnstsw ...	// store status word without waiting
> > Perhaps it's relevant to investigate whether a single-step of 9b df ...
> > would execute just the fwait or the whole fstsw. Anyway, this explains
> > the "failures" of finit and fstsw that I mentioned to you. I also saw
> > this with fstcw and fclex.
>
> FYI, there is a single wait/fwait instruction described at Intel software
> developers manual vol.2B p.399.

Yes, I tried probing an fclex instruction -- which is really fwait + fnclex -- and the single-step stopped after the fwait. So our instruction analysis is correct. (Of course, I had to adjust uprobes not to reject the 0x9b opcode -- need to check that in. PR 5273 is about this sort of thing.)

>
> > - Illegal instruction sequences, such as an x86_64 instruction that
> > starts with 0x40, or a misplaced 0x65 prefix. Typically, we see these
> > when disassembling data. I just filtered out (via egrep) instructions
> > whose disassembly starts with "rex" or includes "(bad)".
>
> Sure, I think insn_* should return -EINVAL or set insn.invalid = 1
> if we found those invalid ops. E.g. kernel use BUG() macro, it adds
> some raw numbers after ud2, in that case, those raw numbers might
> be decoded as an illegal instruction.

It could be useful to provide a function to determine whether the byte sequence is a valid instruction, but I don't think we should make that check by default. Here are some reasons:
1. It costs execution time.
For some instructions, you have to examine the prefixes and/or modrm byte as well as the opcode(s).
2. It takes time to code it 100% right. In particular, mistakenly rejecting a valid instruction can be a nuisance.
3. Intel and AMD may not completely agree on which instructions are valid in which modes. I've always consulted the AMD manuals, since they're online and appear complete, but I'm not really sure whether what they say applies without exception to (say) Pentium and EM64T.
4. kprobes and uprobes have gotten along fine without such a test. (Uprobes's test is far from complete, and deliberately screens out some valid instructions, such as sysenter, that we suspect may produce weird results when single-stepped.) The assumption is that the address provided points to the first byte of a valid instruction.

Since on x86, most random byte sequences look like some kind of valid instruction, catching obviously invalid sequences wouldn't buy us very much.

Jim

From oleg at redhat.com Wed Mar 11 22:24:01 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 11 Mar 2009 23:24:01 +0100
Subject: Q: utrace->stopped && utrace_report_jctl()
Message-ID: <20090311222401.GA13512@redhat.com>

I'd like to ask you to clarify what utrace->stopped means...

My understanding is: if we see ->stopped == true under utrace->lock, then the target can do nothing "interesting" from the utrace's pov. The target should take utrace->lock at least once. Either in finish_utrace_stop(), or, if ->stopped was set by do_signal_stop() path, the target will call tracehook_get_signal()->utrace_get_signal(). So we can assume the target is "quiescent" and we can do, for example, UTRACE_DETACH safely.

Is this correct?

But utrace_report_jctl() doesn't look right to me,

	spin_lock(&utrace->lock);
	utrace->stopped = 0;
	utrace->report = 0;
	spin_unlock(&utrace->lock);

I must admit, I don't understand the comment above, but obviously this is right, we should clear ->stopped.
If nothing else, REPORT()->start_report() won't be happy if ->stopped.

But ->stopped can be restored right after we clear it! Yes, utrace_do_stop() and utrace_set_events() set ->stopped == 1 only if ->utrace_flags has no JCTL, and since we are here we must have JCTL.

But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be already removed from ->utrace_flags, exactly because ->stopped was true. No?

This leads to another minor question: how is it possible to enter utrace_report_jctl() with ->stopped == 1? I think the only possibility is that it was previously set by another call to utrace_report_jctl(), see below.

	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);

	if (was_stopped && !task_is_stopped(task)) {
		/*
		 * The event report hooks could have blocked, though
		 * it should have been briefly. Make sure we're in
		 * TASK_STOPPED state again to block properly, unless
		 * we've just come back out of job control stop.
		 */

Yes. Even a plain kmalloc() can change ->state to TASK_RUNNING,

		spin_lock_irq(&task->sighand->siglock);
		if (task->signal->flags & SIGNAL_STOP_STOPPED)
			__set_current_state(TASK_STOPPED);

SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop is in progress and is not finished yet. But ->group_stop_count is not reliable either. It is possible that we received SIGCONT and then another SIGSTOP. If another thread has already dequeued this SIGSTOP and initiated the new group stop, we can't just set TASK_STOPPED, we must participate in the ->group_stop_count accounting.

	if (task_is_stopped(current)) {
		/*
		 * While in TASK_STOPPED, we can be considered safely
		 * stopped by utrace_do_stop() only once we set this.
		 */
		spin_lock(&utrace->lock);
		utrace->stopped = 1;
		spin_unlock(&utrace->lock);

I think this is correct, but it is not easy to understand. SIGCONT may come right after the task_is_stopped() check, so this _looks_ racy.
But, nobody should clear ->utrace_flags without calling utrace_wakeup() which clears ->stopped too. This means that the target can't escape from get_signal_to_deliver() with the ->stopped == 1.

And in fact, we could check was_stopped instead of task_is_stopped().

Is my understanding correct?

But! can't we miss utrace_wakeup() ?

Let's suppose the debugger D attaches the single engine E to the target T.

D does utrace_set_events(JCTL).

T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts utrace_report_jctl().

D does utrace_set_events(events => 0), this clears E->flags.

T calls REPORT(), nobody needs JCTL, finish_report() sees !->takers and calls utrace_reset(). It sets ->utrace_flags = 0.

T checks task_is_stopped(), sets ->stopped = 1.

Now, when T is woken by SIGCONT, it returns to user-space bypassing all utrace hooks, and runs with ->stopped == 1. This doesn't look right. Say, D can do utrace_set_events(ANY) and then T hits start_report()->BUG_ON(utrace->stopped).

Could you clarify?

Oleg.

From oleg at redhat.com Thu Mar 12 00:15:21 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 01:15:21 +0100
Subject: Q: ->attaching && REPORT_CALLBACKS()
In-Reply-To: <20090311001137.2D625FC3B6@magilla.sf.frob.com>
References: <20090310212247.GA12258@redhat.com> <20090311001137.2D625FC3B6@magilla.sf.frob.com>
Message-ID: <20090312001521.GA16303@redhat.com>

On 03/10, Roland McGrath wrote:
>
> > The comment:
> >
> >	* When target == current, it would be safe just to call
> >	* splice_attaching() right here. But if we're inside a
> >	* callback,
> >
> > just to clarify, "inside a callback" means inside utrace_report_xxx(),
> > not only inside utrace_engine_ops->report_xxx(), right?
>
> Certainly what I mean when I say "a callback" is one of the functions whose
> pointer lives in struct utrace_engine_ops. But I don't see how the
> distinction you make could even be meaningful here.

Yes, I wasn't clear.
> A utrace_attach_task()
> call "inside utrace_report_foo()" could only possibly mean one made by a
> ->report_foo() function utrace_report_foo() calls, since obviously there
> are no hard-wired utrace_attach_task() calls in utrace.c itself.

But not vice versa. I misunderstood the comment as if the new engine should not be notified if it is attached by another task while target is inside callback. I was confused by "When target == current" part of the comment, please see below.

> > This is not what the user wants.
> >
> > It is not clear to me why the user doesn't want this.
>
> Jim Keniston is the user who doesn't want this.
> https://www.redhat.com/archives/utrace-devel/2008-December/msg00051.html

Still can't understand... If (say) ->report_exec() attaches the new engine to the same task and does utrace_set_events(EXEC), then it looks logical the new engine gets the notification too. But OK, I agree, either way is correct, and perhaps the current behaviour is more intuitive.

But this means that "When target == current it would be safe just to call splice_attaching() right here" part of the comment is not right, no? Except for report_reap() target == current.

> > I understand this as follows. If we add the new engine to the ->attached
> > list, and if the target is inside a callback, the target can later race
> > with (say) utrace_set_events(). The target can see "engine->flags & event"
> > and call start_callback/finish_callback before utrace_set_events() completes.
>
> It's not a race question. There are no guarantees for such races. (That
> is, utrace_set_events() calls on target!=current make no guarantee about
> reporting an event that might already have started. Only if the target was
> already stopped by your engine when you made the call can you be sure that
> no such event can be in progress.)
>
> The scenario we are talking about here is fully synchronous. The target
> itself is inside a callback, calling utrace_set_events() on itself.

Yes, yes, I see.
But I meant another case. Suppose that the debugger D attaches to T and does

	engine = utrace_attach_task(T, ...);
	utrace_set_events(T, engine, XXX);

It is possible that ->report_xxx() is called before utrace_set_events() completes. But afaics currently this is not a problem.

> > Another question. In any case I don't understand why do we really need
> > two lists.
>
> [... big snip ...]

Thanks for your explanations! And, in any case,

> This is an optimization to consider later on

Yes, yes, sure. I didn't mean we should do this change right now even _if_ it is good, and I didn't mean I think it is necessarily good ;)

Oleg.

From oleg at redhat.com Thu Mar 12 00:28:59 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 01:28:59 +0100
Subject: [PATCH] utrace_tracer_task: s/list_for_each_safe/list_for_each_entry
In-Reply-To: <20090310215757.1D3BCFC3B6@magilla.sf.frob.com>
References: <20090310182327.GA3826@redhat.com> <20090310215757.1D3BCFC3B6@magilla.sf.frob.com>
Message-ID: <20090312002859.GA20725@redhat.com>

utrace_tracer_task() can use list_for_each_entry() too.
Signed-off-by: Oleg Nesterov

--- xxx/kernel/utrace.c~TRACER_TASK	2009-03-12 01:18:38.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-12 01:21:05.000000000 +0100
@@ -2317,15 +2317,12 @@ EXPORT_SYMBOL_GPL(task_user_regset_view)
  */
 struct task_struct *utrace_tracer_task(struct task_struct *target)
 {
-	struct list_head *pos, *next;
 	struct utrace_engine *engine;
 	const struct utrace_engine_ops *ops;
 	struct task_struct *tracer = NULL;
 	struct utrace *utrace = task_utrace_struct(target);
 
-	list_for_each_safe(pos, next, &utrace->attached) {
-		engine = list_entry(pos, struct utrace_engine,
-				    entry);
+	list_for_each_entry(engine, &utrace->attached, entry) {
 		ops = rcu_dereference(engine->ops);
 		if (ops->tracer_task) {
 			tracer = (*ops->tracer_task)(engine, target);

From roland at redhat.com Thu Mar 12 05:12:46 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 11 Mar 2009 22:12:46 -0700 (PDT)
Subject: Q: ->attaching && REPORT_CALLBACKS()
In-Reply-To: Oleg Nesterov's message of Thursday, 12 March 2009 01:15:21 +0100 <20090312001521.GA16303@redhat.com>
References: <20090310212247.GA12258@redhat.com> <20090311001137.2D625FC3B6@magilla.sf.frob.com> <20090312001521.GA16303@redhat.com>
Message-ID: <20090312051246.B50E5FC3B6@magilla.sf.frob.com>

> But not vice versa. I misunderstood the comment as if the new engine
> should not be notified if it is attached by another task while target
> is inside callback.

That is indeed what happens in that case. But that one is not a specific "should not", it's just what happens to be true given what we say about the "asynchronous" attach case in general. That is, that an "asynchronous" attach + set_events makes no guarantees about how instantly you start to get event reports. It might be as long as the time it takes to get back to user mode from wherever the thread is now, or the time it takes it to process an interrupt and then get back to user mode.
It's like you did "thread->events |= events" but there has not been any kind of memory barrier--it might see it or might not, until you do something affirmative to make sure (i.e. put it through UTRACE_STOP, or else get some other callback you're sure happens after your utrace_set_events call).

For this purpose an "asynchronous attach" means one by a third task (not the thread itself or the creator during its report_clone), and done when that third task did not already have some engine that completed a UTRACE_STOP. This applies even if it is literally synchronous, i.e. if a callback arranged for the third task to do the attach and set_events and then blocked waiting for the third task to report its success, we'd call this an "asynchronous attach" because it didn't synchronize using UTRACE_STOP.

> Still can't understand... If (say) ->report_exec() attaches the new
> engine to the same task and does utrace_set_events(EXEC), then it looks
> logical the new engine gets the notification too. But OK, I agree, either
> way is correct, and perhaps the current behaviour is more intuitive.

As you can see in the cited thread, that's what I thought too. Jim convinced me that the (new) current behavior is more useful. The most important thing to me is that it's clearly specified one way or the other for the synchronous case.

It's obviously straightforward to do:

	report_exec(engine, ...)
	{
		new_engine = utrace_attach_task(current, &new_ops);
		utrace_set_events(new_engine, UTRACE_EVENT(EXEC));
		new_ops.report_exec(new_engine, ...);
	}

if you want one of your own callback functions to get another call there. OTOH, it's much more cumbersome to make the report_exec callback used by your new engine keep flags and whatnot to distinguish the first exec event that preceded that engine's setup from the next one (which is what the new engine is really there to respond to).

Jim's use seems fairly representative of situations where this might come up.
He's concerned with the EXEC event as the "old address space is gone, new one is here" event. It's also the "my name changed" event that may be triggering a new tracing setup. The former use just wants report_exec to do "wipe out our state and go away" stuff. The latter use might want to set up a new incarnation of that sort of tracing setup--a new engine whose report_exec callback does clean up. It's obvious how the new engine getting its "clean up now" callback immediately as a consequence of where the call to set it up came from is not helpful. I'm sure this sort of scenario will not be unique either to Jim's work or to EXEC callbacks in particular.

> But this means that "When target == current it would be safe just to call
> splice_attaching() right here" part of the comment is not right, no?
> Except for report_reap() target == current.

It would be "safe", meaning it doesn't have race problems like the target != current case does for touching ->attached here. That's what the comment says (and that's what the code used to do). The reason we don't do it (any more) is the explicit choice for API semantics, not any implementation reason (in the implementation it is indeed an obvious optimization if you are understanding the code). That's why the comment is there.

> Yes, yes, I see. But I meant another case. Suppose that the debugger D
> attaches to T and does
>
>	engine = utrace_attach_task(T, ...);
>	utrace_set_events(T, engine, XXX);
>
> It is possible that ->report_xxx() is called before utrace_set_events()
> completes. But afaics currently this is not a problem.

As far as the API guarantees are concerned, there is no "completes". When you call utrace_set_events, it becomes possible your callbacks get made. The return value (a failure return, not -EINPROGRESS) can say that you are now sure no callback was made or will be. But when you called, you wanted it to be possible.
If you didn't, then you should have made sure it was fully stopped via UTRACE_STOP before you called utrace_set_events.

Thanks,
Roland

From roland at redhat.com Thu Mar 12 07:36:52 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 12 Mar 2009 00:36:52 -0700 (PDT)
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: Oleg Nesterov's message of Wednesday, 11 March 2009 23:24:01 +0100 <20090311222401.GA13512@redhat.com>
References: <20090311222401.GA13512@redhat.com>
Message-ID: <20090312073652.75811FC3B6@magilla.sf.frob.com>

> I'd like to ask you to clarify what utrace->stopped means...

I'm very glad you are looking into this area!

> My understanding is: if we see ->stopped == true under utrace->lock, then
> the target can do nothing "interesting" from the utrace's pov. The target
> should take utrace->lock at least once. Either in finish_utrace_stop(), or,
> if ->stopped was set by do_signal_stop() path, the target will call
> tracehook_get_signal()->utrace_get_signal(). So we can assume the target
> is "quiescent" and we can do, for example, UTRACE_DETACH safely.

Correct.

> But utrace_report_jctl() doesn't look right to me,
>
>	spin_lock(&utrace->lock);
>	utrace->stopped = 0;
>	utrace->report = 0;
>	spin_unlock(&utrace->lock);
>
> I must admit, I don't understand the comment above, but obviously this is
> right, we should clear ->stopped. If nothing else, REPORT()->start_report()
> won't be happy if ->stopped.

The comment mentions "utrace being removed", which is a bit of old text referring to an indirect struct utrace. Aside from that, please tell me what is not clear about that comment.

> But ->stopped can be restored right after we clear it! Yes, utrace_do_stop()
> and utrace_set_events() set ->stopped == 1 only if ->utrace_flags has no JCTL,
> and since we are here we must have JCTL.

That's indeed the logic intended to prevent ->stopped being set again here.
> But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be
> already removed from ->utrace_flags, exactly because ->stopped was true.

I don't follow this. JCTL is never "removed" from ->utrace_flags, except as all event bits are, by utrace_reset().

> This leads to another minor question: how is it possible to enter
> utrace_report_jctl() with ->stopped == 1? I think the only possibility
> is that it was previously set by another call to utrace_report_jctl(), see below.

There are two ways to enter utrace_report_jctl with ->stopped set.

1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then. Now utrace_report_jctl is called for the CLD_CONTINUED case, and ->stopped remains set.

2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was already in TASK_STOPPED (and really stopped, or at least got past tracehook_notify_jctl before JCTL was set). It sets ->stopped before adding JCTL to ->utrace_flags, so that utrace_control() will consider the target stopped.

> SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop is in
> progress and is not finished yet.

SIGNAL_STOP_STOPPED should be reliable, as far as it goes. It will only be set if the group stop is complete. If then a SIGCONT+stop signal come, SIGCONT will clear SIGNAL_STOP_STOPPED before the stop signal starts another group stop. (We have no bad old PTRACE_CONT implementation to conflict with here.)

> But ->group_stop_count is not reliable either. It is possible that we
> received SIGCONT and then another SIGSTOP. If another thread has already
> dequeued this SIGSTOP and initiated the new group stop, we can't just set
> TASK_STOPPED, we must participate in the ->group_stop_count accounting.

It's worse than that! If we came out of TASK_STOPPED, we did it implicitly and without holding the siglock. We participated in group_stop_count accounting for the first stop before we got here.
If we stayed in TASK_STOPPED throughout the callbacks, then that bookkeeping is still correct. If the initiation of the new group stop happened while we were in TASK_STOPPED, we were omitted from the count but we should stop again. In that case we should stop either if SIGNAL_STOP_STOPPED is set or if group_stop_count > 0. Since we weren't counted, if group_stop_count==0 then SIGNAL_STOP_STOPPED will be set (again). If that initiation happened while a callback (e.g.) blocked in kmalloc or after (i.e. we were not in TASK_STOPPED), we were included in that count. In that case we need to decrement group_stop_count and stop again, but possibly also need to call do_notify_parent_cldstop again if it was 1. For that we'd do the right thing just by returning in TASK_RUNNING. We'll just come right back around in get_signal_to_deliver and handle group_stop_count normally. The trouble is that we have no way to distinguish these two cases, i.e. to know whether or not we were counted in group_stop_count. Am I missing a way? (The one piece of information we are not using is the @notify argument: it tells us whether we were the thread responsible for setting SIGNAL_STOP_STOPPED just before we got here. But I don't see how that helps.) I think the bottom line is that we can't ever allow any transition to or from TASK_STOPPED when we don't hold the siglock. Every such transition must hold that lock to manage group_stop_count and SIGNAL_STOP_STOPPED. That suggests we must preemptively go back to TASK_RUNNING before making the callbacks, just in case they would do the transition. We'd take the siglock and manage the bookkeeping. But I'm not sure yet how best to do that. I'm not sure if we can safely clear SIGNAL_STOP_STOPPED momentarily after it's been set. This all happens before do_notify_parent_cldstop is called, which avoids a whole can of worms about do_wait() I was starting to worry about. Hmm. 
Seems like there should be something we can do using group_stop_count and/or checking the SIGNAL_CLD_* bits to notice a SIGCONT having come in.

>
>	if (task_is_stopped(current)) {
>		/*
>		 * While in TASK_STOPPED, we can be considered safely
>		 * stopped by utrace_do_stop() only once we set this.
>		 */
>		spin_lock(&utrace->lock);
>		utrace->stopped = 1;
>		spin_unlock(&utrace->lock);
>
> I think this is correct, but it is not easy to understand. SIGCONT may
> come right after the task_is_stopped() check, so this _looks_ racy.

Right, all that matters is that we are always on a path that goes back through utrace_get_signal() before doing anything else utrace thinks about.

> But, nobody should clear ->utrace_flags without calling utrace_wakeup()
> which clears ->stopped too.

Right.

> This means that the target can't escape
> from get_signal_to_deliver() with the ->stopped == 1.

Right, that is the core invariant of all ->stopped logic.

> And in fact, we could check was_stopped instead of task_is_stopped().

Right. If we were resumed rather than actually stopping now, then ->stopped will be cleared shortly anyway. Since we have one test or the other here anyway, the fresh test is a free way to optimize out the lock and set when it happens to be that case. (Not that it matters to optimize this case, but it's free.)

> Is my understanding correct?

I think so.

> But! can't we miss utrace_wakeup() ?

I think you've found something (though not quite the scenario you describe).

> Let's suppose the debugger D attaches the single engine E to the target T.
>
> D does utrace_set_events(JCTL).
>
> T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts
> utrace_report_jctl().
>
> D does utrace_set_events(events => 0), this clears E->flags.
>
> T calls REPORT(), nobody needs JCTL, finish_report() sees !->takers
> and calls utrace_reset(). It sets ->utrace_flags = 0.
Nope:

	flags |= engine->flags | UTRACE_EVENT(REAP);

If there are any engines left on the list, ->utrace_flags is never zero. So, change your scenario to:

	D does utrace_control(UTRACE_DETACH).

and then this will happen.

> T checks task_is_stopped(), sets ->stopped = 1.

Right. In the utrace-indirect code, this was even worse! The dangling utrace pointer was invalid and should not have been used at all (it should have fetched the new one under RCU).

> Now, when T is woken by SIGCONT, it returns to user-space bypassing all utrace
> hooks, and runs with ->stopped == 1. This doesn't look right. Say, D can do
> utrace_set_events(ANY) and then T hits start_report()->BUG_ON(utrace->stopped).

Right. I think it's made safe with:

	if (task_is_stopped(task) &&
	    (task->utrace_flags & UTRACE_EVENT(JCTL))) {

In fact, just task->utrace_flags != 0 would be safe. But only if JCTL is set do we actually need to set ->stopped here. (Otherwise, it will get set later by utrace_do_stop or utrace_set_events.)

Thanks,
Roland

From renzo at cs.unibo.it Thu Mar 12 13:13:03 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Thu, 12 Mar 2009 14:13:03 +0100
Subject: [PATCH 1/2] UTRACE_STOP race condition (updated)
Message-ID: <20090312131303.GA25801@cs.unibo.it>

Dear Roland, dear utrace developers,

I have updated my patch #1 (it solves the race condition on utrace_stop but not the nesting issue) for the latest version of utrace.

I am trying to get the patches updated downloading, compiling and testing the fixes every week or so...
Things would be easier if these patches could be merged in the mainstream ;-)

renzo
----
diff -Naur linux-2.6.29-rc7-git5-utrace/kernel/utrace.c linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c
--- linux-2.6.29-rc7-git5-utrace/kernel/utrace.c	2009-03-12 11:00:09.000000000 +0100
+++ linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c	2009-03-12 11:05:50.000000000 +0100
@@ -376,6 +376,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -385,6 +392,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -406,7 +414,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -632,6 +656,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -648,6 +673,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target: thread to affect
@@ -906,6 +946,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1133,6 +1177,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;

From renzo at cs.unibo.it Thu Mar 12 13:13:30 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Thu, 12 Mar 2009 14:13:30 +0100
Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated)
Message-ID: <20090312131330.GB25801@cs.unibo.it>

Dear Roland, dear utrace developers,

I have also updated the second patch. Please note that this patch must now be applied after the first one.

This patch implements a consistent nesting model for utrace machines. (There is a full description in the messages I sent on Feb. 14 and Mar.
6)

renzo
---
diff -Naur linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c
--- linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c	2009-03-12 11:05:50.000000000 +0100
+++ linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c	2009-03-12 13:37:27.000000000 +0100
@@ -1405,6 +1405,7 @@
 static bool finish_callback(struct utrace *utrace,
 			    struct utrace_report *report,
 			    struct utrace_engine *engine,
+			    struct task_struct *task,
 			    u32 ret)
 {
 	enum utrace_resume_action action = utrace_resume_action(ret);
@@ -1426,6 +1427,7 @@
 			spin_lock(&utrace->lock);
 			mark_engine_wants_stop(engine);
 			spin_unlock(&utrace->lock);
+			utrace_stop(task, utrace);
 		}
 	} else if (engine_wants_stop(engine)) {
 		spin_lock(&utrace->lock);
@@ -1492,7 +1494,7 @@
 	ops = engine->ops;
 
 	if (want & UTRACE_EVENT(QUIESCE)) {
-		if (finish_callback(utrace, report, engine,
+		if (finish_callback(utrace, report, engine, task,
 				    (*ops->report_quiesce)(report->action,
 							   engine, task,
 							   event)))
@@ -1526,24 +1528,24 @@
  * @callback is the name of the member in the ops vector, and remaining
  * args are the extras it takes after the standard three args.
  */
-#define REPORT(task, utrace, report, event, callback, ...)		\
+#define REPORT(reverse, task, utrace, report, event, callback, ...)	\
 	do {								\
 		start_report(utrace);					\
-		REPORT_CALLBACKS(task, utrace, report, event, callback,	\
+		REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, \
 				 (report)->action, engine, current,	\
 				 ## __VA_ARGS__);			\
 		finish_report(report, task, utrace);			\
 	} while (0)
-#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...)	\
+#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...) \
 	do {								\
 		struct utrace_engine *engine;				\
 		const struct utrace_engine_ops *ops;			\
-		list_for_each_entry(engine, &utrace->attached, entry) {	\
+		list_for_each_entry ## reverse(engine, &utrace->attached, entry) { \
 			ops = start_callback(utrace, report, engine, task, \
 					     event);			\
 			if (!ops)					\
 				continue;				\
-			finish_callback(utrace, report, engine,		\
+			finish_callback(utrace, report, engine, task,	\
 					(*ops->callback)(__VA_ARGS__));	\
 		}							\
 	} while (0)
@@ -1558,7 +1560,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(EXEC),
 	       report_exec, fmt, bprm, regs);
 }
 
@@ -1573,7 +1575,7 @@
 	INIT_REPORT(report);
 
 	start_report(utrace);
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+	REPORT_CALLBACKS(_reverse, task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
 			 report_syscall_entry, report.result | report.action,
 			 engine, current, regs);
 	finish_report(&report, task, utrace);
@@ -1615,7 +1617,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
 	       report_syscall_exit, regs);
 }
 
@@ -1640,7 +1642,7 @@
 	start_report(utrace);
 	utrace->cloning = child;
 
-	REPORT_CALLBACKS(task, utrace, &report,
+	REPORT_CALLBACKS(, task, utrace, &report,
 			 UTRACE_EVENT(CLONE), report_clone,
 			 report.action, engine, task, clone_flags, child);
 
@@ -1708,7 +1710,7 @@
 	utrace->report = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 
 	if (was_stopped && !task_is_stopped(task)) {
@@ -1745,7 +1747,7 @@
 	INIT_REPORT(report);
 	long orig_code = *exit_code;
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(EXIT),
 	       report_exit, orig_code, exit_code);
 
 	if (report.action == UTRACE_STOP)
@@ -1784,7 +1786,7 @@
 	utrace->interrupt = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH),
+	REPORT_CALLBACKS(, task, utrace, &report, UTRACE_EVENT(DEATH),
 			 report_death, engine, task, group_dead, signal);
 
 	spin_lock(&utrace->lock);
@@ -2129,7 +2131,7 @@
 			break;
 		}
 
-		finish_callback(utrace, &report, engine, ret);
+		finish_callback(utrace, &report, engine, task, ret);
 	}
 
 	/*

From oleg at redhat.com Thu Mar 12 17:21:28 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 18:21:28 +0100
Subject: [PATCH 1/2] UTRACE_STOP race condition (updated)
In-Reply-To: <20090312131303.GA25801@cs.unibo.it>
References: <20090312131303.GA25801@cs.unibo.it>
Message-ID: <20090312172128.GA26657@redhat.com>

Hi Renzo,

This patch needs Roland's review, but I'd like to participate...

On 03/12, Renzo Davoli wrote:
>
> I have updated my patch #1 (it solves the race condition on utrace_stop but
> not the nesting issue) for the latest version of utrace.
>
> I am trying to get the patches updated downloading, compiling and testing
> the fixes every week or so...
> Things would be easier if these patches could be merged in the mainstream ;-)

I think it would be better if you described the problem in the changelog. It is not convenient to dig through the archives to understand which problem this patch fixes.

Can't really comment on this change because I don't understand what the intended behaviour of utrace_control(UTRACE_RESUME) is. Perhaps the caller should wait until the target is stopped?

The comment says:

	case UTRACE_RESUME:
		 * This and all other cases imply resuming if stopped.
it doesn't explain what should we do if it is not stopped yet. > static bool utrace_stop(struct task_struct *task, struct utrace *utrace) > { > bool killed; > + struct utrace_engine *engine, *next; > > /* > * @utrace->stopped is the flag that says we are safely > @@ -406,7 +414,23 @@ > return true; > } > > - utrace->stopped = 1; > + /* final check: it is really needed to stop? */ > + list_for_each_entry_safe(engine, next, &utrace->attached, entry) { I think we can do this earlier, before taking ->siglock > + if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) { Do we need "!= &utrace_detached_ops" check? mark_engine_detached() removes ENGINE_STOP from ->flags. > + if (engine_wants_resume(engine)) { > + clear_engine_wants_stop(engine); > + clear_engine_wants_resume(engine); > + } I'm afraid _wants_resume() adds another problem. Let's suppose we do utrace_control(UTRACE_RESUME); utrace_control(UTRACE_STOP); UTRACE_STOP doesn't do clear_engine_wants_resume(), so it can be lost. And. Let's suppose we call utrace_control(UTRACE_RESUME), and later report_xxx() returns UTRACE_STOP. Again, this stop request can be lost. This doesn't look consistent. Do we really need _wants_resume()? Note that utrace_control(UTRACE_RESUME) does clear_engine_wants_stop(). Yes, we can race with finish_callback() in case when ->report_xxx() returns UTRACE_STOP. But, perhaps, in that case the caller of utrace_control(UTRACE_RESUME) should take care about the synchronization with its own callbacks? Something like: make_sure_my_callback_wont_return_UTRACE_STOP(); utrace_barrier(); utrace_control(UTRACE_RESUME); This way utrace_stop() can just check engine_wants_stop(). Oleg. 
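The lost-stop concern raised in the message above can be replayed with a small user-space toy model. This is only a sketch under stated assumptions, not the kernel code: the `toy_*` names and flag bits are hypothetical, and it assumes that the patched utrace_control(UTRACE_RESUME) sets a wants_resume bit while utrace_control(UTRACE_STOP) does not clear it, as the message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real engine flags live in kernel/utrace.c. */
enum { TOY_WANTS_STOP = 1 << 0, TOY_WANTS_RESUME = 1 << 1 };

struct toy_engine {
	unsigned int flags;
};

/* Models utrace_control(UTRACE_RESUME): assumed to clear wants_stop
 * and, with the proposed patch, to set wants_resume. */
void toy_control_resume(struct toy_engine *e)
{
	e->flags &= ~TOY_WANTS_STOP;
	e->flags |= TOY_WANTS_RESUME;
}

/* Models utrace_control(UTRACE_STOP): marks wants_stop but, as noted
 * in the review, does not clear wants_resume. */
void toy_control_stop(struct toy_engine *e)
{
	e->flags |= TOY_WANTS_STOP;
}

/* Models the final check added to utrace_stop() by the patch: a pending
 * wants_resume cancels the stop and both bits are cleared. */
bool toy_stop_check_with_resume_flag(struct toy_engine *e)
{
	if (!(e->flags & TOY_WANTS_STOP))
		return false;
	if (e->flags & TOY_WANTS_RESUME) {
		e->flags &= ~(TOY_WANTS_STOP | TOY_WANTS_RESUME);
		return false;
	}
	return true;
}

/* Models the simpler alternative: utrace_stop() only looks at wants_stop
 * (any wants_resume bit is ignored). */
bool toy_stop_check_simple(const struct toy_engine *e)
{
	return (e->flags & TOY_WANTS_STOP) != 0;
}

/* Replays RESUME followed by STOP and reports whether the stop survives. */
bool toy_stop_survives(bool use_resume_flag)
{
	struct toy_engine e = { 0 };

	toy_control_resume(&e);
	toy_control_stop(&e);
	assert((e.flags & TOY_WANTS_STOP) != 0); /* the stop request was recorded */

	return use_resume_flag ? toy_stop_check_with_resume_flag(&e)
			       : toy_stop_check_simple(&e);
}
```

In this model, toy_stop_survives(true) reports that the later stop request is dropped, while toy_stop_survives(false) keeps it. The model says nothing about the race with finish_callback(), which the message proposes to leave to the caller.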
From oleg at redhat.com Thu Mar 12 17:35:32 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 12 Mar 2009 18:35:32 +0100 Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated) In-Reply-To: <20090312131330.GB25801@cs.unibo.it> References: <20090312131330.GB25801@cs.unibo.it> Message-ID: <20090312173532.GB26657@redhat.com> On 03/12, Renzo Davoli wrote: > > I have also updated the second patch. Please note that now this patch > must be applied after the first one. > This patch implements a consistent nesting model for utrace machines. > (There is a full description in the messages I sent on Feb. 14 and Mar. 6) This patch does two completely different things. I think you should make separate patches. Again, we need Roland's opinion, but could you explain why it would be better to use _reverse in utrace_report_syscall_entry()? As for the other change, > --- linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c 2009-03-12 11:05:50.000000000 +0100 > +++ linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c 2009-03-12 13:37:27.000000000 +0100 > @@ -1405,6 +1405,7 @@ > static bool finish_callback(struct utrace *utrace, > struct utrace_report *report, > struct utrace_engine *engine, > + struct task_struct *task, > u32 ret) > { > enum utrace_resume_action action = utrace_resume_action(ret); > @@ -1426,6 +1427,7 @@ > spin_lock(&utrace->lock); > mark_engine_wants_stop(engine); > spin_unlock(&utrace->lock); > + utrace_stop(task, utrace); I don't think this is safe. If we do utrace_stop() here, the next engine can be detached before we return (UTRACE_DETACH assumes it is safe to unlink the engine when the target is stopped). This means we can't continue list_for_each_entry(engine, &utrace->attached, entry) after returning from finish_callback(). Oleg.
From oleg at redhat.com Thu Mar 12 19:07:38 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 12 Mar 2009 20:07:38 +0100 Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: <20090312073652.75811FC3B6@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> Message-ID: <20090312190738.GA3529@redhat.com> Roland, I left some parts of your message unanswered because I need to think more about them... On 03/12, Roland McGrath wrote: > > > But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be > > already removed from ->utrace_flags, exactly because ->stopped was true. > > I don't follow this. JCTL is never "removed" from ->utrace_flags, except > as all event bits are, by utrace_reset(). Yep. And utrace_reset() can be called because ->stopped == 1. Let me explain. Again, let's suppose D attaches engine E to the target T. T enters utrace_report_jctl() with ->stopped == 1. D calls utrace_set_events(events => 0), this removes JCTL from E->flags. D calls, say, utrace_control(UTRACE_RESUME). Since ->stopped == 1, this calls utrace_reset() and removes JCTL from T->utrace_flags. T takes utrace->lock, clears ->stopped, and drops the lock. D does utrace_control(UTRACE_STOP). This calls utrace_do_stop() which sees task_is_stopped() && !JCTL, so it sets ->stopped = true. T calls REPORT() and start_report() hits the (correct) BUG_ON(stopped). No? > > This leads to another minor question, how it is possible to enter enter > > utrace_report_jctl() with ->stopped == 1 ? I think the only possibility > > it was previously set by another call to utrace_report_jctl(), see below. > > There are two ways to enter utrace_report_jctl with ->stopped set. > > 1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then. > Now utrace_report_jctl is called for the CLD_CONTINUED case, and > ->stopped remains set. this is covered by my guess above, > 2. 
utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was > already in TASK_STOPPED (and really stopped, or at least got past > tracehook_notify_jctl before JCTL was set). It sets ->stopped before > adding JCTL to ->utrace_flags, Yes, thanks. I missed this. > > SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop in > > progress and it is not finished yet. > > SIGNAL_STOP_STOPPED should be reliable, as far as it goes. It will only be > set if the group stop is complete. Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set? This doesn't mean we don't need __set_current_state(TASK_STOPPED), it is possible that the group-stop is in progress and ->group_stop_count != 0. > > But! can't we miss utrace_wakeup() ? > > I think you've found something (though not quite the scenario you describe). > > > Let's suppose the debugger D attaches the single engine E to the target T. > > > > D does utrace_set_events(JCTL). > > > > T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts > > utrace_report_jctl(). > > > > D does utrace_set_events(events => 0), this clears E->flags. > > > > T calls REPORT(), nobody needs needs JCTL, finish_report() sees !->takers > > and calls utrace_reset(). It sets ->utrace_flags = 0. > > Nope: > > flags |= engine->flags | UTRACE_EVENT(REAP); Ah, thanks. Can't understand how I didn't notice this, I checked the code several times ;) But as you pointed out, > So, change your scenario to: > > D does utrace_control(UTRACE_DETACH). > > and then this will happen. Yes. Oleg. 
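The D/T interleaving that Oleg lays out at the start of this message (the one ending in BUG_ON(stopped)) can be replayed sequentially in a user-space toy model. This is a sketch under stated assumptions rather than the real code: the `jctl_*` names and the four-boolean state are hypothetical simplifications, utrace_reset() is reduced to recomputing one JCTL bit, and locking is ignored because the steps run in the order given in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified state for target T and engine E. */
struct jctl_state {
	bool stopped;         /* models utrace->stopped */
	bool task_is_stopped; /* models task_is_stopped(T) */
	bool engine_jctl;     /* JCTL present in E->flags */
	bool task_jctl;       /* JCTL present in T->utrace_flags */
};

/* D: utrace_set_events(events => 0) removes JCTL from E->flags. */
void jctl_set_events_none(struct jctl_state *s)
{
	s->engine_jctl = false;
}

/* D: utrace_control(UTRACE_RESUME). With ->stopped set this is assumed
 * to run utrace_reset(), which recomputes T->utrace_flags from E. */
void jctl_control_resume(struct jctl_state *s)
{
	if (s->stopped)
		s->task_jctl = s->engine_jctl;
}

/* T: utrace_report_jctl() takes the lock and clears ->stopped. */
void jctl_report_begin(struct jctl_state *s)
{
	s->stopped = false;
}

/* D: utrace_control(UTRACE_STOP) -> utrace_do_stop(). */
void jctl_control_stop(struct jctl_state *s)
{
	if (s->task_is_stopped && !s->task_jctl)
		s->stopped = true;
}

/* Replays the interleaving; returns true if start_report() would hit
 * BUG_ON(utrace->stopped). */
bool jctl_replay(bool d_clears_events)
{
	struct jctl_state s = {
		.stopped = true, .task_is_stopped = true,
		.engine_jctl = true, .task_jctl = true,
	};

	assert(s.stopped); /* T enters utrace_report_jctl() with ->stopped == 1 */
	if (d_clears_events)
		jctl_set_events_none(&s);
	jctl_control_resume(&s);
	jctl_report_begin(&s);
	jctl_control_stop(&s);

	return s.stopped;
}
```

In this model the BUG_ON condition is reached only when D first drops JCTL from the engine, so jctl_replay(true) reports the problem and jctl_replay(false) does not.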
From oleg at redhat.com Thu Mar 12 19:50:21 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 12 Mar 2009 20:50:21 +0100 Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: <20090312073652.75811FC3B6@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> Message-ID: <20090312195021.GB3529@redhat.com> On 03/12, Roland McGrath wrote: > > > T calls REPORT(), nobody needs needs JCTL, finish_report() sees !->takers > > and calls utrace_reset(). It sets ->utrace_flags = 0. > > Nope: > > flags |= engine->flags | UTRACE_EVENT(REAP); Hmm. But this leads to another question: why does utrace_reset() set UTRACE_EVENT(REAP) ? This looks as: make sure ->utrace_flags is never 0 unless we detach all engines. Perhaps because sometimes, say tracehook_notify_resume(), we just check task_utrace_flags() != 0 ? Imho, this needs a comment. Or I missed something obvious. Oh. utrace_attach_task()->utrace_add_engine() sets ->report + TIF_NOTIFY_RESUME. But tracehook_notify_resume() does nothing because ->utrace_flags == 0 ? Confused. Oleg. From oleg at redhat.com Thu Mar 12 20:36:09 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 12 Mar 2009 21:36:09 +0100 Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: <20090312195021.GB3529@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> Message-ID: <20090312203609.GC3529@redhat.com> I'm afraid I wasn't clear again, On 03/12, Oleg Nesterov wrote: > > Oh. utrace_attach_task()->utrace_add_engine() sets ->report + TIF_NOTIFY_RESUME. > But tracehook_notify_resume() does nothing because ->utrace_flags == 0 ? > Confused. Perhaps this is not problem per se. But let's suppose we call, say, utrace_control(UTRACE_STOP) later. utrace_do_stop() sees ->report == 1 and doesn't call set_notify_resume(). But TIF_NOTIFY_RESUME was already cleared by do_notify_resume(). 
And again, utrace_control(UTRACE_STOP) does not set ->utrace_flags != 0 itself. But even if we called utrace_set_events(XXX) before, without set_notify_resume() we have to wait for that XXX event, this doesn't look right. Oleg. From oleg at redhat.com Thu Mar 12 21:40:37 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 12 Mar 2009 22:40:37 +0100 Subject: Q: ->attaching && REPORT_CALLBACKS() In-Reply-To: <20090312051246.B50E5FC3B6@magilla.sf.frob.com> References: <20090310212247.GA12258@redhat.com> <20090311001137.2D625FC3B6@magilla.sf.frob.com> <20090312001521.GA16303@redhat.com> <20090312051246.B50E5FC3B6@magilla.sf.frob.com> Message-ID: <20090312214037.GA10462@redhat.com> On 03/11, Roland McGrath wrote: > > > But not vise versa. I misunderstood the comment as if the new engine > > should not be notified if it is attached by another task while target > > is inside callback. > > That is indeed what happens in that case. But that one is not a > specific "should not", it's just what happens to be true given what we > say about the "asynchronous" attach case in general. That is, that an > "asynchronous" attach + set_events makes no guarantees about how > instantly you start to get event reports. Yes, yes, I understand. In short: I greatly misinterpreted the comment. > > Still can't understand... If (say) ->report_exec() attaches the new > > engine to the same task and does utrace_set_events(EXEC), then it looks > > logical the new engine gets the notification too. But OK, I agree, either > > way is correct, and perhaps the current behaviour is more intuitive. > > As you can see in the cited thread, that's what I thought too. Jim > convinced me that the (new) current behavior is more useful. > ... > Jim's use seems fairly representative of situations where this might > come up. He's concerned with the EXEC event as the "old address space > is gone, new one is here" event. It's also the "my name changed" > event that may be triggering a new tracing setup. 
Aha, thanks! > > But this means that "When target == current it would be safe just to call > > splice_attaching() right here" part of the comment is not right, no? > > Except for report_reap() target == current. > > It would be "safe", meaning it doesn't have race problems like the > target != current case does for touching ->attached here. Yes, I see. Again, I confused "safe" with "not what the user wants". > > Yes, yes, I see. But I meant another case. Suppose that the debugger D > > attaches to T and does > > > > engine = utrace_attach_task(T, ...); > > utrace_set_events(T, engine, XXX); > > > > It is possible that ->report_xxx() is called before utrace_set_events() > > completes. But afaics currently this is not a problem. > > As far as the API guarantees are concerned, there is no "completes". > When you call utrace_set_events, it becomes possible your callbacks > get made. Yes sure. Thanks! Oleg. From roland at redhat.com Thu Mar 12 22:40:55 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 12 Mar 2009 15:40:55 -0700 (PDT) Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: Oleg Nesterov's message of Thursday, 12 March 2009 20:07:38 +0100 <20090312190738.GA3529@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> Message-ID: <20090312224055.BA71CFC3B6@magilla.sf.frob.com> > Yep. And utrace_reset() can be called because ->stopped == 1. Right. > Let me explain. Again, let's suppose D attaches engine E to the target T. > > T enters utrace_report_jctl() with ->stopped == 1. > > D calls utrace_set_events(events => 0), this removes JCTL from E->flags. > > D calls, say, utrace_control(UTRACE_RESUME). Since ->stopped == 1, this > calls utrace_reset() and removes JCTL from T->utrace_flags. Right. In the utrace-indirect code this would have reset the utrace pointer too. > T takes utrace->lock, clears ->stopped, and drops the lock. 
In the utrace-indirect code, this part would have been harmless even in the race case where it happened (the more likely case being that task->utrace was cleared already before utrace_report_jctl looked at it). (That code just had the dangling utrace pointer issue I noticed yesterday, at the end of the function.) But, yes, this is a problem. I think this ought to cover it: @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what) * longer considered stopped while we run callbacks. */ spin_lock(&utrace->lock); + /* + * Now that we have the lock, check in case utrace_reset() has + * just now cleared UTRACE_EVENT(JCTL) while it considered us + * safely stopped. In that case, we should not touch ->stopped + * and have nothing else to do. + */ + if (unlikely(!(task->utrace_flags & UTRACE_EVENT(JCTL)))) { + spin_unlock(&utrace->lock); + return; + } utrace->stopped = 0; utrace->report = 0; spin_unlock(&utrace->lock); > > 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was > > already in TASK_STOPPED (and really stopped, or at least got past > > tracehook_notify_jctl before JCTL was set). It sets ->stopped before > > adding JCTL to ->utrace_flags, > > Yes, thanks. I missed this. I feel I should also point out the case where exit_signals() calls tracehook_notify_jctl, because I just noticed it. I don't think that path existed the last time I thought seriously about the utrace_report_jctl logic. (This is not a #3 in that list, but in general is another path we need to keep in mind here.) > Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set? > This doesn't mean we don't need __set_current_state(TASK_STOPPED), it is > possible that the group-stop is in progress and ->group_stop_count != 0. Right. 
Thanks, Roland From roland at redhat.com Thu Mar 12 23:16:07 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 12 Mar 2009 16:16:07 -0700 (PDT) Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: Oleg Nesterov's message of Thursday, 12 March 2009 20:50:21 +0100 <20090312195021.GB3529@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> Message-ID: <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> > Hmm. But this leads to another question: why does utrace_reset() set > UTRACE_EVENT(REAP) ? > > This looks as: make sure ->utrace_flags is never 0 unless we detach > all engines. Perhaps because sometimes, say tracehook_notify_resume(), > we just check task_utrace_flags() != 0 ? Right, it's an invariant that utrace_flags != 0 if there is any utrace stuff to do. It just fits logically too. The utrace_flags bits mean "need to call into utrace", so UTRACE_EVENT(REAP) means that we need to call utrace_release_task. > Imho, this needs a comment. Or I missed something obvious. Sure, better comments are always good. How's this? @@ -899,6 +899,10 @@ static void utrace_reset(struct task_struct *task, struct utrace *utrace, * of the interests of the remaining tracing engines. * For any engine marked detached, remove it from the list. * We'll collect them on the detached list. + * + * Any engine that's not detached implies tracking the REAP event, + * whether or not that engine wants a report_reap callback. Any + * engine requires attention from utrace_release_task(). */ list_for_each_entry_safe(engine, next, &utrace->attached, entry) { if (engine->ops == &utrace_detached_ops) { > Oh. utrace_attach_task()->utrace_add_engine() sets ->report + TIF_NOTIFY_RESUME. > But tracehook_notify_resume() does nothing because ->utrace_flags == 0 ? 
The logic (in the utrace_add_engine comment) is to have ->report just to make sure splice_attaching() precedes the next reporting pass (start_report). It doesn't actually care about TIF_NOTIFY_RESUME (i.e. how soon the report happens), but just wants to keep the invariant that ->report matches TIF_NOTIFY_RESUME. But as you point out, this invariant will be violated later if tracehook_notify_resume() sees ->utrace_flags == 0. > Perhaps this is not problem per se. But let's suppose we call, say, > utrace_control(UTRACE_STOP) later. utrace_do_stop() sees ->report == 1 > and doesn't call set_notify_resume(). But TIF_NOTIFY_RESUME was already > cleared by do_notify_resume(). Right. So I think we need this: @@ -181,7 +181,13 @@ static int utrace_add_engine(struct task_struct *target, * also set. Otherwise utrace_control() or utrace_do_stop() * might skip setting TIF_NOTIFY_RESUME upon seeing ->report * already set, and we'd miss a necessary callback. + * + * In case we had no engines before, make sure that + * utrace_flags is not zero when tracehook_notify_resume() + * checks. That would bypass utrace reporting clearing + * TIF_NOTIFY_RESUME, and thus violate the same invariant. */ + target->utrace_flags |= UTRACE_EVENT(REAP); list_add_tail(&engine->entry, &utrace->attaching); utrace->report = 1; set_notify_resume(target); Does that need a barrier pair here and in tracehook_notify_resume()? Thanks, Roland From renzo at cs.unibo.it Fri Mar 13 06:36:17 2009 From: renzo at cs.unibo.it (Renzo Davoli) Date: Fri, 13 Mar 2009 07:36:17 +0100 Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated) In-Reply-To: <20090312173532.GB26657@redhat.com> References: <20090312131330.GB25801@cs.unibo.it> <20090312173532.GB26657@redhat.com> Message-ID: <20090313063616.GA11403@cs.unibo.it> > Again, we need Roland's opinion, but could you explain why it would > be better to use _reverse in utrace_report_syscall_entry() ? 
I refer to this posting: http://www.mail-archive.com/utrace-devel@redhat.com/msg00579.html Item #4 explains why the order in utrace_report_syscall_entry *must* be reversed to get a consistent implementation of nested virtualization. > I don't think this is safe. If we do utrace_stop() here, the next engine > can be detached before we return (UTRACE_DETACH assumes it is safe to > unlink the engine when the target is stopped). This means we can't > continue list_for_each_entry(engine, &utrace->attached, entry) after > return from finish_callback(). Maybe this is not the best patch; maybe we can solve the problem in a better way. The point is explained in item #3 of the posting cited above. When an engine's report function returns UTRACE_STOP, it means (or may mean) that the engine wants to change the status of the process before resuming it. VM monitors often change the status, and debugger users sometimes want to set variables too. IMHO, utrace should stop the process *before* calling the report function of the next engine; otherwise we need to set up another structure to synchronize the engines (which may not even know about each other). If there is a tracer/debugger among the engines, it cannot even know which snapshot it gets: the one before or the one after the modification made by the VM monitor. With these patches it is possible to run nested virtual machines based on utrace; it is also possible to strace (i.e. use ptrace on) processes running inside a VM.
renzo From oleg at redhat.com Fri Mar 13 21:59:12 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Fri, 13 Mar 2009 22:59:12 +0100 Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> Message-ID: <20090313215912.GA1856@redhat.com> On 03/12, Roland McGrath wrote: > > So I think we need this: > > @@ -181,7 +181,13 @@ static int utrace_add_engine(struct task_struct *target, > * also set. Otherwise utrace_control() or utrace_do_stop() > * might skip setting TIF_NOTIFY_RESUME upon seeing ->report > * already set, and we'd miss a necessary callback. > + * > + * In case we had no engines before, make sure that > + * utrace_flags is not zero when tracehook_notify_resume() > + * checks. That would bypass utrace reporting clearing > + * TIF_NOTIFY_RESUME, and thus violate the same invariant. > */ > + target->utrace_flags |= UTRACE_EVENT(REAP); > list_add_tail(&engine->entry, &utrace->attaching); > utrace->report = 1; > set_notify_resume(target); Agreed. > Does that need a barrier pair here and in No, set_notify_resume()->test_and_set_tsk_thread_flag() implies mb(), > tracehook_notify_resume()? Ah. I think you are right, and I think it needs the barrier even without this change. Say, UTRACE_REPORT does: utrace->report = 1; set_notify_resume(); Without mb() there is no guarantee that utrace_resume() will notice and clear ->report. smp_mb__after_clear_bit() is enough, but in that case perhaps it is better to modify the arch dependent do_notify_resume(). A couple of minor nits, but please remember I often misread the comments. > Sure, better comments are always good. How's this? > > @@ -899,6 +899,10 @@ static void utrace_reset(struct task_struct *task, struct utrace *utrace, > * of the interests of the remaining tracing engines. 
> * For any engine marked detached, remove it from the list. > * We'll collect them on the detached list. > + * > + * Any engine that's not detached implies tracking the REAP event, > + * whether or not that engine wants a report_reap callback. Any > + * engine requires attention from utrace_release_task(). > */ > list_for_each_entry_safe(engine, next, &utrace->attached, entry) { This looks misleading, utrace_release_task() is called unconditionally, and we could use any unused bit afacis (REAP only makes sense for engine->flags, we never check ->utrace_flags & REAP). Also, whatever reason we have to keep ->utrace_flags != 0, the same reason applies to ->utrace_flags |= XXX in utrace_add_engine(). utrace_reset() also does if (task->exit_state) { flags &= DEAD_FLAGS_MASK; The comment about DEAD_FLAGS_MASK /* * Only these flags matter any more for a dead task (exit_state set). * We use this mask on flags installed in ->utrace_flags after * exit_notify (and possibly utrace_report_death) has run. Looks a bit confusing to me. Unless exit_notify() calls utrace_report_death() we don't change ->utrace_flags. * This ensures that utrace_release_task knows positively that * utrace_report_death will not run later. */ Yes. But this means we could do "flags &= ~DEATH_EVENTS" instead. This is subjective of course, but looks more clean to me. Note also that utrace_reset() is the only user of DEAD_FLAGS_MASK and LIVE_FLAGS_MASK has no users. Also, it would be better imho to change tracehook_report_death() to use DEATH_EVENTS too, it is always good when grep can find the usage. Oleg. From oleg at redhat.com Fri Mar 13 23:33:00 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Sat, 14 Mar 2009 00:33:00 +0100 Subject: utrace_set_events/utrace_control && death/reap checks Message-ID: <20090313233300.GA14605@redhat.com> utrace_set_events: (utrace->death && ((old_flags & ~events) & DEATH_EVENTS)) "(old_flags & ~events) & DEATH_EVENTS)" means the caller tries to clear DEATH/QUIESCE. 
Why this is not allowed? And why this is not allowed _only_ when the target runs utrace_report_death()->REPORT()? I think this line can be just killed. I guess the intent was to prevent utrace_release_task() from doing utrace_reap() in parallel with utrace_report_death(), but note that utrace_set_events() can never "shrinks" ->utrace_flags, it only sets new bits. The next line looks strange too, don't we need (utrace->reap && ((events & ~old_flags) & UTRACE_EVENT(REAP))) ? And I don't understand why do we need utrace->death at all. Apart from utrace_set_events (which I think doesn't need it), it is only used by utrace_control(UTRACE_DETACH). But I can't see how can we race with utrace_report_death(). If it can be called, we have DEATH_EVENTS bits set. But in that case utrace_do_stop() can't succeed, so UTRACE_DETACH can only do mark_engine_wants_stop() but not utrace_reset(). IOW, could you explain why the patch below is wrong? (and why can't we kill ->death then). Oleg. --- kernel/utrace.c +++ kernel/utrace.c @@ -1072,27 +1072,10 @@ int utrace_control(struct task_struct *t /* * You can't do anything to a dead task but detach it. * If release_task() has been called, you can't do that. - * - * On the exit path, DEATH and QUIESCE event bits are - * set only before utrace_report_death() has taken the - * lock. At that point, the death report will come - * soon, so disallow detach until it's done. This - * prevents us from racing with it detaching itself. */ - if (action != UTRACE_DETACH || - unlikely(utrace->reap)) { + if (action != UTRACE_DETACH || unlikely(utrace->reap)) { spin_unlock(&utrace->lock); return -ESRCH; - } else if (unlikely(target->utrace_flags & DEATH_EVENTS) || - unlikely(utrace->death)) { - /* - * We have already started the death report, or - * are about to very soon. We can't prevent - * the report_death and report_reap callbacks, - * so tell the caller they will happen. 
- */ - spin_unlock(&utrace->lock); - return -EALREADY; } } From oleg at redhat.com Sat Mar 14 00:14:20 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Sat, 14 Mar 2009 01:14:20 +0100 Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: <20090312224055.BA71CFC3B6@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> Message-ID: <20090314001420.GA15677@redhat.com> On 03/12, Roland McGrath wrote: > > > Yep. And utrace_reset() can be called because ->stopped == 1. > > Right. > > > Let me explain. Again, let's suppose D attaches engine E to the target T. > > > > T enters utrace_report_jctl() with ->stopped == 1. > > > > D calls utrace_set_events(events => 0), this removes JCTL from E->flags. > > > > D calls, say, utrace_control(UTRACE_RESUME). Since ->stopped == 1, this > > calls utrace_reset() and removes JCTL from T->utrace_flags. > > Right. In the utrace-indirect code this would have reset the utrace > pointer too. > > > T takes utrace->lock, clears ->stopped, and drops the lock. > > In the utrace-indirect code, this part would have been harmless even in the > race case where it happened (the more likely case being that task->utrace > was cleared already before utrace_report_jctl looked at it). (That code > just had the dangling utrace pointer issue I noticed yesterday, at the end > of the function.) > > But, yes, this is a problem. I think this ought to cover it: > > @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what) > * longer considered stopped while we run callbacks. > */ > spin_lock(&utrace->lock); > + /* > + * Now that we have the lock, check in case utrace_reset() has > + * just now cleared UTRACE_EVENT(JCTL) while it considered us > + * safely stopped. In that case, we should not touch ->stopped > + * and have nothing else to do. 
> + */ > + if (unlikely(!(task->utrace_flags & UTRACE_EVENT(JCTL)))) { > + spin_unlock(&utrace->lock); > + return; I don't think this can help, even if we clear ->stopped before return. It is still possible to set ->stopped after that, and since we don't have JCTL we return from get_signal_to_deliver() bypassing tracehook calls. >From the previous message: > > That suggests we must preemptively go back to TASK_RUNNING before making > the callbacks, just in case they would do the transition. > ... I thought about this too. But this not easy and not nice. Roland, I _seem_ to have the vague idea, will return tomorrow. Oleg.
From oleg at redhat.com Sun Mar 15 22:33:00 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Sun, 15 Mar 2009 23:33:00 +0100 Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: <20090314001420.GA15677@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> Message-ID: <20090315223300.GA10526@redhat.com> On 03/14, Oleg Nesterov wrote: > > On 03/12, Roland McGrath wrote: > > > > But, yes, this is a problem. I think this ought to cover it: > > > > @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what) > > * longer considered stopped while we run callbacks. > > */ > > spin_lock(&utrace->lock); > > + /* > > + * Now that we have the lock, check in case utrace_reset() has > > + * just now cleared UTRACE_EVENT(JCTL) while it considered us > > + * safely stopped. In that case, we should not touch ->stopped > > + * and have nothing else to do. > > + */ > > + if (unlikely(!(task->utrace_flags & UTRACE_EVENT(JCTL)))) { > > + spin_unlock(&utrace->lock); > > + return; > > I don't think this can help, even if we clear ->stopped before return. > It is still possible to set ->stopped after that, and since we don't > have JCTL we return from get_signal_to_deliver() bypassing tracehook > calls. I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL. OK, let's look at utrace_do_stop: if (task_is_stopped(target) && !(target->utrace_flags & UTRACE_EVENT(JCTL))) { utrace->stopped = 1; return true; } This doesn't look correct. We don't hold ->siglock, the task can be SIGCONT'ed and return from get_signal_to_deliver(), and then we set ->stopped. Or I missed something again?
Then we re-do this (well, almost) check under ->siglock, } else if (task_is_stopped(target)) { if (!(target->utrace_flags & UTRACE_EVENT(JCTL))) utrace->stopped = stopped = true; } But this is not nice. Let's suppose the task is already stopped, we do UTRACE_ATTACH + utrace_set_events(JCTL). Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't even set ->report. Yes, we can't set ->stopped if JCTL, we can race with utrace_report_jctl() which does REPORT(). BTW, afaics utrace_report_jctl() has another bug, REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); I think it should do REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), report_jctl, what, notify); instead. > Roland, I _seem_ to have the vague idea, will return tomorrow. Well, this idea is not very nice. But see the draft patches below. With the first patch, we call utrace_report_jctl() before we actually stop. do_signal_stop() can fail then, but I think this is OK, we can pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete, and with this patch we always call ->report_jctl with notify == 0. Just for discussion. --- xxx/include/linux/utrace.h~JCTL 2009-03-03 20:43:43.000000000 +0100 +++ xxx/include/linux/utrace.h 2009-03-15 21:55:45.000000000 +0100 @@ -102,7 +102,7 @@ void utrace_report_exit(long *exit_code) __attribute__((weak)); void utrace_report_death(struct task_struct *, struct utrace *, bool, int) __attribute__((weak)); -void utrace_report_jctl(int notify, int type) +bool utrace_report_jctl(bool sig_locked, int what) __attribute__((weak)); void utrace_report_exec(struct linux_binfmt *, struct linux_binprm *, struct pt_regs *regs) --- xxx/include/linux/tracehook.h~JCTL 2009-03-03 20:40:57.000000000 +0100 +++ xxx/include/linux/tracehook.h 2009-03-15 22:02:05.000000000 +0100 @@ -521,11 +521,11 @@ static inline int tracehook_get_signal(s * * Called with no locks held. 
*/ -static inline int tracehook_notify_jctl(int notify, int why) +static inline bool tracehook_notify_jctl(bool sig_locked, int why) { if (task_utrace_flags(current) & UTRACE_EVENT(JCTL)) - utrace_report_jctl(notify, why); - return notify || (current->ptrace & PT_PTRACED); + return utrace_report_jctl(sig_locked, why); + return true; } #define DEATH_REAP -1 --- xxx/kernel/utrace.c~JCTL 2009-03-12 01:21:05.000000000 +0100 +++ xxx/kernel/utrace.c 2009-03-15 22:59:36.000000000 +0100 @@ -1637,12 +1637,14 @@ void utrace_finish_vfork(struct task_str /* * Called iff UTRACE_EVENT(JCTL) flag is set. */ -void utrace_report_jctl(int notify, int what) +bool utrace_report_jctl(bool sig_locked, int what) { struct task_struct *task = current; struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); - bool was_stopped = task_is_stopped(task); + + if (sig_locked) + spin_unlock_irq(&task->sighand->siglock); /* * We get here with CLD_STOPPED when we've just entered @@ -1664,30 +1662,12 @@ void utrace_report_jctl(int notify, int spin_unlock(&utrace->lock); REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), - report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); + report_jctl, what, 0); - if (was_stopped && !task_is_stopped(task)) { - /* - * The event report hooks could have blocked, though - * it should have been briefly. Make sure we're in - * TASK_STOPPED state again to block properly, unless - * we've just come back out of job control stop. - */ + if (sig_locked) spin_lock_irq(&task->sighand->siglock); - if (task->signal->flags & SIGNAL_STOP_STOPPED) - __set_current_state(TASK_STOPPED); - spin_unlock_irq(&task->sighand->siglock); - } - if (task_is_stopped(current)) { - /* - * While in TASK_STOPPED, we can be considered safely - * stopped by utrace_do_stop() only once we set this. 
- */ - spin_lock(&utrace->lock); - utrace->stopped = 1; - spin_unlock(&utrace->lock); - } + return task->signal->group_stop_count != 0; } /* --- xxx/kernel/signal.c~JCTL 2009-03-03 18:11:47.000000000 +0100 +++ xxx/kernel/signal.c 2009-03-15 22:07:30.000000000 +0100 @@ -1641,7 +1641,7 @@ finish_stop(int stop_count) * a group stop in progress and we are the last to stop, * report to the parent. When ptraced, every thread reports itself. */ - if (tracehook_notify_jctl(stop_count == 0, CLD_STOPPED)) { + if (stop_count == 0) { read_lock(&tasklist_lock); do_notify_parent_cldstop(current, CLD_STOPPED); read_unlock(&tasklist_lock); @@ -1785,8 +1785,7 @@ relock: signal->flags &= ~SIGNAL_CLD_MASK; spin_unlock_irq(&sighand->siglock); - if (unlikely(!tracehook_notify_jctl(1, why))) - goto relock; + tracehook_notify_jctl(false, why); read_lock(&tasklist_lock); do_notify_parent_cldstop(current->group_leader, why); @@ -1798,6 +1797,7 @@ relock: struct k_sigaction *ka; if (unlikely(signal->group_stop_count > 0) && + tracehook_notify_jctl(true, CLD_STOPPED) && do_signal_stop(0)) goto relock; @@ -1872,6 +1872,7 @@ relock: if (is_current_pgrp_orphaned()) goto relock; + tracehook_notify_jctl(false, CLD_STOPPED); spin_lock_irq(&sighand->siglock); } @@ -1953,7 +1954,8 @@ void exit_signals(struct task_struct *ts out: spin_unlock_irq(&tsk->sighand->siglock); - if (unlikely(group_stop) && tracehook_notify_jctl(1, CLD_STOPPED)) { + if (unlikely(group_stop)) { + tracehook_notify_jctl(false, CLD_STOPPED); read_lock(&tasklist_lock); do_notify_parent_cldstop(tsk, CLD_STOPPED); read_unlock(&tasklist_lock); ------------------------------------------------------------------------------- Now we can change utrace_do_stop(), no need to check JCTL any longer, --- xxx/kernel/utrace.c~STOP 2009-03-15 22:59:36.000000000 +0100 +++ xxx/kernel/utrace.c 2009-03-15 23:29:19.000000000 +0100 @@ -794,20 +794,6 @@ static bool utrace_do_stop(struct task_s { bool stopped; - /* - * If it will call 
utrace_report_jctl() but has not gotten - * through it yet, then don't consider it quiescent yet. - * utrace_report_jctl() will take @utrace->lock and - * set @utrace->stopped itself once it finishes. After that, - * it is considered quiescent; when it wakes up, it will go - * through utrace_get_signal() before doing anything else. - */ - if (task_is_stopped(target) && - !(target->utrace_flags & UTRACE_EVENT(JCTL))) { - utrace->stopped = 1; - return true; - } - stopped = false; spin_lock_irq(&target->sighand->siglock); if (unlikely(target->exit_state)) { @@ -819,8 +805,7 @@ static bool utrace_do_stop(struct task_s if (!(target->utrace_flags & DEATH_EVENTS)) utrace->stopped = stopped = true; } else if (task_is_stopped(target)) { - if (!(target->utrace_flags & UTRACE_EVENT(JCTL))) - utrace->stopped = stopped = true; + utrace->stopped = stopped = true; } else if (!utrace->report && !utrace->interrupt) { utrace->report = 1; set_notify_resume(target); ------------------------------------------------------------------------------- Again, this is not complete and likely buggy. But what do you think? Oleg. From roland at redhat.com Mon Mar 16 01:14:01 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 15 Mar 2009 18:14:01 -0700 (PDT) Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: Oleg Nesterov's message of Sunday, 15 March 2009 23:33:00 +0100 <20090315223300.GA10526@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> Message-ID: <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> > I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL. Right, that is key. 
> OK, let's look at utrace_do_stop: > > if (task_is_stopped(target) && > !(target->utrace_flags & UTRACE_EVENT(JCTL))) { > utrace->stopped = 1; > return true; > } > > This doesn't look correct. We don't hold ->siglock, the task can be > SIGCONT'ed and return from get_signal_to_deliver(), and then we set > ->stopped. Or I missed something again? I think you're right. The logic there was supposed to be, "TASK_STOPPED means it will get into utrace_get_signal()." That much is true, but nothing inside utrace_get_signal() actually synchronizes with this to make that matter. All this check does is try to optimize the TASK_STOPPED case not to take the siglock. That doesn't seem worth much, so we can just drop it. > Then we re-do this (well, almost) check under ->siglock, > > } else if (task_is_stopped(target)) { > if (!(target->utrace_flags & UTRACE_EVENT(JCTL))) > utrace->stopped = stopped = true; > } > > But this is not nice. Let's suppose the task is already stopped, we do > UTRACE_ATTACH + utrace_set_events(JCTL). This is exactly why utrace_set_events() sets ->stopped preemptively for that case. > Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't > even set ->report. Yes, we can't set ->stopped if JCTL, we can race with > utrace_report_jctl() which does REPORT(). Setting JCTL while in TASK_STOPPED made it set ->stopped, so utrace_control() succeeds without calling utrace_do_stop(). > BTW, afaics utrace_report_jctl() has another bug, > > REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), > report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); > > I think it should do > > REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), > report_jctl, what, notify); > > instead. There is a bug, but your fix changes a key API choice. I've put in this fix: - report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); + report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, + notify ? 
what : 0); There are two things a tracer might be tracking: state or events. The "state" is whether the thread is in job control stop or is running. The "events" are the SIGCHLD notifications that the thread tries to post to its parent. The @type argument shows the state we will be in after the callback. If the state changes, there will be another callback. That's what a state-tracking tracer needs, e.g. to keep a little light on the screen red while the thread is stopped and green while it's running. The @notify argument shows what SIGCHLD the parent sees (if it were dequeuing all possible SIGCHLD postings as quickly as they come). That's what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs are expected in the parent. Your change to @type would break state-trackers in the case where tracehook_notify_jctl() is called from get_signal_to_deliver() with CLD_STOPPED. > With the first patch, we call utrace_report_jctl() before we actually > stop. do_signal_stop() can fail then, but I think this is OK, we can > pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete, > and with this patch we always call ->report_jctl with notify == 0. Just for > discussion. I think I sort of understand the intent of your patch. If we change the calling convention for tracehook_notify_jctl, I think we want to preserve the aspect that the hook decides about sending the notification. That's how the ptrace quirks can be reimplemented differently later without changing the tracehook layer again. Also, we certainly don't want one tracehook call with two different locking conditions. It seems right in principle to do the reporting before we change ->state, given that we have to allow for it changing during the callbacks. And indeed, that avoids the JCTL special case mess entirely. 
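To make the state/event distinction above concrete, here is a small userspace sketch. It is only a model under stated assumptions, not kernel code: the names model_cld, model_jctl_args, model_make_args, model_tracer and model_report_jctl are all hypothetical, and the enum values are placeholders rather than the real CLD_* constants. The only part taken from the thread is the argument mapping from the REPORT() fix quoted above: the first argument follows was_stopped, the second is "notify ? what : 0".

```c
/* Hypothetical stand-ins for CLD_STOPPED / CLD_CONTINUED (not the kernel values). */
enum model_cld { MODEL_CLD_STOPPED = 1, MODEL_CLD_CONTINUED = 2 };

/* The two values a report_jctl-style callback receives in this model. */
struct model_jctl_args {
	int type;   /* state the thread will be in after the callback */
	int notify; /* SIGCHLD code the parent would see, or 0 if none is posted */
};

/* Mirrors the argument mapping of the fixed REPORT() call quoted above. */
static struct model_jctl_args model_make_args(int was_stopped, int notify, int what)
{
	struct model_jctl_args a;

	a.type = was_stopped ? MODEL_CLD_STOPPED : MODEL_CLD_CONTINUED;
	a.notify = notify ? what : 0;
	return a;
}

/* A toy tracer that keeps both views: current state and count of parent notifications. */
struct model_tracer {
	int stopped;       /* state tracking: 1 while in job control stop */
	int sigchld_count; /* event tracking: SIGCHLD postings expected in the parent */
};

static void model_report_jctl(struct model_tracer *t, struct model_jctl_args a)
{
	/* A state tracker only looks at the type argument. */
	t->stopped = (a.type == MODEL_CLD_STOPPED);

	/* An event tracker only looks at the notify argument. */
	if (a.notify != 0)
		t->sigchld_count++;
}
```

In this model, a thread that stops without being the one to notify the parent flips the tracer's stopped flag but leaves sigchld_count alone, which is the case a single merged argument could not express.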
Thanks, Roland From roland at redhat.com Mon Mar 16 01:55:41 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 15 Mar 2009 18:55:41 -0700 (PDT) Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: Oleg Nesterov's message of Friday, 13 March 2009 22:59:12 +0100 <20090313215912.GA1856@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> <20090313215912.GA1856@redhat.com> Message-ID: <20090316015541.11C33FC3AB@magilla.sf.frob.com> > > + * > > + * In case we had no engines before, make sure that > > + * utrace_flags is not zero when tracehook_notify_resume() > > + * checks. That would bypass utrace reporting clearing > > + * TIF_NOTIFY_RESUME, and thus violate the same invariant. > > */ > > + target->utrace_flags |= UTRACE_EVENT(REAP); > > list_add_tail(&engine->entry, &utrace->attaching); > > utrace->report = 1; > > set_notify_resume(target); > > Agreed. I put that in. > > Does that need a barrier pair here and in > > No, set_notify_resume()->test_and_set_tsk_thread_flag() implies mb(), Ah, ok. > > tracehook_notify_resume()? > > Ah. I think you are right, and I think it needs the barrier even without > this change. Say, UTRACE_REPORT does: > > utrace->report = 1; > set_notify_resume(); > > Without mb() there is no guarantee that utrace_resume() will notice and > clear ->report. Wait, what? You just said that set_notify_resume() already implies an mb(). > smp_mb__after_clear_bit() is enough, but in that case perhaps it is better > to modify the arch dependent do_notify_resume(). I don't follow this.
But we don't want a solution that requires changing arch code. Why can't tracehook_notify_resume() do whatever is required? > > + * > > + * Any engine that's not detached implies tracking the REAP event, > > + * whether or not that engine wants a report_reap callback. Any > > + * engine requires attention from utrace_release_task(). > > */ > > list_for_each_entry_safe(engine, next, &utrace->attached, entry) { > > This looks misleading, utrace_release_task() is called unconditionally, and > we could use any unused bit afacis (REAP only makes sense for engine->flags, > we never check ->utrace_flags & REAP). It's true that any bit at all would do, but REAP is one that makes some sense logically and also one that is implicitly reserved in utrace_flags already (without having to reserve another one from engine.flags). It's true that utrace_release_task() is called unconditionally now, but it might not always be so. It seems like a very intuitive and useful invariant that utrace_flags==0 means "utrace totally empty". It's unconditional now because the previous code tested the indirect pointer rather than flags (for reasons we can no longer be very sure of). If we can convince ourselves about the interlocks, then it would be better to have it test utrace_flags and not call into utrace.c for the common case (nor take the utrace lock). > Also, whatever reason we have to keep ->utrace_flags != 0, the same > reason applies to ->utrace_flags |= XXX in utrace_add_engine(). Hence the change we agreed to above. > utrace_reset() also does > > if (task->exit_state) { > flags &= DEAD_FLAGS_MASK; > > The comment about DEAD_FLAGS_MASK > > /* > * Only these flags matter any more for a dead task (exit_state set). > * We use this mask on flags installed in ->utrace_flags after > * exit_notify (and possibly utrace_report_death) has run. I think these macros are from when reap did a quiesce callback in a previous incarnation of the API. 
It doesn't make much sense to use the macro for just UTRACE_EVENT(REAP) now. > Looks a bit confusing to me. Unless exit_notify() calls utrace_report_death() > we don't change ->utrace_flags. If it doesn't call utrace_report_death(), that means DEATH_EVENTS were not in ->utrace_flags. > Yes. But this means we could do "flags &= ~DEATH_EVENTS" instead. This is > subjective of course, but looks more clean to me. > > Note also that utrace_reset() is the only user of DEAD_FLAGS_MASK and > LIVE_FLAGS_MASK has no users. I got rid of those macros and replaced the comment with this: if (task->exit_state) { + /* + * Once it's already dead, we never install any flags + * except REAP. When ->exit_state is set and events + * like DEATH are not set, then they never can be set. + * This ensures that utrace_release_task() knows + * positively that utrace_report_death() can never run. + */ BUG_ON(utrace->death); - flags &= DEAD_FLAGS_MASK; + flags &= UTRACE_EVENT(REAP); wake = false; I think it makes sense to use this mask because what we are specifically concerned with here is that utrace_release_task() is the one and only utrace entry point that the task might take hereafter. > Also, it would be better imho to change tracehook_report_death() to use > DEATH_EVENTS too, it is always good when grep can find the usage. I made _UTRACE_DEATH_EVENTS that common macro. Thanks, Roland From roland at redhat.com Mon Mar 16 02:34:21 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 15 Mar 2009 19:34:21 -0700 (PDT) Subject: utrace_set_events/utrace_control && death/reap checks In-Reply-To: Oleg Nesterov's message of Saturday, 14 March 2009 00:33:00 +0100 <20090313233300.GA14605@redhat.com> References: <20090313233300.GA14605@redhat.com> Message-ID: <20090316023421.C6136FC3AB@magilla.sf.frob.com> > utrace_set_events: > > (utrace->death && ((old_flags & ~events) & DEATH_EVENTS)) > > "(old_flags & ~events) & DEATH_EVENTS)" means the caller tries to > clear DEATH/QUIESCE. 
Why this is not allowed? And why this is not > allowed _only_ when the target runs utrace_report_death()->REPORT()? This is specifically documented for -EALREADY, and in the DocBook section "Interlock with final callbacks". The idea is this: For most utrace events, you don't know whether you'll get some callbacks. It could be, the task got SIGKILL first thing after you attached, and it will never report anything. That is fine for the most part. But for the lifetime events it becomes a real burden on the users of the API. They have to manage their data structures, and so they have to know reliably when they can and can't get what callbacks. So, the utrace_set_events rules try to ensure that the caller knows for sure whether it will or won't get a callback when the task dies and/or is reaped. You can clear DEATH/QUIESCE, and be sure from the return value that it is now impossible that there is a report_death/report_quiesce callback racing with you because the guy just got a SIGKILL. If you can't be sure of that, then you do know for sure that your callback is being made right now or very soon. > I think this line can be just killed. I guess the intent was to > prevent utrace_release_task() from doing utrace_reap() in parallel > with utrace_report_death(), but note that utrace_set_events() can > never "shrinks" ->utrace_flags, it only sets new bits. It's not ->utrace_flags that matters here, it's engine->flags. That is one of the intents, but not the only one. It's just as important that the user of the API can rely on the ordering of its callbacks wrt its utrace_set_events/utrace_control calls as that it can rely on the ordering of its death and reap callbacks. > The next line looks strange too, don't we need > > (utrace->reap && ((events & ~old_flags) & UTRACE_EVENT(REAP))) > > ? get_utrace_lock() already returned -ESRCH if it was in EXIT_DEAD, so this is probably moot. > And I don't understand why do we need utrace->death at all. 
Apart from > utrace_set_events (which I think doesn't need it), it is only used by > utrace_control(UTRACE_DETACH). But I can't see how can we race with > utrace_report_death(). If it can be called, we have DEATH_EVENTS bits > set. But in that case utrace_do_stop() can't succeed, so UTRACE_DETACH > can only do mark_engine_wants_stop() but not utrace_reset(). It is used by utrace_set_events and utrace_control for the same purpose. Those calls must know for sure that report_death cannot happen, or else that it will (or it's already happening). Many tracers only keep track until death. For them, the simple thing is to have report_death clean up their data structures and return UTRACE_DETACH. But then they also want to do asynchronous detach. So they can do utrace_set_events or utrace_control as the synchronizing step of asynchronous tear-down. If it returns 0, then report_death will not and it is safe to destroy data structures the callback code would use. If it returns -EALREADY, then report_death will shortly be called and we can rely on our callback code to take care of the data structures before it returns UTRACE_DETACH. Thanks, Roland From roland at redhat.com Mon Mar 16 02:48:22 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 15 Mar 2009 19:48:22 -0700 (PDT) Subject: UTRACE_STOP race condition? In-Reply-To: Renzo Davoli's message of Wednesday, 11 February 2009 10:59:46 +0100 <20090211095946.GA2597@cs.unibo.it> References: <20090211095946.GA2597@cs.unibo.it> Message-ID: <20090316024822.23585FC3AB@magilla.sf.frob.com> Thanks very much for the feedback, Renzo. You seem to be about the only person to thoroughly exercise this part of the API so far. I'm sure it can use some refinement. > please help me. Either I have not understood the meaning of UTRACE_STOP > or it is completely useless due to a race condition. I'm confident it can be a little bit of each. 
;-) > There are always two entities in a utrace interaction: the traced > process and the tracing module. There are lots of ways to slice things into a notion like 'entity'. Let's be precise in what we're specifically discussing right now. The question at hand is about synchronization between two threads: a traced task and a control task. > When a traced event occurs in the traced process the correspondent > report function gets called in the module. Your engine's callback function is run by the traced task, yes. > If the report function returns UTRACE_STOP the traced process stays in a > quiescent state and the module wakes it up by a > utrace_control(...,UTRACE_RESUME) call *later*. A control task (i.e. whatever other task) can make this call at some time, yes. > If the module wakes the traced process too quickly, utrace has not yet put > it into a "stopped" state, therefore UTRACE_RESUME gets lost. > As a consequence, the execution is blocked. > > IMHO, given the current utrace code, there is no way to set up some kind > of synchronization in the module to prevent this error. I understand what scenario you mean. The rest of your message talks about implementation details of utrace internals. Frankly I find this confusing and distracting from the API discussion. I've gone to some pains to explicitly document what all the API guarantees and requirements are (and aren't), in the kerneldoc and docbook text. I would like us to discuss the problems for writing tracing engines in terms of the documented API constraints and guarantees. The API documentation says what the contract is between the kernel and the module writer. If that specification is ambiguous, we'll first fix the descriptions to be clear. If what it specifies needs to change into a better contract for module writers, we'll decide what new contract to agree on. Finally, if the utrace implementation does not do what it says, then we'll fix the implementation. 
Your postings have thrown all this together, which does not work for me. Please start a separate thread about each separate issue, such as callback order among engines. I understand your motivation for all these things is tied together, but they are separate subjects to address individually. In commit 3a9f4c87, I made a change/clarification to the API documentation for utrace_barrier() and a corresponding fix to the implementation. What this does that was missing before is that utrace_barrier() does not consider your engine's callback to be complete until your callback's return value has been processed. That means that if utrace_barrier() returned 0 and then you call utrace_control(UTRACE_RESUME), the UTRACE_STOP return value of your prior callback is definitely before the UTRACE_RESUME of your asynchronous control call. Please address your concerns on the synchronization issue with respect to the documented API guarantee now made by this utrace_barrier() behavior. Thanks, Roland From roland at redhat.com Mon Mar 16 04:22:58 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 15 Mar 2009 21:22:58 -0700 (PDT) Subject: [PATCH] utrace_tracer_task: s/list_for_each_safe/list_for_each_entry In-Reply-To: Oleg Nesterov's message of Thursday, 12 March 2009 01:28:59 +0100 <20090312002859.GA20725@redhat.com> References: <20090310182327.GA3826@redhat.com> <20090310215757.1D3BCFC3B6@magilla.sf.frob.com> <20090312002859.GA20725@redhat.com> Message-ID: <20090316042258.96EC0FC3AB@magilla.sf.frob.com> > utrace_tracer_task() can use list_for_each_entry() too. Yes, but ... I'm reminded that this function is its own can of worms. It's called by other threads, without any synchronization, so it cannot safely use utrace->attached unlocked like reporting passes do. The tracer_task and unsafe_exec hooks are there mainly for ptrace. I've decided to punt these utrace hooks for now.
When we get to doing a cleaned-up ptrace on utrace (or some other facility that brings in the need for the unsafe_exec hook), we can figure out how to cleanly and safely support some utrace API feature for that. Thanks, Roland From roland at redhat.com Mon Mar 16 03:59:32 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 15 Mar 2009 20:59:32 -0700 (PDT) Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated) In-Reply-To: Renzo Davoli's message of Friday, 13 March 2009 07:36:17 +0100 <20090313063616.GA11403@cs.unibo.it> References: <20090312131330.GB25801@cs.unibo.it> <20090312173532.GB26657@redhat.com> <20090313063616.GA11403@cs.unibo.it> Message-ID: <20090316035932.1CF73FC3AB@magilla.sf.frob.com> > When a report function of an engine returns UTRACE_STOP, it means (may mean) > that it wants to change the status of the process before resuming it. > VM monitors often change the status, sometimes debugger users want to set > some variables too. Yes. In ideal cases, it can decide up front quickly what it wants to do, and change the user state right in the callback without stopping. But when it needs another agent to decide what to do, it uses UTRACE_STOP. > IMHO, utrace should stop it *before* calling the report function of the > next engine, No, we'll never want to do it this way. One engine doesn't get to arbitrarily delay the reporting to other engines of the thread's events. This is both an efficiency point and a robustness point. It's important to remember that utrace is about the primitive events: the user thread had an event ... the user thread is about to run again. The high-level notion of "what did the other engine do?" is built from examining the state at these events, and knowing about the delays that other engines are imposing via UTRACE_STOP. > otherwise we need to set up another structure to synchronize > the engines (that may even be unknown one to the other). 
> If there is a tracer/debugger among the engines, it is not even possible to know > which snapshot it gets, after or before the modification created by the VM > monitor? This is where the broader discussion of callback order comes in. When a previous engine has decided to use UTRACE_STOP, your callback's @action argument reflects this. You know that another engine is going to do something asynchronous before it lets the user thread run. If your own engine doesn't especially want it stopped now but wants to see what it looks like when other engines are done fiddling with it, then you can use UTRACE_REPORT. That ensures that you'll get a report_quiesce callback after those other engines have done their thing. Thanks, Roland From fche at redhat.com Mon Mar 16 22:18:00 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Mon, 16 Mar 2009 18:18:00 -0400 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090316214526.GA15119@Krystal> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> Message-ID: <20090316221800.GE12974@redhat.com> Hi - On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote: > [...] > > As far as I know, utrace supports multiple trace-engines on a process. > > Since ptrace is just an engine of utrace, you can add another engine on utrace. > > > > utrace-+-ptrace_engine---owner_process > > | > > +-systemtap_module > > | > > +-ftrace_plugin Right. In this way, utrace is simply a multiplexing intermediary. > > Here, Frank had posted an example of utrace->ftrace engine. > > http://lkml.org/lkml/2009/1/27/294 > > > > And here is the latest his patch(which seems to support syscall tracing...) 
> > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124 > > > > Reminder : we are looking at system-wide tracing here. Here are some > comments about the current utrace implementation. > > Looking at include/linux/utrace.h from the tree > > 17 * A tracing engine starts by calling utrace_attach_task() or > 18 * utrace_attach_pid() on the chosen thread, passing in a set of hooks > 19 * (&struct utrace_engine_ops), and some associated data. This produces a > 20 * &struct utrace_engine, which is the handle used for all other > 21 * operations. An attached engine has its ops vector, its data, and an > 22 * event mask controlled by utrace_set_events(). > > So if the system has, say 3000 threads, then we have 3000 struct > utrace_engine created ? I wonder what effet this could have one > cachelines if this is used to trace hot paths like system call > entry/exit. Have you benchmarked this kind of scenario under tbench ? It has not been a problem, since utrace_engines are designed to be lightweight. Starting or stopping a systemtap script of the form probe process.syscall {} appears to have no noticable impact on a tbench suite. > 24 * For each event bit that is set, that engine will get the > 25 * appropriate ops->report_*() callback when the event occurs. The > 26 * &struct utrace_engine_ops need not provide callbacks for an event > 27 * unless the engine sets one of the associated event bits. > > Looking at utrace_set_events(), we seem to be limited to 32 events on a > 32-bits architectures because it uses a bitmask ? Isn't it a bit small? There are only a few types of thread events that involve different classes of treatment, or different degrees of freedom in terms of interference with the uninstrumented fast path of the threads. 
For example, it does not make sense to have different flag bits for different system calls, since choosing to trace *any* system call involves taking the thread off of the fast path with the TIF_ flag. Once it's off the fast path, it doesn't matter whether the utrace core or some client performs syscall discrimination, so it is left to the client. > 682 /** > 683 * utrace_set_events_pid - choose which event reports a tracing engine gets > 684 * @pid: thread to affect > 685 * @engine: attached engine to affect > 686 * @eventmask: new event mask > 687 * > 688 * This is the same as utrace_set_events(), but takes a &struct pid > 689 * pointer rather than a &struct task_struct pointer. The caller must > 690 * hold a ref on @pid, but does not need to worry about the task > 691 * staying valid. If it's been reaped so that @pid points nowhere, > 692 * then this call returns -%ESRCH. > > > Comments like "but does not need to worry about the task staying valid" > does not make me feel safe and comfortable at all, could you explain > how you can assume that derefencing an "invalid" pointer will return > NULL ? (We're doing a final round of "internal" (pre-LKML) reviews of the utrace implementation right now on utrace-devel at redhat.com, where such comments get fastest attention from the experts.) For this particular issue, the utrace documentation file explains the liveness rules for the various pointers that can be fed to or received from utrace functions. This is not about "feeling" safe, it's about what the mechanism is deliberately designed to permit. > About the utrace_attach_task() : > > 244 if (unlikely(target->flags & PF_KTHREAD)) > 245 /* > 246 * Silly kernel, utrace is for users! > 247 */ > 248 return ERR_PTR(-EPERM); > > So we cannot trace kernel threads ? I'm not quite sure about all the reasons for this, but I believe that kernel threads don't tend to engage in job control / signal / system-call activities the same way as normal user threads do. 
> 118 /* > 119 * Called without locks, when we might be the first utrace engine to attach. > 120 * If this is a newborn thread and we are not the creator, we have to wait > 121 * for it. The creator gets the first chance to attach. The PF_STARTING > 122 * flag is cleared after its report_clone hook has had a chance to run. > 123 */ > 124 static inline int utrace_attach_delay(struct task_struct *target) > 125 { > 126 if ((target->flags & PF_STARTING) && target->real_parent != current) > 127 do { > 128 schedule_timeout_interruptible(1); > 129 if (signal_pending(current)) > 130 return -ERESTARTNOINTR; > 131 } while (target->flags & PF_STARTING); > 132 > 133 return 0; > 134 } > > Why do we absolutely have to poll until the thread has started ? (I don't know off the top of my head - Roland?) > utrace_add_engine() > set_notify_resume(target); > > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice > that it is set asynchronously with the execution of the target thread > (as I do with my TIF_KERNEL_TRACE thread flag). > > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in > entry_64.S > > int_signal: > and > retint_signal: > > code paths. However, if there is no syscall tracing to do upon syscall > entry, the thread flags are not re-read at syscall exit and you will > miss the syscall exit returning from your target thread if this thread > was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in > some subtle way I did not figure out ? BTW re-reading the TIF flags from > the thread_info at syscall exit on the fast path is out of question > because it considerably degrades the kernel performances. entry_*.S is > a very, very critical path. (I don't know off the top of my head - Roland?) 
- FChE From fweisbec at gmail.com Mon Mar 16 23:46:58 2009 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Tue, 17 Mar 2009 00:46:58 +0100 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090316221800.GE12974@redhat.com> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090316221800.GE12974@redhat.com> Message-ID: <20090316234657.GC6150@nowhere> On Mon, Mar 16, 2009 at 06:18:00PM -0400, Frank Ch. Eigler wrote: > Hi - > > > On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote: > > > [...] > > > As far as I know, utrace supports multiple trace-engines on a process. > > > Since ptrace is just an engine of utrace, you can add another engine on utrace. > > > > > > utrace-+-ptrace_engine---owner_process > > > | > > > +-systemtap_module > > > | > > > +-ftrace_plugin > > Right. In this way, utrace is simply a multiplexing intermediary. > > > > > Here, Frank had posted an example of utrace->ftrace engine. > > > http://lkml.org/lkml/2009/1/27/294 > > > > > > And here is the latest his patch(which seems to support syscall tracing...) > > > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124 > > > > > > > Reminder : we are looking at system-wide tracing here. Here are some > > comments about the current utrace implementation. > > > > Looking at include/linux/utrace.h from the tree > > > > 17 * A tracing engine starts by calling utrace_attach_task() or > > 18 * utrace_attach_pid() on the chosen thread, passing in a set of hooks > > 19 * (&struct utrace_engine_ops), and some associated data. 
This produces a > > 20 * &struct utrace_engine, which is the handle used for all other > > 21 * operations. An attached engine has its ops vector, its data, and an > > 22 * event mask controlled by utrace_set_events(). > > > > So if the system has, say 3000 threads, then we have 3000 struct > > utrace_engine created ? I wonder what effet this could have one > > cachelines if this is used to trace hot paths like system call > > entry/exit. Have you benchmarked this kind of scenario under tbench ? > > It has not been a problem, since utrace_engines are designed to be > lightweight. Starting or stopping a systemtap script of the form > > probe process.syscall {} > > appears to have no noticable impact on a tbench suite. > > > > 24 * For each event bit that is set, that engine will get the > > 25 * appropriate ops->report_*() callback when the event occurs. The > > 26 * &struct utrace_engine_ops need not provide callbacks for an event > > 27 * unless the engine sets one of the associated event bits. > > > > Looking at utrace_set_events(), we seem to be limited to 32 events on a > > 32-bits architectures because it uses a bitmask ? Isn't it a bit small? > > There are only a few types of thread events that involve different > classes of treatment, or different degrees of freedom in terms of > interference with the uninstrumented fast path of the threads. > > For example, it does not make sense to have different flag bits for > different system calls, since choosing to trace *any* system call > involves taking the thread off of the fast path with the TIF_ flag. > Once it's off the fast path, it doesn't matter whether the utrace core > or some client performs syscall discrimination, so it is left to the > client. 
> > > > 682 /** > > 683 * utrace_set_events_pid - choose which event reports a tracing engine gets > > 684 * @pid: thread to affect > > 685 * @engine: attached engine to affect > > 686 * @eventmask: new event mask > > 687 * > > 688 * This is the same as utrace_set_events(), but takes a &struct pid > > 689 * pointer rather than a &struct task_struct pointer. The caller must > > 690 * hold a ref on @pid, but does not need to worry about the task > > 691 * staying valid. If it's been reaped so that @pid points nowhere, > > 692 * then this call returns -%ESRCH. > > > > > > Comments like "but does not need to worry about the task staying valid" > > does not make me feel safe and comfortable at all, could you explain > > how you can assume that derefencing an "invalid" pointer will return > > NULL ? > > (We're doing a final round of "internal" (pre-LKML) reviews of the > utrace implementation right now on utrace-devel at redhat.com, where such > comments get fastest attention from the experts.) > > For this particular issue, the utrace documentation file explains the > liveness rules for the various pointers that can be fed to or received > from utrace functions. This is not about "feeling" safe, it's about > what the mechanism is deliberately designed to permit. > > > > About the utrace_attach_task() : > > > > 244 if (unlikely(target->flags & PF_KTHREAD)) > > 245 /* > > 246 * Silly kernel, utrace is for users! > > 247 */ > > 248 return ERR_PTR(-EPERM); > > > > So we cannot trace kernel threads ? > > I'm not quite sure about all the reasons for this, but I believe that > kernel threads don't tend to engage in job control / signal / > system-call activities the same way as normal user threads do. > Some of them use some syscalls, but it doesn't involve a user/kernel switch. So it's not tracable by hooking syscall_entry/exit or using tracehooks. It would require specific hooks on sys_* functions for that. 
So this check is right (writing on each thread info seems somewhat costly so it's better if it is avoided like here). Frederic. > > 118 /* > > 119 * Called without locks, when we might be the first utrace engine to attach. > > 120 * If this is a newborn thread and we are not the creator, we have to wait > > 121 * for it. The creator gets the first chance to attach. The PF_STARTING > > 122 * flag is cleared after its report_clone hook has had a chance to run. > > 123 */ > > 124 static inline int utrace_attach_delay(struct task_struct *target) > > 125 { > > 126 if ((target->flags & PF_STARTING) && target->real_parent != current) > > 127 do { > > 128 schedule_timeout_interruptible(1); > > 129 if (signal_pending(current)) > > 130 return -ERESTARTNOINTR; > > 131 } while (target->flags & PF_STARTING); > > 132 > > 133 return 0; > > 134 } > > > > Why do we absolutely have to poll until the thread has started ? > > (I don't know off the top of my head - Roland?) > > > > utrace_add_engine() > > set_notify_resume(target); > > > > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice > > that it is set asynchronously with the execution of the target thread > > (as I do with my TIF_KERNEL_TRACE thread flag). > > > > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in > > entry_64.S > > > > int_signal: > > and > > retint_signal: > > > > code paths. However, if there is no syscall tracing to do upon syscall > > entry, the thread flags are not re-read at syscall exit and you will > > miss the syscall exit returning from your target thread if this thread > > was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in > > some subtle way I did not figure out ? BTW re-reading the TIF flags from > > the thread_info at syscall exit on the fast path is out of question > > because it considerably degrades the kernel performances. entry_*.S is > > a very, very critical path. > > (I don't know off the top of my head - Roland?) 
> > > - FChE From oleg at redhat.com Tue Mar 17 01:21:43 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 17 Mar 2009 02:21:43 +0100 Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> Message-ID: <20090317012143.GA17780@redhat.com> On 03/15, Roland McGrath wrote: > > > Then we re-do this (well, almost) check under ->siglock, > > > > } else if (task_is_stopped(target)) { > > if (!(target->utrace_flags & UTRACE_EVENT(JCTL))) > > utrace->stopped = stopped = true; > > } > > > > But this is not nice. Let's suppose the task is already stopped, we do > > UTRACE_ATTACH + utrace_set_events(JCTL). > > This is exactly why utrace_set_events() sets ->stopped preemptively for > that case. Yes, thanks. I saw this code in utrace_set_events(), but then forgot. > > REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), > > report_jctl, what, notify); > > > > instead. > > There is a bug, but your fix changes a key API choice. > I've put in this fix: > > - report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); > + report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, > + notify ? what : 0); > > The @type argument shows the state we will be in after the callback. > If the state changes, there will be another callback. That's what a > state-tracking tracer needs, e.g. to keep a little light on the screen red > while the thread is stopped and green while it's running. > > The @notify argument shows what SIGCHLD the parent sees (if it were > dequeuing all possible SIGCHLD postings as quickly as they come). That's > what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs > are expected in the parent. 
I see, thanks. > > With the first patch, we call utrace_report_jctl() before we actually > > stop. do_signal_stop() can fail then, but I think this is OK, we can > > pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete, > > and with this patch we always call ->report_jctl with notify == 0. Just for > > discussion. > > I think I sort of understand the intent of your patch. If we change the > calling convention for tracehook_notify_jctl, I think we want to preserve > the aspect that the hook decides about sending the notification. That's > how the ptrace quirks can be reimplemented differently later without > changing the tracehook layer again. Also, we certainly don't want one > tracehook call with two different locking conditions. Agreed, "bool sig_locked" is awful. But we can avoid it. The real problem is how to figure out the correct "notify" argument. I'll try to think more, but I am not sure I will find the clean solution :( Just in case. We can introduce PF_SIGCONTED flag and change prepare_signal(SIGCONT) and signal_wake_up(resume => 1) to set this flag. Since the task never changes its ->flags in finish_stop() path, it is safe to do this under ->siglock. This way utrace_report_jctl() can drop TASK_STOPPED before REPORT() and then check !PF_SIGCONTED before restoring the ->state. But yes sure, this is very, very ugly. Oleg. 
From oleg at redhat.com Tue Mar 17 01:34:22 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 17 Mar 2009 02:34:22 +0100 Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: <20090316015541.11C33FC3AB@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> <20090313215912.GA1856@redhat.com> <20090316015541.11C33FC3AB@magilla.sf.frob.com> Message-ID: <20090317013422.GB17780@redhat.com> On 03/15, Roland McGrath wrote: > > > > > Does that need a barrier pair here and in > > > > No, set_notify_resume()->test_and_set_tsk_thread_flag() implies mb(), > > Ah, ok. > > > > tracehook_notify_resume()? > > > > Ah. I think you are right, and I think it needs the barrier even without > > this change. Say, UTRACE_REPORT does: > > > > utrace->report = 1; > > set_notify_resume(); > > > > Without mb() there is no guarantee that utrace_resume() will notice and > > clear ->report. > > Wait, what? You just said that set_notify_resume() already implies an mb(). Yes, but the other side lacks a barrier. UTRACE_REPORT does utrace->report = 1; wmb(); // actually mb, but wmb is enough set _TIF_NOTIFY_RESUME; do_notify_resume()->utrace_resume()->start_report() path does if (_TIF_NOTIFY_RESUME) // !!! we need rmb in between !!! if (utrace->report) ... and it can miss ->report. > But we don't want a solution that requires changing > arch code. Yes, agreed. Oleg. 
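[Editorial sketch, not part of the original thread.] The pairing Oleg describes above can be modelled in user space with C11 atomics: the writer publishes the report request and then raises the notify flag, and the reader must order its flag test before its read of the request. This is only an illustration under stated assumptions. The names fake_utrace, request_report() and resume_sees_report() are hypothetical, and the C11 fences merely stand in for the kernel's wmb()/rmb(); it is not the utrace code itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields discussed in the thread. */
struct fake_utrace {
    atomic_int report;         /* models utrace->report */
    atomic_int notify_resume;  /* models the TIF_NOTIFY_RESUME bit */
};

void fake_utrace_init(struct fake_utrace *u)
{
    atomic_init(&u->report, 0);
    atomic_init(&u->notify_resume, 0);
}

/* Writer side: what UTRACE_REPORT is described as doing. */
void request_report(struct fake_utrace *u)
{
    atomic_store_explicit(&u->report, 1, memory_order_relaxed);
    /* Plays the role of wmb(): report is published before the flag. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&u->notify_resume, 1, memory_order_relaxed);
}

/* Reader side: the do_notify_resume() path as described in the thread. */
bool resume_sees_report(struct fake_utrace *u)
{
    if (!atomic_load_explicit(&u->notify_resume, memory_order_relaxed))
        return false;
    /*
     * Plays the role of the missing rmb(): without it the load of
     * report below could be satisfied before the flag test above.
     */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&u->report, memory_order_relaxed) != 0;
}
```

With both fences in place, a reader that observes the flag is guaranteed to observe report == 1 as well. Dropping the acquire fence gives exactly the window in which the reader "can miss ->report".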
From oleg at redhat.com Tue Mar 17 02:33:48 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 17 Mar 2009 03:33:48 +0100 Subject: utrace_set_events/utrace_control && death/reap checks In-Reply-To: <20090316023421.C6136FC3AB@magilla.sf.frob.com> References: <20090313233300.GA14605@redhat.com> <20090316023421.C6136FC3AB@magilla.sf.frob.com> Message-ID: <20090317023348.GC17780@redhat.com> On 03/15, Roland McGrath wrote: > > > utrace_set_events: > > > > (utrace->death && ((old_flags & ~events) & DEATH_EVENTS)) > > > > "(old_flags & ~events) & DEATH_EVENTS)" means the caller tries to > > clear DEATH/QUIESCE. Why this is not allowed? And why this is not > > allowed _only_ when the target runs utrace_report_death()->REPORT()? > > This is specifically documented for -EALREADY, and in the DocBook section > "Interlock with final callbacks". The idea is this: Aha, I didn't know. > > And I don't understand why do we need utrace->death at all. > ... > > it is only used by > > utrace_control(UTRACE_DETACH). > ... > that it will (or it's already happening). > > utrace_control as the synchronizing step of > asynchronous tear-down. If it returns 0, then report_death will not and it > is safe to destroy data structures the callback code would use. Yes, with your explanation above this is clear. But can't we simplify this check a little bit? utrace_control: else if (unlikely(target->utrace_flags & DEATH_EVENTS) || unlikely(utrace->death)) { return -EALREADY; can't we just do else if (unlikely(utrace->death)) { return -EALREADY; I guess I missed something, but can't understand why do we need to check ->utrace_flags. We are going to call mark_engine_detached() below which clears engine->flags, and we hold utrace->lock. If utrace_flags & DEATH_EVENTS is true, the subsequent utrace_report_death() must see engine->flags == 0 (it takes utrace->lock before REPORT_CALLBACKS), so it won't call any callback. 
Yes, it can play with engine itself, but this should be safe because "struct utrace" has a reference to attached engine. No? Oleg. From oleg at redhat.com Tue Mar 17 05:24:42 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 17 Mar 2009 06:24:42 +0100 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090316214526.GA15119@Krystal> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> Message-ID: <20090317052442.GA32674@redhat.com> On 03/16, Mathieu Desnoyers wrote: > > utrace_add_engine() > set_notify_resume(target); > > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice > that it is set asynchronously with the execution of the target thread > (as I do with my TIF_KERNEL_TRACE thread flag). > > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in > entry_64.S > > int_signal: > and > retint_signal: > > code paths. However, if there is no syscall tracing to do upon syscall > entry, the thread flags are not re-read at syscall exit and you will > miss the syscall exit returning from your target thread if this thread > was blocked while you set its TIF_NOTIFY_RESUME. Afaics, TIF_NOTIFY_RESUME is not needed to trace syscall entry/exit. If engine wants the syscall tracing, utrace_set_events(UTRACE_SYSCALL_xxx) sets TIF_SYSCALL_TRACE. And syscall_trace_enter/syscall_trace_leave call tracehook_report_syscall_xxx(). Oleg. 
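[Editorial sketch, not part of the original thread.] Oleg's point is that asking for syscall events switches on a dedicated per-thread flag, and the syscall entry hook reports only when that flag is set. A toy model of that relationship is shown below. All names (toy_task, toy_set_events(), the EV_* bits) are hypothetical and chosen for illustration. The sketch does not model the asynchronous flag-setting question raised in the reply that follows.

```c
#include <stdbool.h>

/* Hypothetical event bits, loosely modelled on UTRACE_EVENT(...). */
enum {
    EV_SYSCALL_ENTRY = 1u << 0,
    EV_SYSCALL_EXIT  = 1u << 1,
    EV_JCTL          = 1u << 2,
};
#define EV_SYSCALL_ANY (EV_SYSCALL_ENTRY | EV_SYSCALL_EXIT)

struct toy_task {
    unsigned int event_mask;  /* what the engine asked for */
    bool syscall_trace;       /* stands in for TIF_SYSCALL_TRACE */
    unsigned int reports;     /* number of entry reports delivered */
};

/* Models utrace_set_events(): syscall events turn the trace flag on. */
void toy_set_events(struct toy_task *t, unsigned int mask)
{
    t->event_mask = mask;
    t->syscall_trace = (mask & EV_SYSCALL_ANY) != 0;
}

/* Models the syscall entry hook: untraced threads stay on the fast path. */
void toy_syscall_enter(struct toy_task *t)
{
    if (!t->syscall_trace)
        return;               /* fast path, nothing reported */
    if (t->event_mask & EV_SYSCALL_ENTRY)
        t->reports++;
}
```

Requesting only EV_JCTL leaves syscall_trace clear, so syscall entry is never reported; requesting EV_SYSCALL_ENTRY sets it and each entry is counted.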
From mathieu.desnoyers at polymtl.ca Tue Mar 17 16:00:29 2009 From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers) Date: Tue, 17 Mar 2009 12:00:29 -0400 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090317052442.GA32674@redhat.com> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090317052442.GA32674@redhat.com> Message-ID: <20090317160029.GD10092@Krystal> * Oleg Nesterov (oleg at redhat.com) wrote: > On 03/16, Mathieu Desnoyers wrote: > > > > utrace_add_engine() > > set_notify_resume(target); > > > > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice > > that it is set asynchronously with the execution of the target thread > > (as I do with my TIF_KERNEL_TRACE thread flag). > > > > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in > > entry_64.S > > > > int_signal: > > and > > retint_signal: > > > > code paths. However, if there is no syscall tracing to do upon syscall > > entry, the thread flags are not re-read at syscall exit and you will > > miss the syscall exit returning from your target thread if this thread > > was blocked while you set its TIF_NOTIFY_RESUME. > > Afaics, TIF_NOTIFY_RESUME is not needed to trace syscall entry/exit. > If engine wants the syscall tracing, utrace_set_events(UTRACE_SYSCALL_xxx) > sets TIF_SYSCALL_TRACE. And syscall_trace_enter/syscall_trace_leave call > tracehook_report_syscall_xxx(). > > Oleg. I recall that TIF_SYSCALL_TRACE also suffers from the same problem as TIF_NOTIFY_RESUME if set asynchronously with the target thread's execution at least on x86_64 and arm. Do you take care to stop the target thread in utrace_set_events ? 
Mathieu > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 From roland at redhat.com Wed Mar 18 08:37:40 2009 From: roland at redhat.com (Roland McGrath) Date: Wed, 18 Mar 2009 01:37:40 -0700 (PDT) Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: Oleg Nesterov's message of Tuesday, 17 March 2009 02:34:22 +0100 <20090317013422.GB17780@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> <20090313215912.GA1856@redhat.com> <20090316015541.11C33FC3AB@magilla.sf.frob.com> <20090317013422.GB17780@redhat.com> Message-ID: <20090318083740.9EB39FC3AB@magilla.sf.frob.com> > > Wait, what? You just said that set_notify_resume() already implies an mb(). > > Yes, but the other side lacks a barrier.
UTRACE_REPORT does > > utrace->report = 1; > wmb(); // actually mb, but wmb is enough > set _TIF_NOTIFY_RESUME; > > do_notify_resume()->utrace_resume()->start_report() path does > > if (_TIF_NOTIFY_RESUME) > // !!! we need rmb in between !!! > if (utrace->report) > ... > > and it can miss ->report. I see. We have a similar problem for (the first) attach, too, right? utrace_add_engine does: utrace_flags |= UTRACE_EVENT(REAP); utrace->report = 1; wmb(); // actually mb, but wmb is enough set _TIF_NOTIFY_RESUME; do_notify_resume()->tracehook_notify_resume() path does: if (_TIF_NOTIFY_RESUME) // !!! we need rmb in between !!! if (utrace_flags != 0) utrace_resume() This is what I put in (4d8a6fd6): --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -616,6 +616,12 @@ static inline void set_notify_resume(struct task_struct *task) static inline void tracehook_notify_resume(struct pt_regs *regs) { struct task_struct *task = current; + /* + * This pairs with the barrier implicit in set_notify_resume(). + * It ensures that we read the nonzero utrace_flags set before + * set_notify_resume() was called by utrace setup. + */ + smp_rmb(); if (task_utrace_flags(task)) utrace_resume(task, regs); } Thanks, Roland From roland at redhat.com Wed Mar 18 08:52:14 2009 From: roland at redhat.com (Roland McGrath) Date: Wed, 18 Mar 2009 01:52:14 -0700 (PDT) Subject: utrace_set_events/utrace_control && death/reap checks In-Reply-To: Oleg Nesterov's message of Tuesday, 17 March 2009 03:33:48 +0100 <20090317023348.GC17780@redhat.com> References: <20090313233300.GA14605@redhat.com> <20090316023421.C6136FC3AB@magilla.sf.frob.com> <20090317023348.GC17780@redhat.com> Message-ID: <20090318085214.8961CFC3AB@magilla.sf.frob.com> > But can't we simplify this check a little bit? 
> > utrace_control: > > else if (unlikely(target->utrace_flags & DEATH_EVENTS) || > unlikely(utrace->death)) { > return -EALREADY; > > can't we just do > > else if (unlikely(utrace->death)) { > return -EALREADY; Yes, it's sufficient. I've changed it. Thanks, Roland From roland at redhat.com Wed Mar 18 11:07:58 2009 From: roland at redhat.com (Roland McGrath) Date: Wed, 18 Mar 2009 04:07:58 -0700 (PDT) Subject: Q: utrace->stopped && utrace_report_jctl() In-Reply-To: Oleg Nesterov's message of Tuesday, 17 March 2009 02:21:43 +0100 <20090317012143.GA17780@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> <20090317012143.GA17780@redhat.com> Message-ID: <20090318110758.C7654FC3AB@magilla.sf.frob.com> > Agreed, "bool sig_locked" is awful. But we can avoid it. The real problem > is how to figure out the correct "notify" argument. I'll try to think more, > but I am not sure I will find the clean solution :( It does not seem hard if we move tracehook_notify_jctl inside siglock. > Just in case. We can introduce PF_SIGCONTED flag and change > prepare_signal(SIGCONT) and signal_wake_up(resume => 1) to set this flag. > Since the task never changes its ->flags in finish_stop() path, it is safe > to do this under ->siglock. This way utrace_report_jctl() can drop > TASK_STOPPED before REPORT() and then check !PF_SIGCONTED before restoring > the ->state. But yes sure, this is very, very ugly. Very! No need for this at all. It's OK to change the tracehook definition. I did this on the new git branch tracehook, then utrace branch commit 7b0be6e4 merges that and uses it. This drops all the JCTL bit kludgery and utrace_report_jctl just backs out of TASK_STOPPED before dropping the siglock in the first place. 
I think the bookkeeping covers all the angles, but please check it in the new code. Also please verify if you think all ->stopped bookkeeping is bulletproof now. I fiddled utrace_get_signal() a little because I wasn't quite sure that all possibly paths there after TASK_STOPPED were resetting it. With that, please tell me if the current code fixes all the issues (not just this one) you've noticed or what I've still missed. I want to post it to LKML in the next day or two so it has aired before the 2.6.30 merge window. If we've covered things that would hold up review and initial merge now, many follow-on changes will probably go in easily as we have them. Thanks, Roland From oleg at redhat.com Wed Mar 18 18:15:12 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Wed, 18 Mar 2009 19:15:12 +0100 Subject: [PATCH] simplify do_signal_stop() && utrace_report_jctl() interaction In-Reply-To: <20090318110758.C7654FC3AB@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> <20090317012143.GA17780@redhat.com> <20090318110758.C7654FC3AB@magilla.sf.frob.com> Message-ID: <20090318181512.GA697@redhat.com> On 03/18, Roland McGrath wrote: > > It's OK to change the tracehook definition. I did this on the new git > branch tracehook, then utrace branch commit 7b0be6e4 merges that and uses it. Roland, I think it better to change tracehook definition more, please see below. > This drops all the JCTL bit kludgery and utrace_report_jctl just backs out > of TASK_STOPPED before dropping the siglock in the first place. I think > the bookkeeping covers all the angles, but please check it in the new code. Heh. I was thinking about the very similar change. But I have problems with tracehook_notify_jctl(). 
Please find the patch below, on top of your changes. At the cost of one additional ->group_stop_count != 0 in do_signal_stop(), we can avoid playing with state/group_stop_count/flags twice. But, with or without this patch we have a small problem: we can wrongly send SIGCHLD twice. I'll write a separate email. > Also please verify if you think all ->stopped bookkeeping is bulletproof > now. I fiddled utrace_get_signal() a little because I wasn't quite sure > that all possibly paths there after TASK_STOPPED were resetting it. Will do. I didn't study the signal part of utrace yet. > I want to post it > to LKML in the next day or two so it has aired before the 2.6.30 merge > window. Yes! I think it should be posted really soon. BTW. exit_signals() calls tracehook_notify_jctl(why => CLD_STOPPED), could you confirm this is right? ------------------------------------------------------------------------- [PATCH] simplify do_signal_stop() && utrace_report_jctl() interaction do_signal_stop() can call utrace_report_jctl() before decrementing ->group_stop_count and setting TASK_STOPPED/SIGNAL_STOP_STOPPED. This allow to greatly simplify utrace_report_jctl() and avoid playing with group-stop bookkeeping twice. Signed-off-by: Oleg Nesterov signal.c | 29 +++++++++++------------------ utrace.c | 20 -------------------- 2 files changed, 11 insertions(+), 38 deletions(-) --- xxx/kernel/signal.c~JCTL_SIMPLIFY 2009-03-18 14:50:06.000000000 +0100 +++ xxx/kernel/signal.c 2009-03-18 18:20:35.000000000 +0100 @@ -1638,16 +1638,9 @@ void ptrace_notify(int exit_code) static int do_signal_stop(int signr) { struct signal_struct *sig = current->signal; - int stop_count; int notify; - if (sig->group_stop_count > 0) { - /* - * There is a group stop in progress. We don't need to - * start another one. 
- */ - stop_count = --sig->group_stop_count; - } else { + if (!sig->group_stop_count) { struct task_struct *t; if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) || @@ -1659,7 +1652,7 @@ static int do_signal_stop(int signr) */ sig->group_exit_code = signr; - stop_count = 0; + sig->group_stop_count = 1; for (t = next_thread(current); t != current; t = next_thread(t)) /* * Setting state to TASK_STOPPED for a group @@ -1668,25 +1661,25 @@ static int do_signal_stop(int signr) */ if (!(t->flags & PF_EXITING) && !task_is_stopped_or_traced(t)) { - stop_count++; + sig->group_stop_count++; signal_wake_up(t, 0); } - sig->group_stop_count = stop_count; } - if (stop_count == 0) - sig->flags = SIGNAL_STOP_STOPPED; - current->exit_code = sig->group_exit_code; - __set_current_state(TASK_STOPPED); - /* * If there are no other threads in the group, or if there is * a group stop in progress and we are the last to stop, * report to the parent. When ptraced, every thread reports itself. */ - notify = tracehook_notify_jctl(stop_count == 0 ? CLD_STOPPED : 0, - CLD_STOPPED); + notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0; + notify = tracehook_notify_jctl(notify, CLD_STOPPED); + if (sig->group_stop_count) { + if (!--sig->group_stop_count) + sig->flags = SIGNAL_STOP_STOPPED; + current->exit_code = sig->group_exit_code; + __set_current_state(TASK_STOPPED); + } spin_unlock_irq(&current->sighand->siglock); if (notify) { --- xxx/kernel/utrace.c~JCTL_SIMPLIFY 2009-03-18 14:50:06.000000000 +0100 +++ xxx/kernel/utrace.c 2009-03-18 18:23:01.000000000 +0100 @@ -1618,18 +1618,7 @@ void utrace_report_jctl(int notify, int struct task_struct *task = current; struct utrace *utrace = task_utrace_struct(task); INIT_REPORT(report); - bool stop = task_is_stopped(task); - /* - * We have to come out of TASK_STOPPED in case the event report - * hooks might block. Since we held the siglock throughout, it's - * as if we were never in TASK_STOPPED yet at all.
- */ - if (stop) { - __set_current_state(TASK_RUNNING); - task->signal->flags &= ~SIGNAL_STOP_STOPPED; - ++task->signal->group_stop_count; - } spin_unlock_irq(&task->sighand->siglock); /* @@ -1654,16 +1643,7 @@ void utrace_report_jctl(int notify, int REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), report_jctl, what, notify); - /* - * Retake the lock, and go back into TASK_STOPPED - * unless the stop was just cleared. - */ spin_lock_irq(&task->sighand->siglock); - if (stop && task->signal->group_stop_count > 0) { - __set_current_state(TASK_STOPPED); - if (--task->signal->group_stop_count == 0) - task->signal->flags |= SIGNAL_STOP_STOPPED; - } } /* From oleg at redhat.com Wed Mar 18 18:22:45 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Wed, 18 Mar 2009 19:22:45 +0100 Subject: Q: utrace_reset() && UTRACE_EVENT(REAP) In-Reply-To: <20090318083740.9EB39FC3AB@magilla.sf.frob.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312195021.GB3529@redhat.com> <20090312231607.7F9E5FC3B6@magilla.sf.frob.com> <20090313215912.GA1856@redhat.com> <20090316015541.11C33FC3AB@magilla.sf.frob.com> <20090317013422.GB17780@redhat.com> <20090318083740.9EB39FC3AB@magilla.sf.frob.com> Message-ID: <20090318182245.GB697@redhat.com> On 03/18, Roland McGrath wrote: > > > Yes, but the other side lacks a barrier. UTRACE_REPORT does > > > > utrace->report = 1; > > wmb(); // actually mb, but wmb is enough > > set _TIF_NOTIFY_RESUME; > > > > do_notify_resume()->utrace_resume()->start_report() path does > > > > if (_TIF_NOTIFY_RESUME) > > // !!! we need rmb in between !!! > > if (utrace->report) > > ... > > > > and it can miss ->report. > > I see. We have a similar problem for (the first) attach, too, right? > utrace_add_engine does: Yes sure. I just meant the barrier was needed even before you changed utrace_add_engine() to set ->report. 
> --- a/include/linux/tracehook.h > +++ b/include/linux/tracehook.h > @@ -616,6 +616,12 @@ static inline void set_notify_resume(struct task_struct *task) > static inline void tracehook_notify_resume(struct pt_regs *regs) > { > struct task_struct *task = current; > + /* > + * This pairs with the barrier implicit in set_notify_resume(). > + * It ensures that we read the nonzero utrace_flags set before > + * set_notify_resume() was called by utrace setup. > + */ > + smp_rmb(); smp_mb__after_clear_bit() is enough, but I agree, smp_rmb() is more understandable. Oleg. From oleg at redhat.com Wed Mar 18 19:49:41 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Wed, 18 Mar 2009 20:49:41 +0100 Subject: PATCH? tracehook_notify_jctl && SIGCONT In-Reply-To: <20090318181512.GA697@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> <20090317012143.GA17780@redhat.com> <20090318110758.C7654FC3AB@magilla.sf.frob.com> <20090318181512.GA697@redhat.com> Message-ID: <20090318194941.GA7563@redhat.com> On 03/18, Oleg Nesterov wrote: > > On 03/18, Roland McGrath wrote: > > > > It's OK to change the tracehook definition. I did this on the new git > > branch tracehook, then utrace branch commit 7b0be6e4 merges that and uses it. > > Roland, I think it better to change tracehook definition more, please > see below. The problem is that, since utrace_report_jctl() drops ->siglock, tracehook_notify_jctl() can return false positive. This is easy to fix, but then we have to check PT_PTRACED twice, not good. Suppose we have 2 threads, T1 and T2, T1 has JCTL in ->utrace_flags. T2 dequeues SIGSTOP, calls do_signal_stop(), and sleeps in TASK_STOPPED. T1 calls do_signal_stop(). 
->group_stop_count == 1, so it does notify = tracehook_notify_jctl(notify => CLD_STOPPED), this means that notify always becomes CLD_STOPPED. When tracehook_notify_jctl()->utrace_notify_jctl() drops siglock, SIGCONT comes, notices ->group_stop_count != 0, and adds SIGNAL_CLD_STOPPED to signal flags. Now we send SIGCHLD with si_code = CLD_STOPPED twice. By T1 from do_signal_stop(), and by T1 or T2 from get_signal_to_deliver() which checks SIGNAL_CLD_MASK. I'd suggest something like the patch below. At least for now. Oleg. --- xxx/include/linux/tracehook.h~JCTL_NOTIFY 2009-03-18 14:50:05.000000000 +0100 +++ xxx/include/linux/tracehook.h 2009-03-18 20:18:54.000000000 +0100 @@ -520,11 +520,10 @@ static inline int tracehook_get_signal(s * * Called with the siglock held. */ -static inline int tracehook_notify_jctl(int notify, int why) +static inline void tracehook_notify_jctl(int notify, int why) { if (task_utrace_flags(current) & UTRACE_EVENT(JCTL)) utrace_report_jctl(notify, why); - return notify ?: (current->ptrace & PT_PTRACED) ? why : 0; } #define DEATH_REAP -1 --- xxx/kernel/signal.c~JCTL_NOTIFY 2009-03-18 18:20:35.000000000 +0100 +++ xxx/kernel/signal.c 2009-03-18 20:28:39.000000000 +0100 @@ -1671,18 +1671,21 @@ static int do_signal_stop(int signr) * a group stop in progress and we are the last to stop, * report to the parent. When ptraced, every thread reports itself. */ - notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0; - notify = tracehook_notify_jctl(notify, CLD_STOPPED); + tracehook_notify_jctl(sig->group_stop_count == 1 ? 
CLD_STOPPED : 0, + CLD_STOPPED); + notify = 0; if (sig->group_stop_count) { - if (!--sig->group_stop_count) + if (!--sig->group_stop_count) { sig->flags = SIGNAL_STOP_STOPPED; + notify = 1; + } current->exit_code = sig->group_exit_code; __set_current_state(TASK_STOPPED); } spin_unlock_irq(&current->sighand->siglock); - if (notify) { + if (notify || (current->ptrace & PT_PTRACED)) { read_lock(&tasklist_lock); do_notify_parent_cldstop(current, notify); read_unlock(&tasklist_lock); @@ -1765,14 +1768,12 @@ relock: ? CLD_CONTINUED : CLD_STOPPED; signal->flags &= ~SIGNAL_CLD_MASK; - why = tracehook_notify_jctl(why, CLD_CONTINUED); + tracehook_notify_jctl(why, CLD_CONTINUED); spin_unlock_irq(&sighand->siglock); - if (why) { - read_lock(&tasklist_lock); - do_notify_parent_cldstop(current->group_leader, why); - read_unlock(&tasklist_lock); - } + read_lock(&tasklist_lock); + do_notify_parent_cldstop(current->group_leader, why); + read_unlock(&tasklist_lock); goto relock; } @@ -1930,7 +1931,8 @@ void exit_signals(struct task_struct *ts if (unlikely(tsk->signal->group_stop_count) && !--tsk->signal->group_stop_count) { tsk->signal->flags = SIGNAL_STOP_STOPPED; - group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED); + tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED); + group_stop = 1; } out: spin_unlock_irq(&tsk->sighand->siglock);
From roland at redhat.com Thu Mar 19 07:43:16 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 19 Mar 2009 00:43:16 -0700 (PDT) Subject: [PATCH] simplify do_signal_stop() && utrace_report_jctl() interaction In-Reply-To: Oleg Nesterov's message of Wednesday, 18 March 2009 19:15:12 +0100 <20090318181512.GA697@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> <20090317012143.GA17780@redhat.com> <20090318110758.C7654FC3AB@magilla.sf.frob.com> <20090318181512.GA697@redhat.com> Message-ID: <20090319074316.B68D8FC3AB@magilla.sf.frob.com> > Roland, I think it better to change tracehook definition more, please > see below. I don't really object to this in principle. But this touches signal.c a lot more in less obviously-trivial ways than my tracehook patch. That is more of an issue at the outset than some extra fiddling in the utrace code. I think we should consider this for a later clean-up after merging. > BTW. exit_signals() calls tracehook_notify_jctl(why => CLD_STOPPED), > could you confirm this is right? Yes, it's right. I considered passing CLD_EXITED here to distinguish this odd case, but that would make the vanilla tracehook_notify_jctl() definition less simple. Instead, we put the onus on a ->report_jctl hook to check for PF_EXITING to tell if it's really going to stop. Thanks, Roland From roland at redhat.com Thu Mar 19 07:47:50 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 19 Mar 2009 00:47:50 -0700 (PDT) Subject: PATCH?
tracehook_notify_jctl && SIGCONT In-Reply-To: Oleg Nesterov's message of Wednesday, 18 March 2009 20:49:41 +0100 <20090318194941.GA7563@redhat.com> References: <20090311222401.GA13512@redhat.com> <20090312073652.75811FC3B6@magilla.sf.frob.com> <20090312190738.GA3529@redhat.com> <20090312224055.BA71CFC3B6@magilla.sf.frob.com> <20090314001420.GA15677@redhat.com> <20090315223300.GA10526@redhat.com> <20090316011401.8EAE7FC3AB@magilla.sf.frob.com> <20090317012143.GA17780@redhat.com> <20090318110758.C7654FC3AB@magilla.sf.frob.com> <20090318181512.GA697@redhat.com> <20090318194941.GA7563@redhat.com> Message-ID: <20090319074750.4EB4EFC3AB@magilla.sf.frob.com> > Now we send SIGCHLD with si_code = CLD_STOPPED twice. By T1 from > do_signal_stop(), and by T1 or T2 from get_signal_to_deliver() which > checks SIGNAL_CLD_MASK. Yes, I considered this problem. It's just not so big a deal as to worry overmuch about this corner case in the first version. What seems to me will be the obvious and straightforward way to address it is to give utrace_report_jctl() a return value that tracehook_notify_jctl() uses. Then we can omit a notification that has been superseded. Your patch does not seem very straightforward to me. Moreover, you moved some ptrace magic out of the tracehook function back into core signals code. That is going in the wrong direction and we won't have any of that.
Thanks, Roland From roland at redhat.com Thu Mar 19 10:34:34 2009 From: roland at redhat.com (Roland McGrath) Date: Thu, 19 Mar 2009 03:34:34 -0700 (PDT) Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: Mathieu Desnoyers's message of Tuesday, 17 March 2009 12:00:29 -0400 <20090317160029.GD10092@Krystal> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090317052442.GA32674@redhat.com> <20090317160029.GD10092@Krystal> Message-ID: <20090319103434.CBE69FC3AB@magilla.sf.frob.com> The utrace API itself is not a good fit for global tracing, since its purpose is tracing and control of individual user threads.
There is no reason to allocate its per-task data structures when you are going to treat all tasks the same anyway. The points that I think are being missed are about the possibilities of overloading TIF_SYSCALL_TRACE. It's true that ptrace uses TIF_SYSCALL_TRACE as a flag for whether you are in the middle of a PTRACE_SYSCALL, so it can be confused by setting it for other purposes on a task that is also ptrace'd (but not with PTRACE_SYSCALL). Until we are able to do away with these parts of the old ptrace innards, you can't overload TIF_SYSCALL_TRACE without perturbing ptrace behavior. The utrace code does not have this problem. It keeps its own state bits, so for it, TIF_SYSCALL_TRACE means exactly "the task will call tracehook_report_syscall_*" and no more. To use TIF_SYSCALL_TRACE for another purpose, just set it on all the tasks you like (and/or set it on new tasks in fork.c) and add your code (tracepoints, whatever) to tracehook_report_syscall_* alongside the calls there into utrace. There is exactly one place in utrace code that clears TIF_SYSCALL_TRACE, and you just add "&& !global_syscall_tracing_enabled" to the condition there. You don't need to bother clearing TIF_SYSCALL_TRACE on any task when you're done. If your "global_syscall_tracing_enabled" (or whatever it is) is clear, each task will lazily fall into utrace at its next syscall entry/exit and then utrace will reset TIF_SYSCALL_TRACE when it finds no reason left to have it on. I'm not really going to delve into utrace internals in this thread. Please raise those questions in review of the utrace patches when current code is actually posted, where they belong. Here I'll just mention the relevant things that relate to the underlying issue you raised about synchronization. As thoroughly documented, utrace_set_events() is a quick, asynchronous call that itself makes no guarantees about how quickly a running task will start to report the newly-requested events. 
For purposes relevant here, it just sets TIF_SYSCALL_TRACE and nothing else. In utrace, if you want synchronous assurance that a task misses no events you ask for, then you must first use utrace_control (et al) to stop it and synchronize. That is not something that makes much sense at all for a "flip on global tracing" operation, which is not generally especially synchronous with anything else. If you want best effort that a task will pick up newly-requested events Real Soon Now, you can use utrace_control with just UTRACE_REPORT. For purposes relevant here, this just does set_notify_resume(). That will send an IPI if the task is running, and then it will start noticing before it returns to user mode. So: set_tsk_thread_flag(task, TIF_SYSCALL_TRACE); set_notify_resume(task); is what I would expect you to do on each task if you want to quickly get it to start hitting tracehook_report_syscall_*. (I'm a bit dubious that there is really any need to speed it up with set_notify_resume, but that's just me.) Finally, some broader points about TIF_SYSCALL_TRACE that I think have been overlooked. The key special feature of TIF_SYSCALL_TRACE is that it gets you to a place where full user_regset access is available. Debuggers need this to read (and write) the full user register state arbitrarily, which they also need to do user backtraces and the like. If you do not need user_regset to work, then you don't need to be on this (slowest) code path. If you are only interested in reading syscall arguments and results (or even in changing syscall results in exit tracing) then you do not need user_regset and you do not need to take the slowest syscall path. (If you are doing backtraces but already rely on full kernel stack unwinding to do it, you also do not need user_regset.) From anywhere inside the kernel, you can use the asm/syscall.h calls to read syscall args, whichever entry path the task took. The other mechanism to hook into every syscall entry/exit is TIF_SYSCALL_AUDIT. 
On some machines (like x86), this takes a third, "warm" code path that is faster than the TIF_SYSCALL_TRACE path (though obviously still off the fastest direct code path). It can be faster precisely because it doesn't need to allow for user_regset access, nor for modification of syscall arguments in entry tracing. For normal read-only tracing of just the actual syscall details, it has all you need. Unfortunately the arch code all looks like: if (unlikely(current->audit_context)) audit_syscall_{entry,exit}(...); So we need to change that to: if (unlikely(test_thread_flag(TIF_SYSCALL_AUDIT))) audit_syscall_{entry,exit}(...); But that is pretty easy to get right, even doing it blind on arch's you can't test. Far better than adding new asm hackery for each arch that's almost identical to TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT (and finding out that some are fresh out of TIF bits in the range that their asm code can handle). TIF_SYSCALL_AUDIT is only set when allocating audit_context, and its paths already have !context tests so overloading is harmless today. (Whereas with TIF_SYSCALL_TRACE, you have to wait for later ptrace cleanups or write off using ptrace simultaneously.) Then you can do the lazy disable in audit_syscall_{entry,exit} with: if (unlikely(!context)) { if (unlikely(!global_syscall_tracing_enabled)) clear_thread_flag(TIF_SYSCALL_AUDIT); return; } Plus add there your tracepoint or whatnot. Unless you really plan to use user_regset in your tracepoints, then I think this is a better plan for global syscall tracing than either fiddling with TIF_SYSCALL_TRACE or adding new arch asm requirements. (IMHO, the latter is the worst idea on the table.) Thanks, Roland
From roland at redhat.com Sat Mar 21 01:39:46 2009 From: roland at redhat.com (Roland McGrath) Date: Fri, 20 Mar 2009 18:39:46 -0700 (PDT) Subject: [PATCH 0/3] utrace Message-ID: <20090321013946.890F4FC3AB@magilla.sf.frob.com> utrace is a new kernel-side API for kernel modules, intended to make it tractable to work on novel ways to trace and debug user-mode tasks. These patches apply to the current Linus tree (v2.6.29-rc8-241-g65c2449). The first two should apply fine on the -tip tree as well, and we will be glad to rebase the set to whichever tree. Frank has another version of the ftrace patch (3/3) that works for -tip. The utrace patches don't touch anything unless you set a new kconfig option (still marked EXPERIMENTAL), and so are quite safe in that regard. utrace cannot be enabled without CONFIG_HAVE_ARCH_TRACEHOOK and the arch details it indicates.
If your arch does not have it yet, its maintainers will have to work on that. The details are in the comments in arch/Kconfig. The first patch makes a small update to one of the tracehook.h interfaces that we needed for utrace. It moves code a little but does not change any of the logic in the existing code. The second patch adds the utrace kernel API (if CONFIG_UTRACE=y is set). There is no change at all without the config option, and with it there is no effect on anything at all until a kernel module using the utrace API is loaded. There is detailed documentation on the API in DocBook form. The third patch is an ftrace widget based on utrace, by Frank Eigler. Frank will follow up on any issues about that patch. Thanks, Roland From roland at redhat.com Sat Mar 21 01:41:00 2009 From: roland at redhat.com (Roland McGrath) Date: Fri, 20 Mar 2009 18:41:00 -0700 (PDT) Subject: [PATCH 1/3] signals: tracehook_notify_jctl change In-Reply-To: Roland McGrath's message of Friday, 20 March 2009 18:39:46 -0700 <20090321013946.890F4FC3AB@magilla.sf.frob.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> Message-ID: <20090321014100.5C4A9FC3AB@magilla.sf.frob.com> This changes tracehook_notify_jctl() so it's called with the siglock held, and changes its argument and return value definition. These clean-ups make it a better fit for what new tracing hooks need to check. Tracing needs the siglock here, held from the time TASK_STOPPED was set, to avoid potential SIGCONT races if it wants to allow any blocking in its tracing hooks. This also folds the finish_stop() function into its caller do_signal_stop(). The function is short, called only once and only unconditionally. It aids readability to fold it in. 
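As a rough illustration of the new return-value convention, the following userspace model tabulates what the patched tracehook_notify_jctl() hands back. It is a sketch under assumptions, not the kernel function: `model_notify_jctl`, the `ptraced` parameter (standing in for current->ptrace & PT_PTRACED) and the MODEL_CLD_* constants are hypothetical names introduced only for this example, and the kernel's `?:` shorthand is written out longhand.

```c
/* Model values only; the kernel uses CLD_STOPPED and CLD_CONTINUED. */
enum { MODEL_CLD_STOPPED = 5, MODEL_CLD_CONTINUED = 6 };

/*
 * Userspace model of the patched tracehook_notify_jctl().
 *
 * notify:  zero, or the CLD_* si_code that would ordinarily be sent
 * why:     MODEL_CLD_STOPPED or MODEL_CLD_CONTINUED
 * ptraced: nonzero when the task is ptraced (hypothetical parameter)
 *
 * Returns the si_code to use for SIGCHLD, or zero for no signal.
 */
static int model_notify_jctl(int notify, int why, int ptraced)
{
	if (notify)
		return notify;		/* an ordinary report passes through */
	return ptraced ? why : 0;	/* when ptraced, every thread reports */
}
```

The caller can then pass the result straight to do_notify_parent_cldstop() as the si_code, which is what the converted call sites in the patch do.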
Signed-off-by: Roland McGrath --- include/linux/tracehook.h | 25 ++++++++++------ kernel/signal.c | 69 +++++++++++++++++++++++---------------------- 2 files changed, 51 insertions(+), 43 deletions(-) diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 6186a78..b622498 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -1,7 +1,7 @@ /* * Tracing hooks * - * Copyright (C) 2008 Red Hat, Inc. All rights reserved. + * Copyright (C) 2008-2009 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -469,22 +469,29 @@ static inline int tracehook_get_signal(s /** * tracehook_notify_jctl - report about job control stop/continue - * @notify: nonzero if this is the last thread in the group to stop + * @notify: zero, %CLD_STOPPED or %CLD_CONTINUED * @why: %CLD_STOPPED or %CLD_CONTINUED * * This is called when we might call do_notify_parent_cldstop(). - * It's called when about to stop for job control; we are already in - * %TASK_STOPPED state, about to call schedule(). It's also called when - * a delayed %CLD_STOPPED or %CLD_CONTINUED report is ready to be made. * - * Return nonzero to generate a %SIGCHLD with @why, which is - * normal if @notify is nonzero. + * @notify is zero if we would not ordinarily send a %SIGCHLD, + * or is the %CLD_STOPPED or %CLD_CONTINUED .si_code for %SIGCHLD. * - * Called with no locks held. + * @why is %CLD_STOPPED when about to stop for job control; + * we are already in %TASK_STOPPED state, about to call schedule(). + * It might also be that we have just exited (check %PF_EXITING), + * but need to report that a group-wide stop is complete. + * + * @why is %CLD_CONTINUED when waking up after job control stop and + * ready to make a delayed @notify report. + * + * Return the %CLD_* value for %SIGCHLD, or zero to generate no signal. + * + * Called with the siglock held. 
*/ static inline int tracehook_notify_jctl(int notify, int why) { - return notify || (current->ptrace & PT_PTRACED); + return notify ?: (current->ptrace & PT_PTRACED) ? why : 0; } #define DEATH_REAP -1 diff --git a/kernel/signal.c b/kernel/signal.c index 2a74fe8..9a0d98f 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -691,7 +691,7 @@ static int prepare_signal(int sig, struc if (why) { /* - * The first thread which returns from finish_stop() + * The first thread which returns from do_signal_stop() * will take ->siglock, notice SIGNAL_CLD_MASK, and * notify its parent. See get_signal_to_deliver(). */ @@ -1629,29 +1629,6 @@ void ptrace_notify(int exit_code) spin_unlock_irq(&current->sighand->siglock); } -static void -finish_stop(int stop_count) -{ - /* - * If there are no other threads in the group, or if there is - * a group stop in progress and we are the last to stop, - * report to the parent. When ptraced, every thread reports itself. - */ - if (tracehook_notify_jctl(stop_count == 0, CLD_STOPPED)) { - read_lock(&tasklist_lock); - do_notify_parent_cldstop(current, CLD_STOPPED); - read_unlock(&tasklist_lock); - } - - do { - schedule(); - } while (try_to_freeze()); - /* - * Now we don't run again until continued. - */ - current->exit_code = 0; -} - /* * This performs the stopping for SIGSTOP and other stop signals. * We have to stop all threads in the thread group. @@ -1662,6 +1639,7 @@ static int do_signal_stop(int signr) { struct signal_struct *sig = current->signal; int stop_count; + int notify; if (sig->group_stop_count > 0) { /* @@ -1701,8 +1679,30 @@ static int do_signal_stop(int signr) current->exit_code = sig->group_exit_code; __set_current_state(TASK_STOPPED); + /* + * If there are no other threads in the group, or if there is + * a group stop in progress and we are the last to stop, + * report to the parent. When ptraced, every thread reports itself. + */ + notify = tracehook_notify_jctl(stop_count == 0 ?
CLD_STOPPED : 0, + CLD_STOPPED); + spin_unlock_irq(&current->sighand->siglock); - finish_stop(stop_count); + + if (notify) { + read_lock(&tasklist_lock); + do_notify_parent_cldstop(current, notify); + read_unlock(&tasklist_lock); + } + + do { + schedule(); + } while (try_to_freeze()); + /* + * Now we don't run again until continued. + */ + current->exit_code = 0; + return 1; } @@ -1771,14 +1771,15 @@ relock: int why = (signal->flags & SIGNAL_STOP_CONTINUED) ? CLD_CONTINUED : CLD_STOPPED; signal->flags &= ~SIGNAL_CLD_MASK; - spin_unlock_irq(&sighand->siglock); - if (unlikely(!tracehook_notify_jctl(1, why))) - goto relock; + why = tracehook_notify_jctl(why, CLD_CONTINUED); + spin_unlock_irq(&sighand->siglock); - read_lock(&tasklist_lock); - do_notify_parent_cldstop(current->group_leader, why); - read_unlock(&tasklist_lock); + if (why) { + read_lock(&tasklist_lock); + do_notify_parent_cldstop(current->group_leader, why); + read_unlock(&tasklist_lock); + } goto relock; } @@ -1936,14 +1937,14 @@ void exit_signals(struct task_struct *ts if (unlikely(tsk->signal->group_stop_count) && !--tsk->signal->group_stop_count) { tsk->signal->flags = SIGNAL_STOP_STOPPED; - group_stop = 1; + group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED); } out: spin_unlock_irq(&tsk->sighand->siglock); - if (unlikely(group_stop) && tracehook_notify_jctl(1, CLD_STOPPED)) { + if (unlikely(group_stop)) { read_lock(&tasklist_lock); - do_notify_parent_cldstop(tsk, CLD_STOPPED); + do_notify_parent_cldstop(tsk, group_stop); read_unlock(&tasklist_lock); } } From roland at redhat.com Sat Mar 21 01:41:40 2009 From: roland at redhat.com (Roland McGrath) Date: Fri, 20 Mar 2009 18:41:40 -0700 (PDT) Subject: [PATCH 2/3] utrace core References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> Message-ID: <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> This adds the utrace facility, a new modular interface in the kernel for implementing user thread tracing and debugging.
This fits on top of the tracehook_* layer, so the new code is well-isolated. The new interface is in <linux/utrace.h> and the DocBook utrace book describes it. It allows for multiple separate tracing engines to work in parallel without interfering with each other. Higher-level tracing facilities can be implemented as loadable kernel modules using this layer. The new facility is made optional under CONFIG_UTRACE. When this is not enabled, no new code is added. It can only be enabled on machines that have all the prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK. In this initial version, utrace and ptrace do not play together at all. If ptrace is attached to a thread, the attach calls in the utrace kernel API return -EBUSY. If utrace is attached to a thread, the PTRACE_ATTACH or PTRACE_TRACEME request will return EBUSY to userland. The old ptrace code is otherwise unchanged and nothing using ptrace should be affected by this patch as long as utrace is not used at the same time. In the future we can clean up the ptrace implementation and rework it to use the utrace API.
Signed-off-by: Roland McGrath --- Documentation/DocBook/Makefile | 2 +- Documentation/DocBook/utrace.tmpl | 571 +++++++++ fs/proc/array.c | 3 + include/linux/init_task.h | 1 + include/linux/sched.h | 6 + include/linux/tracehook.h | 50 +- include/linux/utrace.h | 692 +++++++++++ include/linux/utrace_struct.h | 58 + init/Kconfig | 9 + kernel/Makefile | 1 + kernel/ptrace.c | 18 +- kernel/utrace.c | 2348 +++++++++++++++++++++++++++++++++++++ 12 files changed, 3756 insertions(+), 3 deletions(-) diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 1462ed8..f5da1b4 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -9,7 +9,7 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \ kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ procfs-guide.xml writing_usb_driver.xml networking.xml \ - kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ + kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml utrace.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ mac80211.xml debugobjects.xml sh.xml regulator.xml diff --git a/Documentation/DocBook/utrace.tmpl b/Documentation/DocBook/utrace.tmpl new file mode 100644 index ...b802c55 100644 --- /dev/null +++ b/Documentation/DocBook/utrace.tmpl @@ -0,0 +1,571 @@ + + + + + + The utrace User Debugging Infrastructure + + + + + utrace concepts + + Introduction + + + utrace is infrastructure code for tracing + and controlling user threads. This is the foundation for writing + tracing engines, which can be loadable kernel modules. + + + + The basic actors in utrace are the thread + and the tracing engine. A tracing engine is some body of code that + calls into the <linux/utrace.h> + interfaces, represented by a struct + utrace_engine_ops. (Usually it's a kernel module, + though the legacy ptrace support is a tracing + engine that is not in a kernel module.) 
The interface operates on + individual threads (struct task_struct). + If an engine wants to treat several threads as a group, that is up + to its higher-level code. + + + + Tracing begins by attaching an engine to a thread, using + utrace_attach_task or + utrace_attach_pid. If successful, it returns a + pointer that is the handle used in all other calls. + + + + + Events and Callbacks + + + An attached engine does nothing by default. An engine makes something + happen by requesting callbacks via utrace_set_events + and poking the thread with utrace_control. + The synchronization issues related to these two calls + are discussed further below in . + + + + Events are specified using the macro + UTRACE_EVENT(type). + Each event type is associated with a callback in struct + utrace_engine_ops. A tracing engine can leave unused + callbacks NULL. The only callbacks required + are those used by the event flags it sets. + + + + Many engines can be attached to each thread. When a thread has an + event, each engine gets a callback if it has set the event flag for + that event type. Engines are called in the order they attached. + Engines that attach after the event has occurred do not get callbacks + for that event. This includes any new engines just attached by an + existing engine's callback function. Once the sequence of callbacks + for that one event has completed, such new engines are then eligible in + the next sequence that starts when there is another event. + + + + Event reporting callbacks have details particular to the event type, + but are all called in similar environments and have the same + constraints. Callbacks are made from safe points, where no locks + are held, no special resources are pinned (usually), and the + user-mode state of the thread is accessible. So, callback code has + a pretty free hand. But to be a good citizen, callback code should + never block for long periods. 
It is fine to block in + kmalloc and the like, but never wait for i/o or + for user mode to do something. If you need the thread to wait, use + UTRACE_STOP and return from the callback + quickly. When your i/o finishes or whatever, you can use + utrace_control to resume the thread. + + + + + Stopping Safely + + Writing well-behaved callbacks + + + Well-behaved callbacks are important to maintain two essential + properties of the interface. The first of these is that unrelated + tracing engines should not interfere with each other. If your engine's + event callback does not return quickly, then another engine won't get + the event notification in a timely manner. The second important + property is that tracing should be as noninvasive as possible to the + normal operation of the system overall and of the traced thread in + particular. That is, attached tracing engines should not perturb a + thread's behavior, except to the extent that changing its user-visible + state is explicitly what you want to do. (Obviously some perturbation + is unavoidable, primarily timing changes, ranging from small delays due + to the overhead of tracing, to arbitrary pauses in user code execution + when a user stops a thread with a debugger for examination.) Even when + you explicitly want the perturbation of making the traced thread block, + just blocking directly in your callback has more unwanted effects. For + example, the CLONE event callbacks are called when + the new child thread has been created but not yet started running; the + child can never be scheduled until the CLONE + tracing callbacks return. (This allows engines tracing the parent to + attach to the child.) If a CLONE event callback + blocks the parent thread, it also prevents the child thread from + running (even to process a SIGKILL). If what you + want is to make both the parent and child block, then use + utrace_attach_task on the child and then use + UTRACE_STOP on both threads. 
A more crucial + problem with blocking in callbacks is that it can prevent + SIGKILL from working. A thread that is blocking + due to UTRACE_STOP will still wake up and die + immediately when sent a SIGKILL, as all threads + should. Relying on the utrace + infrastructure rather than on private synchronization calls in event + callbacks is an important way to help keep tracing robustly + noninvasive. + + + + + Using <constant>UTRACE_STOP</constant> + + + To control another thread and access its state, it must be stopped + with UTRACE_STOP. This means that it is + stopped and won't start running again while we access it. When a + thread is not already stopped, utrace_control + returns -EINPROGRESS and an engine must wait + for an event callback when the thread is ready to stop. The thread + may be running on another CPU or may be blocked. When it is ready + to be examined, it will make callbacks to engines that set the + UTRACE_EVENT(QUIESCE) event bit. To wake up an + interruptible wait, use UTRACE_INTERRUPT. + + + + As long as some engine has used UTRACE_STOP and + not called utrace_control to resume the thread, + then the thread will remain stopped. SIGKILL + will wake it up, but it will not run user code. When the stop is + cleared with utrace_control or a callback + return value, the thread starts running again. + + + + + + + Tear-down Races + + Primacy of <constant>SIGKILL</constant> + + Ordinarily synchronization issues for tracing engines are kept fairly + straightforward by using UTRACE_STOP. You ask a + thread to stop, and then once it makes the + report_quiesce callback it cannot do anything else + that would result in another callback, until you let it with a + utrace_control call. This simple arrangement + avoids complex and error-prone code in each one of a tracing engine's + event callbacks to keep them serialized with the engine's other + operations done on that thread from another thread of control.
+ However, giving tracing engines complete power to keep a traced thread + stuck in place runs afoul of a more important kind of simplicity that + the kernel overall guarantees: nothing can prevent or delay + SIGKILL from making a thread die and release its + resources. To preserve this important property of + SIGKILL, it as a special case can break + UTRACE_STOP like nothing else normally can. This + includes both explicit SIGKILL signals and the + implicit SIGKILL sent to each other thread in the + same thread group by a thread doing an exec, or processing a fatal + signal, or making an exit_group system call. A + tracing engine can prevent a thread from beginning the exit or exec or + dying by signal (other than SIGKILL) if it is + attached to that thread, but once the operation begins, no tracing + engine can prevent or delay all other threads in the same thread group + dying. + + + + Final callbacks + + The report_reap callback is always the final event + in the life cycle of a traced thread. Tracing engines can use this as + the trigger to clean up their own data structures. The + report_death callback is always the penultimate + event a tracing engine might see; it's seen unless the thread was + already in the midst of dying when the engine attached. Many tracing + engines will have no interest in when a parent reaps a dead process, + and nothing they want to do with a zombie thread once it dies; for + them, the report_death callback is the natural + place to clean up data structures and detach. To facilitate writing + such engines robustly, given the asynchrony of + SIGKILL, and without error-prone manual + implementation of synchronization schemes, the + utrace infrastructure provides some special + guarantees about the report_death and + report_reap callbacks. It still takes some care + to be sure your tracing engine is robust to tear-down races, but these + rules make it reasonably straightforward and concise to handle a lot of + corner cases correctly. 
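The clean-up-and-detach pattern described for report_death can be modeled in user space. The following is only a sketch under stated assumptions: the struct definitions are stand-ins so that it builds outside the kernel, struct my_state and my_report_death are hypothetical names, and the callback's argument list is assumed rather than taken from the text above. Only the enum values mirror the header in this patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins so the sketch builds outside the kernel (assumptions). */
typedef unsigned int u32;
struct task_struct { int pid; };
struct utrace_engine { void *data; };

/* Same order as enum utrace_resume_action in <linux/utrace.h>. */
enum utrace_resume_action {
	UTRACE_STOP,
	UTRACE_REPORT,
	UTRACE_INTERRUPT,
	UTRACE_SINGLESTEP,
	UTRACE_BLOCKSTEP,
	UTRACE_RESUME,
	UTRACE_DETACH
};

/* Hypothetical per-engine private state hung off engine->data. */
struct my_state { int events_seen; };

/*
 * Hypothetical report_death callback (argument list assumed).
 * It releases the engine's private data and asks to be detached
 * by returning UTRACE_DETACH.
 */
static u32 my_report_death(struct utrace_engine *engine,
			   struct task_struct *task,
			   bool group_dead, int signal)
{
	(void)task;
	(void)group_dead;
	(void)signal;

	free(engine->data);
	engine->data = NULL;
	return UTRACE_DETACH;
}
```

An engine written this way does all of its tear-down in one place and has no need to request REAP events.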
+ + + + Engine and task pointers + + The first sort of guarantee concerns the core data structures + themselves. struct utrace_engine is + a reference-counted data structure. While you hold a reference, an + engine pointer will always stay valid so that you can safely pass it to + any utrace call. Each call to + utrace_attach_task or + utrace_attach_pid returns an engine pointer with a + reference belonging to the caller. You own that reference until you + drop it using utrace_engine_put. There is an + implicit reference on the engine while it is attached. So if you drop + your only reference, and then use + utrace_attach_task without + UTRACE_ATTACH_CREATE to look up that same engine, + you will get the same pointer with a new reference to replace the one + you dropped, just like calling utrace_engine_get. + When an engine has been detached, either explicitly with + UTRACE_DETACH or implicitly after + report_reap, then any references you hold are all + that keep the old engine pointer alive. + + + + There is nothing a kernel module can do to keep a struct + task_struct alive outside of + rcu_read_lock. When the task dies and is reaped + by its parent (or itself), that structure can be freed so that any + dangling pointers you have stored become invalid. + utrace will not prevent this, but it can + help you detect it safely. By definition, a task that has been reaped + has had all its engines detached. All + utrace calls can be safely called on a + detached engine if the caller holds a reference on that engine pointer, + even if the task pointer passed in the call is invalid. All calls + return -ESRCH for a detached engine, which tells + you that the task pointer you passed could be invalid now. Since + utrace_control and + utrace_set_events do not block, you can call those + inside a rcu_read_lock section and be sure after + they don't return -ESRCH that the task pointer is + still valid until rcu_read_unlock. 
The + infrastructure never holds task references of its own. Though neither + rcu_read_lock nor any other lock is held while + making a callback, it's always guaranteed that the struct + task_struct and the struct + utrace_engine passed as arguments remain valid + until the callback function returns. + + + + The common means for safely holding task pointers that is available to + kernel modules is to use struct pid, which + permits put_pid from kernel modules. When using + that, the calls utrace_attach_pid, + utrace_control_pid, + utrace_set_events_pid, and + utrace_barrier_pid are available. + + + + + + Serialization of <constant>DEATH</constant> and <constant>REAP</constant> + + + The second guarantee is the serialization of + DEATH and REAP event + callbacks for a given thread. The actual reaping by the parent + (release_task call) can occur simultaneously + while the thread is still doing the final steps of dying, including + the report_death callback. If a tracing engine + has requested both DEATH and + REAP event reports, it's guaranteed that the + report_reap callback will not be made until + after the report_death callback has returned. + If the report_death callback itself detaches + from the thread, then the report_reap callback + will never be made. Thus it is safe for a + report_death callback to clean up data + structures and detach. + + + + Interlock with final callbacks + + The final sort of guarantee is that a tracing engine will know for sure + whether or not the report_death and/or + report_reap callbacks will be made for a certain + thread. These tear-down races are disambiguated by the error return + values of utrace_set_events and + utrace_control. Normally + utrace_control called with + UTRACE_DETACH returns zero, and this means that no + more callbacks will be made. 
If the thread is in the midst of dying, + it returns -EALREADY to indicate that the + report_death callback may already be in progress; + when you get this error, you know that any cleanup your + report_death callback does is about to happen or + has just happened--note that if the report_death + callback does not detach, the engine remains attached until the thread + gets reaped. If the thread is in the midst of being reaped, + utrace_control returns -ESRCH + to indicate that the report_reap callback may + already be in progress; this means the engine is implicitly detached + when the callback completes. This makes it possible for a tracing + engine that has decided asynchronously to detach from a thread to + safely clean up its data structures, knowing that no + report_death or report_reap + callback will try to do the same. utrace_control with UTRACE_DETACH + returns -ESRCH when the struct + utrace_engine has already been detached, but is + still a valid pointer because of its reference count. A tracing engine + can use this to safely synchronize its own independent multiple threads + of control with each other and with its event callbacks that detach. + + + + In the same vein, utrace_set_events normally + returns zero; if the target thread was stopped before the call, then + after a successful call, only event callbacks requested in the new + flags will be made. It fails with -EALREADY if + you try to clear UTRACE_EVENT(DEATH) when the + report_death callback may already have begun, if + you try to clear UTRACE_EVENT(REAP) when the + report_reap callback may already have begun, or if + you try to newly set UTRACE_EVENT(DEATH) or + UTRACE_EVENT(QUIESCE) when the target is already + dead or dying. Like utrace_control, it returns + -ESRCH when the engine has already been detached + (including forcible detach on reaping). This lets the tracing engine + know for sure which event callbacks it will or won't see after + utrace_set_events has returned.
By checking for + errors, it can know whether to clean up its data structures immediately + or to let its callbacks do the work. + + + + Using <function>utrace_barrier</function> + + When a thread is safely stopped, calling + utrace_control with UTRACE_DETACH + or calling utrace_set_events to disable some events + ensures synchronously that your engine won't get any more of the callbacks + that have been disabled (none at all when detaching). But these can also + be used while the thread is not stopped, when it might be simultaneously + making a callback to your engine. For this situation, these calls return + -EINPROGRESS when it's possible a callback is in + progress. If you are not prepared to have your old callbacks still run, + then you can synchronize to be sure all the old callbacks are finished, + using utrace_barrier. This is necessary if the + kernel module containing your callback code is going to be unloaded. + + + After using UTRACE_DETACH once, further calls to + utrace_control with the same engine pointer will + return -ESRCH. In contrast, after getting + -EINPROGRESS from + utrace_set_events, you can call + utrace_set_events again later and if it returns zero + then know the old callbacks have finished. + + + Unlike all other calls, utrace_barrier (and + utrace_barrier_pid) will accept any engine pointer you + hold a reference on, even if UTRACE_DETACH has already + been used. After any utrace_control or + utrace_set_events call (these do not block), you can + call utrace_barrier to block until callbacks have + finished. This returns -ESRCH only if the engine is + completely detached (finished all callbacks). Otherwise it waits + until the thread is definitely not in the midst of a callback to this + engine and then returns zero, but can return + -ERESTARTSYS if its wait is interrupted. + + + + + + + +utrace core API + + + The utrace API is declared in <linux/utrace.h>. 
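As a rough illustration of how event masks are built with UTRACE_EVENT(type), here is a user-space sketch. The enum and the two macros are copied from the header added by this patch; engine_wants and my_event_mask are hypothetical helpers introduced only for this example and are not part of the API.

```c
#include <stdbool.h>

/* Event numbers, in the order <linux/utrace.h> lists them. */
enum utrace_events {
	_UTRACE_EVENT_QUIESCE,
	_UTRACE_EVENT_REAP,
	_UTRACE_EVENT_CLONE,
	_UTRACE_EVENT_EXEC,
	_UTRACE_EVENT_EXIT,
	_UTRACE_EVENT_DEATH,
	_UTRACE_EVENT_SYSCALL_ENTRY,
	_UTRACE_EVENT_SYSCALL_EXIT,
	_UTRACE_EVENT_SIGNAL,
	_UTRACE_EVENT_SIGNAL_IGN,
	_UTRACE_EVENT_SIGNAL_STOP,
	_UTRACE_EVENT_SIGNAL_TERM,
	_UTRACE_EVENT_SIGNAL_CORE,
	_UTRACE_EVENT_JCTL,
	_UTRACE_NEVENTS
};
#define UTRACE_EVENT(type)	(1UL << _UTRACE_EVENT_##type)
#define UTRACE_EVENT_SYSCALL	\
	(UTRACE_EVENT(SYSCALL_ENTRY) | UTRACE_EVENT(SYSCALL_EXIT))

/* Hypothetical helper: does a mask request callbacks for this event? */
static bool engine_wants(unsigned long flags, unsigned long event)
{
	return (flags & event) != 0;
}

/* Hypothetical mask for an engine tracing system calls and death. */
static unsigned long my_event_mask(void)
{
	return UTRACE_EVENT_SYSCALL | UTRACE_EVENT(DEATH);
}
```

A mask like this is what an engine would pass to utrace_set_events; the engine then only needs to supply the callbacks that match the bits it sets.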
+ + +!Iinclude/linux/utrace.h +!Ekernel/utrace.c + + + +Machine State + + + The task_current_syscall function can be used on any + valid struct task_struct at any time, and does + not even require that utrace_attach_task was used at all. + + + + The other ways to access the registers and other machine-dependent state of + a task can only be used on a task that is at a known safe point. The safe + points are all the places where utrace_set_events can + request callbacks (except for the DEATH and + REAP events). So at any event callback, it is safe to + examine current. + + + + One task can examine another only after a callback in the target task that + returns UTRACE_STOP so that task will not return to user + mode after the safe point. This guarantees that the task will not resume + until the same engine uses utrace_control, unless the + task dies suddenly. To examine safely, one must use a pair of calls to + utrace_prepare_examine and + utrace_finish_examine surrounding the calls to + struct user_regset functions or direct examination + of task data structures. utrace_prepare_examine returns + an error if the task is not properly stopped and not dead. After a + successful examination, the paired utrace_finish_examine + call returns an error if the task ever woke up during the examination. If + so, any data gathered may be scrambled and should be discarded. This means + there was a spurious wake-up (which should not happen), or a sudden death. + + +<structname>struct user_regset</structname> + + + The struct user_regset API + is declared in <linux/regset.h>. + + +!Finclude/linux/regset.h + + + + + <filename>System Call Information</filename> + + + This function is declared in <linux/ptrace.h>. + + +!Elib/syscall.c + + + +<filename>System Call Tracing</filename> + + + The arch API for system call information is declared in + <asm/syscall.h>. 
+ Each of these calls can be used only at system call entry tracing, + or can be used only at system call exit and the subsequent safe points + before returning to user mode. + At system call entry tracing means either during a + report_syscall_entry callback, + or any time after that callback has returned UTRACE_STOP. + + +!Finclude/asm-generic/syscall.h + + + + + +Kernel Internals + + + This chapter covers the interface to the tracing infrastructure + from the core of the kernel and the architecture-specific code. + This is for maintainers of the kernel and arch code, and not relevant + to using the tracing facilities described in preceding chapters. + + +Core Calls In + + + These calls are declared in <linux/tracehook.h>. + The core kernel calls these functions at various important places. + + +!Finclude/linux/tracehook.h + + + +Architecture Calls Out + + + An arch that has done all these things sets + CONFIG_HAVE_ARCH_TRACEHOOK. + This is required to enable the utrace code. + + +<filename><asm/ptrace.h></filename> + + + An arch defines these in <asm/ptrace.h> + if it supports hardware single-step or block-step features. + + +!Finclude/linux/ptrace.h arch_has_single_step arch_has_block_step +!Finclude/linux/ptrace.h user_enable_single_step user_enable_block_step +!Finclude/linux/ptrace.h user_disable_single_step + + + + + <filename><asm/syscall.h></filename> + + + An arch provides <asm/syscall.h> that + defines these as inlines, or declares them as exported functions. + These interfaces are described in . + + + + + + <filename><linux/tracehook.h></filename> + + + An arch must define TIF_NOTIFY_RESUME + and TIF_SYSCALL_TRACE + in its <asm/thread_info.h>. 
+ The arch code must call the following functions, all declared + in <linux/tracehook.h> and + described in : + + + + tracehook_notify_resume + + + tracehook_report_syscall_entry + + + tracehook_report_syscall_exit + + + tracehook_signal_handler + + + + + + + + + + + + diff --git a/fs/proc/array.c b/fs/proc/array.c index 7e4877d..0c683ed 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -81,6 +81,7 @@ #include #include #include +#include #include #include @@ -187,6 +188,8 @@ static inline void task_state(struct seq cred->uid, cred->euid, cred->suid, cred->fsuid, cred->gid, cred->egid, cred->sgid, cred->fsgid); + task_utrace_proc_status(m, p); + task_lock(p); if (p->files) fdt = files_fdtable(p->files); diff --git a/include/linux/init_task.h b/include/linux/init_task.h index e752d97..39eebc8 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -181,6 +181,7 @@ extern struct cred init_cred; [PIDTYPE_SID] = INIT_PID_LINK(PIDTYPE_SID), \ }, \ .dirties = INIT_PROP_LOCAL_SINGLE(dirties), \ + INIT_UTRACE(tsk) \ INIT_IDS \ INIT_TRACE_IRQFLAGS \ INIT_LOCKDEP \ diff --git a/include/linux/sched.h b/include/linux/sched.h index 011db2f..786ef2d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -59,6 +59,7 @@ struct sched_param { #include #include #include +#include #include #include @@ -1287,6 +1288,11 @@ struct task_struct { #endif seccomp_t seccomp; +#ifdef CONFIG_UTRACE + struct utrace utrace; + unsigned long utrace_flags; +#endif + /* Thread group tracking */ u32 parent_exec_id; u32 self_exec_id; diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index b622498..6ff7277 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -49,6 +49,7 @@ #include #include #include +#include struct linux_binprm; /** @@ -63,6 +64,8 @@ struct linux_binprm; */ static inline int tracehook_expect_breakpoints(struct task_struct *task) { + if (unlikely(task_utrace_flags(task) & UTRACE_EVENT(SIGNAL_CORE))) + return 1; return 
(task_ptrace(task) & PT_PTRACED) != 0; } @@ -111,6 +114,9 @@ static inline void ptrace_report_syscall static inline __must_check int tracehook_report_syscall_entry( struct pt_regs *regs) { + if ((task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_ENTRY)) && + utrace_report_syscall_entry(regs)) + return 1; ptrace_report_syscall(regs); return 0; } @@ -134,6 +140,8 @@ static inline __must_check int tracehook */ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) { + if (task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_EXIT)) + utrace_report_syscall_exit(regs); ptrace_report_syscall(regs); } @@ -194,6 +202,8 @@ static inline void tracehook_report_exec struct linux_binprm *bprm, struct pt_regs *regs) { + if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(EXEC))) + utrace_report_exec(fmt, bprm, regs); if (!ptrace_event(PT_TRACE_EXEC, PTRACE_EVENT_EXEC, 0) && unlikely(task_ptrace(current) & PT_PTRACED)) send_sig(SIGTRAP, current, 0); @@ -211,6 +221,8 @@ static inline void tracehook_report_exec */ static inline void tracehook_report_exit(long *exit_code) { + if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(EXIT))) + utrace_report_exit(exit_code); ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code); } @@ -254,6 +266,7 @@ static inline int tracehook_prepare_clon static inline void tracehook_finish_clone(struct task_struct *child, unsigned long clone_flags, int trace) { + utrace_init_task(child); ptrace_init_task(child, (clone_flags & CLONE_PTRACE) || trace); } @@ -280,6 +293,8 @@ static inline void tracehook_report_clon unsigned long clone_flags, pid_t pid, struct task_struct *child) { + if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(CLONE))) + utrace_report_clone(clone_flags, child); if (unlikely(trace) || unlikely(clone_flags & CLONE_PTRACE)) { /* * The child starts up with an immediate SIGSTOP. 
@@ -311,6 +326,9 @@ static inline void tracehook_report_clon pid_t pid, struct task_struct *child) { + if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(CLONE)) && + (clone_flags & CLONE_VFORK)) + utrace_finish_vfork(current); if (unlikely(trace)) ptrace_event(0, trace, pid); } @@ -345,6 +363,7 @@ static inline void tracehook_report_vfor */ static inline void tracehook_prepare_release_task(struct task_struct *task) { + utrace_release_task(task); } /** @@ -359,6 +378,7 @@ static inline void tracehook_prepare_rel static inline void tracehook_finish_release_task(struct task_struct *task) { ptrace_release_task(task); + BUG_ON(task->exit_state != EXIT_DEAD); } /** @@ -380,6 +400,8 @@ static inline void tracehook_signal_hand const struct k_sigaction *ka, struct pt_regs *regs, int stepping) { + if (task_utrace_flags(current)) + utrace_signal_handler(current, stepping); if (stepping) ptrace_notify(SIGTRAP); } @@ -400,6 +422,8 @@ static inline int tracehook_consider_ign int sig, void __user *handler) { + if (unlikely(task_utrace_flags(task) & UTRACE_EVENT(SIGNAL_IGN))) + return 1; return (task_ptrace(task) & PT_PTRACED) != 0; } @@ -421,6 +445,9 @@ static inline int tracehook_consider_fat int sig, void __user *handler) { + if (unlikely(task_utrace_flags(task) & (UTRACE_EVENT(SIGNAL_TERM) | + UTRACE_EVENT(SIGNAL_CORE)))) + return 1; return (task_ptrace(task) & PT_PTRACED) != 0; } @@ -435,6 +462,8 @@ static inline int tracehook_consider_fat */ static inline int tracehook_force_sigpending(void) { + if (unlikely(task_utrace_flags(current))) + return utrace_interrupt_pending(); return 0; } @@ -464,6 +493,8 @@ static inline int tracehook_get_signal(s siginfo_t *info, struct k_sigaction *return_ka) { + if (unlikely(task_utrace_flags(task))) + return utrace_get_signal(task, regs, info, return_ka); return 0; } @@ -491,6 +522,8 @@ static inline int tracehook_get_signal(s */ static inline int tracehook_notify_jctl(int notify, int why) { + if (task_utrace_flags(current) & 
UTRACE_EVENT(JCTL)) + utrace_report_jctl(notify, why); return notify ?: (current->ptrace & PT_PTRACED) ? why : 0; } @@ -514,6 +547,8 @@ static inline int tracehook_notify_jctl( static inline int tracehook_notify_death(struct task_struct *task, void **death_cookie, int group_dead) { + *death_cookie = task_utrace_struct(task); + if (task->exit_signal == -1) return task->ptrace ? SIGCHLD : DEATH_REAP; @@ -550,6 +585,9 @@ static inline void tracehook_report_deat int signal, void *death_cookie, int group_dead) { + smp_mb(); + if (task_utrace_flags(task) & _UTRACE_DEATH_EVENTS) + utrace_report_death(task, death_cookie, group_dead, signal); } #ifdef TIF_NOTIFY_RESUME @@ -579,10 +617,20 @@ static inline void set_notify_resume(str * asynchronously, this will be called again before we return to * user mode. * - * Called without locks. + * Called without locks. However, on some machines this may be + * called with interrupts disabled. */ static inline void tracehook_notify_resume(struct pt_regs *regs) { + struct task_struct *task = current; + /* + * This pairs with the barrier implicit in set_notify_resume(). + * It ensures that we read the nonzero utrace_flags set before + * set_notify_resume() was called by utrace setup. + */ + smp_rmb(); + if (task_utrace_flags(task)) + utrace_resume(task, regs); } #endif /* TIF_NOTIFY_RESUME */ diff --git a/include/linux/utrace.h b/include/linux/utrace.h new file mode 100644 index ...f46cc0f 100644 --- /dev/null +++ b/include/linux/utrace.h @@ -0,0 +1,692 @@ +/* + * utrace infrastructure interface for debugging user processes + * + * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved. + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + * + * Red Hat Author: Roland McGrath. + * + * This interface allows for notification of interesting events in a + * thread. 
It also mediates access to thread state such as registers. + * Multiple unrelated users can be associated with a single thread. + * We call each of these a tracing engine. + * + * A tracing engine starts by calling utrace_attach_task() or + * utrace_attach_pid() on the chosen thread, passing in a set of hooks + * (&struct utrace_engine_ops), and some associated data. This produces a + * &struct utrace_engine, which is the handle used for all other + * operations. An attached engine has its ops vector, its data, and an + * event mask controlled by utrace_set_events(). + * + * For each event bit that is set, that engine will get the + * appropriate ops->report_*() callback when the event occurs. The + * &struct utrace_engine_ops need not provide callbacks for an event + * unless the engine sets one of the associated event bits. + */ + +#ifndef _LINUX_UTRACE_H +#define _LINUX_UTRACE_H 1 + +#include +#include +#include +#include + +struct linux_binprm; +struct pt_regs; +struct utrace; +struct user_regset; +struct user_regset_view; + +/* + * Event bits passed to utrace_set_events(). + * These appear in &struct task_struct.@utrace_flags + * and &struct utrace_engine.@flags. + */ +enum utrace_events { + _UTRACE_EVENT_QUIESCE, /* Thread is available for examination. */ + _UTRACE_EVENT_REAP, /* Zombie reaped, no more tracing possible. */ + _UTRACE_EVENT_CLONE, /* Successful clone/fork/vfork just done. */ + _UTRACE_EVENT_EXEC, /* Successful execve just completed. */ + _UTRACE_EVENT_EXIT, /* Thread exit in progress. */ + _UTRACE_EVENT_DEATH, /* Thread has died. */ + _UTRACE_EVENT_SYSCALL_ENTRY, /* User entered kernel for system call. */ + _UTRACE_EVENT_SYSCALL_EXIT, /* Returning to user after system call. */ + _UTRACE_EVENT_SIGNAL, /* Signal delivery will run a user handler. */ + _UTRACE_EVENT_SIGNAL_IGN, /* No-op signal to be delivered. */ + _UTRACE_EVENT_SIGNAL_STOP, /* Signal delivery will suspend. */ + _UTRACE_EVENT_SIGNAL_TERM, /* Signal delivery will terminate.
*/ + _UTRACE_EVENT_SIGNAL_CORE, /* Signal delivery will dump core. */ + _UTRACE_EVENT_JCTL, /* Job control stop or continue completed. */ + _UTRACE_NEVENTS +}; +#define UTRACE_EVENT(type) (1UL << _UTRACE_EVENT_##type) + +/* + * All the kinds of signal events. + * These all use the @report_signal() callback. + */ +#define UTRACE_EVENT_SIGNAL_ALL (UTRACE_EVENT(SIGNAL) \ + | UTRACE_EVENT(SIGNAL_IGN) \ + | UTRACE_EVENT(SIGNAL_STOP) \ + | UTRACE_EVENT(SIGNAL_TERM) \ + | UTRACE_EVENT(SIGNAL_CORE)) +/* + * Both kinds of syscall events; these call the @report_syscall_entry() + * and @report_syscall_exit() callbacks, respectively. + */ +#define UTRACE_EVENT_SYSCALL \ + (UTRACE_EVENT(SYSCALL_ENTRY) | UTRACE_EVENT(SYSCALL_EXIT)) + +/* + * The event reports triggered synchronously by task death. + */ +#define _UTRACE_DEATH_EVENTS (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE)) + +/* + * Hooks in <linux/tracehook.h> call these entry points to the + * utrace dispatch. They are weak references here only so + * tracehook.h doesn't need to #ifndef CONFIG_UTRACE them to + * avoid external references in case of unoptimized compilation.
+ */ +bool utrace_interrupt_pending(void) + __attribute__((weak)); +void utrace_resume(struct task_struct *, struct pt_regs *) + __attribute__((weak)); +int utrace_get_signal(struct task_struct *, struct pt_regs *, + siginfo_t *, struct k_sigaction *) + __attribute__((weak)); +void utrace_report_clone(unsigned long, struct task_struct *) + __attribute__((weak)); +void utrace_finish_vfork(struct task_struct *) + __attribute__((weak)); +void utrace_report_exit(long *exit_code) + __attribute__((weak)); +void utrace_report_death(struct task_struct *, struct utrace *, bool, int) + __attribute__((weak)); +void utrace_report_jctl(int notify, int type) + __attribute__((weak)); +void utrace_report_exec(struct linux_binfmt *, struct linux_binprm *, + struct pt_regs *regs) + __attribute__((weak)); +bool utrace_report_syscall_entry(struct pt_regs *) + __attribute__((weak)); +void utrace_report_syscall_exit(struct pt_regs *) + __attribute__((weak)); +void utrace_signal_handler(struct task_struct *, int) + __attribute__((weak)); + +#ifndef CONFIG_UTRACE + +/* + * <linux/tracehook.h> uses these accessors to avoid #ifdef CONFIG_UTRACE.
+ */ +static inline unsigned long task_utrace_flags(struct task_struct *task) +{ + return 0; +} +static inline struct utrace *task_utrace_struct(struct task_struct *task) +{ + return NULL; +} +static inline void utrace_init_task(struct task_struct *child) +{ +} +static inline void utrace_release_task(struct task_struct *task) +{ +} + +static inline void task_utrace_proc_status(struct seq_file *m, + struct task_struct *p) +{ +} + +#else /* CONFIG_UTRACE */ + +static inline unsigned long task_utrace_flags(struct task_struct *task) +{ + return task->utrace_flags; +} + +static inline struct utrace *task_utrace_struct(struct task_struct *task) +{ + return &task->utrace; +} + +static inline void utrace_init_task(struct task_struct *task) +{ + task->utrace_flags = 0; + memset(&task->utrace, 0, sizeof(task->utrace)); + INIT_LIST_HEAD(&task->utrace.attached); + INIT_LIST_HEAD(&task->utrace.attaching); + spin_lock_init(&task->utrace.lock); +} + +void utrace_release_task(struct task_struct *); +void task_utrace_proc_status(struct seq_file *m, struct task_struct *p); + + +/* + * Version number of the API defined in this file. This will change + * whenever a tracing engine's code would need some updates to keep + * working. We maintain this here for the benefit of tracing engine code + * that is developed concurrently with utrace API improvements before they + * are merged into the kernel, making LINUX_VERSION_CODE checks unwieldy. + */ +#define UTRACE_API_VERSION 20090302 + +/** + * enum utrace_resume_action - engine's choice of action for a traced task + * @UTRACE_STOP: Stay quiescent after callbacks. + * @UTRACE_REPORT: Make some callback soon. + * @UTRACE_INTERRUPT: Make @report_signal() callback soon. + * @UTRACE_SINGLESTEP: Resume in user mode for one instruction. + * @UTRACE_BLOCKSTEP: Resume in user mode until next branch. + * @UTRACE_RESUME: Resume normally in user mode. + * @UTRACE_DETACH: Detach my engine (implies %UTRACE_RESUME). 
+ * + * See utrace_control() for detailed descriptions of each action. This is + * encoded in the @action argument and the return value for every callback + * with a &u32 return value. + * + * The order of these is important. When there is more than one engine, + * each supplies its choice and the smallest value prevails. + */ +enum utrace_resume_action { + UTRACE_STOP, + UTRACE_REPORT, + UTRACE_INTERRUPT, + UTRACE_SINGLESTEP, + UTRACE_BLOCKSTEP, + UTRACE_RESUME, + UTRACE_DETACH +}; +#define UTRACE_RESUME_MASK 0x0f + +/** + * utrace_resume_action - &enum utrace_resume_action from callback action + * @action: &u32 callback @action argument or return value + * + * This extracts the &enum utrace_resume_action from @action, + * which is the @action argument to a &struct utrace_engine_ops + * callback or the return value from one. + */ +static inline enum utrace_resume_action utrace_resume_action(u32 action) +{ + return action & UTRACE_RESUME_MASK; +} + +/** + * enum utrace_signal_action - disposition of signal + * @UTRACE_SIGNAL_DELIVER: Deliver according to sigaction. + * @UTRACE_SIGNAL_IGN: Ignore the signal. + * @UTRACE_SIGNAL_TERM: Terminate the process. + * @UTRACE_SIGNAL_CORE: Terminate with core dump. + * @UTRACE_SIGNAL_STOP: Deliver as absolute stop. + * @UTRACE_SIGNAL_TSTP: Deliver as job control stop. + * @UTRACE_SIGNAL_REPORT: Reporting before pending signals. + * @UTRACE_SIGNAL_HANDLER: Reporting after signal handler setup. + * + * This is encoded in the @action argument and the return value for + * a @report_signal() callback. It says what will happen to the + * signal described by the &siginfo_t parameter to the callback. + * + * The %UTRACE_SIGNAL_REPORT value is used in an @action argument when + * a tracing report is being made before dequeuing any pending signal. + * If this is immediately after a signal handler has been set up, then + * %UTRACE_SIGNAL_HANDLER is used instead. 
A @report_signal callback + * that uses %UTRACE_SIGNAL_DELIVER|%UTRACE_SINGLESTEP will ensure + * it sees a %UTRACE_SIGNAL_HANDLER report. + */ +enum utrace_signal_action { + UTRACE_SIGNAL_DELIVER = 0x00, + UTRACE_SIGNAL_IGN = 0x10, + UTRACE_SIGNAL_TERM = 0x20, + UTRACE_SIGNAL_CORE = 0x30, + UTRACE_SIGNAL_STOP = 0x40, + UTRACE_SIGNAL_TSTP = 0x50, + UTRACE_SIGNAL_REPORT = 0x60, + UTRACE_SIGNAL_HANDLER = 0x70 +}; +#define UTRACE_SIGNAL_MASK 0xf0 +#define UTRACE_SIGNAL_HOLD 0x100 /* Flag, push signal back on queue. */ + +/** + * utrace_signal_action - &enum utrace_signal_action from callback action + * @action: @report_signal callback @action argument or return value + * + * This extracts the &enum utrace_signal_action from @action, which + * is the @action argument to a @report_signal callback or the + * return value from one. + */ +static inline enum utrace_signal_action utrace_signal_action(u32 action) +{ + return action & UTRACE_SIGNAL_MASK; +} + +/** + * enum utrace_syscall_action - disposition of system call attempt + * @UTRACE_SYSCALL_RUN: Run the system call. + * @UTRACE_SYSCALL_ABORT: Don't run the system call. + * + * This is encoded in the @action argument and the return value for + * a @report_syscall_entry callback. + */ +enum utrace_syscall_action { + UTRACE_SYSCALL_RUN = 0x00, + UTRACE_SYSCALL_ABORT = 0x10 +}; +#define UTRACE_SYSCALL_MASK 0xf0 + +/** + * utrace_syscall_action - &enum utrace_syscall_action from callback action + * @action: @report_syscall_entry callback @action or return value + * + * This extracts the &enum utrace_syscall_action from @action, which + * is the @action argument to a @report_syscall_entry callback or the + * return value from one. + */ +static inline enum utrace_syscall_action utrace_syscall_action(u32 action) +{ + return action & UTRACE_SYSCALL_MASK; +} + +/* + * Flags for utrace_attach_task() and utrace_attach_pid(). + */ +#define UTRACE_ATTACH_CREATE 0x0010 /* Attach a new engine. 
*/ +#define UTRACE_ATTACH_EXCLUSIVE 0x0020 /* Refuse if existing match. */ +#define UTRACE_ATTACH_MATCH_OPS 0x0001 /* Match engines on ops. */ +#define UTRACE_ATTACH_MATCH_DATA 0x0002 /* Match engines on data. */ +#define UTRACE_ATTACH_MATCH_MASK 0x000f + +/** + * struct utrace_engine - per-engine structure + * @ops: &struct utrace_engine_ops pointer passed to utrace_attach_task() + * @data: engine-private &void * passed to utrace_attach_task() + * @flags: event mask set by utrace_set_events() plus internal flag bits + * + * The task itself never has to worry about engines detaching while + * it's doing event callbacks. These structures are removed from the + * task's active list only when it's stopped, or by the task itself. + * + * utrace_engine_get() and utrace_engine_put() maintain a reference count. + * When it drops to zero, the structure is freed. One reference is held + * implicitly while the engine is attached to its task. + */ +struct utrace_engine { +/* private: */ + struct kref kref; + struct list_head entry; + +/* public: */ + const struct utrace_engine_ops *ops; + void *data; + + unsigned long flags; +}; + +/** + * utrace_engine_get - acquire a reference on a &struct utrace_engine + * @engine: &struct utrace_engine pointer + * + * You must hold a reference on @engine, and you get another. + */ +static inline void utrace_engine_get(struct utrace_engine *engine) +{ + kref_get(&engine->kref); +} + +void __utrace_engine_release(struct kref *); + +/** + * utrace_engine_put - release a reference on a &struct utrace_engine + * @engine: &struct utrace_engine pointer + * + * You must hold a reference on @engine, and you lose that reference. + * If it was the last one, @engine becomes an invalid pointer. 
+ */ +static inline void utrace_engine_put(struct utrace_engine *engine) +{ + kref_put(&engine->kref, __utrace_engine_release); +} + +/** + * struct utrace_engine_ops - tracing engine callbacks + * + * Each @report_*() callback corresponds to an %UTRACE_EVENT(*) bit. + * utrace_set_events() calls on @engine choose which callbacks will be made + * to @engine from @task. + * + * Most callbacks take an @action argument, giving the resume action + * chosen by other tracing engines. All callbacks take an @engine + * argument, and a @task argument, which is always equal to @current. + * For some calls, @action also includes bits specific to that event + * and utrace_resume_action() is used to extract the resume action. + * This shows what would happen if @engine wasn't there, or will if + * the callback's return value uses %UTRACE_RESUME. This always + * starts as %UTRACE_RESUME when no other tracing is being done on + * this task. + * + * All return values contain &enum utrace_resume_action bits. For + * some calls, other bits specific to that kind of event are added to + * the resume action bits with OR. These are the same bits used in + * the @action argument. The resume action returned by a callback + * does not override previous engines' choices, it only says what + * @engine wants done. What @task actually does is the action that's + * most constrained among the choices made by all attached engines. + * See utrace_control() for more information on the actions. + * + * When %UTRACE_STOP is used in @report_syscall_entry, then @task + * stops before attempting the system call. In other cases, the + * resume action does not take effect until @task is ready to check + * for signals and return to user mode. If there are more callbacks + * to be made, the last round of calls determines the final action. + * A @report_quiesce callback with @event zero, or a @report_signal + * callback, will always be the last one made before @task resumes. 
+ * Only %UTRACE_STOP is "sticky"--if @engine returned %UTRACE_STOP + * then @task stays stopped unless @engine returns a different action from a + * following callback. + * + * The report_death() and report_reap() callbacks do not take @action + * arguments, and only %UTRACE_DETACH is meaningful in the return value + * from a report_death() callback. None of the resume actions applies + * to a dead thread. + * + * All @report_*() hooks are called with no locks held, in a generally + * safe environment when we will be returning to user mode soon (or just + * entered the kernel). It is fine to block for memory allocation and + * the like, but all hooks are asynchronous and must not block on + * external events! If you want the thread to block, use %UTRACE_STOP + * in your hook's return value; then later wake it up with utrace_control(). + * + * @report_quiesce: + * Requested by %UTRACE_EVENT(%QUIESCE). + * This does not indicate any event, but just that @task (the current + * thread) is in a safe place for examination. This call is made + * before each specific event callback, except for @report_reap. + * The @event argument gives the %UTRACE_EVENT(@which) value for + * the event occurring. This callback might be made for events @engine + * has not requested, if some other engine is tracing the event; + * a utrace_set_events() call here can request the immediate + * callback for this occurrence of @event. @event is zero when there + * is no other event, @task is now ready to check for signals and + * return to user mode, and some engine has used %UTRACE_REPORT or + * %UTRACE_INTERRUPT to request this callback. For this case, + * if @report_signal is not %NULL, the @report_quiesce callback + * may be replaced with a @report_signal callback passing + * %UTRACE_SIGNAL_REPORT in its @action argument, whenever @task is + * entering the signal-check path anyway. + * + * @report_signal: + * Requested by %UTRACE_EVENT(%SIGNAL_*) or %UTRACE_EVENT(%QUIESCE).
+ * Use utrace_signal_action() and utrace_resume_action() on @action. + * The signal action is %UTRACE_SIGNAL_REPORT when some engine has + * used %UTRACE_REPORT or %UTRACE_INTERRUPT; the callback can choose + * to stop or to deliver an artificial signal, before pending signals. + * It's %UTRACE_SIGNAL_HANDLER instead when signal handler setup just + * finished (after a previous %UTRACE_SIGNAL_DELIVER return); this + * serves in lieu of any %UTRACE_SIGNAL_REPORT callback requested by + * %UTRACE_REPORT or %UTRACE_INTERRUPT, and is also implicitly + * requested by %UTRACE_SINGLESTEP or %UTRACE_BLOCKSTEP into the + * signal delivery. The other signal actions indicate a signal about + * to be delivered; the previous engine's return value sets the signal + * action seen by the following engine's callback. The @info data + * can be changed at will, including @info->si_signo. The settings in + * @return_ka determine what %UTRACE_SIGNAL_DELIVER does. @orig_ka + * is what was in force before other tracing engines intervened, and + * it's %NULL when this report began as %UTRACE_SIGNAL_REPORT or + * %UTRACE_SIGNAL_HANDLER. For a report without a new signal, @info + * is left uninitialized and must be set completely by an engine that + * chooses to deliver a signal; if there was a previous @report_signal + * callback ending in %UTRACE_STOP and it was just resumed using + * %UTRACE_REPORT or %UTRACE_INTERRUPT, then @info is left unchanged + * from the previous callback. In this way, the original signal can + * be left in @info while returning %UTRACE_STOP|%UTRACE_SIGNAL_IGN + * and then found again when resuming @task with %UTRACE_INTERRUPT. + * The %UTRACE_SIGNAL_HOLD flag bit can be OR'd into the return value, + * and might be in @action if the previous engine returned it. This + * flag asks that the signal in @info be pushed back on @task's queue + * so that it will be seen again after whatever action is taken now.
+ * + * @report_clone: + * Requested by %UTRACE_EVENT(%CLONE). + * Event reported for parent, before the new task @child might run. + * @clone_flags gives the flags used in the clone system call, + * or equivalent flags for a fork() or vfork() system call. + * This function can use utrace_attach_task() on @child. It's guaranteed + * that asynchronous utrace_attach_task() calls will be ordered after + * any calls in @report_clone callbacks for the parent. Thus + * when using %UTRACE_ATTACH_EXCLUSIVE in the asynchronous calls, + * you can be sure that the parent's @report_clone callback has + * already attached to @child or chosen not to. Passing %UTRACE_STOP + * to utrace_control() on @child here keeps the child stopped before + * it ever runs in user mode, %UTRACE_REPORT or %UTRACE_INTERRUPT + * ensures a callback from @child before it starts in user mode. + * + * @report_jctl: + * Requested by %UTRACE_EVENT(%JCTL). + * Job control event; @type is %CLD_STOPPED or %CLD_CONTINUED, + * indicating whether we are stopping or resuming now. If @notify + * is nonzero, @task is the last thread to stop and so will send + * %SIGCHLD to its parent after this callback; @notify reflects + * what the parent's %SIGCHLD has in @si_code, which can sometimes + * be %CLD_STOPPED even when @type is %CLD_CONTINUED. + * + * @report_exec: + * Requested by %UTRACE_EVENT(%EXEC). + * An execve system call has succeeded and the new program is about to + * start running. The initial user register state is handy to be tweaked + * directly in @regs. @fmt and @bprm give the details of this exec. + * + * @report_syscall_entry: + * Requested by %UTRACE_EVENT(%SYSCALL_ENTRY). + * Thread has entered the kernel to request a system call. + * The user register state is handy to be tweaked directly in @regs. + * The @action argument contains an &enum utrace_syscall_action, + * use utrace_syscall_action() to extract it. The return value + * overrides the last engine's action for the system call.
+ * If the final action is %UTRACE_SYSCALL_ABORT, no system call + * is made. The details of the system call being attempted can + * be fetched here with syscall_get_nr() and syscall_get_arguments(). + * The parameter registers can be changed with syscall_set_arguments(). + * + * @report_syscall_exit: + * Requested by %UTRACE_EVENT(%SYSCALL_EXIT). + * Thread is about to leave the kernel after a system call request. + * The user register state is handy to be tweaked directly in @regs. + * The results of the system call attempt can be examined here using + * syscall_get_error() and syscall_get_return_value(). It is safe + * here to call syscall_set_return_value() or syscall_rollback(). + * + * @report_exit: + * Requested by %UTRACE_EVENT(%EXIT). + * Thread is exiting and cannot be prevented from doing so, + * but all its state is still live. The @code value will be + * the wait result seen by the parent, and can be changed by + * this engine or others. The @orig_code value is the real + * status, not changed by any tracing engine. Returning %UTRACE_STOP + * here keeps @task stopped before it cleans up its state and dies, + * so it can be examined by other processes. When @task is allowed + * to run, it will die and get to the @report_death callback. + * + * @report_death: + * Requested by %UTRACE_EVENT(%DEATH). + * Thread is really dead now. It might be reaped by its parent at + * any time, or self-reap immediately. Though the actual reaping + * may happen in parallel, a report_reap() callback will always be + * ordered after a report_death() callback. + * + * @report_reap: + * Requested by %UTRACE_EVENT(%REAP). + * Called when someone reaps the dead task (parent, init, or self). + * This means the parent called wait, or else this was a detached + * thread or a process whose parent ignores SIGCHLD. + * No more callbacks are made after this one. + * The engine is always detached. + * There is nothing more a tracing engine can do about this thread. 
+ * After this callback, the @engine pointer will become invalid. + * The @task pointer may become invalid if get_task_struct() hasn't + * been used to keep it alive. + * An engine should always request this callback if it stores the + * @engine pointer or stores any pointer in @engine->data, so it + * can clean up its data structures. + * Unlike other callbacks, this can be called from the parent's context + * rather than from the traced thread itself--it must not delay the + * parent by blocking. + */ +struct utrace_engine_ops { + u32 (*report_quiesce)(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + unsigned long event); + u32 (*report_signal)(u32 action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs, + siginfo_t *info, + const struct k_sigaction *orig_ka, + struct k_sigaction *return_ka); + u32 (*report_clone)(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *parent, + unsigned long clone_flags, + struct task_struct *child); + u32 (*report_jctl)(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + int type, int notify); + u32 (*report_exec)(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + const struct linux_binfmt *fmt, + const struct linux_binprm *bprm, + struct pt_regs *regs); + u32 (*report_syscall_entry)(u32 action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs); + u32 (*report_syscall_exit)(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs); + u32 (*report_exit)(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + long orig_code, long *code); + u32 (*report_death)(struct utrace_engine *engine, + struct task_struct *task, + bool group_dead, int signal); + void (*report_reap)(struct utrace_engine *engine, 
+ struct task_struct *task); +}; + +/** + * struct utrace_examiner - private state for using utrace_prepare_examine() + * + * The members of &struct utrace_examiner are private to the implementation. + * This data type holds the state from a call to utrace_prepare_examine() + * to be used by a call to utrace_finish_examine(). + */ +struct utrace_examiner { +/* private: */ + long state; + unsigned long ncsw; +}; + +/* + * These are the exported entry points for tracing engines to use. + * See kernel/utrace.c for their kerneldoc comments with interface details. + */ +struct utrace_engine *utrace_attach_task(struct task_struct *, int, + const struct utrace_engine_ops *, + void *); +struct utrace_engine *utrace_attach_pid(struct pid *, int, + const struct utrace_engine_ops *, + void *); +int __must_check utrace_control(struct task_struct *, + struct utrace_engine *, + enum utrace_resume_action); +int __must_check utrace_set_events(struct task_struct *, + struct utrace_engine *, + unsigned long eventmask); +int __must_check utrace_barrier(struct task_struct *, + struct utrace_engine *); +int __must_check utrace_prepare_examine(struct task_struct *, + struct utrace_engine *, + struct utrace_examiner *); +int __must_check utrace_finish_examine(struct task_struct *, + struct utrace_engine *, + struct utrace_examiner *); + +/** + * utrace_control_pid - control a thread being traced by a tracing engine + * @pid: thread to affect + * @engine: attached engine to affect + * @action: &enum utrace_resume_action for thread to do + * + * This is the same as utrace_control(), but takes a &struct pid + * pointer rather than a &struct task_struct pointer. The caller must + * hold a ref on @pid, but does not need to worry about the task + * staying valid. If it's been reaped so that @pid points nowhere, + * then this call returns -%ESRCH. 
+ */ +static inline __must_check int utrace_control_pid( + struct pid *pid, struct utrace_engine *engine, + enum utrace_resume_action action) +{ + /* + * We don't bother with rcu_read_lock() here to protect the + * task_struct pointer, because utrace_control will return + * -ESRCH without looking at that pointer if the engine is + * already detached. A task_struct pointer can't die before + * all the engines are detached in release_task() first. + */ + struct task_struct *task = pid_task(pid, PIDTYPE_PID); + return unlikely(!task) ? -ESRCH : utrace_control(task, engine, action); +} + +/** + * utrace_set_events_pid - choose which event reports a tracing engine gets + * @pid: thread to affect + * @engine: attached engine to affect + * @eventmask: new event mask + * + * This is the same as utrace_set_events(), but takes a &struct pid + * pointer rather than a &struct task_struct pointer. The caller must + * hold a ref on @pid, but does not need to worry about the task + * staying valid. If it's been reaped so that @pid points nowhere, + * then this call returns -%ESRCH. + */ +static inline __must_check int utrace_set_events_pid( + struct pid *pid, struct utrace_engine *engine, unsigned long eventmask) +{ + struct task_struct *task = pid_task(pid, PIDTYPE_PID); + return unlikely(!task) ? -ESRCH : + utrace_set_events(task, engine, eventmask); +} + +/** + * utrace_barrier_pid - synchronize with simultaneous tracing callbacks + * @pid: thread to affect + * @engine: engine to affect (can be detached) + * + * This is the same as utrace_barrier(), but takes a &struct pid + * pointer rather than a &struct task_struct pointer. The caller must + * hold a ref on @pid, but does not need to worry about the task + * staying valid. If it's been reaped so that @pid points nowhere, + * then this call returns -%ESRCH. 
+ */ +static inline __must_check int utrace_barrier_pid(struct pid *pid, + struct utrace_engine *engine) +{ + struct task_struct *task = pid_task(pid, PIDTYPE_PID); + return unlikely(!task) ? -ESRCH : utrace_barrier(task, engine); +} + +#endif /* CONFIG_UTRACE */ + +#endif /* linux/utrace.h */ diff --git a/include/linux/utrace_struct.h b/include/linux/utrace_struct.h new file mode 100644 index ...aba7e09 100644 --- /dev/null +++ b/include/linux/utrace_struct.h @@ -0,0 +1,58 @@ +/* + * 'struct utrace' data structure for kernel/utrace.c private use. + * + * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved. + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + */ + +#ifndef _LINUX_UTRACE_STRUCT_H +#define _LINUX_UTRACE_STRUCT_H 1 + +#ifdef CONFIG_UTRACE + +#include +#include + +/* + * Per-thread structure private to utrace implementation. This properly + * belongs in kernel/utrace.c and its use is entirely private to the code + * there. It is only defined in a header file so that it can be embedded + * in the struct task_struct layout. It is here rather than in utrace.h + * to avoid header nesting order issues getting too complex. 
+ * + */ +struct utrace { + struct task_struct *cloning; + + struct list_head attached, attaching; + spinlock_t lock; + + struct utrace_engine *reporting; + + unsigned int stopped:1; + unsigned int report:1; + unsigned int interrupt:1; + unsigned int signal_handler:1; + unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */ + unsigned int death:1; /* in utrace_report_death() now */ + unsigned int reap:1; /* release_task() has run */ +}; + +# define INIT_UTRACE(tsk) \ + .utrace_flags = 0, \ + .utrace = { \ + .lock = __SPIN_LOCK_UNLOCKED(tsk.utrace.lock), \ + .attached = LIST_HEAD_INIT(tsk.utrace.attached), \ + .attaching = LIST_HEAD_INIT(tsk.utrace.attaching), \ + }, + +#else + +# define INIT_UTRACE(tsk) /* Nothing. */ + +#endif /* CONFIG_UTRACE */ + +#endif /* linux/utrace_struct.h */ diff --git a/init/Kconfig b/init/Kconfig index 6a5c5fe..4b5ab3e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1060,6 +1060,15 @@ config STOP_MACHINE help Need stop_machine() primitive. +menuconfig UTRACE + bool "Infrastructure for tracing and debugging user processes" + depends on EXPERIMENTAL + depends on HAVE_ARCH_TRACEHOOK + help + Enable the utrace process tracing interface. This is an internal + kernel interface exported to kernel modules, to track events in + user threads, extract and change user thread state. 
+ source "block/Kconfig" config PREEMPT_NOTIFIERS diff --git a/kernel/Makefile b/kernel/Makefile index e4791b3..7bff724 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -68,6 +68,7 @@ obj-$(CONFIG_IKCONFIG) += configs.o obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o obj-$(CONFIG_STOP_MACHINE) += stop_machine.o obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o +obj-$(CONFIG_UTRACE) += utrace.o obj-$(CONFIG_AUDIT) += audit.o auditfilter.o obj-$(CONFIG_AUDITSYSCALL) += auditsc.o obj-$(CONFIG_AUDIT_TREE) += audit_tree.o diff --git a/kernel/ptrace.c b/kernel/ptrace.c index c9cf48b..41e9542 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -172,6 +173,14 @@ bool ptrace_may_access(struct task_struc return (!err ? true : false); } +/* + * For experimental use of utrace, exclude ptrace on the same task. + */ +static inline bool exclude_ptrace(struct task_struct *task) +{ + return unlikely(!!task_utrace_flags(task)); +} + int ptrace_attach(struct task_struct *task) { int retval; @@ -210,6 +219,11 @@ repeat: goto repeat; } + if (exclude_ptrace(task)) { + retval = -EBUSY; + goto bad; + } + if (!task->mm) goto bad; /* the same process cannot be attached many times */ @@ -515,7 +529,9 @@ int ptrace_traceme(void) */ repeat: task_lock(current); - if (!(current->ptrace & PT_PTRACED)) { + if (exclude_ptrace(current)) { + ret = -EBUSY; + } else if (!(current->ptrace & PT_PTRACED)) { /* * See ptrace_attach() comments about the locking here. */ diff --git a/kernel/utrace.c b/kernel/utrace.c new file mode 100644 index ...3af06a6 100644 --- /dev/null +++ b/kernel/utrace.c @@ -0,0 +1,2348 @@ +/* + * utrace infrastructure interface for debugging user processes + * + * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved. 
+ * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + * + * Red Hat Author: Roland McGrath. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +/* + * Rules for 'struct utrace', defined in + * but used entirely privately in this file. + * + * The common event reporting loops are done by the task making the + * report without ever taking any locks. To facilitate this, the two + * lists @attached and @attaching work together for smooth asynchronous + * attaching with low overhead. Modifying either list requires @lock. + * The @attaching list can be modified any time while holding @lock. + * New engines being attached always go on this list. + * + * The @attached list is what the task itself uses for its reporting + * loops. When the task itself is not quiescent, it can use the + * @attached list without taking any lock. Nobody may modify the list + * when the task is not quiescent. When it is quiescent, that means + * that it won't run again without taking @lock itself before using + * the list. + * + * At each place where we know the task is quiescent (or it's current), + * while holding @lock, we call splice_attaching(), below. This moves + * the @attaching list members on to the end of the @attached list. + * Since this happens at the start of any reporting pass, any new + * engines attached asynchronously go on the stable @attached list + * in time to have their callbacks seen. 
+ */ + +static struct kmem_cache *utrace_engine_cachep; +static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */ + +static int __init utrace_init(void) +{ + utrace_engine_cachep = KMEM_CACHE(utrace_engine, SLAB_PANIC); + return 0; +} +module_init(utrace_init); + +/* + * This is called with @utrace->lock held when the task is safely + * quiescent, i.e. it won't consult utrace->attached without the lock. + * Move any engines attached asynchronously from @utrace->attaching + * onto the @utrace->attached list. + */ +static void splice_attaching(struct utrace *utrace) +{ + list_splice_tail_init(&utrace->attaching, &utrace->attached); +} + +/* + * This is the exported function used by the utrace_engine_put() inline. + */ +void __utrace_engine_release(struct kref *kref) +{ + struct utrace_engine *engine = container_of(kref, struct utrace_engine, + kref); + BUG_ON(!list_empty(&engine->entry)); + kmem_cache_free(utrace_engine_cachep, engine); +} +EXPORT_SYMBOL_GPL(__utrace_engine_release); + +static bool engine_matches(struct utrace_engine *engine, int flags, + const struct utrace_engine_ops *ops, void *data) +{ + if ((flags & UTRACE_ATTACH_MATCH_OPS) && engine->ops != ops) + return false; + if ((flags & UTRACE_ATTACH_MATCH_DATA) && engine->data != data) + return false; + return engine->ops && engine->ops != &utrace_detached_ops; +} + +static struct utrace_engine *matching_engine( + struct utrace *utrace, int flags, + const struct utrace_engine_ops *ops, void *data) +{ + struct utrace_engine *engine; + list_for_each_entry(engine, &utrace->attached, entry) + if (engine_matches(engine, flags, ops, data)) + return engine; + list_for_each_entry(engine, &utrace->attaching, entry) + if (engine_matches(engine, flags, ops, data)) + return engine; + return NULL; +} + +/* + * For experimental use, utrace attach is mutually exclusive with ptrace. 
+ */ +static inline bool exclude_utrace(struct task_struct *task) +{ + return unlikely(!!task->ptrace); +} + +/* + * Called without locks, when we might be the first utrace engine to attach. + * If this is a newborn thread and we are not the creator, we have to wait + * for it. The creator gets the first chance to attach. The PF_STARTING + * flag is cleared after its report_clone hook has had a chance to run. + */ +static inline int utrace_attach_delay(struct task_struct *target) +{ + if ((target->flags & PF_STARTING) && + current->utrace.cloning != target) + do { + schedule_timeout_interruptible(1); + if (signal_pending(current)) + return -ERESTARTNOINTR; + } while (target->flags & PF_STARTING); + + return 0; +} + +/* + * Enqueue @engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE. + */ +static int utrace_add_engine(struct task_struct *target, + struct utrace *utrace, + struct utrace_engine *engine, + int flags, + const struct utrace_engine_ops *ops, + void *data) +{ + int ret; + + spin_lock(&utrace->lock); + + if (utrace->reap) { + /* + * Already entered utrace_release_task(), cannot attach now. + */ + ret = -ESRCH; + } else if ((flags & UTRACE_ATTACH_EXCLUSIVE) && + unlikely(matching_engine(utrace, flags, ops, data))) { + ret = -EEXIST; + } else { + /* + * Put the new engine on the pending ->attaching list. + * Make sure it gets onto the ->attached list by the next + * time it's examined. + * + * When target == current, it would be safe just to call + * splice_attaching() right here. But if we're inside a + * callback, that would mean the new engine also gets + * notified about the event that precipitated its own + * creation. This is not what the user wants. + * + * Setting ->report ensures that start_report() takes the + * lock and does it next time. Whenever setting ->report, + * we must maintain the invariant that TIF_NOTIFY_RESUME is + * also set. 
Otherwise utrace_control() or utrace_do_stop() + * might skip setting TIF_NOTIFY_RESUME upon seeing ->report + * already set, and we'd miss a necessary callback. + * + * In case we had no engines before, make sure that + * utrace_flags is not zero when tracehook_notify_resume() + * checks. That would bypass utrace reporting clearing + * TIF_NOTIFY_RESUME, and thus violate the same invariant. + */ + target->utrace_flags |= UTRACE_EVENT(REAP); + list_add_tail(&engine->entry, &utrace->attaching); + utrace->report = 1; + set_notify_resume(target); + + ret = 0; + } + + spin_unlock(&utrace->lock); + + return ret; +} + +/** + * utrace_attach_task - attach new engine, or look up an attached engine + * @target: thread to attach to + * @flags: flag bits combined with OR, see below + * @ops: callback table for new engine + * @data: engine private data pointer + * + * The caller must ensure that the @target thread does not get freed, + * i.e. hold a ref or be its parent. It is always safe to call this + * on @current, or on the @child pointer in a @report_clone callback. + * For most other cases, it's easier to use utrace_attach_pid() instead. + * + * UTRACE_ATTACH_CREATE: + * Create a new engine. If %UTRACE_ATTACH_CREATE is not specified, you + * only look up an existing engine already attached to the thread. + * + * UTRACE_ATTACH_EXCLUSIVE: + * Attempting to attach a second (matching) engine fails with -%EEXIST. + * + * UTRACE_ATTACH_MATCH_OPS: Only consider engines matching @ops. + * UTRACE_ATTACH_MATCH_DATA: Only consider engines matching @data. + */ +struct utrace_engine *utrace_attach_task( + struct task_struct *target, int flags, + const struct utrace_engine_ops *ops, void *data) +{ + struct utrace *utrace; + struct utrace_engine *engine; + int ret; + + utrace = &target->utrace; + + if (unlikely(target->exit_state == EXIT_DEAD)) { + /* + * The target has already been reaped. + * Check this early, though it's not synchronized. 
+ * utrace_add_engine() will do the final check. + */ + if (!(flags & UTRACE_ATTACH_CREATE)) + return ERR_PTR(-ENOENT); + return ERR_PTR(-ESRCH); + } + + if (!(flags & UTRACE_ATTACH_CREATE)) { + spin_lock(&utrace->lock); + engine = matching_engine(utrace, flags, ops, data); + if (engine) + utrace_engine_get(engine); + spin_unlock(&utrace->lock); + return engine ?: ERR_PTR(-ENOENT); + } + + if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops)) + return ERR_PTR(-EINVAL); + + if (unlikely(target->flags & PF_KTHREAD)) + /* + * Silly kernel, utrace is for users! + */ + return ERR_PTR(-EPERM); + + engine = kmem_cache_alloc(utrace_engine_cachep, GFP_KERNEL); + if (unlikely(!engine)) + return ERR_PTR(-ENOMEM); + + /* + * Initialize the new engine structure. It starts out with two + * refs: one ref to return, and one ref for being attached. + */ + kref_set(&engine->kref, 2); + engine->flags = 0; + engine->ops = ops; + engine->data = data; + + ret = utrace_attach_delay(target); + if (likely(!ret)) + ret = utrace_add_engine(target, utrace, engine, + flags, ops, data); + + if (unlikely(ret)) { + kmem_cache_free(utrace_engine_cachep, engine); + engine = ERR_PTR(ret); + } + + return engine; +} +EXPORT_SYMBOL_GPL(utrace_attach_task); + +/** + * utrace_attach_pid - attach new engine, or look up an attached engine + * @pid: &struct pid pointer representing thread to attach to + * @flags: flag bits combined with OR, see utrace_attach_task() + * @ops: callback table for new engine + * @data: engine private data pointer + * + * This is the same as utrace_attach_task(), but takes a &struct pid + * pointer rather than a &struct task_struct pointer. The caller must + * hold a ref on @pid, but does not need to worry about the task + * staying valid. If it's been reaped so that @pid points nowhere, + * then this call returns -%ESRCH. 
+ */ +struct utrace_engine *utrace_attach_pid( + struct pid *pid, int flags, + const struct utrace_engine_ops *ops, void *data) +{ + struct utrace_engine *engine = ERR_PTR(-ESRCH); + struct task_struct *task = get_pid_task(pid, PIDTYPE_PID); + if (task) { + engine = utrace_attach_task(task, flags, ops, data); + put_task_struct(task); + } + return engine; +} +EXPORT_SYMBOL_GPL(utrace_attach_pid); + +/* + * When an engine is detached, the target thread may still see it and + * make callbacks until it quiesces. We install a special ops vector + * with these two callbacks. When the target thread quiesces, it can + * safely free the engine itself. For any event we will always get + * the report_quiesce() callback first, so we only need this one + * pointer to be set. The only exception is report_reap(), so we + * supply that callback too. + */ +static u32 utrace_detached_quiesce(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + unsigned long event) +{ + return UTRACE_DETACH; +} + +static void utrace_detached_reap(struct utrace_engine *engine, + struct task_struct *task) +{ +} + +static const struct utrace_engine_ops utrace_detached_ops = { + .report_quiesce = &utrace_detached_quiesce, + .report_reap = &utrace_detached_reap +}; + +/* + * After waking up from TASK_TRACED, clear bookkeeping in @utrace. + * Returns true if we were woken up prematurely by SIGKILL. + */ +static inline bool finish_utrace_stop(struct task_struct *task, + struct utrace *utrace) +{ + bool killed = false; + + /* + * utrace_wakeup() clears @utrace->stopped before waking us up. + * We're officially awake if it's clear. + */ + spin_lock(&utrace->lock); + if (unlikely(utrace->stopped)) { + /* + * If we're here with it still set, it must have been + * signal_wake_up() instead, waking us up for a SIGKILL. 
+		 */
+		spin_lock_irq(&task->sighand->siglock);
+		WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
+		spin_unlock_irq(&task->sighand->siglock);
+		utrace->stopped = 0;
+		killed = true;
+	}
+	spin_unlock(&utrace->lock);
+
+	return killed;
+}
+
+/*
+ * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
+ * @task == current, @utrace == current->utrace, which is not locked.
+ * Return true if we were woken up by SIGKILL even though some utrace
+ * engine may still want us to stay stopped.
+ */
+static bool utrace_stop(struct task_struct *task, struct utrace *utrace,
+			bool report)
+{
+	bool killed;
+
+	/*
+	 * @utrace->stopped is the flag that says we are safely
+	 * inside this function. It should never be set on entry.
+	 */
+	BUG_ON(utrace->stopped);
+
+	/*
+	 * The siglock protects us against signals. As well as SIGKILL
+	 * waking us up, we must synchronize with the signal bookkeeping
+	 * for stop signals and SIGCONT.
+	 */
+	spin_lock(&utrace->lock);
+	spin_lock_irq(&task->sighand->siglock);
+
+	if (unlikely(sigismember(&task->pending.signal, SIGKILL))) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return true;
+	}
+
+	if (report) {
+		/*
+		 * Ensure a reporting pass when we're resumed.
+		 */
+		utrace->report = 1;
+		set_thread_flag(TIF_NOTIFY_RESUME);
+	}
+
+	utrace->stopped = 1;
+	__set_current_state(TASK_TRACED);
+
+	/*
+	 * If there is a group stop in progress,
+	 * we must participate in the bookkeeping.
+	 */
+	if (task->signal->group_stop_count > 0)
+		--task->signal->group_stop_count;
+
+	spin_unlock_irq(&task->sighand->siglock);
+	spin_unlock(&utrace->lock);
+
+	schedule();
+
+	/*
+	 * While in TASK_TRACED, we were considered "frozen enough".
+	 * Now that we woke up, it's crucial if we're supposed to be
+	 * frozen that we freeze now before running anything substantial.
+	 */
+	try_to_freeze();
+
+	killed = finish_utrace_stop(task, utrace);
+
+	/*
+	 * While we were in TASK_TRACED, complete_signal() considered
+	 * us "uninterested" in signal wakeups. Now make sure our
+	 * TIF_SIGPENDING state is correct for normal running.
+	 */
+	spin_lock_irq(&task->sighand->siglock);
+	recalc_sigpending();
+	spin_unlock_irq(&task->sighand->siglock);
+
+	return killed;
+}
+
+/*
+ * The caller has to hold a ref on the engine. If the attached flag is
+ * true (all but utrace_barrier() calls), the engine is supposed to be
+ * attached. If the attached flag is false (utrace_barrier() only),
+ * then return -ERESTARTSYS for an engine marked for detach but not yet
+ * fully detached. The task pointer can be invalid if the engine is
+ * detached.
+ *
+ * Get the utrace lock for the target task.
+ * Returns the struct if locked, or ERR_PTR(-errno).
+ *
+ * This has to be robust against races with:
+ *	utrace_control(target, UTRACE_DETACH) calls
+ *	UTRACE_DETACH after reports
+ *	utrace_report_death
+ *	utrace_release_task
+ */
+static struct utrace *get_utrace_lock(struct task_struct *target,
+				      struct utrace_engine *engine,
+				      bool attached)
+	__acquires(utrace->lock)
+{
+	struct utrace *utrace;
+
+	rcu_read_lock();
+
+	/*
+	 * If this engine was already detached, bail out before we look at
+	 * the task_struct pointer at all. If it's detached after this
+	 * check, then RCU is still keeping this task_struct pointer valid.
+	 *
+	 * The ops pointer is NULL when the engine is fully detached.
+	 * It's &utrace_detached_ops when it's marked detached but still
+	 * on the list. In the latter case, utrace_barrier() still works,
+	 * since the target might be in the middle of an old callback.
+	 */
+	if (unlikely(!engine->ops)) {
+		rcu_read_unlock();
+		return ERR_PTR(-ESRCH);
+	}
+
+	if (unlikely(engine->ops == &utrace_detached_ops)) {
+		rcu_read_unlock();
+		return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS);
+	}
+
+	utrace = &target->utrace;
+	if (unlikely(target->exit_state == EXIT_DEAD)) {
+		/*
+		 * If all engines detached already, utrace is clear.
+		 * Otherwise, we're called after utrace_release_task might
+		 * have started. A call to this engine's report_reap
+		 * callback might already be in progress.
+		 */
+		utrace = ERR_PTR(-ESRCH);
+	} else {
+		spin_lock(&utrace->lock);
+		if (unlikely(!engine->ops) ||
+		    unlikely(engine->ops == &utrace_detached_ops)) {
+			/*
+			 * By the time we got the utrace lock,
+			 * it had been reaped or detached already.
+			 */
+			spin_unlock(&utrace->lock);
+			utrace = ERR_PTR(-ESRCH);
+			if (!attached && engine->ops == &utrace_detached_ops)
+				utrace = ERR_PTR(-ERESTARTSYS);
+		}
+	}
+	rcu_read_unlock();
+
+	return utrace;
+}
+
+/*
+ * Now that we don't hold any locks, run through any
+ * detached engines and free their references. Each
+ * engine had one implicit ref while it was attached.
+ */
+static void put_detached_list(struct list_head *list)
+{
+	struct utrace_engine *engine, *next;
+	list_for_each_entry_safe(engine, next, list, entry) {
+		list_del_init(&engine->entry);
+		utrace_engine_put(engine);
+	}
+}
+
+/*
+ * Called with utrace->lock held.
+ * Notify and clean up all engines, then free utrace.
+ */
+static void utrace_reap(struct task_struct *target, struct utrace *utrace)
+	__releases(utrace->lock)
+{
+	struct utrace_engine *engine, *next;
+	const struct utrace_engine_ops *ops;
+	LIST_HEAD(detached);
+
+restart:
+	splice_attaching(utrace);
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		ops = engine->ops;
+		engine->ops = NULL;
+		list_move(&engine->entry, &detached);
+
+		/*
+		 * If it didn't need a callback, we don't need to drop
+		 * the lock. Now nothing else refers to this engine.
+		 */
+		if (!(engine->flags & UTRACE_EVENT(REAP)))
+			continue;
+
+		/*
+		 * This synchronizes with utrace_barrier(). Since we
+		 * need the utrace->lock here anyway (unlike the other
+		 * reporting loops), we don't need any memory barrier
+		 * as utrace_barrier() holds the lock.
+		 */
+		utrace->reporting = engine;
+		spin_unlock(&utrace->lock);
+
+		(*ops->report_reap)(engine, target);
+
+		utrace->reporting = NULL;
+
+		put_detached_list(&detached);
+
+		spin_lock(&utrace->lock);
+		goto restart;
+	}
+
+	spin_unlock(&utrace->lock);
+
+	put_detached_list(&detached);
+}
+
+/*
+ * Called by release_task. After this, target->utrace must be cleared.
+ */
+void utrace_release_task(struct task_struct *target)
+{
+	struct utrace *utrace;
+
+	utrace = &target->utrace;
+
+	spin_lock(&utrace->lock);
+
+	utrace->reap = 1;
+
+	if (!(target->utrace_flags & _UTRACE_DEATH_EVENTS)) {
+		utrace_reap(target, utrace); /* Unlocks and frees. */
+		return;
+	}
+
+	/*
+	 * The target will do some final callbacks but hasn't
+	 * finished them yet. We know because it clears these
+	 * event bits after it's done. Instead of cleaning up here
+	 * and requiring utrace_report_death to cope with it, we
+	 * delay the REAP report and the teardown until after the
+	 * target finishes its death reports.
+	 */
+
+	spin_unlock(&utrace->lock);
+}
+
+/*
+ * We use an extra bit in utrace_engine.flags past the event bits,
+ * to record whether the engine is keeping the target thread stopped.
+ */
+#define ENGINE_STOP (1UL << _UTRACE_NEVENTS)
+
+static void mark_engine_wants_stop(struct utrace_engine *engine)
+{
+	engine->flags |= ENGINE_STOP;
+}
+
+static void clear_engine_wants_stop(struct utrace_engine *engine)
+{
+	engine->flags &= ~ENGINE_STOP;
+}
+
+static bool engine_wants_stop(struct utrace_engine *engine)
+{
+	return (engine->flags & ENGINE_STOP) != 0;
+}
+
+/**
+ * utrace_set_events - choose which event reports a tracing engine gets
+ * @target: thread to affect
+ * @engine: attached engine to affect
+ * @events: new event mask
+ *
+ * This changes the set of events for which @engine wants callbacks made.
+ *
+ * This fails with -%EALREADY and does nothing if you try to clear
+ * %UTRACE_EVENT(%DEATH) when the @report_death callback may already have
+ * begun, if you try to clear %UTRACE_EVENT(%REAP) when the @report_reap
+ * callback may already have begun, or if you try to newly set
+ * %UTRACE_EVENT(%DEATH) or %UTRACE_EVENT(%QUIESCE) when @target is
+ * already dead or dying.
+ *
+ * This can fail with -%ESRCH when @target has already been detached,
+ * including forcible detach on reaping.
+ *
+ * If @target was stopped before the call, then after a successful call,
+ * no event callbacks not requested in @events will be made; if
+ * %UTRACE_EVENT(%QUIESCE) is included in @events, then a @report_quiesce
+ * callback will be made when @target resumes. If @target was not stopped,
+ * and was about to make a callback to @engine, this returns -%EINPROGRESS.
+ * In this case, the callback in progress might be one excluded from the
+ * new @events setting. When this returns zero, you can be sure that no
+ * event callbacks you've disabled in @events can be made.
+ *
+ * To synchronize after an -%EINPROGRESS return, see utrace_barrier().
+ *
+ * When @target is @current, -%EINPROGRESS is not returned. But
+ * note that a newly-created engine will not receive any callbacks
+ * related to an event notification already in progress. This call
+ * enables @events callbacks to be made as soon as @engine becomes
+ * eligible for any callbacks, see utrace_attach_task().
+ *
+ * These rules provide for coherent synchronization based on %UTRACE_STOP,
+ * even when %SIGKILL is breaking its normal simple rules.
+ */
+int utrace_set_events(struct task_struct *target,
+		      struct utrace_engine *engine,
+		      unsigned long events)
+{
+	struct utrace *utrace;
+	unsigned long old_flags, old_utrace_flags, set_utrace_flags;
+	int ret;
+
+	utrace = get_utrace_lock(target, engine, true);
+	if (unlikely(IS_ERR(utrace)))
+		return PTR_ERR(utrace);
+
+	old_utrace_flags = target->utrace_flags;
+	set_utrace_flags = events;
+	old_flags = engine->flags;
+
+	if (target->exit_state &&
+	    (((events & ~old_flags) & _UTRACE_DEATH_EVENTS) ||
+	     (utrace->death &&
+	      ((old_flags & ~events) & _UTRACE_DEATH_EVENTS)) ||
+	     (utrace->reap && ((old_flags & ~events) & UTRACE_EVENT(REAP))))) {
+		spin_unlock(&utrace->lock);
+		return -EALREADY;
+	}
+
+	/*
+	 * When setting these flags, it's essential that we really
+	 * synchronize with exit_notify(). They cannot be set after
+	 * exit_notify() takes the tasklist_lock. By holding the read
+	 * lock here while setting the flags, we ensure that the calls
+	 * to tracehook_notify_death() and tracehook_report_death() will
+	 * see the new flags. This ensures that utrace_release_task()
+	 * knows positively that utrace_report_death() will be called or
+	 * that it won't.
+	 */
+	if ((set_utrace_flags & ~old_utrace_flags) & _UTRACE_DEATH_EVENTS) {
+		read_lock(&tasklist_lock);
+		if (unlikely(target->exit_state)) {
+			read_unlock(&tasklist_lock);
+			spin_unlock(&utrace->lock);
+			return -EALREADY;
+		}
+		target->utrace_flags |= set_utrace_flags;
+		read_unlock(&tasklist_lock);
+	}
+
+	engine->flags = events | (engine->flags & ENGINE_STOP);
+	target->utrace_flags |= set_utrace_flags;
+
+	if ((set_utrace_flags & UTRACE_EVENT_SYSCALL) &&
+	    !(old_utrace_flags & UTRACE_EVENT_SYSCALL))
+		set_tsk_thread_flag(target, TIF_SYSCALL_TRACE);
+
+	ret = 0;
+	if (!utrace->stopped && target != current) {
+		/*
+		 * This barrier ensures that our engine->flags changes
+		 * have hit before we examine utrace->reporting,
+		 * pairing with the barrier in start_callback(). If
+		 * @target has not yet hit finish_callback() to clear
+		 * utrace->reporting, we might be in the middle of a
+		 * callback to @engine.
+		 */
+		smp_mb();
+		if (utrace->reporting == engine)
+			ret = -EINPROGRESS;
+	}
+
+	spin_unlock(&utrace->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_set_events);
+
+/*
+ * Asynchronously mark an engine as being detached.
+ *
+ * This must work while the target thread races with us doing
+ * start_callback(), defined below. It uses smp_rmb() between checking
+ * @engine->flags and using @engine->ops. Here we change @engine->ops
+ * first, then use smp_wmb() before changing @engine->flags. This ensures
+ * it can check the old flags before using the old ops, or check the old
+ * flags before using the new ops, or check the new flags before using the
+ * new ops, but can never check the new flags before using the old ops.
+ * Hence, utrace_detached_ops might be used with any old flags in place.
+ * It has report_quiesce() and report_reap() callbacks to handle all cases.
+ */
+static void mark_engine_detached(struct utrace_engine *engine)
+{
+	engine->ops = &utrace_detached_ops;
+	smp_wmb();
+	engine->flags = UTRACE_EVENT(QUIESCE);
+}
+
+/*
+ * Get @target to stop and return true if it is already stopped now.
+ * If we return false, it will make some event callback soonish.
+ * Called with @utrace locked.
+ */
+static bool utrace_do_stop(struct task_struct *target, struct utrace *utrace)
+{
+	bool stopped = false;
+
+	spin_lock_irq(&target->sighand->siglock);
+	if (unlikely(target->exit_state)) {
+		/*
+		 * On the exit path, it's only truly quiescent
+		 * if it has already been through
+		 * utrace_report_death(), or never will.
+		 */
+		if (!(target->utrace_flags & _UTRACE_DEATH_EVENTS))
+			utrace->stopped = stopped = true;
+	} else if (task_is_stopped(target)) {
+		/*
+		 * Stopped is considered quiescent; when it wakes up, it will
+		 * go through utrace_get_signal() before doing anything else.
+		 */
+		utrace->stopped = stopped = true;
+	} else if (!utrace->report && !utrace->interrupt) {
+		utrace->report = 1;
+		set_notify_resume(target);
+	}
+	spin_unlock_irq(&target->sighand->siglock);
+
+	return stopped;
+}
+
+/*
+ * If the target is not dead it should not be in tracing
+ * stop any more. Wake it unless it's in job control stop.
+ *
+ * Called with @utrace->lock held and @utrace->stopped set.
+ */
+static void utrace_wakeup(struct task_struct *target, struct utrace *utrace)
+{
+	struct sighand_struct *sighand;
+	unsigned long irqflags;
+
+	utrace->stopped = 0;
+
+	sighand = lock_task_sighand(target, &irqflags);
+	if (unlikely(!sighand))
+		return;
+
+	if (likely(task_is_stopped_or_traced(target))) {
+		if (target->signal->flags & SIGNAL_STOP_STOPPED)
+			target->state = TASK_STOPPED;
+		else
+			wake_up_state(target, __TASK_STOPPED | __TASK_TRACED);
+	}
+
+	unlock_task_sighand(target, &irqflags);
+}
+
+/*
+ * This is called when there might be some detached engines on the list or
+ * some stale bits in @task->utrace_flags. Clean them up and recompute the
+ * flags.
+ *
+ * @action is NULL when @task is stopped and @utrace->stopped is set; wake
+ * it up if it should not be. @action is set when @task is current; if
+ * we're fully detached, reset *@action to UTRACE_RESUME.
+ *
+ * Called with @utrace->lock held, returns with it released.
+ * After this returns, @utrace might be freed if everything detached.
+ */
+static void utrace_reset(struct task_struct *task, struct utrace *utrace,
+			 enum utrace_resume_action *action)
+	__releases(utrace->lock)
+{
+	struct utrace_engine *engine, *next;
+	unsigned long flags = 0;
+	LIST_HEAD(detached);
+	bool wake = !action;
+	BUG_ON(wake != (task != current));
+
+	splice_attaching(utrace);
+
+	/*
+	 * Update the set of events of interest from the union
+	 * of the interests of the remaining tracing engines.
+	 * For any engine marked detached, remove it from the list.
+	 * We'll collect them on the detached list.
+	 */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if (engine->ops == &utrace_detached_ops) {
+			engine->ops = NULL;
+			list_move(&engine->entry, &detached);
+		} else {
+			flags |= engine->flags | UTRACE_EVENT(REAP);
+			wake = wake && !engine_wants_stop(engine);
+		}
+	}
+
+	if (task->exit_state) {
+		/*
+		 * Once it's already dead, we never install any flags
+		 * except REAP. When ->exit_state is set and events
+		 * like DEATH are not set, then they never can be set.
+		 * This ensures that utrace_release_task() knows
+		 * positively that utrace_report_death() can never run.
+		 */
+		BUG_ON(utrace->death);
+		flags &= UTRACE_EVENT(REAP);
+		wake = false;
+	} else if (!(flags & UTRACE_EVENT_SYSCALL) &&
+		   test_tsk_thread_flag(task, TIF_SYSCALL_TRACE)) {
+		clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
+	}
+
+	task->utrace_flags = flags;
+
+	if (wake)
+		utrace_wakeup(task, utrace);
+
+	/*
+	 * If any engines are left, we're done.
+	 */
+	spin_unlock(&utrace->lock);
+	if (!flags) {
+		/*
+		 * No more engines, cleared out the utrace.
+		 */
+
+		if (action)
+			*action = UTRACE_RESUME;
+	}
+
+	put_detached_list(&detached);
+}
+
+/*
+ * You can't do anything to a dead task but detach it.
+ * If release_task() has been called, you can't do that.
+ *
+ * On the exit path, DEATH and QUIESCE event bits are set only
+ * before utrace_report_death() has taken the lock. At that point,
+ * the death report will come soon, so disallow detach until it's
+ * done. This prevents us from racing with it detaching itself.
+ *
+ * Called with utrace->lock held, when @target->exit_state is nonzero.
+ */
+static inline int utrace_control_dead(struct task_struct *target,
+				      struct utrace *utrace,
+				      enum utrace_resume_action action)
+{
+	if (action != UTRACE_DETACH || unlikely(utrace->reap))
+		return -ESRCH;
+
+	if (unlikely(utrace->death))
+		/*
+		 * We have already started the death report. We can't
+		 * prevent the report_death and report_reap callbacks,
+		 * so tell the caller they will happen.
+		 */
+		return -EALREADY;
+
+	return 0;
+}
+
+/**
+ * utrace_control - control a thread being traced by a tracing engine
+ * @target: thread to affect
+ * @engine: attached engine to affect
+ * @action: &enum utrace_resume_action for thread to do
+ *
+ * This is how a tracing engine asks a traced thread to do something.
+ * This call is controlled by the @action argument, which has the
+ * same meaning as the &enum utrace_resume_action value returned by
+ * event reporting callbacks.
+ *
+ * If @target is already dead (@target->exit_state nonzero),
+ * all actions except %UTRACE_DETACH fail with -%ESRCH.
+ *
+ * The following sections describe each option for the @action argument.
+ *
+ * UTRACE_DETACH:
+ *
+ * After this, the @engine data structure is no longer accessible,
+ * and the thread might be reaped. The thread will start running
+ * again if it was stopped and no longer has any attached engines
+ * that want it stopped.
+ *
+ * If the @report_reap callback may already have begun, this fails
+ * with -%ESRCH. If the @report_death callback may already have
+ * begun, this fails with -%EALREADY.
+ *
+ * If @target is not already stopped, then a callback to this engine
+ * might be in progress or about to start on another CPU. If so,
+ * then this returns -%EINPROGRESS; the detach happens as soon as
+ * the pending callback is finished. To synchronize after an
+ * -%EINPROGRESS return, see utrace_barrier().
+ *
+ * If @target is properly stopped before utrace_control() is called,
+ * then after successful return it's guaranteed that no more callbacks
+ * to the @engine->ops vector will be made.
+ *
+ * The only exception is %SIGKILL (and exec or group-exit by another
+ * thread in the group), which can cause asynchronous @report_death
+ * and/or @report_reap callbacks even when %UTRACE_STOP was used.
+ * (In that event, this fails with -%ESRCH or -%EALREADY, see above.)
+ *
+ * UTRACE_STOP:
+ * This asks that @target stop running. This returns 0 only if
+ * @target is already stopped, either for tracing or for job
+ * control. Then @target will remain stopped until another
+ * utrace_control() call is made on @engine; @target can be woken
+ * only by %SIGKILL (or equivalent, such as exec or termination by
+ * another thread in the same thread group).
+ *
+ * This returns -%EINPROGRESS if @target is not already stopped.
+ * Then the effect is like %UTRACE_REPORT. A @report_quiesce or
+ * @report_signal callback will be made soon. Your callback can
+ * then return %UTRACE_STOP to keep @target stopped.
+ *
+ * This does not interrupt system calls in progress, including ones
+ * that sleep for a long time. For that, use %UTRACE_INTERRUPT.
+ * To interrupt system calls and then keep @target stopped, your
+ * @report_signal callback can return %UTRACE_STOP.
+ *
+ * UTRACE_RESUME:
+ *
+ * Just let @target continue running normally, reversing the effect
+ * of a previous %UTRACE_STOP. If another engine is keeping @target
+ * stopped, then it remains stopped until all engines let it resume.
+ * If @target was not stopped, this has no effect.
+ *
+ * UTRACE_REPORT:
+ *
+ * This is like %UTRACE_RESUME, but also ensures that there will be
+ * a @report_quiesce or @report_signal callback made soon. If
+ * @target had been stopped, then there will be a callback before it
+ * resumes running normally. If another engine is keeping @target
+ * stopped, then there might be no callbacks until all engines let
+ * it resume.
+ *
+ * UTRACE_INTERRUPT:
+ *
+ * This is like %UTRACE_REPORT, but ensures that @target will make a
+ * @report_signal callback before it resumes or delivers signals.
+ * If @target was in a system call or about to enter one, work in
+ * progress will be interrupted as if by %SIGSTOP. If another
+ * engine is keeping @target stopped, then there might be no
+ * callbacks until all engines let it resume.
+ *
+ * This gives @engine an opportunity to introduce a forced signal
+ * disposition via its @report_signal callback.
+ *
+ * UTRACE_SINGLESTEP:
+ *
+ * It's invalid to use this unless arch_has_single_step() returned true.
+ * This is like %UTRACE_RESUME, but resumes for one user instruction
+ * only. It's invalid to use this in utrace_control() unless @target
+ * had been stopped by @engine previously.
+ *
+ * Note that passing %UTRACE_SINGLESTEP or %UTRACE_BLOCKSTEP to
+ * utrace_control() or returning it from an event callback alone does
+ * not necessarily ensure that stepping will be enabled. If there are
+ * more callbacks made to any engine before returning to user mode,
+ * then the resume action is chosen only by the last set of callbacks.
+ * To be sure, enable %UTRACE_EVENT(%QUIESCE) and look for the
+ * @report_quiesce callback with a zero event mask, or the
+ * @report_signal callback with %UTRACE_SIGNAL_REPORT.
+ *
+ * UTRACE_BLOCKSTEP:
+ *
+ * It's invalid to use this unless arch_has_block_step() returned true.
+ * This is like %UTRACE_SINGLESTEP, but resumes for one whole basic
+ * block of user instructions.
+ *
+ * %UTRACE_BLOCKSTEP devolves to %UTRACE_SINGLESTEP when another
+ * tracing engine is using %UTRACE_SINGLESTEP at the same time.
+ */
+int utrace_control(struct task_struct *target,
+		   struct utrace_engine *engine,
+		   enum utrace_resume_action action)
+{
+	struct utrace *utrace;
+	bool resume;
+	int ret;
+
+	if (unlikely(action > UTRACE_DETACH))
+		return -EINVAL;
+
+	utrace = get_utrace_lock(target, engine, true);
+	if (unlikely(IS_ERR(utrace)))
+		return PTR_ERR(utrace);
+
+	if (target->exit_state) {
+		ret = utrace_control_dead(target, utrace, action);
+		if (ret) {
+			spin_unlock(&utrace->lock);
+			return ret;
+		}
+	}
+
+	resume = utrace->stopped;
+	ret = 0;
+
+	clear_engine_wants_stop(engine);
+	switch (action) {
+	case UTRACE_STOP:
+		mark_engine_wants_stop(engine);
+		if (!resume && !utrace_do_stop(target, utrace))
+			ret = -EINPROGRESS;
+		resume = false;
+		break;
+
+	case UTRACE_DETACH:
+		mark_engine_detached(engine);
+		resume = resume || utrace_do_stop(target, utrace);
+		if (!resume) {
+			/*
+			 * As in utrace_set_events(), this barrier ensures
+			 * that our engine->flags changes have hit before we
+			 * examine utrace->reporting, pairing with the barrier
+			 * in start_callback(). If @target has not yet hit
+			 * finish_callback() to clear utrace->reporting, we
+			 * might be in the middle of a callback to @engine.
+			 */
+			smp_mb();
+			if (utrace->reporting == engine)
+				ret = -EINPROGRESS;
+			break;
+		}
+		/* Fall through. */
+
+	case UTRACE_RESUME:
+		/*
+		 * This and all other cases imply resuming if stopped.
+		 * There might not be another report before it just
+		 * resumes, so make sure single-step is not left set.
+		 */
+		if (likely(resume))
+			user_disable_single_step(target);
+		break;
+
+	case UTRACE_REPORT:
+		/*
+		 * Make the thread call tracehook_notify_resume() soon.
+		 * But don't bother if it's already been interrupted.
+		 * In that case, utrace_get_signal() will be reporting soon.
+		 */
+		if (!utrace->report && !utrace->interrupt) {
+			utrace->report = 1;
+			set_notify_resume(target);
+		}
+		break;
+
+	case UTRACE_INTERRUPT:
+		/*
+		 * Make the thread call tracehook_get_signal() soon.
+		 */
+		if (utrace->interrupt)
+			break;
+		utrace->interrupt = 1;
+
+		/*
+		 * If it's not already stopped, interrupt it now.
+		 * We need the siglock here in case it calls
+		 * recalc_sigpending() and clears its own
+		 * TIF_SIGPENDING. By taking the lock, we've
+		 * serialized any later recalc_sigpending() after
+		 * our setting of utrace->interrupt to force it on.
+		 */
+		if (resume) {
+			/*
+			 * This is really just to keep the invariant
+			 * that TIF_SIGPENDING is set with utrace->interrupt.
+			 * When it's stopped, we know it's always going
+			 * through utrace_get_signal and will recalculate.
+			 */
+			set_tsk_thread_flag(target, TIF_SIGPENDING);
+		} else {
+			struct sighand_struct *sighand;
+			unsigned long irqflags;
+			sighand = lock_task_sighand(target, &irqflags);
+			if (likely(sighand)) {
+				signal_wake_up(target, 0);
+				unlock_task_sighand(target, &irqflags);
+			}
+		}
+		break;
+
+	case UTRACE_BLOCKSTEP:
+		/*
+		 * Resume from stopped, step one block.
+		 */
+		if (unlikely(!arch_has_block_step())) {
+			WARN_ON(1);
+			/* Fall through to treat it as SINGLESTEP. */
+		} else if (likely(resume)) {
+			user_enable_block_step(target);
+			break;
+		}
+
+	case UTRACE_SINGLESTEP:
+		/*
+		 * Resume from stopped, step one instruction.
+		 */
+		if (unlikely(!arch_has_single_step())) {
+			WARN_ON(1);
+			resume = false;
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		if (likely(resume))
+			user_enable_single_step(target);
+		else
+			/*
+			 * You were supposed to stop it before asking
+			 * it to step.
+			 */
+			ret = -EAGAIN;
+		break;
+	}
+
+	/*
+	 * Let the thread resume running. If it's not stopped now,
+	 * there is nothing more we need to do.
+	 */
+	if (resume)
+		utrace_reset(target, utrace, NULL);
+	else
+		spin_unlock(&utrace->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_control);
+
+/**
+ * utrace_barrier - synchronize with simultaneous tracing callbacks
+ * @target: thread to affect
+ * @engine: engine to affect (can be detached)
+ *
+ * This blocks while @target might be in the midst of making a callback to
+ * @engine. It can be interrupted by signals and will return -%ERESTARTSYS.
+ * A return value of zero means no callback from @target to @engine was
+ * in progress. Any effect of its return value (such as %UTRACE_STOP) has
+ * already been applied to @engine.
+ *
+ * It's not necessary to keep the @target pointer alive for this call.
+ * It's only necessary to hold a ref on @engine. This will return
+ * safely even if @target has been reaped and has no task refs.
+ *
+ * A successful return from utrace_barrier() guarantees its ordering
+ * with respect to utrace_set_events() and utrace_control() calls. If
+ * @target was not properly stopped, event callbacks just disabled might
+ * still be in progress; utrace_barrier() waits until there is no chance
+ * an unwanted callback can be in progress.
+ */
+int utrace_barrier(struct task_struct *target, struct utrace_engine *engine)
+{
+	struct utrace *utrace;
+	int ret = -ERESTARTSYS;
+
+	if (unlikely(target == current))
+		return 0;
+
+	do {
+		utrace = get_utrace_lock(target, engine, false);
+		if (unlikely(IS_ERR(utrace))) {
+			ret = PTR_ERR(utrace);
+			if (ret != -ERESTARTSYS)
+				break;
+		} else {
+			/*
+			 * All engine state changes are done while
+			 * holding the lock, i.e. before we get here.
+			 * Since we have the lock, we only need to
+			 * worry about @target making a callback.
+			 * When it has entered start_callback() but
+			 * not yet gotten to finish_callback(), we
+			 * will see utrace->reporting == @engine.
+			 * When @target doesn't take the lock, it uses
+			 * barriers to order setting utrace->reporting
+			 * before it examines the engine state.
+			 */
+			if (utrace->reporting != engine)
+				ret = 0;
+			spin_unlock(&utrace->lock);
+			if (!ret)
+				break;
+		}
+		schedule_timeout_interruptible(1);
+	} while (!signal_pending(current));
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_barrier);
+
+/*
+ * This is local state used for reporting loops, perhaps optimized away.
+ */
+struct utrace_report {
+	enum utrace_resume_action action;
+	u32 result;
+	bool detaches;
+	bool reports;
+	bool takers;
+	bool killed;
+};
+
+#define INIT_REPORT(var) \
+	struct utrace_report var = { UTRACE_RESUME, 0, \
+				     false, false, false, false }
+
+/*
+ * We are now making the report, so clear the flag saying we need one.
+ */
+static void start_report(struct utrace *utrace)
+{
+	BUG_ON(utrace->stopped);
+	if (utrace->report) {
+		spin_lock(&utrace->lock);
+		utrace->report = 0;
+		splice_attaching(utrace);
+		spin_unlock(&utrace->lock);
+	}
+}
+
+/*
+ * Complete a normal reporting pass, pairing with a start_report() call.
+ * This handles any UTRACE_DETACH or UTRACE_REPORT or UTRACE_INTERRUPT
+ * returns from engine callbacks. If any engine's last callback used
+ * UTRACE_STOP, we do UTRACE_REPORT here to ensure we stop before user
+ * mode. If there were no callbacks made, it will recompute
+ * @task->utrace_flags to avoid another false-positive.
+ */
+static void finish_report(struct utrace_report *report,
+			  struct task_struct *task, struct utrace *utrace)
+{
+	bool clean = (report->takers && !report->detaches);
+
+	if (report->action <= UTRACE_REPORT && !utrace->report) {
+		spin_lock(&utrace->lock);
+		utrace->report = 1;
+		set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
+	} else if (report->action == UTRACE_INTERRUPT && !utrace->interrupt) {
+		spin_lock(&utrace->lock);
+		utrace->interrupt = 1;
+		set_tsk_thread_flag(task, TIF_SIGPENDING);
+	} else if (clean) {
+		return;
+	} else {
+		spin_lock(&utrace->lock);
+	}
+
+	if (clean)
+		spin_unlock(&utrace->lock);
+	else
+		utrace_reset(task, utrace, &report->action);
+}
+
+/*
+ * Apply the return value of one engine callback to @report.
+ * Returns true if @engine detached and should not get any more callbacks.
+ */
+static bool finish_callback(struct utrace *utrace,
+			    struct utrace_report *report,
+			    struct utrace_engine *engine,
+			    u32 ret)
+{
+	enum utrace_resume_action action = utrace_resume_action(ret);
+
+	report->result = ret & ~UTRACE_RESUME_MASK;
+
+	/*
+	 * If utrace_control() was used, treat that like UTRACE_DETACH here.
+	 */
+	if (action == UTRACE_DETACH || engine->ops == &utrace_detached_ops) {
+		engine->ops = &utrace_detached_ops;
+		report->detaches = true;
+	} else {
+		if (action < report->action)
+			report->action = action;
+
+		if (action == UTRACE_STOP) {
+			if (!engine_wants_stop(engine)) {
+				spin_lock(&utrace->lock);
+				mark_engine_wants_stop(engine);
+				spin_unlock(&utrace->lock);
+			}
+		} else {
+			if (action == UTRACE_REPORT)
+				report->reports = true;
+
+			if (engine_wants_stop(engine)) {
+				spin_lock(&utrace->lock);
+				clear_engine_wants_stop(engine);
+				spin_unlock(&utrace->lock);
+			}
+		}
+	}
+
+	/*
+	 * Now that we have applied the effect of the return value,
+	 * clear this so that utrace_barrier() can stop waiting.
+	 * A subsequent utrace_control() can stop or resume @engine
+	 * and know this was ordered after its callback's action.
+	 *
+	 * We don't need any barriers here because utrace_barrier()
+	 * takes utrace->lock. If we touched engine->flags above,
+	 * the lock guaranteed this change was before utrace_barrier()
+	 * examined utrace->reporting.
+	 */
+	utrace->reporting = NULL;
+
+	/*
+	 * This is a good place to make sure tracing engines don't
+	 * introduce too much latency under voluntary preemption.
+	 */
+	if (need_resched())
+		cond_resched();
+
+	return engine->ops == &utrace_detached_ops;
+}
+
+/*
+ * Start the callbacks for @engine to consider @event (a bit mask).
+ * This makes the report_quiesce() callback first. If @engine wants
+ * a specific callback for @event, we return the ops vector to use.
+ * If not, we return NULL. The return value from the ops->callback
+ * function called should be passed to finish_callback().
+ */
+static const struct utrace_engine_ops *start_callback(
+	struct utrace *utrace, struct utrace_report *report,
+	struct utrace_engine *engine, struct task_struct *task,
+	unsigned long event)
+{
+	const struct utrace_engine_ops *ops;
+	unsigned long want;
+
+	/*
+	 * This barrier ensures that we've set utrace->reporting before
+	 * we examine engine->flags or engine->ops. utrace_barrier()
+	 * relies on this ordering to indicate that the effect of any
+	 * utrace_control() and utrace_set_events() calls is in place
+	 * by the time utrace->reporting can be seen to be NULL.
+	 */
+	utrace->reporting = engine;
+	smp_mb();
+
+	/*
+	 * This pairs with the barrier in mark_engine_detached().
+	 * It makes sure that we never see the old ops vector with
+	 * the new flags, in case the original vector had no report_quiesce.
+	 */
+	want = engine->flags;
+	smp_rmb();
+	ops = engine->ops;
+
+	if (want & UTRACE_EVENT(QUIESCE)) {
+		if (finish_callback(utrace, report, engine,
+				    (*ops->report_quiesce)(report->action,
+							   engine, task,
+							   event)))
+			return NULL;
+
+		/*
+		 * finish_callback() reset utrace->reporting after the
+		 * quiesce callback. Now we set it again (as above)
+		 * before re-examining engine->flags, which could have
+		 * been changed synchronously by ->report_quiesce or
+		 * asynchronously by utrace_control() or utrace_set_events().
+		 */
+		utrace->reporting = engine;
+		smp_mb();
+		want = engine->flags;
+	}
+
+	if (want & ENGINE_STOP)
+		report->action = UTRACE_STOP;
+
+	if (want & event) {
+		report->takers = true;
+		return ops;
+	}
+
+	return NULL;
+}
+
+/*
+ * Do a normal reporting pass for engines interested in @event.
+ * @callback is the name of the member in the ops vector, and remaining
+ * args are the extras it takes after the standard three args.
+ */
+#define REPORT(task, utrace, report, event, callback, ...) \
+	do { \
+		start_report(utrace); \
+		REPORT_CALLBACKS(task, utrace, report, event, callback, \
+				 (report)->action, engine, current, \
+				 ## __VA_ARGS__); \
+		finish_report(report, task, utrace); \
+	} while (0)
+#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...) \
+	do { \
+		struct utrace_engine *engine; \
+		const struct utrace_engine_ops *ops; \
+		list_for_each_entry(engine, &utrace->attached, entry) { \
+			ops = start_callback(utrace, report, engine, task, \
+					     event); \
+			if (!ops) \
+				continue; \
+			finish_callback(utrace, report, engine, \
+					(*ops->callback)(__VA_ARGS__)); \
+		} \
+	} while (0)
+
+/*
+ * Called iff UTRACE_EVENT(EXEC) flag is set.
+ */
+void utrace_report_exec(struct linux_binfmt *fmt, struct linux_binprm *bprm,
+			struct pt_regs *regs)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+
+	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	       report_exec, fmt, bprm, regs);
+}
+
+/*
+ * Called iff UTRACE_EVENT(SYSCALL_ENTRY) flag is set.
+ * Return true to prevent the system call.
+ */ +bool utrace_report_syscall_entry(struct pt_regs *regs) +{ + struct task_struct *task = current; + struct utrace *utrace = task_utrace_struct(task); + INIT_REPORT(report); + + start_report(utrace); + REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY), + report_syscall_entry, report.result | report.action, + engine, current, regs); + finish_report(&report, task, utrace); + + if (report.action == UTRACE_STOP && + unlikely(utrace_stop(task, utrace, false))) + /* + * We are continuing despite UTRACE_STOP because of a + * SIGKILL. Don't let the system call actually proceed. + */ + return true; + + if (unlikely(report.result == UTRACE_SYSCALL_ABORT)) + return true; + + if (signal_pending(task)) { + /* + * Clear TIF_SIGPENDING if it no longer needs to be set. + * It may have been set as part of quiescence, and won't + * ever have been cleared by another thread. For other + * reports, we can just leave it set and will go through + * utrace_get_signal() to reset things. But here we are + * about to enter a syscall, which might bail out with an + * -ERESTART* error if it's set now. + */ + spin_lock_irq(&task->sighand->siglock); + recalc_sigpending(); + spin_unlock_irq(&task->sighand->siglock); + } + + return false; +} + +/* + * Called iff UTRACE_EVENT(SYSCALL_EXIT) flag is set. + */ +void utrace_report_syscall_exit(struct pt_regs *regs) +{ + struct task_struct *task = current; + struct utrace *utrace = task_utrace_struct(task); + INIT_REPORT(report); + + REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT), + report_syscall_exit, regs); +} + +/* + * Called iff UTRACE_EVENT(CLONE) flag is set. + * This notification call blocks the wake_up_new_task call on the child. + * So we must not quiesce here. tracehook_report_clone_complete will do + * a quiescence check momentarily. 
+ */ +void utrace_report_clone(unsigned long clone_flags, struct task_struct *child) +{ + struct task_struct *task = current; + struct utrace *utrace = task_utrace_struct(task); + INIT_REPORT(report); + + /* + * We don't use the REPORT() macro here, because we need + * to clear utrace->cloning before finish_report(). + * After finish_report(), utrace can be a stale pointer + * in cases when report.action is still UTRACE_RESUME. + */ + start_report(utrace); + utrace->cloning = child; + + REPORT_CALLBACKS(task, utrace, &report, + UTRACE_EVENT(CLONE), report_clone, + report.action, engine, task, clone_flags, child); + + utrace->cloning = NULL; + finish_report(&report, task, utrace); + + /* + * For a vfork, we will go into an uninterruptible block waiting + * for the child. We need UTRACE_STOP to happen before this, not + * after. For CLONE_VFORK, utrace_finish_vfork() will be called. + */ + if (report.action == UTRACE_STOP && (clone_flags & CLONE_VFORK)) { + spin_lock(&utrace->lock); + utrace->vfork_stop = 1; + spin_unlock(&utrace->lock); + } +} + +/* + * We're called after utrace_report_clone() for a CLONE_VFORK. + * If UTRACE_STOP was left from the clone report, we stop here. + * After this, we'll enter the uninterruptible wait_for_completion() + * waiting for the child. + */ +void utrace_finish_vfork(struct task_struct *task) +{ + struct utrace *utrace = task_utrace_struct(task); + + spin_lock(&utrace->lock); + if (!utrace->vfork_stop) + spin_unlock(&utrace->lock); + else { + utrace->vfork_stop = 0; + spin_unlock(&utrace->lock); + utrace_stop(task, utrace, false); + } +} + +/* + * Called iff UTRACE_EVENT(JCTL) flag is set. + * + * Called with siglock held. + */ +void utrace_report_jctl(int notify, int what) +{ + struct task_struct *task = current; + struct utrace *utrace = task_utrace_struct(task); + INIT_REPORT(report); + bool stop = task_is_stopped(task); + + /* + * We have to come out of TASK_STOPPED in case the event report + * hooks might block. 
Since we held the siglock throughout, it's + * as if we were never in TASK_STOPPED yet at all. + */ + if (stop) { + __set_current_state(TASK_RUNNING); + task->signal->flags &= ~SIGNAL_STOP_STOPPED; + ++task->signal->group_stop_count; + } + spin_unlock_irq(&task->sighand->siglock); + + /* + * We get here with CLD_STOPPED when we've just entered + * TASK_STOPPED, or with CLD_CONTINUED when we've just come + * out but not yet been through utrace_get_signal() again. + * + * While in TASK_STOPPED, we can be considered safely + * stopped by utrace_do_stop() and detached asynchronously. + * If we woke up and checked task->utrace_flags before that + * was finished, we might be here with utrace already + * removed or in the middle of being removed. + * + * If we are indeed attached, then make sure we are no + * longer considered stopped while we run callbacks. + */ + spin_lock(&utrace->lock); + utrace->stopped = 0; + /* + * Do start_report()'s work too since we already have the lock anyway. + */ + utrace->report = 0; + splice_attaching(utrace); + spin_unlock(&utrace->lock); + + REPORT(task, utrace, &report, UTRACE_EVENT(JCTL), + report_jctl, what, notify); + + /* + * Retake the lock, and go back into TASK_STOPPED + * unless the stop was just cleared. + */ + spin_lock_irq(&task->sighand->siglock); + if (stop && task->signal->group_stop_count > 0) { + __set_current_state(TASK_STOPPED); + if (--task->signal->group_stop_count == 0) + task->signal->flags |= SIGNAL_STOP_STOPPED; + } +} + +/* + * Called iff UTRACE_EVENT(EXIT) flag is set. + */ +void utrace_report_exit(long *exit_code) +{ + struct task_struct *task = current; + struct utrace *utrace = task_utrace_struct(task); + INIT_REPORT(report); + long orig_code = *exit_code; + + REPORT(task, utrace, &report, UTRACE_EVENT(EXIT), + report_exit, orig_code, exit_code); + + if (report.action == UTRACE_STOP) + utrace_stop(task, utrace, false); +} + +/* + * Called iff UTRACE_EVENT(DEATH) or UTRACE_EVENT(QUIESCE) flag is set. 
+ * + * It is always possible that we are racing with utrace_release_task here. + * For this reason, utrace_release_task checks for the event bits that get + * us here, and delays its cleanup for us to do. + */ +void utrace_report_death(struct task_struct *task, struct utrace *utrace, + bool group_dead, int signal) +{ + INIT_REPORT(report); + + BUG_ON(!task->exit_state); + + /* + * We are presently considered "quiescent"--which is accurate + * inasmuch as we won't run any more user instructions ever again. + * But for utrace_control and utrace_set_events to be robust, they + * must be sure whether or not we will run any more callbacks. If + * a call comes in before we do, taking the lock here synchronizes + * us so we don't run any callbacks just disabled. Calls that come + * in while we're running the callbacks will see the exit.death + * flag and know that we are not yet fully quiescent for purposes + * of detach bookkeeping. + */ + spin_lock(&utrace->lock); + BUG_ON(utrace->death); + utrace->death = 1; + utrace->report = 0; + utrace->interrupt = 0; + spin_unlock(&utrace->lock); + + REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH), + report_death, engine, task, group_dead, signal); + + spin_lock(&utrace->lock); + + /* + * After we unlock (possibly inside utrace_reap for callbacks) with + * this flag clear, competing utrace_control/utrace_set_events calls + * know that we've finished our callbacks and any detach bookkeeping. + */ + utrace->death = 0; + + if (utrace->reap) + /* + * utrace_release_task() was already called in parallel. + * We must complete its work now. + */ + utrace_reap(task, utrace); + else + utrace_reset(task, utrace, &report.action); +} + +/* + * Finish the last reporting pass before returning to user mode. 
+ */ +static void finish_resume_report(struct utrace_report *report, + struct task_struct *task, + struct utrace *utrace) +{ + if (report->detaches || !report->takers) { + spin_lock(&utrace->lock); + utrace_reset(task, utrace, &report->action); + } + + switch (report->action) { + case UTRACE_STOP: + report->killed = utrace_stop(task, utrace, report->reports); + break; + + case UTRACE_INTERRUPT: + if (!signal_pending(task)) + set_tsk_thread_flag(task, TIF_SIGPENDING); + break; + + case UTRACE_SINGLESTEP: + user_enable_single_step(task); + break; + + case UTRACE_BLOCKSTEP: + user_enable_block_step(task); + break; + + case UTRACE_REPORT: + case UTRACE_RESUME: + default: + user_disable_single_step(task); + break; + } +} + +/* + * This is called when TIF_NOTIFY_RESUME had been set (and is now clear). + * We are close to user mode, and this is the place to report or stop. + * When we return, we're going to user mode or into the signals code. + */ +void utrace_resume(struct task_struct *task, struct pt_regs *regs) +{ + struct utrace *utrace = task_utrace_struct(task); + INIT_REPORT(report); + struct utrace_engine *engine; + + /* + * Some machines get here with interrupts disabled. The same arch + * code path leads to calling into get_signal_to_deliver(), which + * implicitly reenables them by virtue of spin_unlock_irq. + */ + local_irq_enable(); + + /* + * If this flag is still set it's because there was a signal + * handler setup done but no report_signal following it. Clear + * the flag before we get to user so it doesn't confuse us later. + */ + if (unlikely(utrace->signal_handler)) { + int skip; + spin_lock(&utrace->lock); + utrace->signal_handler = 0; + skip = !utrace->report; + spin_unlock(&utrace->lock); + if (skip) + return; + } + + /* + * If UTRACE_INTERRUPT was just used, we don't bother with a + * report here. We will report and stop in utrace_get_signal(). 
+ */ + if (unlikely(utrace->interrupt)) + return; + + /* + * Do a simple reporting pass, with no callback after report_quiesce. + */ + start_report(utrace); + + list_for_each_entry(engine, &utrace->attached, entry) + start_callback(utrace, &report, engine, task, 0); + + /* + * Finish the report and either stop or get ready to resume. + */ + finish_resume_report(&report, task, utrace); +} + +/* + * Return true if current has forced signal_pending(). + * + * This is called only when current->utrace_flags is nonzero, so we know + * that current->utrace must be set. It's not inlined in tracehook.h + * just so that struct utrace can stay opaque outside this file. + */ +bool utrace_interrupt_pending(void) +{ + return task_utrace_struct(current)->interrupt; +} + +/* + * Take the siglock and push @info back on our queue. + * Returns with @task->sighand->siglock held. + */ +static void push_back_signal(struct task_struct *task, siginfo_t *info) + __acquires(task->sighand->siglock) +{ + struct sigqueue *q; + + if (unlikely(!info->si_signo)) { /* Oh, a wise guy! */ + spin_lock_irq(&task->sighand->siglock); + return; + } + + q = sigqueue_alloc(); + if (likely(q)) { + q->flags = 0; + copy_siginfo(&q->info, info); + } + + spin_lock_irq(&task->sighand->siglock); + + sigaddset(&task->pending.signal, info->si_signo); + if (likely(q)) + list_add(&q->list, &task->pending.list); + + set_tsk_thread_flag(task, TIF_SIGPENDING); +} + +/* + * This is the hook from the signals code, called with the siglock held. + * Here is the ideal place to stop. We also dequeue and intercept signals. 
+ */ +int utrace_get_signal(struct task_struct *task, struct pt_regs *regs, + siginfo_t *info, struct k_sigaction *return_ka) + __releases(task->sighand->siglock) + __acquires(task->sighand->siglock) +{ + struct utrace *utrace; + struct k_sigaction *ka; + INIT_REPORT(report); + struct utrace_engine *engine; + const struct utrace_engine_ops *ops; + unsigned long event, want; + u32 ret; + int signr; + + utrace = &task->utrace; + if (utrace->interrupt || utrace->report || utrace->signal_handler) { + /* + * We've been asked for an explicit report before we + * even check for pending signals. + */ + + spin_unlock_irq(&task->sighand->siglock); + + spin_lock(&utrace->lock); + + splice_attaching(utrace); + + if (unlikely(!utrace->interrupt) && unlikely(!utrace->report)) + report.result = UTRACE_SIGNAL_IGN; + else if (utrace->signal_handler) + report.result = UTRACE_SIGNAL_HANDLER; + else + report.result = UTRACE_SIGNAL_REPORT; + + /* + * We are now making the report and it's on the + * interrupt path, so clear the flags asking for those. + */ + utrace->interrupt = utrace->report = utrace->signal_handler = 0; + utrace->stopped = 0; + + /* + * Make sure signal_pending() only returns true + * if there are real signals pending. + */ + if (signal_pending(task)) { + spin_lock_irq(&task->sighand->siglock); + recalc_sigpending(); + spin_unlock_irq(&task->sighand->siglock); + } + + spin_unlock(&utrace->lock); + + if (unlikely(report.result == UTRACE_SIGNAL_IGN)) + /* + * We only got here to clear utrace->signal_handler. + */ + return -1; + + /* + * Do a reporting pass for no signal, just for EVENT(QUIESCE). + * The engine callbacks can fill in *info and *return_ka. + * We'll pass NULL for the @orig_ka argument to indicate + * that there was no original signal. 
+ */ + event = 0; + ka = NULL; + memset(return_ka, 0, sizeof *return_ka); + } else if ((task->utrace_flags & UTRACE_EVENT_SIGNAL_ALL) == 0 && + !utrace->stopped) { + /* + * If no engine is interested in intercepting signals, + * let the caller just dequeue them normally. + */ + return 0; + } else { + if (unlikely(utrace->stopped)) { + spin_unlock_irq(&task->sighand->siglock); + spin_lock(&utrace->lock); + utrace->stopped = 0; + spin_unlock(&utrace->lock); + spin_lock_irq(&task->sighand->siglock); + } + + /* + * Steal the next signal so we can let tracing engines + * examine it. From the signal number and sigaction, + * determine what normal delivery would do. If no + * engine perturbs it, we'll do that by returning the + * signal number after setting *return_ka. + */ + signr = dequeue_signal(task, &task->blocked, info); + if (signr == 0) + return signr; + BUG_ON(signr != info->si_signo); + + ka = &task->sighand->action[signr - 1]; + *return_ka = *ka; + + /* + * We are never allowed to interfere with SIGKILL. + * Just punt after filling in *return_ka for our caller. 
+ */ + if (signr == SIGKILL) + return signr; + + if (ka->sa.sa_handler == SIG_IGN) { + event = UTRACE_EVENT(SIGNAL_IGN); + report.result = UTRACE_SIGNAL_IGN; + } else if (ka->sa.sa_handler != SIG_DFL) { + event = UTRACE_EVENT(SIGNAL); + report.result = UTRACE_SIGNAL_DELIVER; + } else if (sig_kernel_coredump(signr)) { + event = UTRACE_EVENT(SIGNAL_CORE); + report.result = UTRACE_SIGNAL_CORE; + } else if (sig_kernel_ignore(signr)) { + event = UTRACE_EVENT(SIGNAL_IGN); + report.result = UTRACE_SIGNAL_IGN; + } else if (signr == SIGSTOP) { + event = UTRACE_EVENT(SIGNAL_STOP); + report.result = UTRACE_SIGNAL_STOP; + } else if (sig_kernel_stop(signr)) { + event = UTRACE_EVENT(SIGNAL_STOP); + report.result = UTRACE_SIGNAL_TSTP; + } else { + event = UTRACE_EVENT(SIGNAL_TERM); + report.result = UTRACE_SIGNAL_TERM; + } + + /* + * Now that we know what event type this signal is, we + * can short-circuit if no engines care about those. + */ + if ((task->utrace_flags & (event | UTRACE_EVENT(QUIESCE))) == 0) + return signr; + + /* + * We have some interested engines, so tell them about + * the signal and let them change its disposition. + */ + spin_unlock_irq(&task->sighand->siglock); + } + + /* + * This reporting pass chooses what signal disposition we'll act on. + */ + list_for_each_entry(engine, &utrace->attached, entry) { + /* + * See start_callback() comment about this barrier. + */ + utrace->reporting = engine; + smp_mb(); + + /* + * This pairs with the barrier in mark_engine_detached(), + * see start_callback() comments. 
+ */ + want = engine->flags; + smp_rmb(); + ops = engine->ops; + + if ((want & (event | UTRACE_EVENT(QUIESCE))) == 0) { + utrace->reporting = NULL; + continue; + } + + if (ops->report_signal) + ret = (*ops->report_signal)( + report.result | report.action, engine, task, + regs, info, ka, return_ka); + else + ret = (report.result | (*ops->report_quiesce)( + report.action, engine, task, event)); + + /* + * Avoid a tight loop reporting again and again if some + * engine is too stupid. + */ + switch (utrace_resume_action(ret)) { + default: + break; + case UTRACE_INTERRUPT: + case UTRACE_REPORT: + ret = (ret & ~UTRACE_RESUME_MASK) | UTRACE_RESUME; + break; + } + + finish_callback(utrace, &report, engine, ret); + } + + /* + * We express the chosen action to the signals code in terms + * of a representative signal whose default action does it. + * Our caller uses our return value (signr) to decide what to + * do, but uses info->si_signo as the signal number to report. + */ + switch (utrace_signal_action(report.result)) { + case UTRACE_SIGNAL_TERM: + signr = SIGTERM; + break; + + case UTRACE_SIGNAL_CORE: + signr = SIGQUIT; + break; + + case UTRACE_SIGNAL_STOP: + signr = SIGSTOP; + break; + + case UTRACE_SIGNAL_TSTP: + signr = SIGTSTP; + break; + + case UTRACE_SIGNAL_DELIVER: + signr = info->si_signo; + + if (return_ka->sa.sa_handler == SIG_DFL) { + /* + * We'll do signr's normal default action. + * For ignore, we'll fall through below. + * For stop/death, break locks and returns it. + */ + if (likely(signr) && !sig_kernel_ignore(signr)) + break; + } else if (return_ka->sa.sa_handler != SIG_IGN && + likely(signr)) { + /* + * Complete the bookkeeping after the report. + * The handler will run. If an engine wanted to + * stop or step, then make sure we do another + * report after signal handler setup. 
+ */ + if (report.action != UTRACE_RESUME) + report.action = UTRACE_INTERRUPT; + finish_report(&report, task, utrace); + + if (unlikely(report.result & UTRACE_SIGNAL_HOLD)) + push_back_signal(task, info); + else + spin_lock_irq(&task->sighand->siglock); + + /* + * We do the SA_ONESHOT work here since the + * normal path will only touch *return_ka now. + */ + if (unlikely(return_ka->sa.sa_flags & SA_ONESHOT)) { + return_ka->sa.sa_flags &= ~SA_ONESHOT; + if (likely(valid_signal(signr))) { + ka = &task->sighand->action[signr - 1]; + ka->sa.sa_handler = SIG_DFL; + } + } + + return signr; + } + + /* Fall through for an ignored signal. */ + + case UTRACE_SIGNAL_IGN: + case UTRACE_SIGNAL_REPORT: + default: + /* + * If the signal is being ignored, then we are on the way + * directly back to user mode. We can stop here, or step, + * as in utrace_resume(), above. After we've dealt with that, + * our caller will relock and come back through here. + */ + finish_resume_report(&report, task, utrace); + + if (unlikely(report.killed)) { + /* + * The only reason we woke up now was because of a + * SIGKILL. Don't do normal dequeuing in case it + * might get a signal other than SIGKILL. That would + * perturb the death state so it might differ from + * what the debugger would have allowed to happen. + * Instead, pluck out just the SIGKILL to be sure + * we'll die immediately with nothing else different + * from the quiescent state the debugger wanted us in. + */ + sigset_t sigkill_only; + siginitsetinv(&sigkill_only, sigmask(SIGKILL)); + spin_lock_irq(&task->sighand->siglock); + signr = dequeue_signal(task, &sigkill_only, info); + BUG_ON(signr != SIGKILL); + *return_ka = task->sighand->action[SIGKILL - 1]; + return signr; + } + + if (unlikely(report.result & UTRACE_SIGNAL_HOLD)) { + push_back_signal(task, info); + spin_unlock_irq(&task->sighand->siglock); + } + + return -1; + } + + /* + * Complete the bookkeeping after the report. + * This sets utrace->report if UTRACE_STOP was used. 
+ */ + finish_report(&report, task, utrace); + + return_ka->sa.sa_handler = SIG_DFL; + + if (unlikely(report.result & UTRACE_SIGNAL_HOLD)) + push_back_signal(task, info); + else + spin_lock_irq(&task->sighand->siglock); + + if (sig_kernel_stop(signr)) + task->signal->flags |= SIGNAL_STOP_DEQUEUED; + + return signr; +} + +/* + * This gets called after a signal handler has been set up. + * We set a flag so the next report knows it happened. + * If we're already stepping, make sure we do a report_signal. + * If not, make sure we get into utrace_resume() where we can + * clear the signal_handler flag before resuming. + */ +void utrace_signal_handler(struct task_struct *task, int stepping) +{ + struct utrace *utrace = task_utrace_struct(task); + + spin_lock(&utrace->lock); + + utrace->signal_handler = 1; + if (stepping) { + utrace->interrupt = 1; + set_tsk_thread_flag(task, TIF_SIGPENDING); + } else { + set_tsk_thread_flag(task, TIF_NOTIFY_RESUME); + } + + spin_unlock(&utrace->lock); +} + +/** + * utrace_prepare_examine - prepare to examine thread state + * @target: thread of interest, a &struct task_struct pointer + * @engine: engine pointer returned by utrace_attach_task() + * @exam: temporary state, a &struct utrace_examiner pointer + * + * This call prepares to safely examine the thread @target using + * &struct user_regset calls, or direct access to thread-synchronous fields. + * + * When @target is current, this call is superfluous. When @target is + * another thread, it must be held stopped via %UTRACE_STOP by @engine. + * + * This call may block the caller until @target stays stopped, so it must + * be called only after the caller is sure @target is about to unschedule. + * This means a zero return from a utrace_control() call on @engine giving + * %UTRACE_STOP, or a report_quiesce() or report_signal() callback to + * @engine that used %UTRACE_STOP in its return value. + * + * Returns -%ESRCH if @target is dead or -%EINVAL if %UTRACE_STOP was + * not used.
If @target has started running again despite %UTRACE_STOP + * (for %SIGKILL or a spurious wakeup), this call returns -%EAGAIN. + * + * When this call returns zero, it's safe to use &struct user_regset + * calls and task_user_regset_view() on @target and to examine some of + * its fields directly. When the examination is complete, a + * utrace_finish_examine() call must follow to check whether it was + * completed safely. + */ +int utrace_prepare_examine(struct task_struct *target, + struct utrace_engine *engine, + struct utrace_examiner *exam) +{ + int ret = 0; + + if (unlikely(target == current)) + return 0; + + rcu_read_lock(); + if (unlikely(!engine_wants_stop(engine))) + ret = -EINVAL; + else if (unlikely(target->exit_state)) + ret = -ESRCH; + else { + exam->state = target->state; + if (unlikely(exam->state == TASK_RUNNING)) + ret = -EAGAIN; + else + get_task_struct(target); + } + rcu_read_unlock(); + + if (likely(!ret)) { + exam->ncsw = wait_task_inactive(target, exam->state); + put_task_struct(target); + if (unlikely(!exam->ncsw)) + ret = -EAGAIN; + } + + return ret; +} +EXPORT_SYMBOL_GPL(utrace_prepare_examine); + +/** + * utrace_finish_examine - complete an examination of thread state + * @target: thread of interest, a &struct task_struct pointer + * @engine: engine pointer returned by utrace_attach_task() + * @exam: pointer passed to utrace_prepare_examine() call + * + * This call completes an examination on the thread @target begun by a + * paired utrace_prepare_examine() call with the same arguments that + * returned success (zero). + * + * When @target is current, this call is superfluous. When @target is + * another thread, this returns zero if @target has remained unscheduled + * since the paired utrace_prepare_examine() call returned zero. + * + * When this returns an error, any examination done since the paired + * utrace_prepare_examine() call is unreliable and the data extracted + * should be discarded. 
The error is -%EINVAL if @engine is not + * keeping @target stopped, or -%EAGAIN if @target woke up unexpectedly. + */ +int utrace_finish_examine(struct task_struct *target, + struct utrace_engine *engine, + struct utrace_examiner *exam) +{ + int ret = 0; + + if (unlikely(target == current)) + return 0; + + rcu_read_lock(); + if (unlikely(!engine_wants_stop(engine))) + ret = -EINVAL; + else if (unlikely(target->state != exam->state)) + ret = -EAGAIN; + else + get_task_struct(target); + rcu_read_unlock(); + + if (likely(!ret)) { + unsigned long ncsw = wait_task_inactive(target, exam->state); + if (unlikely(ncsw != exam->ncsw)) + ret = -EAGAIN; + put_task_struct(target); + } + + return ret; +} +EXPORT_SYMBOL_GPL(utrace_finish_examine); + +/* + * This is declared in linux/regset.h and defined in machine-dependent + * code. We put the export here to ensure no machine forgets it. + */ +EXPORT_SYMBOL_GPL(task_user_regset_view); + +/* + * Called with rcu_read_lock() held. + */ +void task_utrace_proc_status(struct seq_file *m, struct task_struct *p) +{ + struct utrace *utrace = &p->utrace; + seq_printf(m, "Utrace: %lx%s%s%s\n", + p->utrace_flags, + utrace->stopped ? " (stopped)" : "", + utrace->report ? " (report)" : "", + utrace->interrupt ? " (interrupt)" : ""); +} From roland at redhat.com Sat Mar 21 01:42:44 2009 From: roland at redhat.com (Roland McGrath) Date: Fri, 20 Mar 2009 18:42:44 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: Roland McGrath's message of Friday, 20 March 2009 18:39:46 -0700 <20090321013946.890F4FC3AB@magilla.sf.frob.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> Message-ID: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> From: Frank Ch. Eigler This is v2 of the prototype utrace-ftrace interface. This code is based on Roland McGrath's utrace API, which provides programmatic hooks to the in-tree tracehook layer. 
This new patch interfaces many of those events to ftrace, as configured
by a small number of debugfs controls.

Here's the /debugfs/tracing/process_trace_README:

process event tracer mini-HOWTO

1. Select process hierarchy to monitor. Other processes will be
   completely unaffected. Leave at 0 for system-wide tracing.
%  echo NNN > process_follow_pid

2. Determine which process event traces are potentially desired.
   syscall and signal tracing slow down monitored processes.
%  echo 0 > process_trace_{syscalls,signals,lifecycle}

3. Add any final uid- or taskcomm-based filtering. Non-matching
   processes will skip trace messages, but will still be slowed.
%  echo NNN > process_trace_uid_filter # -1: unrestricted
%  echo ls > process_trace_taskcomm_filter # empty: unrestricted

4. Start tracing.
%  echo process > current_tracer

5. Examine trace.
%  cat trace

6. Stop tracing.
%  echo nop > current_tracer

Signed-off-by: Frank Ch. Eigler
---
 include/linux/processtrace.h |   41 +++
 kernel/trace/Kconfig         |    9 +
 kernel/trace/Makefile        |    1 +
 kernel/trace/trace.h         |    8 +
 kernel/trace/trace_process.c |  601 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 660 insertions(+), 0 deletions(-)

diff --git a/include/linux/processtrace.h b/include/linux/processtrace.h
new file mode 100644
index ...f2b7d94 100644
--- /dev/null
+++ b/include/linux/processtrace.h
@@ -0,0 +1,41 @@
+#ifndef PROCESSTRACE_H
+#define PROCESSTRACE_H
+
+#include
+#include
+
+struct process_trace_entry {
+	unsigned char opcode;	/* one of _UTRACE_EVENT_* */
+	char comm[TASK_COMM_LEN]; /* XXX: should be in/via trace_entry */
+	union {
+		struct {
+			pid_t child;
+			unsigned long flags;
+		} trace_clone;
+		struct {
+			long code;
+		} trace_exit;
+		struct {
+		} trace_exec;
+		struct {
+			int si_signo;
+			int si_errno;
+			int si_code;
+		} trace_signal;
+		struct {
+			long callno;
+			unsigned long args[6];
+		} trace_syscall_entry;
+		struct {
+			long rc;
+			long error;
+		} trace_syscall_exit;
+	};
+};
+
+/* in kernel/trace/trace_process.c */
+
+extern void enable_process_trace(void); +extern void disable_process_trace(void); + +#endif /* PROCESSTRACE_H */ diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 34e707e..8a92d6f 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -150,6 +150,15 @@ config CONTEXT_SWITCH_TRACER This tracer gets called from the context switch and records all switching of tasks. +config PROCESS_TRACER + bool "Trace process events via utrace" + depends on DEBUG_KERNEL + select TRACING + select UTRACE + help + This tracer provides trace records from process events + accessible to utrace: lifecycle, system calls, and signals. + config BOOT_TRACER bool "Trace boot initcalls" depends on DEBUG_KERNEL diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 349d5a9..a774db2 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -33,5 +33,6 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += t obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o obj-$(CONFIG_HW_BRANCH_TRACER) += trace_hw_branches.o obj-$(CONFIG_POWER_TRACER) += trace_power.o +obj-$(CONFIG_PROCESS_TRACER) += trace_process.o libftrace-y := ftrace.o diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4d3d381..c4d2e7f 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -30,6 +31,7 @@ enum trace_type { TRACE_USER_STACK, TRACE_HW_BRANCHES, TRACE_POWER, + TRACE_PROCESS, __TRACE_LAST_TYPE }; @@ -170,6 +172,11 @@ struct trace_power { struct power_trace state_data; }; +struct trace_process { + struct trace_entry ent; + struct process_trace_entry event; +}; + /* * trace_flag_type is an enumeration that holds different * states when a trace occurs. 
These are: @@ -280,6 +287,7 @@ extern void __ftrace_bad_type(void); TRACE_GRAPH_RET); \ IF_ASSIGN(var, ent, struct hw_branch_entry, TRACE_HW_BRANCHES);\ IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \ + IF_ASSIGN(var, ent, struct trace_process, TRACE_PROCESS); \ __ftrace_bad_type(); \ } while (0) diff --git a/kernel/trace/trace_process.c b/kernel/trace/trace_process.c new file mode 100644 index ...0820e56 100644 --- /dev/null +++ b/kernel/trace/trace_process.c @@ -0,0 +1,601 @@ +/* + * utrace-based process event tracing + * Copyright (C) 2009 Red Hat Inc. + * By Frank Ch. Eigler + * + * Based on mmio ftrace engine by Pekka Paalanen + * and utrace-syscall-tracing prototype by Ananth Mavinakayanahalli + */ + +/* #define DEBUG 1 */ + +#include +#include +#include +#include +#include + +#include "trace.h" + +/* A process must match these filters in order to be traced. */ +static char trace_taskcomm_filter[TASK_COMM_LEN]; /* \0: unrestricted */ +static u32 trace_taskuid_filter = -1; /* -1: unrestricted */ +static u32 trace_lifecycle_p = 1; +static u32 trace_syscalls_p = 1; +static u32 trace_signals_p = 1; + +/* A process must be a direct child of given pid in order to be + followed. */ +static u32 process_follow_pid; /* 0: unrestricted/systemwide */ + +/* XXX: lock the above? 
*/ + + +/* trace data collection */ + +static struct trace_array *process_trace_array; + +static void process_reset_data(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + tracing_reset_online_cpus(tr); +} + +static int process_trace_init(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + process_trace_array = tr; + process_reset_data(tr); + enable_process_trace(); + return 0; +} + +static void process_trace_reset(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + disable_process_trace(); + process_reset_data(tr); + process_trace_array = NULL; +} + +static void process_trace_start(struct trace_array *tr) +{ + pr_debug("in %s\n", __func__); + process_reset_data(tr); +} + +static void __trace_processtrace(struct trace_array *tr, + struct trace_array_cpu *data, + struct process_trace_entry *ent) +{ + struct ring_buffer_event *event; + struct trace_process *entry; + unsigned long irq_flags; + + event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry), + &irq_flags); + if (!event) + return; + entry = ring_buffer_event_data(event); + tracing_generic_entry_update(&entry->ent, 0, preempt_count()); + entry->ent.cpu = raw_smp_processor_id(); + entry->ent.type = TRACE_PROCESS; + strlcpy(ent->comm, current->comm, TASK_COMM_LEN); + entry->event = *ent; + ring_buffer_unlock_commit(tr->buffer, event, irq_flags); + + trace_wake_up(); +} + +void process_trace(struct process_trace_entry *ent) +{ + struct trace_array *tr = process_trace_array; + struct trace_array_cpu *data; + + preempt_disable(); + data = tr->data[smp_processor_id()]; + __trace_processtrace(tr, data, ent); + preempt_enable(); +} + + +/* trace data rendering */ + +static void process_pipe_open(struct trace_iterator *iter) +{ + struct trace_seq *s = &iter->seq; + pr_debug("in %s\n", __func__); + trace_seq_printf(s, "VERSION 200901\n"); +} + +static void process_close(struct trace_iterator *iter) +{ + iter->private = NULL; +} + +static ssize_t process_read(struct trace_iterator 
*iter, struct file *filp, + char __user *ubuf, size_t cnt, loff_t *ppos) +{ + ssize_t ret; + struct trace_seq *s = &iter->seq; + ret = trace_seq_to_user(s, ubuf, cnt); + return (ret == -EBUSY) ? 0 : ret; +} + +static enum print_line_t process_print(struct trace_iterator *iter) +{ + struct trace_entry *entry = iter->ent; + struct trace_process *field; + struct trace_seq *s = &iter->seq; + unsigned long long t = ns2usecs(iter->ts); + unsigned long usec_rem = do_div(t, 1000000ULL); + unsigned secs = (unsigned long)t; + int ret = 1; + + trace_assign_type(field, entry); + + /* XXX: If print_lat_fmt() were not static, we wouldn't have + to duplicate this. */ + trace_seq_printf(s, "%16s %5d %3d %9lu.%06ld ", + field->event.comm, + entry->pid, entry->cpu, + secs, + usec_rem); + + switch (field->event.opcode) { + case _UTRACE_EVENT_CLONE: + ret = trace_seq_printf(s, "fork %d flags 0x%lx\n", + field->event.trace_clone.child, + field->event.trace_clone.flags); + break; + case _UTRACE_EVENT_EXEC: + ret = trace_seq_printf(s, "exec\n"); + break; + case _UTRACE_EVENT_EXIT: + ret = trace_seq_printf(s, "exit %ld\n", + field->event.trace_exit.code); + break; + case _UTRACE_EVENT_SIGNAL: + ret = trace_seq_printf(s, "signal %d errno %d code 0x%x\n", + field->event.trace_signal.si_signo, + field->event.trace_signal.si_errno, + field->event.trace_signal.si_code); + break; + case _UTRACE_EVENT_SYSCALL_ENTRY: + ret = trace_seq_printf(s, "syscall %ld [0x%lx 0x%lx 0x%lx" + " 0x%lx 0x%lx 0x%lx]\n", + field->event.trace_syscall_entry.callno, + field->event.trace_syscall_entry.args[0], + field->event.trace_syscall_entry.args[1], + field->event.trace_syscall_entry.args[2], + field->event.trace_syscall_entry.args[3], + field->event.trace_syscall_entry.args[4], + field->event.trace_syscall_entry.args[5]); + break; + case _UTRACE_EVENT_SYSCALL_EXIT: + ret = trace_seq_printf(s, "syscall rc %ld error %ld\n", + field->event.trace_syscall_exit.rc, + field->event.trace_syscall_exit.error); + break; + 
default: + ret = trace_seq_printf(s, "process code %d?\n", + field->event.opcode); + break; + } + if (ret) + return TRACE_TYPE_HANDLED; + return TRACE_TYPE_PARTIAL_LINE; +} + + +static enum print_line_t process_print_line(struct trace_iterator *iter) +{ + switch (iter->ent->type) { + case TRACE_PROCESS: + return process_print(iter); + default: + return TRACE_TYPE_HANDLED; /* ignore unknown entries */ + } +} + +static struct tracer process_tracer = { + .name = "process", + .init = process_trace_init, + .reset = process_trace_reset, + .start = process_trace_start, + .pipe_open = process_pipe_open, + .close = process_close, + .read = process_read, + .print_line = process_print_line, +}; + + + +/* utrace backend */ + +/* Should tracing apply to given task? Compare against filter + values. */ +static int trace_test(struct task_struct *tsk) +{ + if (trace_taskcomm_filter[0] + && strncmp(trace_taskcomm_filter, tsk->comm, TASK_COMM_LEN)) + return 0; + + if (trace_taskuid_filter != (u32)-1 + && trace_taskuid_filter != task_uid(tsk)) + return 0; + + return 1; +} + + +static const struct utrace_engine_ops process_trace_ops; + +static void process_trace_tryattach(struct task_struct *tsk) +{ + struct utrace_engine *engine; + + pr_debug("in %s\n", __func__); + engine = utrace_attach_task(tsk, + UTRACE_ATTACH_CREATE | + UTRACE_ATTACH_EXCLUSIVE, + &process_trace_ops, NULL); + if (IS_ERR(engine) || (engine == NULL)) { + pr_warning("utrace_attach_task %d (rc %p)\n", + tsk->pid, engine); + } else { + int rc; + + /* We always hook cost-free events. */ + unsigned long events = + UTRACE_EVENT(CLONE) | + UTRACE_EVENT(EXEC) | + UTRACE_EVENT(EXIT); + + /* Penalizing events are individually controlled, so that + utrace doesn't even take the monitored threads off their + fast paths, nor bother call our callbacks. 
*/ + if (trace_syscalls_p) + events |= UTRACE_EVENT_SYSCALL; + if (trace_signals_p) + events |= UTRACE_EVENT_SIGNAL_ALL; + + rc = utrace_set_events(tsk, engine, events); + if (rc == -EINPROGRESS) + rc = utrace_barrier(tsk, engine); + if (rc) + pr_warning("utrace_set_events/barrier rc %d\n", rc); + + utrace_engine_put(engine); + pr_debug("attached in %s to %s(%d)\n", __func__, + tsk->comm, tsk->pid); + } +} + + +u32 process_trace_report_clone(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *parent, + unsigned long clone_flags, + struct task_struct *child) +{ + if (trace_lifecycle_p && trace_test(parent)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_CLONE; + ent.trace_clone.child = child->pid; + ent.trace_clone.flags = clone_flags; + process_trace(&ent); + } + + process_trace_tryattach(child); + + return UTRACE_RESUME; +} + + +u32 process_trace_report_syscall_entry(u32 action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs) +{ + if (trace_syscalls_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_SYSCALL_ENTRY; + ent.trace_syscall_entry.callno = syscall_get_nr(task, regs); + syscall_get_arguments(task, regs, 0, 6, + ent.trace_syscall_entry.args); + process_trace(&ent); + } + + return UTRACE_RESUME; +} + + +u32 process_trace_report_syscall_exit(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs) +{ + if (trace_syscalls_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_SYSCALL_EXIT; + ent.trace_syscall_exit.rc = + syscall_get_return_value(task, regs); + ent.trace_syscall_exit.error = syscall_get_error(task, regs); + process_trace(&ent); + } + + return UTRACE_RESUME; +} + + +u32 process_trace_report_exec(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + const struct linux_binfmt *fmt, + const 
struct linux_binprm *bprm, + struct pt_regs *regs) +{ + if (trace_lifecycle_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_EXEC; + process_trace(&ent); + } + + /* We're already attached; no need for a new tryattach. */ + + return UTRACE_RESUME; +} + + +u32 process_trace_report_signal(u32 action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs, + siginfo_t *info, + const struct k_sigaction *orig_ka, + struct k_sigaction *return_ka) +{ + if (trace_signals_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_SIGNAL; + ent.trace_signal.si_signo = info->si_signo; + ent.trace_signal.si_errno = info->si_errno; + ent.trace_signal.si_code = info->si_code; + process_trace(&ent); + } + + /* We're already attached, so no need for a new tryattach. */ + + return UTRACE_RESUME | utrace_signal_action(action); +} + + +u32 process_trace_report_exit(enum utrace_resume_action action, + struct utrace_engine *engine, + struct task_struct *task, + long orig_code, long *code) +{ + if (trace_lifecycle_p && trace_test(task)) { + struct process_trace_entry ent; + ent.opcode = _UTRACE_EVENT_EXIT; + ent.trace_exit.code = orig_code; + process_trace(&ent); + } + + /* There is no need to explicitly attach or detach here. */ + + return UTRACE_RESUME; +} + + +void enable_process_trace() +{ + struct task_struct *grp, *tsk; + + pr_debug("in %s\n", __func__); + rcu_read_lock(); + do_each_thread(grp, tsk) { + /* Skip over kernel threads. 
*/ + if (tsk->flags & PF_KTHREAD) + continue; + + if (process_follow_pid) { + if (tsk->tgid == process_follow_pid || + tsk->parent->tgid == process_follow_pid) + process_trace_tryattach(tsk); + } else { + process_trace_tryattach(tsk); + } + } while_each_thread(grp, tsk); + rcu_read_unlock(); +} + +void disable_process_trace() +{ + struct utrace_engine *engine; + struct task_struct *grp, *tsk; + int rc; + + pr_debug("in %s\n", __func__); + rcu_read_lock(); + do_each_thread(grp, tsk) { + /* Find matching engine, if any. Returns -ENOENT for + unattached threads. */ + engine = utrace_attach_task(tsk, UTRACE_ATTACH_MATCH_OPS, + &process_trace_ops, 0); + if (IS_ERR(engine)) { + if (PTR_ERR(engine) != -ENOENT) + pr_warning("utrace_attach_task %d (rc %ld)\n", + tsk->pid, -PTR_ERR(engine)); + } else if (engine == NULL) { + pr_warning("utrace_attach_task %d (null engine)\n", + tsk->pid); + } else { + /* Found one of our own engines. Detach. */ + rc = utrace_control(tsk, engine, UTRACE_DETACH); + switch (rc) { + case 0: /* success */ + break; + case -ESRCH: /* REAP callback already begun */ + case -EALREADY: /* DEATH callback already begun */ + break; + default: + rc = -rc; + pr_warning("utrace_detach %d (rc %d)\n", + tsk->pid, rc); + break; + } + utrace_engine_put(engine); + pr_debug("detached in %s from %s(%d)\n", __func__, + tsk->comm, tsk->pid); + } + } while_each_thread(grp, tsk); + rcu_read_unlock(); +} + + +static const struct utrace_engine_ops process_trace_ops = { + .report_clone = process_trace_report_clone, + .report_exec = process_trace_report_exec, + .report_exit = process_trace_report_exit, + .report_signal = process_trace_report_signal, + .report_syscall_entry = process_trace_report_syscall_entry, + .report_syscall_exit = process_trace_report_syscall_exit, +}; + + + +/* control interfaces */ + + +static ssize_t +trace_taskcomm_filter_read(struct file *filp, char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + return simple_read_from_buffer(ubuf, cnt, ppos, + 
trace_taskcomm_filter, TASK_COMM_LEN); +} + + +static ssize_t +trace_taskcomm_filter_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *fpos) +{ + char *end; + + if (cnt > TASK_COMM_LEN - 1) + cnt = TASK_COMM_LEN - 1; + + if (copy_from_user(trace_taskcomm_filter, ubuf, cnt)) + return -EFAULT; + + /* Cut from the first nil or newline. */ + trace_taskcomm_filter[cnt] = '\0'; + end = strchr(trace_taskcomm_filter, '\n'); + if (end) + *end = '\0'; + + *fpos += cnt; + return cnt; +} + + +static const struct file_operations trace_taskcomm_filter_fops = { + .open = tracing_open_generic, + .read = trace_taskcomm_filter_read, + .write = trace_taskcomm_filter_write, +}; + + + +static char README_text[] = + "process event tracer mini-HOWTO\n" + "\n" + "1. Select process hierarchy to monitor. Other processes will be\n" + " completely unaffected. Leave at 0 for system-wide tracing.\n" + "# echo NNN > process_follow_pid\n" + "\n" + "2. Determine which process event traces are potentially desired.\n" + " syscall and signal tracing slow down monitored processes.\n" + "# echo 0 > process_trace_{syscalls,signals,lifecycle}\n" + "\n" + "3. Add any final uid- or taskcomm-based filtering. Non-matching\n" + " processes will skip trace messages, but will still be slowed.\n" + "# echo NNN > process_trace_uid_filter # -1: unrestricted \n" + "# echo ls > process_trace_taskcomm_filter # empty: unrestricted\n" + "\n" + "4. Start tracing.\n" + "# echo process > current_tracer\n" + "\n" + "5. Examine trace.\n" + "# cat trace\n" + "\n" + "6. 
Stop tracing.\n" + "# echo nop > current_tracer\n" + ; + +static struct debugfs_blob_wrapper README_blob = { + .data = README_text, + .size = sizeof(README_text), +}; + + +static __init int init_process_trace(void) +{ + struct dentry *d_tracer; + struct dentry *entry; + + d_tracer = tracing_init_dentry(); + + entry = debugfs_create_blob("process_trace_README", 0444, d_tracer, + &README_blob); + if (!entry) + pr_warning("Could not create debugfs " + "'process_trace_README' entry\n"); + + /* Control for scoping process following. */ + entry = debugfs_create_u32("process_follow_pid", 0644, d_tracer, + &process_follow_pid); + if (!entry) + pr_warning("Could not create debugfs " + "'process_follow_pid' entry\n"); + + /* Process-level filters */ + entry = debugfs_create_file("process_trace_taskcomm_filter", 0644, + d_tracer, NULL, + &trace_taskcomm_filter_fops); + /* XXX: it'd be nice to have a read/write debugfs_create_blob. */ + if (!entry) + pr_warning("Could not create debugfs " + "'process_trace_taskcomm_filter' entry\n"); + + entry = debugfs_create_u32("process_trace_uid_filter", 0644, d_tracer, + &trace_taskuid_filter); + if (!entry) + pr_warning("Could not create debugfs " + "'process_trace_uid_filter' entry\n"); + + /* Event-level filters. 
*/ + entry = debugfs_create_u32("process_trace_lifecycle", 0644, d_tracer, + &trace_lifecycle_p); + if (!entry) + pr_warning("Could not create debugfs " + "'process_trace_lifecycle' entry\n"); + + entry = debugfs_create_u32("process_trace_syscalls", 0644, d_tracer, + &trace_syscalls_p); + if (!entry) + pr_warning("Could not create debugfs " + "'process_trace_syscalls' entry\n"); + + entry = debugfs_create_u32("process_trace_signals", 0644, d_tracer, + &trace_signals_p); + if (!entry) + pr_warning("Could not create debugfs " + "'process_trace_signals' entry\n"); + + return register_tracer(&process_tracer); +} + +device_initcall(init_process_trace); From mingo at elte.hu Sat Mar 21 07:43:01 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 21 Mar 2009 08:43:01 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> Message-ID: <20090321074301.GA19384@elte.hu> * Roland McGrath wrote: > From: Frank Ch. Eigler > > This is v2 of the prototype utrace-ftrace interface. This code is > based on Roland McGrath's utrace API, which provides programmatic > hooks to the in-tree tracehook layer. This new patch interfaces > many of those events to ftrace, as configured by a small number of > debugfs controls. Here's the > /debugfs/tracing/process_trace_README: Please submit changes/enhancements to kernel/trace/* to the tracing tree maintainers (Steve and me) for review, testing and integration. Please also post patches against the latest tracing tree: http://people.redhat.com/mingo/tip.git/README As this patch does not apply: Applying patch patches/utrace-based-ftrace-process-engine-v2.patch patching file include/linux/processtrace.h patching file kernel/trace/Kconfig Hunk #1 succeeded at 186 with fuzz 2 (offset 36 lines). patching file kernel/trace/Makefile Hunk #1 FAILED at 33. 
1 out of 1 hunk FAILED -- rejects in file kernel/trace/Makefile patching file kernel/trace/trace.h Hunk #1 succeeded at 7 with fuzz 1. Hunk #2 FAILED at 31. Hunk #3 succeeded at 215 with fuzz 2 (offset 43 lines). Hunk #4 FAILED at 330. 2 out of 4 hunks FAILED -- rejects in file kernel/trace/trace.h patching file kernel/trace/trace_process.c Patch patches/utrace-based-ftrace-process-engine-v2.patch does not apply (enforce with -f) Thanks, Ingo From akpm at linux-foundation.org Sat Mar 21 08:39:12 2009 From: akpm at linux-foundation.org (Andrew Morton) Date: Sat, 21 Mar 2009 01:39:12 -0700 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321074301.GA19384@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> Message-ID: <20090321013912.ed6039c9.akpm@linux-foundation.org> On Sat, 21 Mar 2009 08:43:01 +0100 Ingo Molnar wrote: > > * Roland McGrath wrote: > > > From: Frank Ch. Eigler > > > > This is v2 of the prototype utrace-ftrace interface. This code is > > based on Roland McGrath's utrace API, which provides programmatic > > hooks to the in-tree tracehook layer. This new patch interfaces > > many of those events to ftrace, as configured by a small number of > > debugfs controls. Here's the > > /debugfs/tracing/process_trace_README: > > Please submit changes/enhancements to kernel/trace/* to the tracing > tree maintainers (Steve and me) for review, testing and integration. > > Please also post patches against the latest tracing tree: > > http://people.redhat.com/mingo/tip.git/README uhm, this patch depends on the (large) utrace patch, which is not kernel/trace material. > As this patch does not apply: > > Applying patch patches/utrace-based-ftrace-process-engine-v2.patch > patching file include/linux/processtrace.h > patching file kernel/trace/Kconfig > Hunk #1 succeeded at 186 with fuzz 2 (offset 36 lines). 
> patching file kernel/trace/Makefile > Hunk #1 FAILED at 33. > 1 out of 1 hunk FAILED -- rejects in file kernel/trace/Makefile > patching file kernel/trace/trace.h > Hunk #1 succeeded at 7 with fuzz 1. > Hunk #2 FAILED at 31. > Hunk #3 succeeded at 215 with fuzz 2 (offset 43 lines). > Hunk #4 FAILED at 330. > 2 out of 4 hunks FAILED -- rejects in file kernel/trace/trace.h > patching file kernel/trace/trace_process.c > Patch patches/utrace-based-ftrace-process-engine-v2.patch does not apply (enforce with -f) The rejects are trivial. From akpm at linux-foundation.org Sat Mar 21 08:49:09 2009 From: akpm at linux-foundation.org (Andrew Morton) Date: Sat, 21 Mar 2009 01:49:09 -0700 Subject: [PATCH 2/3] utrace core In-Reply-To: <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> Message-ID: <20090321014909.6b654f55.akpm@linux-foundation.org> On Fri, 20 Mar 2009 18:41:40 -0700 (PDT) Roland McGrath wrote: > This adds the utrace facility, a new modular interface in the kernel for > implementing user thread tracing and debugging. This fits on top of the > tracehook_* layer, so the new code is well-isolated. > > The new interface is in and the DocBook utrace book > describes it. It allows for multiple separate tracing engines to work in > parallel without interfering with each other. Higher-level tracing > facilities can be implemented as loadable kernel modules using this layer. > > The new facility is made optional under CONFIG_UTRACE. > When this is not enabled, no new code is added. > It can only be enabled on machines that have all the > prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK. > > In this initial version, utrace and ptrace do not play together at all. > If ptrace is attached to a thread, the attach calls in the utrace kernel > API return -EBUSY. If utrace is attached to a thread, the PTRACE_ATTACH > or PTRACE_TRACEME request will return EBUSY to userland. 
The old ptrace > code is otherwise unchanged and nothing using ptrace should be affected > by this patch as long as utrace is not used at the same time. In the > future we can clean up the ptrace implementation and rework it to use > the utrace API. I'd be interested in seeing a bit of discussion regarding the overall value of utrace - it has been quite a while since it floated past. I assume that redoing ptrace to be a client of utrace _will_ happen, and that this is merely a cleanup exercise with no new user-visible features? The "prototype utrace-ftrace interface" seems to be more a cool toy rather than a serious new kernel feature (yes?) If so, what are the new killer utrace clients which would justify all these changes? Also, is it still the case that RH are shipping utrace? If so, for what reasons and what benefits are users seeing from it? And I recall that there were real problems wiring up the Feb 2007 version of utrace to the ARM architecture. Have those issues been resolved? Are any problems expected for any architectures? Thanks. From mingo at elte.hu Sat Mar 21 09:12:35 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 21 Mar 2009 10:12:35 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321013912.ed6039c9.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> Message-ID: <20090321091235.GA29678@elte.hu> * Andrew Morton wrote: > On Sat, 21 Mar 2009 08:43:01 +0100 Ingo Molnar wrote: > > > > > * Roland McGrath wrote: > > > > > From: Frank Ch. Eigler > > > > > > This is v2 of the prototype utrace-ftrace interface. This code is > > > based on Roland McGrath's utrace API, which provides programmatic > > > hooks to the in-tree tracehook layer. 
This new patch interfaces > > > many of those events to ftrace, as configured by a small number of > > > debugfs controls. Here's the > > > /debugfs/tracing/process_trace_README: > > > > Please submit changes/enhancements to kernel/trace/* to the tracing > > tree maintainers (Steve and me) for review, testing and integration. > > > > Please also post patches against the latest tracing tree: > > > > http://people.redhat.com/mingo/tip.git/README > > uhm, this patch depends on the (large) utrace patch, which is not > kernel/trace material. The thing is, utrace crashes in Fedora have dominated kerneloops.org for many months, so i'm not sure what to make of the idea of posting a 4000+ lines of core kernel code patchset on the last day of the development cycle, a posting that has carefully avoided the Cc:-ing of affected maintainers ;-) Utrace is very much tracing material - without the ftrace plugin the whole utrace machinery is just something that provides a _ton_ of hooks to something entirely external: SystemTap mainly. kernel/utrace.c should probably be introduced as kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending work in the tracing tree and cooperation would be nice and desired. The ftrace/utrace plugin is the only real connection utrace has to the mainline kernel, so proper review by the tracing folks and cooperation with the tracing folks is very much needed for the whole thing. 
Ingo From akpm at linux-foundation.org Sat Mar 21 11:19:54 2009 From: akpm at linux-foundation.org (Andrew Morton) Date: Sat, 21 Mar 2009 04:19:54 -0700 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321091235.GA29678@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> Message-ID: <20090321041954.72b99e69.akpm@linux-foundation.org> On Sat, 21 Mar 2009 10:12:35 +0100 Ingo Molnar wrote: > > * Andrew Morton wrote: > > > On Sat, 21 Mar 2009 08:43:01 +0100 Ingo Molnar wrote: > > > > > > > > * Roland McGrath wrote: > > > > > > > From: Frank Ch. Eigler > > > > > > > > This is v2 of the prototype utrace-ftrace interface. This code is > > > > based on Roland McGrath's utrace API, which provides programmatic > > > > hooks to the in-tree tracehook layer. This new patch interfaces > > > > many of those events to ftrace, as configured by a small number of > > > > debugfs controls. Here's the > > > > /debugfs/tracing/process_trace_README: > > > > > > Please submit changes/enhancements to kernel/trace/* to the tracing > > > tree maintainers (Steve and me) for review, testing and integration. > > > > > > Please also post patches against the latest tracing tree: > > > > > > http://people.redhat.com/mingo/tip.git/README > > > > uhm, this patch depends on the (large) utrace patch, which is not > > kernel/trace material. 
> > The thing is, utrace crashes in Fedora have dominated kerneloops.org > for many months, so i'm not sure what to make of the idea of posting > a 4000+ lines of core kernel code patchset on the last day of the > development cycle, a posting that has carefully avoided the Cc:-ing > of affected maintainers ;-) > > Utrace is very much tracing material - without the ftrace plugin the > whole utrace machinery is just something that provides a _ton_ of > hooks to something entirely external: SystemTap mainly. Roland's changelogs don't mention systemtap at all afacit. That was, umm, major information lossage. > kernel/utrace.c should probably be introduced as > kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending > work in the tracing tree and cooperation would be nice and desired. > > The ftrace/utrace plugin is the only real connection utrace has to > the mainline kernel, so proper review by the tracing folks and > cooperation with the tracing folks is very much needed for the whole > thing. Actually it seems that the whole utrace-ftrace thing is a big distraction and could/should just be omitted. This is a systemtap feature and should be viewed as such. This is all a bit weird. From fche at redhat.com Sat Mar 21 11:51:41 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Sat, 21 Mar 2009 07:51:41 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321041954.72b99e69.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> Message-ID: <20090321115141.GA3566@redhat.com> Hi - On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote: > [...] 
> > Utrace is very much tracing material - without the ftrace plugin the > > whole utrace machinery is just something that provides a _ton_ of > > hooks to something entirely external: SystemTap mainly. > > Roland's changelogs don't mention systemtap at all afacit. > That was, umm, major information lossage. There have been many mixed messages from LKML on the topic - sometimes mentioning systemtap is forbidden, other times necessary. Sorry about that. There are several non-systemtap clients in existence or under development. You may have heard of the ptrace cleanup, a multi-client ptrace replacement, an on-the-fly core dumper, the ftrace widget, user-space probes. All of these should have somewhat compelling non-systemtap uses, if that's an important criterion. > Actually it seems that the whole utrace-ftrace thing is a big > distraction and could/should just be omitted. This is a systemtap > feature and should be viewed as such. [...] utrace is a better way to perform user thread management than what is there now, and the utrace-ftrace widget shows how to *hook* thread events such as syscalls in a lighter weight / more managed way than the first one proposed. (That's one reason we've been participating in the ftrace discussions.) Of course it can be made to use the fine syscall pretty-printing code recently added. 
- FChE From akpm at linux-foundation.org Sat Mar 21 12:04:22 2009 From: akpm at linux-foundation.org (Andrew Morton) Date: Sat, 21 Mar 2009 05:04:22 -0700 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321115141.GA3566@redhat.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> Message-ID: <20090321050422.d1d99eec.akpm@linux-foundation.org> On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" wrote: > Hi - > > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote: > > [...] > > > Utrace is very much tracing material - without the ftrace plugin the > > > whole utrace machinery is just something that provides a _ton_ of > > > hooks to something entirely external: SystemTap mainly. > > > > Roland's changelogs don't mention systemtap at all afacit. > > That was, umm, major information lossage. > > There have been many mixed messages from LKML on the topic - sometimes > mentioning systemtap is forbidden, other times necessary. Sorry about > that. heh. We all love systemtap and want it to get better. > There are several non-systemtap clients in existence or under > development. You may have heard of the ptrace cleanup, a > multi-client ptrace replacement, an on-the-fly core dumper, the ftrace > widget, user-space probes. All of these should have somewhat > compelling non-systemtap uses, if that's an important criterion. Well I dunno. You guys are closer to this than I am, but I'd have thought that systemtap is the main game here, and most/all of the above is just fluff. IOW, "this helps systemtap" is sufficient reason for merging a kernel change. For sufficiently large values of "help", and sufficiently small values of "eww", of course. 
I have strong memories of being traumatised by reading the uprobes code. What's the story on all of that nowadays? > > > Actually it seems that the whole utrace-ftrace thing is a big > > distraction and could/should just be omitted. This is a systemtap > > feature and should be viewed as such. [...] > > utrace is a better way to perform user thread management than what is > there now, and the utrace-ftrace widget shows how to *hook* thread > events such as syscalls in a lighter weight / more managed way than > the first one proposed. (That's one reason we've been participating > in the ftrace discussions.) Of course it can be made to use the fine > syscall pretty-printing code recently added. eh. Boring. Let's fix systemtap? From fche at redhat.com Sat Mar 21 12:57:06 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Sat, 21 Mar 2009 08:57:06 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321050422.d1d99eec.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> Message-ID: <20090321125706.GB3566@redhat.com> Hi - On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote: > [...] > > There have been many mixed messages from LKML on the topic - sometimes > > mentioning systemtap is forbidden, other times necessary. Sorry about > > that. > > heh. We all love systemtap and want it to get better. Great! > [...] > I have strong memories of being traumatised by reading the uprobes > code. What's the story on all of that nowadays? uprobes, being a layer upon utrace that provides a kprobes-like breakpointing API for user threads, is being refactored into several parts. 
I don't know about the aesthetics of it all, but I believe the general future plan is this: One piece would perform machine code analysis (to classify instructions for ideal/safe placement of breakpoints or for code patching), and another thin layer that uses this and utrace to manage user-space breakpoints. (Systemtap would interface at this point.) Then a user-space syscallish interface could come along to expose this to a super-ptrace client (to speed up gdb; perhaps to allow multiple debuggers). Plus one might as well add an ftrace-engine for it (directly analogous to the recent kprobe-based one that ftrace people found "cool".) > > > Actually it seems that the whole utrace-ftrace thing is a big > > > distraction and could/should just be omitted. This is a systemtap > > > feature and should be viewed as such. [...] > > > > utrace is a better way to perform user thread management than what is > > there now, and the utrace-ftrace widget shows how to *hook* thread > > events such as syscalls in a lighter weight / more managed way than > > the first one proposed. (That's one reason we've been participating > > in the ftrace discussions.) Of course it can be made to use the fine > > syscall pretty-printing code recently added. > > eh. Boring. Let's fix systemtap? There are several constituencies here, some of which find the above exciting. That's OK and we'd like to help them too. - FChE From renzo at cs.unibo.it Sat Mar 21 14:08:22 2009 From: renzo at cs.unibo.it (Renzo Davoli) Date: Sat, 21 Mar 2009 15:08:22 +0100 Subject: [PATCH 2/3] utrace core In-Reply-To: <20090321014909.6b654f55.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> Message-ID: <20090321140822.GE18690@cs.unibo.it> Tracing does not mean only debug. Some tracing facilities can be used for virtualization. For example User-Mode Linux is based on ptrace. 
I have a prototype of a kernel module for virtualization (kmview) based on utrace. Using kmview (module+VMM) it is possible for a user (not root) to mount a filesystem just for a process (or a hierarchy of processes), or it is possible for some processes to use different networking stacks or virtual devices. It is something like user-mode containers. kmview provides the same features as umview, which is based on ptrace, but runs (very) much faster. (umview is in Debian lenny, squeeze, sid if you want to test it) *Utrace is really what I wanted* to support kmview (apart from some minor issues about the support of nested virtualizations). Other virtualizations now based on ptrace could move part of their implementation to kernel level via utrace, and several speedups become possible. For example kmview is a partial virtual machine monitor: some system calls are forwarded to the kernel, some others virtualized. When a user mounts a filesystem, all the system calls which use pathnames inside the mountpoint subtree get virtualized while the others are forwarded to the kernel. With utrace the kmview kernel module handles many system calls at kernel level. I mean, if an "open" system call was sent to the kernel because the path is outside the virtualized part of the file system, all the system calls on the same file descriptor can be forwarded to the kernel without any request to the VMM at user level. This is just one example of a speedup; several others are possible. Other virtualizations like user-mode linux or fakeroot-ng could use utrace to speed up their virtualization, too. As far as I have seen, systemtap is a wonderful tool for debugging, especially for kernel debugging, but it has not been designed for virtualization. Ptrace provides a standard set of features and all the implementations of VMM must be in userland. Utrace provides the flexibility to split a VMM and move part of it to a kernel module. Utrace provides a unified interface to kernel modules for tracing/virtualization.
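To make the forwarding rule described above concrete, here is a minimal Python sketch. It is not kmview code: the class name SyscallRouter, its method names, and the mountpoint /mnt/virt are invented for illustration, paths are assumed to be absolute, and the real module makes this decision in kernel space rather than in a script.

```python
import os


class SyscallRouter:
    """Toy model of a partial VMM's routing decision (hypothetical names)."""

    def __init__(self, mountpoints):
        # Absolute paths of the virtualized subtrees (assumption: all absolute).
        self.mountpoints = [os.path.normpath(m) for m in mountpoints]
        # Descriptors whose open() was handed to the user-level VMM.
        self.virtual_fds = set()

    def is_virtual_path(self, path):
        # A path is virtualized if it lies inside any mountpoint subtree.
        path = os.path.normpath(path)
        return any(os.path.commonpath([path, m]) == m
                   for m in self.mountpoints)

    def on_open(self, path, fd):
        # Decide once, at open time, and remember the answer per descriptor.
        if self.is_virtual_path(path):
            self.virtual_fds.add(fd)
            return "vmm"
        return "kernel"

    def on_fd_syscall(self, fd):
        # Later calls on the descriptor need no path check and, for
        # non-virtual descriptors, no round trip to the user-level VMM.
        return "vmm" if fd in self.virtual_fds else "kernel"

    def on_close(self, fd):
        target = self.on_fd_syscall(fd)
        self.virtual_fds.discard(fd)
        return target


router = SyscallRouter(["/mnt/virt"])
print(router.on_open("/etc/passwd", 3))     # outside the mountpoint
print(router.on_open("/mnt/virt/file", 4))  # inside the mountpoint
print(router.on_fd_syscall(3), router.on_fd_syscall(4))
```

The point of the sketch is only the bookkeeping: the path decision is taken once per open, and every later call on a non-virtual descriptor goes straight to the kernel.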
kmview can be implemented as a client of utrace or by spreading code around the kernel and like kmview other virtualizations based on ptrace could need to move some of their logic to the kernel to speedup their execution. These VMMs will use utrace based modules instead of kernel patches. renzo On Sat, Mar 21, 2009 at 01:49:09AM -0700, Andrew Morton wrote: > I'd be interested in seeing a bit of discussion regarding the overall value > of utrace - it has been quite a while since it floated past. > > I assume that redoing ptrace to be a client of utrace _will_ happen, and > that this is merely a cleanup exercise with no new user-visible features? > > The "prototype utrace-ftrace interface" seems to be more a cool toy rather > than a serious new kernel feature (yes?) > > If so, what are the new killer utrace clients which would justify all these > changes? > > Also, is it still the case that RH are shipping utrace? If so, for what > reasons and what benefits are users seeing from it? > > And I recall that there were real problems wiring up the Feb 2007 version > of utrace to the ARM architecture. Have those issues been resolved? Are > any problems expected for any architectures? From mingo at elte.hu Sat Mar 21 14:34:57 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 21 Mar 2009 15:34:57 +0100 Subject: [PATCH 2/3] utrace core In-Reply-To: <20090321140822.GE18690@cs.unibo.it> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> <20090321140822.GE18690@cs.unibo.it> Message-ID: <20090321143457.GA24254@elte.hu> * Renzo Davoli wrote: > Tracing does not mean only debug. Some tracing facilities can be > used for virtualization. For example User-Mode Linux is based on > ptrace. > > I have a prototype of kernel module for virtualization (kmview) > based on utrace. [...] Hm, i cannot find the source code. Can it be downloaded from somewhere? 
Ingo From mingo at elte.hu Sat Mar 21 15:45:01 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 21 Mar 2009 16:45:01 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321050422.d1d99eec.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> Message-ID: <20090321154501.GA2707@elte.hu> * Andrew Morton wrote: > [...] Let's fix systemtap? Yes, it needs to be fixed. The main issue i see is that no kernel developer i work with on a daily basis uses SystemTap - and i work with a lot of people. Yes, i could perhaps name two or three people from lkml using it, but its average penetration amongst kernel folks is essentially zero. Was any critical analysis done why that penetration is so abysmally low for a tool with such a promise and with years of availability, and what are the measures planned to address those problems? To me personally there are two big direct usability issues with SystemTap: 1) It relies on DEBUG_INFO for any reasonable level of utility. Yes, it will limp along otherwise as well, but most of the actual novel capabilities depend on debuginfo. Which is an acceptable constraint for enterprise usage where kernels are switched every few months and having a debuginfo package is not a big issue. Not acceptable for upstream kernel development. It also puts way too much trust into the compiler generating 1GB+ of debuginfo correctly. I want to be able to rely on tools all the time and thus i want tools to have some really simple and predictable foundations.
2) It's not upstream and folks using it seem to insist on not having it upstream ;-) This 'distance' to upstream seems to have grown during the past few years - instead of shrinking. As a result it simply does not matter and there's no know-how and no visibility of it upstream. If these fundamental problems are addressed then i'd even argue for the totality of SystemTap to be aimed upstreamed (including the scripting language, etc.), because for something this fundamental there's just no good reason not to have a turn-key solution there. Plus then there should be a (steadily growing) library of utility scripts in the kernel proper as well. Anything less does not make much sense IMO. Having a separate tool will reduce efficiency, increases the latency of fixes and enhancements and creates ABI-like expectations - which are all counter-productive to good instrumentation. These are the aspects of SystemTap that i have to say were never done right, and these are the aspects of SystemTap that need to change most. Putting utrace upstream now will just make it more convenient to have SystemTap as a separate entity - without any of the benefits. Do we want to do that? Maybe, but we could do better i think. Ingo From renzo at cs.unibo.it Sat Mar 21 16:37:00 2009 From: renzo at cs.unibo.it (Renzo Davoli) Date: Sat, 21 Mar 2009 17:37:00 +0100 Subject: [PATCH 2/3] utrace core In-Reply-To: <20090321143457.GA24254@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> <20090321140822.GE18690@cs.unibo.it> <20090321143457.GA24254@elte.hu> Message-ID: <20090321163700.GA22292@cs.unibo.it> On Sat, Mar 21, 2009 at 03:34:57PM +0100, Ingo Molnar wrote: > > * Renzo Davoli wrote: > > > Tracing does not mean only debug. Some tracing facilities can be > > used for virtualization. For example User-Mode Linux is based on > > ptrace. 
> > > > I have a prototype of kernel module for virtualization (kmview) > > based on utrace. [...] > > Hm, i cannot find the source code. Can it be downloaded from > somewhere? Sure! kmview is not included in our Debian packages yet as it relies on (still) non mainstream features (utrace), but the code is available on our view-os svn repository. Check out: svn co https://view-os.svn.sourceforge.net/svnroot/view-os view-os More specifically to browse the code/specifications: The kmview device protocol is here: http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications The kernel module itself is here: http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/ The VMM userland application share most of the code with umview, the source code for both is here: http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/xmview-os/xmview/ kmview kernel module (current version) needs the following patches: utrace http://www.mail-archive.com/utrace-devel at redhat.com/msg00654.html http://www.mail-archive.com/utrace-devel at redhat.com/msg00655.html I am trying to keep everything up to date, but the whole stuff is evolving in a quite fast way. Everything has been released under GPLv2. renzo From mingo at elte.hu Sat Mar 21 16:44:31 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 21 Mar 2009 17:44:31 +0100 Subject: [PATCH 2/3] utrace core In-Reply-To: <20090321163700.GA22292@cs.unibo.it> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> <20090321140822.GE18690@cs.unibo.it> <20090321143457.GA24254@elte.hu> <20090321163700.GA22292@cs.unibo.it> Message-ID: <20090321164431.GK11183@elte.hu> * Renzo Davoli wrote: > On Sat, Mar 21, 2009 at 03:34:57PM +0100, Ingo Molnar wrote: > > > > * Renzo Davoli wrote: > > > > > Tracing does not mean only debug. Some tracing facilities can be > > > used for virtualization. 
For example User-Mode Linux is based on > > > ptrace. > > > > > > I have a prototype of kernel module for virtualization (kmview) > > > based on utrace. [...] > > > > Hm, i cannot find the source code. Can it be downloaded from > > somewhere? > > Sure! kmview is not included in our Debian packages yet as it > relies on (still) non mainstream features (utrace), but the code > is available on our view-os svn repository. > > Check out: > svn co https://view-os.svn.sourceforge.net/svnroot/view-os view-os > > More specifically to browse the code/specifications: > The kmview device protocol is here: > http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications > The kernel module itself is here: > http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/ Looks really interesting. That's btw. what i see as the biggest value of utrace: it's a comprehensive, all-encompassing framework all around process state events and process state manipulation. Utrace came from Frysk (generic debugger), but the fact that you were able to build a completely unanticipated usecase and virtualization module on top of it is a very good sign of a robust and complete design. I'm impressed. Ingo
From diegocg at gmail.com Sat Mar 21 20:35:21 2009 From: diegocg at gmail.com (Diego Calleja) Date: Sat, 21 Mar 2009 21:35:21 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321154501.GA2707@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> Message-ID: <200903212135.21457.diegocg@gmail.com> On Saturday 21 March 2009 16:45:01 Ingo Molnar wrote: > The main issue i see is that no kernel developer i work with on a > daily basis uses SystemTap - and i work with a lot of people. Yes, i > could perhaps name two or three people from lkml using it, but its > average penetration amongst kernel folks is essentially zero. What about userspace developers? People always talk of systemtap as a kernel thing, but my (humble) impression is that kernel hackers don't seem to need it that much (maybe for the same reasons they didn't need a kernel debugger ;), but userspace developers do. There're many userspace projects that offer optional compile options to enable dtrace probes (some people like apple even ship executables of python, perl and ruby with probes by default). There're several firefox hackers that switched to dtrace-capable systems just because the dtrace-javascript probes enabled them to debug javashit code in ways they weren't able to in linux or windows. In my humble opinion a better development environment for linux userspace programmers is way more important than whether kernel hackers like systemtap or not. So maybe the discussion should be less about "does it help kernel hackers?" and more about "does it help userspace hackers?". My 2¢...
From akpm at linux-foundation.org Sat Mar 21 21:34:13 2009 From: akpm at linux-foundation.org (Andrew Morton) Date: Sat, 21 Mar 2009 14:34:13 -0700 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321154501.GA2707@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> Message-ID: <20090321143413.75ead1aa.akpm@linux-foundation.org> On Sat, 21 Mar 2009 16:45:01 +0100 Ingo Molnar wrote: > > [...] > useful, thanks. > Putting utrace upstream now will just make it more > convenient to have SystemTap as a separate entity - without any of > the benefits. Do we want to do that? Maybe, but we could do better i > think. It would not be good to merge a large kernel feature which kernel developers and testers cannot test, and regression test. If testing utrace against its main application requires installation of a complete enterprise distro from a distro which the particular developer might not prefer to use then that's quite a problem. So it is desirable for this reason (and, I suspect, for other reasons) that systemtap (or a part thereof) be dragged out in some standalone form which is usable by random mortals. IOW: I agree. From fche at redhat.com Sat Mar 21 21:48:52 2009 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Sat, 21 Mar 2009 17:48:52 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321154501.GA2707@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> Message-ID: <20090321214852.GA5262@redhat.com> Hi - On Sat, Mar 21, 2009 at 04:45:01PM +0100, Ingo Molnar wrote: > [...] > To me personally there are two big direct usability issues with > SystemTap: > > 1) It relies on DEBUG_INFO for any reasonable level of utility. > Yes, it will limp along otherwise as well, but most of the > actual novel capabilities depend on debuginfo. Which is an > acceptable constraint for enterprise usage where kernels are > switched every few months and having a debuginfo package is not > a big issue. Not acceptable for upstream kernel development. In my own limited kernel-building experience, I find the debuginfo data conveniently and instantly available after every "make". Can you elaborate how is it harder for you to incidentally make it than for someone to download it? > It also puts way too trust into the compiler generating 1GB+ of > debuginfo correctly. I want to be able to rely on tools all the > time and thus i want tools to have some really simple and > predictable foundations. Well, the data has to come from *somewhere*. We know several shortcomings (and have staff working on gcc debuginfo improvements), but there is little alternative. If not from the compiler, where are you going to get detailed type/structure layouts? Stack slot to variable mappings? Statement-level PC addresses? Unwind data? 
> 2) It's not upstream and folks using it seem to insist on not > having it upstream ;-) This 'distance' to upstream seems to have > grown during the past few years - instead of shrinking. [...] Considering our upstream-bound assistance with foundation technologies like markers, tracepoints, kprobes, utrace, and several other bits, this does not seem entirely fair. > If these fundamental problems are addressed then i'd even argue for > the totality of SystemTap to be aimed upstreamed (including the > scripting language, etc.), [...] If consensus on this were plausible, we could seriously discuss it. But I don't buy the package-deal that utrace must not attempt merging on its own merits, just because it makes systemtap (as it is today) useful to more people. - FChE From fche at redhat.com Sat Mar 21 21:51:45 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Sat, 21 Mar 2009 17:51:45 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321143413.75ead1aa.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> Message-ID: <20090321215145.GB5262@redhat.com> Hi - On Sat, Mar 21, 2009 at 02:34:13PM -0700, Andrew Morton wrote: > [...] > It would not be good to merge a large kernel feature which kernel > developers and testers cannot test, and regression test. It does not. Other kernel self-sufficient utrace clients are on their way, and of course one was just (re)posted. > If testing utrace against its main application requires installation > of a complete enterprise distro from a distro [...] 
This has *never* been a requirement. - FChE From torvalds at linux-foundation.org Sat Mar 21 22:02:59 2009 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Sat, 21 Mar 2009 15:02:59 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321215145.GB5262@redhat.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> Message-ID: On Sat, 21 Mar 2009, Frank Ch. Eigler wrote: > > > If testing utrace against its main application requires installation > > of a complete enterprise distro from a distro [...] > > This has *never* been a requirement. You guys are getting off a tangent. Let's go back to the post that started this all. > The thing is, utrace crashes in Fedora have dominated kerneloops.org > for many months, so i'm not sure what to make of the idea of posting > a 4000+ lines of core kernel code patchset on the last day of the > development cycle, a posting that has carefully avoided the Cc:-ing > of affected maintainers ;-) .. and dammit, I agree 100%. If utrace really shows up in _any_ way on kerneloops.org, then I think THE ENTIRE DISCUSSION in this thread is moot. I'm not going to take known-bad crap. It's that simple. Don't bother posting it, don't bother discussing it, don't bother making excuses for it. Linus From fche at redhat.com Sat Mar 21 22:20:30 2009 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Sat, 21 Mar 2009 18:20:30 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: References: <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> Message-ID: <20090321222030.GA5157@redhat.com> Hi - On Sat, Mar 21, 2009 at 03:02:59PM -0700, Linus Torvalds wrote: > [...] > > The thing is, utrace crashes in Fedora have dominated kerneloops.org > > for many months [...] > > .. and dammit, I agree 100%. If utrace really shows up in _any_ way on > kerneloops.org, then I think THE ENTIRE DISCUSSION in this thread is moot. There was a short span of time during last fall, when Roland was on vacation. That bug (in 2.6.26.3) was fixed during the kernel summit. So this is a six-month obsolete grievance. - FChE From adobriyan at gmail.com Sat Mar 21 22:37:59 2009 From: adobriyan at gmail.com (Alexey Dobriyan) Date: Sun, 22 Mar 2009 01:37:59 +0300 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321222030.GA5157@redhat.com> References: <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090321222030.GA5157@redhat.com> Message-ID: <20090321223759.GA22770@x200.localdomain> On Sat, Mar 21, 2009 at 06:20:30PM -0400, Frank Ch. Eigler wrote: > On Sat, Mar 21, 2009 at 03:02:59PM -0700, Linus Torvalds wrote: > > [...] 
> > > The thing is, utrace crashes in Fedora have dominated kerneloops.org > > > for many months [...] > > > > .. and dammit, I agree 100%. If utrace really shows up in _any_ way on > > kerneloops.org, then I think THE ENTIRE DISCUSSION in this thread is moot. > > There was a short span of time during last fall, when Roland was on > vacation. That bug (in 2.6.26.3) was fixed during the kernel summit. > So this is a six-month obsolete grievance. struct task_struct::utrace became an embedded struct. This is good and should remove quite a few utrace bugs. Better late than never. However, the "rewrite-ptrace-via-utrace" patch was omitted, so almost no one can easily see by how much the situation improved. I see this patch was dropped in Fedora. Will ptrace(2) be rewritten through utrace? From fche at redhat.com Sat Mar 21 23:38:39 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Sat, 21 Mar 2009 19:38:39 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321223759.GA22770@x200.localdomain> References: <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090321222030.GA5157@redhat.com> <20090321223759.GA22770@x200.localdomain> Message-ID: <20090321233839.GB5157@redhat.com> Hi - On Sun, Mar 22, 2009 at 01:37:59AM +0300, Alexey Dobriyan wrote: > [...] > struct task_struct::utrace became an embedded struct. This is good and > should remove quite a few utrace bugs. Better late than never. Yeah. > However, the "rewrite-ptrace-via-utrace" patch was omitted, so almost > no one can easily see by how much the situation improved. [...] Will > ptrace(2) be rewritten through utrace? Yes, I believe that is Roland's intent.
I believe it was separated from the current suite of patches for staging purposes, to merge the most solid code up first. The code is available from the utrace git tree in the utrace-ptrace branch. - FChE From mingo at elte.hu Sun Mar 22 10:25:34 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 22 Mar 2009 11:25:34 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321233839.GB5157@redhat.com> References: <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090321222030.GA5157@redhat.com> <20090321223759.GA22770@x200.localdomain> <20090321233839.GB5157@redhat.com> Message-ID: <20090322102534.GC19826@elte.hu> * Frank Ch. Eigler wrote: > Hi - > > On Sun, Mar 22, 2009 at 01:37:59AM +0300, Alexey Dobriyan wrote: > > [...] > > struct task_struct::utrace became an embedded struct. This is good and > > should remove quite a few utrace bugs. Better late than never. > > Yeah. > > > However, the "rewrite-ptrace-via-utrace" patch was omitted, so > > almost no one can easily see by how much the situation improved. > > [...] Will ptrace(2) be rewritten through utrace? > > Yes, I believe that is Roland's intent. I believe it was > separated from the current suite of patches for staging purposes, > to merge the most solid code up first. The code is available from > the utrace git tree in the utrace-ptrace branch. i think they should be submitted together. Here's the histogram of utrace bugs on kerneloops.org:

    2.6.27.5        1 x
    2.6.27.15       1 x
    2.6.27.12       2 x
    2.6.27-rc4      2 x
    2.6.26.6        1 x
    2.6.26.5       43 x
    2.6.26.3     1102 x
    2.6.26.2        2 x
    2.6.26.1        3 x
    2.6.26          1 x
    2.6.25          3 x

That peak in 2.6.26.3 is what i referred to.
The latest F10 kernel rpm is kernel-2.6.27.12-170.2.5.fc10, and it does include the utrace-ptrace engine as well: # grep UTRACE /boot/config-2.6.27.19-170.2.35.fc10.i686 CONFIG_UTRACE=y CONFIG_UTRACE_PTRACE=y So the bug i referred to was fixed and the bug count has gone down - but still we have the utrace core submission here without any (tested) mainline kernel usage of the core code. My suggestion would be to: - submit the ptrace-on-utrace engine as well (with Oleg's signoff?) - perhaps also submit with a well-tested ftrace plugin that tries to utilize _all_ aspects of utrace and ftrace (and hence gives good and continuous burn-in testing via the ftrace bootup self-tests, etc.) ideally we want both, because: - tracing corner-case bugs tend to be found much faster than ptrace corner case bugs - partly because tracing is much more invasive when activated system-wide. - ptrace-over-utrace on the other hand utilizes utrace more deeply than passive tracing ever can. (for example UML does full, active virtualization via ptrace - this depth of functional utrace usage is not possible via a tracing plugin.) And i think the ptrace-via-utrace engine is actually fully ready, just perhaps it was not submitted out of caution to keep the logistics simple. So i do think we've still got a shot at merging it, in this merge window. 
Ingo From mingo at elte.hu Sun Mar 22 12:08:11 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 22 Mar 2009 13:08:11 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321214852.GA5262@redhat.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com> Message-ID: <20090322120811.GD19826@elte.hu> * Frank Ch. Eigler wrote: > Hi - > > On Sat, Mar 21, 2009 at 04:45:01PM +0100, Ingo Molnar wrote: > > [...] > > To me personally there are two big direct usability issues with > > SystemTap: > > > > 1) It relies on DEBUG_INFO for any reasonable level of utility. > > Yes, it will limp along otherwise as well, but most of the > > actual novel capabilities depend on debuginfo. Which is an > > acceptable constraint for enterprise usage where kernels are > > switched every few months and having a debuginfo package is not > > a big issue. Not acceptable for upstream kernel development. > > In my own limited kernel-building experience, I find the debuginfo > data conveniently and instantly available after every "make". Can > you elaborate how is it harder for you to incidentally make it > than for someone to download it? Four reasons: 1) I have CONFIG_DEBUG_INFO turned off in 99.9% of my kernel builds, because it slows down the kernel build times significantly: without: 4343.31 user 416.39 system 6:09.97 elapsed 1286%CPU with: 4871.07 user 501.90 system 7:43.22 elapsed 1159 %CPU ( x86 allyesconfig. On an obscenely overpowered Nehalem box with 12 GB of RAM. 
) 2) When the kernel build becomes IO-bound, for example when i build over a distcc cluster (which is how i generally build my kernels) - or when others with less RAM build a debuginfo kernel, the ratio becomes even worse: without: 870.36 user 292.79 system 3:32.10 elapsed 548% CPU with: 929.65 user 384.55 system 8:28.70 elapsed 258% CPU 3) Another metric. Here's an x86 defconfig (i.e. fairly regular config - not allyesconfig) build's size: with: 1645 MB without: 211 MB Try to build 1.6 GB of dirty data on ext3 and run into an fsync() in your editor ... you'll sit there twiddling thumbs for a minute or more. 4) Or yet another metric - Linux distro package overhead. Try installing a debuginfo package: # yum install kernel-debuginfo ========================================== Package Arch Version ========================================== Installing: kernel-debuginfo x86_64 2.6.29-0.258.rc8.git2.fc11 rawhide-debuginfo 294 M Installing for dependencies: kernel-debuginfo-common x86_64 2.6.29-0.258.rc8.git2.fc11 rawhide-debuginfo 35 M Total download size: 329 M That size of a _compressed_ debuginfo kernel package is obscene. We can fit 4 years of full Linux kernel Git history into that size - 60,000+ commits, full metadata and full 20 million lines of code flux included! Uncompressed it blows up to gigabytes of on-disk data. And this download has to be repeated for _every_ minor kernel upgrade. So when i come into a situation where i could use some debugging help ... i'd have to rebuild the kernel with DEBUG_INFO=y and i'll always notice when i have a debuginfo kernel because it's inconvenient. The solution?) Dunno - but i definitely think we should think bigger: The fundamental disconnect i believe seems to come from the fact that most user-space projects are relatively small, so debuginfo bloat is a secondary issue there. But for a project with the size of the kernel, even for moderate builds (not allyesconfig), it's a _much_ bigger deal. 
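As a worked check of the figures quoted above, the elapsed times and build sizes can be turned into ratios. This is only arithmetic on the numbers already given in the message; the helper names to_seconds and slowdown are made up for this sketch. It comes out at roughly 1.25x for the local allyesconfig build, roughly 2.4x for the distcc build, and roughly 7.8x for the defconfig build size.

```python
def to_seconds(elapsed):
    """Convert a time(1)-style 'M:SS.ss' elapsed string to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


def slowdown(without, with_debuginfo):
    """Ratio of elapsed time with DEBUG_INFO to elapsed time without it."""
    return to_seconds(with_debuginfo) / to_seconds(without)


# Elapsed times and sizes exactly as quoted in the message.
local = slowdown("6:09.97", "7:43.22")    # x86 allyesconfig, local build
distcc = slowdown("3:32.10", "8:28.70")   # IO-bound build over distcc
size_ratio = 1645 / 211                   # x86 defconfig build size, MB

print(f"local build:  {local:.2f}x slower with debuginfo")
print(f"distcc build: {distcc:.2f}x slower with debuginfo")
print(f"build size:   {size_ratio:.1f}x larger with debuginfo")
```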
This has been known for a long time and the situation has become worse over the last two years, not better. (last time i checked the debuginfo package overhead it was below 150 MB) A few random ideas: Instead of trying to cache 2+GB of debuginfo for a 50 MB kernel source repo (+50 MB of genuine .o output) - just to be able to debug one or two source files [which is the typical scope of a debugging session], why not build debuginfo on the fly, when a debugging session requires it? Rarely do we need debuginfo for more than a fraction of the whole kernel. ( Yes, it needs a few smarts like knowing the SHA1 of the source code module that a particular kernel portion got built with, to make sure the debuginfo is fresh and relevant - but nothing major. ) I mean, let's _use_ the fact that we have source code available, more intelligently. It takes zero time to build detailed debuginfo for a portion of a tree. If 'download debuginfo' can be replaced with: 'have a recent Git repository of the distro kernel source', we'll have a _much_ more efficient use of resources all around. Ingo From mingo at elte.hu Sun Mar 22 12:17:48 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 22 Mar 2009 13:17:48 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <200903212135.21457.diegocg@gmail.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <200903212135.21457.diegocg@gmail.com> Message-ID: <20090322121748.GE19826@elte.hu> * Diego Calleja wrote: > On Saturday 21 March 2009 16:45:01 Ingo Molnar wrote: > > > The main issue i see is that no kernel developer i work with on a > > daily basis uses SystemTap - and i work with a lot of people. Yes, i > > could perhaps name two or three people from lkml using it, but its > > average penetration amongst kernel folks is essentially zero. > > What about userspace developers?
People always talk of systemtap > as a kernel thing, but my (humble) impression is that kernel > hackers don't seem to need it that much (maybe for the same > reasons they didn't need a kernel debugger ;), but userspace developers > do. There're many userspace projects that offer optional compile > options to enable dtrace probes (some people like apple even ship > executables of python, perl and ruby with probes by default). > There're several firefox hackers that switched to dtrace-capable > systems just because the dtrace-javascript probes enabled them to > debug javashit code in ways they weren't able in linux or windows. > > In my humble opinion a better development environment for linux > userspace programmers is way more important than whether kernel > hackers like systemtap or not. So maybe the discussion should be > less about "does it help kernel hackers?" and more about "does it > help userspace hackers?". My 2¢... Well, i consider kernel development to be just another form of software development, so i don't subscribe to the view that it is intrinsically different. (Yes, the kernel has many unique aspects - but most software projects have unique aspects.) In terms of development methodology and tools, in fact i claim that the kernel workflow and style of development can be applied to most user-space software projects with great success. So ... if a new development tool is apparently not (yet?) suited to a very large and sanely developed software project like the Linux kernel, i don't take that as an encouraging sign. Also, there are practical aspects: the kernel is what we know best so if we can make it work well for the kernel, hopes are that other large projects can use it too. If we _only_ make the tool good for non-kernel purposes, who else will fix it for the kernel? The incentive to fix it for the kernel will be significantly lower.
Ingo From mingo at elte.hu Sun Mar 22 12:37:49 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 22 Mar 2009 13:37:49 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: References: <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> Message-ID: <20090322123749.GF19826@elte.hu> * Linus Torvalds wrote: > On Sat, 21 Mar 2009, Frank Ch. Eigler wrote: > > > > > If testing utrace against its main application requires installation > > > of a complete enterprise distro from a distro [...] > > > > This has *never* been a requirement. > > You guys are getting off a tangent. > > Let's go back to the post that started this all. > > > The thing is, utrace crashes in Fedora have dominated kerneloops.org > > for many months, so i'm not sure what to make of the idea of posting > > a 4000+ lines of core kernel code patchset on the last day of the > > development cycle, a posting that has carefully avoided the Cc:-ing > > of affected maintainers ;-) > > .. and dammit, I agree 100%. If utrace really shows up in _any_ > way on kerneloops.org, then I think THE ENTIRE DISCUSSION in this > thread is moot. > > I'm not going to take known-bad crap. It's that simple. Don't > bother posting it, don't bother discussing it, don't bother making > excuses for it. The kerneloops stats on utrace crashes are way down currently, after that peak last fall. So i didnt want to suggest that it's known-broken now - i only wanted to point out that it's a known-risky area and that the submission of it should involve the affected maintainers/developers. 
Regarding current stability, Roland, Frank, is the utrace patch in latest (today's) Fedora rawhide: -rw-r--r-- 1 root root 176555 2009-01-08 05:42 linux-2.6-utrace.patch a bug fixed equivalent of the utrace bits that crashed in the 2.6.26.3 kernel? In that case it is certainly known-good. Or is it a slimmed-down version? The ptrace bits and signoffs from Oleg and Alexey would certainly help (me) in trusting it. (I've Cc:-ed Oleg and Alexey) The ftrace bits could certainly be staged to go in via the tracing tree (in .31 or so) after the utrace-core+ptrace bits went upstream. Ingo From mingo at elte.hu Sun Mar 22 12:53:20 2009 From: mingo at elte.hu (Ingo Molnar) Date: Sun, 22 Mar 2009 13:53:20 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090322120811.GD19826@elte.hu> References: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com> <20090322120811.GD19826@elte.hu> Message-ID: <20090322125320.GA14171@elte.hu> * Ingo Molnar wrote: > Total download size: 329 M > > That size of a _compressed_ debuginfo kernel package is obscene. > We can fit 4 years of full Linux kernel Git history into that size > - 60,000+ commits, full metadata and full 20 million lines of code > flux included! > > Uncompressed it blows up to gigabytes of on-disk data. > > And this download has to be repeated for _every_ minor kernel > upgrade. Have to correct my memories about how many commits the kernel repo has: 132,019 commits. That massive history fits into 298 MB. (!) 
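For reference, the figures Ingo quotes here and in the message he is correcting reduce to a few ratios. The following is only a quick sketch using awk; every input is copied from those two messages, and the function name is invented for illustration.

```shell
#!/bin/sh
# Ratios behind the figures quoted in this sub-thread. All inputs are copied
# from the messages; only the function name is made up for this sketch.
debuginfo_ratios() {
    awk 'BEGIN {
        # distcc build, elapsed wall-clock time in seconds
        t_without = 3 * 60 + 32.10    # 3:32.10 without debuginfo
        t_with    = 8 * 60 + 28.70    # 8:28.70 with debuginfo
        printf "elapsed time: %.1fx\n", t_with / t_without

        # x86 defconfig build size in MB (with vs. without)
        printf "build size: %.1fx\n", 1645 / 211

        # compressed debuginfo download vs. full Git history, in MB
        printf "download vs git history: %.2fx\n", 329 / 298
    }'
}

debuginfo_ratios
```

This prints ratios of about 2.4x for elapsed build time, 7.8x for build size, and 1.10x for the compressed download against the whole Git history.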
 Ingo From roland at redhat.com Mon Mar 23 04:34:48 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 22 Mar 2009 21:34:48 -0700 (PDT) Subject: [PATCH 2/3] utrace core In-Reply-To: Ingo Molnar's message of Saturday, 21 March 2009 17:44:31 +0100 <20090321164431.GK11183@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> <20090321140822.GE18690@cs.unibo.it> <20090321143457.GA24254@elte.hu> <20090321163700.GA22292@cs.unibo.it> <20090321164431.GK11183@elte.hu> Message-ID: <20090323043448.B07D1FC3AB@magilla.sf.frob.com> > That's btw. what i see as the biggest value of utrace: it's a > comprehesive, all-encompassing framework all around process state > events and process state manipulation. Me too! And while we're on the btw's, I want to let everyone know that Ingo is the one who came up with the name "utrace". I had only completely dismal ideas for names, and nothing but the philosophy, "For the love of God, anything but [a-z]trace!" So that's one tiny piece of the whole mess that you can't blame on me. (Yes, I do believe I would be killed if we changed it again now.)
;-) ;-) ;-) > Utrace came from Frysk (generic debugger), but the fact that you > were able to build a completely unanticipated usecase and > virtualization module on top of it is a very good sign of a robust > and complete design. I'm impressed. Um, thanks, I guess. The antecedents of your statement are not really accurate, but I'll take the consequent as a compliment! :-) In fact, utrace came from my experience of maintaining the old ptrace code. Nor was this particular use "completely unanticipated". I was not aware of Renzo or his work before he got in touch about making use of utrace. But my imagined list of vaporware always included "specialized engines for UML or other syscall-interception type things". (e.g. seccomp is trivial with no need for per-arch asm work.) I swear, a third of the people who ever came to me complaining about ptrace being so hard to work with were doing things that to me are all "syscall interception and/or tracking", whether for some security-minded purpose or something more virtualization-like. Surely for many of those cases, it was really the wrong way to solve the problem they were tackling. Seems it's just the next stop after someone talks you out of LD_PRELOAD. But who am I to say? It was quite clear that people really wanted easier ways to experiment with doing this sort of thing. That said, I certainly have always hoped for completely unanticipated uses. (I will readily admit to succumbing to "Build it and they will come" mentality. I'm sure flames about my deep character flaws, moral turpitude, and dubious lineage will follow. The history of my career will show that I was not striving for the appearance of cogent planning.) I hatched the essential design of utrace when I'd recently spent a whole lot of time fixing the innards of ptrace and a whole lot of time helping userland implementors of debuggers and the like figure out how to work with ptrace (and hearing their complaints about it). 
At the same time, the group I'm in (still) was contemplating both the implementation issues of a generic debugger, how to make it tractable to work up to far smarter debuggers, and also the design of what became systemtap. It was clear to me that this whole space of problems and potential features would be an open-ended area where different approaches would need to be hashed out, and that there would not be one "ptrace killer" feature that would be the right fit for all uses. It has long been clear that the threshold of effort was far too high for people to experiment and innovate in this area. Hence the plan to make a new platform that lowered that threshold at least closer to "pretty easy" from "intractable", staying about as simple as what both brings that threshold down enough and lets unrelated developments in these things coexist well on the system. Thanks, Roland From roland at redhat.com Mon Mar 23 04:35:20 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 22 Mar 2009 21:35:20 -0700 (PDT) Subject: [PATCH 2/3] utrace core In-Reply-To: Andrew Morton's message of Saturday, 21 March 2009 01:49:09 -0700 <20090321014909.6b654f55.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> Message-ID: <20090323043520.0B447FC3AB@magilla.sf.frob.com> > I'd be interested in seeing a bit of discussion regarding the overall value > of utrace - it has been quite a while since it floated past. Me too! > I assume that redoing ptrace to be a client of utrace _will_ happen, and > that this is merely a cleanup exercise with no new user-visible features? Yes. It's my expectation that Oleg and I will do that clean-up in several small stages, in the not-too-distant future. I think more of that work has to do with making the ptrace data structures clean and sane than with utrace details. 
> The "prototype utrace-ftrace interface" seems to be more a cool toy rather > than a serious new kernel feature (yes?) I don't personally have any experience with either Frank's utrace-ftrace widget or with using any ftrace-based things to debug user programs. I would guess it is more of a demonstration than a tool people will be using in the long run. > If so, what are the new killer utrace clients which would justify all these > changes? I hope I can leave those examples to the people who will write them. utrace will be a failure if it only serves to underlie the things I want to implement or can think up. My intent is to open up this area of new feature generation to the people who are killer hackers, but have been daunted or turned off by the prospect of becoming killer ptrace innards hackers. (Only Oleg seems to have taken to that opportunity, or perhaps he expected to wind up doing it as little as I did.) > Also, is it still the case that RH are shipping utrace? If so, for what > reasons and what benefits are users seeing from it? Fedora Rawhide has this current code, yes. The people trying to develop new features using utrace certainly like having it there. (There really truly are people who like to build novel kernel modules without compiling their own kernels from scratch.) I won't try to speak for them or their users. > And I recall that there were real problems wiring up the Feb 2007 version > of utrace to the ARM architecture. Have those issues been resolved? Are > any problems expected for any architectures? That was a misimpression. There were never real problems for ARM, only misunderstandings. I'm sure the way I tried to stage the changes at that time contributed to those misunderstandings arising as they did. Since then, all the arch requirements have been distilled into the HAVE_ARCH_TRACEHOOK set that is already merged for several architectures. 
It is in the hands of each arch maintainer to update their code to meet the HAVE_ARCH_TRACEHOOK requirements (I'm glad to give advice when asked), and there is no porting work that is actually specific to utrace itself. (You just can't turn it on without HAVE_ARCH_TRACEHOOK.) Of course it is never all that unlikely that some bits of the generic code will get some new tweaks brought to light by making it work with a particular arch. To my knowledge, the strangest arch for cleaning up any of this stuff has always been ia64, and sparc second; those arch maintainers have already done the HAVE_ARCH_TRACEHOOK work. Thanks, Roland From roland at redhat.com Mon Mar 23 05:09:26 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 22 Mar 2009 22:09:26 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: Andrew Morton's message of Saturday, 21 March 2009 05:04:22 -0700 <20090321050422.d1d99eec.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> Message-ID: <20090323050926.1ED1EFC3AB@magilla.sf.frob.com> > Well I dunno. You guys are closer to this than I am, but I'd have thought > that systemtap is the main game here, and most/all of the above is just > fluff. That is certainly not true for me. It is true that I hear plenty from systemtap developers, users, and boosters wanting utrace to be merged. But all that "fluff" you dismiss out of hand is what I would really like to see become reality. Pretty much the only people who ever tell me they would hack on those things are the ones who say, "I'm looking forward to utrace getting merged in so I can try to write something." > eh. Boring. [...] 
Since it's boring to you, it must be so boring to everyone that they have no interest in a platform they can use to do exciting things with. Great. Silly me trying to enable collaboration to produce things less boring than I'm capable of myself. Clearly there is no need for any such thing. Sorry I'm so out of touch, but I just thought it was cool. Thanks, Roland From roland at redhat.com Mon Mar 23 05:20:50 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 22 Mar 2009 22:20:50 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: Frank Ch. Eigler's message of Saturday, 21 March 2009 19:38:39 -0400 <20090321233839.GB5157@redhat.com> References: <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090321222030.GA5157@redhat.com> <20090321223759.GA22770@x200.localdomain> <20090321233839.GB5157@redhat.com> Message-ID: <20090323052050.D3F61FC3AB@magilla.sf.frob.com> > Yes, I believe that is Roland's intent. I believe it was separated > from the current suite of patches for staging purposes, to merge the > most solid code up first. The code is available from the utrace git > tree in the utrace-ptrace branch. More than just "staging". The utrace-ptrace code there today is really not very nice to look at, and it's not ready for prime time. As has been mentioned, it is a "pure clean-up exercise". As such, it's not the top priority. It also didn't seem to me like much of an argument for merging utrace: "Look, more code and now it still does the same thing!" Instead, I figured we should merge utrace in a way that lets everybody beat on it for new features and hash out details, without immediate risk of regressions in ptrace (which are severely annoying to everyone when they happen). 
The proper clean-ups for ptrace can proceed in parallel with work using utrace for things that are actually new and interesting, and at least the first half of that clean-up work is orthogonal to utrace. This seems like the normal way that new optional CONFIG_FOOBAR features (marked EXPERIMENTAL, even) are handled. We don't jump over ourselves to make existing crucial code paths subject to a new subsystem that is getting its first rounds of shake-out. Thanks, Roland From roland at redhat.com Mon Mar 23 04:49:40 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 22 Mar 2009 21:49:40 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: Ingo Molnar's message of Saturday, 21 March 2009 10:12:35 +0100 <20090321091235.GA29678@elte.hu> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> Message-ID: <20090323044940.870ECFC3AB@magilla.sf.frob.com> > kernel/utrace.c should probably be introduced as > kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending > work in the tracing tree and cooperation would be nice and desired. Of course I would like to cooperate with everyone. And of course it does not really matter much where a source file lives. But IMHO utrace really does not fit in with the kernel/trace/ code much at all. Sure, its hooks can be used by tracer implementations that use CONFIG_TRACING stuff. But it is a general API about user thread state. It belongs in kernel/trace/ "naturally" far less than, say, kprobes. utrace will in future be used to implement userland features (ptrace et al) that are just aspects of the basics of what an operating system does: mediate userland for userland. Those uses will have nothing to do with "kernel tracing". 
Thanks, Roland From roland at redhat.com Mon Mar 23 05:33:23 2009 From: roland at redhat.com (Roland McGrath) Date: Sun, 22 Mar 2009 22:33:23 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: Ingo Molnar's message of Sunday, 22 March 2009 11:25:34 +0100 <20090322102534.GC19826@elte.hu> References: <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090321222030.GA5157@redhat.com> <20090321223759.GA22770@x200.localdomain> <20090321233839.GB5157@redhat.com> <20090322102534.GC19826@elte.hu> Message-ID: <20090323053323.2F28DFC3AB@magilla.sf.frob.com> > And i think the ptrace-via-utrace engine is actually fully ready, > just perhaps it was not submitted out of caution to keep the > logistics simple. That's not so. There is a clumsy prototype version. Much of the work to do it properly is really just plain ptrace clean-up and not specifically about using utrace. Oleg and I are ready to work on it as soon as our time is not monopolized by trying to get the core utrace code to be accepted. This ptrace work really buys nothing with immediate pay-off at all. It's a real shame if its lack keeps people from actually looking at utrace itself. (This has been a long conversation so far with zero discussion of the code.) A collaboration with focus on what new things can be built, rather than on reasons not to let the foundations be poured, would be a lovely thing. 
Thanks, Roland From mingo at elte.hu Mon Mar 23 06:34:56 2009 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 23 Mar 2009 07:34:56 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090323044940.870ECFC3AB@magilla.sf.frob.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090323044940.870ECFC3AB@magilla.sf.frob.com> Message-ID: <20090323063456.GA7752@elte.hu> * Roland McGrath wrote: > > kernel/utrace.c should probably be introduced as > > kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending > > work in the tracing tree and cooperation would be nice and desired. > > Of course I would like to cooperate with everyone. And of course > it does not really matter much where a source file lives. But > IMHO utrace really does not fit in with the kernel/trace/ code > much at all. Sure, its hooks can be used by tracer > implementations that use CONFIG_TRACING stuff. But it is a > general API about user thread state. It belongs in kernel/trace/ > "naturally" far less than, say, kprobes. utrace will in future be > used to implement userland features (ptrace et al) that are just > aspects of the basics of what an operating system does: mediate > userland for userland. Those uses will have nothing to do with > "kernel tracing". But it is fitting if you think of kernel/trace/ as kernel/instrumentation/. The virtualization-alike uses for utrace are in essence using system call instrumentation callbacks to inject extra functionality into the system. That's possible not because it's primarily geared at doing that, but because the instrumentation callbacks are generic and complete enough. It's still correct to think of it as an instrumentation tool and to maintain it as such. That also makes it clear that none of these APIs are to be regarded permanent ABIs. 
 Anyway ... placement is no big deal, and kernel/utrace.c is certainly a good way of avoiding the tracing tree ;-) Ingo From dvlasenk at redhat.com Mon Mar 23 09:25:04 2009 From: dvlasenk at redhat.com (Denys Vlasenko) Date: Mon, 23 Mar 2009 10:25:04 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> Message-ID: <1237800304.3716.3.camel@localhost> On Fri, 2009-03-20 at 18:42 -0700, Roland McGrath wrote: > From: Frank Ch. Eigler > Here's the /debugfs/tracing/process_trace_README: > > process event tracer mini-HOWTO > > 1. Select process hierarchy to monitor. Other processes will be > completely unaffected. Leave at 0 for system-wide tracing. > % echo NNN > process_follow_pid > > 2. Determine which process event traces are potentially desired. > syscall and signal tracing slow down monitored processes. > % echo 0 > process_trace_{syscalls,signals,lifecycle} > > 3. Add any final uid- or taskcomm-based filtering. Non-matching > processes will skip trace messages, but will still be slowed. > % echo NNN > process_trace_uid_filter # -1: unrestricted > % echo ls > process_trace_taskcomm_filter # empty: unrestricted > > 4. Start tracing. > % echo process > current_tracer > > 5. Examine trace. > % cat trace > > 6. Stop tracing. > % echo nop > current_tracer > > Signed-off-by: Frank Ch. Eigler ... > +static char README_text[] = > + "process event tracer mini-HOWTO\n" > + "\n" > + "1. Select process hierarchy to monitor.
Other processes will be\n" > + " completely unaffected. Leave at 0 for system-wide tracing.\n" > + "# echo NNN > process_follow_pid\n" > + "\n" > + "2. Determine which process event traces are potentially desired.\n" > + " syscall and signal tracing slow down monitored processes.\n" > + "# echo 0 > process_trace_{syscalls,signals,lifecycle}\n" > + "\n" > + "3. Add any final uid- or taskcomm-based filtering. Non-matching\n" > + " processes will skip trace messages, but will still be slowed.\n" > + "# echo NNN > process_trace_uid_filter # -1: unrestricted \n" > + "# echo ls > process_trace_taskcomm_filter # empty: unrestricted\n" > + "\n" > + "4. Start tracing.\n" > + "# echo process > current_tracer\n" > + "\n" > + "5. Examine trace.\n" > + "# cat trace\n" > + "\n" > + "6. Stop tracing.\n" > + "# echo nop > current_tracer\n" > + ; A HOWTO text in the kernel binary? Shouldn't it be in Documentation/* instead? But then, I am a well known miniaturization freak... -- vda From will.newton at gmail.com Mon Mar 23 10:57:11 2009 From: will.newton at gmail.com (Will Newton) Date: Mon, 23 Mar 2009 10:57:11 +0000 Subject: [PATCH 2/3] utrace core In-Reply-To: <20090321014909.6b654f55.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014140.AA4F5FC3AB@magilla.sf.frob.com> <20090321014909.6b654f55.akpm@linux-foundation.org> Message-ID: <87a5b0800903230357n3eedaac1u6c70c22fedea5ffc@mail.gmail.com> On Sat, Mar 21, 2009 at 8:49 AM, Andrew Morton wrote: > On Fri, 20 Mar 2009 18:41:40 -0700 (PDT) Roland McGrath wrote: > >> This adds the utrace facility, a new modular interface in the kernel for >> implementing user thread tracing and debugging. ?This fits on top of the >> tracehook_* layer, so the new code is well-isolated. >> >> The new interface is in and the DocBook utrace book >> describes it. ?It allows for multiple separate tracing engines to work in >> parallel without interfering with each other. 
 Higher-level tracing >> facilities can be implemented as loadable kernel modules using this layer. >> >> The new facility is made optional under CONFIG_UTRACE. >> When this is not enabled, no new code is added. >> It can only be enabled on machines that have all the >> prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK. >> >> In this initial version, utrace and ptrace do not play together at all. >> If ptrace is attached to a thread, the attach calls in the utrace kernel >> API return -EBUSY. If utrace is attached to a thread, the PTRACE_ATTACH >> or PTRACE_TRACEME request will return EBUSY to userland. The old ptrace >> code is otherwise unchanged and nothing using ptrace should be affected >> by this patch as long as utrace is not used at the same time. In the >> future we can clean up the ptrace implementation and rework it to use >> the utrace API. > > I'd be interested in seeing a bit of discussion regarding the overall value > of utrace - it has been quite a while since it floated past. > > I assume that redoing ptrace to be a client of utrace _will_ happen, and > that this is merely a cleanup exercise with no new user-visible features? > > The "prototype utrace-ftrace interface" seems to be more a cool toy rather > than a serious new kernel feature (yes?) > > If so, what are the new killer utrace clients which would justify all these > changes? It looks like utrace could provide a nice way to do low latency tracing of userspace processes via a hardware interface (e.g. JTAG or custom trace hardware). The only way to do that at present is to scatter bits of instrumentation throughout the kernel. I would like to see utrace merged so I can work on that type of feature.
From adobriyan at gmail.com Mon Mar 23 13:48:13 2009 From: adobriyan at gmail.com (Alexey Dobriyan) Date: Mon, 23 Mar 2009 16:48:13 +0300 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090322123749.GF19826@elte.hu> References: <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> Message-ID: <20090323134813.GA18219@x200.localdomain> On Sun, Mar 22, 2009 at 01:37:49PM +0100, Ingo Molnar wrote: > The ptrace bits and signoffs from Oleg and Alexey would certainly > help (me) in trusting it. (I've Cc:-ed Oleg and Alexey) The further utrace stays away from mainline, the better. That's from my experience with this code. But let's see how ptrace(2) rewrite will look like because this is the biggest thing that matters. All those cool virtual machines, fancy tracers and what not aren't even comparable. Right now with ptrace(2) rewrite the following is easily triggerable: WARNING: at kernel/ptrace.c:515 ptrace_report_signal+0x2c1/0x2d0() Pid: 4814, comm: exe Not tainted 2.6.29-rc8-utrace #1 Call Trace: [] warn_slowpath+0x81/0xa0 [] ? validate_chain+0xe9/0x1150 [] ? __lock_acquire+0x246/0xa50 [] ? __delay+0x9/0x10 [] ? mark_held_locks+0x6b/0x80 [] ? _spin_unlock_irq+0x22/0x50 [] ptrace_report_signal+0x2c1/0x2d0 [] ? ptrace_report_signal+0x0/0x2d0 [] utrace_get_signal+0x1c9/0x660 [] get_signal_to_deliver+0x288/0x330 [] do_notify_resume+0xb9/0x890 [] ? cache_free_debugcheck+0x232/0x2f0 [] ? trace_hardirqs_off+0xb/0x10 [] ? _spin_unlock_irqrestore+0x39/0x70 [] ? sys_execve+0x40/0x60 [] ? kmem_cache_free+0x89/0xc0 [] ? trace_hardirqs_on_caller+0xfd/0x190 [] ? 
trace_hardirqs_on+0xb/0x10 [] work_notifysig+0x13/0x19 It looks like WARN_ON is just bogus, but who knows. From fche at redhat.com Mon Mar 23 14:31:43 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Mon, 23 Mar 2009 10:31:43 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <1237800304.3716.3.camel@localhost> (Denys Vlasenko's message of "Mon, 23 Mar 2009 10:25:04 +0100") References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <1237800304.3716.3.camel@localhost> Message-ID: Denys Vlasenko writes: > [...] >> Here's the /debugfs/tracing/process_trace_README: >> process event tracer mini-HOWTO [...] > > A HOWTO text in the kernel binary? Shouldn't it be in > Documentation/* instead? [...] It parallels the debugfs/tracing/README file.
- FChE From oleg at redhat.com Mon Mar 23 15:14:00 2009 From: oleg at redhat.com (Oleg Nesterov) Date: Mon, 23 Mar 2009 16:14:00 +0100 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090323134813.GA18219@x200.localdomain> References: <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> Message-ID: <20090323151400.GA3413@redhat.com> On 03/23, Alexey Dobriyan wrote: > > Right now with ptrace(2) rewrite the following is easily triggerable: > > WARNING: at kernel/ptrace.c:515 ptrace_report_signal+0x2c1/0x2d0() Yes, ptrace-over-utrace needs more work. But your message looks as if utrace core is buggy, imho this is a bit unfair ;) As Roland said, ptrace-over-utrace is not ready yet. If you mean that utrace core should not be merged alone - this is another story. But personally I understand why Roland sends utrace core before changing ptrace. Oleg.
From mathieu.desnoyers at polymtl.ca Mon Mar 23 16:42:08 2009 From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers) Date: Mon, 23 Mar 2009 12:42:08 -0400 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090316221800.GE12974@redhat.com> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090316221800.GE12974@redhat.com> Message-ID: <20090323164208.GB22501@Krystal> * Frank Ch. Eigler (fche at redhat.com) wrote: > Hi - > > > On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote: > > > [...] > > > As far as I know, utrace supports multiple trace-engines on a process. > > > Since ptrace is just an engine of utrace, you can add another engine on utrace. > > > > > > utrace-+-ptrace_engine---owner_process > > > | > > > +-systemtap_module > > > | > > > +-ftrace_plugin > > Right. In this way, utrace is simply a multiplexing intermediary. > > > > > Here, Frank had posted an example of utrace->ftrace engine. > > > http://lkml.org/lkml/2009/1/27/294 > > > > > > And here is the latest his patch(which seems to support syscall tracing...) > > > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124 > > > > > > > Reminder : we are looking at system-wide tracing here. Here are some > > comments about the current utrace implementation. > > > > Looking at include/linux/utrace.h from the tree > > > > 17 * A tracing engine starts by calling utrace_attach_task() or > > 18 * utrace_attach_pid() on the chosen thread, passing in a set of hooks > > 19 * (&struct utrace_engine_ops), and some associated data.
This produces a > > 20 * &struct utrace_engine, which is the handle used for all other > > 21 * operations. An attached engine has its ops vector, its data, and an > > 22 * event mask controlled by utrace_set_events(). > > > > So if the system has, say 3000 threads, then we have 3000 struct > > utrace_engine created ? I wonder what effet this could have one > > cachelines if this is used to trace hot paths like system call > > entry/exit. Have you benchmarked this kind of scenario under tbench ? > > It has not been a problem, since utrace_engines are designed to be > lightweight. Starting or stopping a systemtap script of the form > > probe process.syscall {} > > appears to have no noticable impact on a tbench suite. > Do you mean starting this script for a single process or for _all_ the processes and threads running on the system ? > > > 24 * For each event bit that is set, that engine will get the > > 25 * appropriate ops->report_*() callback when the event occurs. The > > 26 * &struct utrace_engine_ops need not provide callbacks for an event > > 27 * unless the engine sets one of the associated event bits. > > > > Looking at utrace_set_events(), we seem to be limited to 32 events on a > > 32-bits architectures because it uses a bitmask ? Isn't it a bit small? > > There are only a few types of thread events that involve different > classes of treatment, or different degrees of freedom in terms of > interference with the uninstrumented fast path of the threads. > > For example, it does not make sense to have different flag bits for > different system calls, since choosing to trace *any* system call > involves taking the thread off of the fast path with the TIF_ flag. > Once it's off the fast path, it doesn't matter whether the utrace core > or some client performs syscall discrimination, so it is left to the > client. > If we limit ourself to thread-interaction events, I agree that they are limited. 
But in the system-wide tracing scenario, the criterions for filtering can apply to many more event categories. Referring to Roland's reply, I think using utrace to enable system-wide collection of data would just be a waste of resources. Going through a more lightweight system-wide activation seems more appropriate to me. Utrace is still a very promising tool to trace process-specific activity though. Mathieu > > > 682 /** > > 683 * utrace_set_events_pid - choose which event reports a tracing engine gets > > 684 * @pid: thread to affect > > 685 * @engine: attached engine to affect > > 686 * @eventmask: new event mask > > 687 * > > 688 * This is the same as utrace_set_events(), but takes a &struct pid > > 689 * pointer rather than a &struct task_struct pointer. The caller must > > 690 * hold a ref on @pid, but does not need to worry about the task > > 691 * staying valid. If it's been reaped so that @pid points nowhere, > > 692 * then this call returns -%ESRCH. > > > > > > Comments like "but does not need to worry about the task staying valid" > > does not make me feel safe and comfortable at all, could you explain > > how you can assume that derefencing an "invalid" pointer will return > > NULL ? > > (We're doing a final round of "internal" (pre-LKML) reviews of the > utrace implementation right now on utrace-devel at redhat.com, where such > comments get fastest attention from the experts.) > > For this particular issue, the utrace documentation file explains the > liveness rules for the various pointers that can be fed to or received > from utrace functions. This is not about "feeling" safe, it's about > what the mechanism is deliberately designed to permit. > > > > About the utrace_attach_task() : > > > > 244 if (unlikely(target->flags & PF_KTHREAD)) > > 245 /* > > 246 * Silly kernel, utrace is for users! > > 247 */ > > 248 return ERR_PTR(-EPERM); > > > > So we cannot trace kernel threads ? 
> > I'm not quite sure about all the reasons for this, but I believe that > kernel threads don't tend to engage in job control / signal / > system-call activities the same way as normal user threads do. > > > > 118 /* > > 119 * Called without locks, when we might be the first utrace engine to attach. > > 120 * If this is a newborn thread and we are not the creator, we have to wait > > 121 * for it. The creator gets the first chance to attach. The PF_STARTING > > 122 * flag is cleared after its report_clone hook has had a chance to run. > > 123 */ > > 124 static inline int utrace_attach_delay(struct task_struct *target) > > 125 { > > 126 if ((target->flags & PF_STARTING) && target->real_parent != current) > > 127 do { > > 128 schedule_timeout_interruptible(1); > > 129 if (signal_pending(current)) > > 130 return -ERESTARTNOINTR; > > 131 } while (target->flags & PF_STARTING); > > 132 > > 133 return 0; > > 134 } > > > > Why do we absolutely have to poll until the thread has started ? > > (I don't know off the top of my head - Roland?) > > > > utrace_add_engine() > > set_notify_resume(target); > > > > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice > > that it is set asynchronously with the execution of the target thread > > (as I do with my TIF_KERNEL_TRACE thread flag). > > > > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in > > entry_64.S > > > > int_signal: > > and > > retint_signal: > > > > code paths. However, if there is no syscall tracing to do upon syscall > > entry, the thread flags are not re-read at syscall exit and you will > > miss the syscall exit returning from your target thread if this thread > > was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in > > some subtle way I did not figure out ? BTW re-reading the TIF flags from > > the thread_info at syscall exit on the fast path is out of question > > because it considerably degrades the kernel performances. entry_*.S is > > a very, very critical path. 
> > (I don't know off the top of my head - Roland?) > > > - FChE -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 From fche at redhat.com Mon Mar 23 16:52:42 2009 From: fche at redhat.com (Frank Ch. Eigler) Date: Mon, 23 Mar 2009 12:52:42 -0400 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090323164208.GB22501@Krystal> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090316221800.GE12974@redhat.com> <20090323164208.GB22501@Krystal> Message-ID: <20090323165242.GB18774@redhat.com> Hi - On Mon, Mar 23, 2009 at 12:42:08PM -0400, Mathieu Desnoyers wrote: > [...] (Please trim emails you're responding to.) > [...] > > > So if the system has, say 3000 threads, then we have 3000 struct > > > utrace_engine created ? I wonder what effet this could have one > > > cachelines if this is used to trace hot paths like system call > > > entry/exit. Have you benchmarked this kind of scenario under tbench ? > > > > It has not been a problem, since utrace_engines are designed to be > > lightweight. Starting or stopping a systemtap script of the form > > > > probe process.syscall {} > > > > appears to have no noticable impact on a tbench suite. > > Do you mean starting this script for a single process or for _all_ the > processes and threads running on the system ? The script above usually applies to all threads. > > > Looking at utrace_set_events(), we seem to be limited to 32 events on a > > > 32-bits architectures because it uses a bitmask ? Isn't it a bit small? > > > > There are only a few types of thread events that involve different > > classes of treatment, or different degrees of freedom in terms of > > interference with the uninstrumented fast path of the threads. [...] 
> > If we limit ourself to thread-interaction events, I agree that they are > limited. But in the system-wide tracing scenario, the criterions for > filtering can apply to many more event categories. If those different criteria have equivalent impact on running threads, there is no need to differentiate them at the low (utrace event flag) level. Could you offer an example to clarify? > Referring to Roland's reply, I think using utrace to enable > system-wide collection of data would just be a waste of > resources. Going through a more lightweight system-wide activation > seems more appropriate to me. [...] Perhaps. OTOH it also makes sense to me to use (and improve) one general facility, if it can do the right thing almost as fast as a wholly separate facility that's specialized for one small purpose. The decision would probably rest with a more data-based comparison of performance & code size. - FChE From mathieu.desnoyers at polymtl.ca Mon Mar 23 17:03:56 2009 From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers) Date: Mon, 23 Mar 2009 13:03:56 -0400 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090323165242.GB18774@redhat.com> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090316221800.GE12974@redhat.com> <20090323164208.GB22501@Krystal> <20090323165242.GB18774@redhat.com> Message-ID: <20090323170356.GD24084@Krystal> * Frank Ch. Eigler (fche at redhat.com) wrote: > Hi - > > On Mon, Mar 23, 2009 at 12:42:08PM -0400, Mathieu Desnoyers wrote: > > [...] > > (Please trim emails you're responding to.) > > > [...] > > > > So if the system has, say 3000 threads, then we have 3000 struct > > > > utrace_engine created ? 
I wonder what effet this could have one > > > > cachelines if this is used to trace hot paths like system call > > > > entry/exit. Have you benchmarked this kind of scenario under tbench ? > > > > > > It has not been a problem, since utrace_engines are designed to be > > > lightweight. Starting or stopping a systemtap script of the form > > > > > > probe process.syscall {} > > > > > > appears to have no noticable impact on a tbench suite. > > > > Do you mean starting this script for a single process or for _all_ the > > processes and threads running on the system ? > > The script above usually applies to all threads. > Hrm, I already spent more time installing and benchmarking systemtap than I should, so I don't have time currently to run further systemtap benchmarks, but I seriously doubt about this. Have you run the following benchmark ? Baseline : vanilla kernel, without utrace Comparison with : utrace-enabled kernel, with the syscall probe activated ? If you are comparing a utrace-enabled kernel with and without the syscall probes activated, then you are probably missing some performance impact. Also make sure AUDIT SYSCALL, secure computing and frame pointers are disabled in your baseline kernel too. If this is what you did, I would really like to see the numbers. > > > > > Looking at utrace_set_events(), we seem to be limited to 32 events on a > > > > 32-bits architectures because it uses a bitmask ? Isn't it a bit small? > > > > > > There are only a few types of thread events that involve different > > > classes of treatment, or different degrees of freedom in terms of > > > interference with the uninstrumented fast path of the threads. [...] > > > > If we limit ourself to thread-interaction events, I agree that they are > > limited. But in the system-wide tracing scenario, the criterions for > > filtering can apply to many more event categories. 
> > If those different criteria have equivalent impact on running threads, > there is no need to differentiate them at the low (utrace event flag) > level. Could you offer an example to clarify? > > > > Referring to Roland's reply, I think using utrace to enable > > system-wide collection of data would just be a waste of > > resources. Going through a more lightweight system-wide activation > > seems more appropriate to me. [...] > > Perhaps. OTOH it also makes sense to me to use (and improve) one > general facility, if it can do the right thing almost as fast as a > wholly separate facility that's specialized for one small purpose. > The decision would probably rest with a more data-based comparison of > performance & code size. > Sure. Mathieu > > - FChE -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 From mathieu.desnoyers at polymtl.ca Mon Mar 23 17:33:15 2009 From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers) Date: Mon, 23 Mar 2009 13:33:15 -0400 Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure In-Reply-To: <20090319103434.CBE69FC3AB@magilla.sf.frob.com> References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com> <1236401580-5758-2-git-send-email-fweisbec@gmail.com> <49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere> <49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal> <20090317052442.GA32674@redhat.com> <20090317160029.GD10092@Krystal> <20090319103434.CBE69FC3AB@magilla.sf.frob.com> Message-ID: <20090323173315.GG24084@Krystal> Hi Roland, * Roland McGrath (roland at redhat.com) wrote: > The utrace API itself is not a good fit for global tracing, since its > purpose is tracing and control of individual user threads. There is > no reason to allocate its per-task data structures when you are going > to treat all tasks the same anyway. The points that I think are being > missed are about the possibilities of overloading TIF_SYSCALL_TRACE. 
> > It's true that ptrace uses TIF_SYSCALL_TRACE as a flag for whether you are > in the middle of a PTRACE_SYSCALL, so it can be confused by setting it for > other purposes on a task that is also ptrace'd (but not with PTRACE_SYSCALL). > Until we are able to do away with these parts of the old ptrace innards, > you can't overload TIF_SYSCALL_TRACE without perturbing ptrace behavior. > Yes, this is why I went with a different thread flag in my TIF_KERNEL_TRACE implementation. > The utrace code does not have this problem. It keeps its own state bits, > so for it, TIF_SYSCALL_TRACE means exactly "the task will call > tracehook_report_syscall_*" and no more. To use TIF_SYSCALL_TRACE for > another purpose, just set it on all the tasks you like (and/or set it on > new tasks in fork.c) and add your code (tracepoints, whatever) to > tracehook_report_syscall_* alongside the calls there into utrace. There is > exactly one place in utrace code that clears TIF_SYSCALL_TRACE, and you > just add "&& !global_syscall_tracing_enabled" to the condition there. You > don't need to bother clearing TIF_SYSCALL_TRACE on any task when you're > done. If your "global_syscall_tracing_enabled" (or whatever it is) is > clear, each task will lazily fall into utrace at its next syscall > entry/exit and then utrace will reset TIF_SYSCALL_TRACE when it finds no > reason left to have it on. I wonder how racy enabling system-wide tracing and disabling utrace tracing on a specific thread would be ? How do you ensure that the global tracing flag and per-thread flags are updated consistently ? I also wonder about added performance impact caused by the tracehook_report_syscall_* call. Ideally, system-wide syscall tracing should call directly into a tracing callback, write to the trace buffers, and return. With utrace, we would have to call an intermediate callback, which would then call our tracer, then test utrace flags to check if utrace should be called, and then return. 
Function calls are quite costly nowadays :( > > I'm not really going to delve into utrace internals in this thread. Please > raise those questions in review of the utrace patches when current code is > actually posted, where they belong. Here I'll just mention the relevant > things that relate to the underlying issue you raised about synchronization. > As thoroughly documented, utrace_set_events() is a quick, asynchronous call > that itself makes no guarantees about how quickly a running task will start > to report the newly-requested events. For purposes relevant here, it just > sets TIF_SYSCALL_TRACE and nothing else. In utrace, if you want synchronous > assurance that a task misses no events you ask for, then you must first use > utrace_control (et al) to stop it and synchronize. That is not something > that makes much sense at all for a "flip on global tracing" operation, which > is not generally especially synchronous with anything else. If you want > best effort that a task will pick up newly-requested events Real Soon Now, > you can use utrace_control with just UTRACE_REPORT. For purposes relevant > here, this just does set_notify_resume(). That will send an IPI if the task > is running, and then it will start noticing before it returns to user mode. > So: > set_tsk_thread_flag(task, TIF_SYSCALL_TRACE); > set_notify_resume(task); > is what I would expect you to do on each task if you want to quickly get it > to start hitting tracehook_report_syscall_*. (I'm a bit dubious that there > is really any need to speed it up with set_notify_resume, but that's just me.) Ideally, when we start tracing, setting the flag can be asynchronous, but we need to have a way to figure out when tracing is actually active (e.g. rcu quiescent state). So this can be seen as synchronous activation. Stopping all tasks does not really make much sense for system-wide tracing, especially if there are alternatives. 
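The "set the flag everywhere, clear it lazily" scheme discussed above can be modelled in a few lines of userspace C. This is only a rough sketch under stated assumptions: `struct task`, `syscall_trace_flag`, `enable_global_tracing()` and `syscall_entry()` are hypothetical stand-ins invented for illustration (the flag plays the role of TIF_SYSCALL_TRACE), and the model is single-threaded, so it deliberately says nothing about the race and "when is tracing actually active" questions raised in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; only the per-task trace flag matters here. */
struct task {
    bool syscall_trace_flag;    /* stands in for TIF_SYSCALL_TRACE */
    unsigned long events_seen;  /* events recorded by the global tracer */
};

/* Stands in for the "global_syscall_tracing_enabled" switch from the mail. */
static bool global_syscall_tracing_enabled;

/* Turn tracing on: flip the global switch, then flag every task. */
static void enable_global_tracing(struct task *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);
    global_syscall_tracing_enabled = true;
    for (size_t i = 0; i < n; i++)
        tasks[i].syscall_trace_flag = true;
}

/* Turn tracing off: only the global switch changes; tasks are not visited. */
static void disable_global_tracing(void)
{
    global_syscall_tracing_enabled = false;
}

/*
 * Model of the syscall entry hook. Returns true when the task left the
 * fast path. A stale per-task flag is cleared lazily on the first entry
 * after tracing was disabled.
 */
static bool syscall_entry(struct task *t)
{
    if (!t->syscall_trace_flag)
        return false;                  /* fast path, nothing to do */
    if (global_syscall_tracing_enabled)
        t->events_seen++;              /* the tracepoint would fire here */
    else
        t->syscall_trace_flag = false; /* lazy clear, no event recorded */
    return true;
}
```

In this model a task pays for one extra slow-path trip after tracing is switched off, which is the cost traded for not having to walk every task on disable.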
> > Finally, some broader points about TIF_SYSCALL_TRACE that I think > have been overlooked. The key special feature of TIF_SYSCALL_TRACE > is that it gets you to a place where full user_regset access is > available. Debuggers need this to read (and write) the full user > register state arbitrarily, which they also need to do user > backtraces and the like. If you do not need user_regset to work, > then you don't need to be on this (slowest) code path. LTTng had userspace backtraces on syscall entry and irq entry a while ago, and this was particularly useful. But I agree that if this is not needed, we should go for the warm path. > > If you are only interested in reading syscall arguments and results > (or even in changing syscall results in exit tracing) then you do > not need user_regset and you do not need to take the slowest syscall > path. (If you are doing backtraces but already rely on full kernel > stack unwinding to do it, you also do not need user_regset.) From > anywhere inside the kernel, you can use the asm/syscall.h calls to > read syscall args, whichever entry path the task took. > > The other mechanism to hook into every syscall entry/exit is > TIF_SYSCALL_AUDIT. On some machines (like x86), this takes a third, > "warm" code path that is faster than the TIF_SYSCALL_TRACE path > (though obviously still off the fastest direct code path). It can > be faster precisely because it doesn't need to allow for user_regset > access, nor for modification of syscall arguments in entry tracing. > For normal read-only tracing of just the actual syscall details, > it has all you need. > > Unfortunately the arch code all looks like: > > if (unlikely(current->audit_context)) > audit_syscall_{entry,exit}(...); > > So we need to change that to: > > if (unlikely(test_thread_flag(TIF_SYSCALL_AUDIT))) > audit_syscall_{entry,exit}(...); > > But that is pretty easy to get right, even doing it blind on arch's > you can't test.
Far better than adding new asm hackery for each arch > that's almost identical to TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT (and > finding out that some are fresh out of TIF bits in the range that > their asm code can handle). > > TIF_SYSCALL_AUDIT is only set when allocating audit_context, and its > paths already have !context tests so overloading is harmless today. > (Whereas with TIF_SYSCALL_TRACE, you have to wait for later ptrace > cleanups or write off using ptrace simultaneously.) > > Then you can do the lazy disable in audit_syscall_{entry,exit} with: > > if (unlikely(!context)) { > if (unlikely(!global_syscall_tracing_enabled)) > clear_thread_flag(TIF_SYSCALL_AUDIT); > return; > } > > Plus add there your tracepoint or whatnot. > > Unless you really plan to use user_regset in your tracepoints, then > I think this is a better plan for global syscall tracing than either > fiddling with TIF_SYSCALL_TRACE or adding new arch asm requirements. > (IMHO, the latter is the worst idea on the table.) > Thanks for this thorough review of TIF flags. Hrm, racing with other pieces of infrastructure is never fun, and given we might want to save the userspace stack in some probes, I think it could be a good idea to go with our own flag. Mathieu > > Thanks, > Roland -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 From fche at redhat.com Mon Mar 23 20:25:03 2009 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Mon, 23 Mar 2009 16:25:03 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090322120811.GD19826@elte.hu> References: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com> <20090322120811.GD19826@elte.hu> Message-ID: <20090323202503.GD18774@redhat.com> Hi - On Sun, Mar 22, 2009 at 01:08:11PM +0100, Ingo Molnar wrote: > [...] > > In my own limited kernel-building experience, I find the debuginfo > > data conveniently and instantly available after every "make". Can > > you elaborate how is it harder for you to incidentally make it > > than for someone to download it? > > Four reasons: > > 1) > > I have CONFIG_DEBUG_INFO turned off in 99.9% of my kernel builds, > because it slows down the kernel build times significantly: [...] OK, 15% longer time. > 2) > > When the kernel build becomes IO-bound [...] > without: 870.36 user 292.79 system 3:32.10 elapsed 548% CPU > with: 929.65 user 384.55 system 8:28.70 elapsed 258% CPU OK, lots of network traffic. > 3) [...] > Try to build 1.6 GB of dirty data on ext3 and run into an fsync() in > your editor ... you'll sit there twiddling thumbs for a minute or > more. Now don't go blaming us for ext3 & fsync & not having a low enough /proc/sys/vm/dirty_background_ratio. > 4) > Or yet another metric - Linux distro package overhead. Try > installing a debuginfo package: [...] Same as 3). > And this download has to be repeated for _every_ minor kernel > upgrade. Actually, no. 
If you just want to run the newly built kernel somewhere else on your network, you can run a systemtap compile server on your build machine, and let the systemtap network client do ~RPCs to get at the data. > The solution?) > > Dunno - but i definitely think we should think bigger: > > The fundamental disconnect i believe seems to come from the fact > that most user-space projects are relatively small, so debuginfo > bloat is a secondary issue there. (Well, the fedora debuginfo archive shows a couple of packages of similar or greater heft than the kernel - openoffice.org, qt3, ...) > A few random ideas: > > [...] why not build debuginfo on the fly, when a debugging > session requires it? Rarely do we need debuginfo for more than a > fraction of the whole kernel. [...] > I mean, lets _use_ the fact that we have source code available, more > intelligently. It takes zero time to build detailed debuginfo for a > portion of a tree. [...] Something like that might be made to work. Note that this backs away from earlier claims that we can make do without debuginfo, or that the compiler can't be trusted to build the stuff. We all agree it'd be nice if made it better and made a little less. - FChE From torvalds at linux-foundation.org Mon Mar 23 20:39:22 2009 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 23 Mar 2009 13:39:22 -0700 (PDT) Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090323202503.GD18774@redhat.com> References: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com> <20090322120811.GD19826@elte.hu> <20090323202503.GD18774@redhat.com> Message-ID: On Mon, 23 Mar 2009, Frank Ch. 
Eigler wrote: > > I have CONFIG_DEBUG_INFO turned off in 99.9% of my kernel builds, > > because it slows down the kernel build times significantly: [...] > > OK, 15% longer time. It's way more than that if you don't have tons of memory and excessive amounts of CPU power. > > 2) > > > > When the kernel build becomes IO-bound [...] > > without: 870.36 user 292.79 system 3:32.10 elapsed 548% CPU > > with: 929.65 user 384.55 system 8:28.70 elapsed 258% CPU > > OK, lots of network traffic. This is the _normal_ situation for a debug info build. If it's not network traffic (distcc), it's just disk IO. Have you tried it on a laptop? Ingo is not the only one that turns off DEBUG_INFO in disgust. It's totally useless for any sane kernel developer - the costs are excessive. And that's totally ignoring the disk usage of multiple debug info kernels. > Note that this backs away from earlier claims that we can make do > without debuginfo, or that the compiler can't be trusted to build the > stuff. We all agree it'd be nice if made it better and made a little > less. Gaah. I'd wish you all agreed that DEBUG_INFO is just TOTALLY UNREALISTIC. Linus From tytso at mit.edu Mon Mar 23 21:44:17 2009 From: tytso at mit.edu (Theodore Tso) Date: Mon, 23 Mar 2009 17:44:17 -0400 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090323151400.GA3413@redhat.com> References: <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> Message-ID: <20090323214417.GD5814@mit.edu> On Mon, Mar 23, 2009 at 04:14:00PM +0100, Oleg Nesterov wrote: > > Yes, ptrace-over-utrace needs more work.
> But your message looks as if
> utrace core is buggy, imho this is a bit unfair ;)
>
> As Roland said, ptrace-over-utrace is not ready yet. If you mean that
> utrace core should not be merged alone - this is another story.
>
> But personally I understand why Roland sends utrace core before changing
> ptrace.

Yes, but if it's going to be merged during the 2.6.x cycle, we need to
have a user for the kernel interface along with the new kernel
interface. This is true for anybody trying to add some new
infrastructure to the kernel; you have to have an in-tree user of said
interface.

I mean, if some device manufacturer were to go to Red Hat's kernel team,
and say, "we need this interface for our uber expensive RDMA interface
card", and there was no in-kernel user for the interface, we know what
Red Hat would tell that device manufacturer, right? So why is the
SystemTap team trying to get an exception for utrace? It just seems a
little hypocritical.

So what about the ftrace user of utrace? Is that ready to be merged?

- Ted

From renzo at cs.unibo.it Mon Mar 23 23:59:24 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Tue, 24 Mar 2009 00:59:24 +0100
Subject: utrace-kmview contract
Message-ID: <20090323235924.GD23807@cs.unibo.it>

Dear Roland,

You are right when you say that the interface specification is a
contract between utrace and the module writers.

My goal is to use utrace for my virtual machines; your goal is to design
utrace as support for a wide range of applications. I hope your "wide
range of applications" will include kmview.

In my perception utrace's support of multiple engines needs further
investigation. I do not need my patches to go into the utrace code,
provided there is another fast/clean/easy-to-code way to reach the same
results. It is not for kmview alone; I think this is an example for a
range of virtualization applications based on utrace.
When utrace is used for debugging, "the faster, the better" invariant
holds, but when you are dealing with virtualization the rule changes to
"the slower, the more useless!". Debugging is a temporary state of an
application, while virtualization must be designed to be used as a
standard environment.

Sometimes a picture is worth a thousand words.
http://www.cs.unibo.it/~renzo/4roland20090323.pdf
I have drawn some examples. This is actually a simplified view just to
show the problems.

The module unreal is a test module for kmview that virtualizes the
/unreal subtree as a "copy" of the file system ("/unreal/x/y/z" is the
file "/x/y/z"). I know that such a simple transformation could have been
implemented directly inside the report_syscall function, but kmview is a
general support for virtualization; unreal is just a simple test for it.
kmview is composed of a kernel module and the "agent" in user space.

In the first slide a user runs kmview and inside the VM he/she loads the
unreal module and runs a cat command. When cat tries to open
"/unreal/etc/passwd", unreal rewrites the path to /etc/passwd; the
kernel runs an "open" system call, but the arguments have been modified.
The report_syscall_entry routine must send the path to kmview in
userland and wait for the answer. The numbers on the arrows show the
sequence of actions.

The second slide shows a tracing/debugging tool used with
virtualization. This is an example of multiple engines working on the
same process. strace must read its data before the virtualization for
report_syscall_entry. On the contrary, the return value shown by strace
must be the one returned by the kmview virtualization engine, thus the
order for report_syscall_entry is the reverse of that used by
report_syscall_exit. Note that if instead of "strace cat
/unreal/etc/passwd" our user wrote "strace -f -o /tmp/xxx kmview bash"
as the first command, the order of the engines would have been inverted.
strace in fact should show the system calls as they appear "outside the
virtualization", as one may expect from the command.

The third slide shows a nested virtualization and the fourth a debug
tool running inside a nested virtualization. In all these examples I'd
use UTRACE_STOP.

Now let us discuss the details of the contract ;-)

I set up two different implementations of the kmview kernel module. In
the standard one (#undef KMVIEW_NEWSTOP) the report_syscall function
returns UTRACE_STOP, waiting for the answer from the kmview application.
The new one (#define KMVIEW_NEWSTOP) uses a semaphore to stop the
execution inside the report_syscall function, which always returns
UTRACE_RESUME.

--------------------------------------------------------------------
If you decide that the right implementation is the former
(#undef KMVIEW_NEWSTOP):
- please tell me how to implement the example of page 3 if the handling
  of syscall_entry for kmview2 does not stop before kmview1 is called.
  Okay, you say kmview1's module receives a notification that another
  engine wants to stop, by reading its @action argument, but it needs
  the state as modified by kmview2.
- I could set up some kind of synchronization among kmview machines but
  the solution would be extremely weak. What if kmview runs nested with
  another virtualization/tracing application based on utrace, e.g.
  strace?
- You say "use UTRACE_REPORT" to wait until the other machines are done
  fiddling with it. The comment you wrote about UTRACE_REPORT says:
   * This is like %UTRACE_RESUME, but also ensures that there will be
   * a @report_quiesce or @report_signal callback made soon. If
   * @target had been stopped, then there will be a callback before it
   * resumes running normally. If another engine is keeping @target
   * stopped, then there might be no callbacks until all engines let
   * it resume.
  But kmview1 and kmview2 have both stopped in report_syscall, so no
  callback will be called until both finish.
  Otherwise you may mean that kmview1 returns UTRACE_RESUME and when
  kmview1's report_quiesce gets called it returns UTRACE_STOP. In this
  way the management of the system call should be moved from
  report_syscall_entry to report_quiesce, but just for kmview1.

Which one is the cleaner way to implement a service on utrace in your
opinion? In my opinion the possibility to have the process blocked
before calling the next report function leads to simpler code.

Was this design chosen for efficiency? I feel that all this long
sequence of report callbacks ends up slowing down the virtualization.
"The slower, the more useless", I said....

Are you sure that each engine should examine by itself what the other
engines do, as utrace provides almost no synchronization rules between
them? For sure you have in mind examples where engines have to run
concurrently when one or more return UTRACE_STOP. But there are other
cases in which you need to stop a process before calling the next
engine's report function. Instead of changing the semantics of
UTRACE_STOP you could add a UTRACE_STOP_NOW return value to stop the
engine before calling the next engine's report function.

-------------------------------------------------------------------
If you decide that the latter implementation is the right one
(#define KMVIEW_NEWSTOP):
- This means that I am not using UTRACE_STOP at all. I have implemented
  the support for stopping a process in another way. I don't think it is
  a good idea to stay in the report function for a long time; UTRACE_STOP
  was designed for that purpose.
- The management of asynchronous events is harder, as the process can be
  stopped in many "levels" of the architecture.
- If you say that this is the right way to do it, I'll keep this code
  but I'll be wondering what UTRACE_STOP is for.
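
The ordering requirement described earlier in this message (the tracer
must see the syscall arguments before the virtualizer rewrites them, but
must see the return value after the virtualizer has set it) can be
illustrated with a small userspace simulation. This is only a sketch
under stated assumptions: struct engine, run_open(), fake_open() and the
+1000 result offset are hypothetical stand-ins invented for the
illustration and are not part of utrace or kmview.

```c
#include <stdio.h>
#include <string.h>

#define PATH_LEN 128

/* One simulated system call as it passes through the attached engines. */
struct syscall_state {
	char path[PATH_LEN];	/* argument; a virtualizer may rewrite it at entry */
	long retval;		/* result; a virtualizer may rewrite it at exit */
};

/* What the tracer-like engine observed. */
struct trace_log {
	char entry_path[PATH_LEN];
	long exit_retval;
};

/* Hypothetical engine: one entry callback and one exit callback. */
struct engine {
	void (*on_entry)(struct syscall_state *s, struct trace_log *log);
	void (*on_exit)(struct syscall_state *s, struct trace_log *log);
};

/* Virtualizer, entry: map "/unreal/x" onto "/x". */
static void virt_entry(struct syscall_state *s, struct trace_log *log)
{
	static const char prefix[] = "/unreal/";
	size_t keep = strlen(prefix) - 1;	/* offset of the second '/' */

	(void)log;
	if (strncmp(s->path, prefix, strlen(prefix)) == 0)
		memmove(s->path, s->path + keep, strlen(s->path + keep) + 1);
}

/* Virtualizer, exit: the arbitrary +1000 only makes the rewrite visible. */
static void virt_exit(struct syscall_state *s, struct trace_log *log)
{
	(void)log;
	if (s->retval >= 0)
		s->retval += 1000;
}

/* Tracer: records what it sees and changes nothing. */
static void trace_entry(struct syscall_state *s, struct trace_log *log)
{
	snprintf(log->entry_path, sizeof(log->entry_path), "%s", s->path);
}

static void trace_exit(struct syscall_state *s, struct trace_log *log)
{
	log->exit_retval = s->retval;
}

/* Stand-in for the kernel: only "/etc/passwd" exists. */
static long fake_open(const char *path)
{
	return strcmp(path, "/etc/passwd") == 0 ? 3 : -2;
}

/*
 * Run one open() through the engines, listed in attach order: the
 * virtualizer first, then a tracer started inside it. Exit callbacks
 * always run in attach order; entry callbacks run in reverse attach
 * order only when reverse_entry is nonzero.
 */
void run_open(const char *path, int reverse_entry, struct trace_log *log)
{
	static const struct engine engines[] = {
		{ virt_entry, virt_exit },
		{ trace_entry, trace_exit },
	};
	const int n = sizeof(engines) / sizeof(engines[0]);
	struct syscall_state s;
	int i;

	snprintf(s.path, sizeof(s.path), "%s", path);
	s.retval = 0;

	for (i = 0; i < n; i++)
		engines[reverse_entry ? n - 1 - i : i].on_entry(&s, log);

	s.retval = fake_open(s.path);

	for (i = 0; i < n; i++)
		engines[i].on_exit(&s, log);
}
```

Calling run_open("/unreal/etc/passwd", 1, &log) leaves the unrewritten
path "/unreal/etc/passwd" and the virtualized result 1003 in the log,
which is the view a tracer running inside the virtualization would
expect. With reverse_entry set to 0 the tracer instead records the
already rewritten "/etc/passwd".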
-------------------------------------------------------------------
In both cases the order of the report_syscall_entry report functions
must be reversed (with respect to all the other report functions),
otherwise all the nested engine examples fail.

ciao
renzo

Note: the actual kmview and unreal work in a slightly different way.
This final note is useful if you want to read the code or run the
examples; otherwise it can be safely skipped.
1- the kmview VMM (the agent) does not rewrite the path but opens the
file itself.
2- the nested kmview VMM itself runs in the space virtualized by the
outer kmview. The drawings would have been more complex but the problem
is the same: a process running in a nested kmview has one utrace engine
for each kmview.
3- the actual unreal provides two levels of /unreal.
kmview + unreal provide /unreal and /unreal/unreal as copies of the file
system.
kmview+unreal+kmview+unreal (nested) provide /unreal, /unreal/unreal,
/unreal/unreal/unreal and /unreal/unreal/unreal/unreal.

From ananth at in.ibm.com Tue Mar 24 05:29:26 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 24 Mar 2009 10:59:26 +0530
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321050422.d1d99eec.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
Message-ID: <20090324052926.GC24018@in.ibm.com>

On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
> On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" wrote:
> > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
>
> I have strong memories of being traumatised by reading the uprobes code.

That was a long time ago wasn't it?
:-)

That approach was a carry over from an implementation from dprobes that
used readdir hooks. Yes, that was not the most elegant approach, and as
such has long been shelved.

> What's the story on all of that nowadays?

Utrace makes implementing uprobes cleaner. We have a prototype that
implements uprobes over utrace. It's per process, doesn't use any
in-kernel hooks, etc. It currently has a kprobes-like interface (needs a
kernel module), but it shouldn't be difficult to adapt it to use
utrace's user interfaces (syscalls?) when those come around. The current
generation of uprobes that has all the bells and whistles can be found at
http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2

However, there are aspects of the current uprobes that can be useful to
any other userspace tracer: instruction analysis, breakpoint insertion
and removal, single-stepping support. With these layered on top of
utrace, building userspace debug/trace tools that depend on utrace
should be easier, outside of ptrace.

Work is currently under way to factor these layers out. The intention is
to upstream all the bits required for userspace tracing once utrace gets
in, along with an easy-to-use interface for userspace developers
(a /proc or /debugfs interface?) -- one should be able to use it on
its own or with SystemTap, whatever they prefer. Details are still hazy
at the moment.

But, utrace is the foundation to do all of that.
Ananth From akpm at linux-foundation.org Tue Mar 24 05:54:09 2009 From: akpm at linux-foundation.org (Andrew Morton) Date: Mon, 23 Mar 2009 22:54:09 -0700 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090324052926.GC24018@in.ibm.com> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090324052926.GC24018@in.ibm.com> Message-ID: <20090323225409.07bdcbf7.akpm@linux-foundation.org> On Tue, 24 Mar 2009 10:59:26 +0530 Ananth N Mavinakayanahalli wrote: > On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote: > > On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" wrote: > > > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote: > > > > I have strong memories of being traumatised by reading the uprobes code. > > That was a long time ago wasn't it? :-) > > That approach was a carry over from an implementation from dprobes that > used readdir hooks. Yes, that was not the most elegant approach, as such > has long been shelved. > > > What's the story on all of that nowadays? > > Utrace makes implementing uprobes more cleaner. We have a prototype that > implements uprobes over utrace. Its per process, doesn't use any > in-kernel hooks, etc. It currently has a kprobes like interface (needs a > kernel module), but it shouldn't be difficult to adapt it to use > utrace's user interfaces (syscalls?) when those come around. 
The current > generation of uprobes that has all the bells and whistles can be found at > http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2 > > However, there are aspects of the current uprobes that can be useful to > any other userspace tracer: instruction analysis, breakpoint insertion > and removal, single-stepping support. With these layered on top of > utrace, building userspace debug/trace tools that depend on utrace > should be easier, outside of ptrace. > > Work is currently on to factor these layers out. The intention is to > upstream all the bits required for userspace tracing once utrace gets > in, along with an easy to use interface for userspace developers > (a /proc or /debugfs interface?) -- one should be able to use it on > its own or with SystemTap, whatever they prefer. Details are still hazy > at the moment. > > But, utrace is the foundation to do all of that. > The sticking point was uprobes's modification of live pagecache. We said "ick, COW the pages" and you said "too expensive". And there things remained. Is that all going to happen again? 
From ananth at in.ibm.com Tue Mar 24 06:10:24 2009 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 24 Mar 2009 11:40:24 +0530 Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2 In-Reply-To: <20090323225409.07bdcbf7.akpm@linux-foundation.org> References: <20090321013946.890F4FC3AB@magilla.sf.frob.com> <20090321014244.9ADF1FC3AB@magilla.sf.frob.com> <20090321074301.GA19384@elte.hu> <20090321013912.ed6039c9.akpm@linux-foundation.org> <20090321091235.GA29678@elte.hu> <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090324052926.GC24018@in.ibm.com> <20090323225409.07bdcbf7.akpm@linux-foundation.org> Message-ID: <20090324061024.GD24018@in.ibm.com> On Mon, Mar 23, 2009 at 10:54:09PM -0700, Andrew Morton wrote: > On Tue, 24 Mar 2009 10:59:26 +0530 Ananth N Mavinakayanahalli wrote: > > > On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote: > > > On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" wrote: > > > > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote: > > > > > > I have strong memories of being traumatised by reading the uprobes code. > > > > That was a long time ago wasn't it? :-) > > > > That approach was a carry over from an implementation from dprobes that > > used readdir hooks. Yes, that was not the most elegant approach, as such > > has long been shelved. > > > > > What's the story on all of that nowadays? > > > > Utrace makes implementing uprobes more cleaner. We have a prototype that > > implements uprobes over utrace. Its per process, doesn't use any > > in-kernel hooks, etc. It currently has a kprobes like interface (needs a > > kernel module), but it shouldn't be difficult to adapt it to use > > utrace's user interfaces (syscalls?) when those come around. 
The current > > generation of uprobes that has all the bells and whistles can be found at > > http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2 > > > > However, there are aspects of the current uprobes that can be useful to > > any other userspace tracer: instruction analysis, breakpoint insertion > > and removal, single-stepping support. With these layered on top of > > utrace, building userspace debug/trace tools that depend on utrace > > should be easier, outside of ptrace. > > > > Work is currently on to factor these layers out. The intention is to > > upstream all the bits required for userspace tracing once utrace gets > > in, along with an easy to use interface for userspace developers > > (a /proc or /debugfs interface?) -- one should be able to use it on > > its own or with SystemTap, whatever they prefer. Details are still hazy > > at the moment. > > > > But, utrace is the foundation to do all of that. > > > > The sticking point was uprobes's modification of live pagecache. We said > "ick, COW the pages" and you said "too expensive". And there things > remained. > > Is that all going to happen again? No. All modifications are via access_process_vm(). Ananth From roland at redhat.com Tue Mar 24 10:34:16 2009 From: roland at redhat.com (Roland McGrath) Date: Tue, 24 Mar 2009 03:34:16 -0700 (PDT) Subject: seccomp via utrace Message-ID: <20090324103416.26687FC3AB@magilla.sf.frob.com> Here is a trivial module to implement the seccomp guts via utrace. I haven't tested it at all. (AFAIK it was only ever used by cpushare, and that project might be defunct now.) I'm not sure what Ingo had in mind for integrating this. If it's just to reimplement the existing prctl interface, then this is about all you need--just s/_xxx// and fiddle the config et al to build this and not the old stuff. 
If the approach would be incremental, to leave the old stuff in place,
then it might make more sense just to do a fresh new thing not providing
that prctl interface at all. A new thing could be a module, and define
some /sys files or whatnot for its "constrain me now" hook.

I think a sensible thing would not require asm/seccomp.h at all, and
instead just let the userland setup feed in a set of syscall numbers.
It could be that flexible while still being quite simple so that one
could audit that setup code and be confident it has no holes. Then
future versions of cpushare (or whatever) would not need any special
kernel support for new arch's nor to change the syscall set it wants to
allow.


Thanks,
Roland

=====

#include
#include
#include
#include
#include
#include
#include
#include
#include

MODULE_DESCRIPTION("secure computing");
MODULE_LICENSE("GPL");

/* Signal sent on a disallowed syscall: SIGKILL unless set at load time. */
static int insecure_signal = SIGKILL;
/* The last argument is the sysfs permission mask, not a default value. */
module_param_named(signal, insecure_signal, int, 0444);

/*
 * If it's an accepted syscall, run it normally.
 * If not, send ourselves a SIGKILL and abort the syscall.
 */
static u32 secure_syscall_entry(u32 action, struct utrace_engine *engine,
				struct task_struct *task,
				struct pt_regs *regs)
{
	int callno = syscall_get_nr(task, regs);

#ifdef CONFIG_COMPAT
	if (is_compat_task())
		switch (callno) {
		case __NR_seccomp_read_32:
		case __NR_seccomp_write_32:
		case __NR_seccomp_exit_32:
		case __NR_seccomp_sigreturn_32:
			return UTRACE_RESUME | UTRACE_SYSCALL_RUN;
		}
	else
#endif
		switch (callno) {
		case __NR_seccomp_read:
		case __NR_seccomp_write:
		case __NR_seccomp_exit:
		case __NR_seccomp_sigreturn:
			return UTRACE_RESUME | UTRACE_SYSCALL_RUN;
		}

	force_sig(insecure_signal, task);
	return UTRACE_RESUME | UTRACE_SYSCALL_ABORT;
}

static const struct utrace_engine_ops secure_syscall_ops =
{
	.report_syscall_entry = secure_syscall_entry
};

/*
 * Set up a utrace engine to call secure_syscall_entry() for each system call.
 * Also act like prctl(PR_SET_TSC, PR_TSC_SIGSEGV).
 */
static int enable_secure_syscall(void)
{
	struct utrace_engine *engine;
	int ret;

	engine = utrace_attach_task(current,
				    UTRACE_ATTACH_CREATE |
				    UTRACE_ATTACH_EXCLUSIVE |
				    UTRACE_ATTACH_MATCH_OPS,
				    &secure_syscall_ops, NULL);
	if (IS_ERR(engine)) {
		ret = PTR_ERR(engine);
		return ret == -EEXIST ? -EPERM : ret;
	}

	ret = utrace_set_events(current, engine, UTRACE_EVENT(SYSCALL_ENTRY));
	WARN_ON(ret);		/* Should never happen on current. */

	/*
	 * This is the only outside ref on the engine.
	 * The engine dies automatically when this task gets reaped.
	 */
	utrace_engine_put(engine);

#ifdef SET_TSC_CTL
	if (!ret)
		SET_TSC_CTL(PR_TSC_SIGSEGV);
#endif

	return ret;
}

long prctl_get_seccomp_xxx(void)
{
	struct utrace_engine *engine = utrace_attach_task(
		current, UTRACE_ATTACH_MATCH_OPS, &secure_syscall_ops, NULL);

	if (engine == ERR_PTR(-ENOENT))
		return 0;

	if (!IS_ERR(engine))
		/*
		 * I wonder how he managed to call prctl() with it enabled.
		 * That should be impossible.
		 */
		return 1;

	return PTR_ERR(engine);
}

long prctl_set_seccomp_xxx(unsigned long seccomp_mode)
{
	if (seccomp_mode != 1)
		return -EINVAL;

	return enable_secure_syscall();
}

From roland at redhat.com Tue Mar 24 10:38:42 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 24 Mar 2009 03:38:42 -0700 (PDT)
Subject: seccomp via utrace
In-Reply-To: Roland McGrath's message of Tuesday, 24 March 2009 03:34:16 -0700 <20090324103416.26687FC3AB@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
Message-ID: <20090324103843.376AAFC3AB@magilla.sf.frob.com>

Here is the "one swell foop" patch to cut out the old seccomp stuff,
clean up the config, and replace it with the utrace-based one.
The kernel/seccomp.c patch looks like a patch because it found some
trivia in common, but actually it's wholly replaced with the file I
posted before.

I still haven't tested it in the slightest, and only compiled it on x86-64.

This presumably actually ought to be done in several smaller patches.
If it should even be done this way at all. (That is, eagerly cutting out the old seccomp and leaving no seccomp option without utrace.) But here's a completeish proof of concept. Maybe someone wants to pick it up. Thanks, Roland --- [PATCH] utraceify seccomp Signed-off-by: Roland McGrath --- arch/Kconfig | 4 + arch/mips/Kconfig | 18 +---- arch/mips/kernel/ptrace.c | 5 - arch/powerpc/Kconfig | 18 +---- arch/powerpc/include/asm/thread_info.h | 4 +- arch/powerpc/kernel/ptrace.c | 3 - arch/sh/Kconfig | 17 +---- arch/sh/include/asm/thread_info.h | 4 +- arch/sh/kernel/ptrace_32.c | 3 - arch/sh/kernel/ptrace_64.c | 3 - arch/sparc/include/asm/thread_info_64.h | 3 +- arch/x86/Kconfig | 17 +---- arch/x86/kernel/entry_32.S | 8 +- arch/x86/kernel/ptrace.c | 4 - include/linux/sched.h | 2 - include/linux/seccomp.h | 14 --- init/Kconfig | 18 ++++ kernel/seccomp.c | 146 ++++++++++++++++++------------- 18 files changed, 116 insertions(+), 175 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 550dab2..f809f07 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -78,6 +78,10 @@ config HAVE_KPROBES config HAVE_KRETPROBES bool +# select this if the arch has the asm/seccomp.h file. +config HAVE_SECCOMP + bool + # # An arch should select this if it provides all these things: # diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 206cb79..b7c124e 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -4,6 +4,7 @@ config MIPS select HAVE_IDE select HAVE_OPROFILE select HAVE_ARCH_KGDB + select HAVE_SECCOMP # Horrible source of confusion. Die, die, die ... select EMBEDDED select RTC_LIB @@ -1949,23 +1950,6 @@ config KEXEC support. As of this writing the exact hardware interface is strongly in flux, so no good recommendation can be made. 
-config SECCOMP - bool "Enable seccomp to safely compute untrusted bytecode" - depends on PROC_FS - default y - help - This kernel feature is useful for number crunching applications - that may need to compute untrusted bytecode during their - execution. By using pipes or other transports made available to - the process as file descriptors supporting the read/write - syscalls, it's possible to isolate those applications in - their own address space using seccomp. Once seccomp is - enabled via /proc//seccomp, it cannot be disabled - and the task is only allowed to execute a few safe syscalls - defined by each seccomp mode. - - If unsure, say Y. Only embedded should say N here. - endmenu config RWSEM_GENERIC_SPINLOCK diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c index 054861c..2c19cfd 100644 --- a/arch/mips/kernel/ptrace.c +++ b/arch/mips/kernel/ptrace.c @@ -24,7 +24,6 @@ #include #include #include -#include #include #include @@ -564,10 +563,6 @@ static inline int audit_arch(void) */ asmlinkage void do_syscall_trace(struct pt_regs *regs, int entryexit) { - /* do the secure computing check first */ - if (!entryexit) - secure_computing(regs->regs[0]); - if (unlikely(current->audit_context) && entryexit) audit_syscall_exit(AUDITSC_RESULT(regs->regs[2]), regs->regs[2]); diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 74cc312..c71ac02 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -119,6 +119,7 @@ config PPC select HAVE_ARCH_KGDB select HAVE_KRETPROBES select HAVE_ARCH_TRACEHOOK + select HAVE_SECCOMP select HAVE_LMB select HAVE_DMA_ATTRS if PPC64 select USE_GENERIC_SMP_HELPERS if SMP @@ -531,23 +532,6 @@ config ARCH_WANTS_FREEZER_CONTROL source kernel/power/Kconfig endif -config SECCOMP - bool "Enable seccomp to safely compute untrusted bytecode" - depends on PROC_FS - default y - help - This kernel feature is useful for number crunching applications - that may need to compute untrusted bytecode during their - 
execution. By using pipes or other transports made available to - the process as file descriptors supporting the read/write - syscalls, it's possible to isolate those applications in - their own address space using seccomp. Once seccomp is - enabled via /proc//seccomp, it cannot be disabled - and the task is only allowed to execute a few safe syscalls - defined by each seccomp mode. - - If unsure, say Y. Only embedded should say N here. - endmenu config ISA_DMA_API diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 9665a26..4d30be8 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -105,7 +105,6 @@ static inline struct thread_info *current_thread_info(void) #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SINGLESTEP 8 /* singlestepping active */ #define TIF_MEMDIE 9 -#define TIF_SECCOMP 10 /* secure computing */ #define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */ #define TIF_NOERROR 12 /* Force successful syscall return */ #define TIF_NOTIFY_RESUME 13 /* callback before returning to user */ @@ -123,14 +122,13 @@ static inline struct thread_info *current_thread_info(void) #define _TIF_PERFMON_CTXSW (1< #include #include -#include #include #ifdef CONFIG_PPC32 #include @@ -1021,8 +1020,6 @@ long do_syscall_trace_enter(struct pt_regs *regs) { long ret = 0; - secure_computing(regs->gpr[0]); - if (test_thread_flag(TIF_SYSCALL_TRACE) && tracehook_report_syscall_entry(regs)) /* diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index ebabe51..5786e77 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -14,6 +14,7 @@ config SUPERH select HAVE_GENERIC_DMA_COHERENT select HAVE_IOREMAP_PROT if MMU select HAVE_ARCH_TRACEHOOK + select HAVE_SECCOMP help The SuperH is a RISC processor targeted for use in embedded systems and consumer electronics; it was also used in the Sega Dreamcast @@ -521,22 +522,6 @@ config CRASH_DUMP For more details see 
Documentation/kdump/kdump.txt -config SECCOMP - bool "Enable seccomp to safely compute untrusted bytecode" - depends on PROC_FS - help - This kernel feature is useful for number crunching applications - that may need to compute untrusted bytecode during their - execution. By using pipes or other transports made available to - the process as file descriptors supporting the read/write - syscalls, it's possible to isolate those applications in - their own address space using seccomp. Once seccomp is - enabled via prctl, it cannot be disabled and the task is only - allowed to execute a few safe syscalls defined by each seccomp - mode. - - If unsure, say N. - config SMP bool "Symmetric multi-processing support" depends on SYS_SUPPORTS_SMP diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h index f09ac48..e1da51a 100644 --- a/arch/sh/include/asm/thread_info.h +++ b/arch/sh/include/asm/thread_info.h @@ -114,7 +114,6 @@ extern void free_thread_info(struct thread_info *ti); #define TIF_RESTORE_SIGMASK 3 /* restore signal mask in do_signal() */ #define TIF_SINGLESTEP 4 /* singlestepping active */ #define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ -#define TIF_SECCOMP 6 /* secure computing */ #define TIF_NOTIFY_RESUME 7 /* callback before returning to user */ #define TIF_USEDFPU 16 /* FPU was used by this task this quantum (SMP) */ #define TIF_POLLING_NRFLAG 17 /* true if poll_idle() is polling TIF_NEED_RESCHED */ @@ -127,7 +126,6 @@ extern void free_thread_info(struct thread_info *ti); #define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK) #define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) -#define _TIF_SECCOMP (1 << TIF_SECCOMP) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_USEDFPU (1 << TIF_USEDFPU) #define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG) @@ -141,7 +139,7 @@ extern void free_thread_info(struct thread_info *ti); /* work to do in syscall trace */ #define 
_TIF_WORK_SYSCALL_MASK (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP | \ - _TIF_SYSCALL_AUDIT | _TIF_SECCOMP) + _TIF_SYSCALL_AUDIT) /* work to do on any return to u-space */ #define _TIF_ALLWORK_MASK (_TIF_SYSCALL_TRACE | _TIF_SIGPENDING | \ diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c index 29ca09d..c83d0fe 100644 --- a/arch/sh/kernel/ptrace_32.c +++ b/arch/sh/kernel/ptrace_32.c @@ -22,7 +22,6 @@ #include #include #include -#include #include #include #include @@ -438,8 +437,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs) { long ret = 0; - secure_computing(regs->regs[0]); - if (test_thread_flag(TIF_SYSCALL_TRACE) && tracehook_report_syscall_entry(regs)) /* diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c index 6950974..e65dbe0 100644 --- a/arch/sh/kernel/ptrace_64.c +++ b/arch/sh/kernel/ptrace_64.c @@ -27,7 +27,6 @@ #include #include #include -#include #include #include #include @@ -427,8 +426,6 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs) { long long ret = 0; - secure_computing(regs->regs[9]); - if (test_thread_flag(TIF_SYSCALL_TRACE) && tracehook_report_syscall_entry(regs)) /* diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h index 639ac80..b303b93 100644 --- a/arch/sparc/include/asm/thread_info_64.h +++ b/arch/sparc/include/asm/thread_info_64.h @@ -227,7 +227,7 @@ register struct thread_info *current_thread_info_reg asm("g6"); /* flag bit 6 is available */ #define TIF_32BIT 7 /* 32-bit binary */ /* flag bit 8 is available */ -#define TIF_SECCOMP 9 /* secure computing */ +/* flag bit 9 is available */ #define TIF_SYSCALL_AUDIT 10 /* syscall auditing active */ /* flag bit 11 is available */ /* NOTE: Thread flags >= 12 should be ones we have no interest @@ -246,7 +246,6 @@ register struct thread_info *current_thread_info_reg asm("g6"); #define _TIF_PERFCTR (1< #include #include -#include #include #include @@ -1411,9 +1410,6 @@ asmregparm long 
syscall_trace_enter(struct pt_regs *regs) if (test_thread_flag(TIF_SINGLESTEP)) regs->flags |= X86_EFLAGS_TF; - /* do the secure computing check first */ - secure_computing(regs->orig_ax); - if (unlikely(test_thread_flag(TIF_SYSCALL_EMU))) ret = -1L; diff --git a/include/linux/sched.h b/include/linux/sched.h index 786ef2d..4a22d98 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -76,7 +76,6 @@ struct sched_param { #include #include #include -#include #include #include @@ -1286,7 +1285,6 @@ struct task_struct { uid_t loginuid; unsigned int sessionid; #endif - seccomp_t seccomp; #ifdef CONFIG_UTRACE struct utrace utrace; diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 262a8dc..02d7adb 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -4,27 +4,13 @@ #ifdef CONFIG_SECCOMP -#include #include -typedef struct { int mode; } seccomp_t; - -extern void __secure_computing(int); -static inline void secure_computing(int this_syscall) -{ - if (unlikely(test_thread_flag(TIF_SECCOMP))) - __secure_computing(this_syscall); -} - extern long prctl_get_seccomp(void); extern long prctl_set_seccomp(unsigned long); #else /* CONFIG_SECCOMP */ -typedef struct { } seccomp_t; - -#define secure_computing(x) do { } while (0) - static inline long prctl_get_seccomp(void) { return -EINVAL; diff --git a/init/Kconfig b/init/Kconfig index 4b5ab3e..bc90ad3 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1069,6 +1069,24 @@ menuconfig UTRACE kernel interface exported to kernel modules, to track events in user threads, extract and change user thread state. +config SECCOMP + bool "Enable seccomp to safely compute untrusted bytecode" + default y if UTRACE + depends on UTRACE + depends on HAVE_SECCOMP + help + This kernel feature is useful for number crunching applications + that may need to compute untrusted bytecode during their + execution. 
By using pipes or other transports made available to + the process as file descriptors supporting the read/write + syscalls, it's possible to isolate those applications in + their own address space using seccomp. Once seccomp is + enabled via prctl(PR_SET_SECCOMP), it cannot be disabled + and the task is only allowed to execute a few safe syscalls + defined by each seccomp mode. + + If unsure, say Y. Only embedded should say N here. + source "block/Kconfig" config PREEMPT_NOTIFIERS diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 57d4b13..f14d1fd 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -1,86 +1,108 @@ -/* - * linux/kernel/seccomp.c - * - * Copyright 2004-2005 Andrea Arcangeli - * - * This defines a simple but solid secure-computing mode. - */ - #include -#include +#include +#include +#include #include - -/* #define SECCOMP_DEBUG 1 */ -#define NR_SECCOMP_MODES 1 +#include +#include /* - * Secure computing mode 1 allows only read/write/exit/sigreturn. - * To be fully secure this must be combined with rlimit - * to limit the stack allocations too. + * If it's an accepted syscall, run it normally. + * If not, send ourselves a SIGKILL and abort the syscall. 
*/ -static int mode1_syscalls[] = { - __NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn, - 0, /* null terminated */ -}; +static u32 secure_syscall_entry(u32 action, + struct utrace_engine *engine, + struct task_struct *task, + struct pt_regs *regs) +{ + int callno = syscall_get_nr(task, regs); #ifdef CONFIG_COMPAT -static int mode1_syscalls_32[] = { - __NR_seccomp_read_32, __NR_seccomp_write_32, __NR_seccomp_exit_32, __NR_seccomp_sigreturn_32, - 0, /* null terminated */ -}; + if (is_compat_task()) + switch (callno) { + case __NR_seccomp_read_32: + case __NR_seccomp_write_32: + case __NR_seccomp_exit_32: + case __NR_seccomp_sigreturn_32: + return UTRACE_RESUME | UTRACE_SYSCALL_RUN; + } + else #endif + switch (callno) { + case __NR_seccomp_read: + case __NR_seccomp_write: + case __NR_seccomp_exit: + case __NR_seccomp_sigreturn: + return UTRACE_RESUME | UTRACE_SYSCALL_RUN; + } -void __secure_computing(int this_syscall) + force_sig(SIGKILL, task); + return UTRACE_RESUME | UTRACE_SYSCALL_ABORT; +} + +static const struct utrace_engine_ops secure_syscall_ops = { - int mode = current->seccomp.mode; - int * syscall; + .report_syscall_entry = secure_syscall_entry +}; - switch (mode) { - case 1: - syscall = mode1_syscalls; -#ifdef CONFIG_COMPAT - if (is_compat_task()) - syscall = mode1_syscalls_32; -#endif - do { - if (*syscall == this_syscall) - return; - } while (*++syscall); - break; - default: - BUG(); +/* + * Set up a utrace engine to call secure_syscall_entry() for each system call. + * Also act like prctl(PR_SET_TSC, PR_TSC_SIGSEGV). + */ +static int enable_secure_syscall(void) +{ + struct utrace_engine *engine; + int ret; + + engine = utrace_attach_task(current, + UTRACE_ATTACH_CREATE | + UTRACE_ATTACH_EXCLUSIVE | + UTRACE_ATTACH_MATCH_OPS, + &secure_syscall_ops, NULL); + if (IS_ERR(engine)) { + ret = PTR_ERR(engine); + return ret == -EEXIST ? 
-EPERM : ret; } -#ifdef SECCOMP_DEBUG - dump_stack(); + ret = utrace_set_events(current, engine, UTRACE_EVENT(SYSCALL_ENTRY)); + WARN_ON(ret); /* Should never happen on current. */ + + /* + * This is the only outside ref on the engine. + * The engine dies automatically when this task gets reaped. + */ + utrace_engine_put(engine); + +#ifdef SET_TSC_CTL + if (!ret) + SET_TSC_CTL(PR_TSC_SIGSEGV); #endif - do_exit(SIGKILL); + + return ret; } long prctl_get_seccomp(void) { - return current->seccomp.mode; + struct utrace_engine *engine = utrace_attach_task( + current, UTRACE_ATTACH_MATCH_OPS, &secure_syscall_ops, NULL); + + if (engine == ERR_PTR(-ENOENT)) + return 0; + + if (!IS_ERR(engine)) + /* + * I wonder how he managed to call prctl() with it enabled. + * That should be impossible. + */ + return 1; + + return PTR_ERR(engine); } long prctl_set_seccomp(unsigned long seccomp_mode) { - long ret; - - /* can set it only once to be even more secure */ - ret = -EPERM; - if (unlikely(current->seccomp.mode)) - goto out; - - ret = -EINVAL; - if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) { - current->seccomp.mode = seccomp_mode; - set_thread_flag(TIF_SECCOMP); -#ifdef TIF_NOTSC - disable_TSC(); -#endif - ret = 0; - } + if (seccomp_mode != 1) + return -EINVAL; - out: - return ret; + return enable_secure_syscall(); } From mingo at elte.hu Tue Mar 24 10:48:49 2009 From: mingo at elte.hu (Ingo Molnar) Date: Tue, 24 Mar 2009 11:48:49 +0100 Subject: seccomp via utrace In-Reply-To: <20090324103416.26687FC3AB@magilla.sf.frob.com> References: <20090324103416.26687FC3AB@magilla.sf.frob.com> Message-ID: <20090324104849.GA32357@elte.hu> * Roland McGrath wrote: > Here is a trivial module to implement the seccomp guts via utrace. > I haven't tested it at all. (AFAIK it was only ever used by > cpushare, and that project might be defunct now.) > > I'm not sure what Ingo had in mind for integrating this. 
> If it's
> just to reimplement the existing prctl interface, then this is
> about all you need--just s/_xxx// and fiddle the config et al to
> build this and not the old stuff.
>
> If the approach would be incremental, to leave the old stuff in
> place, then it might make more sense just to do a fresh new thing
> not providing that prctl interface at all. A new thing could be a
> module, and define some /sys files or whatnot for its "constrain
> me now" hook. I think a sensible thing would not require
> asm/seccomp.h at all, and instead just let the userland setup feed
> in a set of syscall numbers. It could be that flexible while still
> being quite simple so that one could audit that setup code and be
> confident it has no holes. Then future versions of cpushare (or
> whatever) would not need any special kernel support for new arch's
> nor to change the syscall set it wants to allow.

nice! The simplification factor is already significant:

18 files changed, 116 insertions(+), 175 deletions(-)

That is what we want - to remove special TIF flag uses and replace
them with utrace driven machinery.

Another future target could be to replace TIF_SYSCALL_FTRACE [in the
latest tracing tree] with a similar utrace driven solution.

Regarding ptrace-via-utrace. What is the plan there? Am i looking
the right branch:

| earth4:~/linux.trees.git> git diff --stat
| linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
| kernel/ptrace.c | 803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
| 1 files changed, 794 insertions(+), 9 deletions(-)

dc43527: Merge branch 'utrace' into utrace-ptrace

I'd have (perhaps foolishly) expected ptrace.c to get reduced in
size and arch/x86/kernel/ptrace.c eliminated - but that does not
seem to be direction of movement. What am i missing?

Ingo

From ananth at in.ibm.com Tue Mar 24 11:00:00 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 24 Mar 2009 16:30:00 +0530
Subject: seccomp via utrace
In-Reply-To: <20090324104849.GA32357@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
Message-ID: <20090324110000.GA12841@in.ibm.com>

On Tue, Mar 24, 2009 at 11:48:49AM +0100, Ingo Molnar wrote:
>
> * Roland McGrath wrote:
>
> > Here is a trivial module to implement the seccomp guts via utrace.
> > I haven't tested it at all. (AFAIK it was only ever used by
> > cpushare, and that project might be defunct now.)
> >
> > I'm not sure what Ingo had in mind for integrating this. If it's
> > just to reimplement the existing prctl interface, then this is
> > about all you need--just s/_xxx// and fiddle the config et al to
> > build this and not the old stuff.
> >
> > If the approach would be incremental, to leave the old stuff in
> > place, then it might make more sense just to do a fresh new thing
> > not providing that prctl interface at all. A new thing could be a
> > module, and define some /sys files or whatnot for its "constrain
> > me now" hook. I think a sensible thing would not require
> > asm/seccomp.h at all, and instead just let the userland setup feed
> > in a set of syscall numbers. It could be that flexible while still
> > being quite simple so that one could audit that setup code and be
> > confident it has no holes. Then future versions of cpushare (or
> > whatever) would not need any special kernel support for new arch's
> > nor to change the syscall set it wants to allow.
>
> nice! The simplification factor is already significant:
>
> 18 files changed, 116 insertions(+), 175 deletions(-)
>
> That is what we want - to remove special TIF flag uses and replace
> them with utrace driven machinery.
>
> Another future target could be to replace TIF_SYSCALL_FTRACE [in the
> latest tracing tree] with a similar utrace driven solution.
>
> Regarding ptrace-via-utrace. What is the plan there? Am i looking
> the right branch:
>
> | earth4:~/linux.trees.git> git diff --stat
> | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> | kernel/ptrace.c | 803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> | 1 files changed, 794 insertions(+), 9 deletions(-)
>
> dc43527: Merge branch 'utrace' into utrace-ptrace
>
> I'd have (perhaps foolishly) expected ptrace.c to get reduced in
> size and arch/x86/kernel/ptrace.c eliminated - but that does not
> seem to be direction of movement. What am i missing?

Thats because the version of ptrace.c you are looking at has both the
legacy implementation and the ptrace over utrace implementation with
#ifdefs to separate them out. I guess Roland wanted to keep the
legacy stuff around till the ptrace/utrace becomes stable enough.

Ananth

From roland at redhat.com Tue Mar 24 11:05:34 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 24 Mar 2009 04:05:34 -0700 (PDT)
Subject: seccomp via utrace
In-Reply-To: Ingo Molnar's message of Tuesday, 24 March 2009 11:48:49 +0100
 <20090324104849.GA32357@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
Message-ID: <20090324110534.BF76DFC3AB@magilla.sf.frob.com>

> Regarding ptrace-via-utrace. What is the plan there? Am i looking
> the right branch:
>
> | earth4:~/linux.trees.git> git diff --stat
> | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> | kernel/ptrace.c | 803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> | 1 files changed, 794 insertions(+), 9 deletions(-)
>
> dc43527: Merge branch 'utrace' into utrace-ptrace

That is the branch that there is, yes.
Its comparison vs its baseline is:

 include/linux/ptrace.h    |   21 ++
 include/linux/sched.h     |    1 +
 include/linux/tracehook.h |   19 +-
 init/Kconfig              |   18 +
 kernel/ptrace.c           |  785 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/signal.c           |   14 +-
 kernel/utrace.c           |   23 ++
 7 files changed, 870 insertions(+), 11 deletions(-)

> I'd have (perhaps foolishly) expected ptrace.c to get reduced in
> size and arch/x86/kernel/ptrace.c eliminated - but that does not
> seem to be direction of movement. What am i missing?

Expecting that arch file to go away is just a complete
misunderstanding on your part. Look at what is actually in that
file. arch_ptrace() and compat_arch_ptrace() are the only things
there that are actually part of ptrace per se. I'm not sure how
much smaller you expect those to get.

Firstly, this branch now is hack-and-slash code. As I've said a few
times, the bulk of the work is ptrace clean-up that is not directly
related to utrace. (It's necessary stuff to do the utrace version
sanely, but it's independent clean-up that will go in ahead of any
ptrace changes involving utrace.) That will make it cleaner, but
probably not smaller in line counts. You get some more lines when
you start using sane data structures instead of all kludges.

Moreover, that branch does not remove any code at all. Everything is
left the same with CONFIG_UTRACE turned off. All the utrace-based
ptrace code is new code on the other side of an #else from some old
code.

None of this, of course, has anything whatsoever to do with the
seccomp thread. I don't know why so many people insist on hijacking
every thread for every other thing instead of posting a proper
thread on a new subject they raise. I suppose it goes along with
verbosely reviewing the diffstats while never looking at the actual
code, which also seems to be popular.

Thanks,
Roland

From mingo at elte.hu Tue Mar 24 11:10:56 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 24 Mar 2009 12:10:56 +0100
Subject: seccomp via utrace
In-Reply-To: <20090324110000.GA12841@in.ibm.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110000.GA12841@in.ibm.com>
Message-ID: <20090324111056.GA6386@elte.hu>

* Ananth N Mavinakayanahalli wrote:

> > nice! The simplification factor is already significant:
> >
> > 18 files changed, 116 insertions(+), 175 deletions(-)
> >
> > That is what we want - to remove special TIF flag uses and replace
> > them with utrace driven machinery.
> >
> > Another future target could be to replace TIF_SYSCALL_FTRACE [in the
> > latest tracing tree] with a similar utrace driven solution.
> >
> > Regarding ptrace-via-utrace. What is the plan there? Am i looking
> > the right branch:
> >
> > | earth4:~/linux.trees.git> git diff --stat
> > | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> > | kernel/ptrace.c | 803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > | 1 files changed, 794 insertions(+), 9 deletions(-)
> >
> > dc43527: Merge branch 'utrace' into utrace-ptrace
> >
> > I'd have (perhaps foolishly) expected ptrace.c to get reduced in
> > size and arch/x86/kernel/ptrace.c eliminated - but that does not
> > seem to be direction of movement. What am i missing?
>
> Thats because the version of ptrace.c you are looking at has both the
> legacy implementation and the ptrace over utrace implementation with
> #ifdefs to separate them out. I guess Roland wanted to keep the
> legacy stuff around till the ptrace/utrace becomes stable enough.

But this makes it hard to judge how upstream-worthy that change is -
or could be. I realize that it's incomplete, so i'm guessing.
kernel/ptrace.c is 739 lines currently, arch/x86/kernel/ptrace.c is
1467 lines.
The +794 lines via ptrace/utrace suggest that it got a bit larger -
or at least has roughly the same size.

Can arch/x86/kernel/ptrace.c be eliminated altogether? If yes then
that would make it a clear net win, with just a single architecture
covered. With every additional arch the win (==complexity reduction)
would be larger.

Ingo

From mingo at elte.hu Tue Mar 24 11:16:19 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 24 Mar 2009 12:16:19 +0100
Subject: seccomp via utrace
In-Reply-To: <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
Message-ID: <20090324111619.GB6386@elte.hu>

* Roland McGrath wrote:

> > Regarding ptrace-via-utrace. What is the plan there? Am i looking
> > the right branch:
> >
> > | earth4:~/linux.trees.git> git diff --stat
> > | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> > | kernel/ptrace.c | 803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > | 1 files changed, 794 insertions(+), 9 deletions(-)
> >
> > dc43527: Merge branch 'utrace' into utrace-ptrace
>
> That is the branch that there is, yes. Its comparison vs its baseline is:
>
>  include/linux/ptrace.h    |   21 ++
>  include/linux/sched.h     |    1 +
>  include/linux/tracehook.h |   19 +-
>  init/Kconfig              |   18 +
>  kernel/ptrace.c           |  785 ++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/signal.c           |   14 +-
>  kernel/utrace.c           |   23 ++
>  7 files changed, 870 insertions(+), 11 deletions(-)
>
> > I'd have (perhaps foolishly) expected ptrace.c to get reduced in
> > size and arch/x86/kernel/ptrace.c eliminated - but that does not
> > seem to be direction of movement. What am i missing?
>
> Expecting that arch file to go away is just a complete
> misunderstanding on your part. [...]

Sorry - it's what 30 seconds of looking gives me while trying to
prepare for a really busy merge window :-)

This kind of info should have been 1) emitted a month ago, in the
middle of the development window, 2) have been part of the
submission ('why do we want it' 'what will be the future benefit?').

I'm asking trivial and stupid looking followup questions, to help
construct that kind of high level information. If it annoys you i
can stop.

> [...] Look at what is actually in that file. arch_ptrace() and
> compat_arch_ptrace() are the only things there that are actually
> part of ptrace per se. I'm not sure how much smaller you expect
> those to get.

yeah, no big reduction potential there.

Ingo

From roland at redhat.com Wed Mar 25 10:31:22 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 25 Mar 2009 03:31:22 -0700 (PDT)
Subject: utrace merging, ptrace
In-Reply-To: Ingo Molnar's message of Tuesday, 24 March 2009 12:16:19 +0100
 <20090324111619.GB6386@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
 <20090324111619.GB6386@elte.hu>
Message-ID: <20090325103122.6ED56FC336@magilla.sf.frob.com>

> This kind of info should have been 1) emitted a month ago, in the
> middle of the development window, 2) have been part of the
> submission ('why do we want it' 'what will be the future benefit?').

Well, we are where we are.
I don't really know what kind of lack you see in having said what
its future benefits will be. We have talked out the wazoo about
what utrace is for.

I also really don't understand the resistance to a new thing in a
new config option that depends on EXPERIMENTAL, and having the
smaller users bang on it and fix it in the tree for a while.

You seem now to be saying that the gating event would be rewriting
ptrace unconditionally to require utrace, and do that way early
before any other hashing out of utrace in the tree. That just
seems wildly nuts to me and I am confused about why you like the
idea. We have a new thing to shake out, so let's break a crucial
feature so that people uninterested in the new stuff can be stuck
with new bugs and regressions as early as possible! What? Did I
miss a memo? How is that the prized incrementalism that we hear so
much about? Isn't "ptrace works, we need ptrace, don't break
ptrace until you're sure you won't be breaking ptrace" what every
sane user wants?

You know damn well that I am 198% for the wholesale replacement of
ptrace. (We hates the ptrace!) But that is a big lump to put in
first, and to delay every other line of development behind. Why
doesn't utrace deserve a period as EXPERIMENTAL before we force it
onto everyone's critical path?

If rewriting everything early on to use the new thing is such the
great plan, why didn't you rewrite dmesg to use ftrace ring
buffers before putting them in? (It's not a serious question, but
I hope you recognize that the ptrace question sounds about as
ludicrous to me as that one does to you.)

Why is it OK to have kprobes with no in-tree users, but not
utrace?

I think you get the gist of the sort of mismatch I'm perceiving
between your remarks about utrace and the rest of reality. I don't
need the answers that would reconcile my experiences of reality.
We just need to find the way forward that is actually going to
happen.

> I'm asking trivial and stupid looking followup questions, to help
> construct that kind of high level information. If it annoys you i
> can stop.

Keep asking stupid questions and I'll keep giving stupid answers.
The only thing that would annoy me is progress being prevented by
mutual lacks of understanding.

> yeah, no big reduction potential there.

Look, you shouldn't expect size reduction from cleaning up the
generic ptrace code either. The old ptrace is "simple", deceptively
simple, because it just relies on ruining all sorts of things to
deliver what was easy to kludge ages ago. We're going to have
something cleaner, better, less intrusive, and not so limited (in
ways like preventing any possibility of other user debugging
facilities being implemented)--not smaller.

Thanks,
Roland

From mingo at elte.hu Wed Mar 25 11:21:04 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Wed, 25 Mar 2009 12:21:04 +0100
Subject: utrace merging, ptrace
In-Reply-To: <20090325103122.6ED56FC336@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
 <20090324111619.GB6386@elte.hu>
 <20090325103122.6ED56FC336@magilla.sf.frob.com>
Message-ID: <20090325112104.GA6041@elte.hu>

* Roland McGrath wrote:

> > This kind of info should have been 1) emitted a month ago, in
> > the middle of the development window, 2) have been part of the
> > submission ('why do we want it' 'what will be the future
> > benefit?').
>
> Well, we are where we are. I don't really know what kind of lack
> you see in having said what its future benefits will be. We have
> talked out the wazoo about what utrace is for.
>
> I also really don't understand the resistance to a new thing in a
> new config option that depends on EXPERIMENTAL, and having the
> smaller users bang on it and fix it in the tree for a while.

This has been the upstream merging principle for the past 15 years:
95% of the mainline features go there with good and immediate uses,
not with "future uses".

> You seem now to be saying that the gating event would be rewriting
> ptrace unconditionally to require utrace, and do that way early
> before any other hashing out of utrace in the tree. That just
> seems wildly nuts to me and I am confused about why you like the
> idea. We have a new thing to shake out, so let's break a crucial
> feature so that people uninterested in the new stuff can be stuck
> with new bugs and regressions as early as possible! What? Did I
> miss a memo? How is that the prized incrementalism that we hear so
> much about? Isn't "ptrace works, we need ptrace, don't break
> ptrace until you're sure you won't be breaking ptrace" what every
> sane user wants?

I think you misunderstood my point. I never advocated the wholesale,
unconditional rewriting of ptrace. A gradual approach there seems a
must - and your approach of CONFIG_UTRACE_PTRACE seems like the way
to go, initially.

What i tried to get at is the "how will the end result look like"
question - because arguably a ptrace replacement will be the end
goal.

( Note, Linus might still insist on a total replacement, if he
finds the #ifdef approach too ugly. I dont talk for him and he is
usually much pickier than me. )

> You know damn well that I am 198% for the wholesale replacement of
> ptrace. (We hates the ptrace!) But that is a big lump to put in
> first, and to delay every other line of development behind. Why
> doesn't utrace deserve a period as EXPERIMENTAL before we force it
> onto everyone's critical path?
>
> If rewriting everything early on to use the new thing is such the
> great plan, why didn't you rewrite dmesg to use ftrace ring
> buffers before putting them in? (It's not a serious question, but
> I hope you recognize that the ptrace question sounds about as
> ludicrous to me as that one does to you.)
>
> Why is it OK to have kprobes with no in-tree users, but not
> utrace?

Kprobes is amongst the 5% exception that proves the rule. We got
burned by kprobes somewhat - it was merged and went nowhere for
years and has maintenance overhead. (Btw., there are some in-tree
users of kprobes meanwhile - but it's still largely stale.)

Kprobes is also arguably probing the kernel purely externally - so
having it as a separate, isolated entity is somewhat understandable -
even though it's still not ideal and if it were submitted today we
would probably not merge it without actual, substantial in-tree
uses.

But utrace is not a passive probe - it is an active, functional part
of the kernel that gets built in. Utrace without a real user is like
trying to get CONFIG_SECURITY upstream without a real user. It's
generally an upstream non-starter.

Ingo

From oleg at redhat.com Thu Mar 26 23:20:11 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Fri, 27 Mar 2009 00:20:11 +0100
Subject: utrace merging, ptrace
In-Reply-To: <20090325103122.6ED56FC336@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
 <20090324111619.GB6386@elte.hu>
 <20090325103122.6ED56FC336@magilla.sf.frob.com>
Message-ID: <20090326232011.GA3970@redhat.com>

On 03/25, Roland McGrath wrote:
>
> I also really don't understand the resistance to a new thing in a
> new config option that depends on EXPERIMENTAL, and having the
> smaller users bang on it and fix it in the tree for a while.

And, just in case...

Without CONFIG_UTRACE, the patch does not change the code at all.

With CONFIG_UTRACE, the patch adds a few
"if (unlikely(tsk->utrace_flags))" checks, none of these checks
lives in the hot path.

Oleg.

From roland at redhat.com Fri Mar 27 00:48:24 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 26 Mar 2009 17:48:24 -0700 (PDT)
Subject: utrace merging, ptrace
In-Reply-To: Ingo Molnar's message of Wednesday, 25 March 2009 12:21:04 +0100
 <20090325112104.GA6041@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
 <20090324111619.GB6386@elte.hu>
 <20090325103122.6ED56FC336@magilla.sf.frob.com>
 <20090325112104.GA6041@elte.hu>
Message-ID: <20090327004824.9F8F5FC1F8@magilla.sf.frob.com>

> I think you misunderstood my point. I never advocated the wholesale,
> unconditional rewriting of ptrace. A gradual approach there seems a
> must - and your approach of CONFIG_UTRACE_PTRACE seems like the way
> to go, initially.

Ok, good. I was confused by your focus on the diffstat and your
apparent expectation that these changes should make all ptrace
source files smaller. Thanks for clearing that up.
I will note again here that a bunch of ptrace clean-ups I anticipate
will be purely in reorganizing its own data structures independent
of the utrace issue. Those will be incremental changes in many
bisectable baby steps, but they won't be conditional.

> What i tried to get at is the "how will the end result look like"
> question - because arguably a ptrace replacement will be the end
> goal.

Right.

> ( Note, Linus might still insist on a total replacement, if he
> finds the #ifdef approach too ugly. I dont talk for him and he is
> usually much pickier than me. )

In a previous round of review, hch objected to
CONFIG_UTRACE_PTRACE. I think we are all in agreement that the
eventual right place will be only one ptrace implementation, and
that being the one based on a clean framework. It's not very
clear to me which different incremental paths to get there
different people have in mind or why.

Everyone agrees #ifdef for two implementations is ugly. It's a
transitional stage, so to me it seems quite tolerable knowing that
it will be cleaned up eventually. It buys two things: 1. getting
utrace in sooner, worked on faster, and made better soon; 2. given
that, risk mitigation for everyone not interested in working with
utrace.

Thanks,
Roland

From mingo at elte.hu Fri Mar 27 00:59:17 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Fri, 27 Mar 2009 01:59:17 +0100
Subject: utrace merging, ptrace
In-Reply-To: <20090327004824.9F8F5FC1F8@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
 <20090324104849.GA32357@elte.hu>
 <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
 <20090324111619.GB6386@elte.hu>
 <20090325103122.6ED56FC336@magilla.sf.frob.com>
 <20090325112104.GA6041@elte.hu>
 <20090327004824.9F8F5FC1F8@magilla.sf.frob.com>
Message-ID: <20090327005917.GA2077@elte.hu>

* Roland McGrath wrote:

> > ( Note, Linus might still insist on a total replacement, if he
> > finds the #ifdef approach too ugly.
> > I dont talk for him and he
> > is usually much pickier than me. )
>
> In a previous round of review, hch objected to
> CONFIG_UTRACE_PTRACE. I think we are all in agreement that the
> eventual right place will be only one ptrace implementation, and
> that being the one based on a clean framework. It's not very
> clear to me which different incremental paths to get there
> different people have in mind or why.
>
> Everyone agrees #ifdef for two implementations is ugly. It's a
> transitional stage, so to me it seems quite tolerable knowing that
> it will be cleaned up eventually. It buys two things: 1. getting
> utrace in sooner, worked on faster, and made better soon; 2. given
> that, risk mitigation for everyone not interested in working with
> utrace.

The problem for upstream is, if it goes in ugly and everyone gets
what they wanted they often go and chase other targets. Especially
if it's such an external-looking and external-thinking project as
SystemTap. Such incidents happened frequently enough to upstream to
become a primary worry.

[ For example: you promised proper x86 CFI annotations macro design
  one year ago to Linus and me, in exchange for me not removing the
  ugly ones. I already had the removal patches done and committed at
  that stage and reverted them after that. The ugly CFI stuff is
  still there today and it's all bitrotting nicely ;-) ]

And there's a slam-dunk counter argument: "ptrace is ugly enough
already, we dont need another 'temporary' layer".

So 'temporary ugliness' is being frowned upon. Ugliness might be
taken from trusted parties in well-argued cases but it is still
exceedingly rare.

Ingo

From rev at rev2009bridgeport.org Sat Mar 28 02:52:47 2009
From: rev at rev2009bridgeport.org (REV 2009)
Date: Fri, 27 Mar 2009 19:52:47 -0700
Subject: CFP: Sixth International Conference on Remote Engineering and Virtual Instrumentation (REV 2009)
Message-ID: <200903280254.n2S2rjfJ032483@mx2.redhat.com>

Dear Colleagues,

If you received this email in error, please forward it to the
appropriate department at your institution. If you wish to unsubscribe
please follow the unsubscribe link at bottom of the email. Please do
not reply to this message. If you need to contact us please email us
at info at rev2009bridgeport.org

*********************************************************************
          International Association of Online Engineering

      Sixth International Conference on Remote Engineering and
                Virtual Instrumentation (REV 2009)

                     University of Bridgeport

                 http://www.rev2009bridgeport.org

                         June 22-25, 2009
*********************************************************************

---------------------------------------------------------------------
CONFERENCE OVERVIEW
---------------------------------------------------------------------

The Sixth International Conference on Remote Engineering and Virtual
Instrumentation (REV 2009) will be held on June 22-25, 2009 at the
University of Bridgeport, Bridgeport, Connecticut, U.S.A.

REV 2009 is the sixth in a series of annual events addressing the area
of remote engineering and virtual instrumentation. Previous editions of
REV were organized in the form of an international symposium, and
evolved in 2007 to be the annual conference of the International
Association of Online Engineering. The general objective of this
conference is to discuss fundamentals, applications and experiences
within the field of online engineering, both in industry and academia.
REV 2009 offers an exciting technical program as well as academic
networking opportunities during the social events.

Scope of the conference:

Remote Engineering and Virtual Instrumentation are emerging trends in
engineering and science. Due to:

o The increasing complexity of engineering tasks
o The availability of specialized and expensive equipment as well as
  software tools and simulators
o The need for highly qualified staff to control equipment
o The demands of globalization

The general objective of this conference is to discuss fundamentals,
applications and experiences in the field of remote engineering and
virtual instrumentation. It is becoming increasingly necessary to allow
the shared use of equipment and specialized software. The use of
virtual and remote laboratories is one of the future directions for
advanced teleworking, remote services, collaborative research and
e-working environments. Another objective of the conference is to
discuss guidelines for education in university level courses. The
organizers encourage industry personnel to present their experiences
and applications of remote engineering and virtual instruments.

This conference will be organized by the School of Engineering at the
University of Bridgeport.

Topics of interest include (but are not limited to):

o Virtual and remote laboratories
o Remote process visualization and virtual Instrumentation
o Remote control and measurement technologies
o Online engineering
o Networking and grid technologies
o Mixed Reality environments for education and training
o Demands in education and training, e-learning, b-learning,
  m-learning and ODL
o Teleservice and telediagnosis
o Telerobotics and telepresence
o Support of collaborative work in virtual engineering environments
o Teleworking environments
o Telecommunities and their social impact
o Present and future trends including social and educational aspects
o Human computer interfaces, usability, reusability, accessibility
o Applications and experiences
o Standards and standardization proposals
o Innovative organizational and educational concepts for remote
  engineering

The REV 2009 Conference is soliciting manuscripts which address the
various challenges and paradigms in this technological world through
research and instructional programs in Remote Engineering and Virtual
Instrumentation. Suggested conference session topics are listed above.
Other innovations in course and laboratory experiences are also most
welcome for submission.

To submit your paper abstract, please visit the conference website at
http://www.rev2009bridgeport.org

If you are interested in submitting a special paper session, panel,
tutorial, or workshop proposal, the contact information is also
available at the conference website at http://www.rev2009bridgeport.org

If your company or institution would like to exhibit at, or co-sponsor,
the conference, the sponsorship and exhibit forms are also available at
the conference website.

Paper and other Proposal Submissions
======================================

Prospective authors are invited to submit their abstracts online in
Microsoft Word or Adobe PDF format through the website of the
conference at http://www.rev2009bridgeport.org.

Proposals for special sessions, tutorials, panels, workshops,
co-sponsorship and exhibitions are also welcome. Please check the
conference website regarding instructions for these proposal
submissions.

Important Dates
===============

Abstracts due                          21st April, 2009
Acceptance notification                8th May, 2009
Final manuscript & Registration due    29th May, 2009

------------------------------------------------------------------------
N. Gupta
REV 2009 Program Chair
University of Bridgeport
221 University Avenue
Bridgeport, CT 06604, U.S.A.
e-mail: info at rev2009bridgeport.org
http://www.rev2009bridgeport.org
------------------------------------------------------------------------

Click here on
http://server1.streamsend.com/streamsend/unsubscribe.php?cd=3326&md=322&ud=0b3cfeb6dd47f09dcb3a2311bd8cb6b3
to update your profile or Unsubscribe

From akpm at linux-foundation.org Mon Mar 30 22:18:44 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Mon, 30 Mar 2009 15:18:44 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323214417.GD5814@mit.edu>
References: <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
Message-ID: <20090330151844.8b4eed0f.akpm@linux-foundation.org>

So we need to work out what to do about utrace and I feel a need to hit
the reset button on all this. Largely because I've forgotten
everything and it was all confusing anyway.

Could those who object to utrace please pipe up and summarise their
reasons?

Just to kick the can down the road a bit I merged the first two
patches. The ftrace patch merged about as (un)successfully as one would
expect.

From fche at redhat.com Mon Mar 30 22:52:06 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Mon, 30 Mar 2009 18:52:06 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090330151844.8b4eed0f.akpm@linux-foundation.org>
References: <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu> <20090330151844.8b4eed0f.akpm@linux-foundation.org>
Message-ID: <20090330225206.GD16170@redhat.com>

Hi -

On Mon, Mar 30, 2009 at 03:18:44PM -0700, Andrew Morton wrote:
> So we need to work out what to do about utrace and I feel a need to hit
> the reset button on all this. [...]

Thanks.

> [...] The ftrace patch merged about as (un)successfully as one would

A new version against -tip is coming by in a few days.

- FChE

From a.p.zijlstra at chello.nl Tue Mar 31 09:17:42 2009
From: a.p.zijlstra at chello.nl (Peter Zijlstra)
Date: Tue, 31 Mar 2009 11:17:42 +0200
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090330151844.8b4eed0f.akpm@linux-foundation.org>
References: <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu> <20090330151844.8b4eed0f.akpm@linux-foundation.org>
Message-ID: <1238491062.28248.2046.camel@twins>

On Mon, 2009-03-30 at 15:18 -0700, Andrew Morton wrote:
> So we need to work out what to do about utrace and I feel a need to hit
> the reset button on all this. Largely because I've forgotten
> everything and it was all confusing anyway.

Right, from my POV something like utrace is desirable, since it's
basically a huge multiplexer for the debugger state, eventually allowing
us to have multiple debuggers attached to the same process.

So in that respect it's a very nice feature.

> Could those who object to utrace please pipe up and summarise their
> reasons?

Christoph used to have an opinion on this matter, so I've added him to
the CC.

Last time when I looked at the code, it needed a bit more care and
comments wrt lifetimes and such. I know Roland has done a lot on that
front -- so I'll need to re-inspect.

As to in-kernel users, currently we only have ptrace, and no full
conversion to utrace is in a mergeable shape afaik.

UML (Jeff CC'ed) might want to use this.

I know the Systemtap people need this (fche). But that isn't really
moving towards mainline any time soon afaict.

Then there is this little thing called frysk which uses it, no idea what
kind of kernel space that needs, nor where it lives -- or for that
matter, wth it really does ;-)

Anyway, long story short, once people have had a little time to go over
the code, and a few in-kernel users are lined-up, I think we should
consider merging it.
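
The "multiplexer" idea in the message above can be pictured with a small
userspace C sketch. It is an illustration only and does not use the utrace
API: struct engine, struct traced_task, attach_engine(), report_event() and
count_event() are all hypothetical names, and the fixed-size table is an
assumption made to keep the example short. The point it shows is that
several independent observers can be attached to one task and each receives
every reported event.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENGINES 4  /* arbitrary limit, chosen only for this sketch */

/* Hypothetical engine: one attached tracer with its own callback and state. */
struct engine {
    void (*on_event)(struct engine *self, int event);
    int events_seen;
};

/* Hypothetical per-task table of attached engines. */
struct traced_task {
    struct engine *engines[MAX_ENGINES];
    size_t count;
};

/* Attach an engine; returns 0 on success, -1 if the table is full. */
static int attach_engine(struct traced_task *task, struct engine *e)
{
    assert(task != NULL && e != NULL);
    if (task->count >= MAX_ENGINES)
        return -1;
    task->engines[task->count++] = e;
    return 0;
}

/* Deliver one event to every attached engine, in attach order. */
static void report_event(struct traced_task *task, int event)
{
    for (size_t i = 0; i < task->count; i++)
        task->engines[i]->on_event(task->engines[i], event);
}

/* Example callback: only counts the events it has been shown. */
static void count_event(struct engine *self, int event)
{
    (void)event;
    self->events_seen++;
}
```

Two engines attached to the same task would each see every event passed to
report_event(), which is the property that a single-tracer interface such
as classic ptrace does not offer.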

From peterz at infradead.org Tue Mar 31 11:27:56 2009
From: peterz at infradead.org (Peter Zijlstra)
Date: Tue, 31 Mar 2009 13:27:56 +0200
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1238491062.28248.2046.camel@twins>
References: <20090321041954.72b99e69.akpm@linux-foundation.org> <20090321115141.GA3566@redhat.com> <20090321050422.d1d99eec.akpm@linux-foundation.org> <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu> <20090330151844.8b4eed0f.akpm@linux-foundation.org> <1238491062.28248.2046.camel@twins>
Message-ID: <1238498876.27156.9.camel@twins>

On Tue, 2009-03-31 at 11:17 +0200, Peter Zijlstra wrote:
> On Mon, 2009-03-30 at 15:18 -0700, Andrew Morton wrote:
> > So we need to work out what to do about utrace and I feel a need to hit
> > the reset button on all this. Largely because I've forgotten
> > everything and it was all confusing anyway.
>
> Right, from my POV something like utrace is desirable, since it's
> basically a huge multiplexer for the debugger state, eventually allowing
> us to have multiple debuggers attached to the same process.
>
> So in that respect it's a very nice feature.
>
> > Could those who object to utrace please pipe up and summarise their
> > reasons?
>
> Christoph used to have an opinion on this matter, so I've added him to
> the CC.
>
> Last time when I looked at the code, it needed a bit more care and
> comments wrt lifetimes and such. I know Roland has done a lot on that
> front -- so I'll need to re-inspect.
>
> As to in-kernel users, currently we only have ptrace, and no full
> conversion to utrace is in a mergeable shape afaik.
>
> UML (Jeff CC'ed) might want to use this.
>
> I know the Systemtap people need this (fche). But that isn't really
> moving towards mainline any time soon afaict.
>
> Then there is this little thing called frysk which uses it, no idea what
> kind of kernel space that needs, nor where it lives -- or for that
> matter, wth it really does ;-)

And Frank reminded me we have an ftrace tracer that utilizes utrace.

> Anyway, long story short, once people have had a little time to go over
> the code, and a few in-kernel users are lined-up, I think we should
> consider merging it.

From fche at redhat.com Tue Mar 31 11:38:32 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 31 Mar 2009 07:38:32 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1238491062.28248.2046.camel@twins>
References: <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu> <20090330151844.8b4eed0f.akpm@linux-foundation.org> <1238491062.28248.2046.camel@twins>
Message-ID: <20090331113832.GG16170@redhat.com>

Hi -

On Tue, Mar 31, 2009 at 11:17:42AM +0200, Peter Zijlstra wrote:
> [...] Right, from my POV something like utrace is desirable, since
> it's basically a huge multiplexer for the debugger state, eventually
> allowing us to have multiple debuggers attached to the same process.
> [...]

Right.

> Then there is this little thing called frysk which uses it, no idea
> what kind of kernel space that needs, nor where it lives -- or for
> that matter, wth it really does ;-)

Frysk was to be a first user of such an improved ptrace(2) API in
order to do the sort of background / multiply-connected debugging, but
that project has been on indefinite hold for about a year. Instead,
there are experiments under way to extend gdb's backend for that
capability.

- FChE

From fche at redhat.com Tue Mar 31 14:09:26 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 31 Mar 2009 10:09:26 -0400
Subject: Need more information on uProbes .
In-Reply-To: <20090331130632.GA6358@in.ibm.com> (Ananth N. Mavinakayanahalli's message of "Tue, 31 Mar 2009 18:36:32 +0530")
References: <20090331130632.GA6358@in.ibm.com>
Message-ID:

ananth wrote:

> Uprobes is implemented only for architectures that have utrace support
> (x86-32, x86_64, powerpc, s390, but not IA64). [...]

(HAVE_ARCH_TRACEHOOK is on for ia64, sparc, sh also, so utrace per se
should work there.)

> [...] For ARM though, the utrace layer needs to be implemented and
> uprobes ported over. [...]

Roland et al., has there been any recent report on
regset/tracehook-on-arm porting?

- FChE

From hch at infradead.org Tue Mar 31 16:25:04 2009
From: hch at infradead.org (Christoph Hellwig)
Date: Tue, 31 Mar 2009 12:25:04 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1238491062.28248.2046.camel@twins>
References: <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu> <20090330151844.8b4eed0f.akpm@linux-foundation.org> <1238491062.28248.2046.camel@twins>
Message-ID: <20090331162504.GA28442@infradead.org>

On Tue, Mar 31, 2009 at 11:17:42AM +0200, Peter Zijlstra wrote:
> > Could those who object to utrace please pipe up and summarise their
> > reasons?
>
> Christoph used to have an opinion on this matter, so I've added him to
> the CC.

I've never objected to utrace per se; quite the contrary, I think it's a
useful abstraction. I did have objections to various implementation
details which should be sorted out now (have to take a look again to
make sure).

I do have a really large objection to merging the current messy double
ptrace implementation. If the current utrace-based ptrace isn't 100%
ready there's absolutely no point in merging it. Another user would be
even better, e.g. the seccomp rewrite.

From jkenisto at us.ibm.com Tue Mar 31 17:05:41 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Tue, 31 Mar 2009 10:05:41 -0700
Subject: Need more information on uProbes .
In-Reply-To:
References: <20090331130632.GA6358@in.ibm.com>
Message-ID: <1238519141.3636.8.camel@dyn9047018139.beaverton.ibm.com>

On Tue, 2009-03-31 at 10:09 -0400, Frank Ch. Eigler wrote:
> ananth wrote:
>
> > Uprobes is implemented only for architectures that have utrace support
> > (x86-32, x86_64, powerpc, s390, but not IA64). [...]
>
> (HAVE_ARCH_TRACEHOOK is on for ia64, sparc, sh also, so utrace per se
> should work there.)
>

FWIW, Intel did an ia64 port of uprobes as well, but there wasn't
sufficient followup to get it tucked into systemtap/runtime/uprobes.

Jim

From roland at redhat.com Tue Mar 31 19:25:21 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 31 Mar 2009 12:25:21 -0700 (PDT)
Subject: Need more information on uProbes .
In-Reply-To: Frank Ch. Eigler's message of Tuesday, 31 March 2009 10:09:26 -0400
References: <20090331130632.GA6358@in.ibm.com>
Message-ID: <20090331192522.04B5EFC2A8@magilla.sf.frob.com>

> Roland et al., has there been any recent report on
> regset/tracehook-on-arm porting?

I haven't heard anything. There are no difficulties in that port AFAIK.
If an ARM arch maintainer (or someone who wants to send them patches)
wants to do it, I'm happy to give advice.

Thanks,
Roland

From roland at redhat.com Tue Mar 31 20:54:13 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 31 Mar 2009 13:54:13 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Christoph Hellwig's message of Tuesday, 31 March 2009 12:25:04 -0400 <20090331162504.GA28442@infradead.org>
References: <20090321154501.GA2707@elte.hu> <20090321143413.75ead1aa.akpm@linux-foundation.org> <20090321215145.GB5262@redhat.com> <20090322123749.GF19826@elte.hu> <20090323134813.GA18219@x200.localdomain> <20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu> <20090330151844.8b4eed0f.akpm@linux-foundation.org> <1238491062.28248.2046.camel@twins> <20090331162504.GA28442@infradead.org>
Message-ID: <20090331205413.EDEFFFC2A8@magilla.sf.frob.com>

> I do have a really large objection to merging the current messy double
> ptrace implementation. If the current utrace-based ptrace isn't 100%
> ready there's absolutely no point in merging it.

There is no "current" utrace-ptrace implementation. I haven't proposed
one for merging. When one is ready and working, we can discuss its
actual technical details then.

> Another user would be even better, e.g. the seccomp rewrite.

The seccomp rewrite is a very simple user for which I have a prototype
patch. (It needs testing, but that should be easy enough.) The only
real complexity there is in deciding how to merge those changes.
Its components are:

* clean up Kconfig
* remove old arch/asm hooks
** mips
** powerpc
** sh
** sparc
** x86
* replace kernel/seccomp.c with utrace-based one

Except for the first one, doing it in small incremental changes would
leave some intermediate states with no seccomp feature usable in the
tree. (And, of course, CONFIG_SECCOMP will require CONFIG_UTRACE
thereafter.) Please advise on how many pieces to slice it into and how
to stage the merging.

Thanks,
Roland

From maynardj at us.ibm.com Tue Mar 31 23:56:34 2009
From: maynardj at us.ibm.com (Maynard Johnson)
Date: Tue, 31 Mar 2009 18:56:34 -0500
Subject: Testing insn.block probe point uncovers possible utrace bug
Message-ID: <49D2ADB2.3030304@us.ibm.com>

Hi,

In regards to the instruction tracing probe points that were added to
SystemTap last year, Frank had asked whether the block-trace
functionality (.insn.block) is working. I tested this on x86_64/Fedora
10 and, indeed, it does work. However, when testing on a ppc64 system,
it failed terribly -- "kernel BUG at include/linux/ptrace.h:299!"

Here's the stack trace from the system log:

    finish_resume_report
    utrace_resume
    do_signal
    do_work

In finish_resume_report, user_enable_block_step() is called if
utrace_report->action==UTRACE_BLOCKSTEP. user_enable_block_step() is
defined in include/linux/ptrace.h, and if arch_has_block_step is not
defined, its implementation is a simple call to BUG(). Apparently,
arch_has_block_step is not defined on ppc64, although the hardware is
physically capable of branch exceptions using the MSR_BE bit. Is there
a reason why this has not been defined on the ppc64 architecture? Or is
it simply that no one has gotten around to it yet?

Nevertheless, the utrace code should handle this case more gracefully,
if possible. Can we check for action==UTRACE_BLOCKSTEP earlier and bail
out gracefully instead of blindly calling user_enable_block_step()?

Once this issue is resolved, I will add a testcase to the itrace.exp in
the testsuite to test the insn.block probe.

Thanks.
-Maynard Johnson
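
The kind of early check asked for above might look roughly like the
following userspace C sketch. It is not the utrace code and is not a
proposed patch: enum resume_action, struct arch_caps and
pick_resume_action() are hypothetical names, and falling back from
block-step to single-step (and then to a plain continue) is just one
possible policy assumed here; returning an error to the caller would be
another. The sketch only shows the shape of validating a requested action
against what the architecture supports before acting on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the resume actions discussed above. */
enum resume_action {
    RESUME_CONTINUE,
    RESUME_SINGLESTEP,
    RESUME_BLOCKSTEP,
};

/* Hypothetical description of what the architecture can do. */
struct arch_caps {
    bool has_single_step;
    bool has_block_step;
};

/*
 * Validate the requested action up front, so that an unsupported
 * request is downgraded instead of reaching a BUG()-style trap later.
 */
static enum resume_action pick_resume_action(enum resume_action requested,
                                             const struct arch_caps *caps)
{
    assert(caps != NULL);

    /* No block-step support: fall back to stepping one instruction. */
    if (requested == RESUME_BLOCKSTEP && !caps->has_block_step)
        requested = RESUME_SINGLESTEP;

    /* No single-step support either: just let the task run. */
    if (requested == RESUME_SINGLESTEP && !caps->has_single_step)
        requested = RESUME_CONTINUE;

    return requested;
}
```

With a capability set that lacks block-step, a RESUME_BLOCKSTEP request
comes back as RESUME_SINGLESTEP, so the caller never invokes an enable
routine the architecture does not implement.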