[RFC PATCH] audit-testsuite: improve our chances of losing records in lost_reset

Ondrej Mosnacek omosnace at redhat.com
Sat Dec 15 17:35:14 UTC 2018


On Fri, Dec 14, 2018 at 10:00 PM Richard Guy Briggs <rgb at redhat.com> wrote:
> On 2018-12-14 15:35, Paul Moore wrote:
> > On Fri, Dec 14, 2018 at 11:12 AM Richard Guy Briggs <rgb at redhat.com> wrote:
> > > On 2018-12-14 10:53, Paul Moore wrote:
> > > > On Thu, Dec 13, 2018 at 8:59 PM Richard Guy Briggs <rgb at redhat.com> wrote:
> > > > > On 2018-12-13 18:23, Paul Moore wrote:
> > > > > > On Thu, Dec 13, 2018 at 6:17 PM Paul Moore <paul at paul-moore.com> wrote:
> >
> > ...
> >
> > > > As an aside, have you spent any time trying to debug that wrong PID
> > > > problem?  While not strictly audit related, that seems like a pretty
> > > > serious Bash bug.  Or maybe it's a problem with the test?  I vaguely
> > > > remember a discussion between you and Ondrej and some confusion around
> > > > which Bash variable to use to fetch PIDs, but I may be mistaken.
> > >
> > > I haven't spent much time trying to debug that bash PID increment issue,
> > > but it perplexed me since it was the identical technique used in
> > > multiple tests in the audit-testsuite that has never caused an issue
> > > previously on any of the same set of test machines.  This was for the
> > > missing mount umount2 hang bug test
> > > https://github.com/linux-audit/audit-testsuite/pull/76
> >
> > Ah, I think I see the problem.  Unless I'm mistaken, you are talking
> > about the shell/Bash command where the tests print the PID using "echo
> > $$" or similar, yes?  As you likely already know, in Bash (and likely
> > other shells as well), $$ is the PID of the running Bash process; in
> > the two places where I saw it used in the existing audit-testsuite $$
> > is being used to reference the Bash instance itself or something it
> > exec's (which ends up inheriting the PID).  It looks like you are
> > using $$ as a way to capture the PID of a child process being spawned
> > by the shell, yes?  This may explain why you sometimes get lucky and
> > $$+1 works for the PID.
>
> I was always getting lucky with $$+1, but understandably uncomfortable
> with it since others weren't so fortunate.
>
> In the code that's there, that process is backgrounded, but I'm fairly
> certain I tested without that and carefully checked the options (-f and
> -s) to ensure it wasn't daemonizing or multithreading.  I was pretty
> careful to set up exactly the same conditions for running that process
> as other tests use, but that looks like the first thing to check next
> time I try it.  It wouldn't be the first time I've missed something
> obvious.

Right, let me chime into the PID issue again :) I am pretty sure the
code to get the child PID in Richard's umount2 test is wrong. I didn't
get back to that thread eventually because I tried to simplify the
fixed code to something better (and still 100% reliable), but I
couldn't get it to a state I'd like and eventually I forgot about it
and switched to other things...

So, let me try to explain the problem again. This is the relevant
snippet of code:

system("cd $basedir/$clientdir; echo \$\$ > $stdout; exec ./$client -f
-s $tmpdir &");

Let's rewrite this into pseudocode so it is clear what's actually going on:
1. run "cd $basedir/$clientdir"
2. write the PID of this bash process into "$stdout"
3. fork a child and in that child, execve() this command: "./$client
-f -s $tmpdir"

So the problem is that you log the PID of the parent bash process and
not the child that exec's into the "$client" program. Since the child
gets forked away very quickly after the parent bash process is
created, in 99.9% of the cases it gets a PID exactly one greater -
that's why your +1 trick works. The reason why a similar pattern
("echo $$; exec ...") works elsewhere in the testsuite is that there
is no forking going on in those cases (IIRC).

I just looked at the code in the lost_reset test that actually deals
with the forking scenario, and it uses the "$!" variable, which is the
correct way to get the PID of the last process forked by bash. I admit
I didn't know about this variable when I was pointing out the problem
in Richard's patch, but now I realize this is what I should have
suggested back then...

So, in the umount2 test this:

system("cd $basedir/$clientdir; echo \$\$ > $stdout; exec ./$client -f
-s $tmpdir &");

should be replaced with this (along with dropping the "$pid_fuse += 1;" line):

system("cd $basedir/$clientdir; exec ./$client -f -s $tmpdir & echo
\$! > $stdout;");

That said, I think the code in the lost_reset test is doing the right
thing and I wouldn't expect it to get the ping PID wrong.

Hope that helps,

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.




More information about the Linux-audit mailing list