Re: [PATCH][RFC] audit: set wait time to zero when audit failed

Tue Sep 17 01:08:04 UTC 2019

> -----Original Message-----
> From: Paul Moore [mailto:paul at paul-moore.com]
> Sent: September 17, 2019 6:52
> To: Li,Rongqing <lirongqing at baidu.com>
> Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> 
> On Sun, Sep 15, 2019 at 10:55 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > > > > If audit_log_start() fails because the queue is full, kauditd
> > > > > waits for the receiving queue to empty. With no receiver, a task
> > > > > is forced to wait 60 seconds for each audited syscall, and it
> > > > > will hang for a very long time.
> > > > >
> > > > > So in this condition, set the wait time to zero to reduce the
> > > > > wait, and restore the wait time when audit works again.
> > > > >
> > > > > This partially restores the behavior from before commit
> > > > > 3197542482df ("audit: rework audit_log_start()")
> > > >
> > > > Signed-off-by: Li RongQing <lirongqing at baidu.com>
> > > > Signed-off-by: Liang ZhiCheng <liangzhicheng at baidu.com>
> > > > ---
> > > > > Reboot is taking a very long time on my machine (CentOS 6u4 + kernel
> > > > > 5.3), since TIF_SYSCALL_AUDIT is set by default. During reboot, the
> > > > > userspace process which receives audit messages is killed,
> > > > > so nothing drains the audit queue.
> > > > >
> > > > > git bisect shows it is caused by 3197542482df ("audit: rework
> > > > > audit_log_start()")
> > > >
> > > >  kernel/audit.c | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > > This is typically solved by increasing the backlog using the
> > > > "audit_backlog_limit" kernel parameter (link to the docs below).
> >
> > > It should be able to avoid my issue, but the default behavior does not
> > > work for me. Also, not everyone has enough knowledge about audit; they may
> > > spend a lot of effort finding the root cause and estimating how large
> > > "audit_backlog_limit" should be.
> 
> The pause/sleep behavior is desired behavior and is intended to help
> kauditd/auditd process the audit backlog on a busy system.  If we didn't sleep
> the current process and give kauditd/auditd a chance to flush the backlog when
> it was full, a lot of bad things could happen with respect to audit.  We
> generally select the backlog limit so that this is not a problem for most systems,
> although there will always be edge cases where the default does not work well;
> it is impossible to pick defaults that work well for every case.
> 

I just want it to behave as it did before 3197542482df ("audit: rework
audit_log_start()"): wait 60 seconds once if auditd/readahead-collector has
a problem draining the audit backlog.

Once auditd/readahead-collector recovers, restore the wait time to 60 seconds.
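
The behavior described above can be sketched in user space as follows. This is only an illustration of the policy, not the patch to kernel/audit.c: the names backlog_policy, policy_on_record, and DEFAULT_WAIT_SECS are hypothetical, and "recovered" is assumed here to mean that the queue is no longer full when a record is logged.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the 60 second wait mentioned in this thread. */
#define DEFAULT_WAIT_SECS 60

/* Hypothetical state: the wait applied the next time the queue is full. */
struct backlog_policy {
	unsigned int wait_secs;
};

/*
 * Returns how many seconds the calling task would sleep for this record.
 * Assumption: a non-full queue means the consumer is draining again.
 */
static unsigned int policy_on_record(struct backlog_policy *p, bool queue_full)
{
	unsigned int slept;

	if (!queue_full) {
		/* Consumer recovered: restore the full wait for next time. */
		p->wait_secs = DEFAULT_WAIT_SECS;
		return 0;
	}

	/* Queue is full: sleep for the current wait time ... */
	slept = p->wait_secs;
	/* ... then stop waiting until the consumer recovers. */
	p->wait_secs = 0;
	return slept;
}
```

Starting from wait_secs = DEFAULT_WAIT_SECS, a run of full-queue records costs one 60 second wait and then none, and the first record logged with a non-full queue re-arms the wait.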

> If you are not using audit, you can always disable it via the kernel command line,
> or at runtime (look at what Fedora does).
> 
> > > > You might also want to investigate
> > > > what is generating so many audit records prior to starting the
> > > > audit daemon.
> >
> > > It is /sbin/readahead-collector. In fact, we stop auditd; we are doing a
> > > reboot test, which reboots the machine continuously to test hardware/software.
> >
> > > It is the same as the following:
> > auditctl -a always,exit -S all -F pid='xxx'
> > kill -s 19 `pidof auditd`
> >
> > > Then the audited task will hang.
> 
> So you are seeing this problem only when you run a test, or did you provide this
> as a reproducer?
> 

auditctl -a always,exit -S all -F ppid=`pidof sshd`
kill -s 19 `pidof auditd`
ssh root@127.0.0.1

Then ssh hangs forever.

-Li RongQing

> --
> paul moore
> www.paul-moore.com