Re: [PATCH][RFC] audit: set wait time to zero when audit failed

Thu Sep 19 01:50:05 UTC 2019

> -----Original Message-----
> From: Paul Moore [mailto:paul at paul-moore.com]
> Sent: September 18, 2019 20:23
> To: Li,Rongqing <lirongqing at baidu.com>
> Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> 
> On Tue, Sep 17, 2019 at 9:07 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > > -----Original Message-----
> > > From: Paul Moore [mailto:paul at paul-moore.com]
> > > Sent: September 18, 2019 3:17
> > > To: Li,Rongqing <lirongqing at baidu.com>
> > > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> > >
> > > On Mon, Sep 16, 2019 at 9:08 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > > > > -----Original Message-----
> > > > > From: Paul Moore [mailto:paul at paul-moore.com]
> > > > > Sent: September 17, 2019 6:52
> > > > > To: Li,Rongqing <lirongqing at baidu.com>
> > > > > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > > > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> 
> ...
> 
> > > > I just want it to behave as it did before 3197542482df ("audit:
> > > > rework audit_log_start()"): wait 60 seconds once if
> > > > auditd/readahead-collector has a problem draining the audit backlog.
> > >
> > > The patch you mention fixed what was deemed to be buggy behavior; as
> > > mentioned previously in this thread I see no good reason to go back
> > > to the old behavior.
> > >
> > > > > If you are not using audit, you can always disable it via the
> > > > > kernel command line, or at runtime (look at what Fedora does).
> > > > >
> > > > > > > > You might also want to investigate what is generating so
> > > > > > > > many audit records prior to starting the audit daemon.
> > > > > >
> > > > > > > It is /sbin/readahead-collector; in fact, we stopped auditd.
> > > > > > > We are running a reboot test, which reboots the machine
> > > > > > > continuously to test hardware/software.
> > > > > >
> > > > > > > It is the same as the following:
> > > > > > > auditctl -a always,exit -S all -F pid='xxx'
> > > > > > > kill -s 19 `pidof auditd`
> > > > > > >
> > > > > > > Then the audited task hangs.
> > > > >
> > > > > So you are seeing this problem only when you run a test, or did
> > > > > you provide this as a reproducer?
> > > >
> > > > auditctl -a always,exit -S all -F ppid=`pidof sshd`
> > > > kill -s 19 `pidof auditd`
> > > > ssh root@127.0.0.1
> > > >
> > > > Then ssh hangs forever.
> > >
> > > That is expected behavior.  You are putting a massive audit load on
> > > the system by telling the kernel to audit every syscall that sshd
> > > makes, then you are intentionally killing the audit daemon and
> > > attempting to ssh into the system.
> > > The proper fix(es) here would be to 1) set reasonable audit rules
> > > and/or 2) use an init system that monitors and restarts auditd when
> > > it fails (systemd has this capability, I believe some others do as well).
> >
> > Neither works.
> > auditd is not dead; it is in the stopped state (kill -s 19), so
> > systemd/init will not restart it.
> > Even with only a few audit rules, after multiple accesses the backlog
> > will fill up because there is no receiver.
> 
> Fair point, however I still stand by my previous comments that there are
> runtime configuration knobs which can mitigate this problem if it is something
> you are concerned about.  Depending on the situation, you can either increase
> the backlog to deal with transient problems, or decrease the backlog wait time
> (possibly to zero) to prevent blocking entirely.
> 

I know the knobs exist: auditctl can change the backlog length and wait time. But increasing the backlog length does not help if auditd is hung forever, and a task can hang forever because of disk or filesystem faults, etc.

My point is about the default audit behavior, which was changed. I really did hit the issue described in the commit below; if we make this change, others can avoid it.

commit ac4cec443a80bfde829516e7a7db10f7325aa528
Author: David Woodhouse <dwmw2 at shinybook.infradead.org>
Date:   Sat Jul 2 14:08:48 2005 +0100

    AUDIT: Stop waiting for backlog after audit_panic() happens
    
    We force a rate-limit on auditable events by making them wait for space
    on the backlog queue. However, if auditd really is AWOL then this could
    potentially bring the entire system to a halt, depending on the audit
    rules in effect.


Another way to avoid this issue is to make audit_backlog_wait_time default to 0:

diff --git a/kernel/audit.c b/kernel/audit.c
index da8dc0db5bd3..0a7f7c290644 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,7 +119,7 @@ static u32  audit_rate_limit;
  * When set to zero, this means unlimited. */
 static u32     audit_backlog_limit = 64;
 #define AUDIT_BACKLOG_WAIT_TIME (60 * HZ)
-static u32     audit_backlog_wait_time = AUDIT_BACKLOG_WAIT_TIME;
+static u32     audit_backlog_wait_time = 0;
 
 /* The identity of the user shutting down the audit system. */
 kuid_t         audit_sig_uid = INVALID_UID;
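
To make the effect of that default concrete, here is a rough userspace sketch (not kernel code) of the choice a task faces when it generates a record and finds the backlog over its limit. The function and enum names are hypothetical, and the logic is my simplified reading of audit_log_start(), so treat it as an assumption rather than the exact kernel behavior. Only the two tunables correspond to the variables in the diff above.

```c
/* Userspace toy model, NOT kernel code. backlog_decide() and the enum are
 * hypothetical names; the logic is a simplified assumption about what
 * audit_log_start() does when the backlog is over its limit. */
#include <stdbool.h>

enum backlog_action {
    BACKLOG_QUEUE, /* room in the backlog: the record is queued */
    BACKLOG_SLEEP, /* over the limit with wait time left: the task sleeps */
    BACKLOG_DROP   /* over the limit, no wait time: record lost, task continues */
};

static enum backlog_action
backlog_decide(unsigned int queue_len, unsigned int backlog_limit,
               long wait_time, bool may_block)
{
    /* A limit of zero means unlimited, as the comment in audit.c says. */
    if (backlog_limit == 0 || queue_len <= backlog_limit)
        return BACKLOG_QUEUE;

    /* With a positive wait time, a task that is allowed to block sleeps
     * until there is room or the timeout expires. If auditd never drains
     * the queue, this is the path on which audited tasks hang. */
    if (may_block && wait_time > 0)
        return BACKLOG_SLEEP;

    /* With wait_time == 0 the record is dropped and the task goes on. */
    return BACKLOG_DROP;
}
```

Under this model, backlog_decide(65, 64, 0, true) gives BACKLOG_DROP, while the same call with a positive wait time gives BACKLOG_SLEEP, which is the case the reproducer hits when auditd is stopped.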


-RongQing


> --
> paul moore
> www.paul-moore.com