[PATCH][RFC] audit: set wait time to zero when audit failed

Thu Sep 19 02:30:25 UTC 2019

On Wed, Sep 18, 2019 at 9:50 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > -----Original Message-----
> > From: Paul Moore [mailto:paul at paul-moore.com]
> > Sent: September 18, 2019 20:23
> > To: Li,Rongqing <lirongqing at baidu.com>
> > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> >
> > On Tue, Sep 17, 2019 at 9:07 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > > > -----Original Message-----
> > > > From: Paul Moore [mailto:paul at paul-moore.com]
> > > > Sent: September 18, 2019 3:17
> > > > To: Li,Rongqing <lirongqing at baidu.com>
> > > > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> > > >
> > > > On Mon, Sep 16, 2019 at 9:08 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > > > > > -----Original Message-----
> > > > > > From: Paul Moore [mailto:paul at paul-moore.com]
> > > > > > Sent: September 17, 2019 6:52
> > > > > > To: Li,Rongqing <lirongqing at baidu.com>
> > > > > > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > > > > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> >
> > ...
> >
> > > > > I just want it to behave as it did before 3197542482df ("audit: rework
> > > > > audit_log_start()"): wait 60 seconds once if
> > > > > auditd/readahead-collector has a problem draining the audit backlog.
> > > >
> > > > The patch you mention fixed what was deemed to be buggy behavior; as
> > > > mentioned previously in this thread I see no good reason to go back
> > > > to the old behavior.
> > > >
> > > > > > If you are not using audit, you can always disable it via the
> > > > > > kernel command line, or at runtime (look at what Fedora does).
> > > > > >
> > > > > > > > You might also want to investigate what is generating so
> > > > > > > > many audit records prior to starting the audit daemon.
> > > > > > >
> > > > > > > It is /sbin/readahead-collector; in fact, we stop auditd.
> > > > > > > We are doing a reboot test, which reboots the machine
> > > > > > > repeatedly to test hardware/software.
> > > > > > >
> > > > > > > It is the same as the following:
> > > > > > > auditctl -a always,exit -S all -F pid='xxx'
> > > > > > > kill -s 19 `pidof auditd`
> > > > > > >
> > > > > > > Then the audited task will hang.
> > > > > >
> > > > > > So you are seeing this problem only when you run a test, or did
> > > > > > you provide this as a reproducer?
> > > > >
> > > > > auditctl -a always,exit -S all -F ppid=`pidof sshd`
> > > > > kill -s 19 `pidof auditd`
> > > > > ssh root@127.0.0.1
> > > > >
> > > > > then ssh will be hung forever
> > > >
> > > > That is expected behavior.  You are putting a massive audit load on
> > > > the system by telling the kernel to audit every syscall that sshd
> > > > makes, then you are intentionally killing the audit daemon and
> > > > attempting to ssh into the system.
> > > > The proper fix(es) here would be to 1) set reasonable audit rules
> > > > and/or 2) use an init system that monitors and restarts auditd when
> > > > it fails (systemd has this capability, I believe some others do as well).
> > >
> > > Neither works.
> > > auditd is not dead; it is in the stopped state (kill -s 19), so
> > > systemd/init will not restart it.
> > > Even with only a few audit rules, after multiple accesses the backlog
> > > will fill up because there is no receiver.
> >
> > Fair point, however I still stand by my previous comments that there are
> > runtime configuration knobs which can mitigate this problem if it is something
> > you are concerned about.  Depending on the situation, you can either increase
> > the backlog to deal with transient problems, or decrease the backlog wait time
> > (possibly to zero) to prevent blocking entirely.
> >
>
> No knobs are needed; auditctl can already change the backlog length and
> wait time.

That is what I meant by "knobs".  The term "knobs" is commonly used to
reference some method of changing the configuration.

> And changing the backlog length does not help if auditd hangs forever,
> since a task can hang forever because of disk/filesystem abnormalities, etc.

In this case changing the wait time would work (as previously
mentioned).  It is worth noting that the current code does not suffer
from a "hung forever" problem if the audit queue is blocked; it may
slow down quite a bit (depending on the audit_backlog_wait_time
variable), but it should still make forward progress.
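
As a rough sketch of the two knobs discussed above (not a recommended
configuration): the snippet below assumes auditctl from the audit
userspace tools, and the helper names run and tune_audit_backlog, as
well as the values 8192 and 0, are purely illustrative.  With DRY_RUN
left at its default of 1 it only prints the commands; set DRY_RUN=0
and run it as root to actually apply them.

```shell
#!/bin/sh
# Illustrative only: the helper names and the values used are assumptions.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: backlog limit (number of outstanding audit buffers allowed)
# $2: backlog wait time (0 means never block when the backlog is full)
tune_audit_backlog() {
    backlog_limit="$1"
    wait_time="$2"
    run auditctl -b "$backlog_limit"               # absorb transient stalls
    run auditctl --backlog_wait_time "$wait_time"  # bound or disable blocking
    run auditctl -s                                # report the resulting status
}

tune_audit_backlog 8192 0
```

A larger backlog only helps with transient stalls; if the daemon may be
stopped indefinitely, the wait time is the setting that matters.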

> My point is that the default audit behavior has changed.  I really did hit
> the issue described in the commit below; if we change the default, others
> can avoid this issue.

If we were hearing more reports of problems with the current defaults
I would be inclined to change them, but to the best of my knowledge
you are the only one who has run into this problem, so I would rather
you simply update your audit configuration.

-- 
paul moore
www.paul-moore.com