BIG performance hit with auditd on large cpus (>64 cpus)
Steve Grubb
sgrubb at redhat.com
Fri May 19 21:00:24 UTC 2017
On Friday, May 19, 2017 4:22:24 PM EDT Klaus Lichtenwalder wrote:
> (note to moderator: i sent this before from the wrong address, hope it
> doesn't get duplicated)
>
> Hi,
>
> we have a few SAP systems on RHEV (so virtualized on KVM) with >= 74
> CPUs and >= 400G RAM.
> When the system is busy with large SAP jobs, it goes onto its knees with
> cpu %system up to 80%, thus making the SAP jobs run twice as long. As
> soon as you stop auditd everything returns to normal...
>
> Facts:
> RHEL6 instances on RHEL7 hosts.
> The rule set (see below) runs fine on any other system with fewer CPUs
> (<64, maybe this is the cutoff?). We have smaller systems with this
> rule set that rotate the audit file nearly every minute without any
> noticeable performance hit; these SAP systems rotate once every
> 20-24 hours.
>
> Does anyone have an idea?
>
> Here's an excerpt from "perf top":
> with auditd running:
>
> Samples: 28M of event 'cpu-clock', Event count (approx.): 236747914918
> Overhead Shared Object Symbol
> 23.13% [kernel] [k] get_task_cred
> 10.05% [kernel] [k] audit_filter_rules
> 4.21% [kernel] [k] _spin_unlock_irqrestore
> 3.30% libdb2e.so.1 [.] sqlbfix
> 2.92% [kernel] [k] finish_task_switch
> 1.69% disp+work [.] rrol_in
> 1.69% disp+work [.] rrol_out
> 0.98% [kernel] [k] run_timer_softirq
> 0.96% [kernel] [k] rcu_process_gp_end
>
>
> auditd stopped:
>
> Samples: 3M of event 'cpu-clock', Event count (approx.): 526535382557
> Overhead Shared Object Symbol
> 2.41% disp+work [.] memcmpU16
> 2.32% disp+work [.] MmxMalloc2
> 2.25% disp+work [.] ab_Rudi
> 2.07% disp+work [.] rrol_out
> 1.98% disp+work [.] rrol_in
> 1.95% disp+work [.] ab_CompByCmpCntx
> 1.88% libdb2e.so.1 [.] sqlbfix
> 1.73% disp+work [.] MmxFree2
> 1.62% [kernel] [k] run_timer_softirq
> 1.56% [kernel] [k] __do_softirq
> 1.39% disp+work [.] ab_InitRcDecompress
>
> These are the audit rules:
> auditctl -l
> -a always,exit -S all -F path=/etc/environment -F perm=wa -F auid>=400 -F key=CRIT_CONF
I clipped all the other rules. Out of curiosity, why do you include -S all in
every rule? That automatically sends every syscall through the syscall rules,
which affects the performance of every single syscall in every single
application. The majority of your rules are file watches, which generally take
a different, more efficient route.
To fix this, just remove "-S all" from every rule. I bet it works much better
after that.
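
Here is a minimal sketch of that edit, in case it helps. It assumes the rules are available as plain text, one rule per line, in the same form that auditctl -l printed above. The helper name strip_s_all is hypothetical, not part of the audit tools. It only rewrites text; it does not load anything into the kernel.

```shell
# strip_s_all: hypothetical helper, not part of the audit package.
# Reads audit rules on stdin, one per line, and removes each "-S all"
# token pair. Rules that name a specific syscall are left unchanged.
strip_s_all() {
    sed -E 's/(^|[[:space:]])-S[[:space:]]+all([[:space:]]|$)/\1/g'
}

# Example using the one rule quoted in this thread.
rule='-a always,exit -S all -F path=/etc/environment -F perm=wa -F auid>=400 -F key=CRIT_CONF'
printf '%s\n' "$rule" | strip_s_all
```

Review the rewritten rules before loading them, and compare the perf top output before and after the change to confirm that the time spent in audit_filter_rules actually drops.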
-Steve