Flushing the hold queue falls into an infinite loop.

Thu Jan 13 11:56:29 UTC 2022

When "audit=1" is added to the kernel cmdline, kauditd can end up using
100% of a CPU. It can be reproduced as follows:

    configurations:
    	auditctl -b 64
    	auditctl --backlog_wait_time 60000
    	auditctl -r 0
    	auditctl -w /root/aaa  -p wrx
    shell script:
    	#!/bin/bash
    	i=0
    	while [ $i -le 66 ]
    	do
    	    touch /root/aaa
    	    let i++
    	done
    mandatory conditions:
    	add "audit=1" to the cmdline, and stop auditd with
    	"kill -19 pid_number" (signal 19 is SIGSTOP on x86; pid_number is the pid of /sbin/auditd).

As long as audit_hold_queue stays non-empty, flushing the hold queue falls into
an infinite loop.

>  713 static int kauditd_send_queue(struct sock *sk, u32 portid,
>  714                               struct sk_buff_head *queue,
>  715                               unsigned int retry_limit,
>  716                               void (*skb_hook)(struct sk_buff *skb),
>  717                               void (*err_hook)(struct sk_buff *skb))
>  718 {
>  719         int rc = 0;
>  720         struct sk_buff *skb;
>  721         unsigned int failed = 0;
>  722
>  723         /* NOTE: kauditd_thread takes care of all our locking, we just use
>  724          *       the netlink info passed to us (e.g. sk and portid) */
>  725
>  726         while ((skb = skb_dequeue(queue))) {
>  727                 /* call the skb_hook for each skb we touch */
>  728                 if (skb_hook)
>  729                         (*skb_hook)(skb);
>  730
>  731                 /* can we send to anyone via unicast? */
>  732                 if (!sk) {
>  733                         if (err_hook)
>  734                                 (*err_hook)(skb);
>  735                         continue;
>  736                 }
>  737
>  738 retry:
>  739                 /* grab an extra skb reference in case of error */
>  740                 skb_get(skb);
>  741                 rc = netlink_unicast(sk, skb, portid, 0);
>  742                 if (rc < 0) {
>  743                         /* send failed - try a few times unless fatal error */
>  744                         if (++failed >= retry_limit ||
>  745                             rc == -ECONNREFUSED || rc == -EPERM) {
>  746                                 sk = NULL;
>  747                                 if (err_hook)
>  748                                         (*err_hook)(skb);
>  749                                 if (rc == -EAGAIN)
>  750                                         rc = 0;
>  751                                 /* continue to drain the queue */
>  752                                 continue;
>  753                         } else
>  754                                 goto retry;
>  755                 } else {
>  756                         /* skb sent - drop the extra reference and continue */
>  757                         consume_skb(skb);
>  758                         failed = 0;
>  759                 }
>  760         }
>  761
>  762         return (rc >= 0 ? 0 : rc);
>  763 }

When kauditd attempts to flush the hold queue, the queue parameter is &audit_hold_queue.
If netlink_unicast() (line 741) returns -EAGAIN and the retries are exhausted (line 744),
sk is set to NULL (line 746) and err_hook (kauditd_rehold_skb) is called, which puts the skb
back on audit_hold_queue. After the continue, skb_dequeue() (line 726) returns that skb again,
and because sk is now NULL, err_hook (line 733) re-queues it once more. The queue never becomes
empty, so the while loop never exits.

I don't really understand the value of audit_hold_queue. Can we remove it, or stop passing
the records to kauditd_rehold_skb when auditd is not working normally?
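
To make the loop easier to see, here is a rough user-space sketch of it. It is
not kernel code: struct rec, struct queue, push_head, pop_head, rehold and
flush_hold_queue are made-up names, and it assumes that the err_hook puts the
record back on the same queue that is being drained. The max_iter cap exists
only so that the demo terminates; the kernel loop has no such cap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. */
struct rec {
    struct rec *next;
    int id;
};

/* Hypothetical stand-in for struct sk_buff_head (no locking here). */
struct queue {
    struct rec *head;
    size_t len;
};

/* Put a record at the front of the queue. */
static void push_head(struct queue *q, struct rec *r)
{
    r->next = q->head;
    q->head = r;
    q->len++;
}

/* Take the first record off the queue, or return NULL if it is empty. */
static struct rec *pop_head(struct queue *q)
{
    struct rec *r = q->head;

    if (r == NULL)
        return NULL;
    q->head = r->next;
    r->next = NULL;
    q->len--;
    return r;
}

/* Assumed behaviour of the err_hook: the record goes back on the queue. */
static void rehold(struct queue *hold, struct rec *r)
{
    push_head(hold, r);
}

/*
 * Models the while loop once sk is NULL: every dequeued record takes the
 * "!sk" branch, is handed to the err_hook, and the loop continues.
 * Returns the number of iterations executed before the cap was hit or
 * the queue ran empty.
 */
static unsigned long flush_hold_queue(struct queue *hold, unsigned long max_iter)
{
    unsigned long iters = 0;
    struct rec *r;

    assert(hold != NULL);

    /* Check the cap first so that no record is dequeued and then lost. */
    while (iters < max_iter && (r = pop_head(hold)) != NULL) {
        iters++;
        rehold(hold, r);    /* err_hook, then "continue" */
    }
    return iters;
}
```

With a non-empty queue, flush_hold_queue() always runs until the cap and the
queue length is unchanged afterwards; it only returns early when the queue is
empty to begin with.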

I look forward to your reply. Thank you very much.

Gaosheng.
