[dm-devel] Serial console is causing system lock-up

Mikulas Patocka mpatocka at redhat.com
Wed Mar 6 16:07:55 UTC 2019



On Wed, 6 Mar 2019, Petr Mladek wrote:

> On Wed 2019-03-06 09:27:13, Mikulas Patocka wrote:
> > Hi
> > 
> > I was debugging a kernel lockup with storage drivers and it turned out
> > that the lockup is caused by the serial console subsystem. If we use a
> > serial console and write to it excessively, the kernel sometimes locks
> > up and sometimes reports RCU stalls and NMI backtraces. Sometimes it will
> > just print the console messages without doing anything else.
> 
> This is a very old problem that we have been trying to solve for
> years. There are two conflicting requirements on printk():
> be fast and reliable.
> 
> The historical solution is that printk() callers store the messages
> into the log buffer and then just _try_ to take the console lock.
> The winner who succeeds is responsible for flushing all
> pending messages to the console. As a result, a random victim
> might get blocked by the console handling for a long time.

This bug only happens if we select a large log buffer (millions of
characters). With a smaller log buffer, there are "** X printk messages
dropped" messages, but there's no lockup.

The kernel apparently puts 2 million characters into the console log
buffer, then takes some lock and tries to write all of them to a slow
serial line.

> An obvious solution is offloading the console handling. But
> it is against the reliability. There are no guarantees that
> the offload mechanism (kthread, irq) would happen when the
> system is on its knees.
> 
> Anyway, which kernel version are you using, please?

RHEL8-4.18, Debian-4.19, Upstream 5.0. I didn't try older versions.

> I wonder if you already have commit dbdda842fe96f8932 ("printk: Add
> console owner and waiter logic to load balance console writes").
> It improves the situation a lot. There was a hope that it would
> be enough in real life.

Yes - this patch is present in the kernels that I tried.

> > This program tests the issue: on the framebuffer console, the system is
> > sluggish, but it is still possible to unload the module with rmmod. On the
> > serial console, it locks up to the point that the module can no longer
> > be unloaded.
> 
> Is there any chance you could send us logs from the original (real-life)
> problem, please?
> 
> Best regards,
> Petr

I uploaded the logs here: 
http://people.redhat.com/~mpatocka/testcases/console-lockup/

Mikulas
