[dm-devel] Serial console is causing system lock-up

Mikulas Patocka mpatocka at redhat.com
Thu Mar 7 14:26:46 UTC 2019



On Thu, 7 Mar 2019, John Stoffel wrote:

> The real problem is the disconnect between serial console speed and
> capacity in bits/sec and that of the regular console.  Serial, esp at
> 9600 baud is just a slow and limited resource which needs to be
> handled differently than a graphical console.
>
> I'm also big on ratelimiting messages, even critical warning
> messages.  Too much redundant info doesn't help anyone.  And what a
> subsystem thinks is critical, may not be critical to the system as a
> whole.

Perhaps a proper solution would be to drop excessive messages to the serial 
console unless an Oops or BUG has happened?

> In this case, if these checksum messages are telling us that there's
> corruption, why isn't dm-integrity going readonly and making the block
> device get the filesystem to also go readonly and to stop the damage
> right away?

Because we don't want to kill the filesystem. dm-integrity detects an 
error and returns the error code to md-raid5. md-raid5 recalculates the 
correct data from the remaining disks. And then, md-raid5 submits the 
correct data to the filesystem and also writes the correct data to the 
device that had the error.
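
The reconstruction step can be illustrated with a minimal userspace sketch. 
It is not md-raid5 code: the function name raid5_rebuild and its parameters 
are hypothetical, and it assumes plain single-parity XOR over one stripe, 
where the block on the failed member is the XOR of the same block on all 
other members (data and parity).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not the md-raid5 code).
 * blocks[]  : one buffer per array member for the same stripe,
 *             data members and the parity member alike
 * failed    : index of the member whose block was reported bad
 * block_len : size of each buffer in bytes
 *
 * With XOR parity, the missing block equals the XOR of all the others,
 * so the result is written over blocks[failed].
 */
static void raid5_rebuild(uint8_t *const blocks[], size_t n_members,
                          size_t failed, size_t block_len)
{
    assert(n_members >= 2 && failed < n_members);

    uint8_t *out = blocks[failed];

    /* Start from zero, the identity element for XOR. */
    for (size_t i = 0; i < block_len; i++)
        out[i] = 0;

    /* Fold in every surviving member. */
    for (size_t m = 0; m < n_members; m++) {
        if (m == failed)
            continue;
        for (size_t i = 0; i < block_len; i++)
            out[i] ^= blocks[m][i];
    }
}
```

In this sketch the caller would pass the index of the member for which 
dm-integrity returned an error, then write the rebuilt buffer back to that 
member and hand the data to the reader.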

In a real-life scenario, there would be few errors. When we are testing 
it, we deliberately create a device on which every sector is erroneous.

> If it's just a warning for the niceness, then please rate limit them,
> or summarize them in some more useful way.  Or even log them to
> somewhere else than the console once the problem is noted.
> 
> John

I made a patch that rate-limits the message. But still, killing the 
machine is wrong.
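
For readers unfamiliar with the idea, rate limiting of this kind can be 
sketched in userspace as follows. This is not the patch itself and not the 
kernel implementation: struct ratelimit, ratelimit_init and ratelimit_allow 
are hypothetical names, and the scheme assumed here is a simple fixed window 
that lets through at most "burst" messages per interval and counts the rest 
as suppressed. The caller supplies the current time so the logic can be 
exercised without a real clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window rate limiter, for illustration only. */
struct ratelimit {
    uint64_t interval_ms;   /* length of one window */
    unsigned burst;         /* messages allowed per window */
    uint64_t window_start;  /* time at which the current window began */
    unsigned printed;       /* messages let through in this window */
    unsigned missed;        /* messages suppressed in this window */
};

static void ratelimit_init(struct ratelimit *rl, uint64_t interval_ms,
                           unsigned burst)
{
    assert(burst > 0);
    rl->interval_ms = interval_ms;
    rl->burst = burst;
    rl->window_start = 0;
    rl->printed = 0;
    rl->missed = 0;
}

/* Returns true if a message may be printed at time now_ms. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    /* Open a new window once the old one has expired. */
    if (now_ms - rl->window_start >= rl->interval_ms) {
        rl->window_start = now_ms;
        rl->printed = 0;
        rl->missed = 0;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }

    /* Over budget: drop the message but remember that we did. */
    rl->missed++;
    return false;
}
```

A caller would wrap each log call in "if (ratelimit_allow(&rl, now))". 
Keeping the suppressed count makes it possible to report how many messages 
were dropped, which is the kind of summary asked for above.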

Mikulas



