[dm-devel] Serial console is causing system lock-up
Mikulas Patocka
mpatocka at redhat.com
Thu Mar 7 14:26:46 UTC 2019
On Thu, 7 Mar 2019, John Stoffel wrote:
> The real problem is the disconnect between serial console speed and
> capacity in bits/sec and that of the regular console. Serial, esp at
> 9600 baud is just a slow and limited resource which needs to be
> handled differently than a graphical console.
>
> I'm also big on ratelimiting messages, even critical warning
> messages. Too much redundant info doesn't help anyone. And what a
> subsystem thinks is critical, may not be critical to the system as a
> whole.
Perhaps a proper solution would be to drop excessive messages to the serial
console unless an Oops or BUG has happened?
> In this case, if these checksum messages are telling us that there's
> corruption, why isn't dm-integrity going readonly and making the block
> device get the filesystem to also go readonly and to stop the damage
> right away?
Because we don't want to kill the filesystem. dm-integrity detects an
error and returns the error code to md-raid5. md-raid5 recalculates the
correct data from the remaining disks. And then, md-raid5 submits the
correct data to the filesystem and also writes the correct data to the
device that had the error.
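The recovery path described above can be sketched in userspace C. This is only an illustration of XOR-parity reconstruction, not the md-raid5 code: the stripe layout (three data chunks plus one parity chunk), the 8-byte chunk size, and the names xor_others() and raid5_repair_demo() are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NDISKS 4   /* assumed layout: 3 data chunks + 1 parity chunk */
#define CHUNK  8   /* assumed chunk size in bytes, tiny for illustration */

/* XOR together every chunk of the stripe except `skip`, storing the result in `out`. */
static void xor_others(unsigned char stripe[NDISKS][CHUNK], int skip,
                       unsigned char out[CHUNK])
{
    unsigned char acc[CHUNK] = {0};

    for (int d = 0; d < NDISKS; d++) {
        if (d == skip)
            continue;
        for (size_t i = 0; i < CHUNK; i++)
            acc[i] ^= stripe[d][i];
    }
    memcpy(out, acc, CHUNK);
}

/* Hypothetical demo: returns 1 if the damaged chunk was rebuilt correctly. */
static int raid5_repair_demo(void)
{
    unsigned char stripe[NDISKS][CHUNK] = { "chunk-A", "chunk-B", "chunk-C" };
    unsigned char good[CHUNK], rebuilt[CHUNK];
    int bad = 1;   /* the device whose chunk fails its integrity check */

    /* The parity chunk is the XOR of the three data chunks. */
    xor_others(stripe, NDISKS - 1, stripe[NDISKS - 1]);

    /* Remember the correct contents, then simulate a corrupted chunk. */
    memcpy(good, stripe[bad], CHUNK);
    memset(stripe[bad], 0xFF, CHUNK);

    /* The lower layer reports an error for `bad`, so recompute the data
     * from the remaining chunks (other data chunks XOR parity). */
    xor_others(stripe, bad, rebuilt);

    /* Hand `rebuilt` to the reader and write it back to the bad device. */
    memcpy(stripe[bad], rebuilt, CHUNK);

    assert(memcmp(rebuilt, good, CHUNK) == 0);
    return memcmp(stripe[bad], good, CHUNK) == 0;
}
```

The point of the sketch is that a detected checksum error on one device costs nothing but a reconstruction and a rewrite, so there is no reason to make the filesystem read-only.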
In a real-life scenario, there would be few errors. When we are testing
it, we deliberately create a device in which every sector is erroneous.
> If it's just a warning for the niceness, then please rate limit them,
> or summarize them in some more useful way. Or even log them to
> somewhere else than the console once the problem is noted.
>
> John
I made a patch that rate-limits the message. But still, killing the
machine is wrong.
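For illustration, the general idea of rate limiting can be sketched as an interval/burst limiter in userspace C. This is not the patch mentioned above: struct msg_limiter, limiter_allow(), flood(), the message text, and the interval and burst values are all hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter: at most `burst` messages per `interval` time units. */
struct msg_limiter {
    long interval;  /* window length, in caller-defined time units */
    int  burst;     /* messages allowed per window */
    long begin;     /* start time of the current window */
    int  printed;   /* messages emitted in the current window */
    int  missed;    /* messages dropped in the current window */
};

/* Returns true if a message may be emitted at time `now`. */
static bool limiter_allow(struct msg_limiter *rl, long now)
{
    if (now - rl->begin >= rl->interval) {
        /* A new window starts: summarize what was dropped, then reset. */
        if (rl->missed > 0)
            fprintf(stderr, "%d messages suppressed\n", rl->missed);
        rl->begin = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}

/* Try to log `count` error messages at time `now`; returns how many got through. */
static int flood(struct msg_limiter *rl, long now, int count)
{
    int emitted = 0;

    for (int i = 0; i < count; i++) {
        if (limiter_allow(rl, now)) {
            fprintf(stderr, "checksum error, sector %d (made-up message)\n", i);
            emitted++;
        }
    }
    return emitted;
}

/* Hypothetical self-check: 1000 errors in one window produce only 10 lines. */
static void limiter_selfcheck(void)
{
    struct msg_limiter rl = { .interval = 5, .burst = 10 };

    assert(flood(&rl, 0, 1000) == 10);
    assert(rl.missed == 990);
}
```

With these assumed settings, a device full of bad sectors produces ten lines and one summary per window instead of one line per sector, which a slow console can keep up with.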
Mikulas