[dm-devel] Serial console is causing system lock-up
Steven Rostedt
rostedt at goodmis.org
Wed Mar 6 22:19:43 UTC 2019
On Wed, 6 Mar 2019 12:11:10 -0500 (EST)
Mikulas Patocka <mpatocka at redhat.com> wrote:
> On Wed, 6 Mar 2019, Theodore Y. Ts'o wrote:
>
> > On Wed, Mar 06, 2019 at 11:07:55AM -0500, Mikulas Patocka wrote:
> > > This bug only happens if we select large logbuffer (millions of
> > > characters). With smaller log buffer, there are messages "** X printk
> > > messages dropped", but there's no lockup.
> > >
> > > The kernel apparently puts 2 million characters into a console log buffer,
> > > then takes some lock and then tries to write all of them to a slow serial
> > > line.
> >
> > What are the messages; from what kernel subsystem? Why are you seeing
> > so many log messages?
> >
> > - Ted
>
> The dm-integrity subsystem (drivers/md/dm-integrity.c) can be attached to a
> block device to provide checksum protection. It will return -EILSEQ and
> print a message to a log for every corrupted block.
>
> Nigel Croxon was testing MD-RAID recovery capabilities in such a way that
> he activated RAID-5 array with one leg replaced by a dm-integrity block
> device that had all checksums invalid.
>
> The MD-RAID is supposed to recalculate data for the corrupted device and
> bring it back to life. However, scrubbing the MD-RAID device resulted in a
> lot of reads from the device with bad checksums; these were reported to
> the log and killed the machine.
>
>
> I made a patch to dm-integrity to rate-limit the error messages. But
> anyway - killing the machine in case of too many log messages seems bad.
> If the log messages are produced faster than the kernel can write them,
> the kernel should discard some of them, not kill itself.
Sounds like another argument for the new printk design.
-- Steve
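
For illustration only, the rate-limiting idea mentioned in the quoted
message (drop messages when they are produced faster than they can be
written) might be sketched in userspace C roughly as below. This is not
the dm-integrity patch and not the kernel's own code; the in-kernel
helper for this purpose is printk_ratelimited(). The names msg_limiter,
msg_limiter_init and msg_limiter_allow are hypothetical, and the caller
passes in the current time so the sketch stays deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: allow at most `burst` messages per `interval_ms`. */
struct msg_limiter {
    uint64_t interval_ms;      /* length of one accounting window */
    unsigned int burst;        /* messages allowed per window */
    uint64_t window_start_ms;  /* start time of the current window */
    unsigned int emitted;      /* messages allowed in the current window */
    unsigned long dropped;     /* total messages suppressed so far */
};

static void msg_limiter_init(struct msg_limiter *l,
                             uint64_t interval_ms, unsigned int burst)
{
    l->interval_ms = interval_ms;
    l->burst = burst;
    l->window_start_ms = 0;
    l->emitted = 0;
    l->dropped = 0;
}

/*
 * Return true if the caller may print a message at time now_ms.
 * Otherwise count the message as dropped and return false.
 */
static bool msg_limiter_allow(struct msg_limiter *l, uint64_t now_ms)
{
    /* Open a fresh window once the previous one has expired. */
    if (now_ms - l->window_start_ms >= l->interval_ms) {
        l->window_start_ms = now_ms;
        l->emitted = 0;
    }

    if (l->emitted < l->burst) {
        l->emitted++;
        return true;
    }

    l->dropped++;
    return false;
}
```

A caller would check msg_limiter_allow() before each error report and
could print the dropped count once a new window opens, similar in
spirit to the "** X printk messages dropped" notices mentioned earlier
in the thread.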