[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Interesting interaction between journal recovery and slow boots



For some time now I have been puzzled as to why certain portions
of my system boot were quite slow -- but only after journal
recoveries.

I was fearing that there was some ugly interaction between the
recovery and the use of the journal shortly afterward but alas that is
not the case.  So just in case anybody else is seeing this problem and
decides to try to hunt it down, let me save you some time.

I am using an 0.0.6b ext3 kernel on a Red Hat 7.0 system.

I was finding that if my system did not get shutdown nicely, upon
reboot it would of course do journal recoveries.  When it was
starting the system services after the recoveries, the "portmapper"
and "nfslock" startup scripts were taking for ever to complete -- up
to a minute or two each.

After much investigation (strace and "set -x" in bash scripts) and
many times hitting the power switch) I finally figured it out.  The
portmapper and nfslock scripts are the first ones started after
syslogd and klogd are started.  During the journal recovery there
are copious quantities of messages such as:

Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass:  
Mar 20 12:27:20 brian-laptop fsck: Scanning for sequence ID 484035 at 9590/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484036 at 9591/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484036 at 9593/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484037 at 9594/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484037 at 9596/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484038 at 9597/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484038 at 9599/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484039 at 9600/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484039 at 9602/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484040 at 9603/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484040 at 9609/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484041 at 9610/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484041 at 9612/10000 
Mar 20 12:27:20 brian-laptop fsck: JFS DEBUG: (recovery.c, 366): do_one_pass: Scanning for sequence ID 484042 at 9613/10000 

diplayed.

On Red Hat systems (and others I am sure) the startup scripts use a
package called initlog to capture all of the console output (including
the above) and "syslog" it (to /var/log/boot.log on Red Hat 7.0
systems).  Well, the copious amounts of JFS records were flooding the
syslog server and causing it to back up, blocking further attemtps to
write to syslog while the back up was being flushed to disk.

Every startup script uses initlog to write the output of the startup
to syslog, so the scripts immediatley following the start of the
syslogger were blocking, presumably due to syslogd's buffer being
full.

b.





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]