Fedora 10 - Boot Analysis

Tue Dec 16 23:17:19 UTC 2008

On Tue, 2008-12-16 at 17:47 -0500, Jeremy Katz wrote:
> On Tue, 2008-12-16 at 16:30 -0600, Chris Adams wrote:
> > Once upon a time, Jesse Keating <jkeating at redhat.com> said:
> > > On Tue, 2008-12-16 at 16:41 -0500, Seth Vidal wrote:
> > > > so a good reason to power off is simply to save power
> > > 
> > > Or to reboot for one of our frequent updates that require it.  kernel,
> > > dbus, etc...
> > 
> > Why does anything other than a kernel update require a reboot?
> 
> The system bus cannot be restarted.  Similarly, any apps you have
> running will be using old libraries so things like glibc you really want
> to reboot for.  

The problem is mainly one of state.  For example, NetworkManager has
some code to handle reconnection to the system bus.  But to
transparently handle bus restarts, NetworkManager would have to perform
all of the following actions.  None of this is specific to NM or D-Bus;
any system that uses separate services for specific functions, as is the
unix way, would have the same problem.

1) Sync internal device list with HAL, e.g. if you pulled a card out
while the system bus was down

2) Re-query dhclient state (currently impossible since dhclient doesn't
talk over D-Bus), e.g. if your DHCP lease got renewed or changed while
the system bus was down

3) Re-query avahi-autoipd state (currently impossible since
avahi-autoipd doesn't speak D-Bus), e.g. if your autoip address changed
while the system bus was down

4) Re-query pppd state (currently impossible since pppd doesn't speak
D-Bus), e.g. if the cellular network dropped your connection while the
system bus was down

5) Re-query all device IP addresses and routes, delete addresses/routes
that aren't known to NM and add addresses/routes that got dropped.  Also
update settings like the MTU based on the DHCP server response that we
had to wait for in step #2

6) Re-query all VPN daemons for their connection state just in case any
of that changed while the system bus was down

7) Execute the callouts and send out D-Bus signals if any of these
things changed, so that scripts in /etc/NetworkManager/dispatcher.d get
notified of changes

8) Re-query wpa_supplicant for the wireless scan list

9) Re-query the wifi adapter for the current settings and ensure that
they are the same as NM thinks they should be, and if not, tear down
whatever the wifi card is doing and re-init the connection

10) Trust and rely on each D-Bus service to successfully reconnect to
the bus, where each service may handle this differently with different
reconnection attempt timeouts.  Latency for handling a bus restart could
be quite large if a service like wpa_supplicant that is lower in the
stack takes a bit longer to reconnect than other services.

11) Re-query the system settings and user settings services and figure
out if any new connections were added while the bus was down, or if any
connections (including the one that was active during the bus restart)
were deleted.  If the active connection was deleted during the bus
restart, tear it down.
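
To make step 5 a bit more concrete, here is a rough sketch of the
reconciliation it implies.  This is not NM's actual code; the function
name and the addresses are made up for illustration, and real routes
carry more than a single string.

```python
from typing import Set, Tuple


def reconcile(expected: Set[str], actual: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_delete) needed to make `actual` match `expected`."""
    to_add = expected - actual      # entries we want that got dropped
    to_delete = actual - expected   # entries on the device we don't know about
    return to_add, to_delete


# Hypothetical addresses, for illustration only.
expected = {"192.0.2.10/24", "192.0.2.11/24"}   # what the daemon thinks is set
actual = {"192.0.2.10/24", "198.51.100.7/24"}   # what the device reports now
to_add, to_delete = reconcile(expected, actual)
```

The diff itself is the easy part.  The hard part is that "expected" is
only trustworthy once the earlier steps (the DHCP re-query in
particular) have finished.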

How does all this stuff happen when the system bus is running?  Signals
get emitted from the various components when things change, and NM
listens for those signals so it doesn't have to poll everything.  Thus,
if the system bus isn't running, signals get lost and you have to
invalidate all your state, then re-query it when the bus comes back.
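
A toy model of that pattern, assuming nothing about the real NM or D-Bus
APIs (the class, its methods and the service names are all hypothetical):
cached state is updated by signals while the bus is up, thrown away when
the bus goes down, and rebuilt by querying every service when it comes
back.

```python
from typing import Callable, Dict, Optional


class StateCache:
    """Toy cache of per-service state kept fresh by change signals."""

    def __init__(self, queries: Dict[str, Callable[[], str]]) -> None:
        # Maps a (hypothetical) service name to a function that asks that
        # service for its current state.  This is the expensive path.
        self._queries = queries
        self._state: Dict[str, str] = {}
        self.bus_up = True
        self.resync()

    def on_signal(self, service: str, new_state: str) -> None:
        # Normal operation: one signal updates one entry, no polling.
        # While the bus is down the signal never arrives, so ignore it.
        if self.bus_up:
            self._state[service] = new_state

    def on_bus_down(self) -> None:
        # Signals may be lost from here on, so nothing cached can be trusted.
        self.bus_up = False
        self._state.clear()

    def on_bus_up(self) -> None:
        self.bus_up = True
        self.resync()

    def resync(self) -> None:
        # Re-query every service, since we cannot know which ones changed.
        for name, query in self._queries.items():
            self._state[name] = query()

    def get(self, service: str) -> Optional[str]:
        return self._state.get(service)
```

Note that resync() only works for services that can be queried at all,
which is exactly what is missing for dhclient, avahi-autoipd and pppd in
the list above.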

We've discussed less-invasive ways of fixing this, such as queuing
signals on the service side (in libdbus, say) until the service
reconnects to the system bus, but that could be quite tricky and
error-prone.  You cannot do a 75% solution here.
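
For illustration, the queuing idea might look roughly like the sketch
below.  This is not how libdbus works and not a proposed design; the
class name, the max_pending limit and the drop-oldest policy are all
assumptions picked to keep the example short.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Signal = Tuple[str, str]  # (signal name, payload)


class QueuingEmitter:
    """Service-side emitter that buffers signals while the bus is down."""

    def __init__(self, max_pending: int = 100) -> None:
        self._send: Optional[Callable[[Signal], None]] = None
        # Bounded queue: when full, appending discards the oldest entry.
        self._pending: Deque[Signal] = deque(maxlen=max_pending)
        self.dropped = 0

    def connect(self, send: Callable[[Signal], None]) -> None:
        # On (re)connect, flush everything queued, in emission order.
        self._send = send
        while self._pending:
            send(self._pending.popleft())

    def disconnect(self) -> None:
        self._send = None

    def emit(self, name: str, payload: str) -> None:
        sig = (name, payload)
        if self._send is not None:
            self._send(sig)
            return
        if len(self._pending) == self._pending.maxlen:
            self.dropped += 1  # the oldest queued signal is about to be lost
        self._pending.append(sig)
```

Even this small version shows where it gets tricky: once the queue
overflows, listeners receive a partial history and are back to
re-querying everything, and replayed signals may describe state that is
already stale by the time they are delivered.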

Dan