Networkmanager service is shutdown too early

Dan Williams dcbw at redhat.com
Sun Jun 1 13:14:37 UTC 2008


On Fri, 2008-05-30 at 16:49 -0400, Alan Cox wrote:
> On Fri, May 30, 2008 at 03:33:37PM -0400, Colin Walters wrote:
> > DBus is not the same as any other random software because it is explicitly
> > designed to provide reliable communication *between* components, much like
> > the kernel.  If you restart it at random times that reliability guarantee is
> > destroyed.
> 
> So the questions you should ask are
> - Why does restarting dbus have to be unreliable

It's a communication pipe; restarting D-Bus itself is reliable because
it's just like TCP.  It's the transport.  But making what gets
_transported_ reliable is the kicker.

It's exactly like all those Cingular/AT&T dropped call commercials from
a while ago:

http://youtube.com/watch?v=DR26BZUo3Dk
http://youtube.com/watch?v=GEd3pS1jXJ4
http://www.spike.com/video/2839248 (spoof)

Suddenly all the state dependent on a D-Bus service is suspect, because
you have no idea what's going on while the bus is down.  You have to
re-synchronize your state after the bus comes back, and that's not a
simple task.
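
As a rough illustration of the dropped-call problem, here is a minimal
Python sketch of a toy in-process bus.  Everything in it (ToyBus,
Listener, the "vpn" signal name) is hypothetical and only stands in for
the real D-Bus machinery; it shows nothing more than how a signal sent
while the bus is down leaves a listener's cached state stale.

```python
class ToyBus:
    """Hypothetical stand-in for a message bus; not the real D-Bus API."""

    def __init__(self):
        self.up = True
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def emit(self, name, value):
        # A signal sent while the bus is down is simply lost.
        if not self.up:
            return False
        for handler in self.handlers:
            handler(name, value)
        return True


class Listener:
    """Caches the last value seen for each signal name."""

    def __init__(self, bus):
        self.cache = {}
        bus.subscribe(self.on_signal)

    def on_signal(self, name, value):
        self.cache[name] = value


bus = ToyBus()
listener = Listener(bus)

bus.emit("vpn", "connected")      # delivered while the bus is up
bus.up = False                    # the bus goes down for a restart
bus.emit("vpn", "disconnected")   # lost: nobody hears it
bus.up = True                     # the bus is back, nothing is re-sent

actual_vpn_state = "disconnected"
stale = listener.cache["vpn"] != actual_vpn_state
print(stale)
```

The listener has no way to tell from its cache alone that anything was
missed, which is why the state has to be treated as suspect.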

> - Why isn't there a recovery mechanism

The recovery mechanism would be in each service, because the service
knows whether or not it needs recovery, and would know how to
merge/synchronize its state with the services that it depends on.  Some
don't need to.  But ones with state dependent on other D-Bus services
would.
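
One way to picture per-service recovery is the Python sketch below.  It
is an assumption-laden toy, not an existing D-Bus or NetworkManager
interface: the Service base class, the needs_resync flag, the
on_bus_restart() helper and the peer query callables are all made up
for illustration.  Each service declares for itself whether it has
anything to recover, and only those services rebuild their view.

```python
class Service:
    """Hypothetical base class; each service decides needs_resync."""

    needs_resync = False

    def resync(self, peers):
        raise NotImplementedError


class Clock(Service):
    # Holds no state derived from other services: nothing to recover.
    pass


class NetworkState(Service):
    needs_resync = True

    def __init__(self):
        self.view = {}

    def resync(self, peers):
        # Discard the pre-restart view and rebuild it by asking
        # every peer for its current state.
        self.view = {name: query() for name, query in peers.items()}


def on_bus_restart(services, peers):
    """Run recovery only for services that say they need it."""
    recovered = []
    for svc in services:
        if svc.needs_resync:
            svc.resync(peers)
            recovered.append(type(svc).__name__)
    return recovered


# Hypothetical peers: callables returning each daemon's current state.
peers = {"vpn": lambda: "down", "dhcp": lambda: "bound"}

net = NetworkState()
net.view = {"vpn": "up", "dhcp": "bound"}   # view from before the restart

recovered = on_bus_restart([Clock(), net], peers)
print(recovered, net.view)
```

The sketch assumes every peer can answer a "what is your state now"
query, which is exactly the part that is not always true in practice.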

> - Why does network manager have to do the work itself not the support code

Like above, because NM has specific state, and when D-Bus goes away,
its communication channels with the daemons that affect that
NM-specific state are gone, and NM can't make any assumptions about
what's happening in any other daemon while the bus is gone.  Maybe your
VPN just came up for rekeying, but the signal got lost because D-Bus
isn't around.  So when the bus comes back, your VPN connection is
already dropped.

Or DHCP re-bound while the bus was down, and your sysadmin changed DNS
servers on you, and the signal from dhclient got lost (because the bus
was down).  Unless you re-do the entire DHCP transaction (or teach
dhclient about dbus properly so it can answer questions without having
to exec() stupid scripts that then re-emit state back over D-Bus) NM
would have no idea that the returned DHCP options had changed.  And thus
your DNS is dead.
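
If the options could be fetched again after the bus returns, spotting
the change is the easy part.  The Python sketch below assumes that
re-fetch has already happened (by redoing the DHCP transaction or by
querying a D-Bus-aware dhclient, neither of which is shown), and the
option names and addresses are made-up examples.  It only compares the
cached options with the fresh ones.

```python
def diff_options(cached, fresh):
    """Return {key: (old, new)} for every option added, removed or changed."""
    changed = {}
    for key in sorted(set(cached) | set(fresh)):
        old, new = cached.get(key), fresh.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# Options cached before the bus went down (hypothetical values).
cached = {
    "ip_address": "192.0.2.10",
    "domain_name_servers": ["192.0.2.1"],
}

# Options obtained after the bus came back (hypothetical values).
fresh = {
    "ip_address": "192.0.2.10",
    "domain_name_servers": ["198.51.100.53"],
}

changes = diff_options(cached, fresh)
print(changes)
```

Here only the DNS servers differ, so only that entry would need to be
re-applied; the hard part remains getting the fresh options at all.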

> And more fundamentally
> 
> Why the ... are people still writing software which doesn't try and tolerate
> faults that are recoverable to a useful extent.  Yes dbus might have to lose
> a few messages and send everyone a "duh whoops" event so they can recover but
> "oh dear it broke everyone reboot" is not good engineering.

In some cases, it's a cost/benefit analysis.  Is the cost of writing and
maintaining a pile of code that handles a D-Bus restart, which shouldn't
ever happen, worth the benefit?  In some cases, definitely.  In other
cases, probably not.  That isn't an excuse to write crappy software, but
it's certainly not as simple a problem as you present it.

Dan

> So I'm likewise pleased the Debian people raised a sensible point.
> 
> Alan
> 




More information about the fedora-devel-list mailing list