[libvirt] [RFC] [PATCH 3/3 v2] vepa+vsi: Some experimental code for 802.1Qbh

Dave Allan dallan at redhat.com
Sun May 23 01:24:59 UTC 2010


On Sat, May 22, 2010 at 12:17:05PM -0700, Scott Feldman wrote:
> On 5/22/10 11:34 AM, "Dave Allan" <dallan at redhat.com> wrote:
> 
> > On Sat, May 22, 2010 at 11:14:20AM -0400, Stefan Berger wrote:
> >> On Fri, 2010-05-21 at 23:35 -0700, Scott Feldman wrote:
> >>> On 5/21/10 6:50 AM, "Stefan Berger" <stefanb at linux.vnet.ibm.com> wrote:
> >>> 
> >>>> This patch may get 802.1Qbh devices working. I am adding some code to
> >>>> poll for the status of an 802.1Qbh device and loop for a while until the
> >>>> status indicates success. This part for sure needs more work and
> >>>> testing...
> >>> 
> >>> I think we can drop this patch 3/3.  For bh, we don't want to poll for
> >>> status because it may take a while before a status other than
> >>> in-progress is indicated.  Link UP on the eth is the async notification
> >>> of status=success.
> >> 
> >> The idea was to find out whether the association actually worked and,
> >> if not, either fail the start of the VM or skip hotplugging the
> >> interface. If we don't do that, the user may end up with a VM that has
> >> no connectivity (depending on how the switch handles an un-associated
> >> VM) and start debugging all kinds of things... Really, I would like to
> >> know if something went wrong. How long would we have to wait for the
> >> status to change? How does a switch handle traffic from a VM if the
> >> association failed? At least for 802.1Qbg we were going to get a
> >> failure notification.
> > 
> > I tend to agree that we should try to get some indication of whether
> > the associate request succeeded or failed.  Is the time that we would
> > have to poll bounded by anything, or is it reasonably short?
> 
> It's difficult to put an upper bound on how long to poll.  In most cases,
> status would be available in a reasonably short period of time, but the
> upper bound depends on activity external to the host.

That makes sense.  The timeout should be a configurable value.  What
do you think is a reasonable default?
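For the record, the kind of loop I'd expect, as a minimal sketch --
qbhWaitForAssociation() and qbhGetPortStatus() are hypothetical names,
and the stub stands in for whatever query mechanism we end up with:

#include <stdio.h>
#include <unistd.h>

enum qbhStatus {
    QBH_STATUS_IN_PROGRESS = 0,
    QBH_STATUS_SUCCESS     = 1,
    QBH_STATUS_FAILURE     = 2,
};

/* Stub for illustration: a real implementation would query the
 * switch for the current association state, e.g. over netlink. */
static enum qbhStatus
qbhGetPortStatus(const char *ifname)
{
    (void)ifname;
    return QBH_STATUS_IN_PROGRESS;
}

/* Poll once a second until the switch reports something other than
 * in-progress, giving up after timeout_s seconds (the configurable
 * value discussed above). */
static int
qbhWaitForAssociation(const char *ifname, unsigned int timeout_s)
{
    unsigned int waited;

    for (waited = 0; waited < timeout_s; waited++) {
        switch (qbhGetPortStatus(ifname)) {
        case QBH_STATUS_SUCCESS:
            return 0;                 /* associated, proceed */
        case QBH_STATUS_FAILURE:
            fprintf(stderr, "%s: association failed\n", ifname);
            return -1;                /* fail VM start / hotplug */
        case QBH_STATUS_IN_PROGRESS:
            sleep(1);
            break;
        }
    }

    fprintf(stderr, "%s: no final status after %u seconds\n",
            ifname, timeout_s);
    return -1;
}

int main(void)
{
    return qbhWaitForAssociation("eth0", 5) == 0 ? 0 : 1;
}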

> > Mostly I'm concerned about the failure case: how would the user know
> > that something has gone wrong, and where would information to debug
> > the problem appear?
> 
> Think of it as equivalent to waiting for link UP after plugging a
> physical cable into a physical switch port.  In some cases negotiation of
> the link may take on the order of seconds; it depends on the physical
> media, of course.  A user can check for link UP using ethtool or the ip
> cmd.  Similarly, a user can check for association status using the ip
> cmd, once we extend it to know about virtual ports (patch for the ip cmd
> coming soon).

That's the way I was thinking about it as well.  The difference I see
between an actual physical cable and what we're doing here is that if
you're in the data center and you plug in a cable, you're focused on
whether the link comes up.  Here, the actor is likely to be an
automated process, and users will simply be presented with a VM with
no or incorrect connectivity, and they will have no idea what
happened.  It's just not supportable to provide them with no
indication of what failed or why.
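If nothing else, whatever does the plugging could check the link state
the way you describe and log something actionable.  A minimal sketch,
reading the kernel's operstate file (the same information "ip link"
shows); eth0 and the message text are placeholders:

#include <stdio.h>
#include <string.h>

/* Return 1 if the interface's operstate is "up", 0 if it is not,
 * -1 if the interface can't be read. */
static int
linkIsUp(const char *ifname)
{
    char path[256];
    char state[32];
    FILE *fp;

    snprintf(path, sizeof(path), "/sys/class/net/%s/operstate", ifname);

    if (!(fp = fopen(path, "r")))
        return -1;

    if (!fgets(state, sizeof(state), fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    return strncmp(state, "up", 2) == 0;
}

int main(void)
{
    const char *dev = "eth0";         /* placeholder device name */

    if (linkIsUp(dev) == 1)
        printf("%s: link is up\n", dev);
    else
        printf("%s: link is not up - check association status\n", dev);
    return 0;
}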

Dave



