[libvirt] [PATCH 1/4] virsh-domain: use correct base for virStrToLong_ui
phrdina at redhat.com
Mon Oct 26 09:53:27 UTC 2015
On Fri, Oct 23, 2015 at 12:48:07PM -0400, Laine Stump wrote:
> On 10/23/2015 03:59 AM, Pavel Hrdina wrote:
> > On Fri, Oct 23, 2015 at 09:33:49AM +0200, Peter Krempa wrote:
> >> On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
> >>> On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
> >>>> While parsing device addresses we should use correct base and don't
> >>>> count on auto-detect. For example, PCI address uses hex numbers, but
> >>>> each number starting with 0 will be auto-detected as octal number and
> >>>> that's wrong. Another wrong use-case is for PCI address if for example
> >>>> bus is 10, than it's incorrectly parsed as decimal number.
> >>>> PCI and CCW addresses have all values as hex numbers, IDE and SCSI
> >>>> addresses are in decimal numbers.
> >>> I've seen examples for PCI that use decimal a number for the slot (which
> >>> is the one item that's likely to have a value > 7 unless you have a
> >>> system with a whole bunch of PCI controllers)[*], and those that use hex
> >>> numbers always prefix the number with "0x". libvirt itself always a
> >> lspci doesn't really use 0x prefix:
> My statement above was for any libvirt documentation, not "any program
> anywhere displaying a PCI address". The standard notation for a "unified
> PCI address" (coining my own term :-) is as they are displayed in lspci
> (and in all the entries in sysfs).
> >> 00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
> >> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03)
> >> 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
> >> 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
> >> Btw that's a laptop, not a super special system.
> >>> prefixes the domain, bus, and slot with 0x, so an auto-detected base
> >>> will always get those right.
> >> Well, not entirely:
> >> nodedev identificators use hex, possibly padded with zeroes:
> >> pci_0000_00_03_0
> >> pci_0000_00_14_0
> >> pci_0000_00_16_0
> >> pci_0000_00_19_0
> >> pci_0000_00_1b_0
> >> Which may result also in some octal 'fun' on boxes with 16 buses ;).
> Sigh. Yes, I was only considering the <address> element, not the stuff
> in node device XML and virsh nodedev-* output :-(. What you have pointed
> out makes me even more wary of changing libvirt's parsing behavior.
> >> nodedev XML uses decimal in the parsed address:
> >> $ virsh nodedev-dumpxml pci_0000_00_19_0
> >> <device>
> >> <name>pci_0000_00_19_0</name>
> >> <path>/sys/devices/pci0000:00/0000:00:19.0</path>
> >> <parent>computer</parent>
> >> <driver>
> >> <name>e1000e</name>
> >> </driver>
> >> <capability type='pci'>
> >> <domain>0</domain>
> >> <bus>0</bus>
> >> <slot>25</slot>
> >> <function>0</function>
> >> <product id='0x15a2'>Ethernet Connection (3) I218-LM</product>
> >> <vendor id='0x8086'>Intel Corporation</vendor>
> >> </capability>
> >> </device>
> >> (Uhhh, so 19 ... or 25? ... or perhaps 31?)
> >> And for hostdevs we are promoting the use of 0x prefixed hex:
> >> <address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
> >> As well as with target addresses. Fortunately.
> >> <sarcasm>Well that's very consistent.</sarcasm>
> Yes, the inconsistency is actually part of my point. If we silently
> change the parsing to now recognize "25" (no "0x" prefix) as a hex
> number, it's going to give the wrong results for any XML written using a
> decimal slot# that worked properly on "old" libvirt, and it's going to
> make everything confusing in a different way since everyone will need to
> keep straight which version of libvirt interprets it which way - you'll
> *never* be able to write examples for documentation that omit the "0x"
> anyway, since you can't be sure the reader of the documentation has a
> new enough version of libvirt to force parsing as hex.
> For example, if someone has a script that uses XML they created "a long
> time ago" by copy/pasting the slot# from the nodedev-dumpxml output (or
> maybe just hand-writing it) into XML that they use for "virsh create" or
> "virsh attach-device" (i.e. the XML is stored/generated external to
> libvirt in its original form, *not* normalized by the libvirt
> parse/format cycle), and upgraded libvirt will magically/silently begin
> interpreting any slot# > 9 differently, leading to anything from a
> different layout of devices on the guest PCI bus, to failure to attach a
> device because it wasn't found (or even worse, silently/erroneously
> attaching the *wrong* host device to the guest). I don't think that is
> If we want to get rid of the inconsistency, without creating the
> possibility of *silently* performing incorrect operations, we should:
> * require the input in the <address> element to have a "0x" prefix to be
> sure that there is no assumption on the part of the user that they are
> entering a decimal number.
> * change the node device XML (nodedev-dumpxml) to output the domain,
> bus, and slot elements as hex, including the 0x (function is irrelevant,
> since by definition it can never be > 7)
> Even doing that would carry the danger of causing existing functioning
> systems to fail though (although they would at least fail in a vocal
> (BTW, I looked for any places in libvirt that currently parse a number
> *in config* as hex without the 0x prefix - the only one I can find is this:
> * <address type='spapr-vio' reg='0x3000'/> <- reg is forced base16
> (and a deprecated "devaddr" in the status (so probably unused in many
> years) that parses a "unified" address). Anything else that is read from
> config is either forced to base 10, or is base 0. So there isn't much
> precedence for interpreting numbers in libvirt config with no "0x"
> prefix as hex.)
> >>> So I I think the existing code is correct, and don't see an upside to
> >>> making this restriction/invisible change in semantics, and it could
> >>> potentially lead to incorrect results if someone is thinking that
> >>> they're entering decimal numbers.
> >> The existing code is correct only perhaps from a historical point of
> >> view. From a functional it's more than flawed. ...
> >>> [*] There was one particular document that even went to the trouble of
> >>> explaining how to convert the hex value of slot into a decimal number to
> >>> use in the libvirt config!
> >> Well, this hints that we do actually something wrong here. The ambiguity
> >> in parsing of numbers with virStrToLong_ui has already bitten us (I
> >> won't bother looking up the commit though).
> It's not ambiguous to the parser. Any given string is parsed to exactly
> one result. The problem is that the users expect the parser to behave
> differently than it does. Well, *some* of the users do. Maybe even
> *most* of them. But not *all* of them :-).
> As for previous problems with differing bases, this reminded me of a
> very troublesome problem that I spent the time to look up just to fill
> the blank in my memory - commit 8efebd176. This is a case where libvirt
> was formatting the bus/device numbers of a USB device on the qemu
> commandline with %.3d (forcing leading 0's) and qemu was then
> interpreting the numbers as octal, so silently/magically attaching the
> wrong USB device. This problem wasn't solved by making qemu always parse
> USB bus/device as decimal though; it was instead solved by making
> libvirt aware of / follow the rules of qemu's parser. (I realize it's a
> bit more difficult to "patch the user", but that is also an option :-)
> >> The problem is that if you use 0 as a base argument it's not really
> >> clear what you've parsed and that will result in situations like this.
> >> The developer may be lucky in trying numbers that were parsed correctly
> >> misleading him that he used the correct base.
> >> I think that we shouldn't be using 0 as base in ANY place in our code.
> I think that it is dangerous to have a mix of base 16 and base 10
> numbers in a file while not requiring a clear indicator on every number
> of what base was used to encode them. Sure there are some numbers that
> are obviously hex (an address in memory) and there are some that are
> obviously decimal (timeout in seconds), but there are some where it's
> just not clear to the user. The case of PCI addresses is one of those
> (mostly due to the node device XML. But simply changing that won't
> eliminate all the existing examples out on the internet (nor will it
> immediately upgrade all the existing installations of libvirt).
> (BTW, I also think there is absolutely *no* place for octal
> representations of numbers anywhere in anything libvirt does. This is
> not 1975, and we're not all (waiting in line for our turn at) sitting in
> front of a TTY connected to a PDP-11 :-P)
> >> As of this particular case: If we format it in base 16, we should parse
> >> it in base 16. Having ambiguity in the parser code will only end up in
> >> problems.
> Okay, then to take that to its logical conclusion - if we format it with
> a leading 0x, then we should require the leading 0x when parsing.
> >>>> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
> >>>> ---
> >>>> tools/virsh-domain.c | 30 +++++++++++++++---------------
> >>>> 1 file changed, 15 insertions(+), 15 deletions(-)
> >> Peter
> > A agree with Peter, as there is no reasonable point to represent PCI address in
> > decimal numbers, only libvirt does that and as said, it's wrong. I don't know
> > any other place, where the PCI address is printed as decimal number.
> > I've been motivated to update this behavior after I've used '0000:05:10.1' as
> > an argument for the '--srouce' and it parsed the slot as decimal number and
> > printed in to the XML as '0x0a'.
> Yeah I can understand that happening (and your annoyance when it did),
> and if all pci address info everywhere in libvirt and its documentation
> was (and always had been) output in hex (*without* the leading "0x") I
> would probably agree with this patch. But due to the history (and
> current state) I think it's just going to create as many problems as it
> solves. I *might* agree with it if it was done by requiring a leading
> '0x' (having the effect of causing failure for any existing XML that had
> been using decimal - I'm not 100% sure that would be acceptable either,
> but at least it is a *visible* failure, rather than a silent one), and I
> would be more prone to agree if the node device XML was modified to
> output the bus/slot/port in hex (with leading 0x - again, I can't say
> for sure that even that wouldn't screw up some existing management
> application though, so I'm *still* hesitant).
The current situation is not ideal, it's not documented anywhere and for users
you can only try the command and see what happens. Yes, it can and probably
will break some scripts for some users, but I think we should make it somehow
consistent and document it properly how to format it.
The output of nodedev-dumpxml should be definitely fixed to print the PCI
address using only hex numbers.
For the parsing part, this code is currently used only for 'attach-disk' and
the 'attach-interface' will be a new functionality and we can easily restrict
the format of PCI address to be provided only in hex numbers with '0x' prefix.
The only issue here is the 'attach-disk' where we could break some user's
> libvir-list mailing list
> libvir-list at redhat.com
More information about the libvir-list