[libvirt] [PATCH 1/4] virsh-domain: use correct base for virStrToLong_ui
laine at laine.org
Fri Oct 23 16:48:07 UTC 2015
On 10/23/2015 03:59 AM, Pavel Hrdina wrote:
> On Fri, Oct 23, 2015 at 09:33:49AM +0200, Peter Krempa wrote:
>> On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
>>> On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
>>>> While parsing device addresses we should use correct base and don't
>>>> count on auto-detect. For example, PCI address uses hex numbers, but
>>>> each number starting with 0 will be auto-detected as octal number and
>>>> that's wrong. Another wrong use-case is for PCI address if for example
>>>> bus is 10, than it's incorrectly parsed as decimal number.
>>>> PCI and CCW addresses have all values as hex numbers, IDE and SCSI
>>>> addresses are in decimal numbers.
>>> I've seen examples for PCI that use decimal a number for the slot (which
>>> is the one item that's likely to have a value > 7 unless you have a
>>> system with a whole bunch of PCI controllers)[*], and those that use hex
>>> numbers always prefix the number with "0x". libvirt itself always a
>> lspci doesn't really use 0x prefix:
My statement above was for any libvirt documentation, not "any program
anywhere displaying a PCI address". The standard notation for a "unified
PCI address" (coining my own term :-) is as they are displayed in lspci
(and in all the entries in sysfs).
>> 00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
>> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03)
>> 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
>> 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
>> Btw that's a laptop, not a super special system.
>>> prefixes the domain, bus, and slot with 0x, so an auto-detected base
>>> will always get those right.
>> Well, not entirely:
>> nodedev identificators use hex, possibly padded with zeroes:
>> Which may result also in some octal 'fun' on boxes with 16 buses ;).
Sigh. Yes, I was only considering the <address> element, not the stuff
in node device XML and virsh nodedev-* output :-(. What you have pointed
out makes me even more wary of changing libvirt's parsing behavior.
>> nodedev XML uses decimal in the parsed address:
>> $ virsh nodedev-dumpxml pci_0000_00_19_0
>> <capability type='pci'>
>> <product id='0x15a2'>Ethernet Connection (3) I218-LM</product>
>> <vendor id='0x8086'>Intel Corporation</vendor>
>> (Uhhh, so 19 ... or 25? ... or perhaps 31?)
>> And for hostdevs we are promoting the use of 0x prefixed hex:
>> <address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
>> As well as with target addresses. Fortunately.
>> <sarcasm>Well that's very consistent.</sarcasm>
Yes, the inconsistency is actually part of my point. If we silently
change the parsing to now recognize "25" (no "0x" prefix) as a hex
number, it's going to give the wrong results for any XML written using a
decimal slot# that worked properly on "old" libvirt, and it's going to
make everything confusing in a different way since everyone will need to
keep straight which version of libvirt interprets it which way - you'll
*never* be able to write examples for documentation that omit the "0x"
anyway, since you can't be sure the reader of the documentation has a
new enough version of libvirt to force parsing as hex.
For example, if someone has a script that uses XML they created "a long
time ago" by copy/pasting the slot# from the nodedev-dumpxml output (or
maybe just hand-writing it) into XML that they use for "virsh create" or
"virsh attach-device" (i.e. the XML is stored/generated external to
libvirt in its original form, *not* normalized by the libvirt
parse/format cycle), and upgraded libvirt will magically/silently begin
interpreting any slot# > 9 differently, leading to anything from a
different layout of devices on the guest PCI bus, to failure to attach a
device because it wasn't found (or even worse, silently/erroneously
attaching the *wrong* host device to the guest). I don't think that is
If we want to get rid of the inconsistency, without creating the
possibility of *silently* performing incorrect operations, we should:
* require the input in the <address> element to have a "0x" prefix to be
sure that there is no assumption on the part of the user that they are
entering a decimal number.
* change the node device XML (nodedev-dumpxml) to output the domain,
bus, and slot elements as hex, including the 0x (function is irrelevant,
since by definition it can never be > 7)
Even doing that would carry the danger of causing existing functioning
systems to fail though (although they would at least fail in a vocal
(BTW, I looked for any places in libvirt that currently parse a number
*in config* as hex without the 0x prefix - the only one I can find is this:
* <address type='spapr-vio' reg='0x3000'/> <- reg is forced base16
(and a deprecated "devaddr" in the status (so probably unused in many
years) that parses a "unified" address). Anything else that is read from
config is either forced to base 10, or is base 0. So there isn't much
precedence for interpreting numbers in libvirt config with no "0x"
prefix as hex.)
>>> So I I think the existing code is correct, and don't see an upside to
>>> making this restriction/invisible change in semantics, and it could
>>> potentially lead to incorrect results if someone is thinking that
>>> they're entering decimal numbers.
>> The existing code is correct only perhaps from a historical point of
>> view. From a functional it's more than flawed. ...
>>> [*] There was one particular document that even went to the trouble of
>>> explaining how to convert the hex value of slot into a decimal number to
>>> use in the libvirt config!
>> Well, this hints that we do actually something wrong here. The ambiguity
>> in parsing of numbers with virStrToLong_ui has already bitten us (I
>> won't bother looking up the commit though).
It's not ambiguous to the parser. Any given string is parsed to exactly
one result. The problem is that the users expect the parser to behave
differently than it does. Well, *some* of the users do. Maybe even
*most* of them. But not *all* of them :-).
As for previous problems with differing bases, this reminded me of a
very troublesome problem that I spent the time to look up just to fill
the blank in my memory - commit 8efebd176. This is a case where libvirt
was formatting the bus/device numbers of a USB device on the qemu
commandline with %.3d (forcing leading 0's) and qemu was then
interpreting the numbers as octal, so silently/magically attaching the
wrong USB device. This problem wasn't solved by making qemu always parse
USB bus/device as decimal though; it was instead solved by making
libvirt aware of / follow the rules of qemu's parser. (I realize it's a
bit more difficult to "patch the user", but that is also an option :-)
>> The problem is that if you use 0 as a base argument it's not really
>> clear what you've parsed and that will result in situations like this.
>> The developer may be lucky in trying numbers that were parsed correctly
>> misleading him that he used the correct base.
>> I think that we shouldn't be using 0 as base in ANY place in our code.
I think that it is dangerous to have a mix of base 16 and base 10
numbers in a file while not requiring a clear indicator on every number
of what base was used to encode them. Sure there are some numbers that
are obviously hex (an address in memory) and there are some that are
obviously decimal (timeout in seconds), but there are some where it's
just not clear to the user. The case of PCI addresses is one of those
(mostly due to the node device XML. But simply changing that won't
eliminate all the existing examples out on the internet (nor will it
immediately upgrade all the existing installations of libvirt).
(BTW, I also think there is absolutely *no* place for octal
representations of numbers anywhere in anything libvirt does. This is
not 1975, and we're not all (waiting in line for our turn at) sitting in
front of a TTY connected to a PDP-11 :-P)
>> As of this particular case: If we format it in base 16, we should parse
>> it in base 16. Having ambiguity in the parser code will only end up in
Okay, then to take that to its logical conclusion - if we format it with
a leading 0x, then we should require the leading 0x when parsing.
>>>> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
>>>> tools/virsh-domain.c | 30 +++++++++++++++---------------
>>>> 1 file changed, 15 insertions(+), 15 deletions(-)
> A agree with Peter, as there is no reasonable point to represent PCI address in
> decimal numbers, only libvirt does that and as said, it's wrong. I don't know
> any other place, where the PCI address is printed as decimal number.
> I've been motivated to update this behavior after I've used '0000:05:10.1' as
> an argument for the '--srouce' and it parsed the slot as decimal number and
> printed in to the XML as '0x0a'.
Yeah I can understand that happening (and your annoyance when it did),
and if all pci address info everywhere in libvirt and its documentation
was (and always had been) output in hex (*without* the leading "0x") I
would probably agree with this patch. But due to the history (and
current state) I think it's just going to create as many problems as it
solves. I *might* agree with it if it was done by requiring a leading
'0x' (having the effect of causing failure for any existing XML that had
been using decimal - I'm not 100% sure that would be acceptable either,
but at least it is a *visible* failure, rather than a silent one), and I
would be more prone to agree if the node device XML was modified to
output the bus/slot/port in hex (with leading 0x - again, I can't say
for sure that even that wouldn't screw up some existing management
application though, so I'm *still* hesitant).
More information about the libvir-list