[Libguestfs] New Python API?

Richard W.M. Jones rjones at redhat.com
Mon Aug 11 12:00:49 UTC 2014


On Mon, Aug 11, 2014 at 01:11:54AM +0200, Peter Wu wrote:
> Would there be interest in including such an API in hivex? Since it is
> built on the existing Python methods, it cannot break unless you also
> break other programs that rely on those methods.

It really depends on the details, but I suspect that a second API is
only going to cause confusion.  However, you are certainly welcome to
maintain a nicer hivex API outside the project, and we can even point
to it in the documentation.

> > It's worth saying that the encoding of strings in the registry itself
> > is not always UTF-16LE.  It's sometimes UTF-8, ASCII or (in a case I
> > found last week) an NLS encoding like ISO-8859-1 or Big5.  Essentially
> > the consuming app always has to know what encoding to use.  Doing
> > "clever" stuff in the bindings is therefore almost always going to be
> > wrong in some case.  (This is also why C functions such as
> > hivex_value_string are deprecated.)
>
> When doing a registry export (.reg), all strings like "Key"="Value"
> appear to be UTF-16 strings.  Trying to push a UTF-8 string into
> the registry results in Chinese characters (UTF-16?).  Could you
> confirm or reject this against the exports of your keys?  Also, when
> the trailing NUL byte is missing from the services values, a BSOD can
> be observed.

Well, it depends on the OS you are using.  Try it with Windows XP,
which (some of the time) uses an old code page or ISO-8859-X encoding,
at least for registry key names.  It also depends on the Windows
application that is reading the registry.
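To illustrate the "Chinese characters" symptom, here is a minimal
Python 3 sketch that uses only the standard codecs (no hivex calls are
assumed): bytes written as UTF-8 but read back as UTF-16LE get paired
into unrelated 16-bit code units, which happen to land in the CJK range,
and the same character is stored as different bytes in different
encodings.

```python
# Bytes as written by a program that stored UTF-8 text in a string value
raw = "Test".encode("utf-8")  # b'Test'

# A reader assuming UTF-16LE pairs the bytes into 16-bit code units:
# 0x54 0x65 -> U+6554, 0x73 0x74 -> U+7473 (both CJK ideographs)
misread = raw.decode("utf-16-le")
print([hex(ord(c)) for c in misread])  # ['0x6554', '0x7473']

# The same character is not the same bytes in every encoding
encoded = {enc: "é".encode(enc) for enc in ("utf-16-le", "utf-8", "iso-8859-1", "cp1252")}
for enc, data in encoded.items():
    print(enc, data)
# utf-16-le b'\xe9\x00'
# utf-8 b'\xc3\xa9'
# iso-8859-1 b'\xe9'
# cp1252 b'\xe9'
```

So "it looks like Chinese" is usually a sign that the reader's assumed
encoding does not match the writer's, not that the data is damaged.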

See:
http://git.annexia.org/?p=hivex-test-data.git;a=commit;h=2145ff5774ecbd4c3e98b845cf9c64e0a669324e
http://git.annexia.org/?p=hivex-test-data.git;a=commit;h=e296fba552f57c63608087671833a3228b08e0d0

> If it is necessary to support other encodings, it may be worth adding a
> class to wrap the encoding, (type?) and value:
> 
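> UTF_16_LE = "utf-16-le"
>
> class RegistryString(object):
>     def __init__(self, type, value, encoding=UTF_16_LE):
>         self.type = type          # registry value type, e.g. REG_SZ
>         self.string = value       # text to be stored (not "value", which is the method below)
>         self.encoding = encoding
>
>     def value(self):
>         # Encode the text plus a NUL terminator in the chosen encoding
>         return (self.string + u"\0").encode(self.encoding)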
> 
> (maybe introduce a wrapper function for this to avoid long lines)

Right, this can be made easier to use in the common case, but it'll
break in other cases.

The fundamental problem here is that the Registry format is not
well-specified.  Consumers can store whatever junk their developers
felt like at the time.  Consider it to be a store of arbitrary binary
strings, which sometimes happen to have a well-known encoding.
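Treating values as opaque bytes means the decoding step belongs to the
consuming application.  A minimal Python 3 sketch of that split follows;
`decode_value` is a hypothetical helper (not part of the hivex bindings)
that never guesses and simply applies whatever encoding the caller says
the writer used.

```python
def decode_value(data: bytes, encoding: str, errors: str = "strict") -> str:
    """Decode raw registry value bytes with an encoding chosen by the caller.

    Nothing is guessed: only the consuming application knows whether the
    writer used UTF-16LE, UTF-8, a code page such as cp1252, Big5, etc.
    """
    return data.decode(encoding, errors)


data = b"\xc3\xa9"  # "é" as written by a UTF-8 writer

print(decode_value(data, "utf-8"))       # é
print(decode_value(data, "iso-8859-1"))  # Ã© (wrong guess, silently garbled)

# A Latin-1 byte is not valid UTF-8, so a wrong guess can also fail loudly
try:
    decode_value(b"\xe9", "utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```

A wrong guess can either raise or, worse, succeed and return garbage,
which is why guessing inside the bindings is risky.  Callers that expect
junk can pass `errors="replace"` explicitly.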

> Strings are always NUL-terminated, right? I recall reading something
> like that in the MSDN documentation.

Yup, in well-formed values, but I bet you can find places where that
is not the case.
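Since the terminator is present in well-formed values but cannot be
relied on, a reader can accept both forms.  A minimal Python 3 sketch,
assuming the value has already been decoded (`strip_terminator` is a
hypothetical helper, not a hivex function):

```python
def strip_terminator(text: str) -> str:
    """Remove one trailing NUL, tolerating values where it is missing."""
    # Strip after decoding: in UTF-16LE the terminator is two zero bytes,
    # so trimming a single 0x00 byte beforehand would leave a dangling byte.
    return text[:-1] if text.endswith("\0") else text


well_formed = b"a\x00b\x00c\x00\x00\x00"  # "abc" + NUL in UTF-16LE
unterminated = b"a\x00b\x00c\x00"         # same text, NUL missing

print(strip_terminator(well_formed.decode("utf-16-le")))   # abc
print(strip_terminator(unterminated.decode("utf-16-le")))  # abc
```

Only a single NUL is removed, so any further trailing bytes a writer left
behind stay visible to the caller rather than being silently discarded.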

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
