Re: [Libguestfs] hivex: some issues (key encoding, ...) and suggested fixes

On Sat, Feb 26, 2011 at 08:56:48PM +0200, Török Edwin wrote:
> Hi,
> libhivex seems to do a great job at parsing hives most of the time, but
> there are some issues with a few registry keys.
> These can be worked around in the application that uses libhivex, but I
> think it'd be better if libhivex handled these itself.
> 1. UTF16 string in REG_SZ that has garbage after the \0\0
> There is code in hivex.c to handle this already but I think it has a typo:
>   /* Deal with the case where Windows has allocated a large buffer
>    * full of random junk, and only the first few bytes of the buffer
>    * contain a genuine UTF-16 string.
>    *
>    * In this case, iconv would try to process the junk bytes as UTF-16
>    * and inevitably find an illegal sequence (EILSEQ).  Instead, stop
>    * after we find the first \0\0.
>    *
>    * (Found by Hilko Bengen in a fresh Windows XP SOFTWARE hive).
>    */
>   size_t slen = utf16_string_len_in_bytes_max (data, len);
>   if (slen > len)
>     len = slen;
>   char *ret = windows_utf16_to_utf8 (data, len);
> slen is only used to increase the length of data, but I think it should
> be used to decrease it (so that conversion stops at the first \0\0).

Yes, it's strange -- this does appear to be a bug.
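
For illustration, here is a minimal self-contained sketch of what the clamp presumably intends: scan for the first aligned \0\0 code unit and only ever shrink the length handed to the UTF-16 to UTF-8 converter. The helper names are hypothetical stand-ins modeled on the quoted snippet, not the actual hivex.c functions. The essential change is comparing `slen < len` instead of `slen > len`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for utf16_string_len_in_bytes_max: return the
 * number of bytes before the first aligned UTF-16 NUL (\0\0) code unit,
 * scanning at most LEN bytes.  If no terminator is found, the result is
 * LEN rounded down to an even number. */
static size_t
utf16_len_to_nul (const char *data, size_t len)
{
  size_t ret = 0;

  assert (data != NULL || len == 0);
  while (len - ret >= 2 && (data[ret] != 0 || data[ret + 1] != 0))
    ret += 2;
  return ret;
}

/* Length that should be passed to the converter: never longer than the
 * buffer, and cut short at the first \0\0 so trailing junk is ignored. */
size_t
utf16_effective_len (const char *data, size_t len)
{
  size_t slen = utf16_len_to_nul (data, len);

  if (slen < len)   /* the quoted code has '>' here, which never shrinks len */
    len = slen;
  return len;
}
```

With that clamp, a buffer holding "A\0B\0\0\0" followed by junk yields a length of 4, so the converter never sees the junk bytes.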


> 2. Non-ascii node names
> I found a node with a \xDC (Ü) in it:
> SOFTWARE\\ODBC\\ODBCINST.INI\\MS Code Page-\xDCbersetzer
> hivex.c has a comment like this:
>   /* AFAIK the node name is always plain ASCII, so no conversion
>    * to UTF-8 is necessary.  However we do need to nul-terminate
>    * the string.
>    */
> I think hivex should convert the node names from CP1252 (or is it
> ISO-8859-1?) to UTF-8.
> Workaround: I do the CP1252 -> UTF-8 conversion myself for now.
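
The workaround in the quote can be sketched as a small table-driven converter. This is an illustrative sketch only, not the patch mentioned in the reply and not hivex code; `cp1252_to_utf8` is a hypothetical name, and it assumes the node name bytes really are CP1252. Bytes outside 0x80 to 0x9F mean the same thing in CP1252 and ISO-8859-1 (they map directly to U+0000 to U+00FF), so the two candidate encodings only differ in that range; for \xDC both give U+00DC (Ü).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Code points for CP1252 bytes 0x80..0x9F.  0 marks the five bytes that
 * CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D). */
static const uint16_t cp1252_high[32] = {
  0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
  0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

/* Convert LEN bytes of CP1252 to a newly allocated, NUL-terminated UTF-8
 * string.  Undefined bytes become U+FFFD.  Returns NULL on allocation
 * failure; the caller frees the result. */
char *
cp1252_to_utf8 (const char *in, size_t len)
{
  char *out;
  size_t o = 0;

  assert (in != NULL || len == 0);
  if (len > (SIZE_MAX - 1) / 3)
    return NULL;
  /* Every CP1252 byte needs at most 3 UTF-8 bytes (all mapped code
   * points, including U+FFFD, are below U+10000). */
  out = malloc (3 * len + 1);
  if (out == NULL)
    return NULL;

  for (size_t i = 0; i < len; i++) {
    unsigned char b = (unsigned char) in[i];
    uint32_t c;

    if (b >= 0x80 && b <= 0x9F)
      c = cp1252_high[b - 0x80] ? cp1252_high[b - 0x80] : 0xFFFD;
    else
      c = b;                    /* same as ISO-8859-1 / Unicode */

    if (c < 0x80)
      out[o++] = (char) c;
    else if (c < 0x800) {
      out[o++] = (char) (0xC0 | (c >> 6));
      out[o++] = (char) (0x80 | (c & 0x3F));
    } else {
      out[o++] = (char) (0xE0 | (c >> 12));
      out[o++] = (char) (0x80 | ((c >> 6) & 0x3F));
      out[o++] = (char) (0x80 | (c & 0x3F));
    }
  }
  out[o] = '\0';
  return out;
}
```

For the key quoted above, "MS Code Page-\xDCbersetzer" becomes "MS Code Page-\xC3\x9Cbersetzer" in UTF-8.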

This patch was posted but I didn't apply it because it seems
quite risky:


> 3. node_get_child is slow
> Documentation issue: it should say that node_get_child is slow (because
> the registry has no index, so it does a linear search).
> Workaround: I build a map from names to the children of a node; a lookup
> in that map is faster than calling node_get_child repeatedly.
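
A sketch of the name-to-child map from the quoted workaround, using a sorted array with `qsort`/`bsearch` rather than a hash table. Assumptions: the array is filled beforehand (for example from `hivex_node_children` and `hivex_node_name`, not shown), `size_t` stands in for the `hive_node_h` handle, and names are matched case-insensitively with POSIX `strcasecmp`, which only folds ASCII letters. `struct child_entry` and the `child_index_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* One child of a registry node.  NODE stands in for a hivex node
 * handle (an assumption for this sketch). */
struct child_entry {
  const char *name;
  size_t node;
};

/* Registry key names compare case-insensitively; strcasecmp only folds
 * ASCII letters, which is a simplification. */
static int
compare_child_name (const void *a, const void *b)
{
  const struct child_entry *x = a, *y = b;
  return strcasecmp (x->name, y->name);
}

/* Sort the children once: O(n log n) per node. */
void
child_index_build (struct child_entry *entries, size_t n)
{
  assert (entries != NULL || n == 0);
  if (n > 1)
    qsort (entries, n, sizeof entries[0], compare_child_name);
}

/* Each lookup is then a binary search, O(log n), instead of a linear
 * scan over all children.  Returns NULL if NAME is not present. */
const struct child_entry *
child_index_lookup (const struct child_entry *entries, size_t n,
                    const char *name)
{
  struct child_entry key = { name, 0 };

  if (n == 0)
    return NULL;
  return bsearch (&key, entries, n, sizeof entries[0], compare_child_name);
}
```

Building the index costs one sort per node; after that each lookup avoids a full scan, which pays off when many names are looked up under the same node.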


> 4. hivexml output is not well-formed XML
> See problems #1 and #2: if value_string and node_name are fixed to not
> dump the binary garbage and just return UTF-8, then I think hivexml's
> output would pass xmllint.

Shoot or fix.
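
To test the claim that UTF-8 strings would make hivexml's output pass xmllint, a caller could check each string before writing it. This sketch (hypothetical name `is_xml_safe_utf8`) accepts only well-formed UTF-8 (no overlong forms, surrogates, or code points above U+10FFFF) whose characters are all allowed by the XML 1.0 `Char` production. Control characters such as 0x01 are not legal in XML 1.0 even when correctly encoded, so converting to UTF-8 alone may not be sufficient. It assumes the XML writer already escapes markup characters such as `<` and `&`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest code point that needs an N-byte UTF-8 sequence (index = N);
 * anything below that is an overlong encoding and is rejected. */
static const uint32_t utf8_min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };

/* XML 1.0 "Char" production.  Excludes most C0 controls, the
 * surrogate range D800..DFFF, FFFE/FFFF and anything above 10FFFF. */
static int
xml_char_ok (uint32_t c)
{
  return c == 0x9 || c == 0xA || c == 0xD
    || (c >= 0x20 && c <= 0xD7FF)
    || (c >= 0xE000 && c <= 0xFFFD)
    || (c >= 0x10000 && c <= 0x10FFFF);
}

/* Return 1 if S[0..LEN) is well-formed UTF-8 made only of characters
 * that XML 1.0 allows, else 0. */
int
is_xml_safe_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  assert (s != NULL || len == 0);
  while (i < len) {
    unsigned char b = s[i];
    uint32_t c;
    size_t n;

    if (b < 0x80)                { c = b;        n = 1; }
    else if ((b & 0xE0) == 0xC0) { c = b & 0x1F; n = 2; }
    else if ((b & 0xF0) == 0xE0) { c = b & 0x0F; n = 3; }
    else if ((b & 0xF8) == 0xF0) { c = b & 0x07; n = 4; }
    else
      return 0;                  /* stray continuation byte or 0xF8..0xFF */

    if (n > len - i)
      return 0;                  /* sequence truncated by end of buffer */
    for (size_t k = 1; k < n; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return 0;                /* expected a continuation byte */
      c = (c << 6) | (s[i + k] & 0x3F);
    }
    if (c < utf8_min_for_len[n] || !xml_char_ok (c))
      return 0;
    i += n;
  }
  return 1;
}
```

What to do when the check fails (escape, encode the raw bytes, or drop the value) is a separate design choice for hivexml.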


Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
