[Libguestfs] XML encoding of the registry

Simson Garfinkel simsong at acm.org
Sat Mar 20 16:18:58 UTC 2010


> One issue that may be of concern is string encoding in registry
> values, which is not well defined.  Naturally for XML I suppose you'd
> want to represent string values as UTF-8.  However it's almost
> impossible to know for sure how strings are encoded in the registry,
> so doing this conversion would either involve a heuristic, or you'd
> have to store binary blobs in the XML (encoded as Base64 or as hex
> digits).  The registry is a mess in this respect.

Rich,

We have encountered the same problem with the XML encoding of file names. Sometimes they are in ASCII, sometimes they are in a code page, sometimes they are in UTF-8, and sometimes they are in corrupt UTF-8.

This is the approach we are using:

1. Represent everything that can be represented in UTF-8 as UTF-8.
2. If something can't be represented as UTF-8, we add a "coding='base64'" attribute to the XML tag and represent the value as Base64.
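A minimal Python sketch of the two-step rule, assuming the raw name or value arrives as a byte string. The tag name `filename` and the helper names are hypothetical, not taken from fiwalk. The extra check for characters that XML 1.0 forbids (such as NUL) is an added assumption, because valid UTF-8 is not always legal XML.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def _xml_char_ok(c: str) -> bool:
    # Assumption: also reject code points outside the XML 1.0 "Char" production
    cp = ord(c)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def encode_xml_element(tag: str, raw: bytes) -> str:
    """Step 1: emit UTF-8 text if possible; step 2: fall back to Base64."""
    try:
        text = raw.decode("utf-8")
        if all(_xml_char_ok(c) for c in text):
            # Escape &, <, > and also CR, which XML parsers would otherwise
            # normalize to LF and so break the round trip
            return f"<{tag}>{escape(text, {chr(13): '&#13;'})}</{tag}>"
    except UnicodeDecodeError:
        pass  # not valid UTF-8 (e.g. a code-page or corrupt name)
    b64 = base64.b64encode(raw).decode("ascii")
    return f"<{tag} coding='base64'>{b64}</{tag}>"


def decode_xml_element(xml: str) -> bytes:
    """Recover the original bytes from an element written above."""
    elem = ET.fromstring(xml)
    if elem.get("coding") == "base64":
        return base64.b64decode(elem.text or "")
    return (elem.text or "").encode("utf-8")
```

A consumer only needs to check for the `coding` attribute to know whether to Base64-decode the element text.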

We would like to replace #2 with an explicit encoding of the invalid characters as Unicode entities, but we haven't implemented that yet.

> 
> [...]
>> You can find an example of the digital forensics XML at:
>> http://www.forensicswiki.org/wiki/Fiwalk
> 
> Looks interesting.  It should be easily possible to get libguestfs to
> write this format for disk images.  There is already a (trivial) demo
> program I wrote along those lines:
> 
> http://git.annexia.org/?p=libguestfs.git;a=blob;f=examples/to-xml.c;hb=HEAD

Thanks. I'll check that out. We've made a lot of progress writing programs in Python that process Digital Forensics XML, and it is proving to be a good approach for integrating a range of computer forensic tools. You may be interested in my paper:

Garfinkel, Simson. "Automating Disk Forensic Processing with SleuthKit, XML and Python." Systematic Approaches to Digital Forensic Engineering (IEEE/SADFE 2009), Oakland, California.
	http://simson.net/clips/academic/2009.SADFE.xml_forensics.pdf

> 
> - - -
> 
> If you have changes for libguestfs or hivex, please submit them to
> this mailing list as for any open source project:
> 
> http://people.redhat.com/~rjones/how-to-supply-code-to-open-source-projects/

Thanks. My understanding is that the current code does not build on Mac OS. I was just going to clone the Git repository and have at it, but I was not sure how to send changes back.
