How to best handle NULL return from xmlNodeGetContent()

Tue Jul 7 22:48:57 UTC 2020

libvirt has several uses of xmlNodeGetContent() (from libxml2) added at 
different times over the years. Some of those uses report an Out of 
Memory error when xmlNodeGetContent() returns NULL, some ignore a NULL 
return (treating it as if it were ""), and some just assume that the 
return will never be NULL, but always at least a pointer to "".

I ran across this when I noticed a usage of the last type - it wasn't 
checking for NULL at all. A lack of check seemed troubling, so I looked 
at other uses within libvirt and found the hodge-podge described above, 
so no help there in determining the right thing to do. I then looked at 
the libxml2 documentation for xmlNodeGetContent(), which says:

   Returns: a new #xmlChar * or NULL if no content is available.

To an uninformed outsider, this sounds like the function could return 
NULL simply because the node is empty (e.g. "<wwn/>"). But when we 
actually call xmlNodeGetContent() on this example, it returns "", not 
NULL.

At the same time, since libxml doesn't abort on OOM errors (as libvirt 
does), it could also be returning NULL due to OOM. So based on the 
anecdotal evidence acquired so far, one *could* surmise that any time 
libvirt gets a NULL return from xmlNodeGetContent(), it is indeed an 
OOM error.

The purist in me thinks that isn't right, though - I took a quick look 
at the libxml code and saw cases where it returns NULL that don't seem 
related to OOM, but rather to the type of node or something. But being 
an outsider and not wanting to learn any more than necessary about the 
internals of libxml, I'm not sure if any of those cases even apply to 
libvirt's simple use of xmlNodeGetContent().

So, in the end I just want to modify libvirt's dozen or so calls to 
xmlNodeGetContent() to consistently do the right thing, but first I want 
to learn the true answers to these questions:

1) Keeping in mind that we've already successfully parsed the XML, will 
calls to xmlNodeGetContent() in simple cases like libvirt's return NULL 
only for OOM, and never for any other reason?

2) If not, is the proper way to distinguish OOM in this case to call 
xmlGetLastError(), and check if the domain is XML_FROM_MEMORY?

3) Aside from returning NULL in the case of errors, would it ever be 
possible for correct XML to return NULL as valid "node content", or is 
it always an error of some kind?

Since libvirt now aborts on OOM, an OOM error could be handled in one 
place by a wrapper function around xmlNodeGetContent() (we already have 
such a function, currently a one-liner passthrough, and not called by 
everyone). But if there is any chance that any other libxml error could 
be encountered, then I suppose we really should be reporting those 
without aborting, and then still checking for NULL on return from the 
wrapper function (presumably by just logging the contents of "message" 
from the xmlErrorPtr returned from xmlGetLastError()).