[RFC 01/21] build-aux: Add a tool to generate xml parse/format functions

Mon Jun 29 12:36:11 UTC 2020

On Wed, Jun 10, 2020 at 09:20:29AM +0800, Shi Lei wrote:
> This tool is used to generate parsexml/formatbuf functions for structs.
> It is based on libclang and its python-binding.
> Some directives (such as genparse, xmlattr, etc.) need to be added on
> the declarations of structs to direct the tool.
> 
> Signed-off-by: Shi Lei <shi_lei at massclouds.com>
> ---
>  build-aux/generator/directive.py | 839 +++++++++++++++++++++++++++++++
>  build-aux/generator/go           |  14 +
>  build-aux/generator/main.py      | 416 +++++++++++++++
>  build-aux/generator/utils.py     | 100 ++++

FWIW, we've tried to standardize on scripts/ directory for any helper
scripts used during the build process. Calling it "generator" is probably
too broad, as we've lots of diffferent generators.

I think this would would be a good fit as scripts/xmlgen/

>  po/POTFILES.in                   |   1 +
>  5 files changed, 1370 insertions(+)
>  create mode 100644 build-aux/generator/directive.py
>  create mode 100755 build-aux/generator/go
>  create mode 100755 build-aux/generator/main.py
>  create mode 100644 build-aux/generator/utils.py

> +T_CLEAR_FUNC_IMPL = '''
> +void
> +${typename}Clear(${typename}Ptr def)
> +{
> +    if (!def)
> +        return;
> +
> +    ${body}
> +}
> +'''
> +
> +T_CLEAR_FUNC_DECL = '''
> +void
> +${typename}Clear(${typename}Ptr def);

In general I think we have a preference for ${typename}Free
functions, rather than ${typename}Clear.

A Clear() func is needed when a struct is often stack allocated,
or when the method allocating & free'ing the struct is different
from the method clearing it. The latter case is generally considered
bad practice - it is safer and easier to verify code when clearing
and freeing is done at the same same.

> +def clearMember(member):
> +    mtype = TypeTable().get(member['type'])
> +
> +    refname = 'def->%s' % member['name']
> +    if member.get('array'):
> +        refname += '[i]'
> +
> +    code = ''
> +    if mtype['meta'] == 'Struct':
> +        if member['pointer'] and not mtype.get('external'):
> +            code = '%sClear(%s);' % (mtype['name'], refname)
> +            code += '\nVIR_FREE(%s);' % refname

We should use g_free() for any new code.

> +        else:
> +            code = '%sClear(&%s);' % (mtype['name'], refname)
> +    elif mtype['name'] == 'String':
> +        code = 'VIR_FREE(%s);' % refname
> +    elif mtype['name'] in ['Chars', 'UChars']:
> +        code = 'memset(%s, 0, sizeof(%s));' % (refname, refname)
> +    elif not member.get('array'):
> +        code = '%s = 0;' % refname
> +
> +    if member.get('specified'):
> +        assert not member.get('array'), "'specified' can't come with 'array'."
> +        code += '\n%s_specified = false;' % refname
> +
> +    if member.get('array') and code:
> +        counter = counterName(member['name'])
> +        if singleline(code):
> +            code = render(T_LOOP_SINGLE, counter=counter, body=code)
> +        else:
> +            code = render(T_LOOP_MULTI,
> +                          counter=counter, body=indent(code, 2))
> +        code += '\nVIR_FREE(def->%s);' % member['name']
> +        code += '\ndef->%s = 0;' % counter
> +
> +    return code
> +
> +
> +T_CLEAR_NAMESPACE = '''
> +if (def->namespaceData && def->ns.free)
> +    (def->ns.free)(def->namespaceData);
> +'''
> +
> +
> +def makeClearFunc(writer, atype):
> +    if 'genparse' not in atype:
> +        return
> +
> +    blocks = BlockAssembler()
> +    for member in atype['members']:
> +        blocks.append(clearMember(member))
> +
> +    if 'namespace' in atype:
> +        blocks.append(T_CLEAR_NAMESPACE.strip())
> +
> +    body = indent(blocks.output('\n\n'), 1)
> +
> +    impl = render(T_CLEAR_FUNC_IMPL, typename=atype['name'], body=body)
> +    writer.write(atype, 'clearfunc', '.c', impl)
> +
> +    decl = render(T_CLEAR_FUNC_DECL, typename=atype['name'])
> +    writer.write(atype, 'clearfunc', '.h', decl)

Bearing in mind the above, I'd say we should create Free() funcs,
not Clear() funcs here.

> +
> +
> +#
> +# Templates for parsing member block
> +#
> +T_READ_ATTR_BY_PROP = '${mdvar} = virXMLPropString(node, "${tname}");'
> +T_READ_ELEM_BY_PROP = '${mdvar} = virXMLChildNode(node, "${tname}");'
> +T_READ_ELEM_CONTENT = '${mdvar} = virXMLChildNodeContent(node, "${tname}");'
> +
> +T_PARSE_MEMBER_MORE = '''
> +${number} = virXMLChildNodeSet(node, "${tname}", &nodes);
> +if (${number} > 0) {
> +    size_t i;
> +
> +    if (VIR_ALLOC_N(def->${name}, ${number}) < 0)
> +        goto error;
> +
> +    for (i = 0; i < ${number}; i++) {
> +        xmlNodePtr tnode = nodes[i];
> +        ${item}
> +    }
> +    def->${counter} = ${number};
> +    VIR_FREE(nodes);
> +} else if (${number} < 0) {
> +    virReportError(VIR_ERR_XML_ERROR, "%s",
> +                   _("Invalid ${tname} element found."));

For error messages we should avoid using python string substitution.
Instead we want todo

    virReportError(VIR_ERR_XML_ERROR,
                   _("Invalid %s element found.").
		   "${tname}");

This means that when we repeat this code with many different values
of ${tname}, we're only getting 1 string for the translation team
to deal with.

The same point for other virReportError error callsin this file,
I won't mention them all individually.

> +T_ALLOC_MEMORY = '''
> +if (VIR_ALLOC(def->${name}) < 0)
> +    goto error;
> +'''

We prefer  g_new0 for new code now.

As a general rule for any code, if there is a GLib API that works
then we should use that, instead of a libvirt API.

> +T_PARSE_FUNC_DECL = '''
> +int
> +${funcname}(${args});
> +'''
> +
> +T_PARSE_FUNC_IMPL = '''
> +int
> +${funcname}(${args})
> +{
> +    ${declare_vars}
> +    VIR_USED(instname);
> +    VIR_USED(opaque);

There should not be any need for this macro. Instead change
the function declaration to add G_GNUC_UNUSED for "instname"
and "opaque". This annotation is harmless if the variable
really is used.

> +
> +    if (!def)
> +        goto error;
> +
> +    ${body}
> +
> +    return 0;
> +
> + error:
> +    ${cleanup_vars}
> +    ${typename}Clear(def);

IIUC, the caller of this method is responsible for allocating
'def', and so the caller should be responsible for running
${typename}Free(def). IOW, we should not need to call Clear
for this.

> +    return -1;
> +}

> +def _handleTmpVars(tmpvars):
> +    heads, tails = [], []
> +    tmpvars = dedup(tmpvars)
> +    for var in tmpvars:
> +        if var == 'nodes':
> +            heads.append('xmlNodePtr *nodes = NULL;')
> +            tails.append('VIR_FREE(nodes);')
> +        elif var.endswith('Str'):
> +            heads.append('g_autofree char *%s = NULL;' % var)
> +        elif var.endswith('Node'):
> +            heads.append('xmlNodePtr %s = NULL;' % var)
> +        else:
> +            assert var.endswith('Nodes') and var.startswith('n')
> +            heads.append('int %s = 0;' % var)
> +
> +    return '\n'.join(heads), '\n'.join(tails)

One thing I noticed in the generated code is that we;re
duplicating the strings twice:

Consider this snippet of generated code:

    g_autofree char *nameStr = NULL;

    nameStr = virXMLPropString(node, "name");
    if (nameStr == NULL) {
        virReportError(VIR_ERR_XML_ERROR,
                       _("Missing 'name' setting in '%s'"),
                       instname);
        goto error;
    }

    def->name = g_strdup(nameStr);

virXMLPropString already  duplicates the string, and then
we g_strdup again. So we should just get rid of the
g_autofree and directly assign def->name = nameStr.

> diff --git a/build-aux/generator/go b/build-aux/generator/go
> new file mode 100755
> index 0000000..9e30e08
> --- /dev/null
> +++ b/build-aux/generator/go
> @@ -0,0 +1,14 @@
> +# This is a command-line tool
> +
> +libclang_line=`ldconfig -p | grep libclang`

This needs to be libclang.so, otherwise it can match on the
wrong library:

   $ ldconfig -p | grep libclang
	libclang.so.10 (libc6,x86-64) => /lib64/libclang.so.10
	libclang.so (libc6,x86-64) => /lib64/libclang.so
	libclang-cpp.so.10 (libc6,x86-64) => /lib64/libclang-cpp.so.10
	libclang-cpp.so (libc6,x86-64) => /lib64/libclang-cpp.so

> +export libclang_path=`expr "$libclang_line" : '.* => \(.*\)$'`
> +if test -z "$libclang_path"; then
> +    echo "libclang is required by libvirt\n"
> +    exit -1
> +fi

> diff --git a/build-aux/generator/main.py b/build-aux/generator/main.py
> new file mode 100755
> index 0000000..78a90f1
> --- /dev/null
> +++ b/build-aux/generator/main.py
> @@ -0,0 +1,416 @@
> +#!/usr/bin/env python3
> +#
> +# Copyright (C) 2020 Shandong Massclouds Co.,Ltd.
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +#

> +if __name__ == "__main__":

> +
> +    libclang_path = os.environ.get('libclang_path')
> +    assert libclang_path, 'No libclang library.'
> +    Config.set_library_file(libclang_path)

I'm wondering if we really need this at all ? AFAICT, the libclang
python library works fine without it, so this is only required if
trying to use a non-standard libclang, which I think ought to be
possible already by setting LD_LIBRARY_PATH

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|