[libvirt] [PATCH v3 05/22] build-aux: rewrite whitespace checker in Python

Ján Tomko jtomko at redhat.com
Thu Sep 26 16:08:14 UTC 2019


On Tue, Sep 24, 2019 at 03:58:46PM +0100, Daniel P. Berrangé wrote:
>As part of an goal to eliminate Perl from libvirt build tools,
>rewrite the check-spacing.pl tool in Python.
>
>This was a straight conversion, manually going line-by-line to
>change the syntax from Perl to Python. Thus the overall structure
>of the file and approach is the same.
>
>Signed-off-by: Daniel P. Berrangé <berrange at redhat.com>
>---
> Makefile.am                |   2 +-
> build-aux/check-spacing.pl | 198 --------------------------------
> build-aux/check-spacing.py | 229 +++++++++++++++++++++++++++++++++++++
> cfg.mk                     |   4 +-
> 4 files changed, 232 insertions(+), 201 deletions(-)
> delete mode 100755 build-aux/check-spacing.pl
> create mode 100755 build-aux/check-spacing.py
>
>diff --git a/build-aux/check-spacing.py b/build-aux/check-spacing.py
>new file mode 100755
>index 0000000000..6b9f3ec1ba
>--- /dev/null
>+++ b/build-aux/check-spacing.py
>@@ -0,0 +1,229 @@
>+#!/usr/bin/env python
>+#
>+# Copyright (C) 2012-2019 Red Hat, Inc.
>+#
>+# check-spacing.pl: Report any usage of 'function (..args..)'
>+# Also check for other syntax issues, such as correct use of ';'
>+#
>+# This library is free software; you can redistribute it and/or
>+# modify it under the terms of the GNU Lesser General Public
>+# License as published by the Free Software Foundation; either
>+# version 2.1 of the License, or (at your option) any later version.
>+#
>+# This library is distributed in the hope that it will be useful,
>+# but WITHOUT ANY WARRANTY; without even the implied warranty of
>+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+# Lesser General Public License for more details.
>+#
>+# You should have received a copy of the GNU Lesser General Public
>+# License along with this library.  If not, see
>+# <http://www.gnu.org/licenses/>.
>+
>+from __future__ import print_function
>+
>+import re
>+import sys
>+
>+
>+def check_whitespace(filename):
>+    errs = False
>+    with open(filename, 'r') as fh:
>+        quotedmetaprog = re.compile(r"""'[";,=]'""")
>+        quotedstringprog = re.compile(r'''"(?:[^\\\"]|\\.)*"''')
>+        commentstartprog = re.compile(r'''^(.*)/\*.*$''')
>+        commentendprog = re.compile(r'''^.*\*/(.*)$''')
>+        commentprog = re.compile(r'''^(.*)/\*.*\*/(.*)''')
>+        funcprog = re.compile(r'''(\w+)\s\((?!\*)''')
>+        keywordprog = re.compile(
>+            r'''^.*\b(?:if|for|while|switch|return)\(.*$''')
>+        functypedefprog = re.compile(r'''^.*\(\*\w+\)\s+\(.*$''')
>+        whitespaceprog1 = re.compile(r'''^.*\s\).*$''')
>+        whitespaceprog2 = re.compile(r'''^\s+\);?$''')
>+        whitespaceprog3 = re.compile(r'''^.*\((?!$)\s.*''')
>+        commasemiprog1 = re.compile(r'''.*\s[;,].*''')
>+        commasemiprog2 = re.compile(r'''.*\S; ; .*''')
>+        commasemiprog3 = re.compile(r'''^\s+;''')
>+        semicolonprog = re.compile(r'''.*;[^	 \\\n;)].*''')
>+        commaprog = re.compile(r'''.*,[^ \\\n)}].*''')
>+        assignprog1 = re.compile(r'''[^ ]\b[!<>&|\-+*\/%\^=]?=''')
>+        assignprog2 = re.compile(r'''=[^= \\\n]''')
>+        condstartprog = re.compile(r'''^\s*(if|while|for)\b.*\{$''')
>+        statementprog = re.compile(r'''^[^;]*;[^;]*$''')
>+        condendprog = re.compile(r'''^\s*}\s*$''')
>+

I think even with descriptive names for the regexes and the Python
rewrite, this script is hard to read and comes too close to be a C
parser.

Also, the execution time goes from 1.5s to 6.5s, which is longer than
all the other checks combined run on my 8-core machine.

Can we get rid of it completely in favor of some proper formatting tool?

I have played with clang-format to try to match our style, the main
problems seem to be:
* pre-processor directives are indented by the same offset as code
  (I see that cppi is specifically intended to add just one space)
* function calls sometimes leave an empty opening parenthesis
* always rewrapping function arguments might create unnecessary churn
* parameters wrapping might not be smart enough, e.g. we like to do
  virReportError(VIR_ERR_CODE, "%s",
                 _("string"));
  in a lot of places.

On the plus side, we would be able to properly check for spacing after
casts.

Jano

>+        incomment = False
>+        # Per-file variables for multiline Curly Bracket (cb_) check
>+        cb_lineno = 0
>+        cb_code = ""
>+        cb_scolon = False
>+
>+        lineno = 0
>+        for line in fh:
>+            lineno = lineno + 1
>+            data = line
>+            # For temporary modifications
>+
>+            # Kill any quoted , ; = or "
>+            data = quotedmetaprog.sub("'X'", data)
>+
>+            # Kill any quoted strings
>+            data = quotedstringprog.sub('"XXX"', data)
>+
>+            if data[0] == '#':
>+                continue
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20190926/99d88921/attachment-0001.sig>


More information about the libvir-list mailing list