[Libguestfs] 1.39 proposal: Let's split up the libguestfs git repo and tarballs

Mon Jul 1 16:10:48 UTC 2019

On Monday, 10 June 2019 17:35:52 CEST Richard W.M. Jones wrote:
> Sorry for the late reply to this ...
> 
> On Tue, Apr 30, 2019 at 06:28:01PM +0200, Pino Toscano wrote:
> > On Friday, 9 February 2018 19:01:53 CEST Richard W.M. Jones wrote:
> > > My contention is that the libguestfs git repository is too large and
> > > unwieldy.  There are too many separate, unrelated projects and as a
> > > result of that the source has too many dependencies and takes too long
> > > to build and test.
> > > 
> > > The project divides (sort of) naturally into layers -- the library,
> > > the bindings, the various virt tools -- and could be split along those
> > > lines into separate projects which can then be released and evolve at
> > > their own pace.
> > 
> > As other answers to this email also say, splitting the tools and the
> > bindings may be very complex, so for now that goal is still too far off.
> > 
> > However...
> > 
> > > My suggested split would be something like this:
> > > 
> > > [...]
> > >        virt-v2v and virt-p2v
> > 
> > I'd rather split virt-p2v in its own repository.  There are various
> > reasons for this:
> > - it does not use libguestfs (the library), just the tools for testing
> >   stuff
> > - the communication with virt-v2v is done via network, and its
> >   capabilities are dynamically probed (so theoretically virt-p2v, and
> >   virt-v2v can be used even when their versions differ)
> > - it is written only in C
> > 
> > However, even if it looks simple, in reality there are a number of common
> > things used from the rest of the libguestfs tree:
> > 1) gnulib
> 
> We hardly use gnulib in virt-p2v.  I think it's only used for
> ignore-value.h, getprogname.h, and c-ctype.h, all of which are likely
> to be easily worked around.

True, however for now it can stay, as it is one obstacle less for the split.

> > 3) auto-cleanup bits (e.g. CLEANUP_FREE), although only few are used
> >    (CLEANUP_FREE, CLEANUP_FREE_STRING_LIST, CLEANUP_PCLOSE,
> >    CLEANUP_FCLOSE, and CLEANUP_XMLFREETEXTWRITER)
> > 4) other internal macros, i.e. guestfs-utils.h
> 
> Common code is a bit trickier, as is ...

So far I have copied ~4K of assorted bits of code, plus ~9K more as
straight copies of libxml2-cleanups.c + libxml2-writer-macros.h from
common/utils.

> > 5) the list of credits generated by the generator
> >    (i.e. generator/authors.ml)
> > 6) the p2v configuration generated by the generator
> >    (i.e. generator/p2v_config.ml)
> 
> ... the generator and ...

(5) is more shared with the rest, while (6) is basically p2v-only
material.

> > 7) test images/data (phony images, and virt-tools)
> 
> test data.

Luckily this is easy to recreate locally.

> > 8) the miniexpect module, right now out of the p2v subdirectory
> 
> This is only used by virt-p2v I think, so it could go with virt-p2v or
> be made into a separate project.

Right, the upstream is somewhere else, so another "import from $URL"
commit will not be any worse than what we have now.

> > Possible solutions may/might be:
> > 1) add own submodule (use its own set of modules)
> 
> I think we should ditch gnulib as much as possible, so see above.

Surely we can work on removing it after the split, step by step, if
needed/wanted.

> So while I'm not a massive fan of git submodules, now that I have used
> them a few times with riscv stuff, they do solve a certain problem as
> long as they are managed carefully.  I think the common code and the
> generator are cases where a submodule or two would work.

TBH I've always found submodules tricky and problematic to use:
- they are fixed to a certain revision (so no way to dynamically follow
  the branch of another repo)
- the URL is the same for all the users, meaning you cannot reuse the
  same authenticated/secure protocols that your repo has
- they create a certain burden when switching to a tag/branch/commit
  whose revision of a submodule is different than what is at the current
  branch
- it gets even more problematic when switching between commits where a
  subdirectory is a real directory in the old commit but a submodule in
  the latest HEAD (or vice versa)
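
The first point can be reproduced with a small sketch using two throwaway
repositories.  The names "common" and "p2v-split" and the file contents are
hypothetical, not the real layout.  The protocol.file.allow override is only
there because recent git versions refuse local-path submodules by default.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical shared-code repository.
git init -q "$work/common"
cd "$work/common"
git config user.email "demo@example.invalid"
git config user.name "Demo"
echo 'v1' > utils.h
git add utils.h
git commit -q -m "common: first version"
old=$(git rev-parse HEAD)

# Hypothetical split-out project embedding it as a submodule.
git init -q "$work/p2v-split"
cd "$work/p2v-split"
git config user.email "demo@example.invalid"
git config user.name "Demo"
git -c protocol.file.allow=always submodule --quiet add "$work/common" common
git commit -q -m "add common as a submodule"

# The shared repository moves on ...
cd "$work/common"
echo 'v2' >> utils.h
git commit -q -a -m "common: second version"
new=$(git rev-parse HEAD)

# ... but the superproject still records the commit it was added at.
cd "$work/p2v-split"
pinned=$(git rev-parse HEAD:common)
echo "pinned=$pinned upstream=$new"
```

The recorded commit only changes when someone updates the submodule checkout
and commits the new revision in the superproject.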

> Does this mean we need to move immediately to a submodule if just
> splitting virt-p2v, or copy code as you suggest?  Maybe not, because
> you can imagine for just this project copying the code needed from the
> common/ directory, and creating a new "mini-generator" for the project
> which handles the little bits that need to be generated in virt-p2v.

I'm actually solving this in a different way, i.e. avoiding the generator
altogether for the p2v stuff.

> However in the long term if we split up everything a submodule or two
> does seem to make sense, so maybe we should start there?

ATM I have enough work needed just to split p2v, so I'd prefer to delay
this conversation to a later time...

> > The other problem is how to split the repository, as the various bits
> > are in different places:
> > a) git filter-branch --subdirectory-filter p2v
> > + very small repo with the current p2v subdirectory
> > + preserves the history of the p2v subdirectory, with branches and tags
> > - missing all the other bits, which will have no history
> > - not usable to build older releases (e.g. for bisecting)
> 
> I'm not exactly sure what this does.  Is this something to do with
> preserving the history?  TBH I don't think we need to bother with the
> history -- it exists still in libguestfs.git.

Yes, this is for preserving history, at least for the most important
parts (the sources of p2v).
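
To make option (a) concrete, here is a minimal sketch run against a
throwaway repository rather than libguestfs.git.  The directory layout, file
names, and commit messages are made up for illustration; only the
filter-branch invocation is the one quoted above.
FILTER_BRANCH_SQUELCH_WARNING merely silences the deprecation notice that
newer git versions print.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)

# Hypothetical stand-in for the big repository: one p2v/ directory and
# one unrelated lib/ directory.
git init -q "$work/big"
cd "$work/big"
git config user.email "demo@example.invalid"
git config user.name "Demo"
mkdir p2v lib
echo 'int main (void) { return 0; }' > p2v/main.c
echo 'library' > lib/lib.c
git add .
git commit -q -m "initial import"
echo '/* change */' >> p2v/main.c
git commit -q -a -m "p2v: tweak"
echo 'more' >> lib/lib.c
git commit -q -a -m "lib: tweak"

# Rewrite history so that p2v/ becomes the root of the repository.
git filter-branch --subdirectory-filter p2v -- --all

# Inspect the result: remaining commits and top-level files.
commits=$(git rev-list --count HEAD)
files=$(git ls-tree --name-only HEAD)
echo "commits=$commits files=$files"
```

In this sketch only the commits that touched p2v/ survive, with main.c now
at the top level; the lib-only commit is gone, which is the "missing all the
other bits" drawback from the list.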

-- 
Pino Toscano