Heads-up: brand new RPM version about to hit rawhide

Thu Jul 17 10:49:27 UTC 2008

Dan Williams wrote:
> On Tue, 2008-07-15 at 08:15 -0800, Jeff Spaleta wrote:
>> On Tue, Jul 15, 2008 at 8:01 AM, Dan Williams <dcbw at redhat.com> wrote:
>>> Yeah, there is actually a benefit to tarball+patches approach we take
>>> right now; and that benefit is that it's extremely easy to see just what
>>> we've done to the upstream package, and it's usually really easy to
>>> extract those changes and push them upstream.  You don't want a
>>> mega-diff that includes 20 specific patches.
>> I know of at least one example currently in our cvs where we went from
>> a set of separate small patch files to one encompassing patch file.  I
>> think it was a diff from git. If we move to more advanced vcs are we
>> going to have a harder time keeping patches separated? Or is it just a
>> matter of education on how not to reach for the easy to produce mega
>> patch shortcut?
> 
> That's the problem here:  if we move to git (or any DVCS really), as a
> packager you would have to be _much_ more aware of how to use the VCS to
> achieve the same separation of patches and upstream source.  You'd need
> to do something like topic branches for each patch and then merge each
> topic branch into a "release" branch to ensure that each of the patches
> was cleanly separated from the main source.
> 

Well, I spent 4 days learning all of git's source code once (mind you, it
was a long time ago), and just recently I spent that long trying, and
failing, to get the exact sources of the kernel I was running when I found
a bug in it. Since the kernel is managed in git, I was quite appalled to
find out that Fedora doesn't have a repo anywhere with tags set so I could
just clone it, check out the right version and fire up a "git bisect run".

In the end, I settled for hacking the source RPM to run a "git commit -a"
after each patch file was applied, so I could at least bisect on the result
of that and then exclude the Fedora patches from the list of possible
culprits for my particular problem.
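
For anyone wanting to try the same trick, here is a rough sketch of the idea
in Python. It is not what I actually ran: the function names are made up for
illustration, it assumes every patch applies with plain "git apply" from the
top of an already-committed source tree (real spec files use varying -p
levels), and it uses "git add -A" before committing, rather than
"git commit -a" alone, so that files created by a patch are included.

```python
import subprocess
from typing import List, Sequence


def patch_commit_commands(patches: Sequence[str]) -> List[List[str]]:
    """Build the git commands that turn each patch file into its own commit.

    Hypothetical helper: patches are given in the order the spec file
    applies them, as paths relative to the top of the source tree.
    """
    commands: List[List[str]] = []
    for patch in patches:
        commands.append(["git", "apply", patch])   # apply one patch to the tree
        commands.append(["git", "add", "-A"])      # stage edits and new files
        commands.append(["git", "commit", "-q", "-m", f"Apply {patch}"])
    return commands


def run_all(commands: Sequence[Sequence[str]], cwd: str) -> None:
    """Run the commands in order inside cwd, stopping at the first failure."""
    for argv in commands:
        subprocess.run(list(argv), cwd=cwd, check=True)
```

Building the command list separately from running it makes it easy to eyeball
what will happen before touching the tree. The result is one commit per patch,
which is what bisection needs.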

Mind you, with all the hackery I had to go through to get that working, I
can't say for sure that what I was looking at in the end was actually the
sources of my running kernel anyway so it could just as well have been
a complete and utter waste of time.

Anyways, different workflow or not, using a distributed version control
system provides three huge advantages over tarball + patches, namely:
* Endpoint-hacker access to the reason a particular patch is needed.
  Without this, it's extremely tedious to know what to test when altering
  code in the same area another patch touched, and such changes are much
  more likely to introduce regressions.
* Easy access to the exact revision.
  I won't ever try to debug the Fedora kernel again. I'll just clone the
  vanilla kernel tree and find out which version fixes my particular issue or,
  if none of them does, start hacking on the upstream one instead.
  If some issue I'm seeing isn't in upstream, I'll just report it as being
  caused by one of the patches in Fedora. That leaves the poor Fedora kernel
  folks to dig through the >100 patches you apply to the tarball.
* Bisection.
  If you've never used an SCM that has a bisect command, you won't know what I'm
  talking about and you won't know what you've missed. It's like telling your
  SCM "find the exact revision that introduced this bug", and it does it.
  Instead of looking at a source tree of 10k-5M LoC you get to see the single
  patch that introduced the bug you're looking for.
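
To make the bisection point concrete, here is a minimal sketch of the idea in
Python. It is only an illustration, not how git implements it: it assumes a
strictly linear history ordered oldest to newest, a known-good first revision,
a known-bad last revision, and a bug that stays once introduced. The names
first_bad and is_bad are made up for the example.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and every revision
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return revisions[hi]


# Usage: 1000 revisions, with the bug (hypothetically) introduced in r613.
revs = [f"r{i}" for i in range(1000)]
tested = []


def is_bad(rev: str) -> bool:
    tested.append(rev)  # record how many builds/tests were needed
    return int(rev[1:]) >= 613


culprit = first_bad(revs, is_bad)
```

Each test halves the range of suspects, so a thousand revisions need only
about ten builds rather than a thousand.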

> Basically, moving to a DVCS and exploded source trees just raises the
> bar for packagers since they'd have to learn quite a bit about how DVCS
> works.  CVS + tarball + patches are quite easy for most people to learn,
> but a DVCS + branches + merges is much more complicated if the
> changesets are small.  And the changesets should always be small,
> otherwise we're completely failing at pushing stuff upstream.
> 

Why would you have packagers doing merges? They really shouldn't need to do
that. Only developers (and yes, package maintainers for really complex
projects, like the kernel) will need to know about branches and merges.
Packagers just need to know how to extract a tarball from a repository, and
that's a single command.

> Maybe the fix here is to let package maintainers who want to use a DVCS
> style do so, and let those that don't keep using the old style, and
> provide the ability to switch between the two styles when a new
> maintainer takes over the package?
> 

I think that's what Doug has been after the entire time. Obviously, the
kernel is tons easier to manage if it's all in git, with patches committed
to it as changesets rather than separate files. I'd imagine the same goes
for every other project whose upstream is managed in git as well.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231