rpmreaper

Panu Matilainen pmatilai at laiskiainen.org
Thu Jun 5 07:15:43 UTC 2008


On Wed, 4 Jun 2008, Steve Grubb wrote:

> On Wednesday 04 June 2008 02:55:19 Panu Matilainen wrote:
>> On Tue, 3 Jun 2008, Steve Grubb wrote:
>>> Not really. Bash has been patched to spit out the programs it calls
>>> (/bin/bash --rpm-requires). So, it's a matter of overriding
>>> %__find_requires to run a program that gathers the information for shell
>>> scripts and falls back to the old way for others.
>>>
>>> No one should have to specify this, it can be automated easily. Without
>>> taking shell scripts into account, you run the risk of breaking
>>> unspecified requirements.
>>
>> I wish it were that simple.
>>
>> "bash --rpm-requires" does a fair job for the impossible task, but it
>> produces way too much bogus information and false positives to be
>> generally usable as is. A quick check at various scripts found on a stock
>> F9 system shows at least these problems:
>>
>> 1) It mistakes functions declared in sourced scripts as executables
>> 2) It mistakes functions used before declared as executables
>
> In my opinion, these ^^ should be fixed.

Yup, that'd be the first step in making --rpm-requires actually usable 
beyond just curiosity.

>> 3) It thinks of sourced scripts as executables
>
> In a sense, they are. My init scripts source /etc/init.d/functions, so that is
> a real dependency.

It's a real dependency, yes, but sourced files need not be *executable*;
they just need to be there. Whether the difference matters depends on
later implementation details: if PATH or the executable bits of files are
involved, sourced files need to be kept separate from executables, for
example with a file(/some/path) notation or such.

>> 4) It produces hard dependencies for conditional items
>
> I agree this is a problem. I think it gets worse the further nested a program
> would be in if statements. But as a first pass, one could fix it to only check
> files not within an if statement and add logic later to go deeper. Something
> is better than nothing as right now we do not capture shell script
> dependencies and they *are* real.

Ignoring dependencies from all conditional execution paths (except 
constant conditions like "while [ 1 -eq 1 ]") is the only 100% correct and 
safe thing you can do. Beyond that, bash simply cannot know whether 
something is a hard dependency or not at package build time. So either the 
conditional paths are ignored, or you live with the fact that you WILL 
need to filter out dependencies manually.

If bash could classify its findings into conditional and unconditional, 
that'd at least make life easier for the human filtering the deps.
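
As a rough illustration of such a classification (not how bash does or
would do it), here is a Python sketch that tracks nesting depth line by
line. It assumes one statement per line with closing keywords on their own
line, ignores && and || lists, functions, assignments and quoting, and
treats every loop body as conditional, constant conditions included. The
classify() helper and the sample script are made up for the example.

```python
OPENERS = {"if", "case", "while", "until", "for"}
CLOSERS = {"fi", "esac", "done"}
KEYWORDS = OPENERS | CLOSERS | {"then", "else", "elif", "do"}

def classify(script_text):
    """Return (unconditional, conditional) lists of command names."""
    depth = 0
    unconditional, conditional = [], []
    for raw in script_text.splitlines():
        # Drop trailing comments (naive: does not understand "$#").
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        first = words[0]
        if first in CLOSERS:
            depth = max(depth - 1, 0)
        elif first in OPENERS:
            depth += 1
        elif first in KEYWORDS:
            continue
        elif depth > 0:
            conditional.append(first)
        else:
            unconditional.append(first)
    return unconditional, conditional

# Hypothetical initscript-like fragment.
script = """\
mount -a
if [ -x /usr/bin/rhgb-client ]; then
    rhgb-client --ping
fi
sysctl -p
"""
print(classify(script))
```

Only the first list would be a candidate for automatic hard requirements;
the second is what a human would still have to look through.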

Initscripts (and mkinitrd) might well be about the worst case you can get 
for this, as they do a whole lot of things like "if x happens to be 
installed then enable/do something with it, otherwise it doesn't matter" 
which should not be turned into hard dependencies. A good example is 
rhgb-client - you can bet that a lot of folks would be upset if that was 
made into a hard dependency of initscripts :)

>> 5) For most executables, path is unknown
>
> There is a standard PATH that the distribution expects. So there is some
> defined search order. I solved this in the build system I wrote by keeping a
> list of all files installed by rpm as packages were built. Then the
> find-requires script would resolve the name to full path based on the
> standard PATH. This is solvable.

It's solvable by various means, yes. Anything requiring rpm to be aware of 
distribution contents at build time is not really a generic solution 
though.
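
For reference, the file-list lookup described in the quoted text could look
roughly like the Python sketch below. The search order in STANDARD_PATH,
the resolve() and filter_requires() helpers and the sample file list are
all assumptions for illustration; names that cannot be resolved are quietly
dropped, as in the quoted description.

```python
import posixpath

# Assumed distribution search order; a real one may differ.
STANDARD_PATH = ["/sbin", "/bin", "/usr/sbin", "/usr/bin"]

def resolve(name, known_files, search_path=STANDARD_PATH):
    """Return the full path for a command name, or None if unknown."""
    if name.startswith("/"):
        return name if name in known_files else None
    for directory in search_path:
        candidate = posixpath.join(directory, name)
        if candidate in known_files:
            return candidate
    return None

def filter_requires(names, known_files):
    """Keep only the names that resolve against the file list."""
    resolved = (resolve(n, known_files) for n in names)
    return [p for p in resolved if p is not None]

# Hypothetical list of files installed by already-built packages.
known_files = {"/bin/awk", "/usr/bin/awk", "/sbin/sysctl"}
names = ["awk", "sysctl", "/sbin/solaris-specific"]
print(filter_requires(names, known_files))
```

Note that known_files is exactly the build-time knowledge of distribution
contents that keeps this from being a generic solution.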

>> Assuming 1-3) are fixed and ignoring 4), 5) could be dealt with, at least
>> to some extent, but it's a big can of worms too. For the dependencies to
>> be discoverable by yum & friends, there would have to be matching provides
>> for all executable(foo) items bash --rpm-requires produces.
>>
>> Rpm could automatically add Provides: executable(foo) for any file with
>> executable bits on, but it would cause *enormous* bloat of metadata.
>
> Bloat, to me, means something that would never be used. If the dependencies
> are real, they should be captured. Do you need to have the dependency at the
> file level or package level? Maybe that reduces some of the metadata?

Note that I wasn't speaking of requires, but provides to satisfy the 
requires IF automatically added as

     Provides: executable(<basename of executable>)

for all executable files (in system PATH or otherwise). Most of those 
provides would never be used by anything so they would be nothing but 
bloat.
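
To make the effect concrete, here is a small Python sketch of what such
automatic provides generation would amount to. It is not rpm code; the
executable_provides() helper and the sample file list are hypothetical.

```python
import posixpath
import stat

def executable_provides(files):
    """files: iterable of (path, mode) pairs, mode as in os.stat().st_mode."""
    provides = set()
    for path, mode in files:
        # Regular file with any of the user/group/other execute bits set.
        if stat.S_ISREG(mode) and mode & 0o111:
            provides.add("executable(%s)" % posixpath.basename(path))
    return sorted(provides)

# Hypothetical package payload.
files = [
    ("/usr/bin/awk", 0o100755),
    ("/etc/init.d/functions", 0o100644),
    ("/usr/libexec/foo-helper", 0o100755),
]
print(executable_provides(files))
```

Every executable file gets a provide here, including private helpers that
no script elsewhere will ever call.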

Resolving file dependencies into packages at build time would require rpm
to be aware of the "outside world", i.e. what's available in repositories,
and would require unnecessary rebuilds on package splits and renames (the
good old "file dependencies considered harmful or not" issue).

>> So solving 5) should be possible if 1-3) were fixed, but it'd still be
>> pretty moot because 4) can't generally be solved (apart from manually
>> filtering bogus dependencies, at which point it's hardly "easily
>> automated" :)
>
> I don't think #4 is impossible. It's not easy either. But I think we could get
> a first pass that is pretty good and make it better over time. Right now, we
> capture nothing. So, a first pass solution that captures 25% accurately is
> better than where we are.

Except for unconditional execution, #4 is impossible to solve 
programmatically. The moment you start down the conditional paths, it's 
just blind poking around in the dark - heuristics based on no knowledge at 
all. Even if you assume access to the distribution's file list, there's no 
way to tell if something is intentionally optional (in which case making 
it a hard requirement would be an error) or not, or if a given condition 
is supposed to ever occur on the target platform and version.

> In the build system I wrote, I lumped #4 and #5 together and solved them with
> a lookup table. It was good enough for my needs. If I resolved the path, the
> dependency was recorded. If not, I didn't record it. So,
> if /sbin/solaris-specific was not in my distribution's file list, it was
> quietly removed from the possible dependencies.

See above, this still catches all sorts of things that should not be hard 
dependencies.

Mind you, I don't disagree with the goal at all: automatically recording 
unconditional script dependencies would be a very good thing if it can be 
made reliable - fixing 1-2) would be a start.

 	- Panu -




More information about the fedora-devel-list mailing list