[Pulp-dev] Issue #2619
Milan Kovacik
mkovacik at redhat.com
Fri Apr 27 21:33:27 UTC 2018
Folks,
I'd like to poll the channel for feedback about the current
implementation and possible alternatives to it.
Issue #2619 TL;DR: report discrepancies between the information kept in
Mongo and the state of the (published) data kept on disk [1]
Recent reviews suggest basing the implementation on relational data
kept in SQLite:
- collect traits from a filesystem walk (checksums, sizes, link targets
and paths)
- store these traits in separate tables
- dump Mongo into unit, distributor (configuration) and repository tables
- query the relational data to infer any discrepancies, e.g. broken
symlinks, wrong sizes or checksums
- reuse the database for generating subsequent reports
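To make the proposal above easier to discuss, here is a minimal sketch of
the SQLite idea. It is not the code under review: the table and column
names (fs_trait, unit), the use of SHA-256, and the helper functions are
all assumptions made for illustration, and the Mongo dump is replaced by
a plain list of tuples.

```python
import hashlib
import os
import sqlite3

# Hypothetical schema: one table for filesystem traits, one for units
# dumped from Mongo. The real proposal also mentions distributor and
# repository tables, which are left out here.
SCHEMA = """
CREATE TABLE fs_trait (path TEXT PRIMARY KEY, size INTEGER,
                       checksum TEXT, link_target TEXT);
CREATE TABLE unit (path TEXT PRIMARY KEY, size INTEGER, checksum TEXT);
"""


def collect_traits(root):
    """Yield (path, size, checksum, link_target) for files and symlinks."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Store the target resolved against the link's directory.
                target = os.path.normpath(
                    os.path.join(dirpath, os.readlink(path)))
                yield path, None, None, target
            elif os.path.isfile(path):
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
                yield path, os.path.getsize(path), digest, None


def build_database(root, units):
    """Load filesystem traits and (path, size, checksum) units into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO fs_trait VALUES (?, ?, ?, ?)",
                     collect_traits(root))
    conn.executemany("INSERT INTO unit VALUES (?, ?, ?)", units)
    conn.commit()
    return conn


def find_discrepancies(conn):
    """Return (path, problem) for units that are missing or do not match."""
    return conn.execute("""
        SELECT u.path,
               CASE WHEN f.path IS NULL THEN 'missing'
                    WHEN f.size != u.size THEN 'wrong size'
                    ELSE 'wrong checksum' END
        FROM unit AS u LEFT JOIN fs_trait AS f ON f.path = u.path
        WHERE f.path IS NULL OR f.size != u.size
              OR f.checksum != u.checksum
        ORDER BY u.path""").fetchall()


def find_broken_links(conn):
    """Return symlinks whose target was not seen during the walk.

    Simplification: links to directories or to paths outside the walked
    root are reported as broken, and chains of links are not followed.
    """
    rows = conn.execute("""
        SELECT path FROM fs_trait
        WHERE link_target IS NOT NULL
              AND link_target NOT IN (SELECT path FROM fs_trait)
        ORDER BY path""")
    return [row[0] for row in rows]
```

Once the database is built, further reports are just additional queries
against the same tables, which is the main attraction of this variant.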
Current approach[2] TL;DR:
- assemble a validation scenario based on CLI arguments, e.g.:
--check existence --check broken_rpm_symlinks --check size --check checksum
- one by one, match the applicable content units from Mongo against
the validation scenario and filesystem traits
- optionally skip checks that would fail for a unit anyway, e.g. the
checksum check after an invalid size
- generate a report as a flat list of results per unit and
repository, in JSON format
- perform subsequent queries over the generated JSON report, e.g.
jq '[.report[].repository] | unique' < report.json
to get a list of affected repositories
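For comparison, the flow of the current approach can be sketched roughly
as follows. This is a simplified illustration and not the code from the
pull requests: the unit dictionaries, the check functions, the report
field names ("unit", "repository", "check", "valid") and the use of
SHA-256 are assumptions.

```python
import hashlib
import os


def check_existence(unit):
    return os.path.exists(unit["path"])


def check_size(unit):
    return os.path.getsize(unit["path"]) == unit["size"]


def check_checksum(unit):
    with open(unit["path"], "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest() == unit["checksum"]


# Maps the value of a (hypothetical) --check argument to its function.
CHECKS = {
    "existence": check_existence,
    "size": check_size,
    "checksum": check_checksum,
}


def run_check(name, unit):
    """Run one check; an unreadable or missing file counts as a failure."""
    try:
        return CHECKS[name](unit)
    except OSError:
        return False


def validate(units, scenario, skip_after_failure=True):
    """Match each unit against the scenario, one check at a time."""
    report = []
    for unit in units:
        for name in scenario:
            valid = run_check(name, unit)
            report.append({"unit": unit["id"],
                           "repository": unit["repository"],
                           "check": name,
                           "valid": valid})
            if not valid and skip_after_failure:
                # Later checks would fail too, e.g. checksum after size.
                break
    return {"report": report}


def affected_repositories(result):
    """Sorted, de-duplicated repositories with at least one failed check."""
    return sorted({entry["repository"]
                   for entry in result["report"] if not entry["valid"]})
```

Dumping the return value of validate() with json.dumps() gives a flat
report that tools such as jq can query after the fact.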
There are some caveats with the current approach, such as:
- in some cases, traits are first loaded into memory from a filesystem
walk, e.g. symlink targets
- some repeated Mongo queries are cached as well, e.g. distributors and
repositories
- detecting broken symlinks still gives false negatives in corner cases
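The caching caveat amounts to keeping query results in process memory.
A minimal sketch of that pattern, assuming a hypothetical lookup function
and a dictionary standing in for the Mongo collection (neither is taken
from the pull requests):

```python
from functools import lru_cache

# Hypothetical stand-in for the distributors collection in Mongo.
DISTRIBUTORS = {
    "repo-a": [{"id": "yum_distributor"}, {"id": "export_distributor"}],
}

# Counts how often the backing store is actually queried.
QUERY_COUNT = {"distributors": 0}


@lru_cache(maxsize=None)
def distributor_ids(repo_id):
    """Return the distributor ids of a repository, querying only once."""
    QUERY_COUNT["distributors"] += 1
    # A tuple is returned so the cached value cannot be mutated by callers.
    return tuple(d["id"] for d in DISTRIBUTORS.get(repo_id, ()))
```

With maxsize=None the cache grows with the number of distinct
repositories, so memory use scales with the data set, which is the
trade-off the caveat points at.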
Cheers,
milan
[1] https://pulp.plan.io/issues/2619
[2] https://github.com/pulp/pulp_rpm/pull/1104
https://github.com/pulp/pulp/pull/3465