[libvirt PATCH] ci: gitlab: Expire artifacts after 1 day

Erik Skultety eskultet at redhat.com
Thu Jun 2 14:41:19 UTC 2022


On Thu, Jun 02, 2022 at 03:20:17PM +0200, Peter Krempa wrote:
> On Thu, Jun 02, 2022 at 15:02:24 +0200, Erik Skultety wrote:
> > With GitLab cutting down on shared resource usage it's very likely that
> > following our measure to decrease the number of CI minutes we'll also
> > need to decrease our usage of storage. Start by decreasing artifact
> > expiration time to 1 day for jobs that are currently exceeding it (by a
> > lot -> 30 days). At the same time, define expiration on the integration
> > jobs' artifacts where there currently isn't one defined.
> > Although 1 day may not seem like a long enough period, given the
> > cadence of libvirt pipeline executions it should give everyone/jobs
> > enough time to download artifacts if needed.
> > 
> > Signed-off-by: Erik Skultety <eskultet at redhat.com>
> > ---
> >  .gitlab-ci.yml              | 4 ++--
> >  ci/integration-template.yml | 1 +
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> > index 6a8b89729f..1b39047862 100644
> > --- a/.gitlab-ci.yml
> > +++ b/.gitlab-ci.yml
> > @@ -74,7 +74,7 @@ website:
> >      expose_as: 'Website'
> >      name: 'website'
> >      when: on_success
> > -    expire_in: 30 days
> > +    expire_in: 1 day
> >      paths:
> >        - website
> >  
> 
> Note that this automatically propagates into jobs run on other repos.
> Adding links to artifacts showing changes to a web page is very useful
> and thus retaining them only for 1 day will prevent reviewers from
> looking at them.
> 
> For other uses, such as our mirroring job, it's probably fine, but we
> should keep this at least at 2 weeks unless you figure out how to set
> this based on the repository name.
> 
> That said, I wanted to object that this is negligible even when
> retaining a month of webpages, but looking at the current state:
> 
> There are 8 pages of CI runs in the last month, 15 pipelines per page. That
> equates to 120 pipeline runs. With 7.7MiB per run that's 924MiB of
> mostly useless copies of the same thing.
> 
> Unless we figure out how to change this per repo name, please set the
> website job to a minimum of 15 days to give reviewers some time; that
> still halves the required space.

Since variable expansion apparently doesn't work with the expire_in
clause yet [1], the only thing that comes to mind is to define 2 jobs for the
website and with the usage of 'rules' generate the correct one depending on
whether this is upstream or a fork. Yes, it's ugly, but it would work.
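
A minimal sketch of what that two-job approach could look like. The job
names, the hidden '.website_job' template, the namespace check and the 15 day
value for forks are assumptions for illustration only, not part of the posted
patch; the existing job's 'script' and other keys are left out:

```yaml
# Hypothetical sketch: names and values below are illustrative.
.website_job:
  # script: and the other keys of the existing 'website' job would go here
  artifacts:
    expose_as: 'Website'
    name: 'website'
    when: on_success
    paths:
      - website

# Upstream pipelines: short retention.
website_upstream:
  extends: .website_job
  rules:
    - if: '$CI_PROJECT_NAMESPACE == "libvirt"'
  artifacts:
    expire_in: 1 day

# Forks: keep the rendered pages around long enough for review.
website_fork:
  extends: .website_job
  rules:
    - if: '$CI_PROJECT_NAMESPACE != "libvirt"'
  artifacts:
    expire_in: 15 days
```

Since 'extends' merges hashes key by key, each job should only need to add
its own 'expire_in' on top of the shared 'artifacts' block, and only one of
the two jobs would be generated in any given pipeline.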

On the other hand, 924MiB isn't a tragedy.

> 
> P.S:
> 
> We have FAR bigger problems with retaining logs of all builds
> indefinitely. Based on my rough calculation 1 average run of our CI
> produces ~11MiB of logs (based on ~46GiB of total reported size of
> artifacts by gitlab, ~4000 ci runs on upstream)
> 
> I've confirmed that logs are counted towards artifacts empirically by
> deleting all but 1 CI run in my repo which only has the webpage
> artifacts (7.7MiB unpacked), yet gitlab reported 28 MiB of total usage
> for artifacts.

Well, the worse news is that we cannot run a CI job to prune the artifacts
automatically, because one would need a personal access token for that. Why
personal access token? Because Project/Group access tokens are apparently
unavailable on GitLab SaaS if you're not a paying customer (which is weird,
because most other features from Ultimate are enabled) [2] and in fact I don't
see the described setting under our group/project just like this guy [3].

The problem with Personal access tokens is that there are security implications
tied to them:
- linked to a specific user
- other users with high enough privileges could see the token
- wrong settings can lead to leaking of that token which could expose all
  repositories of that user

One possible solution would be to create a member service account with no
repositories. The password would only be available to the maintainers. AFAIK
you don't need full API access to purge pipelines, IOW read_api permissions
should suffice which means it would not make this such an awfully ugly solution.

The proper solution would be to use CI/CD job tokens because these are
ephemeral by design, however they have no permission granularity settings and
so cannot be used with all of the API endpoints (purging artifacts being one of
them) :(.

The least favourable solution IMO (but 100% functional) would be for one of the
maintainers to set up a cron job on a private machine using their personal
access token, which nobody else could see, and purge them from "the outside".

[1] https://docs.gitlab.com/ee/ci/variables/where_variables_can_be_used.html#gitlab-ciyml-file
[2] https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html
[3] https://forum.gitlab.com/t/project-access-token-isnt-visible/51701

In any case I'll put this patch on hold until we have a clear idea what the
best course of action is - we still have a month™.

Erik
