[Pulp-dev] Proposal: Changing in 3.1 that Artifact.save() will hard-link/copy, not move

Brian Bouterse bmbouter at redhat.com
Wed Jan 8 23:10:01 UTC 2020


It's not finished, but I have a POC with tests passing locally for me here:
https://github.com/pulp/pulpcore/pull/486 It does use the reflink library
and falls back to copy.

What still needs to be done is:
* To ensure files are being cleaned up in the case of uploads.
* Also I think we have a bug where some cases may not cleanup the worker
directory which would now leave those files around forever.

I will resume on those things tomorrow.

On Wed, Jan 8, 2020 at 12:32 PM Brian Bouterse <bmbouter at redhat.com> wrote:

>
>
> On Wed, Jan 8, 2020 at 12:02 PM Simon Baatz <gmbnomis at gmail.com> wrote:
>
>> On Tue, Jan 07, 2020 at 04:45:54PM -0500, Brian Bouterse wrote:
>> >    We had two bugs filed recently [0][1] which suggest that when using
>> the
>> >    default backend for Pulp, i.e. pulpcore.app.models.storage.FileSystem
>> >    Pulp should not be "moving" files. This is the default behavior
>> Django
>> >    gives us, and it destroys data when sync'ed from file:/// for example
>> >    [1].
>>
>> I am surprised that file:/// should be supported at all. This means
>> that a worker process must be able to access basically any file on the
>> system. Is this covered by the (upcoming?) SELinux policy? I would
>> expect workers to be more constrained than that. Or do we expect
>> the user to label files before importing?
>>
> The user would need to label the files ahead of time. The use case is that
> a large amount of content is stored on a hard drive which is mounted on the
> worker. A lot of setups of Pulp use this as a mass-import method before
> switching their sync's back to the CDN.
>
>
>>
>> >    I propose that with 3.1 we fix this bug by switching
>> >    pulpcore.app.models.storage.FileSystem to leave files in place and
>> >    either hard-link (same filesystem) or copy (different filesystem).
>>
>> We should not use hard links:
>>
>> - On systems with sysctl "fs.protected_hardlinks" enabled (Ubuntu,
>> Fedora),
>>   the files would have to be owned by the pulp user to be able to
>>   create hard-links at all.
>> - On systems with SELinux enabled, these hard links will share
>>   the labels.
>> - Imported hard links are not protected against modification of the
>>   original file.
>>
>> We could try to reflink the file and fall-back to copy (like "cp
>> --reflink=auto" does)
>>
> Great idea; let me try doing it this way. I'll link to my POC pr when it's
> up. Feel free also to share one.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20200108/5910c1ec/attachment.htm>


More information about the Pulp-dev mailing list