[dm-devel] [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

Damien Le Moal Damien.LeMoal at wdc.com
Thu Jan 9 05:56:07 UTC 2020


On 2020/01/09 12:19, Bart Van Assche wrote:
> On 2020-01-07 10:14, Chaitanya Kulkarni wrote:
>> * Current state of the work :-
>> -----------------------------------------------------------------------
>>
>> With [3], it is hard to handle arbitrary DM/MD stacking without
>> splitting the command in two, one for copying IN and one for copying
>> OUT. [4] then demonstrates why [3] is not a suitable candidate. Also,
>> with [4] there is an unresolved problem with the two-command
>> approach: how to handle changes to the DM layout between the IN and
>> OUT operations.
> 
> Was this last discussed during the 2018 edition of LSF/MM (see also
> https://www.spinics.net/lists/linux-block/msg24986.html)? Has anyone
> taken notes during that session? I haven't found a report of that
> session in the official proceedings (https://lwn.net/Articles/752509/).

Yes, I think it was discussed but I do not think much progress has been
made. With NVMe simple copy added to the potential targets, I think it
is worthwhile to have this discussion again and come up with a clear plan.

> 
> Thanks,
> 
> Bart.
> 
> 
> This is my own collection of two-year-old notes about copy offloading
> for the Linux kernel:
> 
> Potential Users
> * All dm-kcopyd users, e.g. dm-cache-target, dm-raid1, dm-snap, dm-thin,
>   dm-writecache and dm-zoned.
> * Local filesystems like BTRFS, f2fs and bcachefs: garbage collection
>   and RAID, at least if RAID is supported by the filesystem. Note: the
>   BTRFS_IOC_CLONE_RANGE ioctl is no longer supported. Applications
>   should use FICLONERANGE instead.
> * Network filesystems, e.g. NFS. Copying at the server side can reduce
>   network traffic significantly.
> * Linux SCSI initiator systems connected to SAN systems such that
>   copying can happen locally on the storage array. XCOPY is widely used
>   for provisioning virtual machine images.
> * Copy offloading in NVMe fabrics using PCIe peer-to-peer communication.
> 
> Requirements
> * The block layer must gain support for XCOPY. The new XCOPY API must
>   support asynchronous operation such that users of this API are not
>   blocked while the XCOPY operation is in progress.
> * Copying must be supported not only within a single storage device but
>   also between storage devices.
> * The SCSI sd driver must gain support for XCOPY.
> * A user space API must be added and that API must support asynchronous
>   (non-blocking) operation.
> * The block layer XCOPY primitive must be supported by the device mapper.
> 
> SCSI Extended Copy (ANSI T10 SPC)
> The SCSI commands that support extended copy operations are:
> * POPULATE TOKEN + WRITE USING TOKEN.
> * EXTENDED COPY(LID1/4) + RECEIVE COPY STATUS(LID1/4). LID1 stands for a
>   List Identifier length of 1 byte and LID4 stands for a List Identifier
>   length of 4 bytes.
> * SPC-3 and before define EXTENDED COPY(LID1) (83h/00h). SPC-4 added
>   EXTENDED COPY(LID4) (83h/01h).
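
As a rough illustration of the opcode/service action pairs listed above
(83h/00h and 83h/01h), the following sketch fills in an EXTENDED COPY
CDB. The 16-byte layout used here (service action in the low five bits
of byte 1, big-endian parameter list length in bytes 10 to 13) is an
assumption based on a reading of SPC-4, and build_xcopy_cdb() and the
XCOPY_* names are hypothetical, not taken from any kernel header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XCOPY_OPCODE   0x83	/* EXTENDED COPY opcode, from the notes above */
#define XCOPY_SA_LID1  0x00	/* EXTENDED COPY(LID1) service action */
#define XCOPY_SA_LID4  0x01	/* EXTENDED COPY(LID4) service action */
#define XCOPY_CDB_LEN  16	/* assumed CDB size */

/*
 * Hypothetical helper: fill a CDB for EXTENDED COPY.
 * param_len is the size in bytes of the parameter list (the segment
 * and target descriptors) that would be sent as data-out.
 */
void build_xcopy_cdb(uint8_t cdb[XCOPY_CDB_LEN], uint8_t service_action,
		     uint32_t param_len)
{
	assert(service_action <= 0x1f);	/* only five bits are available */

	memset(cdb, 0, XCOPY_CDB_LEN);
	cdb[0] = XCOPY_OPCODE;
	cdb[1] = service_action & 0x1f;
	/* Parameter list length, most significant byte first (assumed layout) */
	cdb[10] = (uint8_t)(param_len >> 24);
	cdb[11] = (uint8_t)(param_len >> 16);
	cdb[12] = (uint8_t)(param_len >> 8);
	cdb[13] = (uint8_t)param_len;
	/* cdb[15] is the CONTROL byte, left at zero here */
}
```

The only difference between the LID1 and LID4 variants at the CDB level
is the service action; the list identifier itself lives in the parameter
list, which this sketch does not build.
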
> 
> Existing Users and Implementations of SCSI XCOPY
> * VMware, which uses XCOPY (with a one-byte list ID, aka LID1).
> * Microsoft, which uses ODX (aka LID4 because it has a four-byte list
>   ID).
> * Storage vendors all support XCOPY, but ODX support is growing.
> 
> Block Layer Notes
> The block layer supports the following types of block drivers:
> * blk-mq request-based drivers.
> * make_request drivers.
> 
> Notes:
> A list of bios is associated with each request.
> Since submit_bio() only accepts a single bio and not a bio list, all
> make_request block drivers process one bio at a time.
> 
> Device Mapper
> The device mapper core supports bio processing and blk-mq requests. The
> function in the device mapper that creates a request queue is called
> alloc_dev(). That function not only allocates a request queue but also
> associates a struct gendisk with the request queue. The
> DM_DEV_CREATE_CMD ioctl triggers a call of alloc_dev(). The
> DM_TABLE_LOAD ioctl loads a table definition. Loading a table definition
> causes the type of a dm device to be set to one of the following:
> DM_TYPE_NONE;
> DM_TYPE_BIO_BASED;
> DM_TYPE_REQUEST_BASED;
> DM_TYPE_MQ_REQUEST_BASED;
> DM_TYPE_DAX_BIO_BASED;
> DM_TYPE_NVME_BIO_BASED.
> 
> Device mapper drivers must implement target_type.map(),
> target_type.clone_and_map_rq() or both. .map() maps a bio list.
> .clone_and_map_rq() maps a single request. The multipath and error
> device mapper drivers implement both methods. All other dm drivers only
> implement the .map() method.
> 
> Device mapper bio processing
> submit_bio()
> -> generic_make_request()
>   -> dm_make_request()
>     -> __dm_make_request()
>       -> __split_and_process_bio()
>         -> __split_and_process_non_flush()
>           -> __clone_and_map_data_bio()
>           -> alloc_tio()
>           -> clone_bio()
>             -> bio_advance()
>           -> __map_bio()
> 
> Existing Linux Copy Offload APIs
> * The FICLONERANGE ioctl. From <include/linux/fs.h>:
>   #define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
> 
> struct file_clone_range {
> 	__s64 src_fd;
> 	__u64 src_offset;
> 	__u64 src_length;
> 	__u64 dest_offset;
> };
> 
> * The sendfile() system call. sendfile() copies a given number of bytes
>   from one file to another. The output offset is the offset of the
>   output file descriptor. The input offset is either the input file
>   descriptor offset or can be specified explicitly. The sendfile()
>   prototype is as follows:
>   ssize_t sendfile(int out_fd, int in_fd, off_t *ppos, size_t count);
>   ssize_t sendfile64(int out_fd, int in_fd, loff_t *ppos, size_t count);
> * The copy_file_range() system call. See also vfs_copy_file_range(). Its
>   prototype is as follows:
>   ssize_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>      loff_t *off_out, size_t len, unsigned int flags);
> * The splice() system call is not appropriate for adding extended copy
>   functionality since it copies data from or to a pipe. Its prototype is
>   as follows:
>   ssize_t splice(int fd_in, loff_t *off_in, int fd_out,
>     loff_t *off_out, size_t len, unsigned int flags);
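
To make the copy_file_range() entry above more concrete, here is a
minimal user space sketch that tries copy_file_range() first and falls
back to a plain read()/write() loop when the call fails. copy_range()
and copy_fallback() are hypothetical helper names, and falling back on
every error other than EINTR is a simplification chosen for this
example, not a recommendation. It assumes glibc 2.27 or later for the
copy_file_range() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read()/write() loop, used when copy_file_range() does not work. */
static ssize_t copy_fallback(int fd_in, int fd_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(fd_in, buf, want);
		ssize_t off = 0;

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;		/* end of input */
		while (off < n) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Copy up to len bytes from the current offset of fd_in to the current
 * offset of fd_out. Returns the number of bytes copied, or -1 with
 * errno set by the last failing call.
 */
ssize_t copy_range(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	assert(fd_in >= 0 && fd_out >= 0);

	while (done < len) {
		/* NULL offsets: use and advance the file offsets of both fds */
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			break;		/* end of input */
		if (errno == EINTR)
			continue;
		/*
		 * ENOSYS, EXDEV, EINVAL, ...: retry the remainder with the
		 * plain loop. A genuine I/O error will show up there too.
		 */
		n = copy_fallback(fd_in, fd_out, len - done);
		return n < 0 ? -1 : (ssize_t)(done + (size_t)n);
	}
	return (ssize_t)done;
}
```

Whether the kernel actually offloads anything is up to the filesystem
behind the two descriptors; from user space the call looks the same
either way, which is what makes it a candidate front end for a block
layer copy primitive.
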
> 
> Existing Linux Block Layer Copy Offload Implementations
> * Martin Petersen's REQ_COPY bio, where source and destination block
>   device are both specified in the same bio. Only works for block
>   devices. Does not work for files. Adds a new blocking ioctl() for
>   XCOPY from user space.
> * Mikulas Patocka's approach: separate REQ_OP_COPY_WRITE and
>   REQ_OP_COPY_READ operations. These are sent individually down stacked
>   drivers and are paired by the driver at the bottom of the stack.
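
The two-command approach described above can be pictured with a small
user space toy model. This is not kernel code and not the code from
either patch set: struct copy_half, remap_linear(), bottom_submit() and
the token field are all hypothetical names invented for this sketch. The
point is only that each half is remapped independently on its way down
and that the bottom driver matches the halves again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum copy_op { COPY_READ, COPY_WRITE };

/* One half of a copy, travelling down the stack on its own. */
struct copy_half {
	enum copy_op op;
	uint32_t token;		/* hypothetical: ties the two halves together */
	uint64_t sector;	/* start sector, remapped by each layer */
	uint32_t nr_sectors;
};

/* What the bottom driver would finally issue to the device. */
struct copy_pair {
	uint64_t src;
	uint64_t dst;
	uint32_t nr_sectors;
};

/* Stand-in for a dm-linear style layer: shift the half by a fixed offset. */
void remap_linear(struct copy_half *h, uint64_t start)
{
	h->sector += start;
}

#define MAX_PENDING 8

struct bottom_driver {
	struct copy_half pending[MAX_PENDING];
	bool used[MAX_PENDING];
};

/*
 * Bottom of the stack: park READ halves, and when the WRITE half with
 * the same token and length arrives, produce the device-level copy.
 * Returns true when a pair was completed and *out was filled in.
 */
bool bottom_submit(struct bottom_driver *d, const struct copy_half *h,
		   struct copy_pair *out)
{
	int i;

	assert(d && h && out);

	if (h->op == COPY_READ) {
		for (i = 0; i < MAX_PENDING; i++) {
			if (!d->used[i]) {
				d->pending[i] = *h;
				d->used[i] = true;
				break;
			}
		}
		return false;	/* parked, or dropped if the table is full */
	}

	for (i = 0; i < MAX_PENDING; i++) {
		if (d->used[i] && d->pending[i].token == h->token &&
		    d->pending[i].nr_sectors == h->nr_sectors) {
			out->src = d->pending[i].sector;
			out->dst = h->sector;
			out->nr_sectors = h->nr_sectors;
			d->used[i] = false;
			return true;
		}
	}
	return false;		/* WRITE half without a matching READ half */
}
```

Nothing in this model prevents the mapping from changing between the
two bottom_submit() calls, which is exactly the unresolved DM layout
problem mentioned at the top of this thread.
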
> 
> 


-- 
Damien Le Moal
Western Digital Research





