[libvirt RFCv4 01/20] iohelper: introduce new struct to carry copy operation parameters

Claudio Fontana cfontana at suse.de
Fri Apr 29 09:24:29 UTC 2022


On 4/28/22 3:01 PM, Daniel P. Berrangé wrote:
> On Wed, Apr 27, 2022 at 11:13:20PM +0200, Claudio Fontana wrote:
>> this is in preparation for a minor refactoring of the copy
>> function itself out of runIO().
>>
>> Signed-off-by: Claudio Fontana <cfontana at suse.de>
>> ---
>>  src/util/iohelper.c | 95 +++++++++++++++++++++++++--------------------
>>  1 file changed, 53 insertions(+), 42 deletions(-)
>>
>> diff --git a/src/util/iohelper.c b/src/util/iohelper.c
>> index 2c91bf4f93..c13746a547 100644
>> --- a/src/util/iohelper.c
>> +++ b/src/util/iohelper.c
>> @@ -45,6 +45,16 @@
>>  # define O_DIRECT 0
>>  #endif
>>  
>> +struct runIOParams {
>> +    bool isBlockDev;
>> +    bool isDirect;
>> +    bool isWrite;
>> +    int fdin;
>> +    const char *fdinname;
>> +    int fdout;
>> +    const char *fdoutname;
>> +};
>> +
>>  static int
>>  runIO(const char *path, int fd, int oflags)
>>  {
>> @@ -53,13 +63,9 @@ runIO(const char *path, int fd, int oflags)
>>      size_t buflen = 1024*1024;
>>      intptr_t alignMask = 64*1024 - 1;
>>      int ret = -1;
>> -    int fdin, fdout;
>> -    const char *fdinname, *fdoutname;
>> -    unsigned long long total = 0;
>> -    bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0);
>> -    off_t end = 0;
>> +    off_t total = 0;
>>      struct stat sb;
>> -    bool isBlockDev = false;
>> +    struct runIOParams p;
>>  
>>  #if WITH_POSIX_MEMALIGN
>>      if (posix_memalign(&base, alignMask + 1, buflen))
>> @@ -77,34 +83,23 @@ runIO(const char *path, int fd, int oflags)
>>                               fd, path);
>>          goto cleanup;
>>      }
>> -    isBlockDev = S_ISBLK(sb.st_mode);
>> +    p.isBlockDev = S_ISBLK(sb.st_mode);
>> +    p.isDirect = O_DIRECT && (oflags & O_DIRECT);
>>  
>>      switch (oflags & O_ACCMODE) {
>>      case O_RDONLY:
>> -        fdin = fd;
>> -        fdinname = path;
>> -        fdout = STDOUT_FILENO;
>> -        fdoutname = "stdout";
>> -        /* To make the implementation simpler, we give up on any
>> -         * attempt to use O_DIRECT in a non-trivial manner.  */
>> -        if (!isBlockDev && direct && ((end = lseek(fd, 0, SEEK_CUR)) != 0)) {
>> -            virReportSystemError(end < 0 ? errno : EINVAL, "%s",
>> -                                 _("O_DIRECT read needs entire seekable file"));
>> -            goto cleanup;
>> -        }
>> +        p.isWrite = false;
>> +        p.fdin = fd;
>> +        p.fdinname = path;
>> +        p.fdout = STDOUT_FILENO;
>> +        p.fdoutname = "stdout";
>>          break;
>>      case O_WRONLY:
>> -        fdin = STDIN_FILENO;
>> -        fdinname = "stdin";
>> -        fdout = fd;
>> -        fdoutname = path;
>> -        /* To make the implementation simpler, we give up on any
>> -         * attempt to use O_DIRECT in a non-trivial manner.  */
>> -        if (!isBlockDev && direct && (end = lseek(fd, 0, SEEK_END)) != 0) {
>> -            virReportSystemError(end < 0 ? errno : EINVAL, "%s",
>> -                                 _("O_DIRECT write needs empty seekable file"));
>> -            goto cleanup;
>> -        }
>> +        p.isWrite = true;
>> +        p.fdin = STDIN_FILENO;
>> +        p.fdinname = "stdin";
>> +        p.fdout = fd;
>> +        p.fdoutname = path;
>>          break;
>>  
>>      case O_RDWR:
>> @@ -114,6 +109,22 @@ runIO(const char *path, int fd, int oflags)
>>                               (oflags & O_ACCMODE));
>>          goto cleanup;
>>      }
>> +    /* To make the implementation simpler, we give up on any
>> +     * attempt to use O_DIRECT in a non-trivial manner.  */
>> +    if (!p.isBlockDev && p.isDirect) {
>> +        off_t off;
>> +        if (p.isWrite) {
>> +            if ((off = lseek(fd, 0, SEEK_END)) != 0) {
>> +                virReportSystemError(off < 0 ? errno : EINVAL, "%s",
>> +                                     _("O_DIRECT write needs empty seekable file"));
>> +                goto cleanup;
>> +            }
>> +        } else if ((off = lseek(fd, 0, SEEK_CUR)) != 0) {
>> +            virReportSystemError(off < 0 ? errno : EINVAL, "%s",
>> +                                 _("O_DIRECT read needs entire seekable file"));
>> +            goto cleanup;
>> +        }
> 
> I'm puzzelled about the current code doing SEEK_END in one case
> and SEEK_CUR in the other case. I think this method should simply
> use SEEK_CUR only as a sanity check for being ablt to use O_DIRECT.

I think the answer is in the original commit:

commit 12291656b135ed788e41dadbd2d15e613ddea9b5
Author: Eric Blake <eblake at redhat.com>
Date:   Tue Jul 12 08:35:05 2011 -0600

    save: let iohelper work on O_DIRECT fds
    
    Required for a coming patch where iohelper will operate on O_DIRECT
    fds.  There, the user-space memory must be aligned to file system
    boundaries (at least 512, but using page-aligned works better, and
    some file systems prefer 64k).  Made tougher by the fact that
    VIR_ALLOC won't work on void *, but posix_memalign won't work on
    char * and isn't available everywhere.
    
    This patch makes some simplifying assumptions - namely, output
    to an O_DIRECT fd will only be attempted on an empty seekable
    file (hence, no need to worry about preserving existing data
    on a partial block, and ftruncate will work to undo the effects
    of having to round up the size of the last block written), and
    input from an O_DIRECT fd will only be attempted on a complete
    seekable file with the only possible short read at EOF.


----

Here the relevant part being "empty seekable file".
The check not only ensures the file is seekable, but also double checks that it is empty.

I can add a comment to make it explicit.


> 
> If we actually need to position the file offset, then the caller
> should do a SEEK_END itself.
> 
>> +    }
>>  
>>      while (1) {
>>          ssize_t got;
>> @@ -124,16 +135,16 @@ runIO(const char *path, int fd, int oflags)
>>           * writes will be aligned.
>>           * In other cases using saferead reduces number of syscalls.
>>           */
>> -        if (fdin == fd && direct) {
>> -            if ((got = read(fdin, buf, buflen)) < 0 &&
>> +        if (!p.isWrite && p.isDirect) {
>> +            if ((got = read(p.fdin, buf, buflen)) < 0 &&
>>                  errno == EINTR)
>>                  continue;
>>          } else {
>> -            got = saferead(fdin, buf, buflen);
>> +            got = saferead(p.fdin, buf, buflen);
>>          }
> 
> The distinction for read vs saferead is for correctness with O_DIRECT.
> 
> We only want to use read on the plain file / blockdev, and saferead
> on the socket/pipe.

Yes that's clear.

> 
> We already call fstat to let us figure out the fd type, so can use
> that decide whether to use read vs saferead.
> 
> This ties into my comment on the later patch about simplifying the
> parameters passed in, to remove the distinction of read vs write.
> 

As mentioned elsewhere I don't think it can be simplified quite that way,
but I think there is an alternative that could work, will propose in the next spin.

Thanks

Claudio



More information about the libvir-list mailing list