[vfio-users] Can't read device config space with pread

Ingrid Ribeiro Galvez inrigalvez at gmail.com
Fri Feb 24 11:29:50 UTC 2017


On Thu, Feb 23, 2017 at 5:52 PM, Alex Williamson <alex.williamson at redhat.com> wrote:

> On Thu, 23 Feb 2017 13:15:54 +0000
> Ingrid Ribeiro Galvez <inrigalvez at gmail.com> wrote:
>
> > Hi guys,
> >
> > I've been working with qemu kvm for a while and now I need to pass through
> > PCI devices. I did all required procedures to make this work: enabled
> > iommu, modprobed vfio module, bound the device to vfio and checked that the
> > vfio group was indeed created, etc. But when I start qemu with any PCI
> > device I get the error message:
> >
> > *vfio: Failed to read device config space*
>
> This comes from here:
>
>     /* Get a copy of config space */
>     ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>                 MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>                 vdev->config_offset);
>     if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
>         ret = ret < 0 ? -errno : -EFAULT;
>         error_setg_errno(errp, -ret, "failed to read device config space");
>         goto error;
>     }
>
> So we got fewer bytes than expected and an errno.  What does the device
> look like on the host (lspci -vvv)?  Can you read the full config
> space for the device from sysfs
> (xxd /sys/bus/pci/devices/0000:01:00.0/config)?
>
>
This is the lspci -vvv on the device:

01:00.0 Ethernet controller: Intel Corporation Device 157b (rev 03)
    Subsystem: Intel Corporation Device 0000
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at dfb00000 (32-bit, non-prefetchable) [disabled] [size=128K]
    Region 2: I/O ports at e000 [disabled] [size=32]
    Region 3: Memory at dfb20000 (32-bit, non-prefetchable) [disabled] [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [70] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <2us, L1 <16us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap:    First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-13-f2-ff-ff-a0-01-60
    Capabilities: [1a0 v1] #17
    Kernel driver in use: vfio-pci

And this is the config space I get from sysfs:

[root at r6 /]# hexdump /sys/bus/pci/devices/0000\:01\:00.0/config
0000000 8086 157b 0407 0010 0003 0200 0000 0000
0000010 0000 dfb0 0000 0000 e001 0000 0000 dfb2
0000020 0000 0000 0000 0000 0000 0000 8086 0000
0000030 0000 0000 0040 0000 0000 0000 010b 0000
0000040 5001 c823 2008 0000 0000 0000 0000 0000
0000050 7005 0180 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000
0000070 a011 8004 0003 0000 2003 0000 0000 0000
0000080 0000 0000 0000 0000 0000 0000 0000 0000
0000090 0000 0000 0000 0000 0000 0000 ffff ffff
00000a0 0010 0002 8cc2 1000 283f 0019 5c11 0042
00000b0 0040 1011 0000 0000 0000 0000 0000 0000
00000c0 0000 0000 001f 0000 0000 0000 0000 0000
00000d0 0001 0001 0000 0000 0000 0000 0000 0000
00000e0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100 0001 1402 0000 0000 0000 0000 2031 0046
0000110 0000 0000 2000 0000 00a0 0000 0000 0000
0000120 0000 0000 0000 0000 0000 0000 0000 0000
*
0000140 0003 1a01 0160 ffa0 f2ff 0013 0000 0000
0000150 0000 0000 0000 0000 0000 0000 0000 0000
*
00001a0 0017 0001 0205 0007 0000 0000 0000 0000
00001b0 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000



> > By looking into qemu code I found out that the error was coming from a call
> > to pread to read the pci device's file descriptor. It fails with errno
> > '*Illegal seek*'. Offset being used is 0x70000000000, and this offset seems
> > to be the same for all devices and also in different machines. I also wrote
> > some code to test reading the pci device file descriptor from outside of the
> > qemu code and the pread also fails with 'illegal seek' error. This was done
> > on a generic linux kernel v4.7.8 compiled with uClibc for an embedded system.
>
> The offset for each standard region of the device is fixed, PCI config
> space is always exposed at the same offset.
>
> > If I install ubuntu 16.04 (kernel v4.4.0) on the same machine and repeat
> > the steps, pci passthrough works fine and the pread on my test code also
> > works perfectly.
> >
> > This is the code I am using to test reading the device fd with pread:
> >
> >
> > #include <unistd.h>
> > #include <stdio.h>
> > #include <errno.h>
> > #include <fcntl.h>
> > #include <linux/vfio.h>
> > #include <sys/ioctl.h>
> > #include <sys/mman.h>
> >
> > #define BUF_SIZE 4096
>
> This presumes the device has a full PCIe config space, is the above
> sysfs file 4k in size?
>

I used this buffer size because it is what qemu was using. And it works
fine on Ubuntu.

> > int main(){
> >     char buf[BUF_SIZE], buf1[BUF_SIZE], buf2[BUF_SIZE];
> >
> >     int ret,group_fd, fd, fd2;
> >     size_t nbytes = BUF_SIZE;
> >     ssize_t bytes_read;
> >     int iommu1, iommu2;
> >     unsigned long offset;
> >     int container, group, device, i;
> >     struct vfio_group_status group_status = { .argsz = sizeof(group_status) };
> >     struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
> >     struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };
> >     struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
> >     struct vfio_region_info reg = { .argsz = sizeof(reg) };
> >
> >     container = open("/dev/vfio/vfio",O_RDWR);
> >     printf("Container = %d\n",container);
> >     if(ioctl(container,VFIO_GET_API_VERSION)!=VFIO_API_VERSION){
> >         printf("Unknown api version: %m\n");
> >     }
> >     group_fd = open("/dev/vfio/1",O_RDWR);
> >     printf("Group fd = %d\n", group_fd);
> >     ioctl(group_fd, VFIO_GROUP_GET_STATUS, &group_status);
> >     if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)){
> >         printf("Group not viable\n");
> >         getchar();
> >         return 1;
> >     }
> >     ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER,&container);
> >     ret = ioctl(container,VFIO_SET_IOMMU,VFIO_TYPE1_IOMMU);
> >
> >     ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);
> >
> >     /* Allocate some space and setup a DMA mapping */
> >     dma_map.vaddr = (unsigned long int) mmap(0, 1024 * 1024,
> >             PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> >     dma_map.size = 1024 * 1024;
> >     dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
> >     dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> >
> >     ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
> >
> >     printf("\n\nGETTING DEVICE FD\n");
> >     fd = ioctl(group_fd,VFIO_GROUP_GET_DEVICE_FD,"0000:01:00.0");
> >
> >
> >     ioctl(fd, VFIO_DEVICE_GET_INFO, &device_info);
> >     for (i = 0; i < device_info.num_regions; i++) {
> >         reg.index = i;
> >
> >         ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> >
> >         /* Setup mappings... read/write offsets, mmaps
> >         * For PCI devices, config space is a region */
> >     }
> >
> >     for (i = 0; i < device_info.num_irqs; i++) {
> >         struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> >
> >         irq.index = i;
> >
> >         ioctl(fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
> >
> >     }
> >
> >
> >     reg.index = VFIO_PCI_CONFIG_REGION_INDEX;
> >
> >     printf("VFIO_DEVICE_GET_REGION_INFO = %lu",VFIO_DEVICE_GET_REGION_INFO);
> >     ret = ioctl(fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
> >
> >     offset = reg.offset;
> >     printf("offset is %lx\n",offset);
> >     /*ret = read(group_fd,buf,nbytes);
> >     printf("Read from group fd, ret is %d: %m\n",ret);
> >     printf("CONFIG SPACE: \n");
> >     printf("%s\n",buf);*/
> >     printf("Fd = %d\n",fd);
> >
> >     //printf("VFIO_GROUP_GET_DEV_ID = %lu\n",VFIO_GROUP_GET_DEVICE_FD);
> >     ret = read(fd,buf,nbytes);
>
> This reads from offset 0, which is BAR0, which is possibly not enabled
> since you haven't enabled I/O or MMIO access to the device in the PCI
> COMMAND register in config space.  Results here are going to depend on
> the state of the device as you receive it, and whether you can even
> read 4K from BAR0 space.
>

How do I enable that? XD

>
> >     printf("Ret from read is = %d, buf = %s\n",ret,buf);
> >     if(ret<1){
> >         printf("ERROR: %m \n");
> >     }
> >
> >     ret = pread(fd,buf,nbytes,offset);
>
> This one should actually read from config space.
>
> >     printf("Ret from pread is = %d\n",ret);
> >     if(ret<1){
> >         printf("ERROR: %m \n");
> >     }
>
> So this is where you get an ESPIPE error?  Do different sizes work?
> 256 bytes?  64 bytes?
>

No, return of pread is always -1 regardless of the buffer size =/ ...

>
> >     printf("TESTING PREAD ON A COMMON FILE\n");
> >     fd2 = open("/sys/bus/pci/devices/0000:01:00.0/device",O_RDONLY);
> >     printf("FD2 = %d\n",fd2);
> >     ret = read(fd2,buf1,nbytes);
> >     if(ret<0){
> >         printf("ERROR: %m\n");
> >     }
> >     printf("Result from read: ret = %d, content = %s\n",ret,buf1);
> >     ret = pread(fd2,buf2,nbytes,2);
> >     if(ret<0){
> >         printf("ERROR: %m\n");
> >     }
> >     printf("Result from pread: ret = %d, content = %s\n",ret,buf2);
>
> Did these work?
>

Yes, pread on 'normal' files is working without any problems.

>
> >     close(fd2);
> >     getchar();
> >     close(fd);
> >     close(container);
> >     close(group_fd);
> >     return 0;
> > }
> >
> >
> > Something weird I noticed that might be related to this is that on Ubuntu
> > the iommu groups for some devices are very different from the manually
> > compiled kernel. There are a few devices that on Ubuntu have a large
> > iommu_group while in the generic kernel the iommu group is composed of only
> > one device (and this is on the same machine btw!). Is this normal?
> > Another thing I tried was using 0 as offset to pread and this gives me the
> > same error even though a normal read works fine....
>
> The ubuntu kernel is older, perhaps it doesn't include quirks to enable
> ACS equivalent isolation on the PCH root ports.  That would explain the
> group differences.  Thanks,
>
> Alex
>

Please let me know if there is more information I can provide.
Thanks very much!

Cheers,

Ingrid