[vfio-users] Cannot register eventfd with MSI/MSI-X interrupts

Alex Williamson alex.williamson at redhat.com
Mon Aug 1 16:45:05 UTC 2016


On Mon, 1 Aug 2016 16:18:20 +0000
chris thompson <the_cartographer at hotmail.com> wrote:

>  
> From: the_cartographer at hotmail.com
> To: alex.williamson at redhat.com
> Date: Mon, 1 Aug 2016 16:16:00 +0000
> CC: vfio-users at redhat.com
> Subject: Re: [vfio-users] Cannot register eventfd with MSI/MSI-X interrupts
> 
> 
> 
> 
>  
> > Date: Mon, 1 Aug 2016 07:06:18 -0600
> > From: alex.williamson at redhat.com
> > To: the_cartographer at hotmail.com
> > CC: vfio-users at redhat.com
> > Subject: Re: [vfio-users] Cannot register eventfd with MSI/MSI-X interrupts
> > 
> > On Mon, 1 Aug 2016 10:24:52 +0000
> > chris thompson <the_cartographer at hotmail.com> wrote:
> >   
> > >  Hi,
> > >  
> > > I'm trying to talk to my PCI-e card (USB3 host controller) from userland, and have been successful accessing the PCI Config registers and device memory, and able to register eventfd file descriptors for legacy interrupts (IRQx) and Error interrupts - including stimulating these via the ioctl calls. Unfortunately the MSI and MSI-X interrupts return EINVAL (invalid argument) when trying to register them.
> > >  
> > > Sample from the code:
> > > printf("\nInterrupts:\n");
> > > int * irq_fds[VFIO_PCI_NUM_IRQS];
> > > int num_irqs[VFIO_PCI_NUM_IRQS];
> > > for (i = 0; i < device_info.num_irqs; i++) {
> > >   struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> > >   irq.index = i;
> > >   ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);
> > >   printf("argsz 0x%x, flags 0x%x, index 0x%x, count 0x%x,\n", irq.argsz, irq.flags, irq.index, irq.count);
> > >   num_irqs[i] = irq.count;
> > >   irq_fds[i]=(int*)malloc(irq.count * sizeof(int));
> > >   int j;
> > >   for(j=0; j<irq.count; j++)
> > >   {
> > >     irq_fds[i][j] = eventfd(0, 0);
> > >     printf("config IRQ %u : %u\n", i, j);
> > >     struct vfio_irq_set *irq_setup;
> > >     unsigned int size = sizeof(struct vfio_irq_set) + sizeof(int);
> > >     irq_setup = malloc(size);
> > >     irq_setup->argsz = size;
> > >     irq_setup->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER ;
> > >     irq_setup->index = i;
> > >     irq_setup->start = j;
> > >     irq_setup->count = 1;
> > >     *(int*)(irq_setup->data)=irq_fds[i][j];
> > >     int ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_setup);
> > >     if(ret){ printf("failed to register interrupt %u : %u, error %i %s\n", i, j, ret, strerror(errno)); }
> > >     free(irq_setup);
> > >   }
> > > }
> > >  
> > > Which gives the output:
> > >  
> > > Interrupts:
> > > argsz 0x10, flags 0x7, index 0x0, count 0x1,
> > > config IRQ 0 : 0
> > > argsz 0x10, flags 0x9, index 0x1, count 0x40,
> > > config IRQ 1 : 0
> > > failed to register interrupt 1 : 0, error -1 Invalid argument
> > > config IRQ 1 : 1
> > > failed to register interrupt 1 : 1, error -1 Invalid argument
> > > config IRQ 1 : 2
> > > ...
> > > 
> > > Does anyone know a likely reason that I can't register these? From what I've read my IOMMUs all support IRQ remapping:
> > > [    0.016443] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020e60262 ecap f0101a
> > > [    0.016450] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
> > > 
> > > The hardware is an Intel i5 2500 in an HP 8200 Elite small form-factor PC, running Ubuntu 14.04 on the 3.13 kernel.  
> > 
> > 
> > include/uapi/linux/vfio.h:
> >  * The NORESIZE flag indicates that the interrupt lines within the index
> >  * are setup as a set and new subindexes cannot be enabled without first
> >  * disabling the entire index.  This is used for interrupts like PCI MSI
> >  * and MSI-X where the driver may only use a subset of the available
> >  * indexes, but VFIO needs to enable a specific number of vectors
> >  * upfront.  In the case of MSI-X, where the user can enable MSI-X and
> >  * then add and unmask vectors, it's up to userspace to make the decision
> >  * whether to allocate the maximum supported number of vectors or tear
> >  * down setup and incrementally increase the vectors as each is enabled.
> > 
> > struct vfio_irq_info {
> >         __u32   argsz;
> >         __u32   flags;
> > #define VFIO_IRQ_INFO_EVENTFD           (1 << 0)
> > #define VFIO_IRQ_INFO_MASKABLE          (1 << 1)
> > #define VFIO_IRQ_INFO_AUTOMASKED        (1 << 2)
> > #define VFIO_IRQ_INFO_NORESIZE          (1 << 3)   <----
> >         __u32   index;          /* IRQ index */
> >         __u32   count;          /* Number of IRQs within this index */
> > };  
>  Hi Alex, Thanks for your suggestions, based on the same documentation I should be able to disable the interrupts for the set using:"... or the index can be disabled as a whole with: flags = (DATA_NONE|ACTION_TRIGGER), count = 0."and then bind them to eventfd as a group (I was originally trying to bind them as a group with the same results). However: printf("\nInterrupts:\n");
> int * irq_fds[VFIO_PCI_NUM_IRQS];
> int num_irqs[VFIO_PCI_NUM_IRQS];
> for (i = 0; i < device_info.num_irqs; i++) {
>     struct vfio_irq_info irq = { .argsz = sizeof(irq) };    irq.index = i;    ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>     printf("argsz 0x%x, flags 0x%x, index 0x%x, count 0x%x,\n", irq.argsz, irq.flags, irq.index, irq.count);    num_irqs[i] = irq.count;
>     irq_fds[i]=(int*)malloc(irq.count * sizeof(int));
>     int j;
>     for(j=0; j<irq.count; j++) irq_fds[i][j] = eventfd(0, 0);    printf("config IRQ set %u\n", i);
>     struct vfio_irq_set *irq_setup;
>     unsigned int size = sizeof(struct vfio_irq_set) + (irq.count*sizeof(int));
>     irq_setup = malloc(size);
>     irq_setup->argsz = size;
>     irq_setup->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER ;
>     irq_setup->index = i;
>     irq_setup->start = 0;
>     irq_setup->count = irq.count;
>     int ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_setup);

If this was an attempt to disable, then irq_setup->count should have
been set to 0 here.  This results in the -EINVAL return here.

>     if(ret){ printf("failed to unregister register interrupt set %u error %i %s\n", i, ret, strerror(errno)); }    irq_setup->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER ;
>     irq_setup->index = i;
>     irq_setup->start = 0;
>     irq_setup->count = irq.count;
>     int j;
>     for(j=0; j<irq.count; j++) ((int*)(irq_setup->data))[j]=irq_fds[i][j];    ret = ioctl(device, VFIO_DEVICE_SET_IRQS, irq_setup);
>     if(ret){ printf("failed to register interrupts %u : %u-%u, error %i %s\n", i, 0, irq.count-1, ret, strerror(errno)); }
>     free(irq_setup);
> } 
> Results in:
>  
> Interrupts:
> argsz 0x10, flags 0x7, index 0x0, count 0x1,
> config IRQ 0 : 0
> failed to unregister register interrupt set 0 error -1 Invalid argument

Here the first ioctl failed because count != 0, the second ioctl
worked, so the device is running in INTx mode.
 
> argsz 0x10, flags 0x9, index 0x1, count 0x40,
> config IRQ 1 : 0
> failed to unregister register interrupt set 1 error -1 Invalid argument

This fails because count != 0

> failed to register interrupts 1 : 0-63, error -1 Invalid argument

This fails because the device is running in INTx mode and multiple
interrupt modes cannot be simultaneously enabled.

> argsz 0x10, flags 0x9, index 0x2, count 0x8,
> config IRQ 2 : 0
> failed to unregister register interrupt set 2 error -1 Invalid argument
> failed to register interrupts 2 : 0-7, error -1 Invalid argument

Same as the MSI index.
 
> argsz 0x10, flags 0x9, index 0x3, count 0x1,
> config IRQ 3 : 0
> Killed
>  
> This Killed at the IOCtl to the error IRQ line is concerning, and a second attempt results in:
>  
> Interrupts:
> argsz 0x10, flags 0x7, index 0x0, count 0x1,
> config IRQ 0 : 0
> 
>      
> then the process hangs in the unregister ioctl. If I force kill the process then I can't open the device again without rebooting my machine.
>  
> Any ideas? I've probably done something obviously wrong.
>  
> Thanks and regards,
> Chris
> P.S. Missed something important from dmesg too:  
> [ 1334.286951] VFIO - User Level meta-driver version: 0.3
> [ 1383.701974] xhci_hcd 0000:02:00.0: remove, state 4
> [ 1383.701984] usb usb4: USB disconnect, device number 1
> [ 1383.705976] xhci_hcd 0000:02:00.0: USB bus 4 deregistered
> [ 1383.705984] xhci_hcd 0000:02:00.0: remove, state 4
> [ 1383.705989] usb usb3: USB disconnect, device number 1
> [ 1383.712050] xhci_hcd 0000:02:00.0: USB bus 3 deregistered
> [ 1498.085560] vfio_pin_pages: RLIMIT_MEMLOCK (65536) exceeded
> [ 1499.112900] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [ 1499.112954] IP: [<ffffffffa071b6f9>] vfio_pci_set_err_trigger+0x19/0x130 [vfio_pci]
> [ 1499.113000] PGD 362e7067 PUD 35172067 PMD 0
> [ 1499.113028] Oops: 0000 [#1] SMP
> [ 1499.113049] Modules linked in: vfio_iommu_type1 vfio_pci vfio nfsv3 nfsv4 autofs4 cuse dm_crypt bnep rfcomm bluetooth gpio_ich hp_wmi sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek kvm_intel kvm crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec snd_hwdep aesni_intel snd_pcm aes_x86_64 lrw snd_page_alloc gf128mul glue_helper ablk_helper snd_seq_midi cryptd snd_seq_midi_event snd_rawmidi snd_seq serio_raw snd_seq_device snd_timer lpc_ich shpchp snd nfsd tpm_infineon wmi mac_hid auth_rpcgss mei_me nfs_acl mei soundcore parport_pc nfs ppdev lockd sunrpc lp fscache parport binfmt_misc btrfs xor raid6_pq libcrc32c i915 video i2c_algo_bit e1000e drm_kms_helper psmouse ahci drm libahci ptp pps_core
> [ 1499.113526] CPU: 3 PID: 2229 Comm: vfio_test Not tainted 3.13.0-87-generic #133-Ubuntu
> [ 1499.113568] Hardware name: Hewlett-Packard HP Compaq 8200 Elite SFF PC/1495, BIOS J01 v02.15 11/10/2011
> [ 1499.113618] task: ffff880222964800 ti: ffff88003630c000 task.ti: ffff88003630c000
> [ 1499.113657] RIP: 0010:[<ffffffffa071b6f9>]  [<ffffffffa071b6f9>] vfio_pci_set_err_trigger+0x19/0x130 [vfio_pci]
> [ 1499.113713] RSP: 0018:ffff88003630ddd0  EFLAGS: 00010246
> [ 1499.113742] RAX: 00000000ffffffea RBX: ffff8800ca409800 RCX: 0000000000000001
> [ 1499.113779] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8800ca409800
> [ 1499.113816] RBP: ffff88003630dde8 R08: 0000000000000021 R09: 0000000000000000
> [ 1499.113854] R10: 0000000000000021 R11: ffffffffa071b6e0 R12: 0000000000000000
> [ 1499.113891] R13: ffff8800ca409858 R14: 00000000007a81d0 R15: 0000000000000000
> [ 1499.113929] FS:  00007f3f57f07740(0000) GS:ffff88022e380000(0000) knlGS:0000000000000000
> [ 1499.113971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1499.114002] CR2: 0000000000000000 CR3: 00000000353c2000 CR4: 00000000000407e0
> [ 1499.114039] Stack:
> [ 1499.114050]  ffff8800ca409800 0000000000000000 ffff8800ca409858 ffff88003630ddf8
> [ 1499.114093]  ffffffffa071c9ec ffff88003630deb0 ffffffffa071ae7d ffff88003630de98
> [ 1499.114135]  000000018145452f 0000000000000004 ffff88021f7fba10 0000000000000202
> [ 1499.114178] Call Trace:
> [ 1499.114196]  [<ffffffffa071c9ec>] vfio_pci_set_irqs_ioctl+0x8c/0xa0 [vfio_pci]
> [ 1499.114236]  [<ffffffffa071ae7d>] vfio_pci_ioctl+0x31d/0xa30 [vfio_pci]
> [ 1499.114277]  [<ffffffff810ad624>] ? __wake_up+0x44/0x50
> [ 1499.114308]  [<ffffffff812035e1>] ? fsnotify+0x241/0x320
> [ 1499.114340]  [<ffffffffa070e193>] vfio_device_fops_unl_ioctl+0x23/0x30 [vfio]
> [ 1499.114379]  [<ffffffff811d46e0>] do_vfs_ioctl+0x2e0/0x4c0
> [ 1499.114412]  [<ffffffff8172d854>] ? __schedule+0x384/0x7f0
> [ 1499.114442]  [<ffffffff811d4941>] SyS_ioctl+0x81/0xa0
> [ 1499.114472]  [<ffffffff8173a3dd>] system_call_fastpath+0x1a/0x1f
> [ 1499.114503] Code: 48 89 e5 e8 8a 21 af e0 b8 01 00 00 00 5d c3 0f 1f 00 66 66 66 66 90 55 83 fe 03 b8 ea ff ff ff 48 89 e5 41 55 41 54 53 48 89 fb <41> 8b 39 4c 8b 23 75 64 41 f6 c0 07 74 5e 41 f6 c0 01 75 63 41
> [ 1499.114673] RIP  [<ffffffffa071b6f9>] vfio_pci_set_err_trigger+0x19/0x130 [vfio_pci]
> [ 1499.114717]  RSP <ffff88003630ddd0>
> [ 1499.114735] CR2: 0000000000000000
> [ 1499.258683] ---[ end trace 978702c604a203ce ]---

This is of course a bug, users shouldn't be able to trigger this.  I'll
sort it out and post a patch.

So your code is useful for exposing this bug, but what are you actually
trying to accomplish?  PCI devices have three different types of
interrupts, INTx, MSI, and MSI-X.  They can't all be configured
simultaneously.  The error and request interrupts are not tied to a
physical interrupt, they're just communication channels to signal an
error and release the device respectively.  Your code is great as a
unit test, but not so much if you're actually trying to write a
userspace device driver.  Thanks,

Alex




More information about the vfio-users mailing list