<div dir="ltr"><div>Sorry, here is the updated information.</div><div><br></div><div><br></div><div>I installed CentOS 7.3 on my 8-GPU machine yesterday, and passthrough succeeded: the VM guest OS can use the GPU with no problem. So I think this is a software problem, and I need to apply some patch.</div><div><br></div><div>I also ran a test on my 4-GPU machine without any software change, and it succeeded. Those 4 GPUs are attached at the PCI root without a PCIe switch, so I think the software problem is related to the PCIe switch.</div><div><br></div><div>Thank you, Alex.</div><div><div><br></div><div>[root@64 /data]# lspci|grep NV</div><div>04:00.0 3D controller: NVIDIA Corporation Device 17fd (rev a1)</div><div>05:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)</div><div>08:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)</div><div>09:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)</div><div>85:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)</div><div>86:00.0 3D controller: NVIDIA Corporation Device 17fd (rev a1)</div><div>89:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)</div><div>8a:00.0 3D controller: NVIDIA Corporation Device 1b38 (rev a1)</div></div><div><br></div><div><br></div><div><div>[root@64 /data]# lspci -t</div><div>-+-[0000:ff]-+-08.0</div><div> | +-08.2</div><div> | +-08.3</div><div> | +-09.0</div><div> | +-09.2</div><div> | +-09.3</div><div> | +-0b.0</div><div> | +-0b.1</div><div> | +-0b.2</div><div> | +-0b.3</div><div> | +-0c.0</div><div> | +-0c.1</div><div> | +-0c.2</div><div> | +-0c.3</div><div> | +-0c.4</div><div> | +-0c.5</div><div> | +-0c.6</div><div> | +-0c.7</div><div> | +-0d.0</div><div> | +-0d.1</div><div> | +-0d.2</div><div> | +-0d.3</div><div> | +-0d.4</div><div> | +-0d.5</div><div> | +-0f.0</div><div> | +-0f.1</div><div> | +-0f.2</div><div> | +-0f.3</div><div> | +-0f.4</div><div> | +-0f.5</div><div> | +-0f.6</div><div> | +-10.0</div><div> | +-10.1</div><div> | 
+-10.5</div><div> | +-10.6</div><div> | +-10.7</div><div> | +-12.0</div><div> | +-12.1</div><div> | +-12.4</div><div> | +-12.5</div><div> | +-13.0</div><div> | +-13.1</div><div> | +-13.2</div><div> | +-13.3</div><div> | +-13.6</div><div> | +-13.7</div><div> | +-14.0</div><div> | +-14.1</div><div> | +-14.2</div><div> | +-14.3</div><div> | +-14.4</div><div> | +-14.5</div><div> | +-14.6</div><div> | +-14.7</div><div> | +-16.0</div><div> | +-16.1</div><div> | +-16.2</div><div> | +-16.3</div><div> | +-16.6</div><div> | +-16.7</div><div> | +-17.0</div><div> | +-17.1</div><div> | +-17.2</div><div> | +-17.3</div><div> | +-17.4</div><div> | +-17.5</div><div> | +-17.6</div><div> | +-17.7</div><div> | +-1e.0</div><div> | +-1e.1</div><div> | +-1e.2</div><div> | +-1e.3</div><div> | +-1e.4</div><div> | +-1f.0</div><div> | \-1f.2</div><div> +-[0000:80]-+-00.0-[81]--+-00.0</div><div> | | \-00.1</div><div> | +-01.0-[82]----00.0</div><div> | +-02.0-[83-86]----00.0-[84-86]--+-08.0-[85]----00.0</div><div> | | \-10.0-[86]----00.0</div><div> | +-03.0-[87-8a]----00.0-[88-8a]--+-08.0-[89]----00.0</div><div> | | \-10.0-[8a]----00.0</div><div> | +-04.0</div><div> | +-04.1</div><div> | +-04.2</div><div> | +-04.3</div><div> | +-04.4</div><div> | +-04.5</div><div> | +-04.6</div><div> | +-04.7</div><div> | +-05.0</div><div> | +-05.1</div><div> | +-05.2</div><div> | \-05.4</div><div> +-[0000:7f]-+-08.0</div><div> | +-08.2</div><div> | +-08.3</div><div> | +-09.0</div><div> | +-09.2</div><div> | +-09.3</div><div> | +-0b.0</div><div> | +-0b.1</div><div> | +-0b.2</div><div> | +-0b.3</div><div> | +-0c.0</div><div> | +-0c.1</div><div> | +-0c.2</div><div> | +-0c.3</div><div> | +-0c.4</div><div> | +-0c.5</div><div> | +-0c.6</div><div> | +-0c.7</div><div> | +-0d.0</div><div> | +-0d.1</div><div> | +-0d.2</div><div> | +-0d.3</div><div> | +-0d.4</div><div> | +-0d.5</div><div> | +-0f.0</div><div> | +-0f.1</div><div> | +-0f.2</div><div> | +-0f.3</div><div> | +-0f.4</div><div> | +-0f.5</div><div> | 
+-0f.6</div><div> | +-10.0</div><div> | +-10.1</div><div> | +-10.5</div><div> | +-10.6</div><div> | +-10.7</div><div> | +-12.0</div><div> | +-12.1</div><div> | +-12.4</div><div> | +-12.5</div><div> | +-13.0</div><div> | +-13.1</div><div> | +-13.2</div><div> | +-13.3</div><div> | +-13.6</div><div> | +-13.7</div><div> | +-14.0</div><div> | +-14.1</div><div> | +-14.2</div><div> | +-14.3</div><div> | +-14.4</div><div> | +-14.5</div><div> | +-14.6</div><div> | +-14.7</div><div> | +-16.0</div><div> | +-16.1</div><div> | +-16.2</div><div> | +-16.3</div><div> | +-16.6</div><div> | +-16.7</div><div> | +-17.0</div><div> | +-17.1</div><div> | +-17.2</div><div> | +-17.3</div><div> | +-17.4</div><div> | +-17.5</div><div> | +-17.6</div><div> | +-17.7</div><div> | +-1e.0</div><div> | +-1e.1</div><div> | +-1e.2</div><div> | +-1e.3</div><div> | +-1e.4</div><div> | +-1f.0</div><div> | \-1f.2</div><div> \-[0000:00]-+-00.0</div><div> +-01.0-[01]--</div><div> +-02.0-[02-05]----00.0-[03-05]--+-08.0-[04]----00.0</div><div> | \-10.0-[05]----00.0</div><div> +-03.0-[06-09]----00.0-[07-09]--+-08.0-[08]----00.0</div><div> | \-10.0-[09]----00.0</div><div> +-04.0</div><div> +-04.1</div><div> +-04.2</div><div> +-04.3</div><div> +-04.4</div><div> +-04.5</div><div> +-04.6</div><div> +-04.7</div><div> +-05.0</div><div> +-05.1</div><div> +-05.2</div><div> +-05.4</div><div> +-11.0</div><div> +-11.4</div><div> +-14.0</div><div> +-16.0</div><div> +-16.1</div><div> +-1a.0</div><div> +-1c.0-[0a]--</div><div> +-1c.7-[0b-0c]----00.0-[0c]----00.0</div><div> +-1d.0</div><div> +-1f.0</div><div> +-1f.2</div></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">and the xml </div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra"><domain type='kvm'></div><div class="gmail_extra"> <name>win</name></div><div class="gmail_extra"> <uuid>a2021423-89d8-4a33-aaa5-07102ae7ad4e</uuid></div><div class="gmail_extra"> <memory 
unit='KiB'>8388608</memory></div><div class="gmail_extra"> <currentMemory unit='KiB'>8388608</currentMemory></div><div class="gmail_extra"> <vcpu placement='static' cpuset='0-8'>8</vcpu></div><div class="gmail_extra"> <sysinfo type='smbios'></div><div class="gmail_extra"> <system></div><div class="gmail_extra"> <entry name='serial'>21aa32e5-8233-40d4-b323-128824f6becf</entry></div><div class="gmail_extra"> <entry name='uuid'>a2021423-89d8-4a33-aaa5-07102ae7ad4e</entry></div><div class="gmail_extra"> </system></div><div class="gmail_extra"> </sysinfo></div><div class="gmail_extra"> <os></div><div class="gmail_extra"> <type arch='x86_64' machine='pc'>hvm</type></div><div class="gmail_extra"> <boot dev='hd'/></div><div class="gmail_extra"> <smbios mode='sysinfo'/></div><div class="gmail_extra"> </os></div><div class="gmail_extra"> <features></div><div class="gmail_extra"> <acpi/></div><div class="gmail_extra"> <apic/></div><div class="gmail_extra"> <pae/></div><div class="gmail_extra"> <hap/></div><div class="gmail_extra"> <hyperv></div><div class="gmail_extra"> <relaxed state='on'/></div><div class="gmail_extra"> </hyperv></div><div class="gmail_extra"> </features></div><div class="gmail_extra"> <cpu></div><div class="gmail_extra"> <topology sockets='2' cores='6' threads='2'/></div><div class="gmail_extra"> </cpu></div><div class="gmail_extra"> <clock offset='localtime'></div><div class="gmail_extra"> <timer name='pit' tickpolicy='delay'/></div><div class="gmail_extra"> <timer name='rtc' tickpolicy='catchup' track='guest'/></div><div class="gmail_extra"> <timer name='hpet' present='no'/></div><div class="gmail_extra"> </clock></div><div class="gmail_extra"> <on_poweroff>destroy</on_poweroff></div><div class="gmail_extra"> <on_reboot>restart</on_reboot></div><div class="gmail_extra"> <on_crash>restart</on_crash></div><div class="gmail_extra"> <devices></div><div class="gmail_extra"> <emulator>/usr/local/bin/qemu-system-x86_64</emulator></div><div class="gmail_extra"> 
<disk type='file' device='disk'></div><div class="gmail_extra"> <driver name='qemu' type='qcow2' cache='none'/></div><div class="gmail_extra"> <source file='/data/win.qcow2'/></div><div class="gmail_extra"> <target dev='vda' bus='virtio'/></div><div class="gmail_extra"> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/></div><div class="gmail_extra"> </disk></div><div class="gmail_extra"> <controller type='usb' index='0'></div><div class="gmail_extra"> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/></div><div class="gmail_extra"> </controller></div><div class="gmail_extra"> <serial type='pty'></div><div class="gmail_extra"> <target port='0'/></div><div class="gmail_extra"> </serial></div><div class="gmail_extra"> <console type='pty'></div><div class="gmail_extra"> <target type='serial' port='0'/></div><div class="gmail_extra"> </console></div><div class="gmail_extra"> <input type='tablet' bus='usb'/></div><div class="gmail_extra"> <input type='mouse' bus='ps2'/></div><div class="gmail_extra"> <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0' keymap='en-us'></div><div class="gmail_extra"> <listen type='address' address='0.0.0.0'/></div><div class="gmail_extra"> </graphics></div><div class="gmail_extra"> <video></div><div class="gmail_extra"> <model type='cirrus' vram='9216' heads='1'/></div><div class="gmail_extra"> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/></div><div class="gmail_extra"> </video></div><div class="gmail_extra"> <memballoon model='virtio'></div><div class="gmail_extra"> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/></div><div class="gmail_extra"> </memballoon></div><div class="gmail_extra"> <hostdev mode='subsystem' type='pci' managed='yes'></div><div class="gmail_extra"> <source></div><div class="gmail_extra"> <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/></div><div class="gmail_extra"> </source></div><div 
class="gmail_extra"> </hostdev></div><div class="gmail_extra"> </devices></div><div class="gmail_extra"></domain></div></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br><div class="gmail_quote">2017-03-09 21:49 GMT+08:00 Alex Williamson <span dir="ltr"><<a href="mailto:alex.williamson@redhat.com" target="_blank">alex.williamson@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Thu, 9 Mar 2017 11:47:32 +0800<br>
rhett rhett <<a href="mailto:rhett.kernel@gmail.com">rhett.kernel@gmail.com</a>> wrote:<br>
<br>
> Can somebody help me?<br>
<br>
</span>I asked for VM commandline or XML, you haven't provided it. I asked<br>
for lspci info, you haven't provided it. Help us help you.<br>
<div class="gmail-HOEnZb"><div class="gmail-h5"><br>
> 2017-03-08 14:34 GMT+08:00 rhett rhett <<a href="mailto:rhett.kernel@gmail.com">rhett.kernel@gmail.com</a>>:<br>
><br>
> > Here are some more error logs from the CentOS guest:<br>
> ><br>
> > Mar 7 05:38:07 localhost kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel<br>
> > Module 375.39 Tue Jan 31 20:47:00 PST 2017 (using threaded interrupts)<br>
> > Mar 7 05:38:08 localhost kernel: nvidia-modeset: Loading NVIDIA Kernel<br>
> > Mode Setting Driver for UNIX platforms 375.39 Tue Jan 31 19:41:48 PST 2017<br>
> > Mar 7 05:39:27 localhost kernel: NVRM: RmInitAdapter failed!<br>
> > (0x24:0x51:1060)<br>
> > Mar 7 05:39:27 localhost kernel: NVRM: rm_init_adapter failed for device<br>
> > bearing minor number 0<br>
> > Mar 7 05:43:40 localhost kernel: NVRM: RmInitAdapter failed!<br>
> > (0x24:0x51:1060)<br>
> > Mar 7 05:43:40 localhost kernel: NVRM: rm_init_adapter failed for device<br>
> > bearing minor number 0<br>
> > Mar 8 05:07:47 localhost kernel: nvidia: module license 'NVIDIA' taints<br>
> > kernel.<br>
> > Mar 8 05:07:47 localhost kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel<br>
> > Module<br>
> ><br>
> > 2017-03-08 14:31 GMT+08:00 rhett rhett <<a href="mailto:rhett.kernel@gmail.com">rhett.kernel@gmail.com</a>>:<br>
> ><br>
> >> I have two guests, a Windows Server 2008 and a CentOS 7.2. In Windows,<br>
> >> Device Manager says the GPU can't start, error code 10.<br>
> >> In CentOS, when I run nvidia-smi, it says no device was found.<br>
> >><br>
> >> There is no special VM configuration; with the same config, I can use the GPU<br>
> >> successfully in my two-GPU server. The biggest difference is that that server<br>
> >> has no PCIe switch.<br>
> >><br>
> >> 2017-03-08 11:55 GMT+08:00 Alex Williamson <<a href="mailto:alex.williamson@redhat.com">alex.williamson@redhat.com</a>>:<br>
> >><br>
> >>> On Wed, 8 Mar 2017 11:26:17 +0800<br>
> >>> rhett rhett <<a href="mailto:rhett.kernel@gmail.com">rhett.kernel@gmail.com</a>> wrote:<br>
> >>><br>
> >>> > The two GPUs share the same IRQ, and I found the reason: MSI is<br>
> >>> > disabled later, so IRQ 140 is reused.<br>
> >>> ><br>
> >>> > But I don't know why something calls vfio_pci_ioctl to disable MSI.<br>
> >>><br>
> >>> vfio just does what the guest requests, but you're really providing<br>
> >>> hardly any more information than when you asked off list. My wild<br>
> >>> guess is that maybe you're running a Windows guest and not configuring<br>
> >>> the VM for a vCPU type where Windows supports MSI. For more<br>
> >>> assistance, please provide basic information, like the QEMU command<br>
> >>> line or VM XML, also the PCI information from the host (sudo lspci<br>
> >>> -vvv), and of course any error codes in the guest or an actual<br>
> >>> description of how the device doesn't work in the guest. Thanks,<br>
> >>><br>
> >>> Alex<br>
> >>><br>
> >>><br>
> >>> > 2017-03-08 10:55 GMT+08:00 rhett rhett <<a href="mailto:rhett.kernel@gmail.com">rhett.kernel@gmail.com</a>>:<br>
> >>> ><br>
> >>> > > I have a question about vfio; here is my description.<br>
> >>> > ><br>
> >>> > > I have 8 GPUs in my server machine, but they are all behind a PCIe<br>
> >>> > > bridge. When I set up vfio passthrough, I can't use the GPUs in my<br>
> >>> guest<br>
> >>> > > OS.<br>
> >>> > > dmesg shows the following messages:<br>
> >>> > ><br>
> >>> > > [ 662.208072] vfio-pci 0000:87:00.0: irq 140 for MSI/MSI-X<br>
> >>> > > [ 725.761623] vfio-pci 0000:04:00.0: irq 140 for MSI/MSI-X<br>
> >>> > ><br>
> >>> > > I started two VMs, one using 87 and the other using 04; dmesg shows that<br>
> >>> they<br>
> >>> > > share the same IRQ 140. Is this normal?<br>
> >>> > ><br>
> >>> > > I also looked at the IOMMU groups: each GPU is in a separate group,<br>
> >>> with<br>
> >>> > > no other device in the group. Does this mean ACS works correctly?<br>
> >>> > ><br>
> >>> > > Hope to get your help!<br>
> >>> > ><br>
> >>><br>
> >>><br>
> >><br>
> ><br>
<br>
</div></div></blockquote></div><br></div></div>