[vfio-users] An epic wall of text with questions and comments from a Xen user

Zir Blazer zir_blazer at hotmail.com
Tue Dec 1 16:13:49 UTC 2015


Hello. I'm replying to your mail with my comments:


> The UEFI GOP needs some minimal ROM capacity in order for the manufacturer to provide it for us. I found that also when I was browsing for the UEFI for my friend’s MSI GTX 560 Ti Lightning.

Based on random comments found via Google, I don't expect Sapphire to provide anything for a 5-year-old card. I have to deal with it myself.
Some of the UEFI GOP mods I saw earlier were for GPUs that had a working UEFI GOP implementation in at least one similar Video Card, as they used those ROMs as donors. In my case, the only possible donors for a Juniper may come from two Apple branded Video Cards. If I recall correctly, Apple's EFI implementation for those Mac-specific Video Cards was different from standard UEFI GOP, but I may be wrong. Regardless, the other issue is the Flash ROM size. At some point I found a guy trying to mod his Radeon 5770, and he couldn't do it successfully since adding the UEFI GOP to the VBIOS meant an extra 20 KiB or so, producing a roughly 140 KiB ROM binary that didn't fit in the Video Card's 128 KiB Flash ROM, or something along those lines. Basically, Junipers don't have the spare ROM capacity to fit a UEFI GOP.
Now that I think about it, it would be extremely interesting if I could sideload a bigger-than-possible ROM file using KVM, as that would work around the physical Flash ROM size limit. It would also make trying out VBIOS mods easier if I didn't have to flash the Video Card ROM every time I want to try something.
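If I understand the QEMU options correctly, the sideload would just be the romfile property of the vfio-pci device pointing at a file on disk instead of the card's own Flash ROM (the path and PCI address below are made up, not my actual card):

    # Load a (possibly modded, oversized) ROM image from disk for the passed-through GPU
    qemu-system-x86_64 -enable-kvm ... \
        -device vfio-pci,host=01:00.0,romfile=/home/zir/vbios/juniper_gop_mod.rom

If that works the way I expect, the 128 KiB physical Flash ROM limit would only matter for native boots, not for the VM.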


> I’m interested in trying out Gentoo Linux because of this [2]. Arch Linux is great but we need to update it really often. But since I don’t have any significant issues, I’m still holding up with Arch Linux. Probably I’ll try to familiarize myself with Gentoo in my Atom machine.

You aren't forced to update AL; it can work perfectly fine indefinitely for as long as you don't update. The issues usually appear after doing so, and they get worse if it's after an extended time gap.
From what I read of your link, one of Gentoo's advantages is that a full OS update doesn't stall even if there is a huge time gap. Some of the other mentioned advantages are there because the author doesn't seem to be aware of the Arch Build System (anything that relies on a PKGBUILD), as you can easily recompile packages with it to enable missing options or use host CPU optimizations (there is a small sketch of that after this paragraph). The rest of the advantages lie in the features and versatility supported by their Package Manager, Portage, versus AL's pacman. Check here:
http://blog.srvthe.net/archlinux-vs-gentoo/
...this is where Gentoo seems to crush AL. Not to mention that proper support for multiple versions of libraries, applications or the Kernel could be extremely useful at times.
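To illustrate what I mean with the Arch Build System (the package directory and flags below are only an example, not something I claim works for every package):

    # /etc/makepkg.conf: build for the host CPU instead of generic x86_64
    CFLAGS="-march=native -O2 -pipe"
    CXXFLAGS="${CFLAGS}"

    # In a directory containing the package's PKGBUILD (from the ABS tree or the AUR),
    # rebuild it with those flags and install the result
    cd ~/build/some-package
    makepkg -si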


> I have 2 monitors, both of them connect to both GTX 980 (DisplayPort + DVI) and Intel HD 4600 (DVI + HDMI). At boot time, both monitors display my Linux host desktop. When I start the guest, I issue a xrandr command inside my QEMU script [4] to disable the DVI of Intel HD 4600 and enable GTX 980 DisplayPort, thus shrinking the Linux desktop to about half of the original screen real estate, and enabling my Windows 10 guest to have a dedicated monitor. And when I’ve done playing games with it and want to shutdown the guest, I put another xrandr line to restore the display to its original state, making the Linux host occupy both monitors again.

I never thought about that. I forgot that you can tell a Monitor with multiple inputs which source you want it to display, and switch between them. So far, my 19" one only has a single input, and the 23" one has a combined VGA/DVI input plus a separate HDMI input, but sadly I don't have a cable to test with. This could mean that, at most, I could use the Linux host with Dual Monitor, with the 23" one switching inputs between the main VM and the host running a VM with an emulated GPU (still a pain in the butt to Alt + Ctrl + Fx, but at least I would have tons of screen surface).
The syntax of your file doesn't look overly complex compared to a Xen XL DomU config file. I suppose that after you execute the script, it doesn't finish until you close your main VM, so it can execute the last xrandr, correct? That means the Terminal where it is executed must remain open, so it is best to open a Terminal exclusively to launch each VM?
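Just so I picture it right, I imagine the overall shape of such a script is something like this (output names and QEMU options are placeholders, not your actual ones):

    #!/bin/sh
    # Shrink the host desktop to one Monitor and hand the other one over to the guest GPU
    xrandr --output HDMI-1 --off

    # QEMU blocks here until the guest shuts down
    qemu-system-x86_64 -enable-kvm -device vfio-pci,host=01:00.0 ...

    # Guest is gone, give the Monitor back to the host
    xrandr --output HDMI-1 --auto --right-of DP-1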


> I have 2 sets of keyboard+mouse, one set for the host and the other better set connected to a USB switch [3] which I can switch either connect to the host or to the guest with a button press. If I want to access the hardware features of my mouse and keyboard, I can just press the switch to direct the USB connection to the guest’s USB card. But I also setup Synergy on both guest and host, so if I just want to do daily update etc, I don’t have to press the switch. The Synergy performance was laggy (I actually one of the first early adopters and lucky to have Synergy Pro for just US$1.00), but these days the performance is very good. I tried using the Synergy mouse and keyboard to test out one of Call of Duty games and the response is good.

The problem with a KVM-style switch is that I don't have the physical desktop space to add another keyboard and mouse pair (I already feel it is subpar for Dual Monitor), besides I would have to figure out how to get such a device (I doubt that any typical vendor stocks these here; I would have to import it like I did with the Xeon and the Supermicro Motherboard).
I'm also trying to avoid Synergy since I don't want to add third party applications for something that I believe should be possible to manage from the host the way I proposed originally, but I suppose I should reconsider, as many people are happy using Synergy.


> This was my idea also when I use Xen VGA passthrough. But personally I kinda dislike the idea, since if I need ZFS storage I have to setup and run another dedicated guest for service storage (just like All-in-One concept in VMware ESXi platform). Instead of throwing the memory for running a dedicated storage, I went to run the host as both Hypervisor and ZFS storage. This way I can allocate the memory for ZFS operation, I can store all my guest image on the ZFS, guest which needs ISO images, installers, etc can access my file server on the ZFS, etc.

I agree that so far using the host as a File Server seems like the better idea, unless you're running a VM with a specific distribution like unRAID or something like that. Also, all your comments seem to point to ZFS being better, but not that great for a single disk. I suppose that I may initially leave the HD as it is while I experiment with how to set up the File Server. Same with using the host as the server instead of an extra VM.
Another thing I was reading about when researching File Servers was that you could use either the SMB or NFS protocols. SMB is Windows native, Linux compatibility is good enough with Samba, and it is the standard solution. NFS is Linux native and has better throughput, lower CPU overhead and Multithreading support, but NFS client support on Windows is all over the place, with some Windows versions having it built in and others requiring obscure optional components or third party applications.
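For scale, sharing a directory from the host over SMB for the Windows guest should be little more than this (share name, path and user are made up):

    # /etc/samba/smb.conf - hypothetical share of the big HD
    [storage]
        path = /mnt/storage
        read only = no
        valid users = zir

    # Then create the Samba user (an existing Unix user) and start the services (smb/nmb on Arch)
    smbpasswd -a zir
    systemctl start smb nmb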


> Hmm, a single disk to use as host OS and multiple guests? Sounds like possible I/O hunger for me.

Disks cost money :) I purchased a single 4 TB HD since I got it extremely cheap compared to 2 * 2 TB ones. I was also lucky to get my hands on a SSD. With a SSD I can take advantage of much lower latency and vastly increased IOPS; for starting Windows 10 and some games it will be HUGELY noticeable.
So far, I don't feel more I/O bottlenecked by a single HD than I would be in any native system, even if I'm losing some performance due to overhead. My other VMs are usually Idle so they don't use the HD at all; it is mainly the RAM reserved for them that is wasted.

A relevant thing that I have noticed is that something is always accessing the HD even when my main VM is Idle, so I hear it working occasionally even from the bed. As far as I was able to figure out with the Windows 10 Resource Monitor, Chrome and Skype are disk thrashers, since they seem to use the HD for cache very often (at least once per minute) even when minimized and not in use. I tried closing them and disk activity dropped to pretty much zero, as expected. I suppose these are the sort of applications that could benefit from a RAMDisk and figuring out how to force their caches there.
On a SSD I wouldn't notice that anyway since there would be no noise from the activity, and the wear would be minimal as it seems that they just like to read/write a few KiB. It is nothing heavy; it is just annoying how those stupid minimal reads and writes have to wake up the HD.
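If I ever try it, inside the Windows guest I imagine it going along these lines (ImDisk is just one of several RAM disk tools, and the size, drive letter and cache size are arbitrary):

    rem Create a 512 MiB RAM disk as R: with ImDisk (third party tool)
    imdisk -a -s 512M -m R: -p "/fs:ntfs /q /y"

    rem Launch Chrome with its disk cache redirected to the RAM disk
    chrome.exe --disk-cache-dir=R:\ChromeCache --disk-cache-size=268435456

Whether Skype's cache can be redirected as cleanly, I have no idea.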


> I haven’t tried myself putting my guests on a ZVOL (ZFS Volume), but I think it’s simpler to migrate if I use ZFS dataset right now. I’m intrigued to try ZVOL, but haven’t got the chance to do it.

I don't know how ZFS behaves, but as far as I remember, a ZVOL is similar to an LVM volume. There is a HUGE performance difference between LVM and file-backed VMs with small files; I'm not sure how much of that changes if the file-backed VM sits on a ZFS File System instead of EXT4.
Your main VM sits on both a file-backed disk for the OS itself and the two dedicated HDs; maybe you would notice the performance difference in boot times by migrating from file-backed to a volume. However, on SSDs, maybe the difference isn't as ridiculously noticeable as on HDs.
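If you ever get the chance, my understanding is that the ZVOL route would look roughly like this (pool and volume names are made up):

    # Create a 100 GiB ZVOL on the existing pool
    zfs create -V 100G tank/win10-boot

    # Hand it to QEMU as a raw block device instead of a disk image file
    qemu-system-x86_64 ... \
        -drive file=/dev/zvol/tank/win10-boot,format=raw,if=virtio,cache=none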


> My setup is as follows, I have my Arch Linux host OS as the hypervisor on an Intel 530 120 GB (EXT4), 500 GB Hitachi for Steam/Origin/Uplay/GOG games (NTFS), 2 TB Hitachi for redownloadable contents (such as my iTunes music, Steam/Origin/Uplay/GOG backups, my game recordings, etc.), 160 GB Hitachi for my guest OS clone (in case I want to boot natively), and 8 x 1 TB WD Caviar Blue configured as a single ZFS pool using striped of 2-way-mirror (RAID10) configuration, thus I have the benefit of up to 8 times read performance and up to 4 times write performance of a single WD Caviar Blue.

Do you have the Intel 530 and the Hitachi HDs each formatted as a single, big partition?
Wouldn't it be better if you switched to 2/4 high density HDs in RAID 1/10 instead of the 8 WDs? That many HDs means a lot of power consumption, noise and vibration, plus increased chances that one of them dies (more HDs, more chances for one of them to fail). Unless you do something that benefits from sustained sequential read or write performance and needs a ton of capacity (for example, high definition video recording), HDs are no match for a single SSD simply because of access latency; the SSD will always feel faster - and NVMe SSDs are setting the bar much higher, though those are vastly more expensive.
Assuming money is no issue, I would keep HDs as slow storage with redundancy and not care that much about performance, just reliability. If I required performance, I would copy the necessary stuff to the SSD. HDs aren't meant to compete with SSDs in that area.


> I think you have to allocate at least a dedicated core for the Dom0 itself to use it exclusively. So probably you can try Dom0 1 physical core, main VM 2 physical cores, and your family’s VM 1 physical core. Though I imagine it would still hurt since your family’s VM ran a CPU-intensive application.

This is the situation that I want to avoid: having to reserve exclusive CPU resources that may, or may not, be used. Actually, the idea of that arrangement is that there is never any scheduler conflict between the host or any of the VMs, and while I didn't test it, I believe that with it you may be able to put one VM under heavy CPU load without any impact on the host or the other VM. However, it would mean lots of reboots to add/remove resources. At least using the host as a File Server could mitigate that...
I was expecting that this could be achieved by doing something like setting priorities in the host CPU scheduler, so that if I'm actively using my main VM, the other VMs get a lower priority and just use the spare cycles (which may mean a lot of lag in them, but that shouldn't be noticeable at all if you're just doing a render; it just means more time, no change to the user experience). This way my main VM runs "real time" and the secondary VMs get the job done on the idle cycles. It's conceptually similar to using the Windows XP Task Manager during the Single Core age to set a lower Priority for CPU intensive Threads, so you could achieve better Multitasking even under load. I suppose that doing it this way MAY work; it's sad that I never learned how to use the Xen CPU scheduler to see if it would fix my issues.
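Since with KVM each guest is just a host process, I imagine a crude version of this is possible with the regular Linux scheduling tools (untested on my side; the idle policy and the script name are only an example):

    # Start the secondary VM so it only gets CPU time nothing else wants
    chrt --idle 0 qemu-system-x86_64 -enable-kvm -smp 4 ... &

    # Or, less aggressively, just lower its niceness
    nice -n 19 ./start-render-vm.sh &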

In some other mails on this Mailing List, I see mentions of CPU pinning, priority and DPC Latency, so it would be extremely useful to figure out how to get both the least possible overhead and still have enough CPU sharing capabilities to stay flexible.
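For pinning specifically, my understanding is that on the KVM side it comes down to something like this (core numbers and the thread ID are arbitrary):

    # Restrict the whole QEMU process (and thus its vCPU threads) to physical cores 2-3
    taskset -c 2,3 qemu-system-x86_64 -enable-kvm -smp 2 ...

    # Individual vCPU threads can also be pinned later, using the thread IDs
    # reported by the QEMU monitor's "info cpus" command
    taskset -p -c 2 12345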


> I’ve managed to read your complete email. Hooray for me I guess. :)
> If I copy-paste them on a word processor, here are the statistics:

Congrats. I was expecting at least one or two more replies, including one from AW himself, since he may be the only one who can actually say whether the VGA Arbitration issue can be worked around the Xen way (not using VGA, since you wait for the OS to load the Drivers instead of the VBIOS initializing the GPU), or with the sideload workaround for a UEFI GOP mod that I mentioned in this mail.

To be honest, I don't know if it's better to drop one huge wall of text or several smaller ones. The problem is that I usually write about lots of things simultaneously, so I would regardless drop like 4-8 mails the same day. I would pretty much spam your mailbox if I did it that way, so I prefer the single, big one. Also, I prefer the Forum style to Mailing Lists by a LONG shot. Too many issues regarding HTML/text format, line wrap, not being able to preview how it would look in the archive, or edit after sending.



