[vfio-users] An epic wall of text with questions and comments from a Xen user

Okky Hendriansyah okky at nostratech.com
Fri Dec 4 02:34:37 UTC 2015


On December 1, 2015 at 23:20:12, Zir Blazer (zir_blazer at hotmail.com) wrote:
Hello. I'm replying to your mail with comments: 
Hi Zir,

...
Now that I think about it, it would be extremely interesting if I could sideload a bigger-than-possible ROM file using KVM, as that would work around the physical Flash ROM size limit. It would also make trying out VBIOS mods easier if I don't have to flash the Video Card ROM every time I want to try something. 
I just googled and found this thread on InsanelyMac [1]; some people there had success flashing their GT2xx and GT6xx based cards to UEFI. It would be interesting if we could get around those size boundaries through the romfile option.
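Something like this is roughly what I have in mind for sideloading it, using the romfile property on vfio-pci (the PCI address and ROM path are just placeholders):

    # Feed the guest a VBIOS from a file instead of the card's flash chip
    qemu-system-x86_64 ... \
        -device vfio-pci,host=01:00.0,romfile=/path/to/patched_vbios.rom

That way you could try a modified or oversized image without ever touching the physical flash.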

You aren't forced to update AL; it can work perfectly fine indefinitely for as long as you don't update. The issues usually appear after doing so, and worsen if it's after an extended time gap. 
Yes, I'm not forced; what I meant was that if we don't update often, there's a chance that the next pacman -Syu will break something.

From what I read of your link, one of Gentoo's advantages is that a full OS update doesn't stall even after a huge time gap. Some of the other mentioned advantages exist because the author doesn't seem to be aware of the Arch Build System (anything that relies on a PKGBUILD), since you can easily recompile packages with it to enable missing options or use host CPU optimizations. The rest of the advantages are in the features and versatility of their package manager, Portage, vs AL's pacman. Check here: 
http://blog.srvthe.net/archlinux-vs-gentoo/ 
I personally use ABS exclusively for my patched Linux kernels and QEMU. It works great and I can fine-tune the builds for my native processor features.
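For example, this is roughly what I tweak in /etc/makepkg.conf before rebuilding with ABS (the -j value is just a placeholder for your core count):

    # /etc/makepkg.conf -- build ABS packages against the host CPU
    CFLAGS="-march=native -O2 -pipe -fstack-protector-strong"
    CXXFLAGS="${CFLAGS}"
    MAKEFLAGS="-j8"   # match your core count

Then it's just the usual: copy the PKGBUILD out of the ABS tree (or grab it from the AUR), apply your patches, and makepkg -si.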

...this is where Gentoo seems to crush AL. Not to mention that proper support for multiple versions of libraries, applications or the Kernel could be extremely useful at times. 
Now, supporting multiple libraries, applications, and kernels is what I want. Currently, if I want to try out another kernel, I need to export the ZFS pool, uninstall the kernel modules, restart into the new kernel, compile and install the ZFS kernel modules, and finally import the ZFS pool again before starting any guests. If Gentoo can somehow support multiple ZFS versions, that would be interesting. DKMS would probably solve this for me in Arch Linux, but I'm not really a fan of it.
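To illustrate, the dance I mean looks roughly like this (the pool name and the package names are just examples; the exact PKGBUILDs vary by setup):

    zpool export tank                  # example pool name
    pacman -Rns zfs-git spl-git        # modules built for the old kernel (package names vary)
    reboot                             # into the new kernel
    makepkg -si                        # rebuild zfs/spl from their PKGBUILDs against it
    modprobe zfs && zpool import tank  # only now can the guests start again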

...
I never thought about that. I forgot that you can tell a Monitor with multiple inputs what source you want it to display, and switch between them. So far, my 19" only has a single input, and the 23" has a combined VGA/DVI input but also a separate HDMI input, but it's sad that I don't have a cable to test. This could mean that at most, I could use the Linux host as Dual Monitor, with the 23" one switching input between the main VM or the host with a VM with emulated GPU (still a pain in the butt to Alt + Ctrl + Fx, but at least I would have tons of screen surface). 
The syntax of your file doesn't look overly complex compared to a Xen XL DomU config file. I suppose that after you execute the script, it doesn't finish until you close your main VM, so it can execute the last xrandr, correct? That means that the Terminal where it is executed must remain open, so it's best to open a Terminal exclusively to launch each VM? 
Actually the original idea came from [2]; I got the xrandr idea from him. Yes, I leave the Terminal window open since I can issue QEMU monitor commands to it directly. In cases when the guest gets stuck, etc., I can just issue "quit" or "system_reset" on the monitor to shut down or reboot the guest. If you want to dismiss the Terminal window, I think you need to run the whole script in the background, instead of just -daemonize the QEMU guest, since we need to re-enable the left monitor AFTER the guest shuts down, not after the guest is daemonized. 
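A minimal sketch of what the flow looks like; the output names, PCI address, and paths are examples from my own notes, not something you can copy verbatim:

    #!/bin/bash
    # Free the monitor the guest will drive; the host desktop collapses
    # onto the remaining screen.
    xrandr --output HDMI-0 --off

    # Run the guest in the foreground with the QEMU monitor on stdio,
    # so "quit" / "system_reset" can be typed into this terminal.
    qemu-system-x86_64 \
        -enable-kvm -m 8G -cpu host -smp 4 \
        -device vfio-pci,host=01:00.0 \
        -drive file=/vm/windows.img,format=raw,if=virtio \
        -monitor stdio

    # Only reached after the guest shuts down: give the output back to the host.
    xrandr --output HDMI-0 --auto --right-of DVI-0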

The problem with a KVM-style switch is that I don't have physical desktop space to add another keyboard and mouse pair (I feel it's subpar for Dual Monitor already), besides that I would have to figure out how to get such a device (I doubt that any typical vendor stocks these here, I would have to import it like I did with the Xeon and Supermicro Motherboard). 
I'm also trying to avoid Synergy since I don't want to add third party applications for something that I believe should be possible to manage from the host the way I proposed originally, but I suppose I should reconsider as many people are happy using Synergy. 
You can try Gerd Hoffmann's patches for passing the mouse and keyboard through along with the GPU [3]. With those patches, you can switch the mouse and keyboard between the guest and the host by pressing Left Ctrl + Right Ctrl at the same time (currently it is hardcoded). Somehow I still find Synergy more convenient, you should try it, but the patches work great when we haven't configured networking and Synergy on the guest yet.

I agree that so far using the host as File Server seems like a better idea, unless you're using a VM with another specific distribution like unRAID or something like that. 
Actually, VFIO GPU passthrough came to me later, after I learned about ZFS. Like I said in my previous email, one of my objectives in doing VFIO GPU passthrough is that I want a solid gaming machine (almost always Windows) but I prefer to store all my data on ZFS, so VFIO and ZFS on Linux solve this for me. Since I already established a ZFS storage array for my data, I might as well utilize it for my gaming needs too (the Windows disk image).

Also, all your comments seem to point to ZFS being better, but also that it isn't that great for a single disk. 
ZFS on a single disk still provides benefits such as transparent on-disk compression, online checksumming (detection only, though; it cannot self-heal if all copies are corrupted), multiple copies of data (if parts of the disk surface go bad, there are still backup copies of the data elsewhere on the surface), snapshots, clones, and ZVOLs. 

But if you put multiple disks in your array, ZFS can also self-heal and improve I/O throughput (depending on whether your storage topology is a stripe, mirror, stripe of mirrors, RAIDZ{1,2,3}, or stripe of RAIDZ{1,2,3}). ZFS can also add dedicated write cache and read cache devices.
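For example, most of the single-disk benefits are just a property away (pool and dataset names here are examples):

    zpool create tank /dev/disk/by-id/ata-EXAMPLE   # single-disk pool
    zfs set compression=lz4 tank                    # transparent compression
    zfs set copies=2 tank/important                 # extra on-disk copies of the data
    zfs snapshot tank/important@before-update       # cheap point-in-time snapshot
    zfs create -V 100G tank/win10                   # a ZVOL usable as a guest disk

With more disks you would just create the pool as, say, "zpool create tank mirror diskA diskB" to get the self-healing as well.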

I suppose that I may initially leave the HD as it is while I experiment with how to set up the File Server. Same with using the host as server instead of the extra VM.
The file server is storage agnostic, so yes, you can experiment with all those file service protocol options without setting up a complex storage solution. What the file server sees is only directories and permissions.

Another thing I was reading about when researching File Servers was that you could use either the SMB or NFS protocols. SMB is Windows native, Linux compatibility is good enough with Samba, and it is the standard solution. NFS is Linux native and has better throughput, lower CPU overhead, and multithreading support, but NFS client support on Windows is all over the place, with some Windows versions having it built in and others requiring obscure optional components or third party applications. 
I think if you're sharing with Windows, it's best to use Windows shares (SMB via Samba) on your file server, since they are natively supported by Windows. Also, if you know the Windows Previous Versions feature, we can combine ZFS snapshots and Samba to provide Previous Versions on top of a ZFS share [4][5]. I haven't tried it myself though.
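From [4][5], the Samba side would look roughly like this; I haven't tested it, and the share path and snapshot naming scheme here are assumptions based on zfs-auto-snapshot's defaults:

    [data]
        path = /tank/data
        read only = no
        vfs objects = shadow_copy2
        shadow: snapdir = .zfs/snapshot
        shadow: sort = desc
        shadow: format = zfs-auto-snap_hourly-%Y-%m-%d-%H%M
        shadow: localtime = yes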

Disks cost money :) I purchased a single 4 TB HD since I got it extremely cheap compared to 2 * 2 TB. I was also lucky to get my hands on an SSD. With an SSD, I can take advantage of much lower latency and vastly increased IOPS. For starting Windows 10 and some games it will be HUGELY noticeable. 
So far, I don't feel more I/O bottlenecked by a single HD than I would be in any native system, even if I'm losing performance due to overhead. My other VMs are usually idle so they don't use the HD at all; it's mainly the RAM reserved for them that is wasted. 
I agree that disks cost money, but it depends on how much you value your data. A single 4 TB disk is cheaper than 4 x 1 TB disks, but the higher capacity does not decrease the error rate, and a ZFS pool scrub on a 1 TB disk is also faster than on a 4 TB one. I'm more afraid of a single 4 TB disk failing than of one of my 4 disks failing, since in the latter case I can just purchase a replacement and recover the pool.

It's true that an SSD wins almost every contest against platter disks, except for price/GB currently. I created the storage array to balance performance and capacity, plus I get the reliability and self-healing. Putting multiple VMs on an SSD should be OK, but the capacity will still be very small compared with a storage array.

A relevant thing that I have noticed is that something is always accessing the HD even when my main VM is idle, so I hear it working occasionally even from the bed. As far as I was able to figure out with Windows 10's Resource Monitor, Chrome and Skype are disk thrashers since they seem to use the HD for cache very often (at least once per minute) even when minimized and not in use. I tried closing them and disk activity is pretty much zero, as expected. I suppose these are the sort of applications that could benefit from a RAMDisk and figuring out how to force their caches there. 
You can configure how the guest I/O cache behaves on each virtual disk. Since I don't have a ZFS log device, I set the guest cache to follow ZFS's own disk flushes (by default I think that's every 5 seconds). With this configuration, whenever the guest wants to flush the disk, the virtual disk tells the guest it has already been flushed, while in reality the data is still waiting for ZFS to flush it in order. I think you can do this without ZFS too; it's a QEMU thing. You need to have a UPS though, in case there's a power failure.
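Concretely, it's just the cache= property on the virtual disk. I believe the behaviour I described matches cache=unsafe, which acknowledges guest flushes immediately and leaves the ordering to ZFS's transaction group commits (the file path here is an example):

    # Guest flushes are acknowledged right away; the data waits in the host
    # cache until ZFS commits its next transaction group (~5 s by default).
    # Only sane on a UPS-backed host.
    -drive file=/tank/vm/windows.img,format=raw,if=virtio,cache=unsafe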

On an SSD I wouldn't notice that anyway, since there would be no noise from activity, and the wear would be minimal as they seem to just read/write a few KiB. It's nothing heavy; it's just annoying how those tiny reads and writes have to wake up the HD. 
Some drives have these "green" features to sleep whenever they can; drives like the WD Green have this. Some drive firmwares allow disabling it.
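If the spin-up noise bothers you, something like this may help, though not every firmware honours it and the device name is a placeholder:

    hdparm -B 254 /dev/sdX   # raise the APM level so the drive parks/sleeps less aggressively
    hdparm -S 0   /dev/sdX   # disable the standby (spin-down) timer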

I don't know how ZFS behaves, but as far as I remember, a ZVOL is similar to LVM. There is a HUGE performance difference between LVM and file-backed VMs with small files; I'm not sure how much it would change if the file-backed VM sits on a ZFS file system instead of ext4. 
Your main VM sits on both a file-backed disk for the OS itself and the two dedicated HDs; maybe you would notice the performance difference in boot times by migrating from file-backed to a volume. However, on SSDs, maybe the difference isn't as ridiculously noticeable as on HDs. 
Hmm, I'll look into a performance comparison with a ZVOL if I have the chance. I'd probably need to create the ZVOL and image it with the raw disk image. Currently I'm more than pleased with my guest boot time. :)
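If I do try it, the migration would probably be something like this (sizes, block size, and names are placeholders):

    zfs create -V 120G -o volblocksize=8k tank/win10     # target ZVOL
    qemu-img convert -p -f raw -O raw /tank/vm/windows.img /dev/zvol/tank/win10
    # then point the guest's -drive at /dev/zvol/tank/win10 instead of the image file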

Do you have the Intel 530 and the Hitachi HDs formatted as a single, big partition? 
My Intel 530 is purely for the Arch Linux host. It consists of 2 partitions: the EFI /boot partition (512 MB, FAT32) and the / root partition (the rest of the disk, ext4). Since I want it to boot with UEFI, I need a separate /boot partition.
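For reference, recreating that layout is just something along these lines (the device name is a placeholder):

    sgdisk -n 1:0:+512M -t 1:ef00 -c 1:"EFI boot" /dev/sdX   # EFI system partition
    sgdisk -n 2:0:0     -t 2:8300 -c 2:"root"     /dev/sdX   # rest of the disk
    mkfs.fat -F32 /dev/sdX1
    mkfs.ext4 /dev/sdX2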

Wouldn't it be better if you switched to 2/4 high density HDs in RAID 1/10 instead of the 8 WDs? That many HDs would mean a lot of power consumption, noise, vibrations, and increased chances that one of them dies (more HDs, more chances for one of them to fail). Unless you do something that can benefit from sustained sequential read or write performance and need a ton of capacity (for example, high definition video recording), HDs are no match for a single SSD simply because of access latency; the SSD will always feel faster - and NVMe SSDs are setting the bar much higher, though those are vastly more expensive. 
My GPU's power consumption is more than all of my disks combined. :) So far I have no noise or vibration issues, at least not noticeable by me. ZFS loves many disks. More disks actually decreases the chance of losing data, since there are more copies to back each other up. A single 4 TB hard disk is more failure prone in my opinion; if that particular disk fails, all of the data is gone. I'm not looking for performance alone, but for the combination of performance and reliability with a good amount of capacity at a reasonable price.

Assuming money were no issue, I would have HDs as slow storage with redundancy and not care that much about performance, just reliability. If I required performance, I would copy the necessary stuff to the SSD. HDs aren't meant to compete with them in that area. 
Copy-pasting all over the place is a form of redundancy, but I feel a single pool of storage is more beneficial to me, as I don't have to manage all that copying and I get a single root directory for my data wherever the physical data is stored. If I need to expand the pool, I can just replace each of my 1 TB disks with a 2 TB one and the pool capacity expands automatically, without moving all my data around.
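The expansion I mean is just this, one disk at a time (pool and disk names are examples):

    zpool set autoexpand=on tank
    zpool replace tank ata-OLD-1TB ata-NEW-2TB   # wait for the resilver to finish
    zpool status tank                            # then repeat for each remaining disk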

...
To be honest, I don't know if it's better to drop one huge wall of text or several smaller ones. The problem is that I usually write about lots of things simultaneously, so I would regardless drop like 4-8 mails the same day. I would pretty much spam your mailbox if I did it that way, so I prefer the single, big one. Also, I prefer by a LONG shot the Forum style to Mailing Lists. Too many issues regarding HTML/text format, line wrap, not being able to preview how it would look on the archive, or edit after sending. 
I think if you want to put everything in one message, you need a TL;DR section at the beginning with your summary. But I think splitting it into several smaller mails is the better option.


[1] http://www.insanelymac.com/forum/topic/299614-asus-eah6450-video-bios-uefi-gop-upgrade-and-gop-uefi-binary-in-efi-for-many-ati-cards

[2] https://www.youtube.com/watch?v=37D2bRsthfI

[3] https://www.kraxel.org/cgit/qemu/log/?h=work/input-dev-event 

[4] https://github.com/zfsonlinux/zfs-auto-snapshot/wiki/Samba

[5] https://blogs.oracle.com/amw/entry/using_windows_previous_versions_to1

-- 
Okky Hendriansyah

