<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">2017-09-04 23:57 GMT+08:00 Daniel P. Berrange <span dir="ltr"><<a href="mailto:berrange@redhat.com" target="_blank">berrange@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">On Mon, Sep 04, 2017 at 04:14:00PM +0200, Martin Kletzander wrote:<br> > * The current design (finally something libvirt-related, right?)<br> ><br> > The discussion ended with a conclusion of the following (with my best<br> > knowledge, there were so many discussions about so many things that I<br> > would spend too much time looking up all of them):<br> ><br> > - Users should not need to specify bit masks, such complexity should be<br> > abstracted. We'll use sizes (e.g. 4MB)<br> ><br> > - Multiple vCPUs might need to share the same allocation.<br> ><br> > - Exclusivity of allocations is to be assumed, that is only unoccupied<br> > cache should be used for new allocations.<br> ><br> > The last point seems trivial but it's actually very specific condition<br> > that, if removed, can cause several problems. If it's hard to grasp the<br> > last point together with the second one, you're on the right track. If<br> > not, then I'll try to make a point for why the last point should be<br> > removed in 3... 2... 
1...<br> ><br> > * Design flaws </div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"> ><br> > 1) Users cannot specify any allocation that would share only part with<br> > some other allocation of the domain or the default group.<br> ></div></div></blockquote><div><br></div><div>Yep, there is no support for sharing cache ways. </div><div><br></div><div>I was thinking of creating a cache resource group in libvirt, so that users can</div><div>add VMs to that resource group. This is good for those who would like to</div><div>share cache resources, maybe for the NFV case.</div><div><br></div><div>But for a case like:</div><div><br></div><div>VM1: fff00</div><div>VM2: 00fff</div><div>where a single `f` (4 cache ways) is shared, the overlap does not seem really meaningful.</div><div>At least, I have not heard that we have such a case. This was mentioned by </div><div><em style="color:rgb(0,0,0);font-family:Times;font-size:medium">Marcelo Tosatti </em>before too.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"> > 2) It was not specified what to do with the default resource group.<br> > There might be several ways to approach this, with varying pros and<br> > cons:<br> ><br> > a) Treat it as any other group. 
That is any bit set for this group<br> > will be excluded from usable bits when creating new allocation<br> > for a domain.<br> ><br> > - Very predictable behaviour<br> ><br> > - You will not be able to allocate any amount of cache without<br> > previous setting for the default group as that will have all<br> > the bits set which will make all the cache unusable<br> ><br> > b) Automatically remove the appropriate amount of bits that are<br> > needed for new domains.<br> ><br> > - No need to do any change to the system settings in order to<br> > use this new feature<br> ><br> > - We would have to change system settings, which is generally<br> > frowned upon when done "automatically" as a side effect of<br> > starting a domain, especially for such scarce resource as<br> > cache<br> ><br> > - The change to system settings would not be entirely<br> > predictable<br> ><br> > c) Act like it doesn't exist and don't remove its allocations from<br> > consideration<br> ><br> > - Doesn't really make sense as system processes might be<br> > trashing the cache as any VM, moreover when all VM processes<br> > without allocations will be based in the default group as<br> > well<br> ><br> > 3) There is no way for users to know what the particular settings are<br> > for any running domain.<br></div></div></blockquote><div><br></div><div>I think you are going to expose what the current CBM looks like for</div><div>a given VM? That's fair enough.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"> ><br> > The first point was deemed a corner case. Fair enough on its own, but<br> > considering point 2 and its solutions, it is rather difficult for me to<br> > justify it. 
Also, let's say you have domain with 4 vCPUs out of which<br> > you know 1 might be trashing the cache, but you don't want to restrict<br> > it completely, but others will utilize it very nicely. Sensible<br> > allocations for such domain's vCPUs might be:<br> ><br> > vCPU 0: 000f<br> > vCPUs 1-3: ffff<br> ><br> > as you want vCPUs 1-3 to utilize even the part of cache that might get<br> > trashed by vCPU 0. Or they might share some data (especially<br> > guest-memory-related).<br> ><br> > The case above is not possible to set up with only per-vcpu(s) scalar<br> > setting. And there are more as you might imagine now. For example how<br> > do we behave with iothreads and emulator threads?<br> <br></div></div></blockquote><div>This is kind of hard to implement, but possible.</div><div><br></div><div>Is there a 1:1 mapping of resource group to VM?</div><div><br></div><div>If you want iothreads and emulator threads to have separate</div><div>cache allocations, you may need to create resource groups associated with</div><div>the VM's vCPUs, iothreads, and emulator threads.</div><div><br></div><div>But the number of COS is limited. Is it worth having such fine-grained control? <br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"> </div></div>Ok, I see what you're getting at. I've actually forgotten what<br> our current design looks like though :-)<br> <br> What level of granularity were we allowing within a guest ?<br> All vCPUs use separate cache regions from each other, or all<br> vCPUs use a share cached region, but separate from other guests,<br> or a mix ?<br> <span class="gmail-"><br> > * My suggestion:<br> ><br> > - Provide an API for querying and changing the allocation of the<br> > default resource group. 
This would be similar to setting and<br> > querying hugepage allocations (see virsh's freepages/allocpages<br> > commands).<br> <br> </span>Reasonable<br></blockquote><div><br></div><div>+1, but another API should expose the cache ways usage on the host,</div><div>e.g.</div><div><br></div><div>grp1: 0ff00</div><div>grp2: 00ff0</div><div>default: 0000f</div><div><br></div><div>Since you are going to support shared mode, you may need to expose this:</div><div><br></div><div>free ways : f0000</div><div>group list [grp1: 0ff00</div><div> grp2: 00ff0</div><div> default: 0000f]</div><div><br></div><div>By doing this, users can get a sense of where they can start from.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <span class="gmail-"><br> > - Let users specify the starting position in addition to the size, i.e.<br> > not only specifying "size", but also "from". If "from" is not<br> > specified, the whole allocation must be exclusive. If "from" is<br> > specified it will be set without checking for collisions. 
The latter<br> > needs them to query the system or know what settings are applied<br> > (this should be the case all the time), but is better then adding<br> > non-specific and/or meaningless exclusivity settings (how do you<br> > specify part-exclusivity of the cache as in the example above)<br> <br> </span>I'm concerned about the idea of not checking 'from' for collisions,<br> if there's allowed a mix of guests with & within 'from'.</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> eg consider<br> <br> * Initially 24 MB of cache is free, starting at 8MB<br> * run guest A from=8M, size=8M<br> * run guest B size=8M<br> => libvirt sets from=16M, so doesn't clash with A<br> * stop guest A<br> * run guest C size=8M<br> => libvirt sets from=8M, so doesn't clash with B<br> * restart guest A<br> => now clashes with guest C, whereas if you had<br> left guest A running, then C would have<br> got from=24MB and avoided clash<br> <br> IOW, if we're to allow users to set 'from', I think we need to<br> have an explicit flag to indicate whether this is an exclusive<br> or shared allocation. 
That way guest A would set 'exclusive',<br> and so at least see an error when it got a clash with guest<br> C in the example.<br></blockquote><div><br></div><div>+1 </div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-"> > - After starting a domain, fill in any missing information about the<br> > allocation (I'm generalizing here, but for now it would only be the<br> > optional "from" attribute)<br> ><br> > - Add settings not only for vCPUs, but also for other threads as we do<br> > with pinning, schedulers, etc.<br> <br></span></blockquote><div><br></div><div>Thanks, Martin, for proposing this again.</div><div><br></div><div>I started this RFC at the beginning of the year and sent several</div><div>early patches, but they failed to get merged.</div><div><br></div><div>Recently I (together with my team) have started a piece of software, the "Resource</div><div>Management Daemon" (RMD), to manage resources such as last level cache and to do cache</div><div>allocation and cache usage monitoring. It accepts REST API requests over TCP/Unix sockets</div><div>and talks to the /sys/fs/resctrl interface to manage all the CAT stuff.</div><div><br></div><div>RMD will hide the complexity of using CAT, and it supports not only VMs</div><div>but also other applications and containers.</div><div><br></div><div>RMD will be open sourced within weeks, and could be leveraged by libvirt</div><div>or other management software that wants fine-grained control of</div><div>resources.</div><div><br></div><div>We have done an integration POC with OpenStack Nova, and would like</div><div>to get that integrated too.</div><div><br></div><div>We would also like to see if libvirt can integrate with RMD.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-"> </span>Regards,<br> Daniel<br> <span 
class="gmail-HOEnZb"><font color="#888888">--<br> |: <a href="https://berrange.com" rel="noreferrer" target="_blank">https://berrange.com</a> -o- <a href="https://www.flickr.com/photos/dberrange" rel="noreferrer" target="_blank">https://www.flickr.com/photos/<wbr>dberrange</a> :|<br> |: <a href="https://libvirt.org" rel="noreferrer" target="_blank">https://libvirt.org</a> -o- <a href="https://fstop138.berrange.com" rel="noreferrer" target="_blank">https://fstop138.berrange.com</a> :|<br> |: <a href="https://entangle-photo.org" rel="noreferrer" target="_blank">https://entangle-photo.org</a> -o- <a href="https://www.instagram.com/dberrange" rel="noreferrer" target="_blank">https://www.instagram.com/<wbr>dberrange</a> :|<br> </font></span></blockquote></div><br></div></div>
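<div><br></div><div>The host-side view suggested in the reply above (free ways plus the list of groups) can be sketched roughly as follows. This is only a minimal illustration, not libvirt, RMD, or resctrl code: the function name free_ways and the 20-way cache (five hex digits, matching the example masks) are assumptions made for this example.</div>

```python
# Hypothetical helper for illustration only; not a libvirt or resctrl API.
CACHE_WAYS = 20  # assumption: five hex digits, as in the example masks


def free_ways(groups, ways=CACHE_WAYS):
    """Return the bit mask of cache ways not claimed by any group."""
    full = 2 ** ways - 1  # mask with every way set
    used = 0
    for mask in groups.values():
        used |= mask  # union of all ways claimed so far
    return full & ~used  # ways that no group has claimed


# Group masks taken from the example in the reply.
groups = {"grp1": 0x0FF00, "grp2": 0x00FF0, "default": 0x0000F}
print("free ways :", format(free_ways(groups), "05x"))
```

<div>With the three example groups this reports f0000 as free, i.e. only the top four ways are still unclaimed, which is the starting point a user would need in order to pick a non-colliding "from".</div>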