<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 2017-12-15 17:06, Martin Kletzander
wrote:<br>
</div>
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">On
Thu, Dec 14, 2017 at 07:46:27PM +0800, Eli wrote:
<br>
<blockquote type="cite">
<br>
<blockquote type="cite">
<blockquote type="cite">
<br>
@Eli: Can you help with the testing?
<br>
<br>
</blockquote>
</blockquote>
<br>
It seems the interface only implements the isolated case; I
remember
<br>
that you proposed supporting some overlapping cases as well?
<br>
<br>
</blockquote>
<br>
Hi, yes. It got a bit more complicated so I want to do this
<br>
incrementally. First enable the easiest cases, then add APIs to
manage
<br>
the system's default group, make type='both' allocation work on
<br>
CDP-enabled hosts, add APIs for modifying cachetunes for live and
<br>
stopped domains, add support for memory bandwidth allocation, and
so
<br>
on. This is too much stuff to add in one go.
<br>
<br>
I guess I forgot to add this info to the cover letter (I think I
did at
<br>
least for the previous version).
<br>
<br>
I also wasted some time on the tests; some of them are not even in
the
<br>
patchset, so have a look at the previous version if you want to see them.
<br>
<br>
</blockquote>
OK, sorry for not watching the libvirt list for some time.<br>
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">
<blockquote type="cite">I have not seen the whole patch set yet,
but I have done some quick testing on
<br>
your patch and will try to find more time to review the patches.
(Currently I
<br>
maintain another daemon dedicated to the RDT feature,
<br>
called RMD.)
<br>
<br>
Only issue 1 is a true issue; the others, I think, should
<br>
be discussed or treated as 'known issues'.
<br>
<br>
My env:
<br>
<br>
L1d cache: 32K
<br>
L1i cache: 32K
<br>
L2 cache: 256K
<br>
L3 cache: 56320K
<br>
NUMA node0 CPU(s): 0-21,44-65
<br>
NUMA node1 CPU(s): 22-43,66-87
<br>
<br>
<br>
virsh capabilities:
<br>
<br>
171 <cache>
<br>
172 <bank id='0' level='3' type='both' size='55'
unit='MiB'
<br>
cpus='0-21,44-65'>
<br>
173 <control granularity='2816' unit='KiB'
type='both'
<br>
maxAllocs='16'/>
<br>
174 </bank>
<br>
175 <bank id='1' level='3' type='both' size='55'
unit='MiB'
<br>
cpus='22-43,66-87'>
<br>
176 <control granularity='2816' unit='KiB'
type='both'
<br>
maxAllocs='16'/>
<br>
177 </bank>
<br>
178 </cache>
<br>
<br>
*Issues:
<br>
<br>
*1. Doesn't support partial cache allocation. E.g., I need to
provide allocations for all cache IDs,
<br>
but I only care about the
<br>
allocation on one of them, because the VM won't be
scheduled to
<br>
another cache (socket).
<br>
<br>
</blockquote>
<br>
Oh, really? This is not written in the kernel documentation.
Can't the
<br>
unspecified caches just inherit the setting from the default
group?
<br>
That would make sense. It would also automatically adjust if the
<br>
default system one is changed.
<br>
<br>
</blockquote>
Maybe I didn't express myself clearly; yes, the unspecified caches will be
added to the default resource group<br>
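To sketch the inheritance idea discussed above (the helper name and logic are mine, not libvirt's): cache IDs the user leaves unspecified could fall back to the bitmask read from the system's default group's schemata.

```python
# Hypothetical helper (not libvirt code): build a resctrl L3 schemata
# line where cache IDs the user did not specify inherit the bitmask of
# the system's default group (as read from /sys/fs/resctrl/schemata).

def build_l3_schemata(user_masks, default_masks):
    parts = []
    for cache_id in sorted(default_masks):
        # Prefer the user's mask; otherwise inherit the default group's.
        mask = user_masks.get(cache_id, default_masks[cache_id])
        parts.append("%d=%x" % (cache_id, mask))
    return "L3:" + ";".join(parts)

# The user only cares about cache id 0; id 1 inherits the default mask.
print(build_l3_schemata({0: 0x80}, {0: 0xfffff, 1: 0xfffff}))
# -> L3:0=80;1=fffff
```

This would also track changes to the default group automatically, as Martin suggests, if the default masks are re-read before each write.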
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">Do
you have any contact with anyone working on RDT in the kernel?
I
<br>
think this would save time and effort to anyone who will be using
the
<br>
feature.
<br>
</blockquote>
Sure, <a class="moz-txt-link-abbreviated" href="mailto:fenghua.yu@intel.com">fenghua.yu@intel.com</a> and Tony Luck
<a class="moz-txt-link-rfc2396E" href="mailto:tony.luck@intel.com"><tony.luck@intel.com></a><br>
<br>
kernel doc:
<a class="moz-txt-link-freetext" href="https://github.com/torvalds/linux/blob/master/Documentation/x86/intel_rdt_ui.txt">https://github.com/torvalds/linux/blob/master/Documentation/x86/intel_rdt_ui.txt</a><br>
<br>
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">
<br>
<blockquote type="cite">So I got this error if I define the domain
like this:
<br>
<br>
<vcpu placement='static'>6</vcpu>
<br>
<cputune>
<br>
<emulatorpin cpuset='0,37-38,44,81-82'/>
<br>
<cachetune vcpus='0-4'>
<br>
* <cache id='0' level='3' type='both' size='2816'
unit='KiB'/>
<br>
^^^ cache id='1' is not provided
<br>
* </cachetune>
<br>
<br>
<br>
root@s2600wt:~# virsh start kvm-cat
<br>
error: Failed to start domain kvm-cat
<br>
error: Cannot write into schemata file
<br>
'/sys/fs/resctrl/qemu-qemu-13-kvm-cat-0-4/schemata': Invalid
argument
<br>
<br>
</blockquote>
<br>
Oh, I have to figure out why there is 'qemu-qemu' :D
<br>
<br>
<blockquote type="cite">This behavior is not correct.
<br>
<br>
I expect the CBM would look like:
<br>
<br>
root@s2600wt:/sys/fs/resctrl# cat qemu-qemu-14-kvm-cat-0-4/*
<br>
000000,00000000,00000000
<br>
L3:0=80;1=fffff *(no matter what it is, because my VM won't be
scheduled on
<br>
it; either I have defined the vcpu->cpu pinning, or I assume
the kernel
<br>
won't schedule it to cache 1)
<br>
<br>
</blockquote>
<br>
Well, it matters. It would have to have all zeros there so that
that
<br>
part of the cache is not occupied.
<br>
</blockquote>
Well, the hardware won't allow you to specify 0 ways; at least 1 is
required (on some platforms it's 2 ways).<br>
From my previous experience, I set it to fffff (it will be treated as 0
in the code).<br>
<br>
It's decided by min_cbm_bits;<br>
<br>
see
<a class="moz-txt-link-freetext" href="https://github.com/torvalds/linux/blob/master/Documentation/x86/intel_rdt_ui.txt#L48:14">https://github.com/torvalds/linux/blob/master/Documentation/x86/intel_rdt_ui.txt#L48:14</a><br>
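As a purely illustrative check (my own code, not the kernel's or libvirt's) of the constraints the linked document describes: a valid CBM must be a contiguous run of set bits, with at least min_cbm_bits of them.

```python
# Illustrative check of the resctrl CBM rules from intel_rdt_ui.txt:
# the mask must be non-empty, fit in cbm_len bits, contain at least
# min_cbm_bits set bits, and those set bits must be contiguous.

def cbm_is_valid(mask, cbm_len=20, min_cbm_bits=1):
    if mask <= 0 or mask >= (1 << cbm_len):
        return False
    bits = bin(mask)[2:]
    if bits.count("1") < min_cbm_bits:
        return False                    # fewer ways than the hardware minimum
    return "0" not in bits.strip("0")   # the set bits must be contiguous

print(cbm_is_valid(0x80))                 # one way -> True
print(cbm_is_valid(0b10100))              # non-contiguous -> False
print(cbm_is_valid(0x1, min_cbm_bits=2))  # below min_cbm_bits -> False
```

So an all-zero mask like the one proposed earlier is rejected by the hardware/kernel, which is why at least min_cbm_bits ways must be written even for caches the VM never runs on.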
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">
<br>
<blockquote type="cite">*Or at least, restrict the XML when I define
the domain: tell me I need to
<br>
provide all cache IDs (even if I have 4 caches but only run my
VM on
<br>
'cache 0')
<br>
*
<br>
</blockquote>
<br>
We could do that. It would allow us to make this better (or lift
the
<br>
restriction) in case this is "fixed" in the kernel.
<br>
<br>
Or at least in the future we could do this to meet the users
half-way:
<br>
<br>
- Any vcpus that have cachetune enabled for them must also be
pinned
<br>
<br>
- Users need to specify allocations for all cache ids that the
vcpu
<br>
might run on (according to the pinning acquired from before), for
all
<br>
others we'd just simply set it to all zeros or the same bitmask
as the
<br>
system's default group.
<br>
<br>
But for now we could just copy the system's setting to unspecified
<br>
caches or request the user to specify everything.
<br>
<br>
<blockquote type="cite">*2. Cache way fragmentation (no good answers)
<br>
<br>
I see that for now we allocate cache ways starting from the low
bits; a newly
<br>
created VM will allocate cache from the next way. If a
VM
<br>
(with ways allocated in the middle, e.g. its schemata is 00100) is
destroyed,
<br>
that slot (1 cache way) may not fit others and will be
wasted. But
<br>
how can we handle this? There seems to be no good way. Rearrangement? That
would lead to
<br>
cache misses in a time window, I think.
<br>
<br>
</blockquote>
<br>
Avoiding fragmentation is not a simple thing. It's impossible to
do
<br>
without any moving, which might be unwanted. This will be solved
by
<br>
providing an API that will let you move the allocation if you so
<br>
desire. For now I at least try allocating the smallest region
into
<br>
which the requested allocation fits, so that the unallocated parts
are
<br>
as big as possible.
<br>
<br>
</blockquote>
Agree<br>
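The "smallest region into which the requested allocation fits" heuristic Martin describes can be sketched like this (assumed logic under my own names, not the actual libvirt implementation):

```python
# Sketch of a "smallest free run that fits" allocator for cache ways,
# meant to keep the remaining unallocated runs as large as possible.
# All names here are illustrative; this is not libvirt's code.

def pick_ways(free_mask, cbm_len, want):
    runs = []        # (length, start) of each contiguous run of free ways
    start = None
    for bit in range(cbm_len + 1):
        free = bit < cbm_len and (free_mask >> bit) & 1
        if free and start is None:
            start = bit
        elif not free and start is not None:
            runs.append((bit - start, start))
            start = None
    fitting = [r for r in runs if r[0] >= want]
    if not fitting:
        return None                  # fragmented: no single run is big enough
    _, start = min(fitting)          # smallest run that still fits the request
    return ((1 << want) - 1) << start

# Free ways 0-2 and 6-7 (mask 0b11000111): a 2-way request takes the
# 2-way run at bit 6 and leaves the 3-way run intact for larger requests.
print("%x" % pick_ways(0b11000111, 8, 2))   # -> c0
```

A first-fit allocator would instead carve the 2 ways out of the 3-way run and leave two fragments of 1 and 2 ways, which is exactly the waste described above.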
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">
<blockquote type="cite">3. The admin/user has to manually operate on
the default resource group;
<br>
that is to say, after resctrl is mounted, the admin/user
must
<br>
manually change the schemata of the default group. Will libvirt
provide an
<br>
interface/API to handle it?
<br>
<br>
</blockquote>
<br>
Yes, this is planned.
<br>
<br>
<blockquote type="cite">4. Will libvirt provide APIs like
`FreeCacheWay` for the end user to see how
<br>
many cache ways can still be allocated on the host?
<br>
<br>
</blockquote>
<br>
Yes, this should be provided by an API as well.
<br>
<br>
<blockquote type="cite"> Other users/orchestrators (e.g. Nova)
may need to know if a VM
<br>
can be scheduled on the host, but the cache ways are not linear; there
may be
<br>
fragmentation.
<br>
<br>
5. What if another application wants to share some cache ways
with
<br>
some of the VMs?
<br>
Libvirt for now tries to read all of the resource groups
(instead of
<br>
maintaining the consumed cache ways itself), so if another resource
group
<br>
is created under /sys/fs/resctrl and its schemata is
"FFFFF",
<br>
then libvirt will report not enough room for a new VM. But the
user
<br>
actually wants to have another application (e.g. OVS, DPDK PMDs)
share
<br>
cache ways with the VM created by libvirt.
<br>
</blockquote>
<br>
Adding support for shared allocations is planned as I said before,
<br>
however this is something that will be needed to be taken care of
<br>
differently anyway. I don't know how specific the use case would
be,
<br>
but let's say you want to have 8 cache ways allocated for the VM,
but
<br>
share only 4 of them with some DPDK PMD. You can't use "shared"
because
<br>
that would just take some 8 bits even when some of them might be
shared
<br>
with the system's default group. Moreover it means that the
allocation
<br>
can be shared with machines run in the future.  So in this case
you need
<br>
to have the 8 bits exclusively allocated and then (only after the
<br>
machine is started) pin the PMD process to those 4 cache ways.
<br>
<br>
For the missing issue from the other email:
<br>
<br>
<blockquote type="cite">If the host has CDP enabled, it
will report the L3 cache types
<br>
code and data. When the user doesn't want code/data cache ways
allocated
<br>
separately, the current implementation will report that the
`both` type of
<br>
L3 cache is not supported.
<br>
</blockquote>
</blockquote>
<br>
<blockquote type="cite">But we can improve this by making the code and
data schemata the same,
<br>
e.g. if the host has CDP enabled but the user requests 2 ways of `both` type
L3 cache.
<br>
</blockquote>
<br>
<blockquote type="cite">We can write the schemata like:
<br>
</blockquote>
<br>
<blockquote type="cite">L3DATA:0=3
<br>
L3CODE:0=3
<br>
</blockquote>
<br>
Yes, that's what we want to achieve, but again, in a future
patchset.
<br>
<br>
Hope that answers your questions. Thanks for trying it out, it is
<br>
really complicated to develop something like this without the
actual
<br>
hardware to test it on.
<br>
</blockquote>
Yep.<br>
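The CDP fallback agreed on above could look roughly like this (a sketch under my own naming, not the planned libvirt code): a type='both' allocation on a CDP-enabled host is written as identical L3DATA and L3CODE lines.

```python
# Sketch of writing a type='both' allocation on a CDP-enabled host:
# the same bitmask is duplicated into the L3DATA and L3CODE schemata
# lines, reproducing 'both' semantics. Names are illustrative only.

def schemata_for_both(cache_id, mask, cdp_enabled):
    entry = "%d=%x" % (cache_id, mask)
    if cdp_enabled:
        # Identical code and data masks behave like a single 'both' mask.
        return ["L3DATA:" + entry, "L3CODE:" + entry]
    return ["L3:" + entry]

print(schemata_for_both(0, 0x3, cdp_enabled=True))
# -> ['L3DATA:0=3', 'L3CODE:0=3']
```

This matches the L3DATA:0=3 / L3CODE:0=3 example quoted earlier in the thread.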
<blockquote type="cite" cite="mid:20171215090605.GA9959@wheatley">
<br>
Have a nice day,
<br>
Martin
<br>
</blockquote>
<br>
</body>
</html>