[libvirt][PATCH v4 0/3] introduce 'restrictive' mode in numatune

Zhong, Luyao luyao.zhong at intel.com
Wed Mar 31 06:33:28 UTC 2021



>-----Original Message-----
>From: Martin Kletzander <mkletzan at redhat.com>
>Sent: Wednesday, March 31, 2021 12:21 AM
>To: Zhong, Luyao <luyao.zhong at intel.com>
>Cc: Daniel P. Berrangé <berrange at redhat.com>; libvir-list at redhat.com
>Subject: Re: [libvirt][PATCH v4 0/3] introduce 'restrictive' mode in numatune
>
>On Tue, Mar 30, 2021 at 08:53:19AM +0000, Zhong, Luyao wrote:
>>
>>
>>> -----Original Message-----
>>> From: Martin Kletzander <mkletzan at redhat.com>
>>> Sent: Thursday, March 25, 2021 10:28 PM
>>> To: Daniel P. Berrangé <berrange at redhat.com>
>>> Cc: Zhong, Luyao <luyao.zhong at intel.com>; libvir-list at redhat.com
>>> Subject: Re: [libvirt][PATCH v4 0/3] introduce 'restrictive' mode in
>>> numatune
>>>
>>> On Thu, Mar 25, 2021 at 02:14:47PM +0000, Daniel P. Berrangé wrote:
>>> >On Thu, Mar 25, 2021 at 03:10:56PM +0100, Martin Kletzander wrote:
>>> >> On Thu, Mar 25, 2021 at 09:11:02AM +0000, Zhong, Luyao wrote:
>>> >> >
>>> >> >
>>> >> > > -----Original Message-----
>>> >> > > From: Martin Kletzander <mkletzan at redhat.com>
>>> >> > > Sent: Thursday, March 25, 2021 4:46 AM
>>> >> > > To: Daniel P. Berrangé <berrange at redhat.com>
>>> >> > > Cc: Zhong, Luyao <luyao.zhong at intel.com>;
>>> >> > > libvir-list at redhat.com
>>> >> > > Subject: Re: [libvirt][PATCH v4 0/3] introduce 'restrictive'
>>> >> > > mode in numatune
>>> >> > >
>>> >> > > On Tue, Mar 23, 2021 at 09:48:02AM +0000, Daniel P. Berrangé wrote:
>>> >> > > >On Tue, Mar 23, 2021 at 10:59:02AM +0800, Luyao Zhong wrote:
>>> >> > > >> Before this patch set, numatune only has three memory modes:
>>> >> > > >> static, interleave and prefered. These memory policies are
>>> >> > > >> ultimately set by mbind() system call.
>>> >> > > >>
>>> >> > > >> Memory policy could be 'hard coded' into the kernel, but
>>> >> > > >> none of above policies fit our requirment under this case.
>>> >> > > >> mbind() support default memory policy, but it requires a
>>> >> > > >> NULL nodemask. So obviously setting allowed memory nodes is
>cgroups'
>>> mission under this case.
>>> >> > > >> So we introduce a new option for mode in numatune named
>>> 'restrictive'.
>>> >> > > >>
>>> >> > > >> <numatune>
>>> >> > > >>    <memory mode="restrictive" nodeset="1-4,^3"/>
>>> >> > > >>    <memnode cellid="0" mode="restrictive" nodeset="1"/>
>>> >> > > >>    <memnode cellid="2" mode="restrictive" nodeset="2"/>
>>> >> > > >> </numatune>
>>> >> > > >
>>> >> > > >'restrictive' is rather a wierd name and doesn't really tell
>>> >> > > >me what the memory policy is going to be. As far as I can
>>> >> > > >tell from the patches, it seems this causes us to not set any
>>> >> > > >memory alllocation policy at all. IOW, we're using some
>>> >> > > >undefined host default
>>> policy.
>>> >> > > >
>>> >> > > >Given this I think we should be calling it either "none" or "default"
>>> >> > > >
>>> >> > >
>>> >> > > I was against "default" because having such option possible,
>>> >> > > but the actual default being different sounds stupid.
>>> >> > > Similarly "none" sounds like no restrictions are applied or
>>> >> > > that it is the same as if nothing was specified.  It is funny
>>> >> > > to imagine the situation when I am explaining to someone how to
>achieve this solution:
>>> >> > >
>>> >> > >    "The default is 'strict', you need to explicitly set it to 'default'."
>>> >> > >
>>> >> > > or
>>> >> > >
>>> >> > >    "What setting did you use?"
>>> >> > >    "None"
>>> >> > >    "As in no mode or in mode='none'?"
>>> >> > >
>>> >> > > As I said before, please come up with any name, but not these
>>> >> > > that are IMHO actually more confusing.
>>> >> > >
>>> >> >
>>> >> > Hi Daniel and Martin, thanks for your reply, just as Martin said
>>> >> > current default mode is "strict", so "default" was deprecated at
>>> >> > the beginning when I proposed this change.  And actually we have
>>> >> > cgroups restricting the memory resource so could we call this a
>>> >> > "none" mode? I still don't have a better name. ☹
>>> >> >
>>> >>
>>> >> Me neither as figuring out the names when our names do not
>>> >> precisely map to anything else (since we are using multiple
>>> >> solutions to get as close to the desired result as possible) is
>>> >> difficult because there is no similar pre-existing setting.  And using anything
>like "cgroups-only"
>>> >> would limit us in the future, probably.
>>> >
>>> >What I'm still really missing in this series is a clear statement of
>>> >what the problem with the current modes is, and what this new mode
>>> >provides to solve it. The documentation for the new XML attribute is
>>> >not clear on this and neither are the commit messages. There's a
>>> >pointer to an enourmous mailing list thread, but reading through
>>> >50 messages is a not a viable way to learn the answer.
>>> >
>>> >I'm not even certain that we should be introducing a new mode value
>>> >at all, as opposed to a separate attribute.
>>> >
>>>
>>> Yes, Luyao, could you summarize the reason for the new mode?  I think
>>> that the difference in behaviour between using cgroups and memory
>>> binding as opposed to just using cgroups should be enough for others
>>> to be able to figure out when to use this mode and when not.
>>>
>>Sure.
>>Let me give a concrete use case first. There is a new feature in kernel
>>but not merged yet, we call it memory tiering.
>>(https://lwn.net/Articles/802544/). If memory tiering is enabled on
>>host, DRAM is top tier memory, and PMEM(persistent memory)  is second
>>tier memory, PMEM is shown as numa node without cpu. Pages can be
>>migrated between DRAM node and PMEM node based on DRAM pressure and
>how
>>cold/hot they are. *this memory policy* is implemented in kernel. So we
>>need a default mode here, but from libvirt's perspective, the "defaut"
>>mode is "strict", it's not MPOL_DEFAULT
>>(https://man7.org/linux/man-pages/man2/mbind.2.html) defined in kernel.
>>Besides, to make memory tiering works well, cgroups setting is
>>necessary, since it restricts that the pages can only be migrated between the
>DRAM and PMEM nodes that we specified (NUMA affinity support).
>>
>>Except for upper use case, we might have some scenarios that only requires
>cgroups restriction.
>>That's why "restrictive" mode is proposed.
>>
>>In a word, if a user requires default mode(MPOL_DEFAULT) and require
>>cgroups to restrict memory allocation, "restrictive" mode will be useful.
>>
>
>Yeah, I also seem to recall something about the fact that just using cgroups with
>multiple nodes in the nodeset makes kernel decide on which node (out of those in
>the restricted set) to allocate on, but specifying "strict" basically allocates it
>sequentially (on the first one until it is full, then on the next one and so on). I do
>not have anything to back this, so do you remember if this was that the case as
>well or does my memory serve me poorly?
>
Yeah, exactly. 😊

cpuset.mems just specify the list of memory nodes on which the processes are allowed to allocate memory.
https://man7.org/linux/man-pages/man7/cpuset.7.html

This link gives a detailed introduction of "strict" mode: https://man7.org/linux/man-pages/man2/mbind.2.html

>>BR,
>>Luyao
>>
>>> >Regards,
>>> >Daniel
>>> >--
>>> >|: https://berrange.com      -o-
>https://www.flickr.com/photos/dberrange :|
>>> >|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>>> >|: https://entangle-photo.org    -o-
>>> https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list