[EnMasse] AddressController performance / ConfigMap lookups

Ulf Lilleengen lulf at redhat.com
Tue Mar 13 10:13:38 UTC 2018

On 03/13/2018 10:53 AM, Lohmann Carsten (INST/ECS4) wrote:
> Hi,
>>> When thinking about a scenario with 10000+ addresses, having that many
>>> ConfigMaps seems... odd. It goes somewhat beyond what the ConfigMap
>>> mechanism, as Pod configuration data, is probably intended for (although
>>> there don't seem to be any hard limits in that sense).
>> I'm not fully convinced that this is a problem at 10k, but it depends of course on
>> what other components are running on the same cluster. I think we should
>> benchmark this before drawing any conclusions (or maybe you have some
>> numbers?) I don't know in detail how a configmap maps to etcd, but I think the
>> difference would only be in the amount of data that is written.
> I agree that some benchmarks should be done here first. We will do them shortly.
> Do you think it would be better to do these tests with the current master (including the "Refactor controllers in standard and address controller" change)?

I would recommend the current master snapshot: 
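Since both sides want numbers before drawing conclusions, a minimal Java timing sketch may help. `LatencyProbe` is a hypothetical helper, not part of EnMasse; it times any call (for example, a ConfigMap list against the API server) and reports nearest-rank percentiles, which say more about tail latency than a single log line does.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

/** Hypothetical helper for timing a repeated call, e.g. listing ConfigMaps. */
public final class LatencyProbe {
    private LatencyProbe() {}

    /** Runs the call the given number of times; returns per-call durations in ms, sorted ascending. */
    public static long[] measure(Callable<?> call, int iterations) throws Exception {
        long[] millis = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            call.call();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(millis);
        return millis;
    }

    /** Nearest-rank percentile (p between 0 and 100) of an ascending-sorted, non-empty array. */
    public static long percentile(long[] sorted, int p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; replace the lambda with the API call under test.
        long[] samples = measure(() -> {
            Thread.sleep(5);
            return null;
        }, 20);
        System.out.printf("min=%dms p50=%dms p95=%dms max=%dms%n",
                samples[0], percentile(samples, 50), percentile(samples, 95),
                samples[samples.length - 1]);
    }
}
```

Running it once with the current number of addresses and again after bulk-creating test addresses, then comparing the p95 values, would indicate whether list latency grows with the ConfigMap count.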

>>> Database persistence could possibly offer better performance and
>>> simplify backup strategies.
>> My impression is that etcd is quite performant for this kind of use, but I don't have
>> any numbers to back that up with. Again, should be benchmarked.
>>> Also, when updating EnMasse by redeploying the EnMasse components
>>> (having deleted the K8s namespace first), it seems easier to have the
>>> addresses remain untouched in the database instead of having to
>>> re-create the addresses/ConfigMaps.
>> One of the goals for EnMasse is to support rolling upgrades of components like
>> the router, so deleting and recreating should not be the long-term strategy IMO. I
>> understand that this is a problem for you at present though.
> Yes, our current mechanism there is for sure not the long-term strategy yet.
>>> I think from an architectural point of view, the question is whether
>>> the addresses are considered cluster-state information (therefore
>>> belonging in the etcd datastore) or application-specific data.
>>> With the current address ConfigMap content structure modelled after
>>> K8s resources, it's obviously handled more as cluster-state information.
>>> (In that sense, is it planned to switch addresses to be proper K8s
>>> custom resources instead of ConfigMaps?)
>> That would be the plan, yes, but there are typically some restrictions on
>> OpenShift that prevent custom resource definitions from being deployed
>> without cluster-admin access. Ideally, EnMasse would support being deployed
>> without cluster-admin for some use cases.
>>> But, I think the addresses could also be viewed as
>>> application-specific data and in that sense better be stored externally.
>>> WDYT? Have you thought about replacing the ConfigMap persistence? Do
>>> you see limitations with the current ConfigMap-based approach when
>>> thinking about 10000+ addresses?
>>> Or would you see other persistence options?
>> I remember considering that in the early phases of EnMasse, just before we
>> introduced the ConfigMap-based configuration, but we didn't pursue it because
>> we believed k8s would meet our demands and we didn't want to introduce
>> another stateful component. Maybe the time has come to rethink that.
> I agree that not introducing another stateful component in the standard setup makes sense.
> There may be setups where there is already a database present and it would be good to have the persistence be configurable in order to use that.
> But I agree that first an evaluation of the current setup should be done with benchmark tests.
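A configurable persistence layer of the kind discussed here could look roughly like the following Java sketch (Java 10+ for `List.copyOf`). `AddressStore`, `InMemoryAddressStore`, `AddressStores` and `PersistenceDemo` are hypothetical, heavily simplified stand-ins; EnMasse's actual `AddressApi` interface has a different and larger surface. The idea is that the default setup keeps its current backend, while a deployment with an existing database registers its own backend under a name selected by configuration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Hypothetical, heavily simplified address persistence interface. */
interface AddressStore {
    void create(String addressSpace, String address);

    List<String> list(String addressSpace);
}

/** In-memory backend; a ConfigMap- or database-backed store would implement the same interface. */
final class InMemoryAddressStore implements AddressStore {
    private final Map<String, List<String>> bySpace = new ConcurrentHashMap<>();

    @Override
    public void create(String addressSpace, String address) {
        bySpace.computeIfAbsent(addressSpace, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    @Override
    public List<String> list(String addressSpace) {
        // Return an immutable copy so callers cannot modify the store's state.
        return List.copyOf(bySpace.getOrDefault(addressSpace, List.of()));
    }
}

/** Registry that picks a backend by name, e.g. read from a (hypothetical) configuration setting. */
final class AddressStores {
    private static final Map<String, Supplier<AddressStore>> BACKENDS = new ConcurrentHashMap<>();

    static {
        BACKENDS.put("memory", InMemoryAddressStore::new);
    }

    private AddressStores() {}

    /** Lets a deployment plug in its own backend, such as one using an existing database. */
    static void register(String name, Supplier<AddressStore> factory) {
        BACKENDS.put(name, factory);
    }

    static AddressStore create(String name) {
        Supplier<AddressStore> factory = BACKENDS.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown address store backend: " + name);
        }
        return factory.get();
    }
}

public class PersistenceDemo {
    public static void main(String[] args) {
        // The backend name would normally come from configuration; "memory" is the only built-in one here.
        AddressStore store = AddressStores.create(args.length > 0 ? args[0] : "memory");
        store.create("myspace", "telemetry/tenant1");
        store.create("myspace", "event/tenant1");
        System.out.println(store.list("myspace"));
    }
}
```

Keeping the selection behind a single interface is what would let the standard setup avoid an extra stateful component while still allowing a database where one already exists.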

Yes, I agree that makes sense.

>>> From an implementation standpoint, having a separate persistence
>>> implementation seems quite straightforward in the address/standard
>>> controller with the AddressApi interface already in place.
>>> Then the agent component would have to be changed as well (maybe
>>> changing it to use the AddressController REST API?).
>> Funny you should mention that. We just changed it so that it no longer uses
>> the REST API, to make it more independent of the address controller. That is,
>> it would allow you to deploy the standard address space without the address
>> controller if you wished.
> Ah, OK. But what would be the motivation for deploying the standard address space without the address controller?

If you do not need multiple address spaces, it would be a way to reduce
the number of components you need to deploy, and it could give you more
control over configuration compared to deploying the full system
(but it would also require you to do more of that work yourself). We're
not there yet, and it's not a high priority at present. The introduction
of plans and resource definitions also gives an increased level of
configurability, so I'm not sure there is a need at present.

>> I'm not opposed to making the persistence configurable, but I think it would be a
>> significant undertaking.
>> Thanks,
>> Ulf
>>> Best regards
>>> Carsten
>>> *From:* Ulf Lilleengen [mailto:lulf at redhat.com]
>>> *Sent:* Monday, 5 March 2018 15:00
>>> *To:* Lohmann Carsten (INST/ECS4) <Carsten.Lohmann at bosch-si.com>
>>> *Cc:* enmasse at redhat.com
>>> *Subject:* Re: [EnMasse] AddressController performance / ConfigMap lookups
>>> Hi Carsten,
>>> Yes, getSchema() will look up the ConfigMaps every time. I
>>> suspected that we might need to cache it, but decided to see how often
>>> this would actually be used before optimizing it. Sounds like it
>>> should be optimized :). I'm a bit surprised that it takes this long
>>> to get this information from the Kubernetes master, though.
>>> Using watchResources to cache it instead sounds sensible
>>> to me.
>>> Best regards,
>>> Ulf
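The watch-based caching agreed on here could be sketched in Java as follows (Java 16+ for the record). `Schema`, `CachingSchemaProvider` and `SchemaCacheDemo` are hypothetical names; the `loader` stands in for the existing ConfigMap listing, and `onUpdate` is where a watch callback (for instance one registered via `watchResources`, whose exact signature is not assumed here) would push a freshly built schema. Readers always get an immutable snapshot, so a plan change is picked up atomically without locking on the request path.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical immutable snapshot of plans and resource definitions. */
record Schema(List<String> addressSpacePlans, List<String> addressPlans, List<String> resourceDefinitions) {}

/** Serves getSchema() from memory; the watch wiring that calls onUpdate() is assumed, not shown. */
final class CachingSchemaProvider {
    private final Supplier<Schema> loader;
    private final AtomicReference<Schema> current = new AtomicReference<>();

    CachingSchemaProvider(Supplier<Schema> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    Schema getSchema() {
        Schema cached = current.get();
        if (cached != null) {
            return cached;
        }
        // Cold start: load once; if a watch update won the race, keep that newer value.
        Schema loaded = loader.get();
        return current.compareAndSet(null, loaded) ? loaded : current.get();
    }

    /** Replaces the snapshot atomically; readers never see a half-updated schema. */
    void onUpdate(Schema updated) {
        current.set(Objects.requireNonNull(updated));
    }
}

public class SchemaCacheDemo {
    public static void main(String[] args) {
        CachingSchemaProvider provider = new CachingSchemaProvider(() -> {
            System.out.println("listing ConfigMaps (expensive)");
            return new Schema(List.of("plan-a"), List.of("addr-plan-a"), List.of("def-a"));
        });
        provider.getSchema(); // triggers the single load
        provider.getSchema(); // served from memory
        provider.onUpdate(new Schema(List.of("plan-a"), List.of("addr-plan-b"), List.of("def-a")));
        System.out.println(provider.getSchema().addressPlans());
    }
}
```

One trade-off: if a watch event is missed (for example while the watch is being re-established), the cache stays stale until the next event, so a periodic full reload may still be worth considering.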
>>> On Mon, Mar 5, 2018 at 2:17 PM, Lohmann Carsten (INST/ECS4)
>>> <Carsten.Lohmann at bosch-si.com> wrote:
>>>      Hi,
>>>      We have noticed some performance issues when using the
>>>      AddressController REST API.
>>>      Here is a log excerpt with added debug output concerning the
>>>      addition of 2 addresses:
>>>      ---------------------
>>>      2018-03-02 16:31:43.281 [vert.x-worker-thread-15] DEBUG
>>>      HttpAddressService:94 - appendAddresses:
>>>      [telemetry/tst_8432b9a6d2194c3f8c6328706eb455a0,
>>>      event/tst_8432b9a6d2194c3f8c6328706eb455a0]
>>>      2018-03-02 16:31:43.290 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapAddressSpaceApi:38 - getAddressSpaceWithName: get() took 8ms
>>>      2018-03-02 16:31:43.291 [vert.x-worker-thread-15] DEBUG
>>>      AddressApiHelper:47 - verifyAuthorized took 0ms
>>>      2018-03-02 16:31:43.305 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=address-space-plan,
>>>      resultList.size=1 took 14ms
>>>      2018-03-02 16:31:43.321 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=address-plan,
>>>      resultList.size=6 took 15ms
>>>      2018-03-02 16:31:43.336 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 14ms
>>>      2018-03-02 16:31:43.351 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=address-plan,
>>>      resultList.size=6 took 14ms
>>>      2018-03-02 16:31:43.365 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 14ms
>>>      2018-03-02 16:31:43.380 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 14ms
>>>      2018-03-02 16:31:43.397 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 16ms
>>>      2018-03-02 16:31:43.412 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 15ms
>>>      2018-03-02 16:31:43.427 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 14ms
>>>      2018-03-02 16:31:43.442 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 14ms
>>>      2018-03-02 16:31:43.456 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 14ms
>>>      2018-03-02 16:31:43.456 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapSchemaApi:166 - getSchema took 165ms
>>>      2018-03-02 16:31:43.473 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapAddressApi:89 - listAddresses: list() took 16ms
>>>      2018-03-02 16:31:43.494 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapAddressApi:103 - createAddress: create() took 20ms
>>>      2018-03-02 16:31:43.506 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapAddressApi:103 - createAddress: create() took 12ms
>>>      2018-03-02 16:31:43.524 [vert.x-worker-thread-15] DEBUG
>>>      ConfigMapAddressApi:89 - listAddresses: list() took 17ms
>>>      2018-03-02 16:31:43.524 [vert.x-worker-thread-15] DEBUG
>>>      HttpAddressService:48 - appendAddresses
>>>      [telemetry/tst_8432b9a6d2194c3f8c6328706eb455a0,
>>>      event/tst_8432b9a6d2194c3f8c6328706eb455a0] end (result: 56 items) -
>>>      requestProcessing took 242ms
>>>      --------------------
>>>      What becomes obvious here is that the "ConfigMapSchemaApi.getSchema"
>>>      invocation is quite expensive with its "listConfigMaps" calls.
>>>      A getSchema duration of around 160ms is typical in our environment.
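The first excerpt also shows where the time goes: getSchema issues 11 sequential listConfigMaps calls (1 × address-space-plan, 2 × address-plan, 8 × resource-definition) for only three distinct types, and those calls add up to 158ms of the 165ms total. Even without a watch-based cache, memoizing each type for the duration of a single getSchema call would cut this to three round trips. A minimal Java sketch, assuming repeated lookups within one call are expected to return the same data; `PerCallMemo` and the `lister` function are hypothetical stand-ins for the real API-server call:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical request-scoped memo: create one instance per getSchema() call so each
 * ConfigMap type is listed at most once; nothing is cached across requests.
 */
public final class PerCallMemo {
    private final Function<String, List<String>> lister;
    private final Map<String, List<String>> results = new HashMap<>();

    public PerCallMemo(Function<String, List<String>> lister) {
        this.lister = lister;
    }

    /** Returns the listing for a type, hitting the underlying lister only on first use. */
    public List<String> list(String type) {
        return results.computeIfAbsent(type, lister);
    }

    public static void main(String[] args) {
        List<String> roundTrips = new ArrayList<>();
        // Simulated API server: records each call it actually receives.
        Function<String, List<String>> apiServer = type -> {
            roundTrips.add(type);
            return List.of(type + "-item");
        };

        // Lookup order as it appears in the first log excerpt.
        List<String> lookups = List.of(
                "address-space-plan", "address-plan", "resource-definition",
                "address-plan", "resource-definition", "resource-definition",
                "resource-definition", "resource-definition", "resource-definition",
                "resource-definition", "resource-definition");

        PerCallMemo memo = new PerCallMemo(apiServer);
        for (String type : lookups) {
            memo.list(type);
        }
        // prints "11 lookups -> 3 API calls"
        System.out.println(lookups.size() + " lookups -> " + roundTrips.size() + " API calls");
    }
}
```

In the slower second excerpt, the same 11 calls account for 1409ms of the 1416ms spent in getSchema, so the saving would grow with API-server latency.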
>>>      We even had times where the API server took longer for the requests
>>>      and where the output looked like this:
>>>      --------------------
>>>      2018-03-02 17:20:07.197 [vert.x-worker-thread-13] DEBUG
>>>      HttpAddressService:94 - appendAddresses:
>>>      [telemetry/tst_a77831ab936849c4b8a1ba8d15d2e018,
>>>      event/tst_a77831ab936849c4b8a1ba8d15d2e018]
>>>      2018-03-02 17:20:07.345 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapAddressSpaceApi:38 - getAddressSpaceWithName: get() took 147ms
>>>      2018-03-02 17:20:07.345 [vert.x-worker-thread-13] DEBUG
>>>      AddressApiHelper:47 - verifyAuthorized took 0ms
>>>      2018-03-02 17:20:07.444 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=address-space-plan,
>>>      resultList.size=1 took 98ms
>>>      2018-03-02 17:20:07.591 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=address-plan,
>>>      resultList.size=6 took 147ms
>>>      2018-03-02 17:20:07.714 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 122ms
>>>      2018-03-02 17:20:07.834 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=address-plan,
>>>      resultList.size=6 took 120ms
>>>      2018-03-02 17:20:07.981 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 146ms
>>>      2018-03-02 17:20:08.131 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 149ms
>>>      2018-03-02 17:20:08.254 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 122ms
>>>      2018-03-02 17:20:08.374 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 120ms
>>>      2018-03-02 17:20:08.494 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 119ms
>>>      2018-03-02 17:20:08.641 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 146ms
>>>      2018-03-02 17:20:08.761 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:55 - listConfigMaps type=resource-definition,
>>>      resultList.size=4 took 120ms
>>>      2018-03-02 17:20:08.761 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapSchemaApi:166 - getSchema took 1416ms
>>>      2018-03-02 17:20:08.913 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapAddressApi:89 - listAddresses: list() took 151ms
>>>      2018-03-02 17:20:09.145 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapAddressApi:103 - createAddress: create() took 231ms
>>>      2018-03-02 17:20:09.265 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapAddressApi:103 - createAddress: create() took 119ms
>>>      2018-03-02 17:20:09.426 [vert.x-worker-thread-13] DEBUG
>>>      ConfigMapAddressApi:89 - listAddresses: list() took 160ms
>>>      2018-03-02 17:20:09.426 [vert.x-worker-thread-13] DEBUG
>>>      HttpAddressService:48 - appendAddresses
>>>      [telemetry/tst_a77831ab936849c4b8a1ba8d15d2e018,
>>>      event/tst_a77831ab936849c4b8a1ba8d15d2e018] end (result: 66 items) -
>>>      requestProcessing took 2229ms
>>>      --------------------
>>>      Possible performance improvements concerning the API server aside,
>>>      there is the question whether there is room to make "getSchema" faster.
>>>      To me it looks like the K8s resources requested there (address space
>>>      plans, address plans, resource definitions) are fairly static and
>>>      could therefore be cached/kept in memory.
>>>      I guess updates on these K8s resources could be handled via
>>>      ConfigMapAddressAPI.watchResources (?).
>>>      WDYT? Would that be feasible?
>>>      Best regards
>>>      *Carsten Lohmann*
>>>      (INST/ECS4)
>>>      Bosch Software Innovations GmbH | Ullsteinstr. 128 | 12109 Berlin |
>>>      GERMANY | www.bosch-si.com
>>>      Registered office: Berlin; Register court: Amtsgericht Charlottenburg; HRB 148411 B
>>>      Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke;
>>>      Managing Directors: Dr. Stefan Ferber, Michael Hahn
>>>      _______________________________________________
>>>      enmasse mailing list
>>>      enmasse at redhat.com
>>>      https://www.redhat.com/mailman/listinfo/enmasse

