[EnMasse] AddressController performance / ConfigMap lookups

Tue Mar 13 09:16:50 UTC 2018

On 03/12/2018 03:46 PM, Gordon Sim wrote:
> On 12/03/18 13:48, Lohmann Carsten (INST/ECS4) wrote:
>> Hi Ulf,
>>
>> in general, we were thinking whether it wouldn't be better to use a 
>> database (e.g. MongoDB) instead of ConfigMaps for address-persistence.
>>
>> When thinking about a scenario with 10000+ addresses, having that many 
>> ConfigMaps seems .. odd. Kind of going beyond what the ConfigMap 
>> mechanism as Pod configuration data is probably intended for (although 
>> there don't seem to be hard limits in that sense).
>>
>> Database-persistence could possibly offer better performance and 
>> simplify backup-strategies.
>>
>> Also when updating EnMasse by re-deploying the EnMasse components 
>> (having deleted the K8s-namespace first), it seems easier to have the 
>> addresses in the database untouched by this instead of having to 
>> re-create the addresses/ConfigMaps.
>>
>> I think from an architectural point of view, the question is, whether 
>> the addresses are considered cluster-state information (therefore 
>> belonging in the etcd datastore) or application-specific data.
>>
>> With the current address ConfigMap content structure modelled after 
>> K8s resources, it's obviously handled more as cluster-state information.
>>
>> (In that sense, is it planned to switch addresses to be proper K8s 
>> custom resources instead of ConfigMaps?).
>>
>> But, I think the addresses could also be viewed as 
>> application-specific data and in that sense better be stored externally.
>>
>> WDYT? Have you thought about replacing the ConfigMap persistence? Do 
>> you see limitations with the current ConfigMap-based approach thinking 
>> about 10000+ addresses?
>>
>> Or would you see other persistence options?
> 
> One other option would be to allow configmaps to contain address-lists 
> rather than individual addresses. (E.g. 10+ configmaps of 1000 addresses 
> seems a lot more manageable than 10000+ configmaps).
> 

If the status of a single address needs to change, the controller then
has to read/write 1000 addresses' worth of data, so I wouldn't say that
it is more manageable. If the problem is that the Kubernetes master is
overloaded by EnMasse use, this could even make it worse.
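
As a rough illustration of that read/write cost, here is a minimal
sketch. It assumes the address list would be stored as one serialized
value, so the whole list is rewritten on every change. The address
shape and the function names are made up for the example and are not
the actual EnMasse format.

```python
import json


def make_address(i):
    # Hypothetical address shape, loosely modelled on a K8s-style resource.
    return {
        "metadata": {"name": f"addr-{i}"},
        "spec": {"address": f"addr-{i}", "type": "queue"},
        "status": {"isReady": False},
    }


def bytes_written_per_status_change(addresses_per_configmap):
    """Size of the payload written back in order to flip one status."""
    addresses = [make_address(i) for i in range(addresses_per_configmap)]
    addresses[0]["status"]["isReady"] = True  # change a single address
    # Assumption: the list is one serialized value, so all of it is rewritten.
    return len(json.dumps(addresses).encode("utf-8"))


single = bytes_written_per_status_change(1)
batched = bytes_written_per_status_change(1000)
print(single, batched, round(batched / single))
```

The payload for one status change grows roughly linearly with the
number of addresses packed into the same configmap.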

It would also make synchronization a lot more coarse-grained. If we at
some point need multiple controllers to read/write addresses, the
optimistic locking done at the configmap resource level could become a
bottleneck.
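
To make the locking concern concrete, here is a small in-memory
simulation. It is only a sketch: Store, Conflict and
concurrent_updates are hypothetical stand-ins for a versioned resource
store, not the Kubernetes client API. Two writers each read a resource
and then write it back, and a write is rejected if the version changed
in between.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale resource version."""


class Store:
    """Tiny in-memory stand-in for a versioned resource store."""

    def __init__(self):
        self._data = {}  # resource name -> (version, value)

    def create(self, name, value):
        self._data[name] = (1, value)

    def get(self, name):
        version, value = self._data[name]
        return version, dict(value)  # copy: callers edit a private snapshot

    def update(self, name, expected_version, value):
        current_version, _ = self._data[name]
        if current_version != expected_version:
            raise Conflict(name)
        self._data[name] = (current_version + 1, value)


def concurrent_updates(store, writes):
    """All writers read first, then all write. Returns the conflict count.

    writes is a list of (resource_name, address_name) pairs.
    """
    snapshots = [(res, addr, *store.get(res)) for res, addr in writes]
    conflicts = 0
    for res, addr, version, value in snapshots:
        value[addr] = "ready"
        try:
            store.update(res, version, value)
        except Conflict:
            conflicts += 1  # a real controller would re-read and retry
    return conflicts


# One address per resource: the two writers never touch the same version.
fine = Store()
fine.create("cm-a", {"a": "pending"})
fine.create("cm-b", {"b": "pending"})
fine_conflicts = concurrent_updates(fine, [("cm-a", "a"), ("cm-b", "b")])

# Both addresses in one resource: the second write is based on a stale version.
coarse = Store()
coarse.create("cm-all", {"a": "pending", "b": "pending"})
coarse_conflicts = concurrent_updates(coarse, [("cm-all", "a"), ("cm-all", "b")])

print(fine_conflicts, coarse_conflicts)
```

In the shared case the writers conflict even though they change
different addresses, and every conflict costs an extra read and retry.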

Mapping 1 address to 1 resource/configmap, which in turn maps to 1 etcd
key (corresponding to 1 database table row if we were to use a
database), intuitively seems like a good model to me.

> My instinct is that it is the 'operator' components (e.g. agent ensuring 
> that the addresses configured match those defined) that are the 
> bottleneck at present though.
> 

I agree (standard-controller is probably also a bottleneck). I think we 
would see an improvement if we moved to custom resources, as we would 
remove the extra encode/decode of the data in the configmap.
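
A minimal sketch of what that extra encode/decode looks like. Values
in a configmap's data section are strings, so a structured address has
to be serialized into a string and parsed again on read. The key name
"config.json", the kind "Address" and the field layout are assumptions
for illustration only.

```python
import json

address = {
    "metadata": {"name": "myqueue"},
    "spec": {"address": "myqueue", "type": "queue"},
}

# ConfigMap: data values are strings, so the address is serialized twice.
configmap_body = json.dumps({
    "kind": "ConfigMap",
    "metadata": {"name": "address-myqueue"},
    "data": {"config.json": json.dumps(address)},  # assumed key name
})

# Custom resource: the address fields are part of the object itself.
custom_resource_body = json.dumps({
    "kind": "Address",  # hypothetical kind
    "metadata": {"name": "myqueue"},
    "spec": address["spec"],
})


def address_type_from_configmap(body):
    outer = json.loads(body)                           # decode 1: the ConfigMap
    inner = json.loads(outer["data"]["config.json"])   # decode 2: embedded string
    return inner["spec"]["type"]


def address_type_from_custom_resource(body):
    return json.loads(body)["spec"]["type"]            # single decode


print(address_type_from_configmap(configmap_body))
print(address_type_from_custom_resource(custom_resource_body))
```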

>>  From an implementation standpoint, having a separate persistence 
>> implementation seems quite straightforward in the address/standard 
>> controller with the AddressApi interface already in place.
>>
>> Then the agent component would have to be changed as well (maybe 
>> changing it to use the AddressController REST API?).
> 

-- 
Ulf