[Freeipa-devel] Locations design v2: LDAP schema & user interface

Tue Feb 23 17:59:23 UTC 2016

On 23.2.2016 18:14, Simo Sorce wrote:
>> > Petr Vobornik mentioned an important question:
>> > Should we care about non-IPA services?
>> > 
>> > IMHO it is a valid point. It complicates things a lot as soon as we start
>> > introducing 'locations per service'. It is certainly doable but I would like
>> > to avoid it.
>> > 
>> > It seems easy enough to support custom services as long as there is only one
>> > set of locations (which match IPA locations). It would be a management
>> > nightmare to support N parallel locations for distinct sets of services.
>> > 
>> > As far as I can tell, AD can live with only 1 set of locations and that sounds
>> > like a reasonable thing to support in the IPA management interface to me.
> I think one set of Locations is fine, but we need to be able to assign
> services to locations independently from "servers" in some cases, I
> think.
> 
> Mostly because a server can have service 1 and 3 but not service 2 and
> another server can have service 2 and 3 but not 1.

Hmm, I do not follow. Where is the problem? You can assign both servers to the
location and get all three services available, the SRV records from all
servers will just combine.
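
To illustrate what I mean by "combine", here is a minimal Python sketch. The
host names, the service names and the helper location_srv_targets() are made
up for this example and are not part of any IPA API:

```python
# Hypothetical inventory: which services each server runs.
SERVER_SERVICES = {
    "srv-a.example.com": {"service1", "service3"},
    "srv-b.example.com": {"service2", "service3"},
}


def location_srv_targets(servers):
    """Map each service to the sorted list of servers in the location offering it."""
    targets = {}
    for server in servers:
        for service in SERVER_SERVICES[server]:
            targets.setdefault(service, []).append(server)
    return {service: sorted(hosts) for service, hosts in targets.items()}


# Both servers are assigned to the same location, so the SRV targets are
# simply the union of what each server provides.
records = location_srv_targets(["srv-a.example.com", "srv-b.example.com"])
for service in sorted(records):
    print(service, records[service])
```

All three services end up with at least one target, and service3 gets both
servers, without anybody assigning individual services to the location.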

We should have checks in IPA so that it will not allow you to create a
Frankenstein IPA server which is missing LDAP or so, but this will be needed
only when we decide to containerize - or when we allow selecting individual
services instead of servers :-)

For custom services, you are on your own. Still, I would say that assigning
services (instead of servers) to a location makes it *more* error prone, not
less, as there is a bigger chance of omitting something (like selecting only
service 1 from server A).

What did I miss?

[...]

>>>> > >> Priority groups are harder because they express metric based on:
>>>> > >> * communication costs,
>>>> > >> * fail-over requirements,
>>>> > >> * other political requirements in given deployment.
>>>> > >> These are hard things to see from layer 7.
>>>> > >>
>>>> > >> Theoretically we can provide ipa-advise plugin to generate some initial set of
>>>> > >> groups but this is going to be complicated and error prone.
>>>> > >>
>>>> > >> E.g. we can use ICMP ping or LDAP base DN search timings and use some
>>>> > >> clustering algorithm to create priority groups using measured values.
>>>> > >> This could work if we use some smart-enough clustering algorithm (= AI
>>>> > >> library). And of course, we would have to do measurements from at least one
>>>> > >> server in each location to properly define groups for each location ...
>>>> > >>
>>>> > >> It is not that easy as it might seem and I do not see an easy solution.
>>>> > >>
>>>> > >>
>>>> > >> Maybe we should take an evolutionary approach:
>>>> > >> Implement this 'expert' UI which exposes groups & weights to the user first.
>>>> > >> (It will be necessary for special cases anyway.) When this is done, we can
>>>> > >> play with it, do some usability testing (we can ask RH IT to see if it makes
>>>> > >> sense to them, for example.)
>>>> > >>
>>>> > >> Later we can extend this with a 'simple' variant of UI based on feedback (or
>>>> > >> add the generator). This does not even need to happen in the same release.
>>>> > >>
>>>> > >> IMHO it would be better to start with something and refine it later on because
>>>> > >> right now we are just hand-waving and have no idea what users actually do and
>>>> > >> want.
>>> > > 
>>> > > As long as we establish a proper CLI I am ok with implementing a very
>>> > > bare bone UI first and improving it only later.
>>> > > 
>>> > > Btw we probably want to have this information reported by the topology
>>> > > view, and used to automatically group servers there based on location,
>>> > > so I CCed Petr to see if there is anything that would make that job
>>> > > easier/harder.
>> > 
>> > 
>> > We were kicking ideas around the drawing board in the Brno office. Finally,
>> > after many iterations we arrived at this:
>> > 
>> > Wouldn't it be easier to implement the concept of sites and links between
>> > sites at the same time? (In the AD spirit.)
>> > 
>> > If we knew the locations/sites and links between them, we could compute
>> > priority groups etc. algorithmically. Then the only remaining thing is
>> > weight, which can have a default, so the admin does not have to touch it
>> > if not necessary.
> You would need to add weights to links, because just the fact there is a
> link tells you nothing about how big the link is between 2 locations, it

Oh, sure, I was thinking about link metric implicitly :-)

> also tells you nothing about the number of clients in a location which
> may influence how you want to distribute them around.

Do you have an example in mind? It sounds weird to me that you want to
distribute clients outside of the local site. If I understand you correctly it
means that the local site is not able to handle the load by itself.

In this case you can either assign 'remote' servers to the location as a
workaround (as IPA does not care where the server actually is), or define link
cost = 0, which would treat the two locations as equal.

In any case such a setup does not have sufficient power to handle clients
locally so there is nothing to optimize. In other words splitting this setup
into locations does not bring you anything because it is simply overloaded.

(Just keep in mind that we are not building a map of a real network but a
vastly simplified view usable for IPA servers and clients.)

> I think we need to let the admin decide for now (as you proposed
> earlier) and get fancier slowly by thinking carefully about what kind of
> information we can use to do any automatic priority/weighting
> computation.
> 
>> > Even better, this can be later used to generate optimal replication topology
>> > (possibly as recommendation).
> Maybe, but see above, the problem is we do not have metrics to determine
> what is the real weight of links in a topology, we need to develop these
> metrics first to be able to get there. Let's start smaller.

Yes, we have to be careful. On the other hand, I do not believe that there is
a way to determine the metric universally and automatically. IMHO we will
end up with an 'administrative' metric, like most IP routing protocols.

As far as I remember from my network engineering studies, routing is more about
politics than anything else. There is a ton of prior art about routing and this
can (and should!) be re-used because we are talking about routing client
requests to servers.

>> > As far as I can see in AD 2012 they do this:
>> > * define which servers belong to which sites
>> > * define site-to-site communication costs
> The first is easy, the second is not.

Unless it is an administrative metric, which I believe is the only feasible
approach anyway.

1. The metric can have a default (pick your random number ;-) for links
without explicit configuration.
2. We can use default all-to-all links with the default metric when there is
no explicit configuration.

This should just work and does not add any administrative overhead for the
simplest case:
"Prefer local servers. If all local servers fail, pick some other server
randomly".
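
A rough Python sketch of how priority groups could fall out of links with an
administrative metric. Everything in it is an assumption made for illustration:
the site and host names, DEFAULT_COST = 100, the choice to treat every
unconfigured pair of sites as linked with the default cost, and the use of
Dijkstra's algorithm as the "suitable graph algorithm":

```python
import heapq

DEFAULT_COST = 100  # assumed default administrative metric


def link_cost(links, a, b):
    """Cost of the a<->b link; unconfigured pairs get the default cost."""
    return links.get((a, b), links.get((b, a), DEFAULT_COST))


def site_distances(sites, links, source):
    """Dijkstra over the implicit all-to-all graph of sites."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, site = heapq.heappop(heap)
        if d > dist[site]:
            continue  # stale heap entry
        for other in sites:
            if other == site:
                continue
            nd = d + link_cost(links, site, other)
            if nd < dist.get(other, float("inf")):
                dist[other] = nd
                heapq.heappush(heap, (nd, other))
    return dist


def priority_groups(sites, links, servers_by_site, source):
    """Group servers by total cost from the source site, cheapest first."""
    by_cost = {}
    for site, cost in site_distances(sites, links, source).items():
        by_cost.setdefault(cost, []).extend(servers_by_site.get(site, []))
    return [sorted(by_cost[c]) for c in sorted(by_cost) if by_cost[c]]


SITES = ["brno", "prague", "boston"]
SERVERS_BY_SITE = {
    "brno": ["ipa1.brno.example.com"],
    "prague": ["ipa1.prague.example.com"],
    "boston": ["ipa1.boston.example.com"],
}

# No explicit links: local servers first, everybody else in one group.
print(priority_groups(SITES, {}, SERVERS_BY_SITE, "brno"))
# One explicit cheap link: Prague becomes the preferred fallback for Brno.
print(priority_groups(SITES, {("brno", "prague"): 10}, SERVERS_BY_SITE, "brno"))
```

With no links configured the result is exactly the simplest case quoted above.
Setting a link cost to 0 puts the servers of both sites into the same group,
i.e. the two locations are treated as equal.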

>> > If we had this information we could take a suitable graph algorithm and
>> > generate priority groups without admin's intervention.
>> > 
>> > It seems that it would make the user interface easier to use and potentially
>> > even simplify the schema.
>> > 
>> > IMHO it is worth exploring because especially with large topologies manual
>> > priority (and later replication) management will become a burden.
>> > 
>> > 
>> > What do you think?
> I think this has been my long term secret plan, now you spoiled it!! :-)
> 
> More seriously I think it is a great idea, but too premature to get all
> the way there now. We need to build schema and CLI that will allow us to
> get there without having to completely change interfaces if at all
> possible or minimizing any disruption in the tools.

Actually the backwards compatibility is the main worry which led to this idea
with links.

If we release the first version of locations with custom priorities etc. we
will have to support the schema (which will be different) and API (which will
later be unnecessary) forever.

If we skip this intermediate phase with hand-made configuration we can save
all the headache with upgrades to a more automatic solution later on.

Maybe we should invert the order:
Start with locations + links with an administrative metric and add
hand-tweaking capabilities later (if necessary).

IMHO locations + links with administrative metric will be easier to implement
than the first version.

Just thinking aloud ...

-- 
Petr^2 Spacek