[Freeipa-devel] DNSSEC support design considerations: migration to RBTDB

Petr Spacek pspacek at redhat.com
Thu Jun 27 16:23:19 UTC 2013


On 21.6.2013 16:19, Simo Sorce wrote:
> On Thu, 2013-06-20 at 14:30 +0200, Petr Spacek wrote:
>> On 23.5.2013 16:32, Simo Sorce wrote:
>>> On Thu, 2013-05-23 at 14:35 +0200, Petr Spacek wrote:
>>>> It looks like we agree on nearly all points (I apologize if I
>>>> overlooked something). I will prepare a design document for the
>>>> transition to RBTDB and then another design document for the DNSSEC
>>>> implementation.
>>
>> The current version of the design is available at:
>> https://fedorahosted.org/bind-dyndb-ldap/wiki/BIND9/Design/RBTDB
>
> Great write-up, thanks.
>
>> There are several questions inside (search for text "Question", it should find
>> all of them). I would like to get your opinion about the problems.
>>
>> Note that 389 DS team decided to implement RFC 4533 (syncrepl), so persistent
>> search is definitely obsolete and we can do synchronization in some clever way.
>
>
> Answering inline here after quoting the questions for the doc:
>
>          > Periodical re-synchronization
>          >
>          > Questions
>
>                * Do we still need periodical re-synchronization if 389 DS
>                  team implements RFC 4533 (syncrepl)? It wasn't
>                  considered in the initial design.
>
> We probably do. We have to be especially careful of the case when a
> replica is re-initialized. We should either automatically detect that
> this is happening or change ipa-replica-manage to kick named somehow.
>
> We also need a tool or maybe a special attribute in LDAP that is
> monitored so that we can tell  bind-dyndb-ldap to do a full rebuild of
> the cache on demand. This way admins can force a rebuild if they end up
> noticing something wrong.
Is it acceptable to let the admin delete the files & restart named manually? I 
don't want to overcomplicate things at the beginning ...

>                * What about dynamic updates during re-synchronization?
>
> Should we return a temporary error ? Or maybe just queue up the change
> and apply it right after the resync operation has finished ?
Unfortunately, the only reasonable error code is SERVFAIL. It is completely up 
to the client whether it retries the update or not.

I personally don't like queuing of updates because it confuses clients: the 
update is accepted by the server, but the client can still see an old value 
(for a limited period of time).

>                * How to get sorted list of entries from LDAP? Use LDAP
>                  server-side sorting? Do we have necessary indices?
>
> We can do client side sorting as well I guess, I do not have a strong
> opinion here. The main reason why you need ordering is to detect deleted
> records, right?
Exactly. I realized that server-side sorting doesn't make sense because we 
plan to use syncrepl, so there is nothing to sort - only the flow of 
incremental updates.

> Is there a way to mark rbtdb records as updated instead
> (with a generation number) and then do a second pass on the rbtdb tree
> and remove any record that was not updated with the generation number ?
There is no 'generation' number in RBTDB, but we can extend the auxiliary 
database (i.e. the database with the UUID => DNS name mapping) with a 
generation number. We will get the UUID along with each update from LDAP, so 
we can simply use the UUID for the database lookup.

Then we can go through the UUID database and delete all records which don't 
have generation == expected_value.
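
A minimal pseudo-Python sketch of that clean-up pass (uuid_db, rbtdb and 
their methods are illustrative names only, not real bind-dyndb-ldap APIs):

def prune_stale_entries(uuid_db, rbtdb, expected_gen):
    # uuid_db maps entryUUID -> (dns_name, generation)
    for uuid, (dns_name, gen) in list(uuid_db.items()):
        if gen != expected_gen:
            rbtdb.delete_name(dns_name)   # record no longer exists in LDAP
            del uuid_db[uuid]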

> This would also allow us to keep accepting dynamic updates by simply
> marking records as generation+1 so that the resync will not overwrite
> records that are updated during the resync phase.
I agree. The simplest variant can solve the basic case where 1 update was 
received during re-synchronization.

Proposed (simple) solution:
1) At the beginning of re-synchronization, set curr_gen = prev_gen+1
2) For each entry in LDAP do (via syncrepl):
- Only if entry['gen'] <  curr_gen:
--  Overwrite data in local RBTDB with data from LDAP
--  Overwrite entry['gen'] = curr_gen
- Else: Do nothing

In parallel:
1) Update request received from a client
2) Write new data to LDAP (syncrepl should cope with this)
3) Read UUID from LDAP (via RFC 4527 controls)
4) Write curr_gen to UUID database
5) Write data to local RBTDB
6) Reply 'update accepted' to the client
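
A rough pseudo-Python sketch of the two code paths above (all names are 
illustrative; the real implementation is C inside the plugin):

CURR_GEN = load_prev_gen() + 1   # bumped at the start of every re-synchronization

def apply_syncrepl_entry(uuid, dns_name, rdata):
    # re-synchronization path: skip entries already touched in this generation
    _, gen = uuid_db.get(uuid, (None, -1))
    if gen < CURR_GEN:
        rbtdb.replace(dns_name, rdata)        # overwrite local copy with LDAP data
        uuid_db[uuid] = (dns_name, CURR_GEN)

def handle_dynamic_update(dns_name, rdata):
    # dynamic update path: LDAP first, then local RBTDB, then reply to the client
    dn = ldap_write(dns_name, rdata)          # syncrepl will echo this change back
    uuid = read_entry_uuid(dn)                # RFC 4527 post-read control
    uuid_db[uuid] = (dns_name, CURR_GEN)      # mark as current generation
    rbtdb.replace(dns_name, rdata)
    return 'update accepted'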

A crash at any time should not hurt: curr_gen will be incremented on restart 
and the re-synchronization will be restarted.

The worst case is that an update is stored in LDAP but the client does not get 
a reply because of the crash (i.e. the client times out).


There is a drawback: two or more successive updates to a single entry can 
create a race condition, as described at 
https://fedorahosted.org/bind-dyndb-ldap/wiki/BIND9/Design/RBTDB#Raceconditions1 .

The reason is that the generation number is not incremented on each change, 
but only overwritten with the current global value (i.e. old value + 1).


I don't like the other option of incrementing the generation number on every 
change. It could create nasty corner cases during re-synchronization and when 
handling updates made directly in LDAP or by another DNS server.

It is not nice, but I think that we can live with it. The important fact is 
that consistency will be (eventually) re-established.

>          > (Filesystem) cache maintenance
>
>          > Questions: How often should we save the cache from operating
>          memory to disk?
>
> Prerequisite to be able to evaluate this question. How expensive is it
> to save the cache ?
My test zone contains 65535 AAAA records, 255 A records, 1 SOA + 1 NS record.

Benchmark results:
zone dump   < 0.5 s (to text file)
zone load   < 1 s (from text file)
zone delete < 9 s (LOL. This is caused by implementation details of RBTDB.)

LDAP search on the whole sub-tree: < 15 s
Load time for bind-dyndb-ldap 3.x: < 120 s

 > Is DNS responsive during the save or does the
> operation block updates or other functionality ?
AFAIK it should not affect anything. The internal transaction mechanism should 
handle all these situations and allow queries/updates to proceed.

>                * On shutdown only?
>
> NACK, you are left with very stale data on crashes.
>
>                * On start-up (after initial synchronization) and on
>                  shutdown?
>
> It makes sense to dump right after a big synchronization if it doesn't
> add substantial operational issues. Otherwise maybe a short interval
> after synchronization.
>
>                * Periodically? How often? At the end of periodical
>                  re-synchronization?
>
> Periodically is probably a good idea, if I understand it correctly it
> means that it will make it possible to substantially reduce the load on
> startup as we will have less data to fetch from a syncrepl request.
We probably misunderstood each other. I thought that re-synchronization would 
trigger a full re-load from LDAP, so the whole sub-tree would be transferred on 
each re-synchronization. (I.e. syncrepl would be started again without the 
'cookie'.)

For example:
time|event
0:00 BIND start, changes from the last known state requested
0:02 changes were applied to local copy - consistency should be restored
0:05 incremental update from LDAP came in
0:55 DNS dynamic update came in, local copy & LDAP were updated
0:55 incremental update from LDAP came in (i.e. the update from previous line)
1:05 incremental update from LDAP came in
4:05 incremental update from LDAP came in
8:00 full reload is started (by timer)
8:05 full reload is finished (all potential inconsistencies were corrected)
9:35 incremental update from LDAP came in
...

It is a pretty demanding game. That is why I asked whether we want to do 
re-synchronizations automatically...

Originally, I planned to write a script which would compare the data in LDAP 
with the zone file on disk. This script could be used for debugging & automated 
testing, so we can assess whether the code behaves correctly and decide whether 
we want to implement automatic re-synchronization when necessary.
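
A minimal sketch of such a checker (Python, using dnspython and python-ldap; 
the dump path, base DN and attribute names are illustrative guesses, not the 
final layout):

#!/usr/bin/python3
# Compare DNS names present in the dumped zone file with names in LDAP.
import dns.zone
import ldap

ZONE = 'example.com'
BASE_DN = 'idnsName=%s,cn=dns,dc=example,dc=com' % ZONE

zone = dns.zone.from_file('/var/named/%s.db' % ZONE, origin=ZONE)
file_names = set(str(name).lower() for name in zone.nodes)

conn = ldap.initialize('ldap://localhost')
conn.simple_bind_s()    # anonymous bind, for illustration only
ldap_names = set()
for dn, attrs in conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                               '(objectClass=idnsRecord)', ['idnsName']):
    ldap_names.add(attrs['idnsName'][0].decode().lower())

print('only in LDAP:     ', sorted(ldap_names - file_names))
print('only in zone file:', sorted(file_names - ldap_names))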

In all cases, the admin can simply delete files on disk and restart BIND - 
everything will be downloaded from LDAP again.

>                * Each N updates?
>
> I prefer a combination of each N updates but with time limits to avoid
> doing it too often.
> Ie something like every 1000 changes but not more often than every 30
> minutes and not less often than 8 hours. (Numbers completely made up and
> need to be tuned based on the answer about the prerequisites question
> above).
Sounds reasonable.
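
For illustration, that policy could look roughly like this (pseudo-Python; 
the thresholds are the made-up numbers from above and dump_zone_to_disk() is 
a placeholder for the real RBTDB dump):

import time

DUMP_AFTER_CHANGES = 1000
MIN_INTERVAL = 30 * 60          # not more often than every 30 minutes
MAX_INTERVAL = 8 * 60 * 60      # not less often than every 8 hours

changes_since_dump = 0
last_dump = time.monotonic()

def dump_zone_to_disk():
    pass                        # placeholder for the real dump routine

def note_change():
    global changes_since_dump
    changes_since_dump += 1
    maybe_dump()

def maybe_dump():
    # also called from a periodic timer so the MAX_INTERVAL limit is honoured
    global changes_since_dump, last_dump
    elapsed = time.monotonic() - last_dump
    if elapsed < MIN_INTERVAL or changes_since_dump == 0:
        return
    if changes_since_dump >= DUMP_AFTER_CHANGES or elapsed >= MAX_INTERVAL:
        dump_zone_to_disk()
        changes_since_dump = 0
        last_dump = time.monotonic()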

>                * If N % of the database was changed? (pspacek's favorite)
>
> The problem with using % database is that for very small zones you risk
> getting stuff saved too often, as changing a few records quickly makes
> the % big compared to the zone size. For example a zone with 50 records
> has a 10% change after just 5 records are changed. Conversely a big zone
> requires a huge amount of changes before the % of changes builds up
> leading potentially to dumping the database too infrequently. Example,
> zone with 100000 records, means you have to get 10000 changes before you
> come to the 10% mark. If dyndns updates are disabled this means the zone
> may never get saved for weeks or months.
> A small zone will also syncrepl quickly so it would be useless to save
> it often while a big zone is better if it is up to date on disk so the
> syncrepl operation will cost less on startup.
>
> Finally N % is also hard to compute. What do you consider into it ?
> Only total number of record changed ? Or do you factor in also if the
> same record is changed multiple times ?
> Consider fringe cases: a zone with 1000 entries where only 1 entry is
> changed 2000 times in a short period (a malfunctioning client (or an attack)
> sending lots of updates for their record).

I will add another option:
* After each re-synchronization (including start-up) + on shutdown.

This is my favourite, but it is dependent on re-synchronization intervals. It 
could be combined with 'each N updates + time limits' described above.


> Additional questions:
>
> I see you mention:
> "Cache non-existing records, i.e. do not repeat LDAP search for each
> query"
>
> I assume this is fine and we rely on syncrepl to give us an update and
> override the negative cache if the record that has been negatively
> cached suddenly appears via replication through another master, right ?
Yes. The point is that there will not be any 'cache', but an authoritative copy 
of the DNS sub-tree stored in LDAP. A hit or miss in the 'local copy' will be 
authoritative.

> If we rely on syncrepl, are we going to ever make direct LDAP searches
> at all ? Or do we rely fully on having it send us any changes and
> therefore we always reply directly from the rbtdb database ?
Basically yes, we don't need to do any searches. We will use separate 
connections only for LDAP modifications (DNS dynamic updates).

The only 'search-like' operation (except syncrepl) will be the Read Entry 
Controls after a modification (RFC 4527). This allows us to read the UUID of a 
newly created entry in LDAP without an additional search.
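
For illustration, the equivalent operation in python-ldap looks roughly like 
this (the real plugin uses the OpenLDAP C API; the bind credentials, DN and 
DNS attribute names are examples only):

import ldap
import ldap.modlist
from ldap.controls.readentry import PostReadControl

conn = ldap.initialize('ldap://localhost')
conn.simple_bind_s('cn=Directory Manager', 'secret')

dn = 'idnsName=www,idnsName=example.com,cn=dns,dc=example,dc=com'
entry = {'objectClass': [b'top', b'idnsRecord'],
         'idnsName': [b'www'],
         'aRecord': [b'192.0.2.1']}

# ask the server to return the entryUUID of the entry we have just added
post_read = PostReadControl(criticality=True, attrList=['entryUUID'])
msgid = conn.add_ext(dn, ldap.modlist.addModlist(entry), serverctrls=[post_read])
rtype, rdata, rmsgid, ctrls = conn.result3(msgid)

print('new entry UUID:', ctrls[0].entry['entryUUID'][0].decode())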

-- 
Petr^2 Spacek



