[Freeipa-devel] DNSSEC key metadata handling

Petr Spacek pspacek at redhat.com
Thu Jun 12 15:08:32 UTC 2014


Hello list,

I have realized that we need to store certain DNSSEC metadata for every 
(zone,key,replica) triplet. This is necessary to handle splits in the 
replication topology.

A DNSSEC key can be in one of the following states:
- key created
- published but not used for signing
- published and used for signing
- published and not used for signing but old signatures exist
- unpublished

Every state transition has to be postponed until the relevant TTL expires, and 
of course we need to consider the TTL on all replicas.
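The five states and the TTL-gated transitions between them can be sketched as a 
simple linear state machine. This is illustrative Python only, not proposed 
FreeIPA code; the state names and the `next_state()` helper are my own:

```python
from enum import Enum, auto

class KeyState(Enum):
    CREATED = auto()      # key created, not yet in the zone
    PUBLISHED = auto()    # DNSKEY published, not used for signing
    ACTIVE = auto()       # published and used for signing
    RETIRED = auto()      # published, not signing, old signatures exist
    UNPUBLISHED = auto()  # removed from the zone

# Allowed transitions; each one must wait until the relevant TTL has
# expired (on every replica) before it is taken.
TRANSITIONS = {
    KeyState.CREATED: KeyState.PUBLISHED,
    KeyState.PUBLISHED: KeyState.ACTIVE,
    KeyState.ACTIVE: KeyState.RETIRED,
    KeyState.RETIRED: KeyState.UNPUBLISHED,
}

def next_state(state, now, last_change, ttl):
    """Return the next state if the TTL has expired, else stay put."""
    if state in TRANSITIONS and now - last_change >= ttl:
        return TRANSITIONS[state]
    return state
```

With TTL=10 units, a key published at time 0 may become active at time 10 at 
the earliest, matching the example below.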


Example of a problem
====================
DNS TTL=10 units
Key life time=100 units

Replica=1 Key=1 Time=0   Published
Replica=2 Key=1 Time=0   Published
Replica=1 Key=1 Time=10  Published, signing
Replica=2 Key=1 Time=10  Published, signing
Replica=1 Key=2 Time=90  Generated, published, not signing yet
Replica=2 Key=2 Time=90  <replication is broken: key=2 is not on replica=2>
Replica=1 Key=1 Time=100
^^^ From time=100, all new signatures should be created with key=2 but that 
can break DNSSEC validation because key=2 is not available on replica=2.


Proposal 1
==========
- Store state and timestamps for each (zone,key,replica) triplet.
- Perform a state transition only if all triplets (zone,key,?) indicate that 
every replica has reached the desired state, so the transition is safe.
- This implicitly means that no transition will happen if one or more replicas 
are down. This is necessary; otherwise DNSSEC validation can break mysteriously 
when keys get out of sync.

dn: cn=<some-replica-id>,ipk11Label=zone1_keyid123_private, cn=keys, cn=sec, 
cn=dns, dc=example
idnssecKeyCreated: <timestamp>
idnssecKeyPublished: <timestamp>
idnssecKeyActivated: <timestamp>
idnssecKeyInactivated: <timestamp>
idnssecKeyDeleted: <timestamp>

Effectively, the state machine will be controlled by max(attribute) over all 
replicas (for a given key).
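A minimal sketch of that max(attribute) rule, assuming the per-replica 
timestamps of the previous transition have already been collected into a dict 
(the dict and the function name are hypothetical, for illustration only):

```python
def transition_safe(replica_timestamps, now, ttl):
    """Allow a state transition only when every replica has recorded the
    previous transition (no None values) and the TTL has expired even for
    the replica that recorded it last, i.e. max() over all timestamps."""
    if any(ts is None for ts in replica_timestamps.values()):
        return False  # some replica never reached the previous state
    return now - max(replica_timestamps.values()) >= ttl
```

A replica that is down simply never writes its timestamp, so the `None` check 
implements the "no transition while a replica is down" rule from the proposal.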

Replication traffic estimation
------------------------------
Number of writes to LDAP = (State transitions per key) * (Keys per zone) * 
(Number of zones) * (Number of replicas)

The obvious problem is that the amount of traffic grows linearly with all variables.

State transitions per key: 5
Keys per zone: 10
Zones: 100
Replicas: 30
Key life time: 1 month

5*10*100*30 / 1 month
i.e.
150 000 writes / 1 month
i.e.
~ 1 write / 17 seconds
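The arithmetic above, worked out explicitly (assuming a 30-day month):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2 592 000 s, assuming a 30-day month

transitions_per_key = 5
keys_per_zone = 10
zones = 100
replicas = 30

writes_per_month = transitions_per_key * keys_per_zone * zones * replicas
print(writes_per_month)                      # 150000
print(SECONDS_PER_MONTH / writes_per_month)  # 17.28 -> ~1 write / 17 s
```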

It seems that this generates a lot of replication traffic. (Please note that 
the number of replicas/zones/keys per zone in the example is also quite high, 
but it will be hard to improve scalability later if we decide to use LDAP in 
this way.)

And ... our favorite question :-)
What should I use for cn=<some-replica-id>? I would propose using either the 
FQDN of the replica or the value returned by LDAP whoami.


Proposal 2
==========
Another possibility is to make timestamp attributes non-replicated and 
(somehow) use DNS queries to determine if the desired key is available on all 
other replicas before any state transition is allowed.

That would require:
- Full-mesh replica-to-replica connectivity
- A similar amount of DNS query/response round trips (multiplied by a small 
constant)
- Security is questionable: (maybe, I'm not sure!) an attacker could spoof DNS 
answers and break the key rotation mechanism during bootstrap (when no keys 
are available), and maybe even later.

It is easy to detect that a key is:
- published
- unpublished
- used for signing
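For the easily detectable "published" case, the per-replica check could look 
like this. A sketch only: it assumes each replica's DNSKEY answer has already 
been reduced to the set of key tags it returned (`None` for a failed query); 
the actual DNS query layer is out of scope here and all names are hypothetical:

```python
def key_visible_everywhere(key_tag, answers_by_replica):
    """answers_by_replica maps replica name -> set of DNSKEY key tags seen
    in that replica's answer, or None if the query failed.  A state
    transition is allowed only when every replica returned the key."""
    return all(tags is not None and key_tag in tags
               for tags in answers_by_replica.values())
```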

The problem is that there is no reliable way to detect whether a key was 
created/is available on a replica but is not published yet, and similarly 
whether a key is still published but no longer used for signing (that would 
require checking all names published in the zone).

I will think about it a bit more, but I would like to know whether full-mesh 
replica-to-replica connectivity is an acceptable requirement or not.


Almost-joke-proposal
====================
The other alternative is to invent another mechanism for synchronous 
replica-to-replica communication...

-- 
Petr^2 Spacek
