[Freeipa-devel] [SECURITY] Analysis of OTP Attack Vectors

Tue Jul 8 17:08:04 UTC 2014

Dmitri, Ludwig and I had a meeting today to discuss the details of OTP
attack vectors. This is the results of that meeting. We would love
review as far and as wide as possible.

=== Rewind Attack ===

This attack is entirely theoretical. It works based on the premise that
simultaneous writes to multiple servers in a replication set might, via
delayed replication, cause a counter/watermark to go backwards. This
attack is limited by the fact that the attack requires two successful
authentications of the same token to occur on different replicas before
the state from the first authentication is replicated to the second. As
the attacker cannot control replication ordering or speed, the ability
to perform this attack is also non-deterministic.

This attack can be entirely eliminated by ensuring that
counters/watermarks never move backwards via custom replication
strategies. Implementing these would be rather difficult.

Lacking custom replication strategies, the ratio of attack success is
essentially the maximum replication time divided by the total
authentication timing variance. Hence, mitigation of the attack can be
achieved by either ensuring timely replication or by introducing random
delays of sufficient duration to the login process.

This attack can take two forms. In both forms, the attacker has managed
to compromise a block of tokens of size N which, without a rewind
attack, permits at most N authentications. Hence, the purpose of this
attack is to expand an existing compromise into a worse compromise.

In the first form of the attack, X simultaneous logins are executed
against distinct replicas in order to obtain more than N
authentications. If every attack was successful, the maximum number of
authentications would be XN-1. If attacking Kerberos, there is no
advantage to obtaining multiple simultaneous authentications. Hence, the
effective attack rate of a Kerberos attack is (XN-1) / X, or, after
rounding, N. In the case of logging into discrete, non-ticketed
services, such as an HTTP authentication via LDAP, there may be some
value to simultaneous logins, placing the expansion at XN-1. However, in
this case, the authentication timing variance will generally increase,
limiting the effectiveness of the attack.

In the second form of the attack, the attacker will attempt to time his
authentication with the user's authentication. Where the first form of
the attack is "offensive" in attempting to expand the usable size of the
compromised block, the second form is "defensive" in attempting to
prevent the user's legitimate login from shrinking the usable size of
the block. This attack is extremely difficult to perform. Not only is it
extremely difficult to time an authentication with another user on top
of infrastructure timing variances, the usable state of the block is
also generally unknown. This creates a situation where there is an
inverse relationship between the success rate of the used code and the
value of of the resulting success. In other words, the more likely the
attack is to succeed, the less valuable the successful attack is.

In short, we concluded that a rewind attack while theoretically possible
is both extremely difficult to implement and of limited value to an
attacker. Hence, aside from an earnest attempt to ensure timely
replication, we do not plan to implement specific countermeasures.

=== Replay Attack ===

Currently, this attack is somewhat practical and a known limitation of
the FreeIPA TOTP implementation due to a lack of high watermark support.
This attack has similar characteristics to the rewind attack in that it
permits the expansion of a single OTP code into X authentications where
X is the number of nodes in the replication set.

Four options exist for mitigation.

1. Move the counter/watermark to an external solution. This could
implement local storage at the cost of a single point of
failure/bottleneck. It could also implement a more complex locking or
replication solution.

2. Restrict reads/writes of the counter/watermark to a single 389DS
master. In this case, replay would be limited to the case where failover
has occurred before the writes could complete. The main cost of this
option is a performance bottleneck.

3. Add a mechanism for priority replication to 389DS. While this
wouldn't eliminate the replay attack, it could greatly reduce its
probability. This option trades absolute certainty against the replay
attack for authentication throughput.

4. Add a mechanism for synchronous replication to 389DS. This option
would guarantee that any modification involving the counter would be
replicated to all nodes before succeeding. This provides absolute
certainty at an extreme performance penalty.

It is also possible that some research into academic articles may reveal
some better strategies for counter replication.

In order to adequately assess the current situation, some replication
performance testing is required. This would ideally measure the
replication throughput of a single integer value across at least three
nodes under load.

=== Action Items ===

Nathaniel will develop a patch implementing TOTP High Watermark support.
This will only presume the current replication infrastructure. This will
close the current TOTP replay hole. We will address the performance
considerations as a second step.

Ludwig will devise and publish some replication scalability testing
using the above parameters. This should at least give us a good
assessment of where we are in terms of replication performance and the
size replay attack window size.

Anyone familiar with the current state of research regarding counter
replication should feel free to provide additional suggestions.