Why didn't NSA notice when their backdoor stopped working?

The subversion of the US National Security Agency’s backdoored pseudorandom generator, Dual_EC_DRBG, in Juniper Networks’ NetScreen devices is back in the news again. In brief, Bloomberg is reporting that Chinese state-sponsored hackers were responsible for inserting code in Juniper’s ScreenOS operating system that did two things: it rekeyed an existing backdoor, which allowed passive decryption of network traffic, and it installed a separate backdoor, which allowed administrator access to the devices running ScreenOS.

This has raised the question: how long did it take the NSA to notice that their backdoor no longer worked, and did they report this to Juniper? I don’t know the answer to those questions, but I think the most plausible explanation, given by Tony Arcieri and Matthew Green, is that the NSA did notice that their access degraded over time as targets upgraded their ScreenOS to newer versions. Nevertheless, I think there are other possible explanations, not all of which I’ve seen discussed so far.

I’m going to start with a very brief explanation of Dual EC.¹ Then, I’ll give some background on how ScreenOS uses Dual EC. And finally, I’ll give a list of possibilities. For more details on the history of Dual EC, I recommend giving Matt’s Twitter thread a read.

Dual EC background

Dual EC is a “deterministic random bit generator” which means that starting with a small seed value, it can produce a stream of pseudorandom bits. Dual EC is a little unusual in that it uses public key cryptography to generate the output. It has two parameters $P$ and $Q$ which are points on an elliptic curve. $P$ generates a group under the elliptic curve addition law and $Q$ is a member of that group. In order to come up with the point $Q$, the NSA (almost certainly) picked a large random number $e$ and computed $Q = eP$. It’s essential for $e$ to remain secret because anyone who knows $e$ and can see output from Dual EC can recover its internal state and then predict all future outputs. For more detail, see our paper on exploiting Dual EC in TLS.

Whenever someone speaks of picking a $Q$ value, this is the process they mean: pick a big random number and multiply $P$ by it.
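To make the role of $e$ concrete, here is a minimal Python sketch of a simplified Dual EC over the NIST P-256 curve, together with the state recovery. It is a sketch under assumptions, not the standardized generator: it skips output truncation (the real generator drops the top 16 bits of each output, which an attacker has to guess), it uses slow textbook curve arithmetic, and the function names (`dual_ec_step`, `recover_state`) are mine.

```python
import secrets

# NIST P-256 parameters; P is the curve's standard base point.
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
a = -3
b = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
n = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
P = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)

def add(A, B):
    """Add two affine points; None stands for the point at infinity."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                      # A + (-A)
    if A == B:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, A):
    """Double-and-add scalar multiplication (not constant time)."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def dual_ec_step(s, Q):
    """One simplified, untruncated step: returns (new_state, output)."""
    s = mul(s, P)[0]                     # new state: x-coordinate of s*P
    return s, mul(s, Q)[0]               # output:    x-coordinate of s*Q

def recover_state(r, e):
    """Given one full output r and the secret e (Q = eP), return the next state."""
    rhs = (pow(r, 3, p) + a * r + b) % p
    y = pow(rhs, (p + 1) // 4, p)        # square root works since p = 3 (mod 4)
    d = pow(e, -1, n)                    # d*Q = P
    return mul(d, (r, y))[0]             # x(d * (+/- s*Q)) = x(s*P)

# Whoever picks Q keeps e.
e = secrets.randbelow(n - 1) + 1
Q = mul(e, P)

s1, r1 = dual_ec_step(secrets.randbelow(n - 1) + 1, Q)
s2, r2 = dual_ec_step(s1, Q)

# The attacker sees only r1 but knows e, and predicts the next output.
assert mul(recover_state(r1, e), Q)[0] == r2
```

The sign of the recovered $y$ doesn't matter because a point and its negation share an $x$-coordinate. Without $e$, going from `r1` to the next state would require solving a discrete logarithm.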

Dual EC is described as an NSA backdoor because the NSA generated $e$ and thus can learn the generator’s internal state if they can see its output. Why is that a big deal? Well, lots of crypto algorithms depend on secret or unpredictable numbers. For example, if I know what bits your pseudorandom generator (PRG) is going to spit out and you use that PRG to generate a Diffie–Hellman secret, then I know the secret value and can decrypt any network traffic protected by the result of the Diffie–Hellman key exchange.
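The Diffie–Hellman point can be illustrated with a toy Python sketch. Everything here is a stand-in chosen for brevity: `ToyPRG` is a hash chain rather than Dual EC, and the group (the Mersenne prime $2^{127}-1$ with base 3) is far too small for real use. The only thing it is meant to show is that an observer who knows the PRG state derives the same secret exponent as the victim.

```python
import hashlib

P_DH = 2**127 - 1   # toy prime modulus, an assumption for illustration only
G_DH = 3            # toy base

class ToyPRG:
    """Deterministic hash-chain PRG; a stand-in, not Dual EC."""
    def __init__(self, state: bytes):
        self.state = state

    def next_bytes(self) -> bytes:
        self.state = hashlib.sha256(self.state).digest()
        return hashlib.sha256(b"out" + self.state).digest()

def dh_keypair(prg):
    """Draw a secret exponent from the PRG and return (secret, public)."""
    x = int.from_bytes(prg.next_bytes(), "big") % (P_DH - 1)
    return x, pow(G_DH, x, P_DH)

# The victim and its peer run an ordinary key exchange.
x_victim, pub_victim = dh_keypair(ToyPRG(b"victim internal state"))
x_peer, pub_peer = dh_keypair(ToyPRG(b"peer internal state"))
shared = pow(pub_peer, x_victim, P_DH)

# The attacker knows the victim's PRG state (say, recovered through a
# backdoor) and otherwise sees only the public values on the wire.
x_guess, _ = dh_keypair(ToyPRG(b"victim internal state"))
assert pow(pub_peer, x_guess, P_DH) == shared
```

The attack is entirely passive: the attacker never has to modify or inject traffic, only to replay the generator.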

Dual EC in ScreenOS

That brings us to Dual EC’s use in Juniper’s ScreenOS. As described in Bloomberg’s reporting, the US government insisted that Juniper use Dual EC in ScreenOS. Dual EC is used as part of a cascade of PRGs where Dual EC is supposed to generate the seed for the ANSI X9.31 PRG. The output of X9.31 was supposed to be used for generating random bits for things like Diffie–Hellman key exchange and nonces. However, this doesn’t happen. When Dual EC was introduced into ScreenOS, a collection of other suspicious changes happened at the same time,² with the result that X9.31 is never used and the output of Dual EC is used directly and in an exploitable manner. Juniper has not said why any of these changes were made, and the recent reporting doesn’t make clear whether any of them, other than the introduction of Dual EC, were requested by the US government.

ScreenOS uses Internet Key Exchange (IKE) to generate ephemeral traffic keys for its VPN. Decrypting VPN traffic depends on the version of IKE used and the mode in which it is used. IKEv2 is always vulnerable to the passive decryption attack. IKEv1 is more complicated.

IKEv1 has four authentication modes: a digital signature mode, a preshared key mode, and two public key encryption modes. The digital signature mode is always vulnerable to the passive decryption attack. The preshared key mode is vulnerable, but only if the preshared key is known to the attacker.³ The two encryption modes are not vulnerable to this attack.

Finally, the $Q$ value used in ScreenOS is not the value specified by the NSA. Instead, Juniper picked their own value of $Q$.

Some possible answers

This brings us to the question: Why didn’t the NSA notice that they had lost their backdoor? Here are some possibilities, in no particular order.

The NSA probably did notice that they lost the ability to decrypt some of their targets’ traffic, and likely more over time. It likely depends on how quickly the firmware was updated after the attack.
The NSA never had the capability to decrypt ScreenOS VPN traffic because they didn’t know the secret key $e$ such that $Q = eP$ for Juniper’s choice of $Q$.
In fact, it’s entirely possible that Juniper selected their $Q$ value in a completely safe way: You generate a 32-byte random number $x$, then you use the curve equation $y^2 = x^3 - 3x + b\pmod p$ (where $b$ is a large integer) to solve for $y$. If you find a $y$ that works, then $Q = (x, y)$ is your new point. (This is called point decompression and is fast to do.) If you do this, there is some $e$ such that $Q=eP$, but Juniper wouldn’t know what it is and it’s intractable to compute.
The NSA targets who updated their firmware to a version of ScreenOS with the Chinese (rather than US or Juniper) backdoor were using one of the modes of IKEv1 which defeat the passive decryption attack. There would be no way to detect that $Q$ had changed in this case.
The NSA targets disabled Dual EC entirely using an undocumented ScreenOS configuration command set key one-stage-rng. This actually exposes another security vulnerability that lets the X9.31 PRG be attacked, in some cases. See the end of Section 4 of our paper about the Juniper breach for details.
I think this is highly unlikely as the command is not documented.
No high-value targets were using Juniper hardware with ScreenOS so the NSA simply wasn’t using this backdoor. I have no idea how plausible that is.
The NSA didn’t notice because they couldn’t see the traffic from their targets. I find this unlikely, but I have no actual knowledge.
Successfully carrying out the attack can be tricky depending on the network load. Because Dual EC is so slow, when it was added to ScreenOS, a queue of pre-generated nonce values was added, presumably to not slow down the connection. In fact, multiple queues were added; the number depends on the configuration and version of ScreenOS. See Section 5 of our paper. A timer fires to generate a new nonce or a Diffie–Hellman key pair every second. With a large number of connections, you can end up with nonces generated after DH secret keys.
Depending on how often traffic key rekeying occurs, and depending on how that rekeying is configured, nonces can be consumed more quickly than DH key pairs leading to connections using fresh nonces and very old key pairs. This all makes the attack more difficult.
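The “completely safe” way of choosing $Q$ mentioned in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the P-256 curve; I have no knowledge of how Juniper actually generated their point, and `random_point` is a name I made up.

```python
import secrets

# NIST P-256 field prime and curve constant b.
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
b = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b

def random_point():
    """Pick random x values until x^3 - 3x + b has a square root mod p."""
    while True:
        x = secrets.randbelow(p)
        rhs = (pow(x, 3, p) - 3 * x + b) % p
        y = pow(rhs, (p + 1) // 4, p)    # candidate root; valid since p = 3 (mod 4)
        if y * y % p == rhs:             # roughly half of all x values succeed
            return x, y

Q = random_point()
```

Because P-256 has prime order, every point found this way is a multiple of $P$, so some $e$ with $Q = eP$ exists. Nothing in the procedure reveals it, though.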

I’m sure there are other possibilities I haven’t thought of. I’d love to hear about any I’ve missed.

Dual_EC_DRBG is a pain to type, or say for that matter. I’m just going to call it Dual EC. ↩
A bunch of changes happened all at once: Dual EC was introduced; reseeding changed from every 10,000 calls to the generator to every call; a loop variable was changed from local to global which caused Dual EC output to be used instead of X9.31 output; nonces were changed from 20 bytes to 32 bytes (the perfect size for a Dual EC backdoor); and a nonce pregeneration queue was added which makes it likely that nonces will be generated before Diffie–Hellman secret keys. ↩
That is, an attacker who can capture the network traffic and perform an offline attack on the preshared key. ↩
