NSA's "multi communication transactions" claims are suspicious

During a conference call between the Office of the Director of National Intelligence (ODNI), the NSA, and reporters regarding the recently declassified FISA Court (FISC) order, the Obama Administration explained how it came to over-collect data. The problem, they claimed, has to do with what the FISC opinion called MCTs, or “multi communication transactions.” They gave an example:

One example of this is if you have a webmail email account, like Gmail or Hotmail or something like that, you know that when you go and you open up your email program, you will get a screenshot of some number of emails that are sitting in your inbox. In the case of my server, what I get is the date of the email, the sender, the subject line, and the size of the email message. But I may get 15 of them at one time.
Those are all transmitted across the Internet as one communication, even though there are 15 separate emails mentioned in them. And for technological reasons, NSA was not capable of breaking those down into their — and still is not capable — of breaking those down into their individual components.

I believe this is merely an analogy. Because this is their “upstream collection” rather than collection directly from service providers, I doubt they’re capturing and decrypting HTTPS traffic. It’s far more likely that they’re capturing unencrypted Simple Mail Transfer Protocol (SMTP) traffic. As its name suggests, SMTP is used to transfer email. For example, if I have a Gmail account and I send mail to someone with a Yahoo account, the Gmail server is going to make a connection to the Yahoo server and transfer my email using SMTP.

As you might imagine, many people send messages between Gmail accounts and Yahoo accounts, so it would be wasteful to open one connection per email sent. Fortunately, SMTP can send multiple messages, one after another, over a single connection. I believe this to be the MCT that the FISC opinion is referring to.
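To make the “many messages, one connection” idea concrete, here is a minimal Python sketch that builds the client side of a single SMTP session carrying several messages: one HELO and one QUIT wrapped around repeated MAIL FROM / RCPT TO / DATA transactions. The function name and the (sender, recipients, text) tuple layout are hypothetical. It uses the RFC 5321 angle-bracket form MAIL FROM:<address>, while the transcript below uses a looser form that the server accepted, and it applies RFC 5321 dot-stuffing so that a content line beginning with “.” cannot be mistaken for the end of a message. Python’s standard smtplib reuses a connection in the same way when sendmail is called more than once on one SMTP object.

```python
from typing import List, Tuple

# Hypothetical layout: (envelope sender, envelope recipients, message text)
OutgoingMessage = Tuple[str, List[str], str]


def build_client_session(helo_name: str, messages: List[OutgoingMessage]) -> List[str]:
    """Return the lines an SMTP client sends to deliver several messages
    over one connection: a single HELO/QUIT pair around many transactions."""
    lines = [f"HELO {helo_name}"]
    for sender, recipients, text in messages:
        lines.append(f"MAIL FROM:<{sender}>")
        for rcpt in recipients:
            # One RCPT TO command per recipient.
            lines.append(f"RCPT TO:<{rcpt}>")
        lines.append("DATA")
        for line in text.splitlines():
            # Dot-stuffing (RFC 5321, section 4.5.2): double a leading "." so
            # a content line is never mistaken for the end-of-data marker.
            lines.append("." + line if line.startswith(".") else line)
        lines.append(".")  # a lone period ends this message's content
    lines.append("QUIT")
    return lines


if __name__ == "__main__":
    session = build_client_session("example.com", [
        ("s@pahtak.org", ["s@cs.jhu.edu"], "Subject: first\n\nHello."),
        ("s@pahtak.org", ["s@cs.jhu.edu"], "Subject: second\n\nHello again."),
    ])
    print("\n".join(session))
```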

So let’s take a look at what such an MCT could look like. The transcript below shows me sending two emails to myself over the same connection. This is the raw SMTP communication. Following the example shown on Wikipedia, I have prefixed all of the lines I wrote with C: (for client) and all of the SMTP server’s responses with S: (for server). The very first line is the server saying hello after I connected.

S: 220 blaze.cs.jhu.edu ESMTP Sendmail 8.14.4/8.14.4; Thu, 22 Aug 2013 10:38:24 -0400
C: HELO example.com
S: 250 blaze.cs.jhu.edu Hello [132.162.127.239], pleased to meet you
C: MAIL FROM: s@pahtak.org
S: 250 2.1.0 s@pahtak.org... Sender ok
C: RCPT TO: s@cs.jhu.edu
S: 250 2.1.5 s@cs.jhu.edu... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: From: "Stephen Checkoway" <s@pahtak.org>
C: To: "Stephen Checkoway" <s@cs.jhu.edu>
C: Date: Thu, 22 August 2013 10:39:00 -0400
C: Subject: This is the first message's subject line
C: 
C: This is the start of the mail body. The NSA claims it doesn't get to see
C: this part. They claim they can only see the part before the blank line.
C: 
C: Let's end this message by putting a period on the line by itself.
C: .
S: 250 2.0.0 r7MEcOnR016623 Message accepted for delivery
C: RSET
S: 250 2.0.0 Reset state
C: MAIL FROM: s@pahtak.org
S: 250 2.1.0 s@pahtak.org... Sender ok
C: RCPT TO: s@cs.jhu.edu
S: 250 2.1.5 s@cs.jhu.edu... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: From: "Stephen Checkoway" <s@pahtak.org>
C: To: "Stephen Checkoway" <s@cs.jhu.edu>
C: Date: Thu, 22 August 2013 10:42:00 -0400
C: Subject: This is the second message's subject line
C: 
C: This is the start of the second message's body. Again, the NSA claims it
C: doesn't get to see this part of the message.
C: .
S: 250 2.0.0 r7MEcOnT016623 Message accepted for delivery
C: QUIT
S: 221 2.0.0 blaze.cs.jhu.edu closing connection

This transcript can be broken down into several components.

First, I send a hello (HELO) to the server, which then waits for me to send it commands. The MAIL FROM command tells the server who the message is from, and the RCPT TO command tells the server that I’d like to send the message to the given recipient. (Note that Wikipedia’s example sends a message to two different recipients by giving two RCPT TO commands.) Next, the DATA command informs the server that I’m about to send it the contents of the email.

The contents of the email consist of two discrete portions: the headers and the body. A single blank line separates them, with the headers before it and the body after it. The body continues until the server sees a line containing only a single period.
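Splitting one message’s contents at that blank line takes very little code. The sketch below assumes it is handed the lines a client sent after DATA, minus the final lone period; it undoes dot-stuffing and then lets Python’s standard email.parser.Parser separate the headers from the body. The function name parse_data_section is hypothetical.

```python
from email.message import Message
from email.parser import Parser
from typing import List


def parse_data_section(data_lines: List[str]) -> Message:
    """Parse the lines a client sent after DATA (excluding the lone '.'
    that ends the message) into headers and body."""
    # Undo dot-stuffing: the client doubled any leading "." (RFC 5321, 4.5.2).
    unstuffed = [line[1:] if line.startswith(".") else line for line in data_lines]
    # Parser treats everything before the first blank line as headers and
    # everything after it as the body.
    return Parser().parsestr("\n".join(unstuffed) + "\n")


if __name__ == "__main__":
    msg = parse_data_section([
        'From: "Stephen Checkoway" <s@pahtak.org>',
        "Subject: This is the first message's subject line",
        "",
        "This is the start of the mail body.",
    ])
    print(msg["Subject"])     # a header value
    print(msg.get_payload())  # the body text
```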

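Once the message has been sent, the server responds with success and waits for more commands. The next command, RSET, tells the server to reset its transaction state before I send another message. (Strictly speaking, an SMTP server clears that state on its own once a message is accepted, so RSET here is just a precaution.)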

The second message is similar to the first. It starts with the “envelope” data given by MAIL FROM and RCPT TO. Then it has DATA followed by the headers, a blank line, the body, and a period.

Finally, QUIT terminates the connection.

So why am I skeptical of the NSA’s claim that it cannot split the MCTs into individual messages? To begin with, we don’t know whether the NSA sees both sides of the conversation or only one, but in a sense it doesn’t matter. The server only sends status responses, and those aren’t very interesting. So let’s assume that the NSA sees only the data the client sends to the server.

For the two-message MCT above, the NSA gets to see:

The envelope FROM and TO for the first message;
The headers for the first message;
The body of the first message;
The envelope FROM and TO for the second message;
The headers for the second message; and
The body of the second message.

According to the administration’s example, the NSA is only looking at the “metadata,” that is, the headers of the communications. That means the NSA has the technical ability to separate the headers from the bodies, yet it claims it cannot separate one message’s headers from another’s.

This is shockingly unrealistic. Even the notion that they don’t see the bodies of emails is unrealistic: the only thing separating the headers from the body is a blank line. Even if they don’t store the bodies, they must be parsing them to find the lone period that ends each message, so that they know where to begin collecting metadata again.
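To illustrate, here is a hypothetical Python sketch that produces only the kind of inbox listing described in the administration’s example (date, sender, subject line, and size) for every message in an MCT. It never stores body text, yet it still has to examine every body line to find the terminating period. The input is assumed to be client-side lines with the C: prefix already removed; folded multi-line headers are not handled, and the size is an approximate byte count assuming CRLF line endings.

```python
from typing import Dict, Iterable, Iterator

# Header fields matching the inbox listing in the administration's example.
WANTED = ("date", "from", "subject")


def inbox_summaries(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield date/sender/subject/size for each message, discarding bodies."""
    in_data = in_headers = False
    summary: Dict[str, object] = {}
    size = 0
    for line in lines:
        if not in_data:
            if line.upper() == "DATA":  # message content starts next
                in_data, in_headers = True, True
                summary, size = {}, 0
            continue
        if line == ".":
            # End of this message: report its metadata, then go back to
            # watching for the next transaction's commands.
            summary["size"] = size
            yield summary
            in_data = False
            continue
        if line.startswith("."):
            line = line[1:]  # undo dot-stuffing
        size += len(line) + 2  # approximate bytes, counting a CRLF per line
        if in_headers:
            if line == "":
                in_headers = False  # the blank line: the body starts here
            else:
                name, _, value = line.partition(":")
                if name.strip().lower() in WANTED:
                    summary[name.strip().lower()] = value.strip()
        # Body lines are never stored, but each one still had to be read
        # and checked above to find the terminating ".".
```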

As footnote 14 of the FISC order mentioned above makes clear, the government has been making “substantial misrepresentation[s] regarding the scope” of this and two other “major collection program[s].” Without more information, I cannot say for sure, but I suspect that the government is continuing to make substantial misrepresentations to the public as well.
