Think you have problems streaming Game of Thrones? Imagine the bandwidth pain of pulling audio from thousands of phone calls, captured by recording gear in a faraway country, over a DSL connection.
National Security Agency documents released this week by The Washington Post give an overview of an NSA program that allows the agency to capture the audio content of nearly every phone call in an unnamed country and run searches against the stored calls’ metadata to find and listen to conversations for up to a month after they happened. The disclosure is the latest revelation of how the NSA has used big data technology to make its foreign surveillance more manageable.
Just as the NSA and GCHQ used Xkeyscore to make it possible to search through the streams of Internet traffic collected by the Turmoil monitoring systems scattered around the world, a program called Retrospective (or Retro) allows analysts to search through phone calls up to 30 days old based on call metadata. Originally developed for the NSA’s global Mystic phone monitoring effort as a “one-off” capability, Retro may now be in use in a number of other countries, intercepting the calls of unsuspecting callers, including those who have nothing to do with the NSA’s foreign intelligence targets.
Then again, perhaps that realization can be chalked up to semantics. In the NSA’s view, it’s not “surveillance” until someone listens in. And because many of the calls captured by Retrospective are flushed from its “cache” after a month without ever being queried, the NSA can argue that those calls were never surveilled.
Press down
The source of all the calls in the unnamed country is a “signals intelligence asset” the NSA refers to as Scalawag. According to an NSA brief published by the Post, Scalawag has “long since reached the point where it’s collecting and sending home far more than the bandwidth can handle.” That’s because the NSA was sending every call picked up by Scalawag back to the US for processing, regardless of its priority, although the calls had to be stored locally first.
The resulting backlog slowed the retrieval and processing of the most critical intelligence data from Scalawag, making it extremely difficult for the NSA to conduct timely collection and analysis. In the summer of 2011, the brief says, the NSA’s Special Source Operations (SSO) division, the part of the agency that oversees surveillance conducted through “partnerships,” took steps to “reduce pressure on bandwidth and reduce latency of (high priority) data.”
First, SSO went through the “tasks” running against Scalawag and cleaned house, deleting monitoring tasks that brought back data analysts had never touched. The remaining monitoring tasks were prioritized based on how often the returned data was used. But the volume of data sent back to the NSA in the US continued to grow, and the breathing room afforded by those changes quickly disappeared.
In December of 2011, SSO specifically banned low-priority tasks against Scalawag. But to really take a bite out of the bandwidth crunch, SSO also moved the discovery and processing of audio data closer to the source, with the “retrieval tool” Retrospective.
Reach out and touch someone
Based on what we already know about the NSA’s approach to remote data storage and the contents of the brief, here’s our best guess at how Retrospective and Mystic work:
Being able to capture the audio content of almost every phone call within a country, regardless of its size, means having unfettered access to its communications infrastructure. Either the targeted country’s communications system is highly centralized, or the NSA needs to install surveillance gear in every telephone exchange in the country. In theory, a Retrospective “back-end” could sit in each of these exchanges, processing data locally to reduce the amount that needs to be sent back to a central database.
The digitally collected voice data and related call metadata are stored and indexed, possibly in a “big data” data store such as Accumulo, a “NoSQL” database developed at the NSA and contributed to the Apache Software Foundation as open source in 2011. As with Xkeyscore, Retrospective can then perform various indexing and processing operations on the recorded data, eliminating work that an analyst would otherwise have to do later to sift through the collected calls. It is possible that tasks such as speech-to-text processing are also performed on calls for indexing, by a software agent associated with the data store.
Back at the NSA, an analyst can launch a search against calls in one of the Retrospective deployments using the Unified Targeting Tool (UTT), a front-end tool that can use a variety of identifiers: phone number, location, the customer’s name, the time of the call, or practically anything else that can be derived from the call’s metadata and from any pre-analysis of its content. Search parameters are then sent from UTT to the Retrospective server.
Each search that UTT sends to Retrospective is turned into an “official” task: a software agent that constantly checks the indexes of the call archive for matching calls. Since the archive holds up to 30 days’ worth of calls, a new task can effectively reach back in time and immediately retrieve calls that have already occurred. As new calls are recorded, the tasking system picks them up in near real time and lines them up to be sent back to the NSA in the US.
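To make the “reach back in time” behavior concrete, here is a minimal sketch of a rolling archive with standing queries. It is not taken from any NSA document: the `CallArchive` class, its method names, and the record fields are all hypothetical, and the only detail drawn from the reporting is the 30-day retention window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List

# The 30-day window comes from the article; everything else is assumed.
RETENTION = timedelta(days=30)


@dataclass
class CallRecord:
    caller: str
    callee: str
    started: datetime


class CallArchive:
    """Hypothetical rolling archive of call metadata with standing queries."""

    def __init__(self) -> None:
        self.records: List[CallRecord] = []
        self.tasks: List[Callable[[CallRecord], bool]] = []
        self.outbox: List[CallRecord] = []  # matches waiting to be sent home

    def expire(self, now: datetime) -> None:
        # Drop anything older than the retention window.
        self.records = [r for r in self.records if now - r.started <= RETENTION]

    def add_task(self, predicate: Callable[[CallRecord], bool], now: datetime) -> None:
        # A new task first sweeps what is already stored (the backfill)...
        self.expire(now)
        self.tasks.append(predicate)
        self.outbox.extend(r for r in self.records if predicate(r))

    def ingest(self, record: CallRecord, now: datetime) -> None:
        # ...and is then checked against every new call as it arrives.
        self.expire(now)
        self.records.append(record)
        if any(task(record) for task in self.tasks):
            self.outbox.append(record)
```

The design point is that one predicate serves both jobs: it is run once over the stored window when the task is created, then again on each newly recorded call.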
How quickly calls are sent back to the NSA depends on the priority assigned to the monitoring task. Priority 1 tasks, the most urgent surveillance needs, jump to the top of the queue, while priority 2 and 3 tasks are added to the queue behind them. All calls in the queue are transferred back to the NSA network in the US and added to their callers’ records on the NSA’s audio communications analysis platform, Nucleon.
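The queueing behavior described here maps onto an ordinary priority queue. The sketch below is an illustration only: the `TransferQueue` class and the call identifiers are made up, and first-in, first-out ordering within a priority level is an assumption, since the documents do not say how ties are broken.

```python
import heapq
import itertools


class TransferQueue:
    """Hypothetical upload queue: lower priority number goes first."""

    def __init__(self) -> None:
        self._heap = []
        # A running counter keeps calls of equal priority in arrival order
        # (an assumption; the source does not describe tie-breaking).
        self._counter = itertools.count()

    def enqueue(self, call_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), call_id))

    def dequeue(self) -> str:
        _priority, _order, call_id = heapq.heappop(self._heap)
        return call_id

    def __len__(self) -> int:
        return len(self._heap)


queue = TransferQueue()
queue.enqueue("a", 3)
queue.enqueue("b", 1)
queue.enqueue("c", 2)
queue.enqueue("d", 1)
order = [queue.dequeue() for _ in range(len(queue))]
print(order)  # ['b', 'd', 'c', 'a']
```

Nothing is dropped under this scheme. Lower-priority calls simply wait longer for a share of the thin link.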
According to budget documents obtained by the Post, Retrospective technology, as part of the Mystic program, may already be deployed in as many as five additional countries beyond the one covered by Scalawag. The beauty (if you can call it that) of distributing the search this way is that it makes it easier for the NSA to scale this type of surveillance, while keeping the impact on the low-bandwidth network connections to the sites that host the taps to a minimum. The biggest limitation of the approach is the local storage capacity required, though noise reduction can shrink that requirement significantly. At worst, the NSA would have to reduce the amount of time calls are “saved” to accommodate a larger call volume, or add more disk drives.
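The trade-off between call volume, disk space, and retention time is simple arithmetic. The sketch below uses invented numbers purely for illustration; none of the figures (call volume, call length, audio bitrate, disk capacity) appear in the published documents, and the `retention_days` function is hypothetical.

```python
def retention_days(capacity_bytes: int, calls_per_day: int,
                   avg_call_seconds: int, bytes_per_second: int) -> float:
    """Days of audio that fit on local storage at a given call volume."""
    bytes_per_day = calls_per_day * avg_call_seconds * bytes_per_second
    return capacity_bytes / bytes_per_day


# Illustrative assumptions only: 1 million calls a day, 2 minutes each,
# stored at 1,000 bytes per second (8 kbit/s), on 3.6 TB of disk.
baseline = retention_days(3_600_000_000_000, 1_000_000, 120, 1_000)
doubled = retention_days(3_600_000_000_000, 2_000_000, 120, 1_000)
print(baseline, doubled)  # 30.0 15.0
```

Under these assumed figures, doubling the call volume halves the retention window unless the disk capacity doubles too, which is the choice the last sentence above describes.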