PRISM: Here's how the NSA wiretapped the Internet

UPDATED 5: The National Security Agency's "PRISM" program is able to collect, in real time, intelligence not limited to social networks and email accounts. But the seven tech companies accused of opening 'back doors' to the spy agency could well be proven innocent.
Written by ZDNet Community, Contributor and Zack Whittaker, Contributor

Editor's note: The following article should be treated as strictly hypothetical. It has been editorialized to simplify the content in certain areas, while maintaining as much technical detail as we can offer. Companies named in this article have been publicly disclosed, or used in example only. This piece should not be taken necessarily as fact but as a working theory that portrays only one possible implementation of the U.S. National Security Agency's PRISM program as it may exist today. Several ZDNet writers contributed to this report.

The privacy scandal embroiling the Obama administration.
Image: National Security Agency

Let's start off with what we know, and then we'll explain what we have discovered.

A secret court known as the Foreign Intelligence Surveillance Court (FISC), created under the Foreign Intelligence Surveillance Act 1978 and subsequently amended by the Patriot Act in 2001, forced Verizon to hand over "tangible things" to the U.S. National Security Agency (NSA).

The news was first reported by London, U.K.-based newspaper The Guardian.

A day later, another leak pointed to a surveillance program known only as PRISM, which was funded by the NSA. A classified document in the form of a PowerPoint deck, designed to train new operatives, was published online. Only four out of 41 slides were published in The Washington Post.

It was later revealed, on Saturday, June 8, that the source of the NSA document leak was 29-year-old Edward Snowden, an employee of government security contractor Booz Allen Hamilton who was stationed at the NSA's operations center in Hawaii and had since fled to Hong Kong.

The slides indicated that AOL, Apple, Facebook, Yahoo, Google and YouTube, Microsoft and Skype, and little-known company PalTalk were involved in some way. The slides described how these companies were "current providers" but did not explicitly state that these firms knowingly or directly handed over data to the intelligence agency.

The wording on the fourth slide described the "dates when PRISM collection began for each provider," and not, for example, "dates when each provider began PRISM collection."

One by one, nearly all of the named companies denied either knowing about PRISM or providing any government agency with user content, data or information without a court order or a search warrant.

But during that time, almost everyone forgot about Verizon. It's the cellular and wireline giant that makes the whole thing come together.

Update at 2:30 p.m. ET on June 8: A new PRISM slide has been released by The Guardian

New PRISM slide, released June 8
Image: The Guardian

The newspaper believes the new slide "clearly distinguishes PRISM," which collects data "directly" from these technology companies, from a separate set of four different programs involving the collection of data from "fiber cables and infrastructure as data flows past."

It also says the slide suggests that the NSA also collects some data under Section 702 of FISA, but that these four programs, two of which have been redacted, are "distinct from PRISM."

Section 702 of FISA effectively says the U.S. Justice Dept. must show that its proposed snooping will not intentionally target U.S. residents or U.S. citizens abroad, and it must comply with the Fourth Amendment. The recipient of an order served under Section 702 of FISA can in fact appeal it, but this has proven difficult based on a 2009 case [PDF], because there were "several layers of [...] safeguards."

That said, we still believe PRISM, as we suggest later, to be an application of sorts that sits on top of, or across a vast constantly updating data set. CNET's Declan McCullagh notes that PRISM also happens to be the acronym of an existing data processing tool, which has long been in common military use. PRISM stands for "Planning Tool for Resource Integration, Synchronization, and Management."

We do not know if the two are related or connected.

Because the slide says that analysts "should use both" the upstream data collection and PRISM collection, it does indicate that there may in fact be two methods of acquiring private user data. 

And here's what we think. We believe the new slide published on Saturday does not alter what is in this article, which of course remains a hypothetical working theory.

However, based on the material leaked so far, we strongly suspect that the PowerPoint slides were not written by technical people. It's likely that these slides were prepared as an internal marketing tool for new recruits. So, when the slides say: "direct access to servers," that statement may well be an oversimplification of the facts, and we, the media, are latching too much onto it.

The "direct" server data from these named companies may well be retrieved from cached copies maintained by the content delivery networks, which are located in the Tier 1 provider's datacenter.

Because the infrastructure required to deliver media and Web applications, for instance, from these content delivery networks worldwide is so immense, many of them need to lease datacenter space offered by Tier 1 providers, such as AT&T and Verizon. 

It's possible that a network equipment maker has built a router that looks indistinguishable from other core routers in that datacenter, and that contains a beam splitter that literally splits the Tier 1 fiber connection, with one split beam passing a copy of that data to an external NSA datacenter or storage.

Update at 5:00 p.m. ET on June 8: The U.S. Director of National Intelligence James Clapper has released a statement addressing the "collection of intelligence" under Section 702 of FISA.

In a published document [PDF], it highlights certain key facts, according to the U.S. government:

"PRISM is not an undisclosed collection or data mining program," the document says, adding that it is an "internal government computer system" designed to "facilitate [...] authorized collection of foreign intelligence." It notes that PRISM was "widely known and publicly discussed" since its inception in 2008. However, according to the leaked slides, collection of data began for Microsoft in late 2007. The statement seems to corroborate an article by CNET's Declan McCullagh published on Friday.

It's also worth noting that most of this document considers Section 702 of FISA, rather than PRISM directly or any related NSA application or system. As follows:

Section 702 of FISA "does not unilaterally obtain information from the servers of U.S. electronic communication service providers." It notes that such data is collected under the authority of the FISC and with the "knowledge of the provider."

This bit is interesting. On what counts as an "electronic communication" handled by an "electronic communications service provider," the EFF says, in regard to the Wiretap Act: "As a rule, a communication is an electronic communication if it is neither carried by sound waves nor can fairly be characterized as one containing the human voice (carried in part by wire)."

Separately, the EFF also notes that anything from ISPs to message boards and some websites is considered an electronic communications service provider. On a side note, an Ars Technica article from 2009 says that the definition remains vague and under scrutiny by the U.S. courts.

We thought that this meant the U.S. government is saying it doesn't wiretap optical cables, such as those provided by Tier 1 companies.

But then we read it again.

The U.S. government [emphasis ours] "does not unilaterally obtain information from the servers of U.S. electronic communication service providers." This means the servers, such as those in the datacenters, "owned" by the named seven companies. Except many of those servers are in fact managed by the datacenter company — the Tier 1 companies.

Other interesting snippets from the document:

  • "In short, Section 702 facilitates the targeted acquisition of foreign intelligence information concerning foreign targets located outside the United States under court oversight." 
  • "Service providers supply information to the Government when they are lawfully required to do so." This means court orders and FISC orders, which in some cases cannot be appealed, and always come with gagging orders.

The document also says the U.S. government cannot target "anyone" under Section 702 "unless there is an appropriate, and documented, foreign intelligence purpose for the acquisition." This includes the prevention of terrorism.

"In addition, Section 702 cannot be used to intentionally target any U.S. citizen, or any other U.S. person, or to intentionally target any person known to be in the United States," it says, adding: "cannot be used to target a person outside the United States if the purpose is to acquire information from a person inside the United States."

The rest of the document, which can be read online [PDF], continues on for another page or two about accountability and the minimization procedures of how the intelligence agencies treat information.

[Update ends.]

Verizon Business was at the heart of a FISC order that invoked Section 215 of the Patriot Act [PDF] which forced the company to hand over any "tangible things," which was effectively anything it had.

Verizon Business Network Services, or simply "Verizon Business," is what is known as a Tier 1 network provider, after it acquired a number of firms during the late-1990s and early 2000s. It offers Tier 1 services under the brand UUNET.

We believe the FISA court order authorized the NSA to place a wiretap device on Verizon Business' Tier 1 network, which effectively vacuumed up every bit and byte of data that flowed through its networks. If this is the case, Verizon would have been forced to comply, with no grounds to appeal.

The key to this is what a Tier 1 network actually does, how it works, and which companies use it. All of the aforementioned companies use Tier 1 networks, and as a result they may have unknowingly had their customers' data siphoned off simply by being connected to the Internet.

Tier 1s: The super-fast network arteries that power the Web

To use Edward Snowden's own words: "We hack network backbones – like huge Internet routers, basically – that give us access to the communications of hundreds of thousands of computers without having to hack every single one."

The Internet may be distributed and decentralized in nature, but there is a foundation web of connectivity that enables major sites and services to operate. These are referred to as "Tier 1" network providers. Think of these, in simple terms, as the main arteries of the Internet.

The data that flows on them goes directly to where it is needed, which ultimately allows datacenters to communicate with each other across oceans in a matter of milliseconds. Businesses and their datacenters do not miss a beat.
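As a rough sanity check on how quickly data can cross such links, the sketch below estimates one-way propagation delay over optical fiber. The refractive index and the example distances are illustrative assumptions, not figures from any provider, and real paths add routing and switching overhead on top.

```python
# Back-of-the-envelope one-way propagation delay over optical fiber.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_REFRACTIVE_INDEX = 1.47      # assumed typical value for silica fiber


def one_way_delay_ms(distance_km: float,
                     refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Return the one-way propagation delay in milliseconds."""
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_in_fiber_km_s * 1000.0


if __name__ == "__main__":
    # Hypothetical distances, chosen only for illustration.
    for label, km in [("metro link", 50), ("transatlantic cable", 6000)]:
        print(f"{label} ({km} km): {one_way_delay_ms(km):.2f} ms")
```

Under these assumptions a link within a city stays well under a millisecond, while an ocean crossing takes tens of milliseconds each way.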

There are only just over a dozen Tier 1 network providers in the world, including AT&T, Level 3, and Sprint in the U.S.; Deutsche Telekom in Germany; NTT Communications in Japan; and Telefonica in Spain, just to name a few major brand names. Verizon Business is, of course, also on that list as a U.S.-based Tier 1 network provider.

These networks allow major businesses, television networks, science labs, and governments, for instance, to share vast amounts of data across the Internet in a very short space of time. This isn't being done on the public Internet, in which data "hops" about different networks looking for the cheapest path. Instead, data flowing on Tier 1 networks takes the simplest path.

Plus, many of the aforementioned companies have datacenters in multiple locations around the world. These need to communicate instantaneously to ensure geo-redundancy, so if one datacenter goes down, the data is stored elsewhere safely.

Edge devices, known as "peers," are entry points of Tier 1 Internet service providers to their enterprise customers.

For example: CBS (which owns ZDNet) is connected to a Tier 1 network via a peering connection so it can broadcast material instantly without delays or hitches. Verizon and AT&T, as examples of home and business Internet providers, are also hooked into the Tier 1 network and offer similar peering connections. 

Companies with peering connections to Tier 1 networks include corporations like AOL, Apple, Facebook, Yahoo, Google and YouTube, Microsoft and Skype. Peering connections to Tier 1 networks not only allow these companies to participate as enterprises to the wider Web with the fastest connection possible, but also to enable users sitting at home on their broadband providers' network to access various services and included content without routing through the public, slower Internet.

Simply put, it's why Facebook and Google load so quickly and function instantly for so many users.

Take Facebook as a good example. Users expect extremely fast response times. As you sit at home browsing the site, at each request your copper telephone wire or fiber connection then links up to your Internet provider's network, which is likely a Tier 2 network, the most common kind of network. That data then travels through a private optical carrier link to Facebook, which will have an edge connection connecting the Tier 1 connection to its network or its datacenter. The data is pulled for the user and sent back over the Tier 1 connection. 

In even simpler terms, Facebook and other companies have created a private connection to your Internet provider at home or work so that these sites can load up almost instantly without using the public Internet at all.

How can the NSA capture this user data? Good ol' fashioned wiretapping

The chances are that the aforementioned companies have indeed had their customers' data intercepted by the NSA. It is almost certainly the case that these companies had no idea about PRISM before it broke in the media, as their respective statements have claimed, and that no data was passed by these companies directly to the NSA or any other intelligence agency.

The easiest way to acquire this data, with as few people as possible knowing about it, would be to simply wiretap the data as it's traveling along the Tier 1 optical carrier lines.

By tapping into the connection between the Tier 1 network and the edge connection, the NSA would be able to literally view and copy data transmitted over every single session from a user to an application in real time, and then store and process it appropriately.

You can't walk into, say, Apple's iCloud datacenter and install a wiretap. Apple would notice it. It would have to be done out of band: such as when the data leaves the datacenter and begins its journey on the way to the user sitting at home on their laptop or mobile device.

Microsoft's Hotmail service — now defunct, and rebranded as Outlook.com — was on the list of PRISM services that were being accessed by the NSA. But the NSA didn't need to seek Microsoft's permission, or even to serve it with a court order or a ruling from the FISC. Because of the sheer size of the company, someone would have eventually either said something to someone else and broken the law by breaching the gagging clauses in the process — or someone would've noticed a backdoor in the systems somewhere.

And, using Hotmail as an example, if the NSA was acquiring all the data since September 2007 — the time the leaked slides show the data harvesting began — the NSA would in theory now have all of everyone's Hotmail data to date.

But that would be almost useless to the NSA. The agency wants to know about the "here and now," not "then." They want information that is immediately actionable.

There's the issue of encryption, such as an SSL connection, which offers an HTTPS secure pipe between the user's computer and the website providing the service. It's like a metal pipe that stretches end-to-end. The connection that's opened up from your computer is encrypted and everything that flows through it is unreadable to anyone in between.

But if the NSA were intercepting traffic and decrypting it somehow on the edge connection between the application service provider — such as Facebook, Gmail, Amazon, for example — and the Tier 1 network, the application service provider would be unaware that this was happening.

There are a number of wiretap-related laws, and which one is used depends on the case. Of course the main one is the Wiretap Act. But it all depends on which law may sway the judge that must hand the order down to authorize such an act.

According to the Electronic Frontier Foundation (EFF), the Wiretap Act requires police, law enforcement or intelligence agencies to seek a warrant — often called a "super-warrant" — to intercept "electronic communications," such as Internet activity and cell activity. This includes emails, Web history, text messages and instant messaging, and more.

The privacy group states that under the Wiretap Act, although a wiretap order is needed to intercept your electronic communications, only your oral and wire communications — such as voice communications — are covered by the statute's "exclusionary rule." If your phone calls are illegally intercepted, such as without a warrant, that evidence can't be introduced against you in a criminal trial. But, the statute will not prevent the introduction of illegally intercepted emails and text messages in court.

Section 215 of the Patriot Act, which amended the Foreign Intelligence Surveillance Act 1978, allows the government to acquire "tangible things," so long as the FISC is aware that it is for an "authorized investigation" to "prevent terrorism" or "clandestine intelligence activities."

Also, the Communications Assistance for Law Enforcement Act (CALEA), passed in 1994, requires U.S. telecoms firms and manufacturers to ensure their equipment is able to implement government wiretaps. This not only includes traditional telephone lines and broadband connections, but also voice-over-IP (VoIP) traffic.

This should be enough for the NSA to wiretap Tier 1 companies.

Update at 9:00 a.m. ET on June 8: SSL section edited for clarification. Thanks for the feedback; we'll address this in the comments section.

Although SSL-encrypted data is still unreadable at its current destination, the NSA likely has the capabilities to break this encryption later at its datacenter, presumably using vast computational resources. This would have to be done for each session, and likely only for targets of interest since the ability to do this would be extremely computationally expensive, as both public key and symmetric keys would have to be cracked.
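To give a feel for why cracking sessions one at a time would be so computationally expensive, here is a minimal sketch of the arithmetic behind an exhaustive search of a symmetric session key. The guess rate is a purely hypothetical assumption, not a known NSA capability, and the calculation does not apply to RSA public keys, which are attacked by factoring rather than by trying every key.

```python
# Expected time to find a symmetric key by trying every possibility.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Expected years for an exhaustive search.

    On average the key is found after trying half of the key space,
    which is 2 ** (key_bits - 1) guesses.
    """
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    ASSUMED_RATE = 1e18  # hypothetical: a billion billion guesses per second
    for bits in (56, 128, 256):
        years = years_to_search(bits, ASSUMED_RATE)
        print(f"{bits}-bit key: about {years:.3g} years")
```

Each extra bit doubles the work, which is why any practical attack would more plausibly go after the keys or certificates themselves than brute-force each session.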

Alternatively, the U.S. government could issue a FISA order against the certificate authorities themselves. A FISA order may well negate the need for any SSL-decryption methods, whether they exist or not.

Let's explore both.

Today most Web services use 128-bit or 256-bit key encryption, both of which would be child's play for an advanced NSA supercomputer optimized specifically for cryptographic work to crack. Facebook and Google, for example, use 128-bit RSA encryption with TLS 1.1 connections for their Web servers. (Google is planning to move to a 2048-bit RSA key later this year.)

Issuing certificate authorities, as well as the National Institute of Standards and Technology (NIST), have already recommended that businesses move their Web servers to more complex encryption methods. This is because these sessions are crackable by conventional computer technology, let alone something exotic that the NSA might have in its possession.

However, the SSL encryption appliances and co-processors which must do this at the Web server end without significantly compromising application or server performance are extremely expensive. Corporations have been lax in moving to these standards. 

Cracking the encrypted SSL sessions could also be achieved through compromised certificates from the issuing certificate authority, making decryption of vast amounts of sessions that much easier.

Recommendations for digital signature encryption hash length, NIST

Having someone on the inside leaking the certificate's private key to a third party like the NSA is unlikely. It could also be discovered relatively easily. What's more likely is that the U.S. government could petition the FISC in order to seek a secret warrant against the certificate authorities. 

The chief executives of Google, Facebook, Microsoft and so on would be none the wiser because they would not have been told by the certificate authorities, as per the gagging order. It's possible that a FISC order could even prevent the certificate authorities' chief executives from knowing. From here, the NSA could forge any SSL certificate. This would allow the NSA to conduct a man-in-the-middle attack without the user or the company involved even knowing.

[Update ends.]

In addition to the direct tap of the Tier 1 edge connections, the NSA is also likely making direct copies of application databases, their contents and files stored at content delivery networks (CDNs).

In many cases this is the exact same thing as the Tier 1 edge peer because this is how the content is distributed in the first place. Ultimately this would allow the NSA to reverse engineer the information as it was stored in the original application, and would not require nearly as much computational power as breaking individual SSL sessions, one at a time.

There are two main benefits to wiretapping the Tier 1 edge connections.

Firstly, the companies involved that provide Web services and applications are unaware of the data gathering because it happens outside of their networks. Secondly, these Tier 1 network providers have a far smaller employee base working in these divisions than the aforementioned companies. This allows the NSA to send its own employees in as "virtual" employees, working under the guise of these companies, while the NSA gags those companies from disclosing this fact to other staff. They could look like special contractors that only work with the special wiretapping routers.

With this technique, the number of people who actually know about the wiretapping would remain low.

Only those who actively use the PRISM system to examine the wiretap-collected data as well as a few people within the Tier 1 companies and the network equipment manufacturer that develops the wiretapping hardware would be directly complicit in this scheme.

All of those involved could be delivered gagging orders under a FISC order, such as the one Verizon received in April, which was published by The Guardian, and face prosecution and jail time if they talk.

The likelihood is that, should this theory prove true, other governments and nations may also be complicit in NSA's wiretapping scheme. The U.K. government has already been implicated with its listening station, the Government Communications Headquarters (GCHQ), reportedly using PRISM, in spite of intelligence sharing and mutual legal assistance treaties between the two countries.

Perhaps this is even happening as far as the UKUSA Agreement, in which the U.K., the U.S., Canada, Australia, and New Zealand agreed on signals intelligence sharing. Or, it could go as far as NATO countries. But there are some doubts over the "NOFORN" classification tags on some of the leaked documents, which indicates that foreign nations — including those in the UKUSA Agreement — are not allowed to view them.

PRISM and data mining: What data is being collected?

PRISM could be considered the "ECHELON 2.0" signals intelligence gathering system between the countries in the UKUSA Agreement. The intelligence system allegedly monitors almost anything carried over telephone wires and intercepts satellite communications.

But telephone wires are in a dying category. Datacenters and cloud services are the new norm. And amid the privacy scandal, it still isn't clear what the wiretapped data actually is.

PRISM is probably more like a Web-based application — like a search engine — than a "program" or an "operation." Behind the scenes there will be a vast big data operation that uses algorithms, natural language queries and search syntax to extrapolate the data the NSA operative needs.

Like a tabbed Web application of the kind you see in Google's range of services across the top of the page, PRISM is likely just one application out of many in the NSA's toolshed.

It probably sits like a layer on top of the NSA's shared resources and infrastructure. Everything the NSA is doing — at least data pertinent to the operative or analyst — could be fed in, like a data mining application. The NSA collects all kinds of data — from phone call metadata and email content data to radio waves and satellite communications. It may just need a FISC warrant to actively access some of it.

What's also not clear is just how much data is being harvested and stored. The NSA started to capture data from Microsoft in 2007, the leaked documents say. Following on from that, Yahoo was next in 2008 and Google, Facebook and PalTalk in 2009. And so on.

But does the NSA still hold that data? Or does it wipe its storage after six months or so, once the data has proven to be no longer relevant or useful? It's possible that the agency takes daily snapshots, as per the Verizon court order, and keeps a copy for later on-the-fly searches. Or perhaps an algorithm is constantly searching for key terms or user-specific data, using only a portion of the overall space, like a surgical search rather than downloading everything.

We simply don't know, and can only hypothesize until further leaks emerge, if any.

One source speaking to ZDNet under the condition of anonymity said $20 million, the amount quoted by the NSA in the leaked document that covers the cost of the PRISM program, wouldn't even cover the air conditioning costs and the electrical bill for the datacenter. Taking the datacenter out of the equation, $20 million would not even cover three to six months' worth of the data storage required to keep copies of the wiretap data, they said. The storage the NSA would be procuring is most likely the most expensive, high-speed storage taxpayer money can buy.
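To see how quickly copied traffic could add up, the sketch below converts a link speed into stored bytes. The 10 Gbps link rate, the 50 percent utilization and the 90-day retention window are illustrative assumptions only; the leaked documents give no such figures.

```python
# How much storage a full copy of one tapped link would need.
SECONDS_PER_DAY = 86_400


def bytes_per_day(link_gbps: float, utilization: float = 1.0) -> float:
    """Bytes carried in one day by a link of the given speed and utilization."""
    bits_per_second = link_gbps * 1e9 * utilization
    return bits_per_second / 8 * SECONDS_PER_DAY


def terabytes_retained(link_gbps: float, utilization: float, days: int) -> float:
    """Terabytes (10**12 bytes) needed to keep `days` of traffic from one link."""
    return bytes_per_day(link_gbps, utilization) * days / 1e12


if __name__ == "__main__":
    # Hypothetical inputs: one 10 Gbps link, half utilized, kept for 90 days.
    tb = terabytes_retained(link_gbps=10, utilization=0.5, days=90)
    print(f"One link, 90 days: about {tb:,.0f} TB")
```

Multiplying a figure like this across many links is what makes the source's skepticism about the $20 million budget plausible, although the real volumes remain unknown.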

That said, the Digital Collection System Network (DCSNet) wiretapping system, which connects, stores, indexes and analyzes metadata — such as sender and recipient email addresses, outgoing and incoming phone numbers, and time and date details — cost the Federal Bureau of Investigation (FBI) double that at $39 million by 2007.

The NSA and what it does with telephone call data

But what about the NSA harvesting of Verizon telephony data that sparked off this entire controversy in the first place? PRISM may not have necessarily been designed for that. PRISM may well be just one application out of a suite of NSA-created applications that perform different things. While PRISM focuses entirely on application sessions in the cloud, another application may in fact focus on the recording of phone calls.

President Obama, speaking on Thursday, said: "When it comes to telephone calls, nobody is listening to your telephone calls. That's not what this program was about. As indicated, what the intelligence community is doing is looking at phone numbers and durations of calls."

It would be easy to suggest that in fact nobody is listening to phone calls, at least semantically speaking. It was likely a very carefully considered sentence. But even logistically, there are too many calls to listen to anyway. It's entirely possible that algorithms are being used to transcribe and detect certain words, but in this case that's not too important and goes a little off base.

Obama also referenced "the program." This could strictly mean the examining of telephony metadata, such as the phone numbers and the durations of calls. This could be part of a broader set of tools developed by the NSA to distribute wiretapped data to the appropriate databases, completely separate from what PRISM does, which is to wiretap Internet sessions to cloud-based applications.

In practice, not everyone working at the NSA wants to listen to actual phone calls because this is a labor- and time-intensive activity. All they need is the metadata which describes how the call occurred. This is enough to establish a connection between two people, and therefore "reasonable suspicion." From here, it's enough to seek a warrant and prosecute as and where necessary using the legal tools that the U.S. government has at its disposal.
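The idea that metadata alone can establish a connection between people can be sketched as a simple contact graph. The record layout, the phone numbers and the function names below are hypothetical, invented purely for illustration; nothing is known about how the NSA actually stores or queries this data.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, duration in seconds).
RECORDS = [
    ("555-0100", "555-0101", 120),
    ("555-0101", "555-0102", 45),
    ("555-0100", "555-0101", 30),
    ("555-0103", "555-0104", 300),
]


def build_contact_graph(call_records):
    """Map each number to the set of numbers it exchanged calls with."""
    graph = defaultdict(set)
    for caller, callee, _duration in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search: numbers reachable from `start` in at most `max_hops` calls."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop outward, skipping numbers already visited.
        frontier = {n for node in frontier for n in graph.get(node, set())} - seen
        seen |= frontier
    return seen - {start}


if __name__ == "__main__":
    contacts = build_contact_graph(RECORDS)
    print(sorted(within_hops(contacts, "555-0100", 2)))
```

No call content is touched at any point: who called whom is enough to link a number to its contacts and to their contacts in turn.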

The NSA will have different applications doing different things. If there is the ability to record voice conversations — so long as the law allowed it, under FISA or the Wiretap Act — it would probably be in there somewhere. 

What makes PRISM tick?

While we can never be completely sure what infrastructure and software makes up the PRISM system, we have some reasonable ideas. While there would certainly be some custom and exotic hardware involved — as the NSA has its own chip-making capabilities — many of the components would be the same off-the-shelf enterprise hardware and software that powers line of business applications and services at major corporations. 

In some cases, they may be versions of these off-the-shelf systems on "steroids" using early or bleeding edge versions of processors and other components built by vendors under secret contracts.

IBM, for example, has already demonstrated sophisticated natural language abilities in Watson, which participated on the "Jeopardy!" game show in 2011. All the NSA would need to do is buy a tremendous amount of this equipment. Searching through wiretapped application data is a relatively simple exercise compared to having Watson participate on the fly in a trivia game show. 

The NSA supercomputer at the heart of PRISM likely resembles a gigantic Watson using advanced cryptographic co-processors, which may employ nanophotonics like those announced by IBM in 2010, and digs through this information at incredible speeds (petascale and exascale levels) so that only the "cream" rises to the top, using key phrases and other patterns as triggers. This would be an evolution of what already existed in ECHELON, but would have more advanced natural language processing capabilities.
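In its simplest form, using key phrases as triggers amounts to pattern matching over a stream of messages. The sketch below is a toy illustration of that idea only; the phrases, messages and function names are made up, and a real system would rely on far more sophisticated language processing.

```python
import re


def compile_triggers(phrases):
    """Build one case-insensitive pattern that matches any whole trigger phrase."""
    escaped = (re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_messages(messages, pattern):
    """Yield (message, matched phrases) for every message containing a trigger."""
    for message in messages:
        hits = sorted({m.group(0).lower() for m in pattern.finditer(message)})
        if hits:
            yield message, hits


if __name__ == "__main__":
    # Hypothetical trigger phrases and messages, for illustration only.
    triggers = compile_triggers(["alpha", "bravo charlie"])
    stream = ["nothing to see here", "Alpha met Bravo Charlie at noon"]
    for message, hits in flag_messages(stream, triggers):
        print(hits, "->", message)
```

Only messages that trip a trigger are passed on, which is the "cream rises to the top" behavior in miniature.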

And why should you believe this? It's been done before

In 2006, Room 641A became headline news. It was inside a building in San Francisco that AT&T owned, which fed in fiber optic cables from other telecoms switch buildings carrying Internet backbone traffic. Though this building was only three floors high, it had "the capability to enable surveillance and analysis of internet content on a massive scale, including both overseas and purely domestic traffic," according to Internet expert J. Scott Marcus at the time.

The "beam splitter" was used to — quite simply — split the fiber optic beam to redirect duplicate copies of all phone calls, Web traffic and email content into the clandestine room. That copied data was then handed to the NSA. According to the EFF, one expert said: "This isn't a wiretap. It's a country-tap."

It was effectively a gigantic wiretap on a huge portion of the Internet flowing in and out of the U.S. This led to an almighty class action lawsuit led by the EFF. Perhaps the more worrying part of this is that the wiretap included vast amounts of U.S. resident data, which is in breach of FISA. Obama said on Thursday: "With respect to the Internet and emails, this does not apply to U.S. citizens and it does not apply to people living in the United States."

A beam splitter is a prism of sorts, so "PRISM" seems like an apt name for what appears to be a logical progression of Room 641A.

In simple terms, this could be exactly what is happening at Tier 1 edge devices, where the beam is split and redirected to equipment monitored by the NSA. Granted, in this day and age it would be a little obvious to do exactly the same thing that transpired in Room 641A. Instead, a special beam-splitting router installed at the edge connection could perform this and siphon off the data.

PRISM is likely a massive application, part signals intelligence (SIGINT) and part big data, that has the active and knowing, if involuntary, participation of the U.S.' largest telecom firms, network equipment makers, supercomputer builders, and government-outsourcing professional services companies as the moving cogs in the privacy-invading machine.
