A Collection of Quotes from “Data and Goliath” by Bruce Schneier

I just finished Data and Goliath, by Bruce Schneier. I had about 10% of the book to go when the recent Apple/FBI matter came up, which inspired me to dive back in. For possibly the first time ever, I actually highlighted sections of the book while I was reading it (right from the beginning). Here they are.

The UK company Cobham sells a system that allows someone to send a “blind” call to a phone—one that doesn’t ring, and isn’t detectable. The blind call forces the phone to transmit on a certain frequency, allowing the sender to track that phone to within one meter.

We are living in the golden age of surveillance. Sun Microsystems’ CEO Scott McNealy said it plainly way back in 1999: “You have zero privacy anyway. Get over it.”

if you let us have all your data, we will show you advertisements you want to see and we’ll throw in free web search, e-mail, and all sorts of other services. It’s convenience, basically. We are social animals, and there’s nothing more powerful or rewarding than communicating with other people.

And why do we allow governments access? Because we fear the terrorists, fear the strangers abducting our children, fear the drug dealers, fear whatever bad guy is in vogue at the moment. That’s the NSA’s justification for its mass-surveillance programs; if you let us have all of your data, we’ll relieve your fear.

You could probably store every tweet ever sent on your home computer’s disk drive. Storing the voice conversation from every phone call made in the US requires less than 300 petabytes, or $30 million, per year. A continuous video lifelogger would require 700 gigabytes per year per person. Multiply that by the US population and you get 2 exabytes per year, at a current cost of $200 million. That’s expensive but plausible, and the price will only go down. In 2013, the NSA completed its massive Utah Data Center in Bluffdale. It’s currently the third largest in the world, and the first of several that the NSA is building. The details are classified, but experts believe it can store about 12 exabytes of data. It has cost $1.4 billion so far. Worldwide, Google has the capacity to store 15 exabytes.
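As a rough sanity check on the two cost figures in that passage, the sketch below works out the price per terabyte each one implies. It assumes decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB), and the function and constant names are made up for illustration; nothing here comes from the book beyond the quoted numbers.

```python
# Assumption: decimal storage units, which is how disk capacity is usually quoted.
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000


def dollars_per_terabyte(total_dollars, terabytes):
    """Return the storage price per terabyte implied by a total cost."""
    return total_dollars / terabytes


# Phone calls: 300 petabytes for $30 million per year (figures quoted above).
phone_rate = dollars_per_terabyte(30_000_000, 300 * TB_PER_PB)

# Video lifelogging: 2 exabytes for $200 million per year (figures quoted above).
video_rate = dollars_per_terabyte(200_000_000, 2 * TB_PER_EB)

print(f"Phone calls: ${phone_rate:.0f} per TB per year")
print(f"Video:       ${video_rate:.0f} per TB per year")
```

Both figures come out at the same rate, which suggests they were derived from a single assumed storage price.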

Collecting metadata on people means putting them under surveillance. An easy thought experiment demonstrates this. Imagine that you hired a private detective to eavesdrop on someone. The detective would plant bugs in that person’s home, office, and car. He would eavesdrop on that person’s phone and computer. And you would get a report detailing that person’s conversations. Now imagine that you asked the detective to put that person under surveillance. You would get a different but nevertheless comprehensive report: where he went, what he did, who he spoke with and for how long, who he wrote to, what he read, and what he purchased. That’s metadata. Eavesdropping gets you the conversations; surveillance gets you everything else.

When you have one person under surveillance, the contents of conversations, text messages, and e-mails can be more important than the metadata. But when you have an entire population under surveillance, the metadata is far more meaningful, important, and useful.

In fact, that’s the basic promise of big data: save everything you can, and someday you’ll be able to figure out some use for it all.

These storage limits pertain to the raw trove of all data gathered. If an NSA analyst touches something in the database, the agency saves it for much longer. If your data is the result of a query into these databases, your data is saved indefinitely. If you use encryption, your data is saved indefinitely. If you use certain keywords, your data is saved indefinitely. How long the NSA stores data is more a matter of storage capacity than a respect for privacy.

I have an Oyster card that I use to pay for public transport while in London. I’ve taken pains to keep it cash-only and anonymous. Even so, if you were to correlate the usage of that card with a list of people who visit London and the dates—whether that list is provided by the airlines, credit card companies, cell phone companies, or ISPs—I’ll bet that I’m the only person for whom those dates correlate perfectly. So my “anonymous” movement through the London Underground becomes nothing of the sort.

Sometimes linking identities across data sets is easy; your cell phone is connected to your name, and so is your credit card. Sometimes it’s harder; your e-mail address might not be connected to your name, except for the times people refer to you by name in e-mail. Companies like Initiate Systems sell software that correlates data across multiple data sets; they sell to both governments and corporations. Companies are also correlating your online behavior with your offline actions. Facebook, for example, is partnering with the data brokers Acxiom and Epsilon to match your online profile with in-store purchases.

Paula Broadwell, who had an affair with CIA director David Petraeus, similarly took extensive precautions to hide her identity. She never logged in to her anonymous e-mail service from her home network. Instead, she used hotel and other public networks when she e-mailed him. The FBI correlated registration data from several different hotels—and hers was the common name.
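The correlation described in that passage is, at its core, a set intersection: keep only the names that show up in every guest list. The sketch below is a minimal illustration with invented names and a hypothetical helper, common_guests; it is not a description of how the FBI actually did it.

```python
def common_guests(guest_lists):
    """Return the set of names that appear in every guest list."""
    sets = [set(names) for names in guest_lists]
    if not sets:
        return set()
    return set.intersection(*sets)


# Invented registration data for three hotels whose networks were used.
hotels = [
    ["A. Jones", "B. Smith", "C. Lee"],
    ["B. Smith", "D. Patel"],
    ["E. Wong", "B. Smith", "A. Jones"],
]

print(common_guests(hotels))
```

Each extra list shrinks the candidate pool, so a handful of otherwise unremarkable records can be enough to single out one person.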

Maintaining Internet anonymity against a ubiquitous surveillor is nearly impossible. If you forget even once to enable your protections, or click on the wrong link, or type the wrong thing, you’ve permanently attached your name to whatever anonymous provider you’re using.

In 2008, Netflix published 10 million movie rankings by 500,000 anonymized customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using at that time. Researchers were able to de-anonymize people by comparing rankings and time stamps with public rankings and time stamps in the Internet Movie Database.

95% of Americans can be identified by name from just four time/date/location points.

Surveillance is the business model of the Internet for two primary reasons: people like free, and people like convenient.

The truth is, though, that people aren’t given much of a choice. It’s either surveillance or nothing, and the surveillance is conveniently invisible so you don’t have to think about it.

surveillance-based marketing

Some people have pledged allegiance to Google. They have Gmail accounts, use Google Calendar and Google Docs, and have Android phones. Others have pledged similar allegiance to Apple. They have iMacs, iPhones, and iPads, and let iCloud automatically synchronize and back up everything. Still others of us let Microsoft do it all. Some of us have pretty much abandoned e-mail altogether for Facebook, Twitter, and Instagram. We might prefer one feudal lord to the others. We might distribute our allegiance among several of these companies, or studiously avoid a particular one we don’t like. Regardless, it’s becoming increasingly difficult to not pledge allegiance to at least one of them.

It’s not reasonable to tell people that if they don’t like the data collection, they shouldn’t e-mail, shop online, use Facebook, or have a cell phone. I can’t imagine students getting through school anymore without Internet search or Wikipedia, much less finding a job afterwards. These are the tools of modern life. They’re necessary to a career and a social life.

In the US, anyone willing to pay for data can get it. In some cases, criminals have legally purchased and used data to commit fraud.

A criminal hacks into a database somewhere, steals your account information and maybe your passwords, and uses them to impersonate you to secure credit in your name. Or he steals your credit card number and charges purchases to you. Or he files a fake tax return in your name and gets a refund that you’re later liable for. This isn’t personal. Criminals aren’t really after your intimate details; they just want enough information about your financial accounts to access them. Or sufficient personal information to obtain credit.

Facebook’s CEO Mark Zuckerberg showed a remarkable naïveté when he stated, “You have one identity. The days of you having a different image for your work friends or co-workers and for the other people you know are probably coming to an end pretty quickly. Having two identities for yourself is an example of a lack of integrity.”

It’s not necessarily that we’re lying, although sometimes we do; it’s that we reveal different facets of ourselves to different people. This is something innately human. Privacy is what allows us to act appropriately in whatever setting we find ourselves. In the privacy of our home or bedroom, we can relax in a way that we can’t when someone else is around.

Whether or not anyone actually looks at our data, the very facts that (1) they could, and (2) they guide the algorithms that do, make it surveillance.

We need to defend against a panoply of threats, and this is where we start having problems. Ignoring the risk of overaggressive police or government tyranny in an effort to protect ourselves from terrorism makes as little sense as ignoring the risk of terrorism in an effort to protect ourselves from police overreach.

Unfortunately, as a society we tend to focus on only one threat at a time and minimize the others. Even worse, we tend to focus on rare and spectacular threats and ignore the more frequent and pedestrian ones. So we fear flying more than driving, even though the former is much safer. Or we fear terrorists more than the police, even though in the US you’re nine times more likely to be killed by a police officer than by a terrorist. We let our fears get in the way of smart security.

the costs of privacy loss are nebulous in the abstract, and only become tangible when someone is faced with their aftereffects. This is why we undervalue privacy when we have it, and only recognize its true value when we don’t.

In the courts, companies should litigate on their users’ behalf. They should demand court orders for all access, and fight back against any court orders that seem overly broad.

On four occasions in the early 2000s, Yahoo complied with Chinese government requests for data about individual users that led to those people’s arrest and imprisonment on charges of “subversion” and “divulging state secrets.” Should Yahoo have done that? Does it make a difference if the repressive regime is, like Saudi Arabia, on friendly terms with the US? Many US Internet companies argue that they are not subject to the jurisdiction of countries in which they do not maintain offices. A US company probably can’t resist Chinese law, but it probably can resist those of smaller and less powerful countries.

As soon as there’s a horrific crime or a terrorist attack that supposedly could have been prevented if only the FBI or DHS had had access to some data stored by Facebook or encrypted in an iPhone, people will demand to know why the FBI or DHS didn’t have access to that data—why they were prevented from “connecting the dots.” And then the laws will change to give them even more authority.

Fundamentally, the argument for privacy is a moral one. It is something we ought to have—not because it is profitable or efficient, but because it is moral. Mass surveillance should be relegated to the dustbin of history, along with so many other practices that humans once considered normal but are now universally recognized as abhorrent. Privacy is a human right.

Privacy is not a luxury that we can only afford in times of safety. Instead, it’s a value to be preserved. It’s essential for liberty, autonomy, and human dignity. We must understand that privacy is not something to be traded away in some fearful attempt to guarantee security, but something to maintain and protect in order to have real security.

Our data has enormous value when we put it all together. Our movement records help with urban planning. Our financial records enable the police to detect and prevent fraud and money laundering. Our posts and tweets help researchers understand how we tick as a society. There are all sorts of creative and interesting uses for personal data, uses that give birth to new knowledge and make all of our lives better.

Again and again, it’s the same tension: group value versus individual value. There’s value in our collective data for evaluating the efficacy of social programs. There’s value in our collective data for market research. There’s value in it for improving government services. There’s value in studying social trends, and predicting future ones. We have to weigh each of these benefits against the risks of the surveillance that enables them. The big question is this: how do we design systems that make use of our data collectively to benefit society as a whole, while at the same time protecting people individually?

I often turn to a statement by Rev. Martin Luther King Jr: “The arc of history is long, but it bends toward justice.” I am long-term optimistic, even if I remain short-term pessimistic. I think we will overcome our fears, learn how to value our privacy, and put rules in place to reap the benefits of big data while securing ourselves from some of the risks.

Data is the pollution problem of the information age, and protecting privacy is the environmental challenge.

I would also like to thank Edward Snowden, whose courageous actions resulted in the global conversation we are now having about surveillance. It’s not an exaggeration to say that I would not have written this book had he not done what he did. Also, as a longtime NSA watcher, reading those top-secret documents is pretty cool.

By Paul Cunningham

Paul is a writer and entrepreneur living in Brisbane, Australia. He enjoys spending time with his family and running in the mountains. Paul was the founder of Practical 365, a former Microsoft MVP, and Pluralsight trainer.