Web developer & user interface engineer
Tinkering with the web since 1996

email: joe@artlung.com · twitter: @artlung
San Diego, California, USA
aka joecrawford.com

There could not be more news, and there could scarcely be fewer bloggers

“Nobody goes there anymore. It’s too crowded.” ((Yogi Berra))

It’s 2013 and I’m 43 years old. I’m old enough to be called middle-aged now. I’m old enough to think I know a thing or two, and be right.

Perhaps it’s time to quote some Ronald Reagan, as I did on this very forum 10 years ago:

“You and I are told we must choose between a left or right, but I suggest there is no such thing as a left or right. There is only an up or down. Up to man’s age-old dream — the maximum of individual freedom consistent with order — or down to the ant heap of totalitarianism. Regardless of their sincerity, their humanitarian motives, those who would sacrifice freedom for security have embarked on this downward path.” ((Reagan, 1964))

The speech that quote came from was more about the worries of a welfare state and statism in general. But that nugget has a truth about it. The recent revelations of constant surveillance of Americans’ cell phones and internet traffic worry me. This is simply “signals intelligence,” of course, and in principle is a good thing. It’s the kind of data mining we’ve all grown used to: devices which track our movements all the time. Debit and credit cards have done this on a more rudimentary level for decades: each transaction is tracked and stored. A record of my card use is a record of where I have been. We carry cell phones in ever greater numbers. More people have access to mobile phones than working toilets ((United Nations, 2013)). And with phones comes GPS tracking and cell tower triangulation. If my phone is on, it is maintaining those “5 bars” of service by talking to cell towers, and all of that data is stored. All the time. It’s trivial to keep all this data. And we accept it.

I happily use Facebook. When my mother was in the coma that preceded her death, Facebook was invaluable for keeping in touch with relatives for whom email was too infrequently checked, for whom phone numbers had changed, for whom texting was inadequate. I used it to coordinate flights, give out news, seek solace, share photos, share thoughts, private and public.

But Facebook and like systems are doing the same thing the credit cards and phones are doing. They track what I do and where I go, what I think and who I am with. Who I have been with, who I marry, who I date, who my family is. And we happily submit to this surveillance because it comes with a concomitant value. I get a substantive value from being able to share photos asynchronously with my family. What’s the alternative to sharing photos with 50 or 100 people? I could go spend $500 to print and mail photos to my family, all of whom are geographically remote, or I could do the same thing by posting onto Facebook at the cost of my internet service or phone service and a bit of my time. Oh, and at the cost of Facebook’s systems learning a bit more about me.

And it is Facebook’s computers, by and large. And now the news is that more and more government computers are doing the same. It’s machines that are learning more about me. We call it learning, but it is a sadly blunt tool. In 2002 I read something that had a big impact on me about collecting mass data. In 2002 I linked to it. I’ll include it entirely for my purposes today, because it was talking about the “Total Information Awareness” program the government was purportedly working on at the time.

------ Forwarded Message
From: Benjamin Kuipers <kuipers@cs.utexas.edu>
Date: Sat, 14 Dec 2002 13:42:44 -0600
To: dave@farber.net
Cc: Benjamin Kuipers <kuipers@cs.utexas.edu>
Subject: 'Total Information Awareness' as a diagnosis problem


Consider the aims of the 'Total Information Awareness' program as
a problem in diagnosis.

Out of a large population, you want to diagnose the very few cases
of a rare disease called "terrorism".  Your diagnostic tests are
automated data-mining methods, supervised and checked by humans.
(The analogy is sending blood or tissue samples to a laboratory.)

This type of diagnostic problem, looking for a rare disease, has
some very counter-intuitive properties.

Suppose the tests are highly accurate and specific:
  (a) 99.9% of the time, examining a terrorist, the test says "terrorist".
  (b) 99.9% of the time, examining an innocent civilian, the test says
      "innocent civilian".

Terrorists are rare:  let's say, 250 out of 250 million people in the USA.

  (a) When the tests are applied to the terrorists, they will be
      detected 99.9% of the time, which means there is about a 25%
      chance of missing one of them, and the other 249 will definitely
      be detected.  Great!

  (b) However, out of the remaining 249,999,750 innocent civilians,
      99.9% accuracy means 0.1% error, which means that 250,000 of
      them will be incorrectly labeled "terrorist".  Uh, oh!

The law enforcement problem is now that we have 250,250 people who
have been labeled as "terrorist" by our diagnostic tests.  Only
about 1 in 1,000 of them is actually a terrorist.

If we were mining for gold, we would say that the ore has been
considerably enriched, since 1 in 1,000 is better than 1 in 1,000,000
by quite a lot.  There's still a long way to go, though, before
finding a nugget.

But we are talking about people's lives, freedom, and livelihoods here.
The consequences to an innocent civilian of being incorrectly labeled
a "terrorist" (or even "suspected terrorist") can be very large.

Suppose, out of the innocent people incorrectly labeled "terrorist",
1 in 1,000 is sufficiently traumatized by the experience so that
they, or a relative, actually *becomes* a terrorist.  (This is
analogous to catching polio from the polio vaccine:  extremely rare,
and impossible with killed-virus vaccine, but a real phenomenon.)

In this case, even after catching all 250 original terrorists,
250 new ones have been created by the screening process!

The numbers I've used give a break-even scenario.  But 99.9%
accuracy and specificity is unrealistically high.  More realistic
numbers make the problem worse.  Nobody knows what fraction of people
traumatized as innocent victims of a government process are seriously
radicalized.  1 in 1,000 is an uninformed guess, but the number could
be significantly higher.

A mass screening process like this is very likely to have costs that
are much higher than the benefits, even restricting the costs to
"number of free terrorists" as I have done here.  Adding costs in
dollars and the suffering of innocents just makes it harder to
reach the break-even level.

Ask your neighborhood epidemiologist to confirm this analysis.
It is applied routinely to public health policy, and applies
no less to seeking out terrorists.

There are alternative ways to detect and defend against terrorists.
Mass screening approaches like TIA are very questionable in terms of
costs and benefits.

Best wishes,


Benjamin Kuipers, Professor         email:  kuipers@cs.utexas.edu
Computer Sciences Department        tel:    1-512-471-9561
University of Texas at Austin       fax:    1-512-471-8885
Austin, Texas 78712 USA             http://www.cs.utexas.edu/users/kuipers
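
The arithmetic in the email above can be checked with a short Python sketch. The function and variable names here are illustrative and not from the email, and the 1-in-1,000 radicalization rate is the email's own self-described "uninformed guess". The sketch assumes each test result is independent of the others.

```python
from fractions import Fraction


def screening_outcome(population, targets, sensitivity, specificity):
    """Expected results of screening everyone in `population` for a rare trait.

    sensitivity: P(test says "terrorist" | person is one)
    specificity: P(test says "innocent"  | person is innocent)
    """
    innocents = population - targets
    true_positives = targets * sensitivity
    false_positives = innocents * (1 - specificity)
    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "flagged": flagged,
        # Share of flagged people who really are what the test says.
        "share_of_flagged_that_are_real": true_positives / flagged,
        # Chance that at least one real target slips through unflagged.
        "p_miss_at_least_one": 1 - float(sensitivity) ** targets,
    }


# The email's numbers: 250 terrorists among 250 million people, 99.9% both ways.
accuracy = Fraction(999, 1000)
result = screening_outcome(250_000_000, 250, accuracy, accuracy)

for name, value in result.items():
    print(f"{name}: {float(value):,.4f}")

# The email's guess: 1 in 1,000 wrongly flagged people is radicalized.
newly_radicalized = result["false_positives"] / 1000
print(f"newly_radicalized: {float(newly_radicalized):,.1f}")
```

With these inputs, roughly a quarter of a million innocent people are flagged and only about 1 in 1,000 flagged people is a real hit, matching the email. The chance of missing at least one terrorist comes out a little under the email's "about 25%", because 25% is the expected number missed (0.25 people) rather than the exact probability.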

A few months ago Anil Dash reminded us all that at one time we, the people of the web, roundly rejected single sign-on schemes no matter their utility:

In 2003, if you introduced a single-sign-in service that was run by a company, even if you documented the protocol and encouraged others to clone the service, you’d be described as introducing a tracking system worthy of the PATRIOT act. There was such distrust of consistent authentication services that even Microsoft had to give up on their attempts to create such a sign-in. Though their user experience was not as simple as today’s ubiquitous ability to sign in with Facebook or Twitter, the TypeKey service introduced then had much more restrictive terms of service about sharing data. And almost every system which provided identity to users allowed for pseudonyms, respecting the need that people have to not always use their legal names.

Largely, we have traded that wariness away for simplicity and ubiquity. Probably because keeping passwords for every service straight is more than a person can really do. We don’t understand computers and we don’t understand computer security. We don’t understand risk. A study was done dropping USB sticks in a parking lot, and the results were dismal: “US Govt. plant USB sticks in security study, 60% of subjects take the bait.” Bruce Schneier pointed out the failure here isn’t human, but technical: “The problem is that the OS trusts random USB sticks. The problem is that the OS will automatically run a program that can install malware from a USB stick. The problem is that it isn’t safe to plug a USB stick into a computer.” ((Yet Another “People Plug in Strange USB Sticks” Story, June 29, 2011))

Tim Bray wrote recently about the resurgence of the blog ((Blogginess)). Well, if not the resurgence, at least the utility and expressiveness of the blog. Blogs at their best are not in the clutches of these third parties. The blog you’re reading this on is self-hosted, backed up, and not censorable by Facebook. Of course, the network effects of the social networks are powerful. But I think lone voices still have a chance to change minds. The thing, of course, is that now we partly measure influence in terms of number of likes ((Facebook)), retweets ((Twitter)), reblogs ((tumblr)), reshares ((Google+)).

There is no decentralized repository of influence.

If you have an hour, listen closely to Anil Dash’s talk on The Web We Lost. He makes compelling characterizations of these spaces we’ve all agreed to participate in, and how these spaces do not have the same free speech rules. Which is to say, they are not as free. Here’s where I’m tempted to quote Patrick Henry ((I’ve been known to before)), but the truth is I want to have freedom entirely, and I also want to participate in these walled gardens.

And right now I am feeling the tension of not being able to unravel the contents of this omelet of freedom and less-freedom.

I doubt any of this was coherent, but I’ll post it anyway. I reserve the right to revise and extend my remarks.

two comments so far...

I like this. Thanks for keeping some useful discourse flowing! And for being one of the watchful eyes needed to keep this country “great.”
