“How many people here have built a system that takes a billion requests a day? Well, you could. And actually that’s the point of this conversation, what I want to talk about. It’s the same thing that’s made Google possible. I mean, think about what Google does: we take hundreds of millions of fairly hard queries a day. The queries tend to say things like ‘searching for camels in Tanzania,’ and we sort of shake our head and try and figure out what that means, and we go over petabytes of content, not terabytes but petabytes of content. And we have a couple hundred milliseconds in which we’re allowed to search the entire petabytes and return back to you what we found, in rank order. So not only are we trying to search really, really large amounts of data, we’re trying to search it extraordinarily quickly, and we’re trying to do this hundreds of millions of times a day. And we do it. And we do it without a helluva lot of sweat. The way I think about Google is that it’s lots of PhDs driving tanks. It’s all about brute force. Everyone’s sort of General Patton: they don’t drive around the wall, they drive through the wall. It’s really dumb techniques, used at large scale. I mean, for example, the spellchecking. Every so often when you type a Google query it will say ‘did you mean,’ and it’s usually because you put in a typo. This is not because we have some incredible dictionary or some brilliant thesaurus that tells us what you meant. It’s because we’re tracking what people type _after_ they type the query that didn’t return anything, and it turned out that that was a very efficient way to figure out what you probably meant to type. In fact, it works much better than any spellchecker. But notice the stupidity of the approach: ‘people who typed this usually wanted to do this.’ Works great.”
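The "did you mean" trick in the quote is simple enough to sketch in a few lines. What follows is only an illustration of the idea as described, not Google's actual code: the session format (a list of query and result-count pairs per user), the function names `build_followups` and `did_you_mean`, and the `min_count` threshold are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def build_followups(sessions):
    """Count what users typed right after a query that returned nothing.

    `sessions` is a hypothetical log format: each session is a list of
    (query, result_count) pairs in the order the user typed them.
    """
    followups = defaultdict(Counter)
    for session in sessions:
        # Walk consecutive pairs: (this query, the one typed right after it).
        for (query, hits), (next_query, _next_hits) in zip(session, session[1:]):
            if hits == 0 and next_query != query:
                followups[query][next_query] += 1
    return followups


def did_you_mean(query, followups, min_count=2):
    """Return the most common follow-up for `query`, or None if evidence is thin."""
    counts = followups.get(query)
    if not counts:
        return None
    suggestion, count = counts.most_common(1)[0]
    # min_count is an assumed cutoff so one stray retype does not become a suggestion.
    return suggestion if count >= min_count else None


# Toy log: three users mistype the query, two of them retype it the same way.
sessions = [
    [("camels in tanzainia", 0), ("camels in tanzania", 120)],
    [("camels in tanzainia", 0), ("camels in tanzania", 98)],
    [("camels in tanzainia", 0), ("camel rides", 40)],
    [("camels in tanzania", 75)],
]
followups = build_followups(sessions)
print(did_you_mean("camels in tanzainia", followups))
```

No dictionary and no edit-distance math anywhere: the suggestion comes entirely from counting what other people did next, which is the "stupidity" the quote is praising. At real scale the counting would be spread across many machines, but the logic would not need to get any smarter.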