Saturday, June 30, 2007

A/B testing at Amazon and Microsoft

Ron Kohavi, Randal Henne, and Dan Sommerfield from Microsoft have a paper on A/B testing, "Practical Guide to Controlled Experiments on the Web" (PDF), at the upcoming KDD 2007 conference.

Ronny Kohavi was at Amazon.com as Director of Personalization and Data Mining for about two years (Sept 2003 - June 2005). The paper contains some mentions of Amazon's A/B testing framework (which was developed in the 1990s, but has been continuously refined since then) and other useful information on running experiments on a live website.

Some excerpts from the paper:
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called ... A/B tests.

The authors of this paper were involved in many experiments at Amazon, Microsoft, Dupont, and NASA. The culture of experimentation at Amazon, where data trumps intuition, and a system that made running experiments easy, allowed Amazon to innovate quickly and effectively.

Controlled experiments provide a methodology to reliably evaluate ideas ... Most organizations have many ideas, but the return-on-investment (ROI) for many may be unclear ... A live experiment goes a long way in providing guidance as to the value of the idea.

Many theoretical techniques seem well suited for practical use and yet require significant ingenuity to apply them to messy real world environments. Controlled experiments are no exception. Having run a large number of online experiments, we now share several practical lessons:

A Treatment might provide a worse user experience because of its performance ... because it is slower ... Compute the minimum sample size needed for the experiment ... We recommend that 50% of users see each of the variants in an A/B test ... A small [win] ... may not outweigh the cost of maintaining the feature ... Running frequent experiments and using experimental results as major input to company decisions and product planning can have a dramatic impact on company culture.
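
The paper's advice to compute the minimum sample size before running an experiment is easy to turn into code. Here is a minimal sketch using the textbook power calculation for a conversion-rate metric; this is a standard approximation, not necessarily the exact formula the paper uses, and the example numbers are made up:

```python
import math
from statistics import NormalDist

def min_sample_size_per_variant(baseline_rate, min_detectable_delta,
                                alpha=0.05, power=0.80):
    """Rough minimum users per variant to detect an absolute change of
    min_detectable_delta in a conversion-rate metric (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = baseline_rate * (1 - baseline_rate)  # variance of a Bernoulli metric
    n = (z_alpha + z_power) ** 2 * 2 * variance / min_detectable_delta ** 2
    return math.ceil(n)

# Example: detecting a 0.5% absolute lift on a 5% conversion rate
# needs roughly 30,000 users in each variant.
print(min_sample_size_per_variant(0.05, 0.005))
```
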
Amusingly, some details from a couple of the posts on this weblog are quoted at a couple of points in the paper.

Ronny also gave a talk (PDF) at eBay Research Labs earlier this month that covered similar material.

See also Dare Obasanjo's post on Ronny's paper and his eBay Labs talk.

See also "Front Line Internet Analytics at Amazon.com" (PDF), a 2004 talk by Ronny Kohavi and former Amazon.com Personalization Director Matt Round that has more details on Amazon.com's A/B testing framework.

Update: It appears that all three of the authors of the paper -- Ron Kohavi, Randal Henne, and Dan Sommerfield -- were at Amazon.com. Randal Henne was at Amazon Apr 2003 - May 2006. Dan Sommerfield was there Dec 2003 - Jul 2006.

Update: Four months later, Jeff Bezos discusses the value of experimentation at Amazon in an HBR interview [Found via Werner Vogels].

More on Google Scalability Conference

The first Google Scalability Conference was held here in Seattle last weekend. Sadly, it was scheduled for the same weekend as Foo Camp, so I was not able to attend.

Fortunately, Microsoft blogger extraordinaire Dare Obasanjo made it to the conference and wrote up excellent and detailed notes on four of the talks. It looks like it was a very interesting conference. Sorry I missed it!

Supposedly, "all sessions will be available on YouTube and Google Video after the conference." So far, I was only able to find one talk there, "Building a Scalable Resource Management System for Grid Computing".

The others may show up on Google Video in a few days. If they do, I will update this post with links to them.

If you are interested in this conference, you might also be interested in a few ([1] [2] [3] [4] [5]) of my many older posts on large scale computing at Google that include links to other papers, presentations, and videos.

Update: Some other notes ([1] [2]) on the conference from Robin Harris at StorageMojo. [via Ewan Silver].

Update: Some more notes, this time from François Schiettecatte.

Update: Torsten Curdt posted notes as well and was particularly impressed with the Amazon talk. As François Schiettecatte mentioned, a paper with at least some of the material from that talk, "Dynamo: Amazon's highly available key-value store", will be presented at the upcoming ACM Symposium on Operating Systems Principles.

Update: Dan Creswell has more details on the Amazon talk.

Update: Three more of the conference talks ([1] [2] [3]) are now available on Google Video.

Update: Two more of the conference talks ([1] [2]) on Google Video, including the one by Marissa Mayer.

Update: In a more recent post, "Google Scalability Conference talks available", I listed all the talks that are available to make it easier to watch them.

PC World on Windows desktop search

PC World slams Windows desktop search, putting it as #13 on "The 20 Worst Windows Features of All Time" and saying:
It's kind of astonishing: Windows users had to wait nearly a quarter century, until Windows Vista, for an OS with really good search features.

Windows XP Search may be the worst of all, with an interface that's as patronizing as it is sluggish and confusing.
See also my Mar 2005 post, "Desktop search should not exist", where I said, "The opportunity for third-party desktop search apps [only] exists because the Microsoft Windows file search is pitifully weak."

See also my Nov 2006 post, "Is desktop search over?", about the desktop search that finally works in Windows Vista.

Monday, June 25, 2007

Fee fie Foo done

I just got back from Foo Camp, Tim O'Reilly's "Friends Of O'Reilly" conference.

It was an interesting event, pretty much as described, a self-organized, somewhat chaotic blend of "people who're doing interesting works in fields such as web services, data visualization and search, open source programming, computer security, hardware hacking, GPS, alternative energy, and all manner of emerging technologies" who sat down to chat, debate, "share their works-in-progress, show off the latest tech toys and hardware hacks, and tackle challenging problems together."

Most incredible was the diverse group of attendees, ranging from university professors to tech gurus to venture capitalists to goofy little startups. In addition to the various tech celebrities -- Larry Page, Caterina Fake, Paul Graham, Ray Ozzie, Kevin Rose, to name just a few -- there were even folks such as Wes Boyd, founder of MoveOn.org.

Part of the experience is the opportunity to bump into random people and explore various ideas. Joe McCarthy and I discussed innovation at startups versus innovation in research groups. I talked to Paul Kedrosky about trying to use information on the web for hedge funds. I had an extended discussion with Wes Boyd about the future of journalism. Steve Yegge convinced me not to hate Javascript quite so much. Udi Manber and I discussed his departure from A9. Nat Torkington and I talked about environmental and energy policy. Mark Atwood and I argued about the short-term prospects for utility computing. And, there were many more casual conversations on many more topics.

It was good to see old friends and colleagues. Amazon and ex-Amazon.com folks included Russell Dicker, Kim Rachmeler, H.B. Siegel, Shel Kaphan, Udi Manber, DeWitt Clinton, Peter Vosshall, Chris Brown, and Steve Yegge. I finally got a chance to meet Paul Kedrosky, Garrett Camp, Doug Cutting, Bradley Horowitz, Greg Stein, Marti Hearst, Om Malik, Stewart Butterfield, Matt Cutts, Nat Torkington, Artur Bergman, Luis von Ahn, and Don MacAskill face-to-face. It also was good to see Danny Sullivan, Tim O'Reilly, Peter Norvig, John Battelle, Niall Kennedy, Mez Naam, Jed Harris, and Brian Aker again.

The many talks, most of which take the form of a discussion rather than a lecture, were remarkable as well. Sadly, there were often three or four talks in the same time slot I wanted to attend -- so much to see, so little time -- but I was able to attend and enjoy many.

For example, Mez Naam gave a fun, SciFi-like talk on what happens as 3D printers become cheaper, smaller, higher quality, and widely adopted for manufacturing. In the near term, we may see some goods reduced to information -- all you need is the blueprint for what to print to make your very own iPhone -- which could cause serious disruptions in some industries and much confusion for intellectual property laws. In the much more speculative longer term, Mez asked, what might happen if people can create drugs, even pathogens, at their desktop with cheap hardware?

Researchers Marti Hearst, Martin Wattenberg, Fernanda Viegas, and Jeffrey Heer talked about data visualization, focusing on demos of Many Eyes and Sense.us. The talk explored how easy data visualization and sharing tools help people collaborate and learn from data. A very cool idea was the ability not only to comment on the graphs, but also to draw on the graphs and refer to other graphs, facilitating discussion and exploration. Marti Hearst also briefly discussed tag clouds, ending with the thought-provoking conclusion that tag clouds are intended not as a particularly useful method of conveying and summarizing information, but as a means of socializing among people.

Researcher Andrea Thomaz from the MIT Media Lab showed off videos of Leonardo, a robot designed with gestures that naturally appeal to and are easily interpreted by people.

Researcher Neil Halelamien from CalTech discussed how placing a rapidly fluctuating magnetic field at the back of someone's head can stimulate neurons on the surface of the brain and create some unusual (and temporary) visual effects involving replay of images just seen.

I sat in on a conversation with Stephen Hsu and several other folks working on computer security that came to the rather dismal conclusion that not only can we expect severe, large scale botnet attacks in the near future, but also we can expect a future where most computers have some low level of infection by malware (much like the human body has a continuous, low-level infection by viruses and bacteria). Some of our discussion is similar to what appeared yesterday in the NYT, "When Computers Attack", which quotes one of the Foo campers, Ross Stapleton-Gray, at one point.

There was a discussion of the book Paradox of Choice -- which argues that more choice can make it difficult to take action and that overoptimizing choices makes people unhappy -- led by H.B. Siegel and including Flickr founders Caterina Fake and Stewart Butterfield. The discussion focused mostly on personal experiences, but some interesting meta questions about the economic rationality of optimization -- cost of time for gathering information versus the cost of a (usually only moderately) sub-optimal choice -- were also raised.

Toby Segaran led a discussion about wisdom of the crowds and hive mind that, at one point, dived into fun questions of whether hive mind communities suffer from tyranny of the majority and end up fracturing at a certain size. Digg came up several times as an example of wisdom of the crowds, tyranny of the majority, and a hive mind that might fracture.

Peter Norvig gave a great version of his talk on the advantages of big data for solving many types of machine learning problems. The machine translation examples are particularly compelling. If you want to check it out, the talk was similar, though not identical, to some of Peter's talks I linked to in an older post.

Finally, I very much enjoyed a session with Erick Wilhelm, Dennis Cramey, and Will Carter talking about location-aware gaming. The basic idea is to have the virtual game world overlap with the real world. Initially, this has taken the form of games where the real world is used for navigation -- moving in the real world moves you in the virtual world, but the virtual world is otherwise separate from the real world -- but there was some fun talk about how the worlds could be blended further. What I would really like to see here is a game where you are essentially someone different in the real world (e.g. a secret agent) and interact with others in the game through your device and through the real world (tasks, information drops, puzzles). It would be like the cell phone is your access into a different persona, but that persona exists in both the real and virtual worlds.

In all, a very interesting and unusual experience. I am still not sure how I managed to get invited, but it was great to get a chance to go.

Wednesday, June 20, 2007

Latest on the Netflix prize

Eight months into the first year, there has been impressive progress on the Netflix prize.

The current leader has a 7.7% improvement. While each step in closing the remaining 2.3% gap to win the $1M prize will be harder and harder, it is remarkable how far the entrants have come.

I have to admit that I have spent a fair amount of time playing with the contest and the Netflix data. It is quite a bit of fun.

Other than some initial attempts at very simple things like predicting averages or modified averages to get a baseline, most of my attempts used either variants of traditional collaborative filtering (finding similar users) or item-to-item collaborative filtering (finding similar items). I did get modest improvements, but nothing like the performance of the current leaders. I also played with some simple clustering; my results on that were quite poor.
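
For the curious, here is a minimal sketch of the item-to-item flavor I was playing with. It is a generic cosine-similarity version on toy data -- not Netflix's or Amazon's actual algorithm -- and the user and movie identifiers are invented:

```python
from collections import defaultdict
from math import sqrt

# Toy ratings data: {user_id: {movie_id: rating}}. Purely illustrative,
# not the real Netflix data format.
ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5},
    "u3": {"m2": 2, "m3": 5},
}

def item_similarities(ratings):
    """Cosine similarity between item rating vectors, accumulated one user at a time."""
    dot = defaultdict(float)
    norm = defaultdict(float)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            norm[i] += ri * ri
            for j, rj in user_ratings.items():
                if i < j:
                    dot[(i, j)] += ri * rj
    sims = defaultdict(dict)
    for (i, j), d in dot.items():
        s = d / (sqrt(norm[i]) * sqrt(norm[j]))
        sims[i][j] = sims[j][i] = s
    return sims

def predict(user, movie, ratings, sims):
    """Predict a rating as a similarity-weighted average of the user's own
    ratings on movies similar to the target movie."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        s = sims[movie].get(other, 0.0)
        num += s * rating
        den += abs(s)
    return num / den if den else 3.0  # nothing similar rated; fall back to a middling guess

sims = item_similarities(ratings)
print(predict("u3", "m1", ratings, sims))  # ~2.4 on this toy data
```
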

As it turns out, Netflix currently uses a type of item-to-item collaborative filtering for their recommendations, as Netflix's VP of Recommendation Systems Jim Bennet described in a Sept 2006 talk (PDF).

I suspect substantial improvements, then, will require something quite a bit different than what Netflix is doing. This likely rules out item-to-item collaborative filtering. Experimenting with other techniques, I suspect, will be more likely to bear fruit.

I also suspect that additional data may be useful on this problem. While the creators of the Netflix contest apparently did not expect this much progress this fast, it may be the case that it simply is impossible to get the full 10% improvement using the ratings data alone. In my analyses, data simply seemed too sparse in some areas to make any predictions, and supplementing with another data set seemed like the most promising way to fill in the gaps.

It was very fun working on the contest. As much as I would like to keep plugging away, I am starting to feel constrained by my available hardware (4+ year old Linux boxes), and dropping cash on servers just to keep playing seems excessive. Moreover, if I am going to spend more time and money on this kind of thing, it really should be on Findory. Too bad, it is a cool data set.

By the way, I do want to mention one thing about the structure of the contest. I think the problem statement is a little off given the business needs of Netflix. In particular, the contest requires a recommender system which can predict movies you will hate or be lukewarm to, not just the movies you will like.

A system that predicts only what you will like, often referred to as TopN recommendations, is what most businesses want from recommendations. They want to surface interesting products to customers, helping customers discover products they have never seen. In Netflix's case, they mostly should want to help you discover movies to add to your rental queue.

You might think that a system that is the best at the Netflix contest problem would be the best at the TopN problem, but that is not the case. To see that, take a system that perfectly predicts the TopN next items you will want, but makes mistakes when you are lukewarm to a product. The RMSE of that recommender would be poor in the Netflix contest (because of inaccuracies in predicting ~3 star items), despite the obvious value of the recommender.
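
A tiny, made-up illustration of that mismatch (the ratings and predictions below are invented for the example):

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error, the metric the Netflix contest scores on."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical true ratings for five movies; the first two are the user's favorites.
actual = [5.0, 5.0, 3.0, 2.0, 3.0]

# Recommender A ranks the two favorites first (a perfect TopN list)
# but badly misjudges the lukewarm items.
rec_a = [5.0, 5.0, 1.0, 4.5, 5.0]

# Recommender B gets every rating within half a star.
rec_b = [4.5, 4.5, 3.5, 2.5, 3.5]

print(rmse(rec_a, actual))  # ~1.7: a poor contest score despite perfect top picks
print(rmse(rec_b, actual))  # 0.5: a much better contest score
```
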

See also Joe Weisenthal's recent post on TechDirt, "Netflix Experiment In Outsourced Innovation Showing Good Results".

See also my previous posts ([1] [2] [3] [4]) on the Netflix recommendations contest.

Findory and free APIs

Looking at the recent traffic data for Findory.com, I was surprised to see traffic spiking.


In fact, including all traffic, Findory.com is up to 26M page views per month, about 10 page views per second on average.

That's odd, I thought. Findory's advertising revenue and third party analyses from sites like Alexa both show slow but steady declines at Findory.com, not a traffic spike. What is keeping Findory's web servers so darn busy?

Turns out that the vast majority (in excess of 95%) of these page views are various forms of robots, mostly hitting Findory's free APIs.

Those page views are not people. They generate no revenue directly. They have little to no value to Findory.

In fact, I suspect that most of these API accesses are being used for various forms of weblog spam. For example, I suspect some are accessing Findory content, stripping all the links out, then placing AdSense ads or link farm links next to that content. Ah, spam, wonderful spam.

I never have been particularly idealistic when it comes to APIs. I tend to take a cynical view on the motivations of companies that offer free APIs.

I also have suspected that most people using APIs seek short term profits, not innovation or building something substantial. While it is just one data point, Findory's experience appears to confirm that view.

Tuesday, June 19, 2007

The truth about free APIs

I think Nat Torkington nailed it in his post, "Six Basic Truths of Free APIs".

As Nat says, APIs are not open source software. The data is not free (as in freedom, not as in beer). You depend on ongoing access to an API at your own risk.

See also a Nov 2005 post where I, after talking with Rael Dornfest, said:
I keep hearing people talk about [APIs] as if companies are creating web services because they just dream of setting all their data free.

Sorry, folks, that isn't the reason.

Companies offer web services to get free ideas, exploit free R&D, and discover promising talent.

That's why the APIs are crippled with restrictions like no more than N hits a day, no commercial use, and no uptime or quality guarantees.

[Companies] offer the APIs so people can build clever toys, the best of which the company will grab -- thank you very much -- and develop further on their own.

Yahoo post-Semel and the long road ahead

Of all the coverage of CEO Terry Semel leaving Yahoo, I think Om Malik's words, "Semel exits masks yet another bad quarter at Yahoo", are most worth noting:
Terry Semel-Sue Decker-Jerry Yang drama had us all distracted from the fact that Yahoo is going to report yet another bad quarter.

The company told Wall Street analysts that the June quarter is going to come in towards the lower half of revenue and EBITDA guidance. Ditto for the second half of 2007.

So nothing really has changed.
Despite our fascination with the lives of overpaid celebrity executives, only one thing matters: Yahoo's ability to compete.

Semel's departure may be the first step back to a more promising path, but Yahoo has a long road ahead before it gets back to where it needs to be.

See also my Jan 2007 post, "Yahoo blew it", and my Oct 2006 post, "Yahoo's troubles".

Update: Kevin Kelleher says:
Many people suspect Yahoo is still in trouble, even now that its vilified lightning rod -- formerly known as CEO Terry Semel -- has been shown the door.

If this Internet star continues to devolve into an Internet dwarf, it's not just Yahoo who will suffer, it's all of us.
And, Thomas Hawk writes:
Not only [has Yahoo] not executed on a vision of social search, but that they have bungled the communities that they have purchased and actually done more harm to the company than good.

Almost everyone I've talked to about Yahoo has expressed to me that the company is a wreck. That people are unhappy. That executives are leaving. That bureaucracy reigns supreme and that almost nothing can get done in the current environment.

Rejection forced the creation of Google

There are some cute little tidbits about the very early days of Google in "Google: A company born of rejection":
Google co-founder Larry Page just wanted to finish his doctorate.

Page wanted ... to license the PageRank invention and get some royalties while he went back to his academic work. Unfortunately, licensing proved difficult. Only one search engine company made an offer, and it was more of a token offer.

"They (Page and fellow Google co-founder Sergey Brin) got frustrated so they decided to start a company," [Luis Mejia, a senior associate in the Office of Technology Licensing at Stanford University] said.
The Google Milestones page has more details, including this:
Larry and Sergey .... began calling on potential partners who might want to license a search technology better than any then available ... They had little interest in building a company of their own.

Among those they called on was friend and Yahoo! founder David Filo. Filo agreed that their technology was solid, but encouraged Larry and Sergey to grow the service themselves ... "When it's fully developed and scalable," he told them, "let's talk again."

Others were less interested in Google, as it was now known. One portal CEO told them, "As long as we're 80 percent as good as our competitors, that's good enough. Our users don't really care about search."
Rejected, frustrated, but not willing to let a good idea die, Larry and Sergey created Google, Inc.

Sunday, June 10, 2007

Anti-trust complaint over Vista desktop search

Stephen Labaton at the New York Times writes:
Google has accused Microsoft of designing its latest operating system, Vista, to discourage the use of Google’s desktop search program.

Google complained ... that consumers who try to use its search tool for computer hard drives on Vista were frustrated because Vista has a competing desktop search program that cannot be turned off.

Google said that Vista violated Microsoft's 2002 antitrust settlement, which prohibits Microsoft from designing operating systems that limit the choices of consumers.
As much as I like Google Desktop Search, I have a bit of a hard time seeing Google's point of view on this one.

The opportunity for desktop search only existed because the Microsoft WinXP file search was bizarrely and pitifully slow.

This seems like a bug. Searching the file system is functionality that is core to an operating system, but Microsoft botched the job of doing it well in WinXP.

In Vista, Microsoft corrected the bug. Desktop search works just fine on newer systems. So, the opportunity for desktop search apps now has evaporated.

See also Todd Bishop's discussion of the same article in his post, "Google revealed as source of Windows Vista complaint".

See also my Nov 2006 post, "Is desktop search over?", and my Mar 2005 post, "Desktop search should not exist".

Update: Looks like Microsoft is not going to pick a fight on this one. Ten days later, Joe Wilcox writes that "Microsoft will modify Vista search".

Personalized search at SMX

Tamar Weinberg at Search Engine Roundtable has good notes on the "Personalized Search: Fear or Not?" panel at the recent SMX conference.

Matt Cutts from Google and Tim Mayer from Yahoo were on the panel. From the notes, it looks like it was pretty interesting.

Gord Hotchkiss, who was also on the panel, has a post with his reaction afterwards. It sounds like the audience, who were mostly search engine optimizers, were fairly hostile to personalized search.

That is not surprising. By reducing the winner-takes-all effect, personalized search will make search engine optimization more challenging and subtle.

For more on winner-takes-all and personalized search, see also my previous posts, "SEO and personalized search" and "Combating web spam with personalization".

Less annoying ads using personalization

Cord Blomquist at the Competitive Enterprise Institute cleanly explains the appeal of personalization for advertising:
All of this data is not being used to create an Orwellian dystopia, but an online ad revolution.

The Web's most annoying feature, the ubiquitous banner ad, may be forever changed. Fewer in number and more useful, the ads of tomorrow will hone in on users' real needs and wants.

Future searches will know if a query for "ring" should present ads for engagement rings, Lord of the Rings, or Saturn's rings. Ads that appear alongside searches will become a resource, instead of a nuisance, thanks to more intelligently assessing users' intentions. This will make our back-link powered, dumb search of today seem, well...dumb.
See also my earlier posts, "Google wants to change advertising" and "Is personalized advertising evil?"

Saturday, June 09, 2007

Personalization will not replace search

Yahoo VP Tapan Bhat recently spoke at the Next Web conference and apparently said:
Search is no longer the dominant paradigm.

The future of the web is about personalisation. Where search was dominant, now the web is about 'me.' It's about weaving the web together in a way that is smart and personalised for the user.
Personalization can help people discover information they would not find on their own, but it is important not to overstate the impact.

Especially in the context Tapan is discussing -- Tapan leads the team running the Yahoo home page -- discovery is important. People need help navigating Yahoo and finding new content on Yahoo. A Yahoo home page that learns from what you do on Yahoo and helps you get what you need faster would be helpful.

But personalization does not replace search. For people who know what they want, the best thing we can do is get out of the way. When people are actively and explicitly searching, when they are on a mission, it is not the time to distract them.

At Amazon, the majority of people came to the Amazon.com home page, then searched. For those people, we mostly got out of their way, showing them search results and some helpful other information strongly related to their search. However, another group of people came to Amazon without such a sense of purpose. For these people, the personalization was key, helping tailor the home page to focus their attention on a selection of Amazon's massive catalog based on their past interests, a view into Amazon created just for them.

Even if search remains dominant, even if we mostly want to get out of the way when people actively search, personalization can still help. Many times, searchers cannot find what they want when they search. They need help expressing their intent. The search engine needs additional information to understand their intent. By learning from what each person and others have done and found, personalization can help better understand intent and help people get what they need faster.

But, in any case, personalization does not mean search is going away. People often know what they want and want it now. In those cases, we should give it to them. Search is and will remain dominant.

On a somewhat different topic, as the Times UK article describes, many interpreted Tapan's words as Yahoo giving up on core search. I do not have much to add to that, but I do want to point out that this is not the first time a high level Yahoo executive has demonstrated a lack of competitive fire for core search.

Sunday, June 03, 2007

The perils of tweaking Google by hand

An article in the NYT business section today by Saul Hansell, "Google Keeps Tweaking Its Search Engine", has intriguing details on Google's ranking algorithm from discussions with Googlers Amit Singhal and Udi Manber.

Some excerpts:
When it comes to the search engine — which has many thousands of interlocking equations — [Google] has to double-check the engineers' independent work with objective, quantitative rigor to ensure that new formulas don't do more harm than good.

Recently, a search for "French Revolution" returned too many sites about the recent French presidential election campaign -- in which candidates opined on various policy revolutions -- rather than the ouster of King Louis XVI. A search-engine tweak gave more weight to pages with phrases like "French Revolution" rather than pages that simply had both words.

Typing the phrase "teak patio Palo Alto" didn't return a local store called the Teak Patio .... Mr. Singhal's group [wrote] a new mathematical formula to handle queries for hometown shops.

Is it better to provide new information or to display pages that have stood the test of time and are more likely to be of higher quality? Until now, Google has preferred pages old enough to attract others to link to them ... [Singhal's] team's solution: a mathematical model that tries to determine when users want new information and when they don't. (And yes, like all Google initiatives, it had a name: QDF, for "query deserves freshness.")
I found this surprising. Google manually comes up with tweaks to its search engine that only apply to a small percentage of queries, tests the tweaks, and then tosses them into the relevance rank?

The problem with these manual tweaks is that they rapidly become unwieldy. As you add hundreds or thousands of these hand-coded rules, they start to interact in unpredictable ways. When evaluating a new rule, it becomes unclear if performance of that rule might be improved by tweaks to the rule, tweaks to other rules, or removing other rules that have now been subsumed.

It appears Google has hit this problem head on. The "many thousands of interlocking equations" require a "closely guarded internal [program] called Debug" that attempts to explain whether the rules are doing "more harm than good."

Frankly, I thought Google was beyond this. Rather than piling hack upon hack, I thought Google's relevance rank was a giant, self-optimizing system, constantly learning and testing to determine automatically what works best.

How would this work? With a search engine the size of Google's, every search query can be treated as an experiment, every interaction as an opportunity to learn and adapt.

For each query, a few of the results would be different each time it is run. Each time, the search engine is making a prediction about the impact (usually an anticipated slight negative impact) of the change. Wrong predictions are surprises, opportunities to learn, and are grouped with other wrong predictions until the engine can generalize and attempt a broader tweak to the algorithm. Those broader tweaks are automatically tested, integrated if they work, and the cycle repeats.

For example, on the query [teak patio palo alto], experiments may show the Teak Patio store in Palo Alto is getting unexpectedly high clickthrough. Another query, for [garden stone seattle], is showing similar problems in experiments. In clustering, both queries are classified as local. Both show modest purchase intent. The clickthrough URLs have been classified as local businesses. On the known data, the most general rule based on this result, one boosting local businesses when a query is classified as local with purchase intent, appears to give a lift. The rule is tested live on a percentage of users, results match predictions, and the system adds the new rule for all users. The process repeats.

Similarly, a group of queries that match known names (e.g. celebrities) may show that links classified as sites are appearing too low in rankings. In automated testing, a general version of this rule performs poorly unless another rule with strong overlap is removed; a more specific rule performs well but applies less frequently. The older rule is removed, the newer general rule put in place.
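
To make that loop concrete, here is a deliberately toy sketch of it. Everything in it is hypothetical -- the query classes, the "surprise" signal, and the rule representation are invented for illustration -- and, as described above, a real system would test each candidate rule on live traffic before promoting it:

```python
from collections import defaultdict

# A toy simulation of the loop sketched above. The query classifier, the
# "surprise" signal, and the rule representation are all made up; this is
# not Google's system.

def classify_query(query):
    """Stand-in for a query classifier (e.g. detecting local intent)."""
    return "local" if any(w in query for w in ("palo alto", "seattle")) else "general"

def click_surprise(query_class, result_class):
    """Pretend log analysis: local-business results on local-intent queries are
    getting unexpectedly high clickthrough, i.e. the ranker under-predicted them."""
    return query_class == "local" and result_class == "local_business"

def run_loop(query_log, min_evidence=3):
    rules = set()                 # learned (query_class, boost_class) rules
    surprises = defaultdict(int)  # evidence counts per candidate rule

    for query, clicked_result_class in query_log:
        qc = classify_query(query)
        if click_surprise(qc, clicked_result_class):
            surprises[(qc, clicked_result_class)] += 1
            # Once enough surprises accumulate, promote a generalized rule.
            # A real system would first test the candidate on a slice of
            # live traffic and only keep it if results match predictions.
            if surprises[(qc, clicked_result_class)] >= min_evidence:
                rules.add((qc, clicked_result_class))
    return rules

query_log = [
    ("teak patio palo alto", "local_business"),
    ("garden stone seattle", "local_business"),
    ("french revolution", "encyclopedia"),
    ("sushi seattle", "local_business"),
]
print(run_loop(query_log))  # {('local', 'local_business')}
```
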

While this process does result in specific tweaks to the engine like the manual approach, it does not rely on someone manually finding the rule. Unexpected tweaks to relevance rank may arise from the data. Moreover, a self-optimizing relevance rank does not rely on someone manually coming back to rules to maintain or debug them over time.

This approach would require massive computational power -- a huge infrastructure classifying and clustering queries and urls into hierarchies, a framework for testing billions of changes and tweaks simultaneously and generalizing from the results -- but I thought Google had that power already.

Perhaps this merely shows how much further there is to go in search. As Larry Page recently said, "We're probably only 5 per cent of the way there."

Update: Five weeks later, there are a few more tidbits about Google's experiments and their tweaking of their relevance rank in an interview of Udi Manber by Eric Enge. Some excerpts:
We run literally thousands of experiments a year and pick the ones that score well.

We have projects that their sole purpose is to reduce complexity. A team may go and work for two months on a new simpler sub-algorithm. If it performs the same as the previous algorithm, but it's simpler, that will be a big win.

Overall, we have to be very careful that the complexity of the algorithm does not exceed what we can maintain.