Wednesday, March 30, 2011

Latest reading

Here are a few of the articles that have caught my attention recently:
  • Google tries again in social, this time focusing on the common trend of having sharing buttons distributed across the Web. ([1])

  • Good thoughts on building robust websites. I particularly like the assumption that off and slow are perfectly valid states in underlying services and should be expected by all other services. ([1])

  • A Yahoo data center based on chicken coop designs (along with relaxing assumptions about maximum tolerable peak temperature) yields big energy savings. ([1])

  • Cool paper from Googlers coming out later this year at VLDB that talks about how useful direction searches (like on Google Maps) are for finding places of interest and places people might want to go. Similar to the GPS trail data work (using data from GPS-enabled cell phones), but using search logs instead. ([1])

  • I really like this vision for what Google should do for social, that they should go after personal search. This is closer to the idea of an external memory, Memex, and the promise of the languishing Google Desktop than what Facebook is today, but it is a problem many people have, want solved, and one that Google is better able to solve than anyone else (except maybe Microsoft). ([1])

  • Surprising results in this paper out of Yahoo Research showing queries that tend to be unique to particular demographic groups (like men/women, racial groups, age groups, etc.). Jump right to Table 3 (about halfway through the paper) to see it. ([1])

  • Over at blog@CACM, I outrageously claim that both netbooks and tablets are doomed. ([1] [2])

  • Randall Munroe (author of xkcd) is brilliant. Again. ([1] [2])

  • Sounds like YouTube wants to build millions of TV channels designed to match any mood, interest, or person. ([1])

Wednesday, March 09, 2011

Personal navigation and re-finding

Jaime Teevan, Dan Liebling, and Gayathri Geetha from Microsoft Research had a fun paper at WSDM 2011, "Understanding and Predicting Personal Navigation", that focuses on a simple, highly accurate, easy, and low-risk approach to personalization: increasing the rank of a result that a person keeps clicking on.

The basic idea is noticing that people tend to use search engines instead of bookmarks, just searching again to re-find what they found in the past. But -- and this is the key insight -- the same query serves as a bookmark for different pages for different people. For example, one person might use [lottery] to get to the Michigan lottery, another to get to the Illinois lottery, and only a minority use it to get to the top ranked result, lottery.com.

So, keeping track of what individual searchers want when they repeat queries, then giving each searcher back what they want is an easy form of personalization that can actually make a significant difference. Moreover, supporting this kind of re-finding is a baby step toward fully personalized search results (and requires the same first steps to build the underlying infrastructure to support it).
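To make the idea concrete, here is a minimal sketch of per-searcher re-finding in Python. It is my own toy illustration, not the paper's algorithm: the class name, the `min_repeats` threshold (requiring the last two clicks for a query to agree), the query normalization, and the example URLs are all assumptions made for the sake of the example.

```python
from collections import defaultdict


class PersonalNavigationReranker:
    """Toy per-user re-finding model (illustrative only)."""

    def __init__(self, min_repeats=2):
        # min_repeats is a hypothetical threshold: how many of the most
        # recent clicks for a (user, query) pair must agree before we
        # trust the pattern enough to personalize.
        if min_repeats < 1:
            raise ValueError("min_repeats must be at least 1")
        self.min_repeats = min_repeats
        # (user, normalized query) -> clicked URLs, oldest first
        self.history = defaultdict(list)

    @staticmethod
    def _normalize(query):
        # Assumed normalization: lowercase and collapse whitespace.
        return " ".join(query.lower().split())

    def record_click(self, user, query, url):
        self.history[(user, self._normalize(query))].append(url)

    def predict(self, user, query):
        """Return the URL this user seems to be re-finding, or None."""
        clicks = self.history.get((user, self._normalize(query)), [])
        recent = clicks[-self.min_repeats:]
        if len(recent) == self.min_repeats and len(set(recent)) == 1:
            return recent[0]
        return None

    def rerank(self, user, query, results):
        """Move the predicted target to the top; otherwise leave results alone."""
        target = self.predict(user, query)
        if target is None or target not in results:
            return list(results)
        return [target] + [r for r in results if r != target]


if __name__ == "__main__":
    reranker = PersonalNavigationReranker()
    # Placeholder URLs, not real search results.
    results = ["lottery.com", "illinois-lottery.example", "michigan-lottery.example"]
    reranker.record_click("alice", "lottery", "michigan-lottery.example")
    reranker.record_click("alice", "lottery", "michigan-lottery.example")
    print(reranker.rerank("alice", "lottery", results))
    print(reranker.rerank("bob", "lottery", results))
```

In this sketch, a searcher with a consistent click history gets their usual target moved to the top, while a searcher with no history (or an inconsistent one) sees the default ranking untouched, which is what keeps the approach low risk.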

Some excerpts from the paper:
This paper presents an algorithm that predicts with very high accuracy which Web search result a user will click for one sixth of all Web queries. Prediction is done via a straightforward form of personalization that takes advantage of the fact that people often use search engines to re-find previously viewed resources.

Different people often use the same queries to navigate to different resources. This is true even for queries comprised of unambiguous company names or URLs and typically thought of as navigational.

For example, the reader of this paper may use a search engine to navigate to the WSDM 2011 homepage via the query [wsdm], while a person interested in country music in the Midwest may use the same query to navigate to the WSDM-FM radio station homepage. Others may ... issue it with an informational intent to learn more about Web Services Distributed Management .... [Likewise], on the surface it appears obvious that the query [real estate.com] is intended to navigate to the site http://www.realestate.com. However, for only five of the 23 times that query is used for personal navigation does the query lead to a click on the obvious target. Instead, it is much more likely to be used to navigate to http://realestate.msn.com or http://www.realtor.com.

Personal navigation presents a real opportunity for search engines to take a first step into safe, low-risk Web search personalization ... Here we look at how to capture the low-hanging fruit of personalizing results for repeat queries ... There is the potential to significantly benefit users with the identification of these queries, as the identified targets are more likely to be ranked low in the result list than typical clicked search results.

Table 4 in the paper is definitely worth a look. Note that, using a month of data, nearly 10% of queries are personal navigation queries that can be personalized with high accuracy. In addition, on another 5% of queries "when the general navigation queries trigger as personal navigation", "the prediction is over 20% more accurate than when predicted based on aggregate behavior alone." That's a big impact for such a simple step toward personalization: low-hanging fruit indeed.

Please see also my older posts, "Designing search for re-finding", "To personalize or not to personalize", and "People often repeat web searches", about papers by some of the same authors.