Monday, June 29, 2009

New Google study on speed in search results

Googler Jake Brutlag recently published a short study, "Speed Matters for Google Web Search" (PDF), which looked at how important it is to deliver and render search result pages quickly.

Specifically, Jake added very small delays (100-400ms) to the time to serve and render Google search results. He observed that even these tiny delays, which are small enough to be difficult for users to perceive, resulted in measurable drops in searches per user (declines of 0.2% to 0.6%).
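For the curious, here is a minimal sketch of how a server-side delay experiment like this might be wired up. To be clear, this is my own illustration in Python, not Google's actual code; the hashing scheme and the experiment arms are assumptions:

import hashlib
import time

# Hypothetical experiment arms: extra server-side delay in milliseconds.
ARMS = [0, 100, 200, 400]

def assign_arm(user_id: str) -> int:
    """Deterministically assign a user to a stable experiment arm."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(ARMS)

def serve_results(user_id: str, results: list) -> list:
    """Serve search results, injecting the assigned arm's delay first."""
    delay_ms = ARMS[assign_arm(user_id)]
    if delay_ms:
        time.sleep(delay_ms / 1000.0)  # the artificial delay
    return results

The metric of interest is then searches per user, compared between the control arm (0ms) and each treatment arm.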

Please also see my Nov 2006 post, "Marissa Mayer at Web 2.0", which summarizes a claim by Googler Marissa Mayer that Google saw a 20% drop in revenue from an accidentally introduced 500ms delay.

Update: To add to the Marissa Mayer report above, Drupal's Dries Buytaert summarized the results of a few A/B tests at Amazon, Google, and Yahoo on the impact of speed on user satisfaction. As Dries says, "Long story short: even the smallest delay kills user satisfaction."

Update: In the comments, people are asking why the effect in this study oddly appears to be an order of magnitude lower than the effects seen in previous tests. It's a good question.

Update: By the way, this study is part of a broader suite of tools and tutorials Google has gathered as part of an effort to "make the web faster".

14 comments:

Anonymous said...

So if I understand correctly, the idea that the 500ms delay in 2006 was the cause of the 20% drop in revenue (proportional to further use and searches, I guess) was completely disproved by actual, rigorous testing.

As I read these numbers, speed does not matter at all (even almost half a second of delay caused an almost negligible drop in use).

According to these numbers, we are much better off looking at other factors, like usability, to improve our sites.

Greg Linden said...

No, I wouldn't conclude that.

The 2006 data from Google was also the result of a rigorous A/B test. I wouldn't say this new test invalidates the old one.

I also wouldn't agree that they are in conflict. They measured different things with different delays in different places.

One measured revenue drops with a longer delay that may have occurred after the search results appeared and before the advertising rendered.

The other measured usage drops with shorter delays that occurred before any content on the page rendered.

I would take this second Google study as another data point, as you should the Amazon and Yahoo studies that also showed the impact of delays on user satisfaction.

Anonymous said...

Thanks for the Dries Buytaert reference.

I just don't see it. The Marissa Mayer data is now a meme (20%-30%/0.1 sec, see http://twitter.com/chabotc/status/2401905834 for example) but unlike you, I never saw real data.

All of the other data, including yours (1% drop in sales / 0.1s delay, according to the Dries post), give much smaller effects.

"Speed matters" is obviously true, but what is the real quantitative meaning behind the slogan? 10s of % effects/0.1sec is really different than 1% or 0.1%/0.1s and leads to different priorities.

Unknown said...

Thanks for the link! I noticed that the link had a typo (beter instead of better) so I just corrected that. If possible, please update your link. Thanks!

Greg Linden said...

It's a good point, Ewout, that the data has wide variation.

In part, I think this is because the experiments differed in how much and where they did the delay as well as what metric they used to determine impact.

For example, if the search results appeared significantly before the ads appeared, you might expect to see a fraction of people click on the search results before the ads even display.

In the end, how much speed impacts perceived quality in a particular application probably depends on the usage and interface design. What these studies indicate is that we probably should be concerned about speed. How much we should be concerned likely is something that would have to be determined for each application.

By the way, you mentioned that you hadn't seen Marissa give that +500ms leads to a -20% revenue drop data point? If you want, you can see it in her "Scaling Google for the Everyday User" talk at the Seattle Scalability Conference in 2007 (see around 13:00 in the video).

jeremy said...

By the way, you mentioned that you hadn't seen Marissa give that +500ms leads to a -20% revenue drop data point?

Greg,

In your cited Nov 2006 post, you wrote: "Traffic and revenue from Google searchers in the experimental group dropped by 20%." Not just revenue, but traffic, which I assume corresponds pretty much 1:1 with the number of searches issued.

So I'm kinda with Ewout, in scratching my head in a little bit of confusion over this one. Let's see..

200 ms delay = 0.2% traffic drop (2009 study)
400 ms delay = 0.6% traffic drop (2009 study)
500 ms delay = 20.0% traffic drop (2006 study)

I'd be hard pressed to fit any sort of regression line to that data. Does that additional 100 ms, when going from a 400ms to 500ms delay, really cause an additional 19.4% drop in usage?
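Here's a quick sanity check, a throwaway least-squares fit to those three points (my own script, nothing more):

# Fit a straight line to the three reported (delay, drop) points,
# just to see how badly it fits.
delays = [200, 400, 500]   # added delay, ms
drops = [0.2, 0.6, 20.0]   # reported traffic drop, %

n = len(delays)
mean_x = sum(delays) / n
mean_y = sum(drops) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(delays, drops))
         / sum((x - mean_x) ** 2 for x in delays))
intercept = mean_y - slope * mean_x

for x, y in zip(delays, drops):
    print(f"{x}ms: observed {y:.1f}%, line predicts {slope * x + intercept:.1f}%")

The line predicts about -2.5% at 200ms and 14.5% at 500ms. The residuals are enormous; these numbers just don't look like they come from one underlying relationship.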

Greg Linden said...

That's a good point, Jeremy. I don't have a good explanation for this inconsistency since my previous explanation breaks down if the -20% drop was also to searches.

Perhaps it might be worth writing Jake Brutlag to get his thoughts?

jeremy said...

Sure, we can write Jake. His email isn't listed in the paper -- do you have it?

Another thought, while I'm on it: let's even say that these numbers are correct. Let's assume that the 400ms delay causes a 0.6% drop in the number of searches performed, and that the drop is statistically significant. (I don't see significance values reported in the paper, but.. whatever. Let's assume that it is significant.)

Translated into real numbers, that means that for every 500 queries that a user used to do, he or she now does 497? Or, given the statistic that the average searcher does 4 searches per day.. that means that once every 42 days or so, the user does one less search? That's once every month and a half, or roughly nine times a year, that the user does one less search than they otherwise would have. ONE.
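Spelling that arithmetic out (a throwaway calculation; the 4 searches/day figure is just the commonly cited average, so treat it as an assumption):

drop = 0.006                 # 0.6% fewer searches
searches_per_day = 4         # assumed average

lost_per_day = drop * searches_per_day     # 0.024 searches/day
days_per_lost_search = 1 / lost_per_day    # ~42 days
lost_per_year = 365 * lost_per_day         # ~8.8 searches/year

print(f"one lost search every {days_per_lost_search:.0f} days, "
      f"~{lost_per_year:.0f} per year")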

I can see how, maybe, from the corporation's perspective, all those little tiny differences add up to a significant difference in revenue. Especially when you aggregate over hundreds of millions of users.

But I'm looking at this from the perspective of the user. And I am having a hard time seeing how this really affects the user, at all. If once every six weeks or so I feel so inclined to not perform a search that I otherwise would have, I don't think that I would ever even notice that, or in any way be inconvenienced by that fact.

I don't get it.

And so it seems to me that a much better use of corporate resources would be to make sure that a larger percentage of the searches a user does succeed. What is the statistic.. something like 50% of all searches end in failure? If you could lower that failure rate to, say, 45%, the end user would be much happier than if you lowered the delay from 400ms to 100ms.
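Rough numbers again, under the same assumptions (4 searches/day, and whatever that oft-quoted 50% failure statistic really is):

searches_per_day = 4

# Cutting the failure rate from 50% to 45%:
failures_avoided_per_year = (0.50 - 0.45) * searches_per_day * 365
print(failures_avoided_per_year)   # ~73 fewer failed searches per year

# Versus the latency fix: roughly 9 extra searches per year, from above.

Seventy-odd fewer failures per user per year versus nine extra searches seems like an easy call to me.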

Greg Linden said...

It looks like Jake has published his e-mail on his home page. It is jakeb[at]google.com.

I get what you and Ewout are saying, that this study not only appears to be inconsistent with previous reports on the magnitude of the effect, but wildly inconsistent. It would be interesting to get Jake's thoughts on that if he is willing to share them.

Greg Linden said...

In a separate e-mail, the author of the paper, Jake Brutlag, suggests that the difference is likely due to confounding factors, such as additional client-side latency when adding 30 results.

Jake also pointed to Marissa Mayer's Velocity 09 talk, where (starting around 5:00) Marissa explicitly discusses these different results, though she doesn't go into why they differ by such a large amount.

Given the magnitude of the difference, it is hard not to still have questions about these results, but perhaps that helps a little.

Shirish said...

A talk by Eric and Jake at Velocity '09: http://blip.tv/file/2279751

fnthawar said...

Don't forget, correlation does not equal causation. The original Marissa Mayer study was trying out 20 search result links vs. 10. It also happened to be a slower-loading page.

Just because users didn't like 20 links AND a slower page doesn't mean the slowness was the only cause of less usage.

randall said...

I bet the 20% drop in revenue was mostly because of fewer *ad clicks*, not fewer *searches*. Going to the next page of Google results brings up more ads. Scrolling down the page doesn't.

Unknown said...

The experiment Marissa Mayer referenced changed the default number of search results on the page. This is a different mechanism for increasing latency than injecting server-side delay (server-side delay underlies the results described in this blog post). Furthermore, as fnthawar comments, it is not clear one can untangle the user response to additional results vs. additional latency to generate those results.