Tuesday, June 29, 2004

New web servers for Findory

Findory.com moved to new web servers this evening. Seems to be humming along nicely on the new boxes.

Monday, June 28, 2004

Safari and RSS

Scripting News is one of many sites reporting that Safari will integrate an RSS reader in the next major version. Interesting that Safari will aggregate all articles in reverse chronological order regardless of the source RSS feed. Getting closer to the Blogory idea of emphasizing articles over feeds. Dave Winer also speculates on a bigger Microsoft, Yahoo, and Google battle over blogging tools and readers.
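
To make the "articles over feeds" idea concrete, here is a minimal sketch of merging several feeds into a single list, newest first. The feed names, timestamps, and titles are made up for illustration, and this is not how Safari (or Blogory) actually does it.

```python
from datetime import datetime

# Hypothetical sample data: each feed is a list of (published, title) pairs.
feeds = {
    "Feed A": [
        (datetime(2004, 6, 28, 9, 0), "A: morning post"),
        (datetime(2004, 6, 27, 18, 30), "A: yesterday's post"),
    ],
    "Feed B": [
        (datetime(2004, 6, 28, 12, 15), "B: noon post"),
    ],
}


def merge_newest_first(feeds):
    """Flatten all feeds into one list of articles, newest first."""
    articles = [
        (published, title, source)
        for source, items in feeds.items()
        for published, title in items
    ]
    # Sort on the timestamp only, so the source feed plays no role in ordering.
    return sorted(articles, key=lambda a: a[0], reverse=True)


merged = merge_newest_first(feeds)
for published, title, source in merged:
    print(published.isoformat(), source, title)
```

Sorting by time alone is the simplest possible ordering. It says nothing about which articles are worth reading.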

Challenges in web search engines

Another interesting paper out of Google focusing on some of the important challenges in relevance ranking. Particular attention paid to search engine spamming and determining the quality of a web page. John Battelle has a nice summary of the paper.

Gates on MSN Search

Bill Gates reveals some interesting tidbits about the timeframe for launching the new MSN Search in this ZDNet article:
    In July [2004], the format of the site will change -- and so will the quality of what you get -- and the way it'll look is dramatically improved. It'll be later this year that we actually roll out what's entirely our own back end driving the search.
Gates also briefly mentions plans to apply natural language understanding and personalization techniques to deliver better search results.

Localized advertising

Interesting article on Search Engine Watch about the localized advertising engines used by Google and Overture.

Thursday, June 24, 2004

goZone "personalized" search

goZone claims to offer "extreme personalized search", a phrase that strikes me as more amusing than meaningful. They appear to have tried to combine A9's search history with Google's profile-based personalized search. I wasn't impressed when I tried it out, but your experience may differ.

Curious how many people are trying to hop on the personalized search bandwagon. But I haven't seen a single compelling implementation yet. Doing personalized search right isn't easy.

CMP blocks Google News

CyberJournalist reports that CMP Media is blocking all links from Google News. This is an even dumber business decision than mandatory registration. But both drive customers away for no obvious gain.

Update: PoynterOnline and Micropersuasion also comment on CMP's move.

Update: CyberJournalist was contacted by CMP. They claim the blocking of Google was accidental. "For the record, no referral from any Google site should see any blocking page at all, ever," said Mike Azzara, CMP's vice president in charge of Internet business. Good news.

Wednesday, June 23, 2004

Technology.updates.com

Dave Winer points to Technology.Updates.com. It appears to be part of CNet (and perhaps related to CNet's news aggregator).

In addition to having lots of RSS feeds on some nice, specific tech topics, I found it interesting that Technology Updates mixes news from mainstream news sources (newspapers and magazines) and weblogs. Because the quality and characteristics of blog articles differ so much from mainstream sources, aggregating them all together can be a challenge.

And I'm not sure how well it's working out for them. The Search Engine section, for example, drifts from interesting articles about competition with GMail to the fried chicken recipe from Google's chef. Appears they have severe problems with duplicate stories too.

In general, the problem with news aggregators comes back to relevance rank. Which of those 10,000 stories are you going to show me today? Aggregate all you want, but if you don't get me the news I need in the top ten stories, forget about it.

The problem in a nutshell

Captured perfectly in this quote:
    We are drowning in information but are starving for knowledge.

Bezos on A9

John Battelle gives a teaser from an interview he had with Jeff Bezos. The teaser has yet more hints that A9 intends to pursue personalized search:
    ... As I was pushing to understand Amazon's long-term interest in A9 ... [Bezos used] the term "discovery" as an umbrella term which incorporates search ... What's discovery? ... It's search plus what happens when the network finds things for *you* - based on what it knows of you, your actions, and your inferred intent. Inferred intent? How might the network be smart enough to do that? Ay, there's the rub.

Tuesday, June 22, 2004

Peter Norvig at the Google Blog

Peter Norvig has the latest post on the Google Blog. For us geeks, I think he sums up the temptation of working at Google pretty well in his closing line:
    Now if you'll excuse me, I have to get back to work -- I have some ideas that can only be tackled with a few terabytes of text and a few thousand computers.
One of my earlier posts has a link to a paper that describes Google's yummy cluster. Yes, Peter, we'd all love to get our hands on it.

Google Web Office?

Philip Greenspun is one of several people who speculate that Google will need to build a web-based Office product to compete with Microsoft.

Seems unlikely to me. What expertise does Google have in an office product? This is way outside of their core competencies. Nothing to do with information retrieval, very little innovation.

So how can Google grow? How can it support its lofty $30-50B valuation? Google needs to focus on its strengths: innovative new products to enter new markets and a web advertising revolution to drive revenue.

Monday, June 21, 2004

Amazon switches to A9 for web search

John Battelle noticed that Amazon recently switched its web search to A9 from Google.

Sunday, June 20, 2004

The Google PC

Jon Udell has a column in Infoworld on the Google PC.
    Imagine that Google, rather than Microsoft, controlled the desktop. Job No. 1 for the Google PC would be to vacuum up all available sources of data. Job No. 2 would be to exploit that data to the hilt.
So, what does he mean by vacuuming up all sources of data?
    On the Google PC, you wouldn’t need third-party add-ons to index and search your local files, e-mail, and instant messages. It would just happen. The voracious spider wouldn’t stop there, though. The next piece of low-hanging fruit would be the Web pages you visit. These too would be stored, indexed, and made searchable. More ambitiously, the spider would record all your screen activity along with the underlying event streams.
And what does he mean by exploiting that data? In addition to easy access to any data you previously accessed or used on a task, Jon suggests that the Google PC would do personalized information filtering:
    Bayesian categorization: My SpamBayes-enhanced e-mail program learns continuously about what I do and don’t find interesting, and helps me organize messages accordingly. A systemwide agent that’s always building categorized views of all your content would be a great way to burn idle CPU cycles.

Saturday, June 19, 2004

Google's "site-flavored" search

Google Labs is now offering something they call a "site-flavored" search. It is a search that is biased or filtered to specific subject categories. For example, you can specify that any searches done through a Google search box that you put on your site should be biased toward pages that are in the "Computer/Internet" category.

It appears to be an application of the same technology used for Google's personalized search. In fact, the HTML code you use to put the site-flavored search box on your site contains a reference to Kaltix in an image called google_kaltix_site_flavored_searchbox.gif. Google acquired Kaltix a year ago. The technology behind Google's personalized search was developed by Kaltix. A paper published by some of the people who founded Kaltix gives the details of their personalization algorithm.

Update: An excellent ResearchBuzz article covers Google's new feature in detail.

Friday, June 18, 2004

Google's viral marketing of GMail

GMail's viral marketing campaign has been very well executed. It was a simple technique. Start with a small user base (many of whom were Google employees), then repeatedly give out small numbers of invites to the existing GMail users for friends and colleagues. The pool of users grows exponentially, but in a controllable way, and every user feels like their few invites are scarce and precious.

I had assumed that they were restricting supply of the accounts because it was a beta test, but there's more here. The artificial shortage of supply has created such demand and buzz that people have been paying to get invites. People who have a GMail account feel exclusive, elite, like part of a club, all because of the shortage of supply. Now everyone wants a GMail account.
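
A rough sketch of why growth by invitation is both exponential and controllable. The seed size, invites per round, and acceptance rate below are made-up assumptions, not Google's actual numbers.

```python
def simulate_invites(seed_users, invites_per_round, acceptance_rate, rounds):
    """Return the user count after each round of invites.

    All parameters are illustrative assumptions, not real GMail figures.
    """
    users = seed_users
    history = [users]
    for _ in range(rounds):
        # Every existing user receives a few invites; only some are accepted.
        new_users = int(users * invites_per_round * acceptance_rate)
        users += new_users
        history.append(users)
    return history


history = simulate_invites(
    seed_users=1000, invites_per_round=3, acceptance_rate=0.5, rounds=5
)
print(history)
```

The pool multiplies by a constant factor each round, so it grows exponentially, but the operator sets that factor by choosing how many invites to hand out and how often.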

Find.com

A new web search engine, Find.com, appears to have launched today. Seems to have clustering like Vivisimo and a history of your previous searches like A9. It appears to be a metasearch, using other search engines like MSN and Teoma and aggregating the results. The quality didn't appear to be particularly high; this is a common problem with metasearch, as I described in an earlier post. From the About page of Find.com, they claim to be doing personalized search as well, but I didn't notice anything that appeared to be personalized when I used the site.

Thursday, June 17, 2004

Blinkx

John Battelle points to a review of an intriguing new tool called Blinkx.
    BlinkX is all about contextual search. Say you are reading through a big Microsoft Word document, on I don’t know European Union policies on data transfer, the BlinkX bar at the top of the page, will retrieve relevant news item links with brief summaries (only visible when a mouse moves over the link) and other important links. It can do the same for a web page you are reading. For instance, if you were reading my piece on Cisco buying Procket, you would get links to all relevant news articles on the web, and links to Cisco and Procket homepages. However, the fun begins when you open the client software (which sits in the system tray.) It has a simple entry window. Lets say you put Napa and Sonoma County. It searches and brings back the web for news, Amazon for books, websites of relevance, e-commerce links and but more importantly any documents, emails etc related to that subject on your desktop.
Very interesting idea if it actually works. A desktop search, hidden web search, and context-based personalized search, all in one. Perhaps the future of desktop search?

Update: Emergic.org posts about a Linux tool called Dashboard that
    constantly combs through your e-mail, calendar, address book, word-processing, and browser programs and brings together information related to your current tasks before you even know you want it. Say you're reading an e-mail from a collaborator on a project. Dashboard automatically shows the person's contact information, her last five e-mails, and your upcoming appointments with her.
And Microsoft has announced a similar effort called Implicit Query.

Why can't a newspaper be more like a blog?

Excellent six-part series (Part I, II, III, IV, V, Conclusion) by Barry Parr on MediaSavvy called "Why can't a newspaper be more like a blog?"

Great points on building community, providing discussion forums, offering access to archives, and publishing RSS feeds. Newspapers have only begun to use the web as a channel for attracting readers. There's a lot that could be done to improve the experience, increasing traffic and revenue.

On RSS feeds, see also my earlier post on how news sources get value from providing RSS feeds.

Introduction to targeted advertising

An AP article provides a simple introduction to targeted advertising. It underemphasizes targeting to the page content and overemphasizes privacy issues, but still a good introduction to the technique. I would have liked to see the article contrast this approach, which has the potential to make advertising useful and relevant, with the more common technique of exposing all your customers indiscriminately to annoying, intrusive, and mostly irrelevant flashy graphical advertising.

See also my earlier article on Google AdSense and how it could drive a targeted advertising revolution.

eBay's search for sellers

Is eBay having a hard time attracting enough new sellers to support its growth?
    Finding new sellers and keeping existing ones happy has gotten tougher lately. For one thing, new rivals are coming on strong, especially as eBay's own merchants have steadily been offering more and more new merchandise. For more than four years, Amazon.com (AMZN) has been welcoming brand-name merchants to sell on its site. It has also teamed with used-book, music, and DVD sellers. And search engine Google can send customers directly to a merchant's Web site via cost-effective targeted ads.

    The rivals are taking a toll: The online print store Art.com, for one, once listed 8,000 posters a week on eBay -- but not anymore. Now, founder and Chief Strategy Officer Michael Marston says paying Google several million dollars a year for leads, in addition to selling on Amazon, results in better sales than on eBay.

    Even more worrisome, some top merchants among eBay's 430,000 mom-and-pop sellers -- which still account for more than 95% of that $24 billion in gross sales -- are hitting the wall. Scot Wingo, CEO of ChannelAdvisor, which sells e-commerce-management services to large eBay sellers, sees up to 20 of them implode every month as they fail to adapt to changing technology or new competition or simply can't keep up with growth. Laments Wingo: "You see these nice couples from Florida at eBay Live, and you know they're going to be out of business by next year."

Wednesday, June 16, 2004

Web newspaper registration stirs debate

CNN reports on the annoying registration requirements of many online newspapers and sites like BugMeNot.com that seek to foil them:
    Imagine if a trip to the corner newsstand required handing over your name, address, age, and income to the cashier before you could pick up the daily newspaper. That's close to the experience of many online readers, who must complete registration forms with various kinds of personal data before seeing their virtual newspaper.
Marketing folks will argue that this kind of registration data is valuable for advertisers and to better understand their readers, but I doubt they understand the costs of these kinds of hurdles. Throwing up registration requirements will reduce traffic -- some people just won't bother with it -- and losing traffic means lost advertising dollars.

What I would recommend is voluntary registration and voluntary user surveys to gather the same data on a sample of your audience. For advertising, target the ads to the content of the page, like Google AdSense. If you want to get tricky, start tracking individual behavior -- articles read and advertising viewed -- to personalize the ads to each reader. With these techniques, you'll have the data you need to understand your readers and be able to have effective, targeted advertising programs.

There's really no need for these mandatory registration forms. It shows the lack of imagination and poor grasp of technology of the marketing organizations at these firms.

Will somebody please fix TiVo Suggestions?

Jason Kottke has some excellent ideas for TiVo, including this one on TiVo Suggestions:
    The current recommendations suck, especially if you consider the massive amounts of data that TiVo gathers on their users' viewing habits. They can do better than a list of 50 shows that are vaguely related to ones you may have watched before. Take a page from Amazon's book. When a user views a particular show's details, offer a short list of similar shows ("people who watched this show also watched..."). Break them down by category into recommendations for sports, for movies, for whatever. Along with the collaboratively filtered recommendations, TiVo should publish lists of new and notable shows, categorized appropriately.
Jason has a great point. Why doesn't TiVo have "people who watched X also watched Y"? And why are TiVo Suggestions so pitifully bad?

Seems like a bizarre business decision to me. Not only would excellent recommendations increase the value of TiVo and improve word of mouth for the service, but also it would raise switching costs when people consider moving to competing services. Why doesn't TiVo put more effort into their recommendations?
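
For a sense of how little it takes to get a first version of "people who watched X also watched Y", here is a minimal co-occurrence sketch. The viewing histories are invented, and this is a toy illustration, not TiVo's or Amazon's actual algorithm.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical viewing histories: one set of shows per viewer.
histories = [
    {"The Daily Show", "South Park", "The Simpsons"},
    {"The Daily Show", "South Park"},
    {"The Simpsons", "Futurama"},
    {"The Daily Show", "South Park", "Futurama"},
]


def build_also_watched(histories):
    """Count, for every show, how many viewers of it also watched each other show."""
    co_counts = defaultdict(Counter)
    for shows in histories:
        # Every ordered pair of distinct shows in one viewer's history.
        for a, b in permutations(shows, 2):
            co_counts[a][b] += 1
    return co_counts


def also_watched(co_counts, show, top_n=3):
    """Return up to top_n shows most often watched alongside `show`."""
    return [other for other, _ in co_counts[show].most_common(top_n)]


co_counts = build_also_watched(histories)
print(also_watched(co_counts, "The Daily Show"))
```

Raw counts like these favor shows that everybody watches, so a real system would need to normalize for popularity. Even so, the data TiVo already gathers is exactly what this kind of table needs.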

Tuesday, June 15, 2004

Findory's got a new search engine

A new keyword search engine under the hood for both Findory News and Findory Blogory. Should provide substantially faster keyword searches.

Where is MSN going?

Jeff Bezos frequently says that he wants Amazon "to be the place to find, discover, and buy anything you want online."

The head of MSN says that he wants MSN to be the place where customers can "find, discover and experience whatever they want online."

I normally take vision statements as the garbage they usually are, but I find this parallel to be intriguing. The focus on "discovery" is particularly interesting. Personalization, particularly Amazon.com-style personalization, seems to be a major focus for MSN.

Backdoor program gets backdoored

Amusing article on SecurityFocus. I love this quote: "That's the moral. You can't even trust Windows malware."

Monday, June 14, 2004

Coverage of Findory Blogory

The beta test of Findory Blogory has been getting some attention on several popular weblogs, including mentions on Gary Price's ResourceShelf (and Search Engine Journal), Dave Winer's Scripting News, John Battelle's SearchBlog, Steve Rubel's Micro Persuasion, and a longer article on Search Engine Watch.

Sunday, June 13, 2004

AdSense and the publishing revolution

The New York Times has a fantastic article on the impact of Google AdSense:
    The crucial point is that the blogger reaches those potential advertisers without having to hire a sales staff, prepare media kits or invest scarce time and money.

    Why does that matter? It completes the publishing revolution brought on by the Internet. The first stage was the liberation of the reader, who, thanks to browsers, could look at publications in any part of the world. Next was the liberation of would-be publishers. Thanks to blogging tools, anyone can present his or her views online. And now, thanks to automated ad sales, small publishers have a more viable hope of creating a business, and keeping independent voices, than they did even a year ago. A. J. Liebling's wisecrack that "freedom of the press is guaranteed only to those who own one" takes on new meaning when technical and financial barriers to creating a Web-based press drop so low.
It's not discussed in this article, but it's also important to note that Google AdSense is probably the key to Google's future revenue growth.

Friday, June 11, 2004

Findory Blogory

Findory.com is beta testing Findory Blogory, a personalized weblog reader. Instead of manually hunting down weblogs that interest you, Blogory watches what weblog articles you read and recommends other articles that might interest you. No need to read weblogs individually or mess with XML feed URLs. Findory Blogory finds the blog articles for you. There's nothing else like it! Try it out!

Thursday, June 10, 2004

Microsoft Research coverage

This month's Scientific American has a brief overview of Microsoft Research, including some of its history. In reading the article, I kept thinking about the contrast between the more traditional way Microsoft structured its research group, mostly isolating it from product development, versus the way Google integrates their PhDs into their development teams, essentially making the entire organization one big research lab.

Another article has tidbits from Microsoft Research's recent Cambridge Science Open Day event, including brief coverage of some techniques for reducing the impact of "search engine spam" and a method for extracting a natural language text summary of a cluster of news articles.

Tuesday, June 08, 2004

Yahoo finally testing a new home page

Yahoo is testing a home page redesign. They've apparently gone two years with the current cluttered version.

If there's ever a site that cries out for personalized navigation, it's the Yahoo home page. They've got hundreds of links vying for your attention. I'd love to see them try something like this:
  • Search at the top for easy access
  • A section with links to generally popular features
  • A section with links to features I use the most
  • A section intended to introduce me to areas I might be interested in but don't seem to know about (analyze what I use, figure out what I might also want to use, and market it to me, but market something else quickly if I don't respond)
  • Featured content from My Yahoo
Different people need different things from Yahoo. Instead of trying to cram everything on the home page, why not emphasize what is most likely to be useful to me?

Should every site be a blog?

Dana Blankenhorn argues that the home page of every corporate and political web site should be a blog.
    What people most want in pages they bookmark is dynamic content. They want to know that each time they hit the page there will be something new to see.
Is this true? It depends on the goal of the home page. If you want and expect people to spend time reading content on your home page, then putting articles (perhaps even blog articles) directly on the home page might make sense.

But, for most substantial corporate web sites, the home page has two goals: (1) Tell people what is available on the site. (2) Advertise new products.

The home page should provide navigation links into the rest of your content. Many people come to the site on a mission, looking for something specific. The site should make it easy for them to find what they want.

The home page is an opportunity for marketing. There's an advertising opportunity to introduce customers to new products and services. For new customers, all the products and services are new. For existing customers, only products they haven't used or seen before are new.

If the goals of the home page are easy navigation and internal advertising, it's still possible for the home page to be dynamic. Personalization can emphasize the most useful navigation links and most relevant and interesting new products and services based on the individual interests of the customers. This is particularly important on large corporate sites that easily become cluttered (e.g. Yahoo) if they try to provide links to everything.

Monday, June 07, 2004

Smarter than the CEO

An article in this month's Wired magazine correctly points out that teams often make better decisions than individuals. The article starts by criticizing the superstar CEO culture and command-and-control management structures, then quickly gets into the idea that no single person can consistently make great decisions alone. A (well functioning) team can outperform any individual by using all the talent, knowledge, and experience in the group to make better, more informed decisions.

But I'm not sure I agree with the article's claim that "internal decision markets" are a promising way to optimize decision making. Rather, I'd argue that the key is team culture and structure. In particular, ideas should be openly discussed, debated, and challenged. Responsibility and authority should be pushed down, decentralized, and widely distributed. The people closest to the problem should be allowed to make informed decisions while drawing on the advice, experience, and creativity of the rest of the team. Business metrics on team performance should be open and shared, so everyone can easily understand the impact of their actions. Mistakes should be acknowledged and well understood, not to assign blame, but to learn and improve. And rewards should emphasize team performance over individual performance.

Vivisimo ranks second

Vivisimo, a clustering search engine, ranked a surprising second (tied with Yahoo) in an About.com focus group study of different search engines.

Sunday, June 06, 2004

The secret weapon: An army of PhD's

The NYT Digital Domain column today points out an important source of competitive advantage for Google: its hordes of PhD's.

But the article underemphasizes an important point. You can't build a culture of innovation by just hiring a bunch of bright people. You need to allow them to do great things. Many companies hire top talent, but then isolate them in a research group in a corner of the company, away from the product teams. In a misguided effort to minimize risk, innovation is crippled.

At Google, the PhD's are integrated into the development teams. This makes the culture look like one big research lab, focused on innovation. People have the autonomy, authority, and responsibility necessary to get things done. Google doesn't just hire great people; it lets great people do great things.

Saturday, June 05, 2004

What is personalization?

Personalization is hot these days, widely seen ([1] [2] [3] [4]) as a key battleground in the fight between Microsoft, Yahoo, and Google. But what is personalization?

I'd define web personalization as delivering different and unique content to each individual customer based on the customer's interests. It's your own version of the web site, a site just for you.

To distinguish personalization and customization, I would argue that personalization uses implicit interests and customization uses explicit interests. Personalization learns what you like from your actions; you are what you click on and what you buy. Customization requires you to explicitly specify what you want; you are what you say you are. My Yahoo is an example of customization. You tell the site what you want. Amazon.com is an example of personalization. The site learns your interests and adapts.

In a world with a glut of information, personalization offers a way to find focus. It doesn't waste your time showing you what everyone else sees. It learns what you like, shows you what you want to see, and filters out the rest. That's personalization.
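
The distinction between customization and personalization can be sketched in a few lines. The article titles, topics, and ranking rule here are all hypothetical, and no real site works this simply.

```python
from collections import Counter

# Hypothetical catalog: article title -> topic.
articles = {
    "New MSN Search preview": "search",
    "A9 adds search history": "search",
    "Mariners win in extra innings": "sports",
    "Gmail invites on eBay": "email",
}


def customized(articles, chosen_topics):
    """Customization: the user states their interests explicitly."""
    return [title for title, topic in articles.items() if topic in chosen_topics]


def personalized(articles, clicked_titles):
    """Personalization: infer interests from what the user clicked on."""
    topic_clicks = Counter(articles[title] for title in clicked_titles)
    unread = [title for title in articles if title not in clicked_titles]
    # Rank unread articles by how often the user clicked on their topic.
    return sorted(unread, key=lambda t: topic_clicks[articles[t]], reverse=True)


print(customized(articles, {"sports"}))
print(personalized(articles, ["New MSN Search preview"]))
```

The first function only knows what you told it. The second never asks; it watches one click on a search article and moves the other search article to the top.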

Thursday, June 03, 2004

Interview with Gary Flake from Yahoo Research Labs

Great interview with Gary Flake, head of Yahoo Research Labs, on ResourceShelf. A lot on Yahoo's search plans and strategy, which appear to focus on natural language understanding, artificial intelligence, and personalization. An excerpt on personalized search:
    RS: What's going to be the "next big thing" in web search?

    GF: I believe that the next big thing in web search will be a form of personalization that is simple, unobtrusive, intuitive, and almost without exception better than the non-personalized version of web search. Two ways of getting this wrong are to (1) keep the GUI as is, implicitly build a user model, and show personalized results all the time, or (2) expose many new GUI elements to the user to give a great deal of explicit control for personalization.

    The sweet spot -- the thing that works -- will most likely be a slight modification to the GUI, say a single new GUI element, that gives the user the power to tell the search engine what they like or dislike. If done correctly, we will all wonder how we ever searched without it, and it will be as if we get the best of both worlds: more control with minimal complication and a search experience that seems tailored to our own needs.

    RS: At the beginning of 2009 what will Yahoo search look like?

    GF: My hunch is that personalization will be so good that most users will look back to web search circa 2004 as ridiculously outdated. I also think that Yahoo! will have nailed user intent to the point that we will be able to tailor the result set to focus on documents that satisfy the need behind the query, instead of returning results that merely contain the same words as in the query.

Microsoft business strategy

Interesting criticism of Microsoft in the Seattle Weekly. Thanks, Todd Bishop, for the link.

Wednesday, June 02, 2004

Why big media should support RSS

Dave Winer is exactly right on why big media companies should use and support RSS:
    I don't think that providing RSS feeds, if you do it right, lowers traffic, in fact I think you can gain traffic. I assume you'd publish links to your articles with brief descriptions, in your RSS feeds. So when the reader clicks on a link, they go to your site to read the full article (only if they're interested of course) and your traffic stays even. Of course those pages have ads, so your revenue doesn't decrease. In this view, think of your feeds as inexpensive advertising for your publication.
RSS feeds drive traffic (and thus advertising revenue) to the news sites of big media companies. By providing teasers in the RSS feeds, customers are drawn to the news site to read the full article.

Dave also points out, as I did a couple months back, that advertising in the RSS feed is counterproductive since it just reduces the number of people who will use the feed and thereby reduces advertising revenue from the news website.

Update: Detailed article on OJR about commercial interest and concerns about RSS.