Wednesday 17 August 2005
I HAD AN ENJOYABLE DAY spending time over at Technorati. Tantek Çelik invited me to come down and chat with some of the engineers there and gain insight into the challenges of building a specialty search engine for blogs.
Time spent one-on-one with the Vice President of Engineering, Adam Hertz, was especially informative and interesting.
Technorati’s Oft-Slow Performance
It’s well known that Technorati’s servers can be very slow, and apparently the reason is a bit more complex than one might imagine. Technorati understands the problem very well, and has in fact made solving it priority one.
Adam drew an upward-facing triangle on a whiteboard and told me it’s known as the “Devil’s Triangle” around the office. The bottom-left corner is marked query rate, the top is marked data set and the bottom-right corner is marked update rate.
While data set and query rate matter to other types of search engines too, update rate is the corner that is especially challenging for Technorati. Unlike a traditional search engine, which doesn’t look at update frequency, Technorati is constantly logging updates to blogs as well as managing data sets and queries, and this is the area where the most problem solving needs to be done.
“It used to take hours for us to index a blog, now it takes minutes.”
Web Standards Allow Better Searches
We know from past experience that web standards help search, but in this case the benefit is more specific: Technorati doesn’t just look at text as Google or Yahoo! do. The engine looks for discrete information within a page, such as a blogroll. Apparently, this is easier to do with blogs that use standards-based template designs, because they are marked up properly to begin with. Services such as MSN Spaces are also easier to grab information from, because the discrete data is in consistent locations within the document.
Knowing this makes a strong argument for bloggers with custom designs to use meaningful markup and CSS if they want Technorati to parse their blogs more effectively.
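To make the point about discrete data concrete, here is a minimal sketch of pulling a blogroll out of a page. It is not Technorati’s parser; it assumes, purely for illustration, that the blogroll is a list marked up as `<ul class="blogroll">`, and the names `BlogrollParser`, `extract_blogroll` and `PAGE` are my own.

```python
from html.parser import HTMLParser


class BlogrollParser(HTMLParser):
    """Collects link targets found inside <ul class="blogroll"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while we are inside the blogroll list
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "ul":
                self.depth += 1          # nested list inside the blogroll
            elif tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
        elif tag == "ul" and "blogroll" in (attrs.get("class") or "").split():
            self.depth = 1               # entering the blogroll list

    def handle_endtag(self, tag):
        if self.depth and tag == "ul":
            self.depth -= 1


def extract_blogroll(html):
    parser = BlogrollParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample page: one link in the post body, two in the blogroll.
PAGE = (
    '<div><p>Read <a href="http://example.com/post">this post</a></p>'
    '<ul class="blogroll">'
    '<li><a href="http://example.org/a">Blog A</a></li>'
    '<li><a href="http://example.net/b">Blog B</a></li>'
    '</ul></div>'
)

print(extract_blogroll(PAGE))
```

Because the list is labeled, the link in the post body is ignored and only the two blogroll links come back. With unlabeled, table-based markup there would be nothing to tell the two kinds of link apart.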
Think Tags Don’t Matter? Think again.
Several respected colleagues have argued that tags don’t matter – D.L. Byron said it in exactly those words. Adam had a lot of insight about this, pointing out that when you apply a Technorati tag to something you’re assigning meaning to it. This is important because it extends the meaning of a given post.
An example would be if I have a post about a beautiful rainbow I saw while taking my car to get repaired. I might mention my car, along with the repair issues. A regular search engine isn’t able to distinguish that the post is really about the rainbow, not the car. But an author who tags the post can extend the meaning via a tag, let’s say, rainbow.
This allows people searching Technorati to home in on posts about rainbows and end up with results that are far more specific to their query.
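The rainbow example can be sketched in a few lines. This is only an illustration under stated assumptions: that the post’s tag is published as a link carrying `rel="tag"`, with the tag name taken from the last segment of the link’s URL. The names `RelTagParser`, `extract_tags` and `POST` are hypothetical, and none of this is Technorati’s actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagParser(HTMLParser):
    """Collects tag names from <a rel="tag" href=".../tagname"> links (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if tag == "a" and "tag" in rel and href:
            # The tag name is the last path segment of the link target.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]).lower())


def extract_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


# Hypothetical post: the text mentions a car, but the author tagged it "rainbow".
POST = (
    '<p>I saw a beautiful rainbow while taking my car in for repair.</p>'
    '<a href="http://technorati.com/tag/rainbow" rel="tag">rainbow</a>'
)

print(extract_tags(POST))  # ['rainbow']
```

A plain text match on “car” would still find this post, but a tag lookup would return it only for “rainbow”, which is the meaning the author chose to assign.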
“We’re trying to do two things, be of service to bloggers and of service to readers.”
Two interesting Technorati tag facts:
- If you tag posts using any of the current top ten tags, this boosts traffic to your blog. Of course, to me this is a bit of an SEO Snake Oil Strategy, because the goal is to boost traffic by using tags that aren’t relevant to your post.
- There is a great animation of tag growth at Technorati. You can view the small version (12.2 MB) or the large version (20 MB) or grab it via this torrent link – it’s distributed under a Creative Commons Attribution-NonCommercial license.
“Using tags is a social act as well . . . I’m not just tagging that post so I can keep track of what it’s about, I want other people to find it, and, I’m participating . . . in building a collection of information on these topics”
After my visit with Adam, Tantek and I had a nice conversation over coffee later in the day. We talked more about tagging from a technology standpoint, microformats in general and the relationship between technology and society. I’ll be writing more about that in the not-so-distant future both here and for InformIT.
It was a terrific time, complete with a Thai lunch with David Sifry, Ryan King, Derek Powazek, Tantek and Adam. Many thanks to Tantek and Technorati for welcoming me so warmly and answering all of my questions in such detail.
Despite the challenges of building, improving and scaling a specialty search engine, Technorati is a company built on passion for the Web, for blogs, for individuals and for society at large.