molly.com

Wednesday 17 August 2005

Tag You’re It: Talking to Technorati

I HAD AN ENJOYABLE DAY spending time over at Technorati. Tantek Çelik invited me to come down and chat with some of the engineers there and gain insight into the challenges of building a specialty search engine for blogs.

Time spent one-on-one with the Vice President of Engineering, Adam Hertz, was especially informative and interesting.

Technorati’s Oft-Slow Performance

[Image: Technorati’s “Devil’s Triangle”]

Technorati’s servers are well known for being slow at times, and apparently the reason is more complex than one might imagine. Technorati understands the problem very well and has made solving it priority one.

Adam drew an upward-facing triangle on a whiteboard and told me it’s known around the office as the “Devil’s Triangle.” The bottom-left corner is labeled query rate, the top corner data set, and the bottom-right corner update rate.

Data set and query rate matter for other kinds of search engines too, but update rate is the factor that challenges Technorati. A traditional search engine doesn’t look at update frequency. Technorati, by contrast, is constantly logging blog updates while also managing its data set and queries. This is where most of the problem-solving needs to be done.

“It used to take hours for us to index a blog, now it takes minutes.”

Web Standards Allow Better Searches

We already know this from experience, but it bears repeating: Technorati doesn’t just look at text the way Google or Yahoo! do. The engine also looks for discrete information within a page, such as a blogroll. Apparently this is easier with blogs that use standards-based template designs, because they are marked up properly to begin with. Services such as MSN Spaces are also easier to extract information from, because the discrete data sits in consistent locations within each document.

This makes a strong argument for bloggers with custom designs to use meaningful markup and CSS if they want Technorati to parse their blogs effectively.

Think Tags Don’t Matter? Think Again.

Several respected colleagues have argued that tags don’t matter; D.L. Byron put it in exactly those words. Adam had a lot of insight about this. He pointed out that when you apply a Technorati tag to something, you’re assigning meaning to it. This is important because it extends the meaning of a given post.

For example, say I write a post about a beautiful rainbow I saw while taking my car in for repairs. I might mention my car, along with its repair issues. A regular search engine can’t tell that the post is really about the rainbow rather than the car. But the author can extend the post’s meaning by tagging it with, say, rainbow.

This lets people searching Technorati home in on posts about rainbows and get results that are far more specific to their query.
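Technorati tags like the rainbow example are ordinary links marked up with rel="tag" (the rel-tag microformat). The tag name is the last segment of the link’s URL, as in http://technorati.com/tag/rainbow. The Python sketch here shows how a parser can pick tags out of a post’s HTML this way. It is a simplified illustration, not Technorati’s actual indexer. The class and function names are made up for this example, and ignoring a trailing slash in the URL is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagParser(HTMLParser):
    """Hypothetical helper: collects tag names from <a rel="tag"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, e.g. "nofollow tag".
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "tag" not in rel_values or not href:
            return
        # The tag is the last path segment of the URL.
        # Assumption: a trailing slash is not significant.
        path = urlparse(href).path.rstrip("/")
        segment = path.rsplit("/", 1)[-1]
        if segment:
            # Decode percent-escapes such as %20.
            self.tags.append(unquote(segment))


def extract_tags(html):
    """Return the rel-tag names found in an HTML fragment, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


sample = (
    '<p>Saw a rainbow on the way to the garage. '
    '<a href="http://technorati.com/tag/rainbow" rel="tag">rainbow</a> '
    '<a href="http://example.com/car">my car</a></p>'
)
print(extract_tags(sample))  # ['rainbow']
```

The link about the car has no rel="tag", so only the rainbow tag is returned. The tag lives in the link’s rel attribute and URL rather than in free text. A parser can therefore read it directly instead of guessing what the post is about, which is the point Adam made about tags extending a post’s meaning.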

“We’re trying to do two things, be of service to bloggers and of service to readers.”

Two interesting Technorati tag facts:

  1. Tagging posts with any of the current top ten tags boosts traffic to your blog. Of course, to me this is a bit of an SEO Snake Oil Strategy, because the goal is to increase traffic by using tags that aren’t relevant to your post.
  2. There is a great animation of tag growth at Technorati. You can view the small version (12.2 MB) or the large version (20 MB), or grab it via this torrent link. It’s distributed under a Creative Commons Attribution-NonCommercial license.

“Using tags is a social act as well . . . I’m not just tagging that post so I can keep track of what it’s about, I want other people to find it, and, I’m participating . . . in building a collection of information on these topics.”

Later in the day, after my visit with Adam, Tantek and I had a nice conversation over coffee. We talked more about tagging from a technology standpoint, microformats in general, and the relationship between technology and society. I’ll be writing more about that in the not-so-distant future, both here and for InformIT.

It was a terrific time, complete with a Thai lunch with David Sifry, Ryan King, Derek Powazek, Tantek and Adam. Many thanks to Tantek and Technorati for welcoming me so warmly and answering all of my questions in such detail.

Despite the challenges of building, improving and scaling a specialty search engine, Technorati is a company built on passion for the Web, for blogs, for individuals and for society at large.

Filed under:   general
Posted by:   Molly | 21:07 | Comments (24)

Comments (24)

  1. One of the things I really like about Technorati is how open they are about what they do well, and what they are having trouble with. Faced with a mountain of criticism, they keep trudging on, building a great brand and a great tool.

    I noticed (and blogged on) that they just announced multiple tag search, which is another useful feature. If they could now add “AND” to their “OR” functionality, it would become incredibly useful as a research tool.

    In my opinion, by focusing on evolving tag search, and speeding up the index (including the link index) and search, they will continue to be in the front of the pack.

  2. Pingback: WiRED.Pod » Blog Archive » Technorati’s Triangle

  3. All very interesting. I’m a big fan of Technorati but have never bothered to tag my blog. I believe that the topic of each entry is really up to the reader to decide, because I’m usually just ranting nonsense.

  4. Oooh, thanks for this. Very interesting! 🙂

  5. Interesting stuff. The animation really is quite impressive.

  6. Molly, one thing that I think is important to mention is that Technorati indexes blog software categories as tags. In other words, if you place a post in a category in WordPress, for example, Technorati will consider that a tagged post under the category name.

    This is important because embedded Technorati tags (using an <a> anchor tag) are much more likely to be gamed by people, as their only purpose is to attract attention. Category-tagging, on the other hand, is done first for personal use, and so is less likely to be gamed.

  7. Michael: I agree about the openness completely.

    Josh: Yes, this is a concern that has been raised over and over again, and it’s something I’m sure Technorati will be looking into. On the other hand, the link is incredibly powerful, because it’s what allows instant access to a search for all tags of that nature.

    Constantly adding categories to a blog in order to support tags as you need them means ending up with a huge number of categories. Categories are by nature supposed to be broad, but tags are downright granular.

  8. The reason that blog spam won’t be/isn’t as big of a problem as it was with the normal search engines is that you have to look at the searches in different ways. When I go to Google and do a search for “rainbow”, I want the most useful and relevant information about rainbows. Google gives me “Reading Rainbow”, which may or may not be relevant to my search. Google (like the others) makes it a priority to sort its results by relevance.

    When I go to Technorati and do a search for “rainbow”, I’m looking for interesting information and posts about rainbows. I can’t guarantee that any of it will be relevant, but it’ll at least be current, because Technorati sorts by date. I don’t use Technorati when I’m looking for general information about something, and I don’t use Google when I’m looking for the newest news and information about that same thing.

  9. Molly, thanks for the post. It was a pleasure meeting you and meeting with you. Just to set the record straight, the Devil’s Triangle term is due to Ian Kallen, our erstwhile architect. Thanks again and have fun at the conference.

    -A-

  10. Molly, that’s an interesting observation about categories being broad and tags being granular. I wonder whether practice erases this distinction. During your trip to Technorati, did they happen to mention why they index both at the same level?

  11. Pingback: Kevin Burton's Feed Blog

  12. Well, if Technorati hosted *me* to a meeting and took me out for lunch, I wouldn’t blame them for slipping something stomach-churning into my meal. Since they’re considered a de facto standard (especially regarding ranking) it would be great if their ranking actually used something more than some very superficial (and flawed) screen-scraping… http://www.multidimensional.me.uk/2005/08/17.html#a222

  13. Pingback: Emad Fanous » Blog Archive » Technorati Response Times and Tag Info

  14. Your closing sentence, and Michael Arrington’s comment above, reflect my exact thoughts regarding Technorati and why I continue to be an unabashed supporter of their work. Thanks for this post.

  15. Pingback: B.L. Ochman's weblog - Internet strategy, marketing, public relations, politics with news and commentary

  16. Pingback: Just Relax! :: What’s wrong, Technorati? :: August :: 2005

  17. Pingback: Emergence Marketing

  18. Successful website

  19. Pingback: Talking to technorati | Janet Martin
