Monday, February 13, 2006

Thinking about gatekeepers, spam, and metadata

This is a long one, so please forgive the length... I'm thinking out loud as I go.

Doc Searls isn't a gatekeeper... but he does me a great service: he helps filter the massive volume of stuff that is Web 2.0 (or the blog-o-sphere... I don't have a good term for it).

I count on Doc to share interesting things he's found related to Web 2.0, Blogging, RSS, Podcasting, and the societal impact thereof. If you want to know how to build a new hierarchy of your own, Doc's the man.

Doc recently pointed out that he subscribes to several keyword searches to keep abreast of the topics he's interested in. In this case, he doesn't rely on any human gatekeepers; he uses the technology as a tool. This lets Doc draw on a far wider range of sources than any single person could keep up with. If you work this way, you too can get a much broader range of sources and be much better connected.
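A keyword subscription is essentially a standing query run against everything that gets published. Here is a minimal sketch of the idea, assuming posts arrive as simple title/body records (the `Post` type and the `matches` and `filter_posts` functions are hypothetical, not any real search service's API): a post is kept if it contains every word of at least one subscribed query. The matching is purely lexical, which is exactly why a post that simply stuffs in the right words gets through.

```python
from dataclasses import dataclass

@dataclass
class Post:
    # Hypothetical minimal post record; real feeds carry much more.
    title: str
    body: str

def matches(post, queries):
    """True if the post contains every word of at least one query (case-insensitive)."""
    words = set((post.title + " " + post.body).lower().split())
    return any(all(term.lower() in words for term in q.split()) for q in queries)

def filter_posts(posts, queries):
    # Keep only the posts that hit one of the subscribed keyword queries.
    return [p for p in posts if matches(p, queries)]
```

Nothing in `matches` asks whether a post is any good; it only asks whether the words are present.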

Unfortunately, there is a big downside to this... SPAM.

I tried out Doc's approach, because I'm interested in computer security. I'm interested in a very specific aspect of it, though: I'm very pragmatic, in that I want to be able to run untrusted code... which makes finding the right keywords almost impossible. And once you do find the right keywords, guess what happens... you hit the wall-o-spam again.

For example, a search for "keykos security" on Technorati returns 2 results in the past 33 days. The first post is spam, and my own post is right below it. This is a very narrow search, yet spam comes up first. Unless, of course, I'm wrong and a new blog named "search engine submission" with no real content really isn't spam... ;-)

Points so far:
  • We need other people's help to filter Web 2.0 down to size
  • Spam is a persistent problem
  • Keywords only go so far
  • Tagging helps
The tools and social constructs are getting better, which is encouraging. What I think we need now is even better tools, and in particular better ways of sharing metadata.

For example, this very document could be generating community metadata as you read it. How much of a page you read, how much time you spent reading it, and when you read it are all useful bits of knowledge, if you have efficient ways to store and index them. I would certainly like to know which parts of this blog are interesting, so that I can improve my writing and get my point across more effectively.
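To make "store and index it" a bit more concrete, here is a minimal sketch, assuming the browser could report how long each section of a page stayed visible to a reader. The `ReadEvent` record and the `section_interest` function are hypothetical illustrations, not an existing tool. Summing dwell time per section across readers is one simple way to see which parts of a post people actually read.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ReadEvent:
    # Hypothetical record: one reader looking at one section of one page.
    reader: str
    url: str
    section: int       # index of the paragraph/section on the page
    seconds: float     # time that section was on screen
    when: datetime     # when the reading happened

def section_interest(events, url):
    """Total time spent on each section of one page, across all readers, most-read first."""
    totals = defaultdict(float)
    for e in events:
        if e.url == url:
            totals[e.section] += e.seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping the raw events (with `reader` and `when`) rather than only the totals would also give each reader the personal "what have I read" log described next.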

I'd like to know exactly what I've read. I'd like to be able to pull that data up, add some metadata and comments, and rate it. This would be useful for my own reference, and possibly for others as well.

Annotation + Rating
A lot of what Doc Searls does manually by blogging could be interpreted as third-party annotation. He points us toward things he finds interesting, along with clues about his opinion of them (agree/disagree/funny/insightful, etc.). I believe it would be very useful for the rest of us to do the same in a more automated, tool-leveraged way.

The broken promise of Web 1.0 is that HTML doesn't actually let readers mark up hypertext. We've had to evolve blogs, comments on posts, Technorati, and all sorts of other infrastructure just to get to the point where we kinda make up for a basic design error.

In my ideal world, you'd be able to pull up a page, highlight a section of it, and add notes, ratings, and links to supporting material or ideas. The correct term for this is Markup, but HTML has diluted that term so much that it's useless as a search term, so we're forced to use Annotation instead.
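As a rough illustration of what such an annotation might need to hold, here is a minimal sketch. The `Annotation` type and `locate` function are hypothetical; no existing annotation standard is implied. The highlighted passage is stored as quoted text and re-found in the page when it is displayed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Annotation:
    # Hypothetical third-party annotation attached to someone else's page.
    author: str
    url: str
    quote: str                      # the highlighted passage
    note: str = ""
    rating: Optional[int] = None    # e.g. 1 to 5; None if unrated
    links: List[str] = field(default_factory=list)   # supporting material
    tags: List[str] = field(default_factory=list)

def locate(annotation: Annotation, page_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the highlighted passage, or None if it is no longer on the page."""
    start = page_text.find(annotation.quote)
    if start == -1:
        return None
    return (start, start + len(annotation.quote))
```

Anchoring by quoted text rather than by raw character offsets is one design choice among several: it survives edits elsewhere on the page, but the annotation is orphaned if the highlighted passage itself is changed.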

Just imagine if you could right-click on any web page and then click "flag as spam". The flip side is that you could then know, before opening a link, whether it was spam. It would get really powerful if this data were factored into your search engine results before you ever see them.
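Here is one way "factored into your search results" could work, sketched under loud assumptions: a search engine hands back (url, score) pairs, and some service records which readers flagged which URLs as spam. The `rerank` function and its `threshold` and `penalty` parameters are hypothetical. Counting only flags from readers you already trust brings back the human-filter idea from the top of this post, and makes it harder for spammers to bury honest pages by flagging them.

```python
def rerank(results, spam_flags, trusted, threshold=2, penalty=0.5):
    """
    results:    list of (url, relevance score) from an ordinary search
    spam_flags: dict mapping url -> set of readers who flagged it as spam
    trusted:    set of readers whose flags count for you
    Hides a result once `threshold` trusted readers flag it; otherwise
    lowers its score by `penalty` per trusted flag and re-sorts.
    """
    kept = []
    for url, score in results:
        votes = len(spam_flags.get(url, set()) & trusted)
        if votes >= threshold:
            continue  # enough trusted readers called it spam: drop it
        kept.append((url, score - penalty * votes))
    return sorted(kept, key=lambda r: r[1], reverse=True)
```

Dropping versus demoting is a tunable choice here; a real system would also need to deal with flags arriving after results are cached.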

Just imagine if you could right-click on the same page, and add technorati tags to it... including SPAM. ;-)

We need to be able to do this... the last search engine to do it is a rotten egg!

Well... that's a lot of wandering around the topic... more of a brain dump, actually.

Thanks for sticking with it, hope you found it interesting.

--Mike--
