The barriers to entry for that particular field (search ranking based on implied data) are sufficiently high that you just aren't going to be able to get there. It's doubtful that even Microsoft has enough resources and resolve to do it.
The next big thing is to enable a new type of metadata system for the internet. It needs more horsepower and new algorithms, and it has the potential to actually kill spam dead.
Doc Searls asked the question that got me started on this thread... and I posted the first rev of this brainstorm there in response. But I'm going to take it a step or two further here, because my answer there can be generalized to a far larger and wider set of problems.
Google distills value from the internet by way of an artificially derived reputation system, sometimes called PageRank (TM?). I propose a more direct and explicit system of reputation and trust, which could be of far greater value.
First of all, we'd need a way of making assertions and commentary on existing content. (This is a long-term grievance of mine, as HTML doesn't actually allow MarkUp in the strictest sense.) This could be done with RSS, aggregated locally, or any of dozens of other ways... as long as it's available to the systems ranking results BEFORE the end user sees them. (Could be local, Google-based, or anything in between.)
The assertions would need to be machine readable and digitally signed. Some of the English-language equivalents might be:
- I think this article is 100% right, and I'm Mike Warot
- I trust Doc Searls 95% on this subject, and I'm Mike Warot
- This article is 100% funny and 0% informative, and I'm Mike Warot
- This author is 100% spam and I'm a member of the spam fighters trust (PGP Key)
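Here's a rough sketch of what one of those assertions might look like once it's machine readable and signed. Everything in it is my own assumption for illustration: the `Assertion` fields, the canonical JSON encoding, and the demo key are all hypothetical. Note that it uses HMAC-SHA256 from the Python standard library as a stand-in for a signature. A real system would want a public-key signature (PGP or similar) so that anyone can verify an assertion without holding the signer's secret.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Assertion:
    signer: str    # who is making the claim
    subject: str   # the article, author, or site being rated
    aspect: str    # e.g. "accuracy", "trust", "humor", "spam"
    score: float   # 0.0 to 1.0, i.e. "95%" becomes 0.95


def canonical_bytes(a: Assertion) -> bytes:
    # Sorted keys and fixed separators, so the same assertion
    # always serializes to the same bytes before signing.
    return json.dumps(asdict(a), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(a: Assertion, key: bytes) -> str:
    # Stand-in only: HMAC needs a shared secret. A public-key
    # signature would replace this in a real deployment.
    return hmac.new(key, canonical_bytes(a), hashlib.sha256).hexdigest()


def verify(a: Assertion, signature: str, key: bytes) -> bool:
    # Constant-time comparison of the expected and supplied signatures.
    return hmac.compare_digest(sign(a, key), signature)


key = b"demo-key-for-mike"  # hypothetical secret, for the demo only
claim = Assertion("Mike Warot", "Doc Searls", "trust", 0.95)
sig = sign(claim, key)

tampered = Assertion("Mike Warot", "Doc Searls", "trust", 0.10)
print(verify(claim, sig, key))     # the original assertion checks out
print(verify(tampered, sig, key))  # an altered score does not
```

The point of the canonical encoding is that the signature covers the whole claim: change the score, the subject, or the signer, and verification fails.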
Once those assertions (and signatures) are gathered up (once again, the mechanism can vary all over the map), the real work begins, building a mesh of them. This would be messy, and I'm sure there are a ton of ways to optimize it and distribute it across hardware.
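To make the "mesh" a little more concrete, here's one possible way to do it, and only one of many. I'm assuming that trust passes along a chain by multiplication (if I trust Doc 95% and Doc trusts Alice 80%, I trust Alice 76%) and that the best available chain wins. Both of those rules are my assumptions, as are the function names and the sample data.

```python
import heapq


def effective_trust(edges, viewer):
    """Best product-of-weights trust from `viewer` to everyone reachable.

    edges maps truster -> {trustee: weight}, with weights in [0, 1].
    Because weights never exceed 1, trust can only shrink along a chain,
    so a Dijkstra-style search that expands the most trusted node first works.
    """
    best = {viewer: 1.0}
    heap = [(-1.0, viewer)]  # max-heap via negated trust
    while heap:
        neg_trust, node = heapq.heappop(heap)
        trust = -neg_trust
        if trust < best.get(node, 0.0):
            continue  # stale entry; a better chain was already found
        for trustee, weight in edges.get(node, {}).items():
            candidate = trust * weight
            if candidate > best.get(trustee, 0.0):
                best[trustee] = candidate
                heapq.heappush(heap, (-candidate, trustee))
    return best


def weighted_score(ratings, trust):
    """Trust-weighted average of ratings; raters with no trust chain count for nothing."""
    total = sum(trust.get(rater, 0.0) for rater in ratings)
    if total == 0:
        return None  # nobody I trust has rated this
    return sum(trust.get(rater, 0.0) * s for rater, s in ratings.items()) / total


# Hypothetical trust assertions, already signature-checked.
edges = {
    "mike": {"doc": 0.95},
    "doc": {"alice": 0.8},
}
trust = effective_trust(edges, "mike")

# Hypothetical "this article is right" ratings from three people.
ratings = {"doc": 1.0, "alice": 0.5, "stranger": 0.0}
print(trust)
print(weighted_score(ratings, trust))
```

Notice that the stranger's rating has no effect on my score at all, since I have no chain of trust leading to them. That property is what would make this hard for spammers to game.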
The value is that you can then combine this web of reputation and assertions with the World Wide Web of content to get a knowledge representation that has far greater value than the content alone.
Imagine a Firefox add-in that allowed you to rank pages, their authors, and the sites they appeared on, as you finished reading them. The results of this could be stored locally, or shared, depending on your preferences. You could then have your own private set of things you never want to see again (à la AdBlock Plus), or have things from more trusted sources appear in a different font to indicate the source's level of trust.
The next step up is to allow you to re-sort Google's output (or someone else's) based on your set of ratings and preferences.
This same infrastructure could work on email to help kill spam dead.
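The re-sorting step is simple enough to sketch. This assumes you already have the search engine's results as a plain list of URLs, a personal score between 0 and 1 for some of them, and a set of things you never want to see again. The `rerank` function, the neutral default of 0.5 for unrated pages, and the sample URLs are all made up for illustration.

```python
def rerank(results, my_scores, blocked=frozenset(), default=0.5):
    """Drop blocked URLs, then sort by personal score, highest first.

    Python's sort is stable, so results with equal scores (including all
    the unrated ones) keep the order the search engine gave them.
    """
    kept = [url for url in results if url not in blocked]
    return sorted(kept, key=lambda url: my_scores.get(url, default), reverse=True)


# Hypothetical search output, in the engine's original order.
results = ["a.example/1", "spam.example/x", "b.example/2", "c.example/3"]
my_scores = {"b.example/2": 0.9, "a.example/1": 0.2}
blocked = {"spam.example/x"}

print(rerank(results, my_scores, blocked))
# ['b.example/2', 'c.example/3', 'a.example/1']
```

The engine's ranking still matters as a tiebreaker, which seems right: your ratings only override it where you actually have an opinion.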
The first group to get it done and working right can count on at least matching the value of Google, if not doubling it. Wouldn't it be nice if it were just a set of open standards?