Sunday, July 31, 2005

Google Stream

More thoughts about Google, a remix of these items:
  • Google as infrastructure
  • Doc states we need to maintain diversity
  • Everything is a loop
Wouldn't it be cool if Google offered a new type of service based on the massive input streams they have available? If you were interested in a sufficiently narrow topic, and boy would it have to be narrow, you could get a stream of EVERYTHING being said about that topic.

As with search results, the tighter the specification, the smaller the result set. For example, if you subscribed to "Norse Mythology", you might get 310 pointers per day (310,000 results scaled by a 1,000-day average page life - it's a guess). The ColorForth stream would be smaller, with perhaps 12 or so results per day (based on 12,500 results).
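
Just to make that back-of-envelope arithmetic concrete, here's a tiny Python sketch. The figures are my guesses from above, not real data:

    # Back-of-envelope estimate: new items per day for a topic stream,
    # given total indexed results and an average page lifetime in days.
    # (The figures below are guesses, not real data.)

    def daily_stream_volume(total_results, avg_page_life_days):
        """Roughly this many pages about the topic are new each day."""
        return total_results / avg_page_life_days

    print(daily_stream_volume(310_000, 1000))  # "Norse Mythology" -> ~310/day
    print(daily_stream_volume(12_500, 1000))   # ColorForth        -> ~12/day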

If Google offers this new service, it creates the opportunity for a completely new breed of filters and search engines specific to a topic. Just as there are many new things being built on top of Google Maps, there could be a new breed of searches, feeds, etc.

A continuous stream of Google results would be wickedly cool. (Maybe an RSS feed? Does it already do this?)
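
For what it's worth, here's a rough sketch of what such a per-topic stream might look like as RSS, using nothing but Python's standard library. The service is hypothetical and the items are placeholders; a real version would fill them from the crawl:

    # A minimal sketch of a per-topic result stream rendered as RSS 2.0.
    # The topic and items are placeholders; a real service would supply them.
    import xml.etree.ElementTree as ET

    def build_topic_feed(topic, items):
        """items: list of (title, link) tuples for new pages about the topic."""
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = "Stream: " + topic
        ET.SubElement(channel, "description").text = "Everything new about " + topic
        for title, link in items:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = title
            ET.SubElement(item, "link").text = link
        return ET.tostring(rss, encoding="unicode")

    print(build_topic_feed("ColorForth", [("A ColorForth tutorial", "http://example.com/cf")]))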

Possible new Google services

Julian Bond correctly points out that Google has immense leverage when it comes to capturing the Buzz of the World Live Web. They have the sensors deployed that would enable amazing things, and Julian wonders why they don't do them.

I think that Google Web Search is now a well-tuned loop, and they would be fools to tinker too much with it. I also believe there is some manual tweaking of the results that must be done to ensure quality, as well as to meet certain, uhm... other requirements. This mix of factors keeps them from tweaking their main output, and rightly so!

However, Google has many outputs, and they can always add yet another beta, so I hereby predict a couple of new Google services: Fresh and Echo.

Google Fresh: the freshest possible results, pulled from web logs, new pages found via AdSense, etc. This tool will be used as forward intelligence for companies with a clue.

Google Echo: a set of tools that help you find all the places your blog gets echoed, like TrackBack on steroids, eventually integrated with Blogger and other blogging software.
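
Here's a rough sketch of the sort of thing an Echo-style tool would do: poll a search feed for pages that link to your blog, and report anything new. The feed URL and the link: query are made up for illustration; no such Google feed exists today, and feedparser is a third-party library you'd have to install:

    # Sketch of an "Echo"-style backlink watcher. SEARCH_FEED_URL is an
    # assumption -- a hypothetical feed of search results for link:my-blog.
    import feedparser  # third-party: pip install feedparser

    BLOG_URL = "http://example.com/myblog"
    SEARCH_FEED_URL = "http://example.com/search.rss?q=link:" + BLOG_URL

    seen_links = set()

    def check_for_echoes():
        """Return (title, link) pairs for pages echoing the blog, not yet seen."""
        feed = feedparser.parse(SEARCH_FEED_URL)
        new = []
        for entry in feed.entries:
            if entry.link not in seen_links:
                seen_links.add(entry.link)
                new.append((entry.title, entry.link))
        return new

    for title, link in check_for_echoes():
        print("Echoed by:", title, "->", link)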

They might even figure out how all of this fits into one new service. I haven't figured that out yet, but I'm sure that someone else will.

Friday, July 29, 2005

Tuning the loop

In the industrial world, control loops are used to adjust for the natural variations in inputs to a system while maintaining the desired output. There are inevitably multiple interacting parameters which can be adjusted to produce the best results. When a system is first installed, there is usually a period of time over which the loops are adjusted to produce the desired results. This process is called tuning the loop. Tuning the loop requires a well-balanced mix of experience, skill, technical knowledge, and nerve (especially when a ladle of molten steel is involved!)
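
For the curious, here's what one of those loops looks like in miniature. This is a generic textbook PID controller sketched in Python, not the actual steel-plate system I worked on; Kp, Ki and Kd are the knobs that get tuned, and the setpoint and measurements are made up:

    # A minimal discrete PID controller -- the kind of loop that gets "tuned".
    # Kp, Ki, Kd are the knobs; picking them well is the hard part.
    class PID:
        def __init__(self, kp, ki, kd, setpoint):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.setpoint = setpoint
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, measurement, dt):
            """Return the control output for one time step of length dt."""
            error = self.setpoint - measurement
            self.integral += error * dt
            derivative = (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    # Example: trying to hold plate thickness at 10.0 (units arbitrary)
    controller = PID(kp=1.2, ki=0.1, kd=0.05, setpoint=10.0)
    adjustment = controller.update(measurement=9.7, dt=1.0)
    print(adjustment)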

My experience tuning a loop was far more mundane. I had helped to repair a control system used to make very flat steel plate, and I figured it would just take a simple adjustment or two to get things working again. I tweaked, and tweaked, and things just didn't work at all. I thought perhaps the repairs weren't correct; it turns out I just didn't have enough experience tuning loops. I was amazed when a more experienced engineer did in a few minutes what would have taken me hours to figure out. Tuning the loop is not intuitive at all.

So, when Doc Searls said we need to test search engines to see how fast they pick up on "Buzz", it brought back memories of my failure to tune the loop. I agree with Doc that it would be very interesting to see just how quickly an idea gets picked up and made available for discussion. I worry that we might make the wrong adjustments based on that measurement.

If Doc says something, I'll find out the next day, search engine or not, because he's one of my daily reads. My collection of daily reads, along with the act of maintaining it, is a loop with only one consumer... me. A little bit of that output makes it here, and into various discussions I contribute to, but this loop is truly personal.

The sum of our manual systems for filtering the news has an infinite number of controls, but we all tweak our own outputs, and it seems to work just fine. It's called a Democracy, if it all works right.

When we choose to use a search engine, we're outsourcing our filtering mechanism. As a community, we're choosing to support the engines which seem best tuned for our needs. We've all contributed, by voting with our clicks, to the tuning of Google and all of the others. We just weren't too aware of the loop we're all in.

Search engines rely on a set of algorithms to separate the signal from the noise. Google's PageRank algorithm, for example, turned out to be quite helpful at emulating the network of trust we all build for ourselves, by importing it from the data implicit in the links of the World Wide Web.
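
The core idea of PageRank is public (it's in the original Brin and Page paper), and a toy version fits in a few lines of Python. This is just the textbook power iteration, nothing like Google's production code:

    # A toy PageRank by power iteration: link structure becomes a trust score.
    # links[p] is the list of pages that p links to.
    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / n for p in pages}
            for p, outlinks in links.items():
                if not outlinks:           # dangling page: spread its rank evenly
                    for q in pages:
                        new_rank[q] += damping * rank[p] / n
                else:
                    for q in outlinks:
                        new_rank[q] += damping * rank[p] / len(outlinks)
            rank = new_rank
        return rank

    print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))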

PageRank is one channel of information that gets used to decide who gets listed first. There are many others, and they are highly optimized and very well guarded. I'm sure that time is one of the elements that gets considered as well. Google's Director of Search Quality, Peter Norvig, seems to be the one person responsible for the final tuning of Google's loop.

So, you can see, it's all loops, within loops, within loops... and it's very deep, and I could go off on a thousand tangents... but I'll try to stay on track here... back to my point...

I'm worried that the "buzz" signal might get tuned too far in the wrong direction. It would be tempting to say that the fastest is the best... which seems to be what Doc is implying.

We need a reasonable speed of communication. It seems obvious to me that a blog-only search engine should be able to keep the response time down to a day or two, but it needs to do so for everyone, regardless of rank. I feel it's very important to make sure that everyone gets to participate in the conversation. When you use a search engine, you're looking for quality, not merely the first post (a la Slashdot).

I'm interested in participating in a high-quality discussion which produces positive results. While it's important to have good response time, I want to avoid getting lost in the noise of buzz.

We need to be careful what we wish for, and patient with the results; tuning is tricky stuff.

Wednesday, July 27, 2005

My 2 Cents about ID

Doc points to Passel as yet another possible solution to the big identity problem we all face. I reviewed it, and found it to be lacking for a few simple reasons.
  • It requires end users to have a web server with HTTPS available
  • It requires an agent mediating identity
Those two items are both deal-breakers in my book. I have no "secure" web servers available to me at work, or anywhere else, at present. I'm also unwilling to install yet another layer of code between myself and the internet.

I followed the links and found other projects, including OpenID, which gets closer to home for me, but still misses the mark. I found it lacking for one very simple reason:
  • It doesn't let the user authenticate outside the current browser session
We've all been taught not to trust any URL which we didn't provide ourselves. How can a reasonably paranoid end user trust that their browser isn't being spoofed with a redirected URL that looks just like their home identity server? They can't!

All in all, I'm picky, or just good at spotting problems, or a perfectionist, or all or none of the above... he said equivocally... ;-)


What do I think would make a good identity system? Here's my scenario (with a rough code sketch after the list):
  • Web site needs identity
  • User presents a URL pointing to their ID server
  • Web site requests validation of identity, while asserting its own identity
  • ID server returns "approval pending"
At this point anywhere from a millisecond to a year passes, depending on the whims of the user.
  • Independent of the above, the end user opens up a session with their ID server
  • the ID server shows the pending requests for validation, along with its own checks to make sure the requests are from valid sources
  • the end user approves the validations, and optionally makes choices about the disclosure of additional data (instead of filling out yet another registration page)
  • the ID server may provide a link to go to the site in question, or the user may return there as a continuation of the above session, depending on circumstances.
  • the end user reloads the page on the web site, causing it to re-request validation, this time getting the proper response, and any additional data about the user
  • assertion of ID has been validated, normal usage ensues
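Here's that flow as a minimal Python sketch, just to make the "approval pending" state concrete. All of the names are made up, and a real ID server would also have to verify the requesting site's identity and handle the extra disclosed data:

    # A minimal sketch of the request / pending / approve flow described above.
    PENDING, APPROVED = "approval pending", "approved"

    class IDServer:
        def __init__(self):
            self.requests = {}   # (site, user) -> status

        def request_validation(self, site, user):
            """Called by the web site; never approves immediately."""
            return self.requests.setdefault((site, user), PENDING)

        def pending_for(self, user):
            """Shown to the user in their own, separately opened session."""
            return [site for (site, u), status in self.requests.items()
                    if u == user and status == PENDING]

        def approve(self, site, user):
            """Only the user, in their own session, can approve a request."""
            self.requests[(site, user)] = APPROVED

    # The web site asks first; the user approves later, on their own schedule.
    ids = IDServer()
    print(ids.request_validation("news.example.com", "http://example.com/my-id"))  # approval pending
    ids.approve("news.example.com", "http://example.com/my-id")
    print(ids.request_validation("news.example.com", "http://example.com/my-id"))  # approved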
Why do I do it this way? Experience has shown that phishing is on the rise, and you just can't trust any parameters provided by others without some level of verification.

It's obvious to me that relying on the end user to initiate a separate sequence of events is a burden. The lowest-cost method available is to simply use the browser in a separate session, probably via a bookmark. Even this step might be optimized over time by being built into the browser, if some form of federated ID becomes the de facto standard.

The benefit of this wall of separation is that even the most paranoid (experienced) end user can have a fair degree of trust in the confidentiality of the interaction with their ID server. It is the trust and the low cost which I think make this the model that will win in the end.

Once we have federated ID, things get really cool, really quick... here's an example:

  • I read a cool item on Slashdot, which links to a story I want to read elsewhere
  • The Web server hosting the story requires registration or an ID
  • I give it my URL
  • It requests validation... and is told to try back
  • I go to my ID server, and see the results of the request... they want an address, phone, etc...
  • I decide to validate the request, and also to provide the other data.
  • I then switch back to the web site and reload, causing it to re-request the validation
  • It accepts the validation, and the extra data
  • I get what I want, the story in question
Now, there are subtle variations on this story. One that would make my ID server cool would be the ability to choose which set of addresses to provide:
  • Work
  • Home
  • Spam bait
I'd get to decide which on a case-by-case basis (a rough sketch below). The other way of doing this is to have more than one ID to present... which is a tradeoff for the end user.
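
That choice could be as simple as a persona map on the ID server. The persona names and fields here are purely illustrative, not a spec:

    # One way to model the per-request choice of which address set to disclose.
    personas = {
        "work":      {"name": "Mike Warot", "address": "123 Factory Rd", "phone": "555-0100"},
        "home":      {"name": "Mike Warot", "address": "456 Home Ave",   "phone": "555-0101"},
        "spam bait": {"name": "M. W.",      "address": "PO Box 0",       "phone": "none"},
    }

    def disclose(persona_name, requested_fields):
        """Return only the requested fields from the persona the user picked."""
        persona = personas[persona_name]
        return {field: persona[field] for field in requested_fields if field in persona}

    # The user picks "spam bait" for this particular site:
    print(disclose("spam bait", ["address", "phone"]))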

The cost and complexity of getting the ID servers up and running will be taken care of by us; we'll probably optimize it along these lines:
  • open source
  • multiple competing implementations (exploring the solution space)
  • organic addition of features
  • hard to set up
  • easy for the end user (if we're smart)
  • fading into the infrastructure over time
Well, that's the way I see this ball rolling. It looks like it'll be a fun ride.

Mike Warot - July 2005

Monday, July 18, 2005

Harry Potter Hype

So, I read today's Drudge hype, and am told that the author of Harry Potter made $36,000,000 in one day. Color me cynical, or even grumpy, but why is that a big deal? She probably spent a year or two writing it, and they employed fairly draconian measures to force the simultaneous worldwide release of the book, so of course it's going to have one big spike in sales. What we're going to see next is the very rapid, discounted clearance of the surplus.

Now, if Ms. Rowling could do this at a slightly faster tempo, say 3 books per day, then she could meet my standard of extraordinary: the income of Microsoft.

Somehow I don't think she's up to the hype, or the task.