Saturday, May 31, 2008

Dead D40... ugh!

My Nikon D40 just died... it has only taken about 29,000 photos since I bought it after Thanksgiving... this really sucks! I've contacted support, and will be sending it in on Monday.

The shutter is fubar... when you turn it on it says

! Error: Press shutter
release button again.


Oh well... time to dig out the Coolpix 8800 and get the batteries charged.

Friday, May 30, 2008

Flow based databases to fix twitter?

Twitter is a popular web based instant messenger service, which has been having problems with scaling lately. The facts seem to indicate that a traditional Relational DataBase Management System just isn't an appropriate fit in this case.
I believe that this is a perfect case for a new type of database, and perhaps even a completely new framework of programming. I don't have a good name for it, and the ideas are still vague in my head right now, but I'll try to outline what I'm thinking of below.

I would break Twitter up into a series of tables which get distributed and replicated among a cluster of servers. The tables would relate to each other, not under a strict atomic transaction model, but under one of eventual consistency. These tables would be:
  • Users
  • Queues
  • Subscriptions
  • Content
The real trick would be to treat the tables more like queues or pipes full of data that are appended to with very low random write frequency. The changes would then be aggregated in a channel to make it easy to keep multiple copies for coping with the heavy read access from all of the clients connected at any given time.

The bandwidth external to Twitter is pretty high, because you've got lots of people with many subscribers. The amount of actual non-duplicate data is surprisingly small... and I'm guessing that it's on the order of 3kbytes/second. The real challenge is distributing this 3kbytes in a consistent and reliable manner to all the places it gets copied out.

A flow based database would be able to handle such types of loads by maintaining many local copies and keeping them in eventual consistency by tying them into a channel. This is a place where multicast might be a really good strategy, if not a straight peer-to-peer network.

A flow database could be a straight up normal table in an RDBMS, or it could be something new optimized to the task.
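
Here's a rough Python sketch of the idea: a table treated as an append-only log, with local copies that catch up from it. The names (FlowTable, Replica, catch_up) are made up for this sketch, and everything lives in memory; a real version would push the changes over a channel (multicast or peer-to-peer) instead.

```python
class FlowTable:
    """An append-only log: rows are only ever added, never updated in place."""

    def __init__(self):
        self.log = []

    def append(self, row):
        self.log.append(row)
        return len(self.log) - 1  # sequence number of the new row


class Replica:
    """A local copy that becomes consistent once it has read the whole log."""

    def __init__(self, source):
        self.source = source
        self.rows = []

    def catch_up(self):
        # Copy only the rows this replica has not seen yet.
        missing = self.source.log[len(self.rows):]
        self.rows.extend(missing)
        return len(missing)


content = FlowTable()
replica_a = Replica(content)
replica_b = Replica(content)

content.append({"user": "mike", "text": "hello"})
replica_a.catch_up()              # replica_a is current, replica_b lags behind
content.append({"user": "dean", "text": "hi"})
replica_a.catch_up()
replica_b.catch_up()              # both replicas now hold the same rows
```

Because rows are never rewritten, a replica only needs to remember how far it has read, which is what makes keeping many copies cheap.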

What do you folks think?

Oh my G-d, Scoble killed Twitter...

So Robert Scoble is to blame for taking out Twitter, the all too popular instant messaging system, because he's just too gosh darned popular.


As if...


As I've stated before, the aggregate flow of all tweets is on the order of 10-20 messages per second, based on peeking at the message sequence numbers. It seems readily apparent that they've chosen the wrong architecture for this.
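
A back-of-the-envelope check of that rate, assuming every tweet is a full 140 bytes (the length limit at the time, one byte per character) and ignoring any per-message overhead. Both are simplifying assumptions, not measurements.

```python
MAX_TWEET_BYTES = 140          # assumption: every tweet is at the length limit


def stream_rate(messages_per_second):
    """Bytes per second of unique tweet text at a given message rate."""
    return messages_per_second * MAX_TWEET_BYTES


low = stream_rate(10)
high = stream_rate(20)
print(low, high)               # 1400 2800
```

So even the high end of the estimate stays under 3 kbytes/second of non-duplicate text.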


The tweets themselves should be aggregated with a sequence number, and user sequence number into a stream which should get copied to all of the boxes handling User Interface. Deleting a message would be handled by reposting it to the same queue with no data.


The subscription lists should be another stream.


The user database could be yet another stream.


All of those streams should aggregate out to about 10 kbytes/second. The process of splitting out the work to UI boxes is one of straightforward partitioning of the load, and maintaining a list of tweet sequence numbers for each person to see. The aggregated total of all three streams would sit on each UI box so it doesn't have to get any of it from across the net.
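
A minimal sketch of that arrangement in Python might look like this. The class, function, and field names are invented for illustration, the `following` set stands in for the subscription stream, and everything lives in memory rather than being copied between boxes.

```python
class TweetStream:
    """Append-only stream of tweets with global and per-user sequence numbers."""

    def __init__(self):
        self.entries = []        # index in this list is the global sequence number
        self.user_counts = {}

    def post(self, user, text):
        user_seq = self.user_counts.get(user, 0)
        self.user_counts[user] = user_seq + 1
        self.entries.append({"user": user, "user_seq": user_seq, "text": text})
        return len(self.entries) - 1

    def delete(self, user, user_seq):
        # Deleting is a repost of the same (user, user_seq) with no data.
        self.entries.append({"user": user, "user_seq": user_seq, "text": None})


def timeline(stream, following):
    """Global sequence numbers a reader should see, oldest first."""
    latest = {}
    for seq, entry in enumerate(stream.entries):
        key = (entry["user"], entry["user_seq"])
        if entry["text"] is None:
            latest.pop(key, None)         # the empty repost hides the earlier tweet
        elif entry["user"] in following:
            latest[key] = seq
    return sorted(latest.values())


stream = TweetStream()
stream.post("scoble", "first")        # global sequence number 0
stream.post("mike", "my D40 died")    # 1
stream.post("scoble", "second")       # 2
stream.delete("scoble", 0)            # 3, an empty repost of scoble's first tweet
print(timeline(stream, {"scoble"}))   # [2]
```

Each UI box can run `timeline` against its own local copy of the stream, so nothing has to be fetched across the net at read time.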


That's my basic idea for scaling twitter. Comments welcome.

Sunday, May 25, 2008

Tim, Noran, Virginia, Bear

Here's a great photo of Tim, Noran, Virginia and Bear. We took Virginia to the Lincoln Park Zoo in Chicago to celebrate her second birthday. Tim was nice enough to meet us there for the occasion. A fun day was had by all.

Bear got to see some of his cousins.

Greta still remains missing. She was last seen near Niagara Falls a few years ago.

Thursday, May 22, 2008

Remembrance

In Flanders fields the poppies blow
Between the crosses, row on row,
That mark our place; and in the sky
The larks, still bravely singing, fly
Scarce heard amid the guns below.

We are the dead. Short days ago
We lived, felt dawn, saw sunset glow,
Loved, and were loved, and now we lie
In Flanders fields.

Take up our quarrel with the foe:
To you from failing hands we throw
The torch; be yours to hold it high.
If ye break faith with us who die
We shall not sleep, though poppies grow
In Flanders fields.
— John McCrae



This is a day to remember those who came before us, and to reflect upon the legacy they left for us.

The time is now, always now, to decide what you can do to make the world a better legacy for our children.

Tuesday, May 20, 2008

Hezbollah has fiber?

Today's news of the unexpected is that Hezbollah, the opposition (terrorist?) group in Lebanon, has its own fiber network!

Wow!

Yet, I can't get fiber for a reasonable rate at work or at home in the world's only remaining superpower... hmmmmm.

Found via John Robb.

Sunday, May 18, 2008

ACL as punishment?

Over at Echovar, in the midst of a post summarizing the Internet Identity Workshop:
Chris Saad injected the data portability meme into the flow and suggested personal Access Control Lists, in the form of a “Sharing OK/Not OK” check box on data you give to individuals or companies. It would be interesting to watch Robert Scoble manually configure a complex ACL on his 20,000+ friends (Scoble rushes in where Angels fear to tread).

It would truly be torture to force a person to manually configure an ACL for 20,000 people: one wrong move, and you've lunched everything. It doesn't have to be that way.

Giving away capabilities on the other hand would be a much easier thing. You have the host environment generate a capabilities token for the piece you wish to delegate access to, then send it through email, or on a web page, or whatever the end user's security policy specifies is the right thing to do.

It would make far more sense to have a system that lets users delegate capabilities to any given part of their information, blog posts, photos, etc. The fact that you start with a model of least privilege means that you start with the most you're willing to give away, and pare down from there. You don't have to worry about giving away the store by mistake.

Yes, Access Control Lists would be punishment, but being able to give away little bits, without fear, is a quite liberating alternative.

I look forward to the future.

Saturday, May 17, 2008

QWERTY Considered harmful?

An interesting observation from Daniel Berger:
Larry Wall’s first rule of computer language design is, “Everybody wants the colon”. Maybe the problem is that we just don’t have enough symbols on our darned keyboards. The result is that we’re left fighting over the scraps that QWERTY gives us, e.g. the colon. My opinion is that a limited number of usable characters limits our thinking and our expressiveness. (emphasis mine)

In my recent quest to push forward awareness of capabilities, the notion of expressiveness seems to be at the crux of everything. If you don't have a conventional way to express something, it takes a lot of work to come up with something to get your point across.

I believe that rich source code is overdue. The idea first came to me via Chuck Moore's ColorForth, but I think it could be applied in a wider array of places. The ability to simply highlight a section and make it a comment without worrying about syntax would be cool, but I'm sure there are far more powerful uses that would quickly arise, such as the ability to do literate programming, freely mixing source and documentation and content.

The arguments against any new programming technique usually tend towards the fact that pretty much any language can already express any program. These arguments always miss the expressiveness that a new language brings to making it easier to solve a certain class of problem.

Friday, May 16, 2008

Originality

Quote of the day:

Originality is overrated. Clarity, especially for those of us who have trouble achieving it, is also appreciated.
That was in response to Megan McArdle's concern that her post might not be original enough. It was, and I learned a few things. I liked the CS Lewis quote in the middle.

Thursday, May 15, 2008

Almost useful capabilities demo - 0.012

So, I've done some more programming, and I'm now up to version 0.012 of my capabilities demo.

The main page at http://127.0.0.1:81 is now the user page, with the protected content. You have to have a capabilities token to edit the data.

The administration page is at http://127.0.0.1:81/admin, which allows you to create and revoke capabilities, and see the current "protected content".

It's all implemented in python, in a single file, just to make it easy to demo.

The slow road to implementation

There are a lot more choices to make, and details to manage, in the process of programming a web server than I would have expected. I've made a lot of decisions, trying hard not to worry too much about it, to avoid analysis paralysis.

http://warot.com/python/
contains my recent python programs. I have to name them with .txt on the end or the web server tries to run them (and fails).

So far I've managed to get up to webserver008.py, which manages to create random numbers and keep a list of them available. At the rate I'm going, I'll have something usable in a few months, which is better than never. 8)

Should you choose to actually download and run the thing... here's what it does.

In a DOS box (or your command line equivalent)

Welcome to Mike Warot's capability based security demo web server, version 0.008

You can access it at http://127.0.0.1:81
Use control-c to tell it to shut down, which may take up to 10 seconds
started httpserver...

If you then open http://127.0.0.1:81 in a web browser, you'll get a very informative message like this:

this is the default content, not served from a file
Here are the valid tokens:

Now... for the completely undocumented and poorly written section of code... change the URL to http://127.0.0.1:81/token, and you'll get something like this:

0451a66530b72a980725745c39992239


Isn't that lovely? If you then go back to the home page at http://127.0.0.1:81 and refresh the page, now you'll see:

this is the default content, not served from a file
Here are the valid tokens:
0451a66530b72a980725745c39992239 [Revoke]

That's a list of all the tokens, with the ability to revoke one of them. That's pretty much the full extent of the power of this demo.

It's a list, with an undocumented, poorly designed and inconsistent UI... but it's a step in the right direction. Oh... and it's licensed with the GPL so you can fork the project. ;-)

I hope to get a reasonable list view with the ability to issue tokens without having to mung the URL in the next day. Code to actually give out capabilities to edit a resource should be next week.

It's slow going, isn't it?

--Mike--

Computer as suspicious package

It occurs to me that almost none of us could answer the question "did you pack it yourself" in the affirmative if we're using a PC, especially not if we bought it from a vendor who favors crapware.

I haven't actually had total control over the contents of a computer since I built a little box back in the 1980's that watched for a ring signal on a phone line, flipped the relay to pick up the line, used a 4 channel 8 bit A/D converter to sample 4 incoming voltages, then used a speech chip to speak the given voltages (in almost recognizable English) to the caller, twice, then hung up.

I wrote the code, programmed the 2764 EPROM, and it was totally under my control. I packed that piece of hardware... but since then... no way.

If you get a new PC from a good source, you can reasonably trust that the BIOS isn't going to be subversive. Once you load an OS, you've definitely had someone else doing your packing.

If it came loaded (or used)... there's really no way you can truly trust it, you just have to assume it's all going to be ok. Most of the time, it works out that way, or if it is a zombie on a botnet, you don't even know it, which is almost as good for most people.

It's a strange thought... but one I think might provoke some discussion.

Tuesday, May 13, 2008

Only Communists complain about twitter??

Why Cliff Gerrish thinks that wanting to fix twitter is the same as communism is beyond me. I'm not part of the Gillmor gang, and I'm annoyed that twitter is broken quite a bit. Does that make me a communist too?

Twitter breaks, a lot... it's broken now, giving me time to write this. It's ok to complain about a broken service. Twitter is a good service, when it works, but it's too valuable to leave to the winds of chance. Thus... replacing twitter with something more reliable is a natural itch.

I guesstimate that the aggregate flow through twitter is somewhere around 3kbytes/second when it's at full bore. It can be replaced with a set of machines, with normal code, and normal network hardware. There's no super hardware or non-obvious patentable code buried in it... anyone with enough programming skills, hardware and time could do it.

But... even hinting that we might do this sends Cliff into a 1950's McCarthy era rant about communism... it's just.... odd.

Being able to trap keywords and subscribe to them from the overall stream still only has to contend with 3kbytes/second. Again... normal hardware, normal networks, just a bit of distributed software to make it all work.
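
To show how little work the keyword trap is at that rate, here's a small Python sketch. The function name and the shape of the subscription table are assumptions for the example, and it only matches whole words separated by whitespace.

```python
def matching_subscribers(text, keyword_subscriptions):
    """Return the subscribers whose keyword appears in a tweet's text."""
    words = set(text.lower().split())
    found = set()
    for keyword, subscribers in keyword_subscriptions.items():
        if keyword in words:
            found.update(subscribers)
    return sorted(found)


subscriptions = {"nikon": {"mike"}, "twitter": {"mike", "dean"}}
print(matching_subscribers("Twitter is down again", subscriptions))
# ['dean', 'mike']
```

At 10-20 messages per second, running every tweet through a lookup like this is well within reach of a single ordinary machine.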

Even more stuff to choose... ugh!

The last time I did full time programming, I was using Turbo Pascal 7.0 on a DOS platform. The notion of being able to have multiple users trying to use our little home grown inspection software was just starting to enter reality. I then did a bunch of other stuff.... now programming is coming back into play because of the capabilities demos I want to do.

I'm like a newbie all over again... I've kept up a little bit on the buzzwords, etc... but haven't had to actually implement anything from scratch in more than 12 years. I figured surely in the meanwhile all of this stuff would be sorted out, and there would be a nice standard way to have programs talk to each other across the internet.

So now I know what all of those buzzwords like SOAP, XML-RPC, REST, WSDL and the rest mean... nobody has a nice simple way to do things...

I was hoping to do a nice simple demo of a RESTful capabilities system using Python as a simple standalone app that anyone could just put on their PC (or server). It turns out that there are several things in the way. Here are some of the things I've learned.

  • Cryptographic random number routines aren't included in python. (Dean Landolt suggests punting the issue and getting on with it... and I agree for the demo)
  • The library that would do it requires me to be able to re-compile python (using Visual Studio 2003)
  • REST isn't... the common example of Flickr as a RESTful API isn't.
  • WSDL is for people who like to write specification specifications, and don't write code.
  • REST is the choice, except that web browsers don't actually PUT or DELETE, and a lot of people use GET for things with side-effects.
  • There are a lot of python web toolkits out there, including CherryPy, TurboGears, Web.py, Django, and others.

In spite of all that, here are my design choices to date:

Programming language: Python, because it's cross platform, a known entity, and quite powerful, despite the immutable strings, and comes with a web server library.
Database: None - it's a demo
Random Salt: the built in non-secure RNG from python
Protocol: REST-ish... GET for reading (idempotent operations only), POST for everything else. REST because there should only be one URL per object, regardless of the compromise about PUT/DELETE.


The demo will be of the ability to edit a string. You'll be able to see the string with a straight web page. You'll be able to request a token to edit the string, you'll be able to write the string (provided you have the token) and you'll be able to revoke the token.
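
The four operations could be sketched roughly as below. This is not the code from the demo server; it's a hypothetical, in-memory outline with GET used only for reading and POST for everything else. The paths (/, /token, /edit, /revoke) and the `handle` function are assumptions made for this example, and it uses the non-secure built-in RNG as per the design choices above.

```python
import random

state = {"content": "hello world", "tokens": set()}


def handle(method, path, params=None):
    """Dispatch one request; returns (status code, response body)."""
    params = params or {}
    if method == "GET":
        # GET only ever reads, so it is safe to repeat.
        if path == "/":
            return 200, state["content"]
        return 404, "not found"
    if method != "POST":
        return 405, "method not allowed"
    if path == "/token":
        # Non-secure RNG, fine for a demo but not for real use.
        token = "%032x" % random.getrandbits(128)
        state["tokens"].add(token)
        return 200, token
    if path == "/edit":
        if params.get("token") not in state["tokens"]:
            return 403, "invalid or revoked token"
        state["content"] = params.get("value", "")
        return 200, state["content"]
    if path == "/revoke":
        state["tokens"].discard(params.get("token"))
        return 200, "revoked"
    return 404, "not found"


status, token = handle("POST", "/token")
handle("POST", "/edit", {"token": token, "value": "edited"})
handle("POST", "/revoke", {"token": token})
```

After the revoke, another edit with the same token comes back as a 403, while a plain GET of / still shows the string to anyone.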

I'm hoping that's simple enough for me to get done on a few train trips to/from work.

Monday, May 12, 2008

Random numbers are hard... whoda thunk it?

It turns out the hard part of doing capabilities on the Internet is the lack of a suitable random number generator... which kinda blindsided me. I'm trying to find an implementation of the ISAAC random number generator that would work in either Python, or active server pages, and haven't been able to find one. It's critical to give out unforgeable tokens, and a cryptographically secure random number generator is the way to go. You can't even think of using the built in random generator, because it's too easy for a determined attacker to guess the next output after a short run of samples.

So, eventually I'll find what I want (or be forced to port it myself)... and then I can get back to the examples... which will generate a token consisting of the object, the capability, and a random number to serve as salt to keep from having it forged.
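
For what it's worth, here's a sketch of that kind of token written against present-day Python, whose standard library has a `secrets` module (Python 3.6 and later) that draws on the operating system's cryptographic random source. That's an assumption about the reader's Python, not something the demo used; the token layout and the `make_token` and `is_valid` names are also just illustrative.

```python
import secrets


def make_token(obj, capability):
    """Build a token naming the object and the capability, plus random salt."""
    salt = secrets.token_hex(16)       # 128 random bits as 32 hex characters
    return "%s:%s:%s" % (obj, capability, salt)


issued = set()                         # the server remembers what it handed out

token = make_token("content.txt", "edit")
issued.add(token)


def is_valid(candidate):
    # A forged token fails because its salt will not match an issued one.
    return candidate in issued
```

The object and capability parts are readable by anyone; it's only the salt that makes the token hard to guess.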

Wish me luck.

Saturday, May 10, 2008

Twitter - Capabilities mashup

The idea of a distributed replacement for Twitter is floating around... and I've been writing about capabilities recently... what would we get if you merged the ideas?

#1. - Get rid of user accounts on twitter... just hand out the capability to post, which would be different each time it's issued, and individually revocable. I'd hand them out in an email, to limit the user base a bit and cut down on spam. You could always store the email address somewhere in a table along with the capability to know who it is if necessary.

#2. - Allow each user to then hand out tokens that would allow a direct message, which they could proxy and/or revoke themselves. This would make it possible for an end user to block someone from making direct messages, without the need for it to happen in the central code. The proxy that does this could be a separate service, and doesn't play a part in the security of the central capability provider code.

#3. - Allow each user to hand out tokens that would allow following them, which like above, they could proxy and revoke themselves. This turns the distributed twitter into an effectively private email system without too much work.

#4. - There's really not much difference between a tweet and a blog post, other than length. There's not much difference between a private posting and an email... you could cover all of them this way.

Ok... it's 10:30 and I'm sleep deprived, so this might not be as coherent as it seems at the time... though I hope it is.

Capabilities offer a huge amount of flexibility when doing system design. They make it possible to break apart the logic of a complex application without having to worry about the combinatorial explosion that results from the conventional idea of having every piece of code enforcing a ton of rules.

What do you all think?

--Mike--

Teaching and learning, the circle of life

I'm glad to see that Dean has made the conceptual leap to understanding capability based security. It's a tricky subject to explain, and I was starting to worry he'd get discouraged... but he's made it... we've got another convert. 8)

Now, the thing to do is to take him at his word, and see exactly what helped him to see the value, and to make it easier to get to there from a world steeped in the Dogma of ACL uber Alles.

The key distinction he makes is that a capability is more than a token. He then presents cases of issuing new capabilities based on old ones, always with less authority than the original. This is a very powerful lever... once you grok it, you'll never forget it.
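
Here's a small Python sketch of that lever: a new capability derived from an old one can never carry more authority than its parent. The class and method names are invented for this example, and the rights are just strings on a single imaginary resource. It uses the `secrets` module from current Python (3.6 and later) for the token values.

```python
import secrets


class Capabilities:
    """Tracks tokens and the rights each one carries on a single resource."""

    def __init__(self):
        self.rights = {}               # token -> frozenset of rights

    def issue(self, rights):
        token = secrets.token_hex(16)
        self.rights[token] = frozenset(rights)
        return token

    def attenuate(self, parent_token, rights):
        """Issue a new token from an old one; it can never hold more authority."""
        parent = self.rights[parent_token]      # KeyError if the parent is unknown
        wanted = frozenset(rights)
        if not wanted <= parent:
            raise PermissionError("cannot grant rights the parent lacks")
        return self.issue(wanted)

    def allows(self, token, right):
        return right in self.rights.get(token, frozenset())


caps = Capabilities()
owner = caps.issue({"read", "write", "revoke"})
editor = caps.attenuate(owner, {"read", "write"})
reader = caps.attenuate(editor, {"read"})
```

Whoever holds `reader` can read, and can hand out further read-only tokens, but has no way to get back to write access.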

I've got to spend some of my very limited free time to get some actual capabilities samples up on the net, however that may happen. I've got some knowledge of Delphi, Python, and ASP... one of those should suffice to get something that can issue capabilities and let anyone store a few bits on a server somewhere.

I don't think it's really important to get huge examples working, just enough to squeak by and help others by making the cognitive leap smaller.

Thanks Dean!

You've helped renew my faith that blogging is an effective way to make things better.

Home made transistors?

I'm inspired by the guy who is making his own film with this apparatus, and this person who makes his own triode vacuum tubes, to wonder, just how hard is it to make a transistor?

I know that there are a lot of choices out there, and silicon and germanium are the most common, but I'm just looking for something that can amplify an audio signal, or switch on and off with a beta of at least 10, and a cutoff frequency of at least 10 kHz.

What are the choices if you back wayyyyy off from the state of the art? I know copper oxide, copper sulfate, and lead sulfide are all semiconductors, what other choices are there? Surely chemistry has come a long way, and this can be done in a home basement.

Questions and comments are welcome.

--Mike--

Winners never quit, nor does Nader... uh Hillary

Ellen R. Malcolm says that "Winners never quit", which makes a great sound bite, but is irrelevant to the question at hand. Which person will do the best job of leading our country in the next 4 years?

If you consider the normal social behavior of following the rules (that you agreed to in writing), Hillary is out of the race, and we have two choices left, McCain or Obama.

Hillary isn't content to play by the rules; she wants to push Obama out of the race, at all costs, even though she can't win, because she's not the Democratic Party's choice.

However, Hillary is now so focused on power she's willing to do anything, even sabotage her own party (effectively campaigning for McCain). She's been following this track of desperation long enough that I've been referring to the McCain/Clinton 08 ticket for some time now.

While I would certainly welcome an Obama/Paul ticket (the only 2 sane candidates who ever had a chance in the first place)... that's not in the cards.

Hillary is now playing the role of Ralph Nader, and spoiling the election for the Democrats. So, we'll see Hillary play some role in making sure the Bush-Clinton dynasty gets handed off to its chosen successor, with a possible re-run by Hillary in 2012.

Don't you think 28 years of Bush-Clinton is enough?

It's time to give the people a chance to run things.

--Mike--

Update: John Aravosis says it even better (with less rant, and more logic).

Wednesday, May 07, 2008

Capabilities, Internet Style - Part 3

We all stand on the shoulders of giants. In this case I'm relearning the lessons of the folks who wrote KeyKOS, CapROS, and EROS, which are capability based operating systems.

Dean Landolt has been giving pretty good feedback, and it's a good discussion going now. (It's fun!)

I said that a capability is stand alone, and gave an example of sending a capability in an email. Dean thought about it, and is discouraged by the implications he imagines when you apply it to a compound document:

“Of course, this completely blows up the easy send-me-an-email capability described above. I haven’t worked through all the use cases in my head, but my guess is there room for both. But one thing I don’t want to do is reimplement the cascading nightmare that is administering a windows file share. Creating a system simple and clear enough for the average user to fully understand the implications of their actions is paramount.”

Now, if we were talking about ordinary Access Control Lists, yes it would be a nightmare, but we’ve got our shiny new “magic bullet” capabilities, so it’ll be a piece of cake. Trust me, and sign this purchase order. ha... just kidding.... ;-)

Use Cases are where to start to find out the implications of your models of things, and begin to flesh out details when you implement code. They provide the differentials to guide programmers when they reach forks in the road that could go either way. Dean provides 2 of them, sharing a bucket of blog entries, and editing portions of a compound document.

In either case, you want to take a capability and build finer-grain capabilities on top of it. Doing this with a file based ACL is impossible, because files are treated as atomic entries, and there’s no way to protect part of a file. Nobody ever thought it would be necessary to do so, and there’s no way to express the concept in an Access Control List.

Capabilities allow for arbitrary expression of rights to an extent limited by the capabilities of the programmer who implements them. Consider this thought experiment as an exploration of the expressivities of capabilities.

Let’s say I send an email something like this:

Hi Dean,

I got it working, I think... here’s version 0.001 of the server. I’ve got it up and running at http://warot.com/cap01

It uses a server at http://warot.com/cap01/cap01.asp, which is a simple ASP script I wrote to get things off the ground. It stores a single file in “content.txt” in the same folder. (so you can view it directly)

It’s a form (to start with)... it’ll take a token, and let you edit the file represented by the token (assuming it hasn’t been revoked)

What do you think?

--Mike--

Oh... yeah... here’s the token: [42]

Now, this really doesn’t do anything new, does it? No. Consider the next email:

Hi Dean,

Here’s version 0.001 of my first capabilities proxy... I’ve got it running at http://warot.com/proxy01. It uses a token you provide to access the file store capability. It then lets you read/write, but only stores in UPPER CASE. Pretty weird, eh?

--Mike--

Now, this is a trivial example of vaporware in action, but it does show something important. You can filter something by writing code ON TOP OF existing capabilities, without having to give away the farm. The proxy in this case can only access the capability provided to it, so it CAN’T do anything outside the capabilities it’s provided. It can then provide a new service, a file store that doesn’t allow lower case letters.
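
To make the vaporware slightly less vaporous, here's a Python sketch of the same idea. Both classes are hypothetical stand-ins written for this example (the real thing would be two services talking over HTTP); the point is that the proxy holds one token and nothing else.

```python
class FileStore:
    """Stands in for the file store; knows exactly one valid token."""

    def __init__(self, token):
        self._token = token
        self._content = ""

    def read(self, token):
        self._check(token)
        return self._content

    def write(self, token, text):
        self._check(token)
        self._content = text

    def _check(self, token):
        if token != self._token:
            raise PermissionError("bad token")


class UpperCaseProxy:
    """Built on top of a store capability; it can reach nothing else."""

    def __init__(self, store, token):
        self._store = store
        self._token = token            # the only authority this proxy holds

    def read(self):
        return self._store.read(self._token)

    def write(self, text):
        self._store.write(self._token, text.upper())


store = FileStore(token="42")
proxy = UpperCaseProxy(store, "42")
proxy.write("Pretty weird, eh?")
print(proxy.read())                    # PRETTY WEIRD, EH?
```

Anyone using the proxy gets the upper-case-only store, and never sees the underlying token.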

I’ve just expressed a new thing... in a secure manner. Because you’re not at any point handing over a username/password pair, you don’t have to give away the keys to the kingdom. You only have to provide the keys to an object, or arbitrary portion thereof.

You can write proxies that you don’t have to trust beyond the capability you provide. Those proxies can then provide tokens with newly limited portions of the capability they possess.

Dean, you mention the nightmare of Windows administration (which is pretty much equivalent to any other ACL administration scenario), in this case the users themselves handle the distribution of a capability, and they CAN'T go outside of the original capability in scope or potential.

Instead of hoping that each and every piece of code makes all the correct security checks every time (leading to a combinatorial explosion when it comes time to test), you can start with the assumption of a capability, then test each level of delegation to make sure it works as expected. You only have to test one level at a time. This turns a nightmare into a tractable problem.

I hope this makes sense.

--Mike--

Here are some references that I've been chewing on:

http://c2.com/cgi/wiki?CapabilitySecurityDiscussion - intelligent discussion at the Portland Pattern Repository

http://www.cap-lore.com/CapTheory/ - a new site that I'm still plowing through, very helpful so far.



Tuesday, May 06, 2008

Capabilities, Internet Style - Part 2

I think the primary feature that capabilities have to offer to an internet environment is the isolation of intent from authentication.

Now... that's pretty obtuse, and I might have even said it wrong... I'll expound on it for a bit.


I want to be able to generate a token that gives access to a resource on the internet. I want to be able to do it in a way that only requires holding the token, with no other authentication necessary. I want to be able to issue multiple tokens to access the same resource. I want to be able to revoke a token without ambiguity. (I don't care about copies of the resource, that's a branch into the murky world of DRM)

I think the simplest way to do this is to write a proxy server that has the local authority to access a given resource, and to allow it to maintain the database of tokens, and to mediate access to the resource. I hope that this could eventually be folded into the operating systems, or even the kernel of Linux at some future point.

For now, a proxy, no matter how inefficient, will suffice to demonstrate principles and help popularize capabilities as a better alternative to handing over authentication information to code you can't trust.

For now, the proxy has to allow a local user to generate access tokens, manage an access control list, and enforce it. I think that something that works locally and can be accessed via HTTP is the way to go.
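
As a starting point, the core of such a proxy might look something like this in Python. This is only a sketch under assumptions: the `ResourceProxy` name and its methods are invented here, the resource is a string in memory rather than a file behind HTTP, and the tokens come from the `secrets` module in current Python (3.6 and later).

```python
import secrets


class ResourceProxy:
    """Holds local authority over one resource and mediates all access to it."""

    def __init__(self, content):
        self._content = content
        self._tokens = {}              # token -> note about who it was issued to

    def issue(self, note):
        token = secrets.token_hex(16)
        self._tokens[token] = note
        return token

    def revoke(self, token):
        # Revoking one token leaves every other token untouched.
        self._tokens.pop(token, None)

    def read(self, token):
        # Holding a valid token is the only check; no username or password.
        if token not in self._tokens:
            raise PermissionError("unknown or revoked token")
        return self._content


proxy = ResourceProxy("contents of the shared file")
for_dean = proxy.issue("emailed to Dean")
for_tim = proxy.issue("emailed to Tim")
proxy.revoke(for_dean)
```

Since every token is separate, revoking Dean's copy is unambiguous and Tim's keeps working.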

I'll start working on a prototype... probably in Python, to help get this ball off the ground.

I'm interested in collaboration in all aspects of this project.


So... from a user perspective, you don't get much. You already have full access to your stuff. You get a toy which hopefully can allow you to sandbox access to a file and give it away, without your username or password being involved. (Unless of course the code in the proxy is bad, and goes all confused deputy on you)

Being able to give away access without sharing usernames or passwords helps make your internet node more valuable, because you can innovate once again. Heck, you might even get to the point where Metcalfe's law starts to apply again and get some real value going.



I hope that wasn't too far out for everyone.

What do you all think?

--Mike--

Why I voted for Obama today.

Well, I've voted for Obama... and now it's time to explain why a bit more.

The Clintons believe in Triangulation as a way to win. The idea is that there is a spectrum of opinion about any given subject, and the best way to make a deal is to stake out a position the proper distance from the extremes, and get to a happy medium.

Triangulation is worship of a false god. It's profane, and profoundly misguided.

People have opinions on a wide variety of subjects. There's no logical consistency to it, let alone any kind of continuous spectrum to choose from. There is no "liberal" mindset. It exists only in the pigeonhole that people try to push us into.

From this, it's no small stretch to assume that there is no single rational model of the world. We don't all think about things the same way. We're all different (but I'm not!)

Instead of worrying about the right thing to do, the triangulation cultists worry about where they are relative to the mythical "mainstream" and try to maintain the strategic position relative to it.

If we all believed that slavery was a good thing, Hillary would be for cutting taxes on the chains because the price of steel from China went up... she'd propose a slave chain tax holiday. The issue of slavery wouldn't be part of the picture.

Think I'm being absurd? Well, we're all slaves to our cars, and Hillary just proposed a tax holiday on the fuel for our cars. She doesn't even consider that perhaps there's a bigger issue to be resolved here... the "non-negotiable American Way of Life"

We need leaders who don't just triangulate. We need to move away from the myopic vision of politics as usual, and to step back and look at the big picture, and at least try to do the right thing, for a change.

--Mike--

Capabilities, Internet Style - Part 1

Dean is looking for a project for his Social Computing class. He's thinking about some form of Social File System. In response to a post about capabilities, he wrote:

So how would you suggest scaling capabilities to the internet? Everything I've always read about capability-based security alludes to persisting and passing file handles, but what does this look like on the web?

I have a Flickr account, with about 3000 photos now on line. If I want to publish photos, Flickr is the way to go... it's great for broadcasting. The thing Flickr lacks (by design) is any capability to delegate access to any portion of my account. There's no way I could create a bucket for someone to add photos to. Each photo is tied to an owner, and there's no way to delegate access.

So, from a capabilities view, a Flickr account is atomic... you either have read/write access, or you don't. There's no granularity to it at all. If I wanted to share my Flickr account, I'd have to give my password to someone to do it.

Amazon S3 works in a more favorable way, from the standpoint of granularity. It treats each object as a separate entity, with its own access control list. These objects live inside of buckets, and each Amazon user can have up to 100 buckets. This makes it easier to set the default permissions for objects, and segregate capabilities. Amazon thus supports delegating access, all the way down to the individual object.

From a capabilities perspective, it would seem that Amazon S3 is the way to go. It's certainly much better than the all/nothing approach of Flickr. However, you still have some significant restrictions.

S3 objects can be shared with other specific S3 users, ALL S3 users, and the world. There's no way to hand off an actual capability to someone without requiring them to have an S3 account.
Amazon wasn't thinking of how to optimize their service for capabilities when they designed it.

The next logical step would be to figure out how to extend Access Control to a distributed system of identity. This has to be an important feature in any Social File System. Implementing an Access Control List which allows both OpenID and Microsoft LiveID to be used to authenticate would be a good first step.

Now for the last step, the one that is subtle and very powerful. So far, we're still dealing with Usernames and Passwords. We need to take the last step, and get away from usernames and passwords. A pure capability isn't tied to any username or password. It grants specific access to an object (or set of objects).

Usernames and passwords work well for real live people. They should NEVER be given to code you can't trust. Capabilities offer a path which allows for the separation of intent away from authentication.

Once authentication is out of the picture, then you can hand off a capability to a program, and it can't do anything outside of that capability, because it doesn't have any user names or passwords to give away. You don't have to trust that it won't send your bank password to China, because it can't.

So, how would it work? At the lowest level, I'd start with the same basic file systems we all know and love. I'd extend the data structure for the access control list to allow the creation of tokens. Each token would be a large random number, along with access rights, just like those for any other user. There would need to be a new API for generating these tokens, along with whatever tweaks would be required to integrate the extensions into the file system code.
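Here is a rough sketch of what that token extension might look like, in Python rather than real file system code. Everything in it is made up for illustration: the FileObject class, the generate_capability, check_access and revoke_capability functions, and the idea of storing tokens in the same table as user names are all assumptions, not an existing API.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FileObject:
    """Hypothetical stand-in for a file or folder with an access control list."""
    path: str
    # Maps a principal (a user name OR a capability token) to its access rights.
    acl: dict = field(default_factory=dict)


def generate_capability(obj, rights):
    """Create a token granting only `rights` on `obj` and record it in the ACL."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, encoded as text
    obj.acl[token] = frozenset(rights)
    return token


def check_access(obj, principal, right):
    """Return True if the user name or token holds `right` on `obj`."""
    return right in obj.acl.get(principal, frozenset())


def revoke_capability(obj, token):
    """Remove a token from the ACL; the owner's own entry is untouched."""
    obj.acl.pop(token, None)


# The owner keeps full rights under a user name, and hands out a
# write-only token for one folder.
inbox = FileObject("/photos/inbox")
inbox.acl["mike"] = frozenset({"read", "write"})
cap = generate_capability(inbox, {"write"})
```

A program holding only cap can add to that one folder and nothing else. It never sees a username or password, so it has nothing to leak, and the token can be revoked without changing anyone's password.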

Up a level, imagine being able to right-click on a file or folder somewhere, choose "generate capability", and get a dialog box from which the resulting long text string could be copied and pasted into a program.

Up a level more, you could drag a file (or folder) onto a web page, and the default action would be to allow write access to that folder (subject to the default policies you set in place).

You could even drag/drop that folder into an email, to allow the capability to be sent to a friend.

There's more here... thus this is part 1.

--Mike--

Sunday, May 04, 2008

Why Silos work

I think we need to have a conversation about Silos, specifically about why they work. There is another level of depth to the silo analogy that Doc Searls uses, which I present here.

Contemporary farming uses a process known as ensilage to preserve crops for feed during the off season. The process slows the otherwise rapid decay of plant material by limiting the intrusion of oxygen and controlling unfavorable reactions. It requires fixed infrastructure (silos) and a set of skilled workers to prevent unfavorable results.

The current version of the Internet relies on a similar process. Companies such as Google, Yahoo, Twitter, Flickr and others provide server farms to store content for distribution to users. They also provide the set of skilled workers to prevent unfavorable results.

At first glance, the analogy seems to be a fairly simple one... a place where things are stored, and kept out of the weather. However the analogy has a lot more depth than even Doc Searls might have imagined. The mechanism of preserving something and transforming it by adding value also works to give the metaphor more depth.

Taking something as ephemeral as a grass crop and storing it for 6 months is a remarkable achievement if you can do it on a sustained basis. The same can be said for taking the daily diaries of the general public, and keeping them online for years. It takes a persistent effort by a skilled set of workers in both cases to keep conditions optimal.

The farm worker is trying to prevent decomposition, control pests, and maintain sweet silage. The internet worker is trying to thwart hackers, spam, zombies, and any number of other pests, while working with fundamentally unreliable hardware and internet connectivity.

Flickr is a typical internet silo. The input is millions of photos (and now videos) from users throughout the world, along with some of their time and attention and a dash of identity. The value added is that of hosting the photos, transforming them automatically into thumbnails and a number of other sizes. Simply storing everyone's snapshots is nice, but there is far more value added than is immediately obvious.

When you link to a Flickr picture, everyone knows it's safe because it is really a photo, and not some malware waiting to trick the JPEG processing library in your browser. They also know it's not likely to be offensive, because of the filtering done on the images to conform with social standards. A further sense of safety is implied because the identity of the photographer is coupled to the photos, which also allows safe conversations back through the built-in mail system.

Flickr works mostly because it was there first, has good network scale, and provides a great deal of safety. It's hard to replicate these things by accident, so it's important to have done your homework and looked deep into the real value proposition of the existing silos.

Twitter works because it makes IM safe and easy (when it works). A distributed Twitter system has to replicate not only the message passing and filtering part of Twitter, but also its core social value... the ability to ban users and moderate content.

There's a lot more to a good silo than servers and bandwidth. It takes a skilled team to keep the technology working, and a different skilled team to keep the social networking going as well.

It will be interesting to see what kinds of distributed silage systems get built in the next 10 years. I'm willing to help anyone who wants to start one.

--Mike--