
Sunday, October 11, 2020

ImgBurn to the rescue... twice!

I've been on a retro computing kick lately. I hit a few walls because I couldn't figure out how to get files into the VirtualBox machines (the OSs involved were so old that none of the usual file or network sharing options would work). Then it dawned on me that I could build .ISO files and insert them into the virtual CD-ROM drive. ImgBurn did the job under Windows 10.

Here is Forth/2, which I wrote and Brian Mathewson documented back in 1992-1994, running under OS/2 4.50:



Next up we have STOICAL, a next-generation FORTH system with some promising features I'm interested in exploring. I had to get Debian 3 up and running to install it via dpkg.

Jonathan Sachs initially wrote STOIC for the Data General NOVA machine, then moved on to write Lotus 1-2-3.
Jonathan Liles wrote STOICAL to run on Linux, then moved on to write NON.

Here is STOICAL running under Debian 3:





Friday, October 02, 2015

CapabilityPipes v0.001 - A very rough draft of an incredibly powerful idea

This is a raw dump of an idea that came to me at 4AM... I hope it's coherent enough to catch on... I will of course keep refining it.

This is v0.001 of the idea

++ Capability Pipes  

Unix/Linux is a set of tools that work together to let you pipe the output of one program into another, and the resulting plumbing lets you do very powerful things. We need a similar set of tools for the capability security model. This would give you complete control over your applications, your network usage, and everything your computer does on your behalf, in a rational and expandable manner.

Instead of trusting applications to do everything, why not use the pipe/API model to limit their connections to the world, so that you can tightly restrict the side effects of everything, as needed?

Give the user a traditional view of the world, just like the Linux they have now, but instead of trusting applications blindly, force them all to use capability pipes (like file handles) to do all their I/O.

Of course, you could always default things to the current look and feel of a typical Linux desktop, to make transitioning easy for users.

It is impossible to overstate the amount of power this would put back into the hands of users.
 
Examples, use cases:

  A mute filter to allow control over the audio output of a web browser.
  Filtering of which URLs a web browser is allowed to access.
  A batch file which could do more than chroot ever could, with all the limits hard-enforced by the operating system.
  All file pipes would be chosen/supplied from outside the application.

iptables allows a Linux system administrator to do very powerful things with the network stack of a machine... this would be a much more fine-grained approach, as you could control the I/O of everything down to the bit level, or not... as you see fit, in the Unix way.

You could count the bytes a web browser sends or receives on each and every page. You could log things.
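
To make that concrete, here's a rough Python sketch (purely illustrative; CountingPipe and everything in it are made-up names, not a real capability kernel) of a counting/logging pipe that wraps whatever handle the user chooses to grant, so the plumbing, not the application, decides what gets measured:

# Hypothetical sketch: a "capability pipe" that wraps a handle the user
# chooses to grant, counting and logging every byte that passes through.
# The application never sees the real handle, only this wrapper.

class CountingPipe:
    def __init__(self, real_handle, log=None):
        self._real = real_handle      # the capability actually granted
        self._log = log               # optional place to record traffic
        self.bytes_in = 0
        self.bytes_out = 0

    def read(self, n=-1):
        data = self._real.read(n)
        self.bytes_in += len(data)
        if self._log:
            self._log.write("read %d bytes\n" % len(data))
        return data

    def write(self, data):
        self.bytes_out += len(data)
        if self._log:
            self._log.write("wrote %d bytes\n" % len(data))
        return self._real.write(data)

# Usage: the *user* builds the plumbing and hands only the wrapper to the app.
import io, sys
pipe = CountingPipe(io.BytesIO(), log=sys.stdout)
pipe.write(b"GET / HTTP/1.1\r\n")
print("total bytes out:", pipe.bytes_out)

In a real capability system the operating system would enforce the wrapper rather than relying on convention, but the shape of the plumbing is the same.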

Digital Rights Management would be killed stone dead as a nice side effect.

Ad blocking could be done with scripts that users could tweak themselves.

Tuesday, July 17, 2012

Secure programming - good intentions

I recently read a good article about security practices in applications and software as a service. The author lays out some very good rules to help keep users' information secure in today's threat environment. However, it strikes me as a strong reminder of the vast amount of effort we're wasting by trusting application programs at all.

We should never completely trust any programs, services, or drivers outside of the very kernel of an operating system. We shouldn't have to.  The millions of lines of code required to run even a basic database with a web front end are bound to have bugs that can lead to unintended and unwelcome side effects. Those effects can range from subtle to disastrous, depending on what cascade of events happens.

The application programmer has no tools to prevent his program from exceeding its scope for a given task. Current operating system design holds that the user is the proper level of granularity for deciding what access a given task should be allowed, so all of the responsibility for keeping things safe is thrust upon programmers. The programmer and/or install package is then responsible for setting the permissions on all of the objects (files, pipes, registry entries, ports, etc.) in the end system to be appropriate for the given tasks.


This is an impossible task, given that there can be literally millions of such permissions to set, and it only takes one mistake to let something slip through.


It doesn't have to be this way.

Capability-based security is an approach that uses the principle of least access to enforce security in a much more appropriate manner.  The millions of choices about what to deny are replaced with a much shorter list of what to allow. This list is per process, not per user.  It says which files, folders, and ports are allowed, and in which mode (read only, write only, append only, full, etc.).
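
As a rough illustration only (no particular kernel works exactly this way, and all the names below are invented), the per-process list could be as small as this:

# Illustrative only: a per-process allow list, stated as what IS allowed
# rather than millions of things to deny. Names and structure are made up.

ALLOWED = {
    "/var/www/data.db":  {"read", "write"},   # the one file the app may touch
    "/var/log/app.log":  {"append"},          # its log file, append only
    "tcp:0.0.0.0:8080":  {"listen"},          # the one port it may open
}

def check(resource, mode):
    """Deny by default; allow only what the short list grants."""
    return mode in ALLOWED.get(resource, set())

print(check("/var/www/data.db", "write"))   # True
print(check("/etc/passwd", "read"))         # False - never granted

Everything not on the list simply doesn't exist as far as the process is concerned.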

This is a much more natural way to handle risk, as you simply decide what side effects you are going to allow a given process to have, and the operating system enforces your decision. You don't have to trust your code, nor does the user. If something goes wrong, the maximum extent of damage is already known. You don't have to worry about the entire system shattering.

Isn't it time we stopped spending so much effort on making our programs safe, when it could be better spent building better programs?  Help support efforts that will deliver operating systems with capability-based security, such as Genode, which offers a choice of 8 microkernels and runs native Linux applications.

Thanks for your time and attention.

Thursday, October 16, 2008

Blogging tools still suck

Here's an article which is interesting, insightful, and dead wrong...

To do a full criticism of it, you really need to be able to mark it up. That is, you need to be able to add a layer of commentary on top of it. Currently, the only way to do that is to copy the whole bloody thing and then embed your own layer of markup into the copy. This sucks.

The idea of marking up text is baked into things like the Torah... a tradition thousands of years old... yet the wizards who give us toys like IE, Firefox and Chrome can't seem to grasp the concept.

ugh!

Tuesday, June 03, 2008

Towards a new database model

Over the past few days I've been pondering the way databases get used, and I think I have a way to help make things better by shifting things around a bit. I'm going to dive right into deep territory here...

The current crop of SQL-type databases is batch oriented, and they just don't scale well because of it. We need to update the model from one where data gets visited by the occasional query to one where the queries are always running and updating their results.

If you find yourself requerying a database without having changed the data yourself, you're wasting a huge amount of resources. You really only want to know what's changed, and it's silly to re-examine everything that hasn't been modified since the last run of the query.


Imagine a table in a database. If you recorded the initial state of the table, and all of the subsequent operations, you could perfectly replicate the state of the table.

If you then wrote code to implement this type of logic, it would take a lot less code to keep valid replicas. You would have a stream of changes instead of a set of facts that constantly needed to be re-examined.

If you ran a query against such a table once, then kept running it against the stream of changes, you could have a running output that worked vastly more efficiently, since you would only have to examine records that have changed.
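
Here's a toy Python sketch of that idea, which is close to what database folks call incremental view maintenance; the standing query consumes a stream of insert/delete operations instead of re-scanning the table:

# Toy sketch: a "standing query" that keeps its result up to date by
# consuming a stream of changes, instead of re-scanning the whole table.

class RunningCount:
    """Standing query: count of rows where amount > threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def apply(self, op, row):
        if row["amount"] > self.threshold:
            if op == "insert":
                self.count += 1
            elif op == "delete":
                self.count -= 1

query = RunningCount(threshold=100)
changes = [("insert", {"amount": 50}),
           ("insert", {"amount": 250}),
           ("insert", {"amount": 175}),
           ("delete", {"amount": 250})]

for op, row in changes:
    query.apply(op, row)
    print(op, row, "-> running result:", query.count)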

I'm not a computer scientist, nor an engineer, but it seems to me that this model should be looked into a bit, and has some promise to save us all some time and effort.

Who knows, it might even save Twitter.

Friday, May 30, 2008

Flow based databases to fix twitter?

Twitter is a popular web-based instant messaging service which has been having problems with scaling lately. The facts seem to indicate that a traditional relational database management system just isn't an appropriate fit in this case.
I believe that this is a perfect case for a new type of database, and perhaps even a completely new framework of programming. I don't have a good name for it, and the ideas are still vague in my head right now, but I'll try to outline what I'm thinking of below.

I would break Twitter up into a series of tables that get distributed and replicated among a cluster of servers. The tables would relate to each other, not through the strict atomic transaction model, but through eventual consistency. These tables would be:
  • Users
  • Queues
  • Subscriptions
  • Content
The real trick would be to treat the tables more like queues or pipes full of data that are appended to, with very little random writing. The changes would then be aggregated into a channel, making it easy to keep multiple copies to cope with the heavy read access from all of the clients connected at any given time.
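
A very rough sketch of the "table as a channel of appends" idea; the class names are invented, and real eventual consistency is far hairier than this, but it shows the shape:

# Rough sketch: one append-only channel of changes, many read replicas.
# Writers append; each replica pulls whatever it hasn't seen yet, so the
# copies converge ("eventual consistency") without per-write coordination.

class Channel:
    def __init__(self):
        self.entries = []            # the append-only log of changes

    def append(self, record):
        self.entries.append(record)

class Replica:
    def __init__(self, channel):
        self.channel = channel
        self.position = 0            # how far this copy has read
        self.rows = []               # local copy served to readers

    def catch_up(self):
        while self.position < len(self.channel.entries):
            self.rows.append(self.channel.entries[self.position])
            self.position += 1

content = Channel()
replica_a, replica_b = Replica(content), Replica(content)

content.append({"user": "mike", "tweet": "hello world"})
replica_a.catch_up()                 # A has the row, B is briefly stale
content.append({"user": "mike", "tweet": "second post"})
replica_a.catch_up(); replica_b.catch_up()

print(len(replica_a.rows), len(replica_b.rows))   # 2 2 - converged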

The bandwidth external to Twitter is pretty high, because you've got lots of people with many subscribers. The amount of actual non-duplicate data is surprisingly small... I'm guessing that it's on the order of 3 kbytes/second. The real challenge is distributing that 3 kbytes/second in a consistent and reliable manner to all the places it gets copied.

A flow-based database would be able to handle this type of load by maintaining many local copies and keeping them eventually consistent by tying them into a channel. This is a place where multicast might be a really good strategy, if not a straight peer-to-peer network.

A flow database could be a straight-up normal table in an RDBMS, or it could be something new optimized for the task.

What do you folks think?

Oh my G-d, Scoble killed Twitter...

So Robert Scoble is to blame for taking out Twitter, the all too popular instant messaging system, because he's just too gosh darned popular.


As if...


As I've stated before, the aggregate flow of all tweets is on the order of 10-20 messages per second, based on peeking at the message sequence numbers. It seems readily apparent that they've chosen the wrong architecture for this.


The tweets themselves should be aggregated, with a global sequence number and a per-user sequence number, into a stream that gets copied to all of the boxes handling the user interface. Deleting a message would be handled by reposting it to the same queue with no data.
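
To illustrate (the field names are invented, and this is nothing like Twitter's actual format), one slice of that stream and the tombstone-delete trick might look like this:

# Illustrative only: a tweet stream keyed by sequence numbers, where a
# delete is just a later entry with the same ids and empty data (a tombstone).

stream = [
    {"seq": 1001, "user": "scoble", "user_seq": 57, "text": "I love twitter"},
    {"seq": 1002, "user": "mike",   "user_seq": 3,  "text": "scaling is easy"},
    # deletion: repost the same user/user_seq with no data
    {"seq": 1003, "user": "scoble", "user_seq": 57, "text": None},
]

def current_view(stream):
    """Replay the stream; later entries (tombstones included) win."""
    latest = {}
    for entry in stream:
        latest[(entry["user"], entry["user_seq"])] = entry["text"]
    return {k: v for k, v in latest.items() if v is not None}

print(current_view(stream))   # only mike's tweet survives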


The subscription lists should be another stream.


The user database could be yet another stream.


All of those streams should aggregate out to about 10 kbytes/second. The process of splitting out the work to UI boxes is one of straightforward partitioning of the load and maintaining a list of tweet sequence numbers for each person to see. The aggregated total of all three streams would sit on each UI box, so they wouldn't have to fetch any of it from across the net.


That's my basic idea for scaling Twitter. Comments welcome.

Saturday, May 17, 2008

QWERTY Considered harmful?

An interesting observation from Daniel Berger:
Larry Wall’s first rule of computer language design is, “Everybody wants the colon”. Maybe the problem is that we just don’t have enough symbols on our darned keyboards. The result is that we’re left fighting over the scraps that QWERTY gives us, e.g. the colon. My opinion is that a limited number of usable characters limits our thinking and our expressiveness. (emphasis mine)

In my recent quest to push forward awareness of capabilities, the notion of expressiveness seems to be at the crux of everything. If you don't have a conventional way to express something, it takes a lot of work to get your point across.

I believe that rich source code is overdue. The idea first came to me via Chuck Moore's ColorForth, but I think it could be applied in a wider array of places. The ability to simply highlight a section and make it a comment without worrying about syntax would be cool, but I'm sure there are far more powerful uses that would quickly arise, such as the ability to do literate programming, freely mixing source and documentation and content.

The arguments against any new programming technique usually boil down to the fact that pretty much any language can already express any program. These arguments always miss the expressiveness a new language brings by making it easier to solve a certain class of problem.

Thursday, May 15, 2008

Almost useful capabilities demo - 0.012

So, I've done some more programming, and I'm now up to version 0.012 of my capabilities demo.

The main page at http://127.0.0.1:81 is now the user page, with the protected content. You have to have a capabilities token to edit the data.

The administration page is at http://127.0.0.1:81/admin, which allows you to create and revoke capabilities, and see the current "protected content".

It's all implemented in Python, in a single file, just to make it easy to demo.

The slow road to implementation

There are a lot more choices to make, and details to manage, in the process of programming a web server than I would have expected. I've made a lot of decisions, trying hard not to worry too much about it, to avoid analysis paralysis.

http://warot.com/python/ contains my recent Python programs. I have to name them with .txt on the end or the web server tries to run them (and fails).

So far I've gotten up to webserver008.py, which manages to create random numbers and keep a list of them available. At the rate I'm going, I'll have something usable in a few months, which is better than never. 8)

Should you choose to actually download and run the thing... here's what it does.

In a DOS box (or your command-line equivalent):

Welcome to Mike Warot's capability based security demo web server, version 0.008

You can access it at http://127.0.0.1:81
Use control-c to tell it to shut down, which may take up to 10 seconds
started httpserver...

If you then open http://127.0.0.1:81 in a web browser, you'll get a very informative message like this:

this is the default content, not served from a file
Here are the valid tokens:

Now... for the completely undocumented and poorly written section of code... change the URL to http://127.0.0.1:81/token, and you'll get something like this:

0451a66530b72a980725745c39992239


Isn't that lovely? If you then go back to the home page at http://127.0.0.1:81 and refresh the page, now you'll see:

this is the default content, not served from a file
Here are the valid tokens:
0451a66530b72a980725745c39992239 [Revoke]

That's a list of all the tokens, with the ability to revoke one of them. That's pretty much the full extent of the power of this demo.
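
For anyone curious what the moving parts boil down to, here's a minimal sketch in present-day Python of the same idea. It is NOT the actual webserver008.py code (that was written against the Python of 2008); it's just an illustration of issue / list / revoke:

# Minimal sketch, not the original webserver008.py. Issue a token at /token,
# list them at /, revoke at /revoke?token=<value>. No persistence, no HTML.
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

tokens = set()   # the list of currently valid capability tokens

class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/token":
            token = secrets.token_hex(16)        # 128-bit unguessable token
            tokens.add(token)
            body = token
        elif url.path == "/revoke":
            token = parse_qs(url.query).get("token", [""])[0]
            tokens.discard(token)
            body = "revoked"
        else:
            body = "Here are the valid tokens:\n" + "\n".join(tokens)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    # port 81 matches the original demo; on Linux, ports below 1024 need
    # elevated rights, so pick a higher one if that bites you.
    print("demo server at http://127.0.0.1:81")
    HTTPServer(("127.0.0.1", 81), DemoHandler).serve_forever()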

It's a list, with an undocumented, poorly designed, and inconsistent UI... but it's a step in the right direction. Oh... and it's licensed under the GPL, so you can fork the project. ;-)

I hope to get a reasonable list view, with the ability to issue tokens without having to mung the URL, within the next day. Code to actually give out capabilities to edit a resource should come next week.

It's slow going, isn't it?

--Mike--

Monday, May 12, 2008

Random numbers are hard... whoda thunk it?

It turns out the hard part of doing capabilities on the Internet is the lack of a suitable random number generator... which kinda blindsided me. I'm trying to find an implementation of the ISAAC random number generator that would work in either Python or Active Server Pages, and haven't been able to find one. It's critical to give out unforgeable tokens, and a cryptographically secure random number generator is the way to go. You can't even think of using the built-in random generator, because it's too easy for a determined attacker to guess the next output after a short run of samples.
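
Whatever generator I end up with, the token-minting side will look roughly like this (sketched with os.urandom standing in as the source of cryptographic bytes, since it pulls from the operating system's own generator):

# Sketch of the kind of token I'm after: unforgeable because it comes from
# the operating system's cryptographic generator (os.urandom), not from the
# guessable built-in random module.
import os
import binascii

def new_token(nbytes=16):
    """Return a 128-bit token as 32 hex characters."""
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")

print(new_token())   # e.g. 0451a66530b72a980725745c39992239, but unguessable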

So, eventually I'll find what I want (or be forced to port it myself)... and then I can get back to the examples... which will generate a token consisting of the object, the capability, and a random number to serve as salt to keep it from being forged.

Wish me luck.

Saturday, May 10, 2008

Twitter - Capabilities mashup

The idea of a distributed replacement for Twitter is floating around... and I've been writing about capabilities recently... what would we get if we merged the ideas?

#1. - Get rid of user accounts on Twitter... just hand out the capability to post, which would be different each time it's issued, and individually revocable. I'd hand them out by email, to limit the user base a bit and cut down on spam. You could always store the email address in a table along with the capability, to know who it is if necessary.

#2. - Allow each user to then hand out tokens that would allow a direct message, which they could proxy and/or revoke themselves. This would make it possible for an end user to block someone from sending them direct messages, without the need for it to happen in the central code. The proxy that does this could be a separate service, and wouldn't play a part in the security of the central capability-provider code.

#3. - Allow each user to hand out tokens that would allow following them, which, like above, they could proxy and revoke themselves. This turns the distributed Twitter into an effectively private email system without too much work.

#4. - There's really not much difference between a tweet and a blog post, other than length. There's not much difference between a private posting and an email... you could cover all of them this way.
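
A rough sketch of the data model behind #2 and #3 (all names invented, nothing implemented, written in present-day Python): each user mints their own follow and direct-message tokens and can revoke them without the central code being involved.

# Rough sketch of ideas #2 and #3: users mint and revoke their own
# follow / direct-message capabilities; the central service only checks
# membership. All names are invented for illustration.
import secrets

class UserCapabilities:
    def __init__(self):
        self.follow_tokens = set()
        self.dm_tokens = set()

    def grant(self, kind):
        token = secrets.token_hex(16)
        getattr(self, kind + "_tokens").add(token)
        return token                      # hand this to the other person

    def revoke(self, kind, token):
        getattr(self, kind + "_tokens").discard(token)

    def allows(self, kind, token):
        return token in getattr(self, kind + "_tokens")

alice = UserCapabilities()
dm_token = alice.grant("dm")              # alice lets bob send her DMs
print(alice.allows("dm", dm_token))       # True - message goes through
alice.revoke("dm", dm_token)              # alice blocks bob herself
print(alice.allows("dm", dm_token))       # False - no central code involved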

Ok... it's 10:30 and I'm sleep deprived, so this might not be as coherent as it seems to me right now... though I hope it is.

Capabilities offer a huge amount of flexibility when doing system design. They make it possible to break apart the logic of a complex application without having to worry about the combinatorial explosion that results from the conventional idea of having every piece of code enforce a ton of rules.

What do you all think?

--Mike--

Wednesday, April 23, 2008

Guestimating twitter

I did some guestimates in Excel to see if I could figure out why Twitter has issues with scaling. The results are surprising. I guestimated that the total byte flow of the raw tweets to/from the database would be on the order of 3 kbytes per second.

You can see the spreadsheet for yourself (thanks to Google docs).

The raw flow is surprisingly low, mostly because we're talking about unique tweets, with no duplication out to all of a person's followers.

At the heart of it, any reasonable PC could do it... the bandwidth is very low.

The key question is architecture. If it's done right, the backbone of Twitter has this very low bandwidth (but very important) flow multicast out to a bunch of front-end machines that handle the subscribers' pipes. If you keep it basic and fan each message (about 10 per second) out to every subscriber, then even if everyone had 5000 subscribers you'd still only have to put out 50k messages per second for the whole Twitter farm... not unreasonable if you broke it up into pieces.
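
The arithmetic behind those numbers, using nothing but the guesses above (the per-tweet byte size is back-filled so the 3k figure comes out; I'm not quoting the spreadsheet cell by cell):

# Back-of-envelope numbers, using only the guesses from this post.
msgs_per_sec = 10          # guessed aggregate tweet rate
avg_tweet_bytes = 300      # guessed size of a tweet plus overhead
subscribers = 5000         # generous followers-per-user guess

raw_flow = msgs_per_sec * avg_tweet_bytes
fan_out = msgs_per_sec * subscribers

print("raw unique-tweet flow: %d bytes/sec (~3 kB/s)" % raw_flow)
print("worst-case fan-out: %d messages/sec across the farm" % fan_out)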

I think I could build a Twitter clone. I think a lot of us could. I think it would be a cool hacking challenge to see who could do it with the smallest collection of VMware machines against a simulated load/userbase.

What do you folks think?

--Mike--

Initialization code in Adobe Flash ActionScript 3.0

At long last, it's really, really, working right... and the code looks like this:


var initialized; // has to be declared, even if it's undefined

if (initialized == undefined) {
    stop(); // stop the slide show on the first frame
    Next_btn.addEventListener(MouseEvent.MOUSE_UP, function(evt:MouseEvent):void { nextFrame(); });
    Back_btn.addEventListener(MouseEvent.MOUSE_UP, function(evt:MouseEvent):void { prevFrame(); });
}

initialized = true; // could make it anything, as long as it's not undefined



So, now I can just put all of the initialization code inside the braces, and it all runs only once!

Tuesday, April 22, 2008

Slide shows and duplicate event listeners in Adobe Flash ActionScript 3

Today was my first day (of 30) as an Adobe Flash developer. I focused on helping one of my co-workers, an avid Photoshop / CS2 user, get up to speed on building a basic slide show in Flash with ActionScript 3.

Most of the tutorials you find on the net assume ActionScript 2, so that's a handicap from the get-go. We did eventually find something that got us going... until a bug made itself obvious within the first two minutes: we were getting multiple presses out of our Next button if we dared go back to the first frame of the slide show. Then we noticed it happened with the Back button as well.

The code looked like this:

stop(); // stop the slide show at the first frame

Next_btn.addEventListener(MouseEvent.MOUSE_UP,
    function(evt:MouseEvent):void { nextFrame(); });

Back_btn.addEventListener(MouseEvent.MOUSE_UP,
    function(evt:MouseEvent):void { prevFrame(); });


Which works... unless you go back to the first frame, because the frame script runs again every time you touch that frame, adding the listeners a second time. There doesn't appear to be any way to put initialization code before all of the run-time code, so you have to work around it (and this page, not Adobe's help files, is probably where you'll learn about this if you hit the problem in the future)...

To prevent duplicate event handlers, it's necessary to guard each and every addEventListener call.

Something like this:

stop(); // stop the slide show on the first frame

if (! Next_btn.hasEventListener(MouseEvent.MOUSE_UP)) {
    Next_btn.addEventListener(MouseEvent.MOUSE_UP, function(evt:MouseEvent):void { nextFrame(); });
}

if (! Back_btn.hasEventListener(MouseEvent.MOUSE_UP)) {
    Back_btn.addEventListener(MouseEvent.MOUSE_UP, function(evt:MouseEvent):void { prevFrame(); });
}

It seems like quite a waste to me... perhaps someone has a more elegant way of initializing things, so that all of this guard code isn't needed and the setup executes only once per application?

I'm glad to have figured it out and got things working today, so now I can sleep. Hopefully this will help at least one other person who is frustrated with duplicate events firing from addEventListener messing up things.

Friday, March 21, 2008

Language Mismatch

Al Sweigart found my rant about Python, and responded almost instantly (the same work day...)

I was impressed, and thankful for the time and attention that resulted from the serendipitous intersection of my venting frustration and his proactive preparation for advocating Python at work. I was also struck by the community that suddenly showed up at my first public sign of angst.

Now that I've had time to calm down and consider the advice given... I'm still disappointed in Python... but it doesn't suck anymore. Al is right when he says "Your Ignorance Does Not Make a Programming Language Suck".

I'm surprised by immutable strings in Python. It was frustrating hitting an unexpected roadblock in what otherwise seems to be a cool programming language that I'm just getting into.

I can do what I want, but not in the way I had intended. I have to use a kludge to work around the immutable strings in Python, or just give up and find another way. Needless to say, it's frustrating when a language just doesn't match your expectations... it's an impedance mismatch.

I've read PEP 3137 - Immutable Bytes and Mutable Buffer, and at first I was hopeful that change was in the air and I could really LIKE Python again... but alas... the immutable string is here to stay.

Python has so many cool features that it's really strange it can't handle the idea of strings that are true variables, or variable parameters like Pascal's. Even a namespace hack to allow access one level up would be better than nothing.

I'm finding it really hard to let go of string variables, and I guess that's just a mismatch for me to be aware of in the future.

Perhaps I'll have to implement a sourcebuffer class, with functions like expect, getchar, getstring, getnumber, getfloat, etc.
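
A minimal sketch of what I have in mind (only getchar and expect shown; getstring, getnumber and the rest would follow the same pattern):

# Minimal sketch of the sourcebuffer idea: the mutable "cursor into a
# string" lives in one object, so no caller ever has to reassemble the
# string by hand. Only getchar and expect are shown here.

class SourceBuffer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def getchar(self):
        """Pull the next character off the front, or '' at the end."""
        c = self.text[self.pos:self.pos + 1]
        self.pos += 1 if c else 0
        return c

    def expect(self, prefix):
        """Consume prefix if it is next in the buffer; return True if it was."""
        if self.text.startswith(prefix, self.pos):
            self.pos += len(prefix)
            return True
        return False

buf = SourceBuffer("let x = 42")
print(buf.expect("let "))   # True
print(buf.getchar())        # 'x'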

I've managed to sidestep the issue for now by using split and some other tricks.

I do like the trick given in the comments:
a,c = s[:1], s[1:]
Of course, it would be nice not to have to keep putting the string back into itself... which I think is really my main objection to immutable strings. Why should I have to keep manually putting data back into the place it just came from? Doesn't that increase the risk of putting back into the wrong place?

Oh well... it's interesting having my 15 minutes of fame over a programming issue.

Thanks to everyone for your time and attention.
--Mike--

Tuesday, March 18, 2008

Python sucks!

All I want to do is to take a string, and pull the first character off of it in a function....

like this:

def pullchar(s):
    """pull a single character from the front of a string"""
    c = s[0]
    s = s[1:]   # this rebinds the local name s only; the caller's string is untouched
    return c



Can Python do this? Not really...

Why? Because there doesn't seem to be any way to pass a variable to a function...

WTF?