Friday, August 19, 2005

The cost of C

We've all come to expect that any given computer program is going to have a bug or two. However, the nature of the cost of these bugs is changing. I feel it's time to re-examine our choice of languages for implementing Operating Systems.

{I'm biased... I admit it... but this isn't meant to be flame bait, and I hope it contributes to the discussion. I'd especially like to know which bits I'm wrong about}

A somewhat short history of programming

In the beginning was the Flying Spaghetti Monster. It was written in assembler, and could do many great and mysterious things with its noodley appendage. It's wrath was written in the core dumps of the devout few who were its followers.

The first programming environments allowed the user total freedom to use the machine as they saw fit. The machines were so expensive, it was well worth some extra time on the part of the programmer to wring out a few operations here and there in the name of efficiency. The hacker culture grew out of this need to wring the most possible work out of the fewest possible machine cycles.

As machines became more powerful, and lower in cost, the benefit of wringing out the extra few cycles decreased, as the programmers time became relatively more valuable. The growing complexity of programs, resulted in the appearance of procedural programming, which breaks programs down to a set of procedures and funtions. C was one of these procedural progamming languages.

Structured programming relies on the concept of limited scope to reduce the coupling between portions of a program, in an effort to localize and reduce the resulting effects of logic errors, and other bugs. The benefits of structured programming are now an accepted fact in most corners of the software development world. Pascal was one of the first popular structured programming languages.

Over time, other improvements have made the scene, including:
  • Type checking
  • Bounds checking
  • Object oriented programming
  • Native strings
  • Garbage Collection
  • Functional Programming
  • Aspect Oriented Programming
  • Programming by Contract
All of these improvements are aimed at improving the productivity of the programmer, at the expense of run time. As computers continute to become faster and less expensive, this appears to be a worthwhile tradeoff.

Why C?

The Unix operating system became widespread in academic circles, and was the first widely ported to a large number of environments. Liberal licensing terms and access to the source code helped spread the popularity of the C language among it's users.

Meanwhile, the computer science community developed a strong interest in the Pascal programming language. UCSD Pascal was widely distributed, but had its share of problems mostly due to the interpreted nature of the UCSD implementation.

The movement towards structured programming in the 1970's met head on with this large body of C programmers. Because programs in UCSD Pascal ran many times slower than those written in C, the C programmers won the battle. This is especially true in terms of Operating Systems design, where speed is more important than programmer time.

C remains the implementation language of choice for operating systems, even in 2005.

The cost of C

We've come to accept that if a given program fails in some fashion, we can track down the bug to a specific piece of code, and just fix it. The costs are limited to the inconvenience and lost work of the user, along with the costs of incorrect outputs. Structured programming limits the action of a bug to its immediate module and those calling it.

However, when pointers are involved, any bug can invalidate the scope limits of everything, including the operating system. Thus pointer bugs can result in seeming random crashes. The debugging of pointer errors is one of the toughest jobs facing a programmer. It is this fact which lead to the introduction of largely pointer-free programming languages.

The C language as utilized today still forces the programmer to deal directly with pointers. Unlike other variables, when a pointer has an incorrect value, the scope of the error immediately becomes unlimited. A single pointer error in any line of C code has the capacity to effect any other variable or result. Thus pointer errors are a special danger.

The C language also does not deal well with complex data types, and buffers for data must be manually allocated and removed. The problem of buffer overflows has been dramatically demonstrated in the various worms that exploited them to great cost in the last few years.

All of this results in C programs tending to have buffer overflow issues, as well as pointer handling problems.

Conclusion

The choice of the C language for implementation of operating systems made sense in the 1970s, but is no longer appropriate as it currently is utilized. The buffer and stack overflows that are more prevalent in C programs provide more targets for exploitation, and reduce our collective security. It's time for an alternative.



I'm just one guy... with an opinion... which might be wrong...

Thanks for your time and attention

--Mike--

Update 5/3/2007 - A commenter left this

A good article describing overflow exploits in a basic language


http://www.loranbase.com/idx/142/1921/article/Writing-Buffer-Overflow-Exploits-for-Beginners.html

You can find the original article (as near as I can tell) here.

No comments: