Entries Tagged as 'People'

Premature optimization is the root of all evil.

D.E. Knuth
Professor Emeritus of The Art of Computer Programming at Stanford University

Free software heroes: from Stallman to Google, a list of inspiring individuals who made everything possible

By Tony Mobily from Free Software Magazine

This article was originally published on “2008-06-15 13:09:55 +0000”. I re-read it, and decided that it deserved to be re-published in Free Software Magazine as a tribute to those individuals who made GNU/Linux possible. Every field has its own key individuals who donated much of their time to the ideas they believed in. Each one of them is a reminder that it’s up to individuals to make a difference, and to make history. Their work affects large chunks of the world’s population, and brings amazing changes to the way we see and experience the world. The free software world has its own heroes. You probably know a lot of them already; if you don’t, you probably use the results of their work on a daily basis. This article is both a tribute to them, and a summary for those people who are new to the free software world.

Some of the key people

Richard Stallman. With rms, I don’t even know where to start. He started the GNU project, which is a rather important part of the GNU/Linux operating system, in 1983 (that’s right: nineteen eighty-three!) and set up the Free Software Foundation in 1985. He wrote the original GNU C compiler (yes, the program used to transform programs from programming language to executable code). He spends most of his time being a political and software activist. If you want to see what dedication is, read his blog and see his beyond-hectic travelling schedule.

Pamela Jones. Talking about dedication, Pamela Jones is the author of Groklaw, arguably the web site that saved GNU/Linux and free software in general from SCO/Microsoft’s claws. Pamela Jones is a truly outstanding individual. She authored around 1000 articles over the last 3 years—and a lot of them are full-length pieces which resonated loudly in the IT industry as a whole.

Linus Torvalds. He wrote Linux, the kernel, without which the GNU utilities wouldn’t have anything to run on. Linus’ kernel was timely: it was first released in 1991 and placed under the GPL (written by Richard Stallman) the following year. Linux is a very important part of the GNU/Linux project.

Mark Shuttleworth. He’s the founder of Canonical, which created Ubuntu Linux. The short version of Shuttleworth’s story is simple: he made a small fortune selling Thawte (which made digital certificates) to VeriSign. He then went through the Russian astronaut training programme and went to space. He came back, and founded Canonical in order to create Ubuntu Linux, which is arguably the most popular and innovative GNU/Linux distribution aimed at end users.

Larry Page and Sergey Brin. They created Google. Regardless of the silly spelling mistake, you may have heard of it: you type a sentence in their web page, and you magically get a list of relevant pages as a result… you should check it out if you haven’t yet. Although Google is not a free software company, and a lot of their software is indeed proprietary, they still released a vast amount of free software and (more importantly) contributed to the creation of free standards that are free software friendly (think of OpenSocial vs. Facebook, or Android vs. iPhone/Windows Mobile).

Bob Young and Matthew Szulik. Bob Young created Red Hat, one of the most successful free software companies. Under Young’s leadership, Red Hat established itself as the leading GNU/Linux distribution in the server space. Red Hat’s contributions to the Linux kernel and free software in general are immense. Matthew Szulik was Red Hat’s CEO after Young, and made the company even stronger. More importantly, Szulik had a historic (and unconfirmed) dinner with Steve Ballmer, Microsoft’s CEO, who tried his best to convince him to enter a compromising patent agreement with Microsoft. Szulik said “no”, although the agreement would have probably been very lucrative for Red Hat. Signing it would have crippled the free software world.

Jimmy Wales. He is the creator of another web site you might have heard of: Wikipedia. I don’t need to put a link here: just type anything in Google (see above: that’s the fancy search page I talked about a minute ago), and you’ll probably find one or more Wikipedia pages listed… Wikipedia’s software is available under a free license (GPL). Yes, that’s the same license created by Richard Stallman (see above). While Wikipedia itself is not free software, it was one of the first times (if not the first time) that the free software philosophy was applied to a non-technical field. And it was immensely successful.

Lawrence Lessig. He created the Creative Commons licenses, which allow artists to release their works under licenses that have the same principles as free software licenses.

Sir Tim Berners-Lee. He invented the World Wide Web. And released the specifications (HTTP and HTML) for free, rather than asking companies and developers to enter unacceptable agreements on supposedly non-discriminatory terms. Without him, the internet today could be dominated by MSN- and AOL-like proprietary protocols and chaos. And I mean: chaos.

Blake Ross. He’s the man who, as a teenager (in 2003), realised that the free software movement was losing the web browser world because there wasn’t a lean, free web browser available. So, he forked Mozilla and created another piece of software you might have already heard of: Firefox. The rest is history. In fact, it’s a history with a 25% market share, which is impressive if you consider that each copy of Firefox needs to be downloaded and installed, as opposed to using what comes with Windows directly.

Dries Buytaert. The author of Drupal, one of the greatest Content Management Systems out there. (Yes, I am biased, since I am a Drupal developer.) Most people aren’t Drupal users; however, a lot of people are users of web sites that use Drupal as their backend.

Keith Packard. He was the force behind XOrg, a fork of XFree86. GNU/Linux today has a fantastic graphic subsystem thanks to him. This interview with Keith Packard, which dates back to 2003, explains part of what happened. Note that in the interview nothing was set in stone just yet, and XOrg was still more or less an “idea”. Today, it’s a strong reality in the free software world.

Bram Cohen. The mathematical genius creator of BitTorrent. Unlike pretty much everybody else, he released the specifications and the reference implementation of his protocol for free. BitTorrent proved to be crucial for free software, since it made the download of ever-growing distributions possible. Other players (see: the RIAA) are not as impressed by the protocol’s potential.

Michael Tiemann. He founded Cygnus back in 1989. Cygnus Solutions was one of the first attempts to “make money” out of free software. Tiemann also wrote the GNU C++ compiler and worked on the GNU C compiler and debugger, which are crucial pieces of software that changed the IT world.

The world without them

What would the world be like if those individuals had taken a plumbing career instead? You can argue that if they hadn’t done it, well, somebody else may have. That word “may” is the problem here. (This also brings up the more theoretical problem of the “near-miss list”: the list of people who did take a plumbing career instead of helping the world, but that’s a different story…) Without Pamela Jones, many (including me) believe that the SCO case against Linux could have taken a much nastier turn. Without Stallman, the free software movement wouldn’t be nearly as organised and strong. Without Shuttleworth, a proprietary GNU/Linux distribution could have become the market leader (it was already happening, slowly, with Linspire). Without Larry Page and Sergey Brin there would be no Google. No Summer of Code. No Android. No OpenSocial, and the list goes on and on. Without Bob Young and Matthew Szulik, there might be no clear leader in the GNU/Linux server market, or, worse, Red Hat might have given in to Microsoft’s pressure to enter a disastrous patent deal. Without Jimmy Wales there would be no Wikipedia. Without Lawrence Lessig, tons of artworks wouldn’t be available through the World Wide Web. And by the way, without Sir Tim Berners-Lee there would be no World Wide Web. Without Blake Ross, you might have to use Internet Explorer to do anything online. Without Dries Buytaert, Drupal wouldn’t exist. Without Keith Packard, we might be stuck with the monolithic, sort-of-free-but-not-quite XFree86. Without these individuals, basically, the world would be a much, much grimmer place to live in.

Care to join the club?

By reading this article, you probably get the idea: each one of those individuals is smart, dedicated, and willing to sacrifice big chunks of their personal life in order to improve the world. One of the fantastic things about free software is that there is no bar. Anybody can enter it. Your name could well be in this list. All you need is phenomenal amounts of work and passion for your field, whichever that is. I am not in that list, although I always thought I’d love to be. I am doing my best with Free Software Magazine, and every time I am tired, or lack inspiration, I look up to those who made this world possible, and strive to do just as much, just as well. We mortals might not go as far as Sir Tim Berners-Lee or Richard Stallman or Pamela Jones. But… we can only try.

Google’s open-source balancing act

Chris DiBona’s job–manager of Google’s open-source programs–is a balancing act.

Google consumes a lot of open-source software for its own highly profitable business. But as he oversees the search powerhouse’s open-source work, DiBona has to ensure that the company reciprocates. It can’t be all take and no give.

Chris DiBona, Google's manager of open-source programs

(Credit: Stephen Shankland/CNET News.com)

Free and open-source software advocates can be powerful allies–but also vocal critics. For example, some have criticized Google for its lack of support for the Affero GPL license, which can require those using software for a publicly available network service to share modifications they’ve made to an AGPL software project.

DiBona thinks Google strikes the right balance, though, by offering its own modifications back to many open-source projects, advocating the philosophy in general, and trying to nurture the next generation of open-source programmers.

DiBona has been steeped in open-source software for more than a decade. Before his job at Google, he worked for Slashdot, still an influential virtual water cooler for open-source discussion. Slashdot was part of Linux server maker VA Linux Systems, which had a spectacular initial public offering in 1999 followed not long after by a drastic cutback.

DiBona will be preaching the open-source gospel at the Google I/O conference Wednesday–"open source is too good to be true and thus must be magic," according to the agenda–but I sat down with him beforehand to hear his view of open-source software at Google.

What’s the view of open source within Google?
I asked myself, "Who am I trying to address?" The world of open-source business? No. The world of the open-source enthusiast? No. I’m really looking to work with open-source developers. We came up with these goals for our group: to support open-source development in general, which means to support open-source infrastructure; support the release of open-source code, from Google and in general; and to create more open-source developers, because especially when I started, there was a perception that Google took a lot of people from the open-source world and then went away. It was partly true, because people would come here and say, "Wow, I’ve been working on my open-source project forever, and I want a new problem," and we have a very good class of new problem. So they kind of went away.

That was too bad. The last thing we wanted as a company was to hurt the release of open-source software, because we consider it pretty important. We use a ton of it. Every engineer we bring on–how much open-source do they want to use? We have new packages and new libraries being brought into the company all the time. It’s our group’s job to track that. As we brought people in, we wanted to be sure more open-source developers were being created. So that’s where we came up with the Google Summer of Code, and now we have a high-school flavor of that as well. I think we’ve made a very real impact in creating new people in the open-source world.

I’m curious about maintaining a balance between contributing back to upstream projects vs. maintaining your own internal forks. How do you go through that evaluation?
Google considers some projects more important than others. Obviously the Linux kernel is incredibly important. Every time you use Google, you’re using a machine running the Linux kernel. We have a fairly large kernel team, and we employ people whose job is just to work on the external kernel. Andrew Morton is a good example of that. We try to make sure those guys patch out (submit their modifications to the main open-source project) whenever they can. It’s usually more dictated by the engineer’s time than it is any lack of desire on our part. I always wish we were able to release more, but it takes time for an engineer to do that. For the larger efforts, it’s a little easier because there are more personnel on it.

The same thing goes for our compilers (software that translates programmers’ code into instructions a computer understands). The great thing about our compiler team is they patch as a matter of their jobs. They’re always patching out things from the compiler work we do internally to the outside world. We recently released the new linker, Gold–Ian Lance Taylor works for us on our compiler team. He’s been on the GCC team forever. He used to be at Cygnus (a company that developed GCC). We have a lot of ex-Cygnus people.

Then there are Googlers who just want to patch into existing projects. They found a bug, they want to add a feature. That takes no time at all. Our team looks at the first couple patches an engineer wants to send out, makes sure the engineer knows what they’re doing with the outside world, then they’re basically given free rein to do that. They keep us posted on what they’re patching. We want to make sure our code gets out to the projects as fast as possible because projects keep on iterating. If you don’t get your patches in, they won’t get accepted, because they’ll be too old or won’t matter. If you’ve got a patch, getting it out there fast is better for us, because then as that project iterates and comes back into the company, we don’t have to reapply a patch.

What are the most important open-source projects you ingest?
The kernel, compilers–GCC, the Python interpreter. Python is very important to us. Google App Engine–it’s a Python hosting system, basically. Java is very important to us, and that’s become open-source now. We have some very good Java people working for us–Josh Bloch, Neal Gafter–they’ve got a great handle on that technology.

Once you get past those three projects–the compilers, the languages, the kernel–then you go to the libraries. For us that’s OpenSSL, zlib, PCRE. MySQL is hugely important to us. Past that, it starts tapering off pretty quick.

Has the open-sourcing of Java changed anything for you?
Not really. I think it had more impact on the outside world than for us. Java is a fairly mature language now. We’ve been using it for a long time. Before, it was the JCP (the Java Community Process to govern Java’s future)–it had the rubric of openness around it. It was never really not so open. There are questions around what open source means now around Java, specifically J2ME (Java’s mobile edition for gadgets such as cell phones) and the TCK (the technology compatibility kit).

Are you using a super-uber-customized Linux kernel, or are you guys pretty much vanilla?
I don’t think there’s such thing as a customized Linux kernel anymore. The kernel is incredibly flexible. It’s got all these different architectures. I think the Linux kernel itself is this ubercustomized thing.

But do you have a lot of in-house customizations?
Not a lot. Google is exposed to some interesting hardware before the rest of the world. So internally we’ll be sampling code for that hardware. So that’s pretty custom stuff. But eventually that goes to the outside world. We funded some work with a group in Berkeley called Xorp to bring high-speed Broadcom networking chip functionality to Linux. It’s not in our interest to keep control of it ourselves. So is it customized? Absolutely. But is it heavily customized? I don’t think it is as heavily customized as you might think.

Is it true you still use 2.4 kernels?
In some places, sure.

How about for the core search product?
I don’t know how it’s partitioned out. When you think of Google, you think of search being on top of a kernel that’s static. It’s not always like that. It differs on data centers. I think 2.6 predominates, though.

There’s been discussion about reciprocity. When General Public License (GPL) version 3 came out, the Free Software Foundation dumped the Affero clause out of GPLv3 and split it out into a separate license. Eben Moglen (co-founder of the Software Freedom Law Center and then counsel to the Free Software Foundation) said, to paraphrase, "If Google starts getting too parasitic, then we’ll re-evaluate it." How worried are you of getting a negative perception of using more than you contribute?
I do worry about this. I think it is a largely incorrect perception. You can always give out more, and there are always people who will never be satisfied. Could we be giving back more? Sure. One of the ways I ameliorate that problem is (through) projects like the Summer of Code. Google is releasing every year, not counting Android or the really large open-source projects like GWT, a new project every two or three weeks. Or patching hundreds of projects a month. I conservatively estimate we’re releasing about a million lines of code a year from the company.

If you talk to open-source developers–people who are working on projects–I think they understand that. It came back to who do we want to interact with. I always felt the enthusiast community would understand that eventually, and I think that’s true. There are some people who are upset with us because we didn’t embrace the Affero-style GPL, but it’s not practical for us to do so. When they had an Affero-style clause in GPLv3, the thing I told Eben was, "Listen, you can adopt whatever you want. We’ll still keep on backing up the FSF and the SFLC as much as we can, but it means we won’t be able to use that license inside, because it won’t be practical for us to do so." I think that’s a very realistic response. The Affero GPL is out there. That’s great for the people who use it. It’s just not for us.

That’s the thing about free software. You’re not obligated to use it. We have enough fine-grained control within the company that we don’t use things we don’t want to use.

What are your preferred licenses?
We generally release under the Apache License–Apache 2. We think it has the fairest language of the licenses. And the GPL requires a lot of management–more than we have time for to run a project well under that license–patch flow and all that. Apache 2 encourages people to take the thing and run with it. That’s what we’re going for when we release code, whether it’s to have people adopt technologies we really like, or for API examples. That said, we’ve released things under the GPL, LGPL, GPL version 3, BSD. We default to the Apache License.

To what extent do you subsidize gurus to sit around and work on important projects?
We’ve got people like Jeremy Allison and Andrew Morton and some of Guido (van Rossum)’s time. He’s been working pretty heavily on Google App Engine and Mondrian. It’s more common that we…try to make open source a part of their job, so they’re patching out to the libraries they use. We think that’s more healthy than having people whose job is just working on an open-source project.

You use open source a lot internally. Do you have some kind of intellectual property vetting or review before you use it?
We do. There are two ways we do this. When somebody wants to bring a piece of code in from the outside world–open-source or commercial–you need to put it inside a special directory we call "third party." They’re required to put in a file called readme.google (that describes) where they got that software, how it’s licensed, what category that license falls under. We look for things that are obvious. There are some projects that have dubious intellectual property provenance, and we know those, and we know the people who run them, and we tend not to use those ever.

Since Google doesn’t distribute a lot of software, we have it easier than companies that ship hardware and software. We have a couple situations where that does happen–the Google Search Appliance, some of the downloadable applications. Those get a little extra attention. Similarly, when we have larger projects like Google Android, we have a higher ceremony–every two weeks we get together and see if the license picture has changed.

The tracking model works really well for us. We have tools written where a program manager or a release manager can turn on a certain level of warning within the build tool and it will tell them what open-source software they have and how they have to comply with it. At that point we set up a mirror for them as they get closer to release.
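(Editor's illustration, not part of the interview.) The tracking model described here, a metadata file per third-party package plus a build-time warning, could be sketched roughly as follows. Everything concrete in it is an assumption made for the example: the interview gives only the file name readme.google, so the `key: value` layout, the license-to-category table, and the function names are hypothetical and do not describe Google's actual tooling.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List
import tempfile

# Hypothetical mapping from license name to a compliance category.
LICENSE_CATEGORIES = {
    "Apache-2.0": "notice",
    "BSD": "notice",
    "LGPL": "restricted",
    "GPL": "restricted",
}


@dataclass
class ThirdPartyPackage:
    name: str      # directory name under the third-party root
    source: str    # where the code came from
    license: str   # license name as written in the metadata file
    category: str  # compliance category derived from the license


def parse_readme(path: Path) -> ThirdPartyPackage:
    """Read an assumed 'key: value' metadata file describing one package."""
    fields = {}
    for line in path.read_text().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)  # split once so URLs stay intact
            fields[key.strip().lower()] = value.strip()
    license_name = fields.get("license", "unknown")
    return ThirdPartyPackage(
        name=path.parent.name,
        source=fields.get("source", "unknown"),
        license=license_name,
        # Unrecognised licenses are flagged for a human to look at.
        category=LICENSE_CATEGORIES.get(license_name, "needs-review"),
    )


def inventory(third_party: Path) -> List[ThirdPartyPackage]:
    """List every package that carries a metadata file, sorted by path."""
    return [parse_readme(p) for p in sorted(third_party.glob("*/readme.google"))]


def report(packages, flagged=("restricted", "needs-review")) -> List[str]:
    """Return one warning line per package whose category needs attention."""
    return [
        f"{p.name}: {p.license} ({p.category}), from {p.source}"
        for p in packages
        if p.category in flagged
    ]


if __name__ == "__main__":
    # Build a throwaway directory tree so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "libexample"
        pkg.mkdir()
        (pkg / "readme.google").write_text(
            "source: https://example.org/libexample\nlicense: GPL\n"
        )
        for line in report(inventory(Path(tmp))):
            print(line)
```

A release manager's "warning level" would correspond to the `flagged` argument: widening or narrowing that tuple changes which categories are reported.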

So that’s the first way we track things. The second way is whenever a Googler puts in a changelist now–this is something we’re just starting to do–we compare it against all known open-source code on the Internet using our Code Search product. We compare the changelist that comes from your average Google engineer against that database of code and we look for intersections. When we find an intersection, we take a look and see if it’s truly a copy. And if it is, we make sure it’s in the right directory and that it’s properly labeled. And we call up the engineer if it isn’t and make sure it gets tagged properly so we can do the right thing by these licenses.
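(Editor's illustration, not part of the interview.) One simple way to "look for intersections" between a changelist and a body of known code is to fingerprint short runs of normalized lines and check which fingerprints both sides share. The sketch below assumes that approach; it is a generic technique chosen for illustration, not a description of how Code Search or Google's internal tool works, and the window size and all names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set

WINDOW = 3  # consecutive normalized lines per fingerprint (arbitrary choice)


def normalize(source: str) -> List[str]:
    """Collapse whitespace and drop blank lines so reformatting does not hide a copy."""
    return [" ".join(line.split()) for line in source.splitlines() if line.strip()]


def fingerprints(source: str, window: int = WINDOW) -> Set[str]:
    """Hash every run of `window` consecutive normalized lines."""
    lines = normalize(source)
    return {
        hashlib.sha1("\n".join(lines[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def build_index(corpus: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each fingerprint to the names of the known projects containing it."""
    index: Dict[str, Set[str]] = {}
    for name, source in corpus.items():
        for fp in fingerprints(source):
            index.setdefault(fp, set()).add(name)
    return index


def find_intersections(changelist: str, index: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per known project, how many fingerprints the changelist shares with it."""
    hits: Dict[str, int] = {}
    for fp in fingerprints(changelist):
        for name in index.get(fp, ()):
            hits[name] = hits.get(name, 0) + 1
    return hits


if __name__ == "__main__":
    known = {
        "libexample": "int add(int a, int b) {\n    return a + b;\n}\n",
    }
    change = "void unrelated() {}\nint add(int a, int b) {\n  return a + b;\n}\n"
    # Any project listed here is a candidate for a human to review.
    print(find_intersections(change, build_index(known)))
```

A hit only says that some text overlaps; as the answer notes, a person still has to decide whether it is truly a copy and whether it is labeled correctly.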

That tool is kind of in its infancy. We’re trying to figure out ways to automate what it does. But it’s great because it scales programmatically. Our group’s goal is not to break builds or stop development. It’s to enable developers to use as much open-source as possible. We think it’s healthy, because then they’re not writing that code, they’re writing other code.

Do you vet code for patent or copyright?
No. We have legal people on our lists. We have two main lists that track these things. Open-source licensing for incoming code and open-source releasing for outgoing code. Legal has a presence there. Patents are incredibly tricky.

Is it easier to get hired at Google if you have experience maintaining your own open-source product or patch?
If you have made a name for yourself in open source, clearly it helps. If you have a healthy project in open-source, I believe it helps. One thing I see on hiring committees is when somebody has an open-source history, it’s really great. You can just look at that history. Interviews are great, but they’re not very deep. They’re only 45 minutes long. So how can you really get a feel for if a person is good at programming, at computer science?

Or at social relations, for that matter.
Open source really reveals that incredibly quickly. You can look at their code, at their activity on mailing lists, how they deal with bugs from real people, and real user problems. That’s an incredible resource.

The Summer of Code isn’t really a recruiting program. If it is, it’s a really expensive one. Last year we created about 2 million lines of open-source code across the 900 students who took part. Of those probably a third are going to stick around with their projects, because the rest have to go back to college.

We have a couple students who have been in the program two or three years. The whole point is to support kids over the summer so they can go and program and not get some other job that has nothing to do with computer science. It’s our fourth year doing it. This year we’ve got 1,109 students doing it across 95 countries.

Posted by Stephen Shankland from CNET News

Interview with Donald Knuth

By Donald E. Knuth, Andrew Binstock Informit.com

Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Andrew Binstock: You are one of the fathers of the open-source revolution, even if you aren’t widely heralded as such. You previously have stated that you released TeX as open source because of the problem of proprietary implementations at the time, and to invite corrections to the code—both of which are key drivers for open-source projects today. Have you been surprised by the success of open source since that time?

Donald Knuth: The success of open source code is perhaps the only thing in the computer field that hasn’t surprised me during the past several decades. But it still hasn’t reached its full potential; I believe that open-source programs will begin to be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise to improve the code.

For example, open-source code can produce thousands of binaries, tuned perfectly to the configurations of individual users, whereas commercial software usually will exist in only a few versions. A generic binary executable file must include things like inefficient “sync” instructions that are totally inappropriate for many installations; such wastage goes away when the source code is highly configurable. This should be a huge win for open source.

Yet I think that a few programs, such as Adobe Photoshop, will always be superior to competitors like the Gimp—for some reason, I really don’t know why! I’m quite willing to pay good money for really good software, if I believe that it has been produced by the best programmers.

Remember, though, that my opinion on economic questions is highly suspect, since I’m just an educator and scientist. I understand almost nothing about the marketplace.

Andrew: A story states that you once entered a programming contest at Stanford (I believe) and you submitted the winning entry, which worked correctly after a single compilation. Is this story true? In that vein, today’s developers frequently build programs writing small code increments followed by immediate compilation and the creation and running of unit tests. What are your thoughts on this approach to software development?

Donald: The story you heard is typical of legends that are based on only a small kernel of truth. Here’s what actually happened: John McCarthy decided in 1971 to have a Memorial Day Programming Race. All of the contestants except me worked at his AI Lab up in the hills above Stanford, using the WAITS time-sharing system; I was down on the main campus, where the only computer available to me was a mainframe for which I had to punch cards and submit them for processing in batch mode. I used Wirth’s ALGOL W system (the predecessor of Pascal). My program didn’t work the first time, but fortunately I could use Ed Satterthwaite’s excellent offline debugging system for ALGOL W, so I needed only two runs. Meanwhile, the folks using WAITS couldn’t get enough machine cycles because their machine was so overloaded. (I think that the second-place finisher, using that “modern” approach, came in about an hour after I had submitted the winning entry with old-fangled methods.) It wasn’t a fair contest.

As to your real question, the idea of immediate compilation and “unit tests” appeals to me only rarely, when I’m feeling my way in a totally unknown environment and need feedback about what works and what doesn’t. Otherwise, lots of time is wasted on activities that I simply never need to perform or even think about. Nothing needs to be “mocked up.”

Andrew: One of the emerging problems for developers, especially client-side developers, is changing their thinking to write programs in terms of threads. This concern, driven by the advent of inexpensive multicore PCs, surely will require that many algorithms be recast for multithreading, or at least to be thread-safe. So far, much of the work you’ve published for Volume 4 of The Art of Computer Programming (TAOCP) doesn’t seem to touch on this dimension. Do you expect to enter into problems of concurrency and parallel programming in upcoming work, especially since it would seem to be a natural fit with the combinatorial topics you’re currently working on?

Donald: The field of combinatorial algorithms is so vast that I’ll be lucky to pack its sequential aspects into three or four physical volumes, and I don’t think the sequential methods are ever going to be unimportant. Conversely, the half-life of parallel techniques is very short, because hardware changes rapidly and each new machine needs a somewhat different approach. So I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity.

Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?

Donald: I don’t want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific, until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]

How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)

The machine I use today has dual processors. I get to use them both only when I’m running two independent jobs at the same time; that’s nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn’t be any better off, considering the kind of work I do—even though I’m using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. (No—that’s the wrong metaphor! “Pipelines” actually work for me, but threads don’t. Maybe the word I want is “bubble.”)

From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation. I also admit that I haven’t got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they’ve begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me most—at the cost of incompatibility with legacy x86 programs.)

Andrew: One of the few projects of yours that hasn’t been embraced by a widespread community is literate programming. What are your thoughts about why literate programming didn’t catch on? And is there anything you’d have done differently in retrospect regarding literate programming?

Donald: Literate programming is a very personal thing. I think it’s terrific, but that might well be because I’m a very strange person. It has tens of thousands of fans, but not millions.

In my experience, software created with literate programming has turned out to be significantly better than software developed in more traditional ways. Yet ordinary software is usually okay—I’d give it a grade of C (or maybe C++), but not F; hence, the traditional methods stay with us. Since they’re understood by a vast community of programmers, most people have no big incentive to change, just as I’m not motivated to learn Esperanto even though it might be preferable to English and German and French and Russian (if everybody switched).

Jon Bentley probably hit the nail on the head when he once was asked why literate programming hasn’t taken the whole world by storm. He observed that a small percentage of the world’s population is good at programming, and a small percentage is good at writing; apparently I am asking everybody to be in both subsets.

Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s—it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could not have been written with any other methodology that I’ve ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably.

If people do discover nice ways to use the newfangled multithreaded machines, I would expect the discovery to come from people who routinely use literate programming. Literate programming is what you need to rise above the ordinary level of achievement. But I don’t believe in forcing ideas on anybody. If literate programming isn’t your style, please forget it and do what you like. If nobody likes it but me, let it die.

On a positive note, I’ve been pleased to discover that the conventions of CWEB are already standard equipment within preinstalled software such as Makefiles, when I get off-the-shelf Linux these days.

Andrew: In Fascicle 1 of Volume 1, you reintroduced the MMIX computer, which is the 64-bit upgrade to the venerable MIX machine comp-sci students have come to know over many years. You previously described MMIX in great detail in MMIXware. I’ve read portions of both books, but can’t tell whether the Fascicle updates or changes anything that appeared in MMIXware, or whether it’s a pure synopsis. Could you clarify?

Donald: Volume 1 Fascicle 1 is a programmer’s introduction, which includes instructive exercises and such things. The MMIXware book is a detailed reference manual, somewhat terse and dry, plus a bunch of literate programs that describe prototype software for people to build upon. Both books define the same computer (once the errata to MMIXware are incorporated from my website). For most readers of TAOCP, the first fascicle contains everything about MMIX that they’ll ever need or want to know.

I should point out, however, that MMIX isn’t a single machine; it’s an architecture with almost unlimited varieties of implementations, depending on different choices of functional units, different pipeline configurations, different approaches to multiple-instruction-issue, different ways to do branch prediction, different cache sizes, different strategies for cache replacement, different bus speeds, etc. Some instructions and/or registers can be emulated with software on “cheaper” versions of the hardware. And so on. It’s a test bed, all simulatable with my meta-simulator, even though advanced versions would be impossible to build effectively until another five years go by (and then we could ask for even further advances just by advancing the meta-simulator specs another notch).

Suppose you want to know if five separate multiplier units and/or three-way instruction issuing would speed up a given MMIX program. Or maybe the instruction and/or data cache could be made larger or smaller or more associative. Just fire up the meta-simulator and see what happens.

Andrew: As I suspect you don’t use unit testing with MMIXAL, could you step me through how you go about making sure that your code works correctly under a wide variety of conditions and inputs? If you have a specific work routine around verification, could you describe it?

Donald: Most examples of machine language code in TAOCP appear in Volumes 1-3; by the time we get to Volume 4, such low-level detail is largely unnecessary and we can work safely at a higher level of abstraction. Thus, I’ve needed to write only a dozen or so MMIX programs while preparing the opening parts of Volume 4, and they’re all pretty much toy programs—nothing substantial. For little things like that, I just use informal verification methods, based on the theory that I’ve written up for the book, together with the MMIXAL assembler and MMIX simulator that are readily available on the Net (and described in full detail in the MMIXware book).

That simulator includes debugging features like the ones I found so useful in Ed Satterthwaite’s system for ALGOL W, mentioned earlier. I always feel quite confident after checking a program with those tools.

Andrew: Despite its formulation many years ago, TeX is still thriving, primarily as the foundation for LaTeX. While TeX has been effectively frozen at your request, are there features that you would want to change or add to it, if you had the time and bandwidth? If so, what are the major items you would add or change?

Donald: I believe changes to TeX would cause much more harm than good. Other people who want other features are creating their own systems, and I’ve always encouraged further development—except that nobody should give their program the same name as mine. I want to take permanent responsibility for TeX and Metafont, and for all the nitty-gritty things that affect existing documents that rely on my work, such as the precise dimensions of characters in the Computer Modern fonts.

Andrew: One of the little-discussed aspects of software development is how to do design work on software in a completely new domain. You were faced with this issue when you undertook TeX: No prior art was available to you as source code, and it was a domain in which you weren’t an expert. How did you approach the design, and how long did it take before you were comfortable entering into the coding portion?

Donald: That’s another good question! I’ve discussed the answer in great detail in Chapter 10 of my book Literate Programming, together with Chapters 1 and 2 of my book Digital Typography. I think that anybody who is really interested in this topic will enjoy reading those chapters. (See also Digital Typography Chapters 24 and 25 for the complete first and second drafts of my initial design of TeX in 1977.)

Andrew: The books on TeX and the program itself show a clear concern for limiting memory usage—an important problem for systems of that era. Today, the concern for memory usage in programs has more to do with cache sizes. As someone who has designed a processor in software, the issues of cache-aware and cache-oblivious algorithms surely must have crossed your radar screen. Is the role of processor caches on algorithm design something that you expect to cover, even if indirectly, in your upcoming work?

Donald: I mentioned earlier that MMIX provides a test bed for many varieties of cache. And it’s a software-implemented machine, so we can perform experiments that will be repeatable even a hundred years from now. Certainly the next editions of Volumes 1-3 will discuss the behavior of various basic algorithms with respect to different cache parameters.

In Volume 4 so far, I count about a dozen references to cache memory and cache-friendly approaches (not to mention a “memo cache,” which is a different but related idea in software).

Andrew: What set of tools do you use today for writing TAOCP? Do you use TeX? LaTeX? CWEB? Word processor? And what do you use for the coding?

Donald: My general working style is to write everything first with pencil and paper, sitting beside a big wastebasket. Then I use Emacs to enter the text into my machine, using the conventions of TeX. I use tex, dvips, and gv to see the results, which appear on my screen almost instantaneously these days. I check my math with Mathematica.

I program every algorithm that’s discussed (so that I can thoroughly understand it) using CWEB, which works splendidly with the GDB debugger. I make the illustrations with MetaPost (or, in rare cases, on a Mac with Adobe Photoshop or Illustrator). I have some homemade tools, like my own spell-checker for TeX and CWEB within Emacs. I designed my own bitmap font for use with Emacs, because I hate the way the ASCII apostrophe and the left open quote have morphed into independent symbols that no longer match each other visually. I have special Emacs modes to help me classify all the tens of thousands of papers and notes in my files, and special Emacs keyboard shortcuts that make bookwriting a little bit like playing an organ. I prefer rxvt to xterm for terminal input. Since last December, I’ve been using a file backup system called backupfs, which meets my need beautifully to archive the daily state of every file.

According to the current directories on my machine, I’ve written 68 different CWEB programs so far this year. There were about 100 in 2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely convenient “change file” mechanism, with which I can rapidly create multiple versions and variations on a theme; so far in 2008 I’ve made 73 variations on those 68 themes. (Some of the variations are quite short, only a few bytes; others are 5KB or more. Some of the CWEB programs are quite substantial, like the 55-page BDD package that I completed in January.) Thus, you can see how important literate programming is in my life.
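
For readers who have not met a CWEB change file: it is a list of blocks of the form @x, old lines, @y, new lines, @z, and each block replaces the matching lines of the master file without touching the master itself. The sketch below is only an illustration of that idea in Python, not CWEB's actual implementation; the function names `parse_changes` and `apply_changes` and the sample text are hypothetical, and the matching is simplified to exact line equality.

```python
def parse_changes(change_text):
    """Collect (old_lines, new_lines) pairs from @x ... @y ... @z blocks."""
    changes, old, new, mode = [], [], [], None
    for line in change_text.splitlines():
        if line.startswith("@x"):
            old, new, mode = [], [], "old"
        elif line.startswith("@y"):
            mode = "new"
        elif line.startswith("@z"):
            changes.append((old, new))
            mode = None
        elif mode == "old":
            old.append(line)
        elif mode == "new":
            new.append(line)
        # Lines outside any block are ignored.
    return changes


def apply_changes(master_text, change_text):
    """Return the master text with every change applied, in order."""
    lines = master_text.splitlines()
    out, i = [], 0
    for old, new in parse_changes(change_text):
        # Copy master lines until the next place where the old lines match.
        while i < len(lines) and lines[i:i + len(old)] != old:
            out.append(lines[i])
            i += 1
        if lines[i:i + len(old)] != old:
            raise ValueError("change block did not match the master file")
        out.extend(new)       # substitute the replacement lines
        i += len(old)         # skip the lines that were replaced
    out.extend(lines[i:])     # copy whatever follows the last change
    return "\n".join(out)


# Hypothetical example: one master file, one small variation.
master = "alpha\nbeta\ngamma"
variation = "@x\nbeta\n@y\nBETA\nextra\n@z"
print(apply_changes(master, variation))
```

Because the master file stays untouched, each variation on a theme can live in its own small change file.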

I currently use Ubuntu Linux, on a standalone laptop—it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux. Incidentally, with Linux I much prefer the keyboard focus that I can get with classic FVWM to the GNOME and KDE environments that other people seem to like better. To each his own.

Andrew: You state in the preface of Fascicle 0 of Volume 4 of TAOCP that Volume 4 surely will comprise three volumes and possibly more. It’s clear from the text that you’re really enjoying writing on this topic. Given that, what is your confidence in the note posted on the TAOCP website that Volume 5 will see light of day by 2015?

Donald: If you check the Wayback Machine for previous incarnations of that web page, you will see that the number 2015 has not been constant.

You’re certainly correct that I’m having a ball writing up this material, because I keep running into fascinating facts that simply can’t be left out—even though more than half of my notes don’t make the final cut.

Precise time estimates are impossible, because I can’t tell until getting deep into each section how much of the stuff in my files is going to be really fundamental and how much of it is going to be irrelevant to my book or too advanced. A lot of the recent literature is academic one-upmanship of limited interest to me; authors these days often introduce arcane methods that outperform the simpler techniques only when the problem size exceeds the number of protons in the universe. Such algorithms could never be important in a real computer application. I read hundreds of such papers to see if they might contain nuggets for programmers, but most of them wind up getting short shrift.

From a scheduling standpoint, all I know at present is that I must someday digest a huge amount of material that I’ve been collecting and filing for 45 years. I gain important time by working in batch mode: I don’t read a paper in depth until I can deal with dozens of others on the same topic during the same week. When I finally am ready to read what has been collected about a topic, I might find out that I can zoom ahead because most of it is eminently forgettable for my purposes. On the other hand, I might discover that it’s fundamental and deserves weeks of study; then I’d have to edit my website and push that number 2015 closer to infinity.

Andrew: In late 2006, you were diagnosed with prostate cancer. How is your health today?

Donald: Naturally, the cancer will be a serious concern. I have superb doctors. At the moment I feel as healthy as ever, modulo being 70 years old. Words flow freely as I write TAOCP and as I write the literate programs that precede drafts of TAOCP. I wake up in the morning with ideas that please me, and some of those ideas actually please me also later in the day when I’ve entered them into my computer.

On the other hand, I willingly put myself in God’s hands with respect to how much more I’ll be able to do before cancer or heart disease or senility or whatever strikes. If I should unexpectedly die tomorrow, I’ll have no reason to complain, because my life has been incredibly blessed. Conversely, as long as I’m able to write about computer science, I intend to do my best to organize and expound upon the tens of thousands of technical papers that I’ve collected and made notes on since 1962.

Andrew: On your website, you mention that the Peoples Archive recently made a series of videos in which you reflect on your past life. In segment 93, “Advice to Young People,” you advise that people shouldn’t do something simply because it’s trendy. As we know all too well, software development is as subject to fads as any other discipline. Can you give some examples that are currently in vogue, which developers shouldn’t adopt simply because they’re currently popular or because that’s the way they’re currently done? Would you care to identify important examples of this outside of software development?

Donald: Hmm. That question is almost contradictory, because I’m basically advising young people to listen to themselves rather than to others, and I’m one of the others. Almost every biography of every person whom you would like to emulate will say that he or she did many things against the “conventional wisdom” of the day.

Still, I hate to duck your questions even though I also hate to offend other people’s sensibilities—given that software methodology has always been akin to religion. With the caveat that there’s no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, let me just say that almost everything I’ve ever heard associated with the term “extreme programming” sounds like exactly the wrong way to go…with one exception. The exception is the idea of working in teams and reading each other’s code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.

Here’s a question that you may well have meant to ask: Why is the new book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1? The answer is that computer programmers will understand that I wasn’t ready to begin writing Volume 4 of TAOCP at its true beginning point, because we know that the initialization of a program can’t be written until the program itself takes shape. So I started in 2005 with Volume 4 Fascicle 2, after which came Fascicles 3 and 4. (Think of Star Wars, which began with Episode 4.)

Finally I was psyched up to write the early parts, but I soon realized that the introductory sections needed to include much more stuff than would fit into a single fascicle. Therefore, remembering Dijkstra’s dictum that counting should begin at 0, I decided to launch Volume 4 with Fascicle 0. Look for Volume 4 Fascicle 1 later this year.

References

[1] My colleague Kunle Olukotun points out that, if the usage of TeX became a major bottleneck so that people had a dozen processors and really needed to speed up their typesetting terrifically, a super-parallel version of TeX could be developed that uses “speculation” to typeset a dozen chapters at once: Each chapter could be typeset under the assumption that the previous chapters don’t do anything strange to mess up the default logic. If that assumption fails, we can fall back on the normal method of doing a chapter at a time; but in the majority of cases, when only normal typesetting was being invoked, the processing would indeed go 12 times faster. Users who cared about speed could adapt their behavior and use TeX in a disciplined way.
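
The speculation scheme in this note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anything taken from TeX: the `typeset` function, the `DEFAULT_STATE` dictionary, and the chapter records are hypothetical stand-ins, and "doing something strange" is modeled as a chapter changing a shared setting. Python threads are used only to show the structure; they would not by themselves deliver the 12-fold speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "default logic" that every chapter is assumed to start from.
DEFAULT_STATE = {"font": "cmr10"}


def typeset(chapter, state):
    """Stand-in for typesetting one chapter; returns (output, state_after)."""
    output = f"{chapter['title']} set in {state['font']}"
    state_after = dict(state)
    state_after.update(chapter.get("changes", {}))
    return output, state_after


def typeset_sequentially(chapters):
    """The normal method: one chapter at a time, carrying the state along."""
    results, state = [], dict(DEFAULT_STATE)
    for chapter in chapters:
        output, state = typeset(chapter, state)
        results.append(output)
    return results


def typeset_speculatively(chapters):
    """Typeset all chapters at once, then repair any failed guesses."""
    # Speculate: assume each chapter starts from the default state.
    with ThreadPoolExecutor() as pool:
        guesses = list(pool.map(lambda ch: typeset(ch, DEFAULT_STATE), chapters))

    results, state = [], dict(DEFAULT_STATE)
    for chapter, (output, state_after) in zip(chapters, guesses):
        if state == DEFAULT_STATE:
            # The assumption held, so the speculative result is valid.
            state = state_after
        else:
            # An earlier chapter changed the state: redo this one normally.
            output, state = typeset(chapter, state)
        results.append(output)
    return results


# Hypothetical input: chapter B changes the font, so only C must be redone.
chapters = [
    {"title": "A"},
    {"title": "B", "changes": {"font": "cmss10"}},
    {"title": "C"},
]
print(typeset_speculatively(chapters))
```

When no chapter changes the state, every speculative result is kept and the checking loop does no retypesetting, which is the case the note expects to be the majority.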

Andrew Binstock is the principal analyst at Pacific Data Works. He is a columnist for SD Times and senior contributing editor for InfoWorld magazine. His blog can be found at: http://binstock.blogspot.com.

Gizmodo Bill Gates Interview – CES 2008 Las Vegas