Wednesday, October 09, 2024

Nobel Prize in Physics to...Hopfield and Hinton for Artificial Neural Networks?!?

 You have probably heard by now, but about twelve hours ago the Nobel Prize in Physics was awarded to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks”. To say that this prize is surprising is an epic understatement. It's also causing some consternation and even anger in some quarters. I don't normally feel qualified to comment on the Nobels, but in this case let me record just a few thoughts that are enhanced by conversations with and social media postings by a few friends.

Hopfield was faculty at Caltech when I was there, but was oriented more toward biology at the time, and I wasn't aware enough to take the really important and interesting classes. He taught a class on neural networks early during my time, which a few of my friends took but I didn't. In 1981-83, he, Carver Mead, and Richard Feynman taught a triple-listed Bi/CS/Ph course that was reportedly wild.  I'm sorry I missed that! (I did take Feynman's class later, in 1985-86, that was a direct descendant of that class. We learned a little about neural networks, and in fact a couple of friends and I implemented one in C, though it wasn't very good. That class changed how I view the process of computation, and indirectly led me into quantum computing several decades later.)

One of my closest friends took a class Hinton co-taught back in the early 80s at CMU. She said it was fascinating, all the way back then.

On the prize itself, at least one of my physicist friends is angry. Paraphrasing, it's not physics and why should we be celebrating things that detract from human accomplishment? I don't agree with the latter, but the former will be the basis of bar arguments for a long time to come.

My opinion? Hmm...

It's not even entirely clear to me, after watching the press conference, whether the prize is being awarded because neural networks are physics, or because they have changed the way people do physics. The former figured prominently in the press conference, as they talked about Boltzmann machines.  Yes, Hopfield and Hinton both used ideas from their background in physics, but to me this seems to be a bit of a stretch. The committee also talked about the latter, about the use of neural nets in doing physics. In that case, why not the inventor of the supercomputer, or the laptop?  The inventors of the Intel 4004 (Faggin, Hoff, Mazor and Shima), most often cited as the world's first microprocessor, are all still alive. The invention of extreme ultraviolet photolithography is another good candidate.

I've heard funny takes on this prize, including that the tech bro billionaires bribed the committee.  My own theory is that it was ranked voting in the committee: everybody refused to budge on their first choice, but somehow they all ranked AI high enough that in the end Hopfield and Hinton had the most points, and everyone on the committee went, "wait, what???" But they couldn't undo the decision.

That, or they were all angry that Hopfield was left off of Hinton's 2018 Turing Award, shared with Bengio and LeCun. (Speaking of which, I said that Hinton was the first to receive both the Turing Award and a Nobel, but that's not true -- Herb Simon did it first! Interestingly, both Simon and Hinton were on the faculty at CMU.)

I do think it's okay that the committee is stretching the definition of "physics"; they have done that before, with prizes for information technology. But with only a single prize awarded each year in physics, there are many, many discoveries, and discoverers, out there waiting for the public recognition they most definitely deserve. There are other prizes for computing, of course, notably the Turing Award. A broad look at how physics has changed society is a good thing, but I think it would be okay to say, "Nah, they already have a prize in their main area, that should be enough."

But in the end, it's recognition of the importance of machine learning, neural networks and ultimately artificial intelligence in our lives already, a fact that will continue to grow. And if it gives Hinton (and others) more of a platform for and recognition of his (and their) reservations about where we are headed with AI, all of that's a good thing. It's a necessary conversation we need to be having, now.

Finally, the score so far for the science prizes this year:

  • White men from elite institutions: 4
  • Everyone else: 0
We'll see when the Chemistry prize is announced this evening if the rest of the world gets skunked again!



Thursday, October 03, 2024

Strength

A friend of mine on Facebook said they think the U.S. needs a strong leader, and that Donald Trump is that person.

I disagree. Trump is not strong.

Calling your opponent "mentally retarded" is not strong.
Stoking fear and hatred of immigrants instead of asking "what brings people to our shores, and how can we help them get the same opportunities we have had?" is not strong.
Bragging about not paying overtime is not strong.
Failing to pay the arenas where you have held rallies is not strong.
Refusing to admit losing an election is not strong.
Being obsessed with and afraid of sharks is neither particularly strong nor particularly weak, but failing to recognize their importance to the ecosystem is definitely not strong.
Nuking hurricanes is not strong.
Threatening to jail your political opponents and reporters who write stories you don't like is not strong.
Speaking of reporters, mocking a talented reporter who happens to have a physical impairment is not strong.
Mocking those with different accents, dress, speech patterns, body types or other differences is not strong; very likely those people speak multiple languages, have a broader experience of the world, and still struggle every day in a culture not their childhood one -- those people, they are strong.
Allowing yourself to simply take as your own opinion the opinion of whoever talked to you last is not strong.
Torpedoing a bipartisan deal on the southern border simply to allow the situation to fester long enough to become a campaign issue is not strong.
Failing to stand up to extremists in your own party is not strong.
Giving Putin whatever he wants is not strong.
Insulting women, assaulting women and taking away their constitutional rights most definitely is not strong.

I could go on.

But finally, attempting to overthrow the U.S. government and install yourself as some sort of petty tyrant because you are not strong enough to lose with grace is weak and cowardly.

Donald Trump is the opposite of strong in every way.

Tuesday, September 03, 2024

Spelunking CACM, vol. 21 (1978): Cray-1, RSA, CSP, and more, more, more!


WOW, the January issue is a must-read!

 It's chock full of papers on some of the most important architectures, both contemporary and historical, in a special issue, though at a glance I don't see an introduction. (But I don't see the front matter in the Digital Library...hmmm?)

  • The Manchester Mark I and Atlas. 'Nuff said. Know your history, kids. Interesting that these two bracket the transistor computer, first developed at Manchester.
  • MU5: This one, I wasn't familiar with, but it's from Manchester, one of the most important places in computer architecture history. Manchester's accomplishments include building the world's first transistor computer. (Wait, didn't I just say that?)
  • Sperry-UNIVAC 1100: a still-active family of 36-bit mainframes (36 bits???), the first of which used vacuum tubes, with later members built from transistors and then SSI and MSI (I presume). Although the names in the series followed the 11xx convention, the later ones weren't compatible with the earlier ones. Interestingly, this architecture used ones-complement integer arithmetic instead of twos-complement, so it's theoretically possible to have both +0 and -0 values. Like the DECsystem-10 below, it supported unlimited indirect addressing.
  • DECsystem 10: TOPS-20 systems, the next generation of this system, were workhorses at USC/ISI when I first worked in the computer center there. This architecture has a couple of clever, elegant features: essentially a single addressing mode, where the registers are the lowest handful of addresses, and indirection, where any pointer you follow has a bit that says, "Treat this one as a pointer instead of the data you're looking for, and follow it." Yes, that could recurse (a toy sketch of this indirection follows this list). The OS was innovative, too. Oh, and did I mention that this marvelous machine has a word length of...36 bits?!? (Didn't I just say that?)  And that a byte can be one of several sizes, with 7 and 9 bits being the most common?
  • The Cray-1: the most important supercomputer in history, perhaps, and a beautiful, elegant piece of work. Check out that gorgeous image at the top of this posting. Freon liquid cooling, chained 64-bit floating point vector operations working from vector registers, bipolar memory chips, dual-rail encoding to reduce fluctuation in power and signal noise, and the totally 1970s naugahyde love seat around its set-of-wedges circular design. 138 sustained megaFLOPS, what performance! Fastest machine in the world.
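
Since the DEC-10's recursive indirection is easier to see in code than in prose, here is a toy sketch in Python. The word layout (just an "indirect" flag plus an address-or-data field) and the little memory map are my own simplifying assumptions for illustration, not the real 36-bit instruction format.

    # Toy model of DEC-10 style indirection: each word carries an "indirect"
    # bit; if it is set, the address field is followed again, recursively.
    memory = {
        0o100: (True,  0o101),   # indirect -> go look at 101
        0o101: (True,  0o102),   # indirect again -> 102
        0o102: (False, 42),      # finally, the operand itself
    }

    def fetch(addr, depth=0):
        """Follow indirection until a word with the indirect bit clear is found."""
        if depth > 64:                      # guard against a pointer loop
            raise RuntimeError("indirection loop?")
        indirect, value = memory[addr]
        if indirect:
            return fetch(value, depth + 1)  # value is another address
        return value                        # value is the operand

    print(fetch(0o100))   # -> 42

The recursion bottoms out at the first word whose indirect bit is clear, which is exactly the behavior described above.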

I'm tempted to just stop here and declare victory after January, but I suppose I should at least glance through the other issues...let's see, another report on status of women and minorities in CS faculty...

Ahem. Ah. Let's see, this might be important:

The RSA (Rivest-Shamir-Adleman) public key cryptosystem. Yeah, I'd say that's worth mentioning. Oh, and then Merkle, too. And at the end of the year Needham-Schroeder. A good year for cryptography.

Backus's Turing Award lecture. Want to think about programming languages? Read this first.

It was also an incredible year for the theory of distributed systems. Lamport on Time, clocks, and the ordering of events in distributed systems. One of the classic, classic papers in distributed systems. I still have my students read this, almost half a century later. Tony Hoare's Communicating Sequential Processes. Another of the classic, classic papers. And Per Brinch Hansen's own model for distributed programming. His affiliation is listed as USC, but I never met him; I have no idea when he left USC.

My sister's dog likes to lie on her back and just wallow.  I'm tempted to do that while reading all of these fantastic papers!

Monday, July 08, 2024

Spelunking CACM's Second Decade

 As you know, I am working my way through Communications of the ACM, beginning to end, stopping to read a handful of articles from each year. These aren't necessarily the papers with the highest citations, but instead things that catch my eye from the point of view of the 2020s.  A full two years ago I finished the first decade of CACM; now I have finished the second, so let's index them. (Hmm, I've fallen out of the habit of giving them descriptive names, I should go back to that.)

Sunday, July 07, 2024

Spelunking CACM, Vol. 20 (1977)


I have finally reached my twentieth article in this series! It's probably time for another summary, but first let's do the things that caught my eye in 1977. This was the year that "Star Wars" (with the simple title) came out. I was eleven. I probably wasn't yet particularly aware of computers and computing, but I was already a fan of "Star Trek" and of the space program (which was in the Apollo-Soyuz/space shuttle interlude at the time).

When I first skimmed the contents for 1977, I flagged eighteen articles for further investigation. For a systems person like me, there was a lot to catch the eye. I've winnowed it down, but if you look there are still more good papers than just these.

If you're interested in data about demographics and numbers of students in CS, there is a report following on from the previous year.  What's depressing is not how bad it was in 1976, but how bad our diversity remains today.

As long as we're on the topic of education, it was already possible in 1977 to put together a survey of 200 publications on CS ed. ACM's first curriculum committee issued its report in 1968, though, and SIGCSE was founded in 1970, so it should be no surprise. The CS GRE was also introduced recently and analyzed in CACM.

Dorothy Denning and Peter Denning give an elegant description of how to certify information flow in programs for security purposes. Consider, for example, the simple statement

    y := x

If the variables y and x are in separate security classes, then there is a potentially unauthorized information flow between the classes. After the execution of this statement, the value of y will equal the value of x, whether we considered that acceptable or not. The Dennings go on to discuss a complete method for static, compile-time analysis of this information flow.
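
To make the certification idea concrete, here is a minimal sketch of the check for that one statement. The two-level ordering (low may flow to high, but not the reverse) and the class assignments for x and y are my own illustrative assumptions, not the Dennings' notation or their full lattice model.

    # Toy static information-flow check in the spirit of Denning & Denning:
    # an assignment "target := source" is certified only if information may
    # flow from source's class to target's class.
    FLOW_OK = {("low", "low"), ("low", "high"), ("high", "high")}

    security_class = {"x": "high", "y": "low"}

    def certify_assignment(target, source):
        return (security_class[source], security_class[target]) in FLOW_OK

    print(certify_assignment("y", "x"))   # False: high -> low is an unauthorized flow
    print(certify_assignment("x", "y"))   # True:  low -> high is fine

A real certifier, of course, also has to handle implicit flows through control structures, which is where much of the paper's machinery goes.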

Morgan and Levin provided a mathematical means of assigning files to network nodes to optimize overall system performance. The eye-catching hypercube with cut planes at the top of this blog posting is taken from the article. They even consider multiple copies of the files in the networks, and divide traffic into "query" and "update" and consider them separately. They recognize that assigning the files to the nodes is an exponentially complex problem, and provide some heuristics for avoiding searching the entire space.
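
Just to give a flavor of what a placement heuristic can look like, here is a toy greedy sketch: put each file at the node that minimizes total expected query traffic. The node set, query rates, and unit hop costs are invented for illustration, and this is emphatically not Morgan and Levin's model, which (as noted) also handles multiple copies and treats query and update traffic separately.

    # Greedy single-copy file placement: each file goes to the node that
    # minimizes total traffic, given per-node query rates (all numbers made up).
    import itertools

    nodes = ["A", "B", "C"]
    query_rate = {                        # queries per second from each node
        "f1": {"A": 10, "B": 1, "C": 1},
        "f2": {"A": 2,  "B": 8, "C": 2},
    }
    hop_cost = {(a, b): (0 if a == b else 1)
                for a, b in itertools.product(nodes, repeat=2)}

    def traffic(file, home):
        """Total traffic cost if 'file' is stored at node 'home'."""
        return sum(rate * hop_cost[(src, home)]
                   for src, rate in query_rate[file].items())

    assignment = {f: min(nodes, key=lambda n: traffic(f, n)) for f in query_rate}
    print(assignment)   # {'f1': 'A', 'f2': 'B'} for these made-up rates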

Speaking of networks, Tajibnapis developed a routing protocol for the MERIT network in Michigan. His article on TIP was originally submitted in 1974, and finally published in 1977. To me, the NETCHANGE protocol specifying the message contents for routing protocols sounds a lot like RIP, which was developed quite a bit later. However, TIP doesn't seem to have had a lot of impact; I'm not sure why. Of course, even by 1974 Dijkstra's shortest path first algorithm was fifteen years old, and there was quite a flowering of work on routing in the mid-1970s, so it's likely that there was other, similar work. MERIT would go on to be a pivotal place for networking research for decades to come.
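
Since NETCHANGE reads to me like an early distance-vector scheme, here is a minimal Bellman-Ford-style sketch of that general idea: each node repeatedly improves its distance estimates using its neighbors' advertised distances. The three-node topology is invented, and this is my illustration of the protocol family, not Tajibnapis's actual message format.

    # Minimal distance-vector routing sketch (RIP/NETCHANGE flavor).
    INF = float("inf")
    nodes = {"A", "B", "C"}
    links = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5}   # symmetric costs

    def neighbors(n):
        nbrs = {}
        for (a, b), c in links.items():
            if a == n:
                nbrs[b] = c
            elif b == n:
                nbrs[a] = c
        return nbrs

    # dist[x][y] = x's current estimate of its distance to y
    dist = {x: {y: (0 if x == y else INF) for y in nodes} for x in nodes}

    changed = True
    while changed:               # iterate until no estimate improves
        changed = False
        for x in nodes:
            for nbr, cost in neighbors(x).items():
                for dest in nodes:
                    via = cost + dist[nbr][dest]
                    if via < dist[x][dest]:
                        dist[x][dest] = via
                        changed = True

    print(dist["A"]["C"])        # 2: A -> B -> C beats the direct cost-5 link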

In operating systems work, Lamport wrote about readers and writers, which I still teach about in my OS class today. Fundamental stuff, though this article is a little...rococo in its notation, I think.

Parallel computing was in its earliest phase of trendiness. While I associate a lot of the theory of correctness in parallel programming with K. Mani Chandy, there is a paper here on how to prove parallel programs correct, using Susan Owicki's techniques and an implementation of Dijkstra's on-the-fly garbage collector as the example. As author David Gries says,

Building a program with little regard to correctness and then debugging it to find errors is even more folly for parallel programs than it is for sequential programs.

Word.

And finally, if you want some history to go with your history, there are Rabin's and Scott's Turing Award lectures, given for their work on nondeterministic automata some years prior.


Monday, June 24, 2024

Lynn Conway

I was in a Starbucks when I saw Dave Farber's message on his IP mailing list saying that Lynn Conway had passed away, and I said out loud, "Oh, no!" and started crying. She was a hero and an icon, and stands high in our technical pantheon.

Of course, every person has the fundamental human right to live as they are, as they understand themselves to be, and the rest of us get no say in who they are. Saying, "It's okay that Freddie Mercury was gay, he was an amazing singer and artist," fundamentally misunderstands this. The vast majority of LGBTQ+ people are perfectly ordinary people, and that is perfectly fine. Margo Seltzer said, "It is not the job of the underrepresented to solve underrepresentation," and the same is true for other aspects of life as a minority. It's the job of the majority to change ourselves to be accepting; no minority should be required to step up and be a hero. So, Lynn "owed" no one her work as an activist; it was a role she chose late in life, and we should be grateful for it.

FWIW, it took IBM 52 years to get around to apologizing for firing her (falling just short of the 55 years it took the UK government to apologize for chemically castrating Alan Turing for being gay).
https://www.nytimes.com/2020/11/21/business/lynn-conway-ibm-transgender.html

As it happens, I was talking to a couple of students yesterday about citation counts for researchers in computer science and electrical engineering, and we found a website where the top researcher has half a million citations. You won't find Lynn's name on a list like that, and yet I would put her contribution far above almost everyone on such a list. She received dozens of awards, but far fewer than she deserved, IMO. There would BE no "chip industry" without her.  Pretty much everything else in our research field and our entire industry...is secondary.

Wikipedia tells me that IEEE Solid-State Circuits Magazine published a special issue on her career in 2012. I didn't know that, but it was well deserved. Her own reminiscences are worth reading -- every sentence.
https://ieeexplore.ieee.org/document/6392995
https://ai.eecs.umich.edu/people/conway/Memoirs/VLSI/Lynn_Conway_VLSI_Reminiscences.pdf

We all owe her a tremendous debt. Write her name in the history books, and then go and pay it forward. I'll tell my Computer Architecture class and my quantum computing research group about her tomorrow. I didn't know her in person, but I probably won't be able to keep my eyes dry.

[written right after hearing about her passing, posted a couple of weeks later.]

[edit: obituaries:

]

Tuesday, June 04, 2024

Gordon Bell



 Gordon Bell has passed away.

Gordon was one of the most important computer architects of all time. He designed, co-designed or was executive lead in charge of most of Digital Equipment Corporation's key machines in its heyday, from the initial PDP-1 to the VAX-11, in two stints working for the company. Although he wrote the seminal description of the PDP-11, perhaps the most influential minicomputer of all time, I don't think he had much to do with that design or with the PDP-10, and by the time of the Alpha he had already left DEC for good. But still, much of the company's design sensibilities grew from him.

I learned a great deal from the book he coauthored on computer engineering, using all of the DEC machines as examples. Most of the chapters were coauthored by Gordon and other members of the various technical teams. (Wow, I paid ten bucks for my copy at a DECUS in 1986, but even the Kindle version now goes for a hundred bucks?!?)

He also established the Gordon Bell Prize for accomplishments in parallel computing. One of the recent prizes was for simulation of quantum mechanics, though nothing to do with quantum computing.

RIP Gordon, thanks for the machines and for being such an important advocate of parallel computing.



Thursday, May 30, 2024

Questions about QKD

 I think quantum key distribution is fascinating, but unlikely by itself to serve as reason enough to build a Quantum Internet. Keeping in mind that I am not directly a QKD researcher, in my opinion there are several major hurdles limiting adoption of QKD today:

  • Range of QKD is limited (until we build a multihop, long-distance network).
  • Boxes are expensive, not robust, and require a lot of operational expertise.
  • Attacking QKD deployments is trivial; it's designed to detect eavesdroppers, so by its very nature acting as an eavesdropper is equivalent to launching a DoS attack.
  • Interoperability, standards and global operational confidence are still works in progress.
  • Market pull is still limited, because the problem it solves -- generating shared random or near-random bits secure enough to be used as encryption keys(*) -- still isn't tops on the list of pain points for Chief Security Officers, AND there is a classical solution in the offing (PQC) that requires "only" software and protocols, no new hardware.
  • Latency to start up a connection is orders of magnitude too high to be useful at e.g. the HTTPS level, so it has at best a specific and limited role in systems, e.g. network-to-network IPSec tunnels.
With that in mind, I recently had a discussion with a QKD researcher about how to evaluate new ideas in the area. We came up with about thirteen questions/concerns/metrics answering the question, "What do we want to improve?":
  1. Steady-state key generation rate
  2. Robustness against noise
  3. Fraction of raw resources dedicated to detecting an eavesdropper
  4. Robustness against some known attack (e.g., detector blinding or entangling with qubits)
  5. Required classical communication bandwidth/latency
  6. Simplicity of quantum hardware implementation
  7. Startup time
  8. Preconditions (e.g., pre-shared key for authentication)
  9. Classical resources required, esp. randomness
  10. Ease of integration into classical security systems
  11. Ability to use in a heterogeneous quantum network environment (e.g., full end nodes with memory v. measurement-only end nodes)
  12. Demands on or benefits to quantum network operations (e.g., link tomography or network routing)
  13. Extension to multi-party protocols
Simply playing with ideas, such as "I found this cool thing while looking at quantum graph states...", is great, and important, and that's where highly original stuff comes from. But if you bring me an idea about QKD, I'm pretty likely going to ask which of the above things it improves, or if you think it has value for some additional reason.

(*) John Mattsson of Ericsson pointed out on the QIRG mailing list that it's not accurate to say the bits are used as keys. Instead, bits from any TRNG (true, or hardware, random number generator) are actually further processed through a KDF (key derivation function) or CSPRNG (cryptographically secure pseudo-random number generator), which is then fed to AEAD (authenticated encryption with associated data) before the key that is actually used is complete and finalized. He recommends a German government document on the topic.
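
For the curious, here is a minimal sketch of that extract-and-expand pattern, in the style of HKDF with HMAC-SHA-256. The random bytes are just a stand-in for the raw shared secret and the label is made up; this shows the shape of the processing, and is not a recommended construction for production use.

    # Raw shared random bits -> pseudorandom key (extract) -> session key (expand).
    import hmac, hashlib, os

    raw_shared_bits = os.urandom(32)       # stand-in for QKD/TRNG-derived bits
    salt = os.urandom(16)                  # public, authenticated salt
    info = b"example-app session key v1"   # context label for this derivation

    # Extract: concentrate the entropy into a pseudorandom key (PRK)
    prk = hmac.new(salt, raw_shared_bits, hashlib.sha256).digest()

    # Expand: derive the key actually handed to the cipher (first HKDF block)
    session_key = hmac.new(prk, info + b"\x01", hashlib.sha256).digest()[:16]

    print(session_key.hex())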

Some Technical Women I Admire

I frequently have need of a list of women whose technical work I admire, so I finally decided to list some of them here. I'm not going to do all of your homework for you; you'll need to look them up yourself. Most of the older ones are fairly obvious names, and have Wikipedia pages. A few aren't famous...yet.
  • Fran Bilas
  • Anne Broadbent
  • kc claffy
  • Lynn Conway
  • Agnes Meyer Driscoll
  • Deborah Estrin
  • Elizabeth Smith Friedman
  • Yvonne Gao
  • Mary Hall
  • Margaret Hamilton
  • Betty Holberton
  • Grace Hopper 
  • Mary Jackson
  • Mae Jemison
  • Betty Jean Jennings
  • Katherine Johnson
  • Kanchana Kanchanasut
  • Elham Kashefi
  • Hedy Lamarr
  • Ruth Lichterman
  • Barbara Liskov
  • Ada Lovelace
  • Margaret Martonosi
  • Kay McNulty
  • Maryam Mirzakhani
  • Mio Murao
  • Kae Nemoto
  • Emmy Noether
  • Poppy Northcutt
  • Keiko Okawa
  • Jennifer Rexford
  • Sally Ride
  • Jacquiline Romero
  • Mary Shaw
  • Eve Schooler
  • Donna Strickland
  • Dorothy Vaughan
  • Marlyn Wescoff
  • Suzanne Woolf
  • Lixia Zhang
  • and, of course, my own women students and colleagues!

Note that, of course (as with men), it's possible for someone to do amazing, important technical work but still not be a good person. However, of the ones here I know personally, I will say I admire almost all of them as people as well as scientists/technologists.

Saturday, May 18, 2024

A Bundle of New Quantum Internet / Interconnect Papers

A whole pile of our #QuantumInternet engineering papers hit the arXiv yesterday! I'll try to give just a short summary of each paper, so this post doesn't get insanely long. We are currently building a testbed quantum network, so a lot of this material addresses problems we are facing now or will in the very near future. We have three papers on use of entangled photon pair sources (EPPS nodes, in our lingo), a low-level engineering paper on getting the timing right, a paper on how to construct and use efficient switches, and a paper looking a little farther into the future at advanced techniques using complex, many-photon entangled states.

First up, a paper by Soon (all of these papers were highly collaborative in both research and writing, but I will identify each of them by the first author) on design and simulation of a particular link type, using entangled photon pair sources in the middle of the link as above. We call these MSM, or memory-source-memory, links. This kind of link was explored by Cody Jones back in the day, but not fully simulated (or implemented), which of course leaves holes in the needed protocol design and the possibility of dynamic effects that don't show up in an initial analysis. Interestingly, carefully throttling the entanglement attempt rate results in better throughput, a rather unexpected effect. We call the problem the mutual latch-fail phenomenon.
https://arxiv.org/abs/2405.09861

Soon followed that up immediately with a second paper on how longer, multi-hop paths behave when the links are different types. Reassuringly, paths where most of the links are one type but one is different seem to perform as well as completely homogeneous paths. This boosts our confidence that heterogeneous networks will work, and that introducing new technologies won't be a forklift/flag day upgrade.
https://arxiv.org/abs/2405.09862

Those first two apply pretty broadly to different types of network deployments: data center, LAN, WAN. In a third, related paper, Paolo Fittipaldi of Frédéric Grosshans' group simulated a particular subtype of MSM links: satellite! The figure above is probably too small to read here, so I recommend you actually read the paper. But with what we believe are realistic parameters, distribution of several entangled Bell pairs per second while the satellite is visible seems achievable. Performance is largely driven by the photon loss and latency (of course), but also (perhaps less obviously) by the amount of memory available at the ground stations. Extending this to examine the behavior of applications that use such a link, satellite-to-satellite connections, and networks that extend into metropolitan areas around the ground stations will make for interesting work.
https://arxiv.org/abs/2405.07589

Those first three make a pretty good set if you are interested in how entangled photon pair sources will behave in networks.

Since we are talking about preparing for heterogeneity, let me stick in a placeholder here for another paper that should appear momentarily. Poramet has done good work on how to combine purification and error correction in networks. Stay tuned for the paper.

Photons are most useful in our plans when they are entangled with something else (a memory or another photon), and come together in pairs so that we can measure them together, a technique known as entanglement swapping achieved via Bell state analysis or Bell state measurement. To accomplish this, we need good overlap of the arrival times of the photons, which we achieve by changing the length of the path that one photon follows. The basic idea is obvious, but it takes work to get the details right, and our team with Mori-san as first author worked them out. Rob Thew's gang is doing related work; see Secs. VII & VIII of a paper of theirs. We weren't aware of this particular paper before writing ours, but I think there is still value in the engineering analysis we present on tradeoffs in different possible configurations, both for single-hop and multi-hop paths.
https://arxiv.org/abs/2405.09881
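
As a back-of-the-envelope illustration of why path length is the knob here: the extra fiber needed to shift one photon's arrival is just the desired delay times the speed of light in the fiber. The group index below is a typical assumption for telecom fiber, not a number taken from our paper.

    # How much extra fiber delays a photon by a given amount?
    C = 299_792_458.0          # speed of light in vacuum, m/s
    N_GROUP = 1.468            # approximate group index of silica fiber (assumption)

    def extra_fiber_for_delay(delta_t_s):
        """Fiber length (m) whose propagation delay equals delta_t_s seconds."""
        return (C / N_GROUP) * delta_t_s

    print(extra_fiber_for_delay(100e-12))   # ~0.02 m: 100 ps is about 2 cm of fiber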

Given that we are trying to bring photons together in pairs, in order to connect more than two nodes we need some sort of switch, a problem that Marii has taken charge of. I have been interested in topologies for interconnections since first being exposed to the Caltech Cosmic Cube, back in the days of stone tablets. It turns out that the problem of matching pairs is different enough from the ordinary all-to-all reordering of inputs to outputs to give us the opportunity to optimize the design compared to Clos, Beneš, and other interconnects and sorting networks. Every optimization we can make will reduce photon loss, which is our biggest enemy. We have reduced the number of switch points by about a factor of two compared to basic designs, focusing on layouts restricted to a plane for photonic chip implementation, and we think our designs are optimal. The only related work we could find was Drost et al., who used exhaustive search to solve the problem up to 10x10, but without a general solution and without considering whether the network can be reduced to a plane. Next, we need to extend our results to include non-planar designs suitable for connecting discrete switches, and prepare to implement this in a photonic chip.
https://arxiv.org/abs/2405.09860

Finally, the most technically obscure of the topics on the menu, Naphan's work on repeater graph states (RGSes). The challenge is that photons get lost, but the no cloning theorem and work related to quantum error correction tells us that we can't successfully reconstruct a quantum state if we lose more than half the photons. But with RGS, originally found by Azuma-san and with extensive work by the group of Sophia Economou, if you create a complex entangled state and can make quick decisions about the choice of what to do with the photons you do get, then you can handle losing certain subsets of photons. The theory is great, but implementing networks using these states isn't going to be easy, even aside from the physics problem of generating them correctly. Naphan has found ways to reduce the required classical communication by three (decimal) orders of magnitude. We continue to make progress toward practical implementation!
https://arxiv.org/abs/2405.09876

Whew! That's a ton of work, and we hope that it all helps bring closer the day when we can deploy quantum networks for production use, as a system interconnect for a multicomputer, in the data center, and in LANs, MANs, WANs and ultimately intercontinental quantum networks.

Oh, did I mention that the first authors of three of the above papers are undergraduates who just entered their third (junior) year? Bet you'll have trouble picking out which ones. Our students are amazing (both the undergrads and grad students).

Thanks for reading this long posting, and let us know if you have comments!

Monday, May 13, 2024

Important CS/CE Papers of Recent Years

What are your nominations for papers of the last decade (or so), ones that will show up in textbooks or change the fabric of what we do and how for decades to come?

Here are a few candidates my team (mostly quantum folks) and I came up with. Although I initially phrased it as "of the last decade", some took the liberty of going back a little farther, and I personally followed on from that with a few suggestions of my own.


(n.b.: I consider this to still be an open discussion, but am publishing so that people can comment)

Friday, May 10, 2024

IEEE QCNC

 Two papers accepted (out of two submitted) for the inaugural IEEE QCNC. Both on quantum networking, one on error management and one on links using entangled photon pair sources. See you in Kanazawa in July!


Tuesday, May 07, 2024

Spelunking CACM (1976)



 

Above are two tables from a report on production and employment of CS Ph.D.s in the U.S. I fear those ratios have not changed as much as they should in the intervening half century. One or two new Black CS PhDs a year. Typically single digits for women. And, of course, it was the 1970s, so the categories of minorities of interest were Females, Blacks, and Foreigners; my apologies to those who find the categorization offensive, which most people will by modern standards.  I have heard that the percentage of women PhDs was higher in 1980 than today, but it was clearly bad in the early 1970s. Pursuing a more complete comparison is a task for another time, but just a few months later CACM had an additional article on the status of women and minorities in CS. Read them both.

Allen and Cocke talked about directed graphs as representations of control flow and possible data modification in programs. I don’t think this is the introduction of the concept of data flow; certainly control flow and flow charts are much older than the mid-1970s. The paper does list work by Kennedy, Aho and Ullman on liveness and data flow. I don’t quite understand how they break loops and how they define “intervals”, which are roughly akin to small functions with a single entry point. But this is an important concept in understanding "liveness" of data, dependency in computation, and how to schedule operations. Without work like this, concepts as varied as garbage collection, caching and out-of-order instruction execution have no theoretical foundation.
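
As a reminder of what "liveness" means in practice, here is a minimal backward pass over a straight-line block: a variable is live at a point if some later statement reads it before it is redefined. The toy block is my own example, not the interval machinery from the paper.

    # Backward liveness over a straight-line block.
    # Each statement is (defined_variable, set_of_used_variables).
    block = [
        ("a", {"x"}),        # a = x
        ("b", {"a", "y"}),   # b = a + y
        ("a", {"b"}),        # a = b * 2   (the first 'a' is dead by here)
        ("z", {"a"}),        # z = a
    ]

    live = set()                           # live-out of the last statement
    live_after = []
    for defined, used in reversed(block):
        live_after.append((defined, set(live)))
        live = (live - {defined}) | used   # kill the definition, add the uses

    for defined, l in reversed(live_after):
        print(f"after defining {defined}: live = {sorted(l)}")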

Oliver Smoot presented a report on a WIPO meeting on protecting software. It wasn't the first such meeting; the UN authorized a group back in 1971.

Lampson and Sturgis have a beautiful (of course) paper, a retrospective on an OS designed at Berkeley. Of course they had the benefit of hindsight, but as always, they clearly laid out goals for the system, a philosophy, an analysis of what they thought was possible, then the system itself, and importantly, provided extensive discussion of what went wrong both technically and in project management. This paper alone could be a great model for how to write about computer systems.

Keller talked about formal proofs of parallel programs, two years before Hoare's CSP.

Perhaps the highest impact of the year was Metcalfe and Boggs on a little idea called Ethernet. They cite an earlier paper by Farber in the Feb. 1975 issue of Datamation magazine.

The number of papers I want to talk about each year continues to grow. I'll have to be more selective, so I can go back to doing deeper dives.

Wednesday, May 01, 2024

Seven IEEE Quantum Week Submissions!

 Hi y'all, I'm back!

Have spent the last two months in travel, start-of-academic-year work, and especially with my head buried in pushing out some important research. As a result, we have submitted not one, two, three, four, five, or even six papers, but SEVEN research papers (one company collaboration, one collaboration with France, one collaboration with Kanazawa as part of our Moonshot work, and four from AQUA at SFC; one application, one systems, and five communications papers) to IEEE Quantum Week. Oh, and a workshop proposal.

Add in that we have three other papers under review (two conference, one magazine), and counting that workshop proposal, I have eleven things under review right now. Geez. And I wonder why I'm tired...

Hoping to get caught up on some reading and writing during upcoming Golden Week holidays. I get a couple of days off! (But first, that urgent email...)

Monday, February 26, 2024

Nerd Nite Videos Online!

 

The full set of videos of Nerd Nite Tokyo talks, going back to June 2016, are now available on YouTube!

I have talked twice at NN:

Thanks to Andrew Todd for making the entire archive available. Join us at upcoming events!




Thursday, February 22, 2024

Spelunking CACM (1975): Phong Shading



Have you heard of the Phong shader? A shader is a small function or program that determines the color of each individual pixel in a computer-generated image. Gouraud’s 1971 shader is one of the simplest, both to understand and to implement. In 1975, along comes Bui Tuong Phong and completely changes the game. (Well, Wikipedia says he first published his algorithm in 1973, in his Ph.D. thesis.)

The figure on the left above uses a flat polygon shader by Newell, Newell and Sancha. Each polygon has a uniform brightness to it, a technique known as flat shading. The figure on the right uses Phong's new shader. The difference is incredible, isn't it? For the one on the left, you would think either that the vase is itself faceted, or that it's reflecting rectangles such as windows. The one on the right is much more natural.

Gouraud's shader doesn't flat-shade each polygon.  It begins with the set of normal vectors for each polygon, then it finds the normal at each vertex by averaging the normals of the surrounding polygons, a clever idea.  Then it interpolates linearly along each edge.  Finally, when creating the pixels in each scan line, it looks left and right from each pixel to find the nearest edge in each direction, and linearly interpolates from there. I'm a little surprised this works with polygons that are rather skewed, but if you think about it it's generally fine.

The problem arises because the interpolation is smooth within the polygon, but the slope of that interpolation has a discontinuity where you cross boundaries. This results in an effect known as Mach bands, where the human visual system assigns a ridge of brightness just to one side of that discontinuity, visible in the figure on the left below (or in the first figure if your browser shows them stacked).



To really see the Mach band effect, if we zoom way in on the figure on the left above, we get:



If you concentrate only on the region on one side of the boundary, you will see that the shading doesn't have the bright line that seems to be there in the zoomed out picture. It's just an artifact of your brain's image processing.

Most importantly, in Gouraud shading, the values for the pixels are linearly interpolated. With Phong shading, instead, normal vectors are interpolated. Once you have the normal vector for the point, you can then calculate the appropriate projections to determine specular reflection as well as actual value.  This means calculating cosines, or approximations of them, and taking square roots. It's a lot more math than Gouraud, but the quality of the results in the figure on the right above is obvious. This comes largely due to the now continuous first derivative as you cross polygon boundaries.
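
To pin the difference down in code, here is a deliberately stripped-down sketch: one light, a diffuse term only, a single pixel halfway across a scanline span. The vectors and the lighting model are invented for illustration; a real renderer does this per pixel across the whole polygon, with specular terms as well.

    # Gouraud interpolates intensities; Phong interpolates normals.
    import math

    def normalize(v):
        m = math.sqrt(sum(c * c for c in v))
        return tuple(c / m for c in v)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def lerp(a, b, t):
        return tuple((1 - t) * x + t * y for x, y in zip(a, b))

    light = normalize((0.0, 0.0, 1.0))
    n1 = normalize((0.6, 0.0, 0.8))     # normal at one end of the span
    n2 = normalize((-0.6, 0.0, 0.8))    # normal at the other end
    t = 0.5                             # pixel halfway across the span

    # Gouraud: light the endpoints, then interpolate the resulting intensities
    gouraud = (1 - t) * max(dot(n1, light), 0.0) + t * max(dot(n2, light), 0.0)

    # Phong: interpolate the normals, renormalize, then light the pixel
    phong = max(dot(normalize(lerp(n1, n2, t)), light), 0.0)

    print(round(gouraud, 3), round(phong, 3))   # 0.8 vs 1.0

For this span, Gouraud gives the pixel the average of the endpoint intensities (0.8), while Phong lights the interpolated normal, which here points straight at the light (1.0). Moving the interpolation from intensities to normals is what removes the derivative discontinuity at polygon boundaries.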

Oh, and a historical note: I mentioned in the 1974 spelunking that Utah was (and is) an important proving ground in computer graphics. Guess what? Both Gouraud and Phong were at the University of Utah. And, sadly, Phong died very young, of leukemia, indeed right after publishing this paper.

Shifting gears, Weber and Gilchrist (presumably both men, based on their given names) reported on progress of women in the field of computing, mostly from the point of view of wages in comparable jobs rather than job opportunities and hiring. Unfortunately they didn't have good numbers on scientific programming jobs and the like, but they did find modest growth in business analysts -- up to 13% women! Depressingly, their conclusion hoping for advances has only partially been realized:
In looking to the future, it will be interesting to observe whether or not the number of women graduates increases in the fields related to computer design and use. It will be even more interesting to see if the percentages of women in the higher levels of programming and systems analysis increase as the women now in the lower levels gain experience and become eligible for promotion... For the present, the conclusion must be reached that there is still a long way to go before the statistics will indicate that women enjoy complete equality with men in the computer field.

 Guy Steele himself wrote a long paper on garbage collection using a shared-memory multiprocessor. Very forward thinking. I'm actually a little surprised at the complexity of this.

Overall, to me, 1975 was a much quieter year than the explosive year of 1974.


Monday, February 19, 2024

Maebashi: the Second Cyber Civilization Research Center

I should have commented on this when the news came out a while ago: Professor Jiro Kokuryo will be retiring from Keio in a year, after which he will spread the gospel of Cyber Civilization by creating a new CCRC at Kyoai Gakuen University in Maebashi. Great news for him personally, as well as being the first outpost for our plan for global domination. Congratulations!

(Research) Distributed Systems that Influenced Me

 Recently I wrote about some heavily-used, commercial distributed systems that influenced me. They were the obvious ones: NFS, VAXclusters, X Windows. It's important to note that those systems, in turn, were influenced by the seminal work of the Xerox PARC team that did the Alto, Ethernet, the laser printer, the mouse and GUI, and object-oriented programming. I read about them but I didn't have the privilege of working with them.  The PARC outputs became commercial systems in the 1970s, but they were so radical that it is almost fair to call them research instead.

And, of course, before PARC there was the legendary Mother of All Demos. I'm not sure when I first heard about it; not until after much of my study, I think. Now you can watch it on YouTube, but in my formative years it wasn't available and so was myth more than actual influence.

My basic story goes something like this: I arrived at Caltech wanting to be an astronomer or paleontologist, but wound up drifting, studying CS, EE, and optics (Applied Physics). (The optics study pays big dividends now in my quantum research, especially Quantum Internet.) I learned to program a little, including in CS1 taught by Carver Mead and a great lab by my own advisor, Chuck Seitz, on 8-bit and 16-bit microprocessors. (If memory serves, we did assembly for 8088, 8086, Z-80, 6502 and 6800.) Thanks to Ron Ayres, who taught me silicon compilation and who worked for the MOSIS Service, I wound up with a job at USC/ISI in the Computer Center as VMS sysadmin supporting MOSIS, and discovered that I loved working on computer systems (a field I have never really left since).

At some point, I was wandering the ISI library, and pulled the (paper) proceedings of ASPLOS-III off the shelf, and basically devoured it cover to cover. (I was particularly intrigued by a paper showing how to balance vector and scalar performance in an architecture.) I decided I was interested in computer systems research, and applied to but wasn't accepted into Ph.D. programs at Caltech, MIT and UT-Austin. Instead, I entered the USC master's program part time while continuing to work at ISI. For some reason I picked EE-Systems (computer engineering) rather than CS. My advisor, I believe, was Alice Parker, but my true mentors were Kim Korner and Deborah Estrin, with a little influence from Peter Danzig, all from the CS department. It was with them that I studied distributed systems.

In one seminar class taught by Deborah and Kim, we read about sixty papers, including a few on AI and other broad topics that Kim felt like we should study. Kim's graduate-level OS class was essentially reading the important literature. I also took a summer class on federated databases by Peter, and a couple of parallel architecture classes, one that was more lecture style and one that was focused on the literature. Over those four classes focused on the computing systems research literature, I probably read in excess of two hundred papers. I should dig up those reading lists.  (I did the non-thesis option because I knew there were a lot of classes I wanted to take; in addition to those four and the architecture lecture, I took a compiler class, an OS lab class (which I would teach a few years later), probability, and queueing theory, for a total of nine classes, each of which was a lot of work.)

In that seminar-style class taught by Deborah and Kim, we studied these systems and concepts, and quite a few more (in no particular order):

  • Lamport’s clocks: To me, this was a revelation. It's one of those ideas that changes the way you think, which of course is one of the goals of education, right? (A tiny sketch of the clock rule follows this list.)
  • Byzantine generals (Lamport again): Since I was a VMS sysadmin, this one also helped me to understand what I was seeing when the VAXcluster hung. Not that it really helped me debug the cluster; usually, we just rebooted nodes until we found the one that was the problem, then the rest of the cluster would spring back to life.
  • Virtual time (USC and/or JPL): WOW! Speaking of ideas that changed my perception of the world, virtual time showed how optimistic computation can be used to create a system that computes on an effectively globally synchronous model. Jefferson wrote in terms of simulations that have a notion of time, but the techniques are far more general than just physical simulations. The paper lists Jefferson's affiliation as USC, but I think he was at JPL by the time I was reading this. I have never met him.
  • Amoeba (Vrije University, Amsterdam): Among the systems on this list, Amoeba probably influenced my thinking more than the others here, though it had important drawbacks that meant it was never competitive as a production system. It was just too different from UNIX at a time when that meant death. But its true distributed object model and the expectation that objects could execute on VAX, 68000-family or other processors seamlessly was a critical, and radical, goal. In fact, I tried to follow that model myself in a project I did in one of my grad school classes, shipping code from node to node across a socket for recompilation and execution at the far end.
    Among other features, it was more or less what we would call a microkernel, with services focused on supporting individual program objects rather than a full process as it is understood today. Communication between objects was a form of RPC.
    The 1990 CACM paper linked to above (which I think was the paper we read) doesn't mention the Amoeba project's ongoing, largest contribution to computing: the Python programming language.
  • Sprite (Berkeley): my primary memory of Sprite is its process migration, which was implemented early on and extended and studied into the mid-1990s. Its single-system-image approach is important and powerful, but not easy to make robust and high performance. Much later, I was impressed and intrigued by Mor Harchol-Balter's observation that process lifetimes follow a long-tailed distribution, making process migration more valuable than it might be otherwise. It is, however, difficult to pull off; you have to maintain open files and any names or services that the process was accessing before migration have to be transparent once moved. Once the process has moved, it also becomes more fragile; it is now dependent on not only its current machine environment, but the continued accessibility of the server from which it migrated, which must continue to provide some of these services. Making this robust is hard.
    And so, today, that migration isn't a feature of any major OS: instead, we migrate entire virtual machines in order to achieve load balance within a data center! This VM migration is often engineered so that post-migration, the VM is no longer dependent on the machine it just left, which improves robustness and allows that machine to be taken down for maintenance (or shut down altogether). This depends, of course, on any external devices -- most especially storage, in block or file servers -- being independently accessible.
  • V (Stanford): Across the bay from Berkeley, a separate distributed OS was being developed. One of the most interesting aspects of V was VMTP, a transaction-oriented network transport protocol that supported reliable multicast and was useful for RPCs to multiple servers at the same time. VMTP is one of the few protocols to arise from OS research of that era that resulted in an RFC. We considered adopting VMTP or elements of its design when I worked at Quantum on a network-attached storage system. I am intrigued that the only author listed on the RFC is David Cheriton himself.
  • LOCUS (UCLA): I was not particularly a USC partisan, so the cross-town rivalry didn't really figure into my thinking, but for some reason LOCUS didn't make a particularly strong impression on me. Although its goal was a single system image (SSI), neither the mechanisms nor the results seemed particularly radical to me at the time. Later I came to realize how valuable the remote I/O mechanisms were and how important Gerry Popek was; I still have my students read his formal description of virtualization from the early 1970s. Popek also was Ph.D. advisor to my friend John Heidemann, who has gone on to become one of the most-cited computer systems people of all time.
  • Grapevine (Xerox PARC): The granddaddy of them all. PARC is famous as the birthplace of the modern office LAN environment, with GUI workstations, file servers and network-connected laser printers. The Alto was the machine, and Grapevine was the distributed naming and messaging system that made it all work together.  Grapevine had a naming system which, to me, has echoes in LDAP; it was focused on naming of services rather than just machines, and so could be used for discovery as well as simple lookup. On top of this naming service was built a messaging (email, more or less) system as well as a method of finding file servers. It included authentication of users in ways that were left out of contemporary systems, a weakness that echoes down through today. It even ran in something of an inter-office fashion, though I think the federation of systems was probably a little wonky.  Less than a decade later, Grapevine would be displaced by Sun’s machines, NFS, RPC, LPD, and SMTP, all built on top of DNS, TCP, UDP and IP running over the Ethernet created at PARC. Arguably, Grapevine would have been a better platform for building those early services, though I don’t think the implementation language(s), APIs and libraries were well matched to the move to UNIX and C. I wish I had had some direct experience with Grapevine and Altos; I believe there were still a few around ISI when I arrived in 1985, though I don’t think they were in active use.
  • Condor (Wisconsin): the hunter of idle workstations. A great idea, scavenging CPU cycles from idle machines to use in an office-wide batch queueing system. Doing this well requires sharing file namespaces across machines.  I remain baffled that UNIX has never had a true distributed batch system for ordinary users to submit compute-heavy jobs.
  • Andrew File System (AFS) (CMU): When I first encountered it, AFS seemed to me like a "better NFS".  It doesn't have actual locking, but it caches files locally, allowing offline writes and working better in a wide-area environment. That wide-area ability comes from reducing latency for operations (because your current copy is nearby), from its security model, and in part because it uses TCP by default instead of UDP. This local copy approach also improves global performance and scalability, but it is not designed to support a truly shared write workload to the same file; it depends on a callback approach to resolving write conflicts, including when reconnecting after a network partition. One feature is that the AFS file namespace was essentially global because it incorporated the origin of the file in the name, almost but not quite a precursor to URLs.  AFS never quite took off as a one-for-one replacement for NFS, though, perhaps because NFS worked "well enough" for most installations most of the time, perhaps because AFS was initially free but later controlled by a company that wanted to commercialize it. Most likely it was just the NFS first mover advantage.
    A little more than a decade after studying it, while working at Nokia, I would propose using AFS in mobile phones as a file sharing system and automatic backup system for photos and documents, years before the iPhone, Google Drive and other cloud storage. Unfortunately, I didn't sell the vision well enough, and the idea didn't go anywhere. Sigh. That's one of my biggest regrets in my career. I think we could have changed the way people thought of mobile terminals well before the iPhone became a thing.
    Around Christmas one year, I did prototype IPv6 code for AFS; I think my code was partially or completely rewritten before becoming part of the main AFS distribution, though.
  • DNS (USC/ISI) (I think; we might have done this in another class.): I also played bridge and poker with Paul Mockapetris, who was at ISI, so it seemed a bit of a lark to be studying his work. I didn't grasp at the time how important DNS was, or how important Paul would become. Unlike most of the other things listed here, DNS isn't a programmable platform in and of itself, but the depth of its relationship to the key ideas we all study in distributed systems work cannot be overstated. The widely distributed, federated, high availability, soft failure, eventual consistency model is crucial to global-scale systems.
  • Plan 9 (Bell Labs) (I think; we might not have done this in class. It’s even possible that I didn’t really study it until returning to ISI in 1995.): Plan 9 has an extensive Wikipedia entry, so quite a number of people were apparently influenced by its "everything is a file" model, where processes, IPC, sockets, and everything else appeared somewhere in the file tree under /.  I think Plan 9 might have been the origin of /proc. Each process could also have its own namespace, an idea that now shows up in virtual root directories used as a common means of isolating processes. (chroot existed long before this, of course.) (Edit: Just two days after I published this, The Register published a very thorough description of Plan9, including its current state. The article was apparently based on a talk given by the author at the FOSDEM conference.)
  • Closure in Prospero (Washington): speaking of naming, this short paper by Cliff Neuman (who would shortly come to ISI, where he remains) has a closure, or a program that resolves names, to accompany each name. I liked this idea.
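
Since Lamport's clock rule is so small and so consequential, here is the minimal sketch promised above: increment on every local event and send, and on receive jump to the maximum of your own clock and the message's timestamp, plus one. The two processes and the single message are invented for illustration.

    # Minimal Lamport logical clocks.
    class Process:
        def __init__(self, name):
            self.name = name
            self.clock = 0

        def local_event(self):
            self.clock += 1
            return self.clock

        def send(self):
            self.clock += 1
            return self.clock            # timestamp carried by the message

        def receive(self, timestamp):
            self.clock = max(self.clock, timestamp) + 1
            return self.clock

    p, q = Process("P"), Process("Q")
    p.local_event(); p.local_event()     # P's clock: 2
    ts = p.send()                        # P's clock: 3, message stamped 3
    q.local_event()                      # Q's clock: 1
    print(q.receive(ts))                 # Q jumps to max(1, 3) + 1 = 4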

I think logical clocks, Byzantine generals, and virtual time were probably in Kim's advanced operating systems class, rather than the class with Deborah; in Kim's class we also read about Multics, VM (the IBM OS), and other already-classic systems that seemed a little alien to me and rather opened my eyes. And, of course, we learned about Mach, the epitome of the microkernel system and also a topic of research at ISI.

I mentioned that deviating from the UNIX model meant death for your system in the 1990s. This was, of course, the Killer Micro era. Microprocessors were coming out at a furious pace, with clock speeds and overall performance climbing to breathtaking levels, with the CISC architectures (x86, 680x0) being displaced by RISC (MIPS, Sparc, PowerPC, PA-RISC). Performance was moving so fast that there was no need to deploy multiple processors on ordinary programming tasks; by the time you got your code working, there was a processor with twice the performance available.

A glance at the origin institutions of the list above shows clearly why California was the place to be. Some versions of Silicon Valley’s history focus on the literal silicon (chips), and perhaps rust (disk drives). Some focus on PARC's outsized role. But it's important to be aware of the overall fecundity of systems work in general and California in particular in the 1970s, 80s and 90s.

Oof, more than enough for now. This entry has gotten a lot longer than I intended, even though it still only scratches the surface of ideas that have influenced my own thinking. Thanks for reading this far!

Sunday, February 11, 2024

Spelunking CACM, Vol. 17 (1974): Clipping Polygons, Parallelism, and UNIX

Well, we are finally within half a century of the present! And what a year 1974 was...

Up to this point, we have not had a lot of graphics-related articles, but of course Ivan Sutherland is the father of all. He and Gary Hodgman described clipping algorithms for polygons, in a well-cited paper. This paper is fairly long (11 pages) for this era in CACM, though a good fraction is figures and pseudocode. The authors were at Evans & Sutherland, spun out of the University of Utah in 1968, perhaps the first computer graphics company and the training ground for some of the most important people in the field, some of whom went on to found other companies. I love that the authors even consider polygons that surround the observer (as in the figure above). Also gotta like the sentence, "We are somewhat chagrined that the obvious extension of work on line clipping with which we have been involved [11] kept us so long from seeing the simplicity of the present approach."

Leslie Lamport showed how to parallelize an in-place computation on an n-dimensional array with nearest-neighbor dependencies by working along the diagonal from one corner toward the opposite corner, in a technique he called the hyperplane method. Every entry on a line (2D), plane (3D) or hyperplane (>3D) perpendicular to a line extending along the diagonal can be updated at once, safely, with the amount of parallelism in each time step growing as you get farther from the corner. It’s mathematically elegant, and might still be useful when each entry is some complex object, but today’s practice is to cache the previous value long enough to allow entire arrays to update at once, where “cache” here could be “send your current value to all neighbors in a message”.
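
Here is roughly what the hyperplane sweep looks like for a 2-D array in which each cell depends on its north and west neighbors: every cell on a given anti-diagonal is independent of the others, so all of them could be updated in parallel. The recurrence and array size are assumptions chosen for illustration, and the sweep is only simulated sequentially here.

    # Wavefront ("hyperplane") update: cells with i + j == d form one step.
    N = 5
    a = [[1] * N for _ in range(N)]      # boundary row and column stay 1

    for d in range(2, 2 * N - 1):        # sweep anti-diagonals from the corner
        cells = [(i, d - i) for i in range(1, N) if 1 <= d - i < N]
        # every (i, j) on this anti-diagonal could be updated in parallel
        for i, j in cells:
            a[i][j] = a[i - 1][j] + a[i][j - 1]

    print(a[N - 1][N - 1])               # 70 = C(8, 4), counting lattice paths

The available parallelism grows and then shrinks as the sweep crosses the array, just as Lamport describes.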

Since I worked on hierarchical storage management (HSM) systems, there is a soft spot in my heart for a paper proposing a heuristic for planning file placement on removable disk packs. Howard Lee Morgan's affiliation in this paper is listed as Caltech, which also intrigues me, since his Wikipedia page doesn't list Caltech. (At least, I assume it's the same guy; how many Howard Lee Morgans would have been in CS in 1974? And I can't find any other mention of the name associated with Caltech.)

Definitely have to cheer on my friend Mary Shaw with an early (her first?) CACM article, on efficient compilation. I have only known her since the COVID-19 pandemic and only remotely, but in this article she writes as clearly as she speaks today. (Edit: Mary tells me this was her thesis work.)

Ah, Hydra. Back in 1970 I talked about Per Brinch Hansen's Nucleus; this one is another important paper in the history of operating systems. It's the first full multiprocessing kernel of which I am aware, and talks about resources and objects in modern ways. Protection is provided via capabilities, and there is clear talk of separation of policy and mechanism. Capabilities can only be modified by the kernel. At the time of this paper, they did not yet have a file system, nor apparently users. The machine itself, C.mmp, was a set of PDP-11s modified to share memory through a crossbar switch. They also had a form of virtual memory, though I am not clear about whether VM was a protection mechanism and whether each process had its own address space. I don't think so; I think VM was used only to map the shared memory into the processor's address space.
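Just to illustrate the capability idea in miniature (a toy of my own devising, not Hydra's actual structures): user code holds opaque handles that pair an object with a set of rights, only the kernel can mint them, they can be narrowed but never widened outside the kernel, and every operation checks rights before touching the object.

    # A toy capability scheme, purely illustrative; Hydra's real design is far richer.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Capability:
        object_id: int
        rights: frozenset          # e.g. frozenset({"read", "write"})

    class Kernel:
        def __init__(self):
            self._objects = {}     # object_id -> data, visible only to the kernel

        def create(self, object_id, data, rights):
            self._objects[object_id] = data
            return Capability(object_id, frozenset(rights))

        def restrict(self, cap, rights):
            # capabilities can be narrowed (via the kernel), never widened
            return Capability(cap.object_id, cap.rights & frozenset(rights))

        def read(self, cap):
            if "read" not in cap.rights:
                raise PermissionError("capability lacks the read right")
            return self._objects[cap.object_id]

    k = Kernel()
    cap = k.create(1, "hello", {"read", "write"})
    print(k.read(cap))                 # allowed
    ro = k.restrict(cap, {"read"})     # a read-only copy to hand to someone else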

Yes, C.mmp and Hydra were from CMU, where Mary Shaw worked then and still works today; CMU has been important in computing for this entire half century.

And then, of course, there is a paper about another PDP-11 operating system, this one named UNIX. I'm shocked to see that, by ACM counting, it has still been cited fewer than a thousand times. C'mon, people, what are we doing? Maybe the counter has rolled over?

Interestingly, the UNIX paper is part of a special issue on the fourth SOSP, but the Hydra paper is not. Nor, for that matter, is Hoare's paper on monitors.

And this is without even talking about Dijkstra and Knuth. Heavens, what a year! If all of the years from this point forward are going to be this rich, I'm going to have to change the way I go about this -- there is too much to ingest, and too much for a single blog posting!

Sunday, January 28, 2024

Code Monkey and the Magnificent Seven

This is something I wrote back in 2007, slightly adapted; the original is available on the IP list archives.

I really was going to keep my mouth shut on this topic, but I just can't help myself; the topic is stuck in my brain and I have to get it out through my fingers.

This discussion has ranged from how CS is taught to the grand structure of society, touching on whether or not a college education is worthwhile, the place of the programmer in society and the company, choice of career, and the issues of globalization. Let me offer first a link that I think is an actual contribution to the discussion; the rest I'll let you judge for yourself :-). First the CS education, then the larger issue, including a few embryonic thoughts about apprenticeship.

On how to teach CS, let me point out that this issue has received attention at the highest levels. The ACM issues guidelines on curricula for:
  • computer science
  • computer engineering
  • information systems
  • software engineering
  • associate degree programs
  • K-12
and most have been revised since 2000. The CS undergrad curriculum recommends, for example, a core of only 18 lecture hours on operating systems, out of a total of 280 or so core hours. In a U.S. research university, that core roughly represents the sophomore-junior content of a CS degree; senior-year work would probably double it, to perhaps 560 hours of lecture and maybe three times that in homework, or a couple of thousand hours on CS alone, ignoring math, other sciences and engineering, and the humanities and social sciences. In other types of institution of advanced learning, the amount could be substantially less or quite a bit more.
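If you want to check my back-of-the-envelope arithmetic (mine, not the ACM's), it runs roughly:

    2 \times 280 \approx 560 \text{ lecture hours}, \qquad
    560 + 3 \times 560 = 2240 \approx \text{a couple of thousand hours on CS alone}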

Craig Partridge mentioned a workload for students that amounted to about 600 lines of code a week for the term. I would hope that master's students in CS would be able to work at that rate, but that's clearly too much to expect of first-year programming students. The ACM curriculum above includes a several-page discussion of the importance of teaching programming v. principles, which I won't go into here, but I highly recommend reading it.
(If you're keeping score on the anecdotal topics mentioned so far in this conversation: version control probably makes the core curriculum, while automata and shared libraries are in the supplementary curriculum. A couple of lectures on privacy, intellectual property and other aspects of computers in society seem like a positive thing.)
I'm not involved in the curriculum effort, but I suspect the guidelines are under more or less constant revision. In recent years, some universities have been adding programs in game design, bioinformatics, and business/technology dual majors. One can certainly argue that the overall CS curriculum is in flux and varies across the country (and around the world), but beyond that it's difficult to generalize.
The above, I'm fairly confident, is a positive contribution to the discussion. From here down, I'll let you form your own opinion. Much of what I would write Karl Auerbach has already written, and far more succinctly and eloquently than I seem to be capable of. But let me offer a twist or two, and go a step further.
Zach and other writers made several major points:

  1. The undergrad CS curriculum at many colleges/universities is inadequate.
  2. The undergrad CS curriculum at many colleges/universities is poorly taught.
  3. Students arrive with differing backgrounds and ability levels, and consequently come away with different learning experiences and knowledge.
There are several corollary points:
  • A. It's possible to pick up "on the street" what you need to know to do most jobs in a Silicon Valley startup.
  • B. Promotion at major research universities forces a focus on research rather than teaching.
  • C. CS as a major is not very attractive right now, for a variety of reasons, culminating in "programmers are treated like dirt in companies, and with all the jobs being exported to India, it's a bad career choice anyway".

I think all of these points except #3 are debatable. Certainly the discussion so far has not made a distinction between #1 & #2. I'm not really going to discuss most of these points directly, but they form the undercurrent in what follows.

For millennia, kids were apprenticed to a profession at an early age. We still have this, after a fashion. LeBron James, Michelle Wie, the Magnificent Seven (remember them?), and all pro Go players and musicians took up their vocations essentially full time well before what we consider "working age". Many of us started hacking at a similar age, but maybe we need formal extracurricular mentoring and a similar support system? A way for kids to "go pro" earlier and still get the coaching they need? Virtuosos almost all spend 20-40 hours a week at it from the age of fourteen or earlier; why should programming be any different?

I've heard people say, "Oh, who needs college?" But I'm a fan of the modern apprenticeship -- the Western liberal arts college degree. It's your chance to read Wilde, Plath, Joyce, Thoreau, the Federalist Papers; learn a foreign language (okay, I flunked German), play on the basketball team, take up scuba diving; study how the encroachment of the modern world is impacting the Mbuti pygmies and the Navajo; maybe do a year abroad (I wish I had). Personally -- call me a science chauvinist -- I think anyone who claims to have a well-rounded education should be able to talk intelligently about relativity (special, at least) and quantum theory, arguably the two most important (and misunderstood) discoveries of the twentieth century (fans of the nature of the chemical bond, the structure of DNA, the Big Bang and plate tectonics all have a case, though). (On a tangent, let me recommend Alan Lightman's The Discoveries and Richard Muller's course titled "Physics for Future Presidents".)

For most of you, college will be the broadest point of your life. For many -- including a particular skinny sixteen-year-old from southern West Virginia -- it's your first extended contact with people from other parts of the country and the world, and with the breadth and wealth of the world of ideas. Take time to hang out with the music majors (okay, Caltech doesn't have any of those, but it does have English majors). Consider a career in astronomy or geology instead of hacking. Trust me, once you're in a company and writing code, the world gets a whole lot narrower, even if you're working shoulder to shoulder with people from China, India and Finland.

If you're of a pragmatic bent and recoil in horror at the thought of whiling away your days trying to decipher Joyce and Eliot rather than starting that get-rich-quick Internet business, well, I have trouble imagining that anyone can be effective at building large-scale computer systems without some background in probability. Probability requires at least basic calculus, and if we can get you as far as queueing theory, it will change the way you think about systems problems. If you are interested in image or audio processing, you need linear algebra and Fourier transforms, and the basics of amplifiers and RC circuits won't hurt; if you're in communications, antenna design and waveguides help; if you write firmware, you gotta understand what the funny lines on a signal analyzer mean. And I have yet to meet the engineer (myself included) whose writing and presentation skills need no polishing. Note that none of that is included in the 560 lecture hours mentioned above, but I wouldn't trade any of it for another hour hacking code.
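As one tiny example of what queueing theory buys you (these are the standard M/M/1 formulas, not anything specific to the curriculum discussion): for a single server with Poisson arrivals at rate λ, exponential service at rate μ, and utilization ρ = λ/μ < 1, the mean number of requests in the system and the mean time each one spends there are

    L = \frac{\rho}{1 - \rho}, \qquad T = \frac{1}{\mu - \lambda}

Push utilization from 80% to 95% and L jumps from 4 to 19; once that nonlinearity is in your bones, it changes how you think about provisioning and performance.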
Is it possible to learn all of this without going to college? Of course! Two of the brightest people I know are extremely well-rounded, and neither finished college. Did it hurt their careers? Probably in a minor way. Are they successful anyway? Very. (And no, neither is named Gates or Jobs). If you're smart, disciplined, get a good break and a good mentor, you can do it. But for most people, the direction and semi-structure (compared to high school) of a university will bring them needed breadth, depth, and maturity -- and a higher probability of success in life. (No guarantees, though!)

It's also true that the university environment as currently constructed is not perfect for everyone. But for many people, bailing out with the excuse, "I wasn't learning anything anyway," is a true statement that says as much about them as it does about the system.

And lest you think I'm in favor of stuffing people's heads with facts but not teaching them to solve problems, let me reassure you: I tell my students that the only thing they *must* learn in college is how to think; everything else is optional. While I do believe there is a core curriculum I wish they would learn, four years is a short time, and a focus on any given technology is extraordinarily short-sighted when planning for a forty- to sixty-year career.

By now I've no doubt convinced you that I'm an unrepentant ivory-tower academic. However, I've spent about half my career to date in industry. Was I well-prepared when I came out of school? Let's say I was still a work in progress, and I owe more to a mentor and friend named Wook than to any professor. Certainly I have regrets, the largest of which is not taking Jim Blinn's graphics class, but most of them are due to my own shortcomings. But because I was taught how to learn and how to work hard, I have followed up with enormous amounts of learning, and at every step of the learning process I felt that my time at Caltech helped.

From here, I was going to springboard into the place of the programmer in society and globalization, but this is way too long already. Let me thank you for reading this far, and finish with four thoughts:
  • The Western liberal arts CS degree, properly taught and learned, has long-lasting value. The key question is, is it worth a thousand bucks a week for four years? 
  • Engineers are paid to solve problems, not to write code. Demonstrate an ability to learn about the problem domain in which you are working, and your prospects improve; breadth and depth help. 
  • What is it that makes Code Monkey such a pathetic figure? The fact that he's not in control of his own destiny. If you want that control (and, IMO, if you want a date), being articulate and well-rounded is the path to leadership.
  • Finally, keying on that last word, the job of a university is to provide productive members of society. You should graduate knowing not just a handful of technical facts, but how you can find the best role for yourself, where "best" makes you happy, earns you a buck or two, and allows you to contribute to the greater good.
Do we accomplish all of this in four years? Nah. Do we start students on the right path? Some of them. Should we figure out how to do our best for more of them? You betcha. Are professors perfect, giants striding the Earth before whom all should bow? No way. They're taught to swagger, but if they haven't been humbled deep down by their own learning experience, they haven't been paying attention.

One thing's for sure, though: the development of the next generation of leaders, both in technology and society, is far too important to be left to what disaffected teenagers pick up surfing the blogs of disaffected, semi-literate, technical and political ideologues.