Monday, February 19, 2024

(Research) Distributed Systems that Influenced Me

 Recently I wrote about some heavily-used, commercial distributed systems that influenced me. They were the obvious ones: NFS, VAXclusters, X Windows. It's important to note that those systems, in turn, were influenced by the seminal work of the Xerox PARC team that did the Alto, Ethernet, the laser printer, the mouse and GUI, and object-oriented programming. I read about them but I didn't have the privilege of working with them.  The PARC outputs became commercial systems in the 1970s, but they were so radical that it is almost fair to call them research instead.

And, of course, before PARC there was the legendary Mother of All Demos. I'm not sure when I first heard about it; not until after much of my study, I think. Now you can watch it on YouTube, but in my formative years it wasn't available and so was myth more than actual influence.

My basic story goes something like this: I arrived at Caltech wanting to be an astronomer or paleontologist, but wound up drifting, studying CS, EE, and optics (Applied Physics). (The optics study pays big dividends now in my quantum research, especially Quantum Internet.) I learned to program a little, including in CS1 taught by Carver Mead and a great lab class taught by my own advisor, Chuck Seitz, on 8-bit and 16-bit microprocessors. (If memory serves, we did assembly for 8088, 8086, Z-80, 6502 and 6800.) Thanks to Ron Ayres, who taught me silicon compilation and who worked for the MOSIS Service, I wound up with a job at USC/ISI in the Computer Center as a VMS sysadmin supporting MOSIS, and discovered that I loved working on computer systems (a field I have never really left since).

At some point, I was wandering the ISI library, and pulled the (paper) proceedings of ASPLOS-III off the shelf, and basically devoured it cover to cover. (I was particularly intrigued by a paper showing how to balance vector and scalar performance in an architecture.) I decided I was interested in computer systems research, and applied to but wasn't accepted into Ph.D. programs at Caltech, MIT and UT-Austin. Instead, I entered the USC master's program part time while continuing to work at ISI. For some reason I picked EE-Systems (computer engineering) rather than CS. My advisor, I believe, was Alice Parker, but my true mentors were Kim Korner and Deborah Estrin, with a little influence from Peter Danzig, all from the CS department. It was with them that I studied distributed systems.

In one seminar class taught by Deborah and Kim, we read about sixty papers, including a few on AI and other broad topics that Kim felt we should study. Kim's graduate-level OS class was essentially reading the important literature. I also took a summer class on federated databases taught by Peter, and a couple of parallel architecture classes, one that was more lecture style and one that was focused on the literature. Over those four classes focused on the computing systems research literature, I probably read in excess of two hundred papers. I should dig up those reading lists.  (I did the non-thesis option because I knew there were a lot of classes I wanted to take; in addition to those four and the architecture lecture, I took a compiler class, an OS lab class (which I would teach a few years later), probability, and queueing theory, for a total of nine classes, each of which was a lot of work.)

In that seminar-style class taught by Deborah and Kim, we studied these systems and concepts, and quite a few more (in no particular order):

  • Lamport’s clocks: To me, this was a revelation. It's one of those ideas that changes the way you think, which of course is one of the goals of education, right? (The rules are compact enough that I've included a toy sketch of them just after this list.)
  • Byzantine generals (Lamport again): Since I was a VMS sysadmin, this one also helped me to understand what I was seeing when the VAXcluster hung. Not that it really helped me debug the cluster; usually, we just rebooted nodes until we found the one that was the problem, and then the rest of the cluster would spring back to life.
  • Virtual time (USC and/or JPL): WOW! Speaking of ideas that changed my perception of the world, virtual time showed how optimistic computation can be used to create a system that computes on an effectively globally synchronous model. Jefferson wrote in terms of simulations that have a notion of time, but the techniques are far more general than just physical simulations (a toy sketch of the core rollback mechanism appears after this list). The paper lists Jefferson's affiliation as USC, but I think he was at JPL by the time I was reading this. I have never met him.
  • Amoeba (Vrije University, Amsterdam): Among the systems on this list, Amoeba probably influenced my thinking the most, though it had important drawbacks that meant it was never competitive as a production system. It was just too different from UNIX at a time when that meant death. But its true distributed object model and the expectation that objects could execute on VAX, 68000-family or other processors seamlessly were a critical, and radical, goal. In fact, I tried to follow that model myself in a project I did in one of my grad school classes, shipping code from node to node across a socket for recompilation and execution at the far end.
    Among other features, it was more or less what we would call a microkernel, with services focused on supporting individual program objects rather than a full process as it is understood today. Communication between objects was a form of RPC.
    The 1990 CACM paper linked to above (which I think was the paper we read) doesn't mention the Amoeba project's largest ongoing contribution to computing: the Python programming language.
  • Sprite (Berkeley): my primary memory of Sprite is its process migration, which was implemented early on and extended and studied into the mid-1990s. Its single-system-image approach is important and powerful, but not easy to make robust and high performance. Much later, I was impressed and intrigued by Mor Harchol-Balter's observation that process lifetimes follow a long-tailed distribution, making process migration more valuable than it might be otherwise. It is, however, difficult to pull off; you have to maintain open files, and any names or services that the process was accessing before migration have to remain transparently accessible once it has moved. Once the process has moved, it also becomes more fragile; it is now dependent not only on its current machine environment, but also on the continued accessibility of the machine from which it migrated, which must continue to provide some of these services. Making this robust is hard.
    And so, today, that migration isn't a feature of any major OS: instead, we migrate entire virtual machines in order to achieve load balance within a data center! This VM migration is often engineered so that, post-migration, the VM is no longer dependent on the machine it just left, which improves robustness and allows that machine to be taken down for maintenance (or shut down altogether). This depends, of course, on any external devices -- most especially storage, in block or file servers -- being independently accessible.
  • V (Stanford): Across the bay from Berkeley, a separate distributed OS was being developed. One of the most interesting aspects of V was VMTP, a transaction-oriented network transport protocol that supported reliable multicast and was useful for RPCs to multiple servers at the same time. VMTP is one of the few protocols arising from the OS research of that era to be published as an RFC. We considered adopting VMTP or elements of its design when I worked at Quantum on a network-attached storage system. I am intrigued that the only author listed on the RFC is David Cheriton himself.
  • LOCUS (UCLA): I was not particularly a USC partisan, so the cross-town rivalry didn't really figure into my thinking, but for some reason LOCUS didn't make a particularly strong impression on me. Although its goal was a single system image (SSI), neither the mechanisms nor the results seemed particularly radical to me at the time. Later I came to realize how valuable the remote I/O mechanisms were and how important Gerry Popek was; I still have my students read his formal description of virtualization from the early 1970s. Popek also was Ph.D. advisor to my friend John Heidemann, who has gone on to become one of the most-cited computer systems people of all time.
  • Grapevine (Xerox PARC): The granddaddy of them all. PARC is famous as the birthplace of the modern office LAN environment, with GUI workstations, file servers and network-connected laser printers. The Alto was the machine, and Grapevine was the distributed naming and messaging system that made it all work together.  Grapevine had a naming system which, to me, has echoes in LDAP; it was focused on naming of services rather than just machines, and so could be used for discovery as well as simple lookup. On top of this naming service was built a messaging (email, more or less) system as well as a method of finding file servers. It included authentication of users in ways that were left out of contemporary systems, a weakness that echoes down through today. It even ran in something of an inter-office fashion, though I think the federation of systems was probably a little wonky.  Less than a decade later, Grapevine would be displaced by Sun’s machines, NFS, RPC, LPD, and SMTP, all built on top of DNS, TCP, UDP and IP running over the Ethernet created at PARC. Arguably, Grapevine would have been a better platform for building those early services, though I don’t think the implementation language(s), APIs and libraries were well matched to the move to UNIX and C. I wish I had had some direct experience with Grapevine and Altos; I believe there were still a few around ISI when I arrived in 1985, though I don’t think they were in active use.
  • Condor (Wisconsin): the hunter of idle workstations. A great idea, scavenging CPU cycles from idle machines to use in an office-wide batch queueing system. Doing this well requires sharing file namespaces across machines.  I remain baffled that UNIX has never had a true distributed batch system to which ordinary users can submit compute-heavy jobs.
  • Andrew File System (AFS) (CMU): When I first encountered it, AFS seemed to me like a "better NFS".  It doesn't have actual locking, but it caches files locally, allowing offline writes and working better in a wide-area environment. That wide-area ability comes in part from reduced latency, since operations go to the nearby copy you already hold, in part from its security model, and in part from its use of TCP by default instead of UDP. This local-copy approach also improves global performance and scalability, but it is not designed to support a truly shared write workload on the same file; it depends on a callback approach to resolving write conflicts, including when reconnecting after a network partition. One feature is that the AFS file namespace was essentially global because it incorporated the origin of the file in the name, almost, but not quite, a precursor to URLs.  AFS never quite took off as a one-for-one replacement for NFS, though, perhaps because NFS worked "well enough" for most installations most of the time, perhaps because AFS was initially free but later controlled by a company that wanted to commercialize it. Most likely it was just NFS's first-mover advantage.
    A little more than a decade after studying it, while working at Nokia, I would propose using AFS in mobile phones as a file-sharing and automatic backup system for photos and documents, years before the iPhone, Google Drive and other cloud storage. Unfortunately, I didn't sell the vision well enough, and the idea didn't go anywhere. Sigh. That's one of the biggest regrets of my career. I think we could have changed the way people thought of mobile terminals well before the iPhone became a thing.
    Around Christmas one year, I wrote prototype IPv6 code for AFS; I think my code was partially or completely rewritten before becoming part of the main AFS distribution, though.
  • DNS (USC/ISI) (I think; we might have done this in another class.): I also played bridge and poker with Paul Mockapetris, who was at ISI, so it seemed a bit of a lark to be studying his work. I didn't grasp at the time how important DNS was, or how important Paul would become. Unlike most of the other things listed here, DNS isn't a programmable platform in and of itself, but the depth of its relationship to the key ideas we all study in distributed systems work cannot be overstated. Its widely distributed, federated, high-availability, soft-failure, eventual-consistency model is crucial to global-scale systems.
  • Plan 9 (Bell Labs) (I think; we might not have done this in class. It’s even possible that I didn’t really study it until returning to ISI in 1995.): Plan 9 has an extensive Wikipedia entry, so quite a number of people were apparently influenced by its "everything is a file" model, where processes, IPC, sockets, and everything else appeared somewhere in the file tree under /.  I think Plan 9 might have been the origin of /proc. Each process could also have its own namespace, an idea that now shows up in the virtual root directories commonly used to isolate processes. (chroot existed long before this, of course.) (Edit: Just two days after I published this, The Register published a very thorough description of Plan 9, including its current state. The article was apparently based on a talk given by the author at the FOSDEM conference.)
  • Closure in Prospero (Washington): speaking of naming, this short paper by Cliff Neuman (who would shortly come to ISI, where he remains) proposes that every name be accompanied by a closure, a program that resolves that name. I liked this idea; a rough sketch of it appears below.
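
Since I promised a few toy sketches above, here they are. First, Lamport's clocks. The rules are compact: every process keeps a counter, ticks it on each local event (including sends), and on receiving a message sets the counter to one more than the maximum of its own value and the timestamp carried on the message. The Python below is purely my own illustration of those rules, not code from any system mentioned here, and all of the names in it are invented.

    # Minimal sketch of Lamport's logical clocks (illustrative only).
    # The rules guarantee that if event a happened-before event b,
    # then timestamp(a) < timestamp(b).

    class LamportClock:
        def __init__(self):
            self.time = 0

        def local_event(self):
            # Rule 1: tick on every local event.
            self.time += 1
            return self.time

        def send_event(self):
            # Sending is a local event; the message carries the new timestamp.
            self.time += 1
            return self.time

        def receive_event(self, msg_time):
            # Rule 2: on receive, jump past both our clock and the sender's.
            self.time = max(self.time, msg_time) + 1
            return self.time

    # Two processes exchanging one message:
    p, q = LamportClock(), LamportClock()
    p.local_event()                 # p: 1
    t = p.send_event()              # p: 2, message stamped 2
    q.local_event()                 # q: 1
    print(q.receive_event(t))       # q: max(1, 2) + 1 = 3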
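
Second, virtual time. Jefferson's Time Warp is far richer than this (anti-messages, global virtual time, fossil collection), but the kernel of the optimism is easy to convey: a logical process executes events in timestamp order as they arrive, checkpointing its state as it goes; when a straggler shows up bearing a timestamp in the past, the process rolls back to the last checkpoint before that time and re-executes. The sketch below is my own drastic simplification for a single logical process, with invented names, and it omits the undoing of messages the process has already sent.

    # Toy sketch of Time Warp-style optimistic execution for ONE logical
    # process. Real Time Warp also cancels already-sent output with
    # anti-messages; this only shows local rollback on a straggler.

    class OptimisticLP:
        def __init__(self):
            self.state = 0                 # application state: a running sum
            self.lvt = 0                   # local virtual time
            self.processed = []            # events executed, in timestamp order
            self.snapshots = [(0, 0)]      # (virtual time, state) checkpoints

        def _execute(self, event):
            ts, value = event
            self.state += value            # the "simulation": accumulate values
            self.lvt = ts
            self.processed.append(event)
            self.snapshots.append((ts, self.state))

        def receive(self, event):
            ts, _ = event
            if ts < self.lvt:              # straggler: roll back, then redo
                # restore the last checkpoint strictly before the straggler
                while self.snapshots[-1][0] >= ts:
                    self.snapshots.pop()
                self.lvt, self.state = self.snapshots[-1]
                redo = [e for e in self.processed if e[0] >= ts]
                self.processed = [e for e in self.processed if e[0] < ts]
                for e in sorted(redo + [event]):
                    self._execute(e)
            else:
                self._execute(event)

    lp = OptimisticLP()
    for ev in [(10, 1), (20, 2), (40, 4)]:
        lp.receive(ev)                     # optimistic: execute on arrival
    lp.receive((30, 3))                    # straggler forces rollback to t=20
    print(lp.lvt, lp.state)                # -> 40 10 (same result as in-order)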
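
And third, Prospero's closures. This is only a rough analogy, not Prospero's actual protocol or API: the point is simply that a name travels together with the program that knows how to resolve it, so two users can hold the "same" name and legitimately resolve it to different objects. The namespaces and paths below are entirely made up.

    # Rough analogy for Prospero-style closures: each name is paired with
    # the closure that resolves it, so resolution depends on whose closure
    # you hold, not on a single global namespace.

    def make_name(name, resolver):
        # Bind a name to the closure that resolves it.
        return (name, resolver)

    def resolve(bound_name):
        name, resolver = bound_name
        return resolver(name)

    # Two different views (closures) over the same name string:
    alice_ns = {"papers/prospero": "/home/alice/refs/prospero-closures.ps"}
    bob_ns   = {"papers/prospero": "ftp://archive.example.org/pub/prospero.ps"}

    n1 = make_name("papers/prospero", lambda n: alice_ns[n])
    n2 = make_name("papers/prospero", lambda n: bob_ns[n])

    print(resolve(n1))   # Alice's view of the name
    print(resolve(n2))   # Bob's view of the same name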

I think logical clocks, Byzantine generals, and virtual time were probably in Kim's advanced operating systems class, rather than the class with Deborah; in Kim's class we also read about Multics, VM (the IBM OS), and other already-classic systems that seemed a little alien to me and rather opened my eyes. And, of course, we learned about Mach, the epitome of the microkernel system and also a topic of research at ISI.

I mentioned that deviating from the UNIX model meant death for your system in the 1990s. This was, of course, the Killer Micro era. Microprocessors were coming out at a furious pace, with clock speeds and overall performance climbing to breathtaking levels, and with the CISC architectures (x86, 680x0) being displaced by RISC (MIPS, SPARC, PowerPC, PA-RISC). Performance was moving so fast that there was no need to deploy multiple processors on ordinary programming tasks; by the time you got your code working, there was a processor with twice the performance available.

A glance at the origin institutions of the list above shows clearly why California was the place to be. Some versions of Silicon Valley’s history focus on the literal silicon (chips), and perhaps rust (disk drives). Some focus on PARC's outsized role. But it's important to be aware of the overall fecundity of systems work in general, and in California in particular, in the 1970s, '80s and '90s.

Oof, more than enough for now. This entry has gotten a lot longer than I intended, even though it still only scratches the surface of ideas that have influenced my own thinking. Thanks for reading this far!
