Wednesday, July 29, 2009

Help Me With My Summer Reading

I'm feeling the need to recharge my store of ideas, and I have the
nagging feeling that my lack of currency in a bunch of fields is
causing me to miss some connections I could use in my own research.

So, I'm looking for a reading list of, say, the one hundred most
important papers of the decade. It doesn't have to be an even
hundred, but I'm looking for a good summer's reading. (Given that
it's mid-2009, now would be a good time to start composing such a list
anyway, depending on where you want to place the "decade" boundary.)

I want these papers to cover *ALL* fields of computer science and
engineering; I am by nature catholic in my reading.

You probably know that my current field is quantum computing, but prior to
that I did network-attached storage. The 1990s were a very creative,
ambitious decade in that area; I admit I read much less there now than
I used to, but I haven't seen anything really earthshaking in storage
in the last few years. (Friends still in the area will no doubt
correct me.)

If such a list already exists, I'm happy to use it as-is, otherwise
I'm willing to manage a conversation and create such a list.

Since this list will be very broad, I want only a few "MUST READ"
papers in each field. What are the new ideas in your field? If you
had a short meeting with, say, an NSF god descended from Olympus, what
ideas would you cite to convince him/her that your *field* (not your
pet idea) is a vibrant field with real-world impact, worthy of
large-scale support? What papers or ideas have changed the way you
think?

So, if your field is:

Theory:
* complexity theory
* algorithms
* data structures
* automata/Turing machines/FSMs, etc.
Systems:
* hardware technology (Moore's Law, etc.)
* processor architecture
* systems architecture
* operating systems
* storage systems
Programming languages
Software engineering
Networking
Artificial Intelligence
Search
Databases
Security
Human-computer interaction
Network or systems operations
Ubiquitous systems/sensor networks
Computer Science education (please help me learn to teach!)
etc.

please send me your suggestions.

I'm even willing to go as far afield as robotics and bioinformatics,
if you can convince me it's worth my time to go read. I'm also
willing to accept old ideas that are finding new urgency;
transactional memory would be a good example, virtualization another.

I will leave it to you to balance newness of idea with real-world
impact, and to decide whether to recommend the key original paper(s)
or a survey paper, if one exists already.

(I'm doing this a tad ad hoc; I really should reconcile against, say,
the ACM computing curricula or a journal keyword list, but I probably
won't bother. Some may also object to the categorizations above;
don't worry about it, just send me the papers/ideas you think are
critical, and we will work on the categorization and balance of the
list later.)

(Yes, the choice to limit this to the last decade is arbitrary; there
are plenty of old ideas I'm not familiar with, too -- I recently ran
across R-trees for the first time, for example -- but, generally
speaking, I'm after ideas too new to have made it into textbooks,
otherwise I'd just pick up a recent text.)

(And on these grounds let me say that I'm enjoying the revitalized
CACM, which seems to be helpful in focusing on new, important ideas,
with more timely and accessible review articles than Computing
Surveys.)

A few thoughts to get started:

* The hierarchy of limits in computing technology:
An outstanding synthesis of Moore's Law, Landauer's principle, etc.
Dense, but well-written and worth the effort to understand. I don't
care if the ideas are old, this one is critical.

@Article{meindl01:_terascale-si,
author = {James D. Meindl and Qiang Chen and Jeffrey A. Davis},
title = {Limits on Silicon Nanoelectronics for Terascale
Integration},
journal = {Science},
year = 2001,
volume = 293,
pages = {2044--2049}
}

* proper network topology analysis:
A clear-eyed look at mathematical analysis of and understanding of the
Internet.

@article{JohnCDoyle10112005,
author = {Doyle, John C. and Alderson, David L. and Li, Lun and Low, Steven and Roughan, Matthew and Shalunov, Stanislav and Tanaka, Reiko and Willinger, Walter},
title = {{The "robust yet fragile" nature of the Internet}},
journal = {Proceedings of the National Academy of Sciences},
volume = {102},
number = {41},
pages = {14497-14502},
doi = {10.1073/pnas.0501426102},
year = {2005}
}

* distributed hash tables (DHTs):
Just barely makes my "decade" cutoff, but one of the most influential
ideas of recent years; fault-tolerant, truly distributed,
loosely-coherent key-value pairs, useful for managing e.g. a lookup
system for P2P networks. (You can argue for another choice of
paper.)

@inproceedings{stoica2001chord,
title={{Chord: A scalable peer-to-peer lookup service for internet applications}},
author={Stoica, I. and Morris, R. and Karger, D. and Kaashoek, M.F. and Balakrishnan, H.},
booktitle={Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications},
pages={149--160},
year={2001},
organization={ACM New York, NY, USA}
}

* Peer-to-peer (P2P) networks:
No recommended paper here yet. I've read a few, but none really stand
out in my mind. Suggestions?

* Delay-tolerant networks (DTNs):
No recommended paper here yet. Might not make the decade cutoff.

* Network coding:
Just barely makes the decade cutoff. This is the seminal paper,
AFAIK, but the writing in it is poor; recommendations for an easier
read gladly accepted. A highly theoretical idea that seems to be
gaining surprising traction in real-world systems.

@article{ahlswede2000nif,
title={Network information flow},
author={Ahlswede, R. and Cai, N. and Li, S.Y.R. and Yeung, R.W.},
journal={Information Theory, IEEE Transactions on},
volume={46},
number={4},
pages={1204--1216},
year={2000}
}

* Ajax:
I haven't seen any good papers on it, and it's a philosophy rather
than a technology, but surely it's important enough to rate
here...suggestions?

* Transactional memory:
One possible route to effective parallel programming. The right one?
Good question!

@Article{larus07:_trans_mem,
author = {James Larus and Christos Kozyrakis},
title = {Transactional Memory},
journal = cacm,
year = 2008,
volume = 51,
number = 7,
pages = {80--89},
doi = {10.1145/1364782.1364800}
}

* MapReduce:
One of a set of very good ideas to come out of Google in the last few
years. The right way to get to real parallel programming on very
large datasets? Good question! The database community seems to think
not. But having done a little MPI programming, I can assure you that
MPI is not for the masses, at least not in its current form.

@article{dean:mapreduce-cacm,
author = {Jeffrey Dean and Sanjay Ghemawat},
title = {{MapReduce}: simplified data processing on large clusters},
journal = {Commun. ACM},
volume = {51},
number = {1},
year = {2008},
issn = {0001-0782},
pages = {107--113},
doi = {http://doi.acm.org/10.1145/1327452.1327492},
publisher = {ACM},
address = {New York, NY, USA},
}

* Rethink the Sync:
Recommended by a friend, who called it his favorite paper of the
decade.

@inproceedings{nightingale2006rethink,
title={Rethink the sync},
author={Nightingale, E.B. and Veeraraghavan, K. and Chen, P.M. and Flinn, J.},
booktitle={Proceedings of the 7th symposium on Operating systems design and implementation},
pages={1--14},
year={2006},
comment ={Best paper; Honey's favorite paper of the last decade.}
}

* Disk MTTF:
An analysis rather than idea paper, but important for anyone who does
large-scale systems or cares when their laptop disk is likely to
fail.

@article{schroeder2007dfr,
title={{Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you}},
author={Schroeder, B. and Gibson, G.A.},
journal={Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST)},
year={2007}
}

That's eight papers on eleven ideas that have changed the way I think
about computing systems in the last few years, mostly in the
networking area (a couple may have to get deleted to hold the final
list to 100 papers). There are many other topics, of course (chip
multiprocessors and radical new architectures such as TRIPS;
hypervisors and virtualization of systems and networks is another
old/new idea), but I'll leave them to others to promote. I've read
hundreds of papers in the last decade, but I'm interested in what
*you* consider important, not my own biases here. And while IP is a
heavily network-oriented list and will no doubt expand on that set of
ideas above, please look as far abroad as you comfortably can -- or
forward on to colleagues in other areas who might be interested in
such a list.

Please help educate me!

6 comments:

Jim Harrington said...

Hello Rod! I believe that one of the seminal works of the past decade in information processing is arXiv:math/0003117v1 by Peter Gacs. Since 166 pages (over 200 pages in published version) may be heavy for summer reading, I highly recommend the reader's guide (only 34 pages) by Larry Gray that was published alongside the above work.

John Sidles said...

If you read Vladimir Arnol'd's book on symplectic mechanics, then perhaps you agree with Sergei Novikov's review that it carries you to an end-point, not a beginning.

But then if you read Frenkel and Smits fine textbook on computational simulation (which is constructed upon Arnol'd's symplectic foundations), then you see that Novikov's criticism came too early ... that Arnol'd did provide a mathematical starting point for IBM's 21st century vision of molecular biology, as reviewed in our QSE Group's Spin microscopy's heritage, achievements, and prospects.

Can anyone foresee the scale of this 21st century enterprise? Well, Simon Ramo did pretty well at foreseeing the later quarter of the 20th century ... and so maybe we should all be looking into the future ourselves.

That's my summer reading, anyway ... BibTeX follows ...

----------
@book{, Address = {New York}, Author = {Arnol{\cprime}d, V. I.}, Publisher = {Springer-Verlag}, Series = {Graduate Texts in Mathematics}, Title = {Mathematical methods of classical mechanics}, Volume = {60}, Year = {1999}}

@incollection{, Address = {Berlin}, Author = {Novikov, Sergei}, Booktitle = {Mathematical research today and tomorrow ({B}arcelona, 1991)}, Pages = {13--28}, Publisher = {Springer}, Series = {Lecture Notes in Math.}, Title = {R\^ole of integrable models in the development of mathematics}, Volume = {1525}, Year = {1992}}

@book{, Address = {San Diego}, Author = {Frenkel, Daan, and Smit, Berend,}, Publisher = {Academic Press}, Title = {Understanding molecular simulation : from algorithms to applications}, Year = {1996}}

@article{, Author = {J. A. Sidles}, Date-Added = {2009-02-19 14:52:42 -0800}, Date-Modified = {2009-02-19 14:57:35 -0800}, Journal = {Proc. Nat. Acad. Sci. USA}, Number = {8}, Pages = {2477--2478}, Title = {Spin microscopy's heritage, achievements, and prospects}, Volume = {106}, Year = {2009}}

@article{, Author = {R. Booton and S. Ramo}, Journal = {{IEEE} Transactions on Aerospace and Electronic Systems}, Month = jul, Pages = {306--9}, Title = {The development of systems engineering}, Volume = {{AES}--20}, Year = 1984}

rdv said...

Comment test.

Unknown said...

hi rod,

in the networking field, a few important guys have already done part of this job (selecting major papers).

i would suggest you to refer to the "recommended reading" zone of ccr:

http://ccr.sigcomm.org/online/?q=node/157

best,

marcelo

rdv said...

Awesome, Marcelo, thanks. That's exactly the kind of thing I was looking for.

rdv said...

Marcelo,

I just went through the recommendations of those "senior people", and I'm relieved to see that I've read most of the older papers. The newer ones, though, I mostly haven't, so they're exactly what I'm looking for.

Thanks again!

Now for equivalent lists in architecture, AI, theory...