Wednesday, July 29, 2009

Help Me With My Summer Reading

I'm feeling the need to recharge my store of ideas, and I have the
nagging feeling that my lack of currency in a bunch of fields is
causing me to miss some connections I could use in my own research.

So, I'm looking for a reading list of, say, the one hundred most
important papers of the decade. It doesn't have to be an even
hundred, but I'm looking for a good summer's reading. (Given that
it's mid-2009, now would be a good time to start composing such a list
anyway, depending on where you want to place the "decade" boundary.)

I want these papers to cover *ALL* fields of computer science and
engineering; I am by nature catholic in my reading.

You probably know that my current field is quantum computing, but prior to
that I did network-attached storage. The 1990s were a very creative,
ambitious decade in that area; I admit I read much less there now than
I used to, but I haven't seen anything really earthshaking in storage
in the last few years. (Friends still in the area will no doubt
correct me.)

If such a list already exists, I'm happy to use it as-is, otherwise
I'm willing to manage a conversation and create such a list.

Since this list will be very broad, I want only a few "MUST READ"
papers in each field. What are the new ideas in your field? If you
had a short meeting with, say, an NSF god descended from Olympus, what
ideas would you cite to convince him/her that your *field* (not your
pet idea) is a vibrant field with real-world impact, worthy of
large-scale support? What papers or ideas have changed the way you
think?

So, if your field is:

Theory:
* complexity theory
* algorithms
* data structures
* automata/Turing machines/FSMs, etc.
Systems:
* hardware technology (Moore's Law, etc.)
* processor architecture
* systems architecture
* operating systems
* storage systems
Programming languages
Software engineering
Networking
Artificial Intelligence
Search
Databases
Security
Human-computer interaction
Network or systems operations
Ubiquitous systems/sensor networks
Computer Science education (please help me learn to teach!)
etc.

please send me your suggestions.

I'm even willing to go as far afield as robotics and bioinformatics,
if you can convince me it's worth my time to go read. I'm also
willing to accept old ideas that are finding new urgency;
transactional memory would be a good example, virtualization another.

I will leave it to you to balance newness of idea with real-world
impact, and to decide whether to recommend the key original paper(s)
or a survey paper, if one exists already.

(I'm doing this a tad ad hoc; I really should reconcile against, say,
the ACM computing curricula or a journal keyword list, but I probably
won't bother. Some may also object to the categorizations above;
don't worry about it, just send me the papers/ideas you think are
critical, and we will work on the categorization and balance of the
list later.)

(Yes, the choice to limit this to the last decade is arbitrary; there
are plenty of old ideas I'm not familiar with, too -- I recently ran
across R-trees for the first time, for example -- but, generally
speaking, I'm after ideas too new to have made it into textbooks,
otherwise I'd just pick up a recent text.)

(And on these grounds let me say that I'm enjoying the revitalized
CACM, which seems to be helpful in focusing on new, important ideas,
with more timely and accessible review articles than Computing
Surveys.)

A few thoughts to get started:

* The hierarchy of limits in computing technology:
An outstanding synthesis of Moore's Law, Landauer's principle, etc.
Dense, but well-written and worth the effort to understand. I don't
care if the ideas are old, this one is critical.

@Article{meindl01:_terascale-si,
author = {James D. Meindl and Qiang Chen and Jeffrey A. Davis},
title = {Limits on Silicon Nanoelectronics for Terascale
Integration},
journal = {Science},
year = 2001,
volume = 293,
pages = {2044--2049}
}

* proper network topology analysis:
A clear-eyed look at mathematical analysis of and understanding of the
Internet.

@article{JohnCDoyle10112005,
author = {Doyle, John C. and Alderson, David L. and Li, Lun and Low, Steven and Roughan, Matthew and Shalunov, Stanislav and Tanaka, Reiko and Willinger, Walter},
title = {{The "robust yet fragile" nature of the Internet}},
journal = {Proceedings of the National Academy of Sciences},
volume = {102},
number = {41},
pages = {14497-14502},
doi = {10.1073/pnas.0501426102},
year = {2005}
}

* distributed hash tables (DHTs):
Just barely makes my "decade" cutoff, but one of the most influential
ideas of recent years; fault-tolerant, truly distributed,
loosely-coherent key-value pairs, useful for managing e.g. a lookup
system for P2P networks. (You can argue for another choice of
paper.)

@inproceedings{stoica2001chord,
title={{Chord: A scalable peer-to-peer lookup service for internet applications}},
author={Stoica, I. and Morris, R. and Karger, D. and Kaashoek, M.F. and Balakrishnan, H.},
booktitle={Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications},
pages={149--160},
year={2001},
organization={ACM New York, NY, USA}
}

* Peer-to-peer (P2P) networks:
No recommended paper here yet. I've read a few, but none really stand
out in my mind. Suggestions?

* Delay-tolerant networks (DTNs):
No recommended paper here yet. Might not make the decade cutoff.

* Network coding:
Just barely makes the decade cutoff. This is the seminal paper,
AFAIK, but the writing in it is poor; recommendations for an easier
read gladly accepted. A highly theoretical idea that seems to be
gaining surprising traction in real-world systems.

@article{ahlswede2000nif,
title={Network information flow},
author={Ahlswede, R. and Cai, N. and Li, S.Y.R. and Yeung, R.W.},
journal={Information Theory, IEEE Transactions on},
volume={46},
number={4},
pages={1204--1216},
year={2000}
}

* Ajax:
I haven't seen any good papers on it, and it's a philosophy rather
than a technology, but surely it's important enough to rate
here...suggestions?

* Transactional memory:
One possible route to effective parallel programming. The right one?
Good question!

@Article{larus07:_trans_mem,
author = {James Larus and Christos Kozyrakis},
title = {Transactional Memory},
journal = cacm,
year = 2008,
volume = 51,
number = 7,
pages = {80--89},
doi = {10.1145/1364782.1364800}
}

* MapReduce:
One of a set of very good ideas to come out of Google in the last few
years. The right way to get to real parallel programming on very
large datasets? Good question! The database community seems to think
not. But having done a little MPI programming, I can assure you that
MPI is not for the masses, at least not in its current form.

@article{dean:mapreduce-cacm,
author = {Jeffrey Dean and Sanjay Ghemawat},
title = {{MapReduce}: simplified data processing on large clusters},
journal = {Commun. ACM},
volume = {51},
number = {1},
year = {2008},
issn = {0001-0782},
pages = {107--113},
doi = {http://doi.acm.org/10.1145/1327452.1327492},
publisher = {ACM},
address = {New York, NY, USA},
}

* Rethink the Sync:
Recommended by a friend, who called it his favorite paper of the
decade.

@inproceedings{nightingale2006rethink,
title={Rethink the sync},
author={Nightingale, E.B. and Veeraraghavan, K. and Chen, P.M. and Flinn, J.},
booktitle={Proceedings of the 7th symposium on Operating systems design and implementation},
pages={1--14},
year={2006},
comment ={Best paper; Honey's favorite paper of the last decade.}
}

* Disk MTTF:
An analysis rather than idea paper, but important for anyone who does
large-scale systems or cares when their laptop disk is likely to
fail.

@article{schroeder2007dfr,
title={{Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you}},
author={Schroeder, B. and Gibson, G.A.},
journal={Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST)},
year={2007}
}

That's eight papers on eleven ideas that have changed the way I think
about computing systems in the last few years, mostly in the
networking area (a couple may have to get deleted to hold the final
list to 100 papers). There are many other topics, of course (chip
multiprocessors and radical new architectures such as TRIPS;
hypervisors and virtualization of systems and networks is another
old/new idea), but I'll leave them to others to promote. I've read
hundreds of papers in the last decade, but I'm interested in what
*you* consider important, not my own biases here. And while IP is a
heavily network-oriented list and will no doubt expand on that set of
ideas above, please look as far abroad as you comfortably can -- or
forward on to colleagues in other areas who might be interested in
such a list.

Please help educate me!