This morning, the New York Times carried a story on an effort to build "large data centers that students can tap into over the Internet to program and research remotely, which is called 'cloud computing.'" The problem is that even well-funded universities have a tough time putting students into compute environments that give them experience writing applications that use and manage multiple machines over a network.
What IBM and Google are doing is aimed at research in this area, but a critical need also exists on the teaching side. Randy Bryant, dean of Carnegie Mellon's School of Computer Science, is quoted in the article:
"We in academia and the government labs have not kept up with the times," said Randal E. Bryant, dean of the computer science school at Carnegie Mellon University. "Universities really need to get on board."
For 10 years I've taught a class in which I would have loved for each student to use multiple machines. For many years, it was all I could do to give each student a machine they could use for the entire semester, so they could experience being root, installing packages, messing it all up, starting over, and so on. Virtualization has helped lately, but it was still hard to give students more than one or two machines.
This semester, for the first time, I'm teaching my class the way I've always wanted to, thanks to Amazon's EC2 and S3 services. Using EC2, students can create as many machines as they need, and Amazon's SQS (Simple Queue Service) gives them the infrastructure to connect these machines to each other asynchronously and to other services on the Web. Here's a diagram of the system my students are building this semester.
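The key idea in hooking machines together "asynchronously" is that a producer drops work into a queue and moves on, while independent workers pull jobs off the queue whenever they are ready. The sketch below illustrates that pattern in a single process, assuming Python's standard `queue.Queue` as a stand-in for an SQS queue and threads as stand-ins for EC2 machines. It is not the SQS API, and the `worker` and `run` functions are hypothetical names chosen for illustration.

```python
import queue
import threading
from typing import List, Optional


def worker(inbox: "queue.Queue[Optional[str]]", outbox: "queue.Queue[str]") -> None:
    # Each thread stands in for one EC2 machine (an assumption for this sketch).
    # It knows nothing about the producer: it only reads one queue and writes another.
    while True:
        job = inbox.get()
        if job is None:  # sentinel: no more work for this worker
            break
        outbox.put(job.upper())  # placeholder for real work


def run(jobs: List[str], num_workers: int = 3) -> List[str]:
    # queue.Queue is an in-process stand-in for an SQS queue, not the SQS API.
    inbox: "queue.Queue[Optional[str]]" = queue.Queue()
    outbox: "queue.Queue[str]" = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    # The producer enqueues every job and never waits for a reply.
    for job in jobs:
        inbox.put(job)
    # One sentinel per worker; FIFO order means all jobs are taken first.
    for _ in workers:
        inbox.put(None)
    for w in workers:
        w.join()
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)  # workers finish in any order, so sort for display


print(run(["fetch", "parse", "index"]))  # prints ['FETCH', 'INDEX', 'PARSE']
```

Because the producer and workers share only the queues, you can add or remove workers without touching the producer, which is the property that makes this design scale across machines. A real SQS queue differs in important ways (for example, a message can be delivered more than once, so workers should be written to tolerate duplicates).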
Without opportunities like this, most CS students graduate without ever writing anything bigger than an application written in one programming language that runs on a single machine. Much of today's computing demands more, and training students in those technologies requires access to the right compute platforms.
The cost for EC2 is $0.10 per hour of compute time. With some careful management of the EC2 cloud (like making sure machines aren't left running when they don't need to be), I'll be able to run the class for (hopefully, much) less than $100 per student. That's less than the textbooks for many classes.
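To see why careful management matters, it helps to work the numbers. The sketch below uses the $0.10 per instance-hour rate and the $100 per student ceiling from the paragraph above; the usage pattern (3 machines, 4 hours a week, 15 weeks) is a hypothetical example, and the sketch ignores separate charges for S3 storage, SQS requests, and bandwidth. Amounts are kept in integer cents to avoid floating-point rounding.

```python
RATE_CENTS_PER_HOUR = 10            # $0.10 per instance-hour, as quoted above
BUDGET_CENTS_PER_STUDENT = 10_000   # the $100/student ceiling


def affordable_instance_hours(budget_cents: int, rate_cents: int = RATE_CENTS_PER_HOUR) -> int:
    # Whole instance-hours a budget buys (compute only; S3/SQS charges ignored).
    return budget_cents // rate_cents


def semester_cost_cents(machines: int, hours_per_week: int, weeks: int,
                        rate_cents: int = RATE_CENTS_PER_HOUR) -> int:
    # Total compute cost when each machine runs only the stated hours.
    return machines * hours_per_week * weeks * rate_cents


print(affordable_instance_hours(BUDGET_CENTS_PER_STUDENT))      # prints 1000
# Hypothetical usage: 3 machines, 4 hours/week, 15-week semester.
print(f"${semester_cost_cents(3, 4, 15) / 100:.2f}")            # prints $18.00
# One machine accidentally left running 24x7 for the same semester.
print(f"${semester_cost_cents(1, 24 * 7, 15) / 100:.2f}")       # prints $252.00
```

The budget buys 1,000 instance-hours per student, which is plenty for disciplined use, but a single forgotten machine running around the clock would blow through it more than twice over.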
That brings up the second problem: there aren't many good texts in this area. In fact, there aren't any that I know of that are written as college texts. Most are "how to" books that emphasize specific technologies over general principles. Students need to understand principles even as they experiment with the technologies of today. That way they'll easily adjust to the technologies of tomorrow.
The final problem is that there aren't a lot of professors who understand these technologies. Many understand the ideas but have never actually built such systems. We need summer training that can help faculty get up to speed. Maybe IBM, Google, or Amazon would like to help with that?
Years ago, the computer chip business faced a similar problem. Students had a tough time learning about chip design and fabrication for the same reasons discussed above. The solution was industry and government working together to put programs in place that made it practical to teach the subject. MOSIS, Mead and Conway's text, and NSF summer camps for faculty were vital components. That same model could have a dramatic impact on the future of distributed systems.