Cloud Computing for Students

This morning, the New York Times carried a story on an effort to build "large data centers that students can tap into over the Internet to program and research remotely, which is called 'cloud computing.'" The problem is that even well-funded universities have a tough time putting students into compute environments that give them experience writing applications that use and manage multiple machines over a network.

What IBM and Google are doing is aimed at research in this area, but a critical need also exists on the teaching side. Randy Bryant, dean of Carnegie Mellon's School of Computer Science, is quoted in the article:

"We in academia and the government labs have not kept up with the times," said Randal E. Bryant, dean of the computer science school at Carnegie Mellon University. "Universities really need to get on board."

I've taught a class for 10 years in which I would really have loved for each student to use multiple machines. For many years it was all I could do to get each student in the class a machine they could use for the entire semester, so that they could experience being root, installing packages, messing it all up, starting over, and so on. More recently, virtualization has helped, but it was still hard to give students more than one or two machines.

This semester, for the first time, I'm teaching my class the way I've always wanted to, thanks to Amazon's EC2 and S3 services. Using EC2, students can create as many machines as they need, and Amazon's SQS (Simple Queue Service) gives them the infrastructure to hook these machines to each other asynchronously and to other services on the Web. Here's a diagram of the system my students are building this semester.

[Figure: CS462 Project Architecture 2007]
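The asynchronous hookup that a hosted queue like SQS provides can be illustrated with a small local model. This is a sketch under stated assumptions, not the AWS SDK: `InMemoryQueue`, `send`, `receive`, and `delete` are hypothetical names, and plain message IDs stand in for the receipt handles a hosted queue would return. It shows the pattern that makes queue-connected machines robust: a worker receives a message, the message is hidden from other workers for a visibility timeout, and the worker deletes it only after finishing. If the worker dies first, the message reappears for another machine.

```python
import itertools
from collections import OrderedDict


class InMemoryQueue:
    """Hypothetical local stand-in for a hosted queue such as SQS (not the AWS SDK).

    Models only the receive/delete cycle with a visibility timeout, using an
    explicit logical clock `now` so the behavior is deterministic.
    """

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = OrderedDict()   # message id -> body, oldest first
        self._invisible_until = {}       # message id -> time it becomes visible again
        self._ids = itertools.count()

    def send(self, body):
        """Enqueue a message and return its id."""
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        return msg_id

    def receive(self, now):
        """Return (msg_id, body) for the oldest visible message, or None."""
        for msg_id, body in self._messages.items():
            if self._invisible_until.get(msg_id, now) <= now:
                # Hide the message from other workers until the timeout expires.
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge successful processing so the message is never redelivered."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


if __name__ == "__main__":
    q = InMemoryQueue(visibility_timeout=5)
    q.send("resize photo 17")
    print(q.receive(now=0))   # a worker takes the job
    print(q.receive(now=1))   # None: the job is still in flight
    job = q.receive(now=6)    # the first worker never deleted it, so it reappears
    print(job)
    q.delete(job[0])          # a second worker finishes and acknowledges it
    print(q.receive(now=20))  # None: the queue is empty
```

Passing an explicit `now` keeps the example deterministic; a real hosted queue tracks the timeout with wall-clock time on the service side.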

Without opportunities like this, most CS students graduate without ever writing anything bigger than an application written in one programming language that runs on a single machine. Much of today's computing demands more, and training students in those technologies requires access to the right compute platforms.

The cost for EC2 is $0.10 per hour of compute time. With some careful management of the EC2 cloud (like making sure machines aren't left running when they don't need to be), I'll be able to teach the class for (hopefully, much) less than $100 per student. That's less than the textbooks for many classes.
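A rough budget check at the quoted $0.10 per instance-hour shows why shutting machines down matters. The usage figures are hypothetical (a 15-week semester, three instances used four hours a week), and the sketch ignores S3 storage and data-transfer charges.

```python
# Assumption: $0.10 per instance-hour, the EC2 price quoted above.
# Storage (S3) and data-transfer charges are ignored in this sketch.
RATE_CENTS_PER_INSTANCE_HOUR = 10


def cost_cents(instances, hours_per_week, weeks):
    """Compute charge in cents; integer cents avoid floating-point rounding."""
    instance_hours = instances * hours_per_week * weeks
    return instance_hours * RATE_CENTS_PER_INSTANCE_HOUR


def instance_hours_for_budget(budget_dollars):
    """How many instance-hours a given dollar budget buys at the assumed rate."""
    return budget_dollars * 100 // RATE_CENTS_PER_INSTANCE_HOUR


def dollars(cents):
    """Format a cent amount as a dollar string."""
    return f"${cents / 100:.2f}"


if __name__ == "__main__":
    weeks = 15  # hypothetical semester length
    careful = cost_cents(instances=3, hours_per_week=4, weeks=weeks)
    forgotten = cost_cents(instances=1, hours_per_week=24 * 7, weeks=weeks)
    print("careful use:", dollars(careful))
    print("one machine left running:", dollars(forgotten))
    print("instance-hours for $100:", instance_hours_for_budget(100))
```

Under these assumptions the careful pattern costs $18.00 per student, a single instance left running around the clock all semester costs $252.00, and a $100 budget buys 1,000 instance-hours.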

That brings up the second problem: there aren't many good texts in this area. In fact, there aren't any that I know of that are written as college texts. Most are "how to" books that emphasize specific technologies over general principles. Students need to understand principles even as they experiment with the technologies of today. That way they'll easily adjust to the technologies of tomorrow.

The final problem is that there aren't a lot of professors who understand these technologies. Many understand the ideas but have never actually built such a system. We need summer training that can help faculty get up to speed. Maybe IBM, Google, or Amazon would like to help with that?

Years ago, the computer chip business faced a similar problem. Students had a tough time learning about chip design and fabrication for all the reasons discussed above. The solution was industry and government working together to put programs in place that made it practical to teach the subject. MOSIS, Mead and Conway's text, and NSF summer camps for faculty were vital components. That same model could have a dramatic impact on the future of distributed systems.

Talkback

  • Rather absurd...

    "Without opportunities like this, most CS students graduate without ever writing anything bigger than an application written in one programming language that runs on a single machine."

    Why not? Most large universities have big, heterogeneous environments with Windows, Apple, and several flavors of Unix. As long as machines aren't tyrannically locked down, there's nothing that stops a student from doing distributed computing.

    I seem to remember being *required* to develop some distributed software.

    I guess maybe institutional IT is slowly destroying CS.
    Erik Engbrecht
  • Student access to large-scale computing

    Of course, current university environments provide access to multiple machines, and our students can and do implement distributed applications on them.

    What is more difficult is to get a number of machines working together on a collective computation, such as would be required to crawl web pages, to do a statistical analysis of Wikipedia articles, or to generate a statistical model of the English language by processing millions of English-language documents. Existing programming environments, such as Condor for managing computing farms or MPI for programming supercomputers, just don't apply to this large collection of data-centric applications.

    Google has developed very powerful programming facilities for internal use. The open source Hadoop Project has made a similar environment available to everyone. Classes are proliferating to get students familiar with this style of programming.
    rebryant
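The data-centric programming style described in this comment is usually introduced with a word count. The following single-process Python sketch only imitates the map, shuffle, and reduce phases; it is not Hadoop's API, and the function names are hypothetical. A framework such as Hadoop would run many map and reduce calls in parallel across machines and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Emit (word, 1) for every word in one document (doc_id is unused here)."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Combine all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """documents: dict of doc_id -> text. Returns dict of word -> total count."""
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"a": "the cat sat", "b": "The dog sat"}
    print(word_count(docs))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be spread across as many machines as are available; that independence is what lets the same program scale from two documents to millions.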
  • RE: Cloud Computing for Students

    I have found Solaris 10's zones virtualization technology very effective for this purpose. Zones are very lightweight (not much more overhead than a dozen extra processes per zone), each has its own virtual IP and root password, and for all practical purposes they behave like independent hosts. You can run thousands of zones on a single box with quite moderate RAM, something impossible with VMware or equivalent virtualization products.

    You won't use zones to replicate hard-to-debug race conditions (unless you have several cores), but at my company we built a staging environment replica of 7 production machines running on a single box.
    fazalmajid
  • RE: Cloud Computing for Students

    Hi Phil:

    Great article. We at RightScale heartily agree. There is a free developer edition of RightScale that is quite useful for managing and monitoring EC2 usage, and we also have a number of server templates that can be useful as starting points for projects like the one you describe (including one forthcoming for Hadoop).

    You might also be interested in the CS class our founder Thorsten von Eicken taught at UCSB, based on EC2.

    There's a link at the bottom of our home page at www.rightscale.com.

    Let us know if you find RightScale helpful in what you're doing!

    Regards,

    Michael Crandell
    CEO
    RightScale.com
    mcrandell