Hot Cloud 2009
Jun 15, 2009 Cloud Computing, Conferences
Here are my notes on some of the interesting talks at Hot Cloud 2009. The full list of talks and papers is available at the Hot Cloud site. There were interesting talks on a variety of topics, but my notes here focus mostly on cloud platforms and resource provisioning work from the first half of the day.
Open Cirrus Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research
Roy Campbell, Indranil Gupta, Michael Heath, and Steven Y. Ko, University of Illinois at Urbana-Champaign; Michael Kozuch, Intel Research; Marcel Kunze, KIT, Germany; Thomas Kwan, Yahoo!; Kevin Lai, HP Labs; Hing Yan Lee, IDA, Singapore; Martha Lyons and Dejan Milojicic, HP Labs; David O’Hallaron, Intel Research; Yeng Chai Soh, IDA, Singapore
This is a very large testbed (more than 10K nodes spread across 9 sites) being set up by HP and others to study large-scale cloud computing problems. They are focusing on computation provisioning issues, and can provide users with either full physical or virtual resources.
Nebulas: Using Distributed Voluntary Resources to Build Clouds
Abhishek Chandra and Jon Weissman, University of Minnesota
The idea here is to explore the potential for creating peer-to-peer style cloud computing platforms that use resources provided by volunteers, similar to something like SETI@home. I like this idea a lot, but there have been many attempts at making things like volunteer-based network file systems which never quite took off, and this seems even harder. The difficulty will be determining what the basic platform that people are given access to is like (i.e., can you run any app you want within some VM, or is it a specific platform you must develop your app against to make it work), and how to keep the resources that volunteers share from hurting the performance of their own applications. People are pretty willing to share network bandwidth and disk space, but that is because those are generally over-provisioned resources. CPU is over-provisioned in a different way: most of the time desktop users use only a fraction of the power provided by their system, but when they do decide to go do something computation intensive, they expect it to respond quickly. This also reminds me of the “transparent memory contribution” work done by Jim Cipar when he was still at UMass, since it had to deal with similar issues of volunteering resources in as transparent a way as possible.
The Case for Enterprise-Ready Virtual Private Clouds
Timothy Wood and Prashant Shenoy, University of Massachusetts Amherst; Alexandre Gerber, K.K. Ramakrishnan, and Jacobus Van der Merwe, AT&T Labs—Research
I thought this paper was really great, but maybe I’m biased since I wrote it. I’ve written a separate blog post about my own work, but the gist is that current cloud computing platforms are insufficient for enterprise users, and we propose using network virtualization techniques to make seamless and secure connections between the cloud resources and enterprise sites.
ElasTraS: An Elastic Transactional Data Store in the Cloud
Sudipto Das, Divyakant Agrawal, and Amr El Abbadi, University of California, Santa Barbara
The idea here is that databases currently don’t scale well into the cloud. Instead people are using simpler (but more easily scaled) key-value stores to keep track of data in the cloud. This doesn’t work well because key-value stores don’t provide the transaction and consistency features of real databases. They propose ElasTraS, a scalable, transactional data store based around the idea of partitioned databases. It wasn’t clear how difficult the problem of determining how to partition data is in the first place, as it tends to be application specific.
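To make the partitioning concern concrete, here is a minimal sketch of a store that only allows transactions confined to a single partition. This is not ElasTraS's actual design; `PartitionedStore`, `by_owner`, and the `owner:field` key format are all hypothetical names made up for illustration. The point is only that the partition function has to match how the application groups its data.

```python
class PartitionedStore:
    """Toy key-value store split into independent partitions (illustrative only)."""

    def __init__(self, partition_fn):
        # partition_fn maps a key to a partition id; choosing it well is
        # the application-specific part.
        self.partition_fn = partition_fn
        self.partitions = {}

    def transact(self, writes):
        """Apply all writes together, but only if they hit exactly one partition."""
        targets = {self.partition_fn(key) for key in writes}
        if len(targets) != 1:
            raise ValueError("transaction must touch exactly one partition")
        self.partitions.setdefault(targets.pop(), {}).update(writes)

    def get(self, key):
        return self.partitions.get(self.partition_fn(key), {}).get(key)


def by_owner(key):
    """Hypothetical scheme: keys look like 'owner:field', partition by owner."""
    return key.split(":", 1)[0]


store = PartitionedStore(by_owner)
store.transact({"alice:balance": 90, "alice:orders": 1})  # one partition, accepted
```

With this scheme a write set touching both `alice:balance` and `bob:balance` is rejected, which is exactly the kind of workload a poorly chosen partitioning would make common.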
Reflective Control for an Elastic Cloud Application: An Automated Experiment Workbench
Azbayar Demberel, Jeff Chase, and Shivnath Babu, Duke University
The idea of reflection is to make an application change its behavior based on the available resources. This could be based on energy or computation resources. This lets you opportunistically exploit surplus resources and defer work during congestion. An example of a reflective application is a digital experiment (which generally has large data sets, can be partitioned, and does not have strong time requirements). Seems to me like this is useful for any batch processing style application. The work focuses on figuring out how to determine the utility of running different experiments depending on what resources are available, which may be very difficult since the experiment design space can be huge. It seems to me that the idea of reflective applications is useful even at a more basic level, both to let applications be aware of what resources are available and to let service providers know what applications desire.
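As a rough illustration of the "exploit surplus, defer during congestion" behavior, here is a small sketch of a batch planner that picks which experiment tasks to run given the CPUs currently available. The greedy utility-per-CPU ranking and the `plan_batch` name are my own assumptions for the example, not the method from the paper, and each task is assumed to need at least one CPU.

```python
def plan_batch(tasks, available_cpus):
    """Split tasks into (run_now, deferred) for the current capacity.

    tasks: list of (name, cpus_needed, utility) with cpus_needed > 0.
    Greedy heuristic (an assumption): prefer the highest utility per CPU.
    """
    ranked = sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
    run_now, deferred = [], []
    remaining = available_cpus
    for name, cpus, _utility in ranked:
        if cpus <= remaining:
            run_now.append(name)
            remaining -= cpus
        else:
            deferred.append(name)  # wait until more resources show up
    return run_now, deferred


experiments = [("a", 4, 8.0), ("b", 2, 6.0), ("c", 1, 1.0)]
print(plan_batch(experiments, 3))  # (['b', 'c'], ['a'])
```

The hard part the talk points at is hidden in the `utility` numbers: here they are simply given, whereas estimating them across a huge experiment design space is the real problem.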
Colocation Games and Their Application to Distributed Resource Management
Jorge Londoño, Azer Bestavros, and Shang-Hua Teng, Boston University
This paper explores the placement problem within data centers using game theory techniques. In general they find that a Nash equilibrium will not be reached, but that in restricted environments the game will always converge. I’ll be interested to look through their results more carefully to better understand how the potential for multiplexing resources in these environments can be reduced by the self-interests of users.
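For intuition about what convergence means in a placement game, here is a toy sketch of best-response dynamics. It is not the model from the paper: I am assuming identical unit-size VMs, hosts with a fixed cost of 1 split equally among their VMs, and a per-host capacity limit. Each VM repeatedly moves to whichever host gives it a strictly lower cost share, and the loop stops when nobody wants to move.

```python
def best_response_dynamics(placement, num_hosts, capacity, max_rounds=100):
    """Return a placement where no VM can lower its cost share by moving.

    placement[i] is the host index of VM i. Assumptions (mine, for
    illustration): unit-size VMs, host cost 1 shared equally, hard capacity.
    """
    placement = list(placement)
    for _ in range(max_rounds):
        moved = False
        for vm in range(len(placement)):
            load = [placement.count(h) for h in range(num_hosts)]
            current = placement[vm]
            best_host, best_cost = current, 1.0 / load[current]
            for host in range(num_hosts):
                if host == current or load[host] >= capacity:
                    continue  # skip own host and full hosts
                cost = 1.0 / (load[host] + 1)  # share after joining
                if cost < best_cost:
                    best_host, best_cost = host, cost
            if best_host != current:
                placement[vm] = best_host
                moved = True
        if not moved:
            return placement  # stable: no VM has a strictly better move
    raise RuntimeError("no stable placement found within max_rounds")


print(best_response_dynamics([0, 1, 2], num_hosts=3, capacity=2))  # [1, 1, 2]
```

In this restricted setup the selfish moves happen to pack VMs together. Richer cost models, such as heterogeneous VMs or several resource dimensions, are where convergence stops being a given, hence the `max_rounds` guard.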
Virtual Putty: Reshaping the Physical Footprint of Virtual Machines
Jason Sonnek and Abhishek Chandra, University of Minnesota
The idea here is that the physical footprint required by a VM can vary depending on its environment. For example, VMs colocated together may be able to share memory, or may require far fewer network resources if they can be put on the same LAN. To exploit this, you need to estimate the “virtual” footprint of a VM that captures how its physical requirements can change depending on its environment. The first challenge here is to efficiently capture this model. You will only be able to get a significant benefit from this kind of technique if it is being applied across a very large number of VMs (my memory sharing work suggests this as well). The second is determining how to deal with applications changing over time: memory and network communication patterns may shift, so how often do you need to recompute the footprint?
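The memory half of this can be sketched in a few lines. Assuming each VM is summarized as a set of page-content hashes (a simplification I am introducing here, not the paper's footprint model), the physical memory needed by a group of colocated VMs is the size of the union of their page sets, so the footprint of a VM depends on who its neighbors are.

```python
def physical_pages(vms_on_host):
    """Pages needed on one host if identical pages are stored only once.

    vms_on_host: iterable of sets of page-content hashes (hypothetical input).
    """
    unique = set()
    for pages in vms_on_host:
        unique |= pages
    return len(unique)


def sharing_savings(vms):
    """Pages saved by colocating all VMs versus placing each one alone."""
    return sum(len(pages) for pages in vms) - physical_pages(vms)


vm_a = {"os1", "os2", "appA"}
vm_b = {"os1", "os2", "appB"}
print(physical_pages([vm_a, vm_b]))   # 4
print(sharing_savings([vm_a, vm_b]))  # 2
```

Since page contents change as applications run, these sets go stale, which is the recomputation question raised above.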
Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters
Peter Bodík, Rean Griffith, Charles Sutton, Armando Fox, Michael Jordan, and David Patterson, University of California, Berkeley
The goal here is to model application performance and automate management online. Models are based on data gathered from the system as it is running, allowing them to be adapted as more data is produced. The system has some automated techniques to detect phase shifts in application type that will require a new model. The problem with these systems is always a question of how well they can deal with data that is outside of their training data. One of this system’s benefits is supposed to be that it doesn’t rely on training data produced from experimental setups, and instead builds the model on the fly as data is gathered. But of course that may mean that the models are only really applicable for “normal” operating conditions, and that they will not be able to make reasonable predictions for what will happen after a load spike.
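To show the extrapolation worry in miniature, here is a sketch of a model that is fit online and flags predictions made outside the range of load it has actually seen. The linear latency-versus-load model and the `OnlineLatencyModel` class are assumptions for illustration only; the paper's statistical models are not reproduced here.

```python
class OnlineLatencyModel:
    """Least-squares line fit updated one observation at a time (illustrative)."""

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = self.sum_xx = self.sum_xy = 0.0
        self.min_x = float("inf")
        self.max_x = float("-inf")

    def observe(self, load, latency):
        self.n += 1
        self.sum_x += load
        self.sum_y += latency
        self.sum_xx += load * load
        self.sum_xy += load * latency
        self.min_x = min(self.min_x, load)
        self.max_x = max(self.max_x, load)

    def predict(self, load):
        """Return (predicted_latency, extrapolating)."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct load values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        # True when the query lies outside anything observed so far.
        extrapolating = not (self.min_x <= load <= self.max_x)
        return slope * load + intercept, extrapolating


model = OnlineLatencyModel()
for load, latency in [(100, 20), (200, 30), (300, 40)]:
    model.observe(load, latency)
```

Here `model.predict(250)` is inside the observed range, while `model.predict(1000)` comes back flagged as an extrapolation. The number it returns for a load spike is only as good as the assumption that the trend continues, which is the concern with models built purely from normal operation.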
Other Hot Cloud Reports
I’ll add any other Hot Cloud blogs or reports as I find them (or leave a comment below).
- NetworkWorld 5 Cool Cloud Computing Research Projects – sadly does not mention me… ;]