Latest Entry

The Singularity

A nice comic on the imminent “singularity”–when technology will revolutionize our lives, bodies, and societies. I’m still a bit skeptical…


Recent Entries

Job Application Resources

I’m not going to be applying for jobs until next year, but recently I’ve been helping a few friends with their own applications, which has gotten me interested in the subject. To learn a bit more, I went to the Chronicle of Higher Education’s website earlier today. I’ve heard my dad (a recently retired sociology professor) refer to the Chronicle many times, but this was the first time I’ve actually read any of it. I found quite a bit of interesting information, which I will link to here. This is largely for my own benefit a year from now when I need the info, but hopefully some others will find it useful as well.

  • First Time on the Market – a collection of articles on interviews, teaching statements, and generally what to expect
  • How to Write a Teaching Statement – luckily I got some practice with this when taking a course on teaching in scientific disciplines last year, but otherwise it can be a tricky piece to write when you are coming from a graduate program that focuses almost entirely on research
  • Facing the Truth – an interesting piece on the chances of new grads applying to four-year teaching colleges without any teaching experience.

Clustered Bar Graphs in Mac OS X

I use gnuplot for most of my graphing needs, but using it for complicated bar charts has always been a pain. Fortunately, there is a very handy clustered/stacked bar chart generator which wraps gnuplot in a nice Perl script to add some extra features. I’d used it previously under Linux without any problems, but to work on a Mac you first need to set up gnuplot (which can be a pain), and you also need the fig2dev utility to actually produce the final output files. Luckily, I found a copy of it compiled for OS X on the jfig webpage, and although it has a warning from 2006 that it may not work on Intel Macs, it works fine on mine. This will let you make eps/pdf versions of your graphs which will work nicely in LaTeX documents.

Setting up Gnuplot on a Mac

I wrote these directions down over a year ago, so they could be a bit out of date. I’d like a permanent record though since some of the steps are a bit tricky…

Gnuplot is used for making graphs. If you try to compile it normally you will get some errors. Here is how to make it work:

  1. Download and install aquaterm – this is a program which will handle the actual plotting graphics for gnuplot.
  2. Download the source code for gnuplot – I am using 4.2.3.
  3. Extract the source code somewhere (double click the file in Finder or use “tar xzf FILENAME” from a terminal).
  4. Open a terminal and change to the extracted source directory.
  5. Configure the source code distribution by running ./configure --with-readline=builtin (the --with-readline flag is required because Mac OS X comes with a bad version of this library; more details here).
  6. Build the source code by running make
  7. Install the resulting package by running make install
  8. You are done!

You can test it out by running gnuplot at the terminal and then typing plot sin(x)

Grad Students Officially Obsolete

Robot Scientist 'Adam' at Aberystwyth University

Adam: The first robotic scientist

So much for job security as an academic researcher… soon we’ll all be replaced with giant robots and monkeys on typewriters…

Scientific Publishing

Pretty good comics…



Improving Data Center Resource Management, Deployment, and Availability with Virtualization

That’s the title of my thesis proposal, which attempts to cram all the work I’ve done over the past four years into just a few words. In the end, I’m pretty happy with the result–I’ve been able to tie together the various projects I’ve worked on to show how virtualization provides powerful new techniques for deploying applications, more efficiently managing resources, and providing high reliability in large data centers.

If you are interested, you can read the full version, or look through my slides. It should make for absolutely thrilling bedtime reading.

Here is the executive summary of what I’ve worked on:

Deployment

I start by looking at the deployment challenges of transitioning to a virtual environment and figuring out where to place VMs. This is an interesting area because virtualization can provide great benefits such as improved server consolidation, but also adds new challenges in the form of virtualization overheads.

MOVE (Modeling Overheads of Virtual Environments)

When you first consider transitioning from running applications natively to using virtual machines, it is important to understand how application resource requirements will change due to the overheads incurred by the virtualization layer. The MOVE project is designed to help predict these resource changes by building a regression model that relates the native and virtual platforms. This was work that I started during an internship at HP Labs in the summer of 2007, working with Lucy Cherkasova.
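To make the regression idea a bit more concrete, here is a minimal sketch in Python. It assumes the simplest possible case, a single-variable linear model fit with ordinary least squares, and it uses made-up utilization numbers. The helper names (fit_line, predict_virtual_cpu) are hypothetical, and the real MOVE model is not spelled out in this post, so treat this as an illustration of the general approach rather than the actual implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical benchmark measurements: CPU utilization (%) of the same
# workloads when run natively and when run inside a VM.
native_cpu = [10.0, 20.0, 30.0, 40.0]
virtual_cpu = [14.0, 26.0, 38.0, 50.0]

intercept, slope = fit_line(native_cpu, virtual_cpu)


def predict_virtual_cpu(native):
    """Predict virtualized CPU utilization from a native measurement."""
    return intercept + slope * native
```

With a model like this fitted from benchmark runs, a native trace of an application can be pushed through predict_virtual_cpu to estimate what it would need once virtualized.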

Memory Buddies – Guiding VM placement with memory information

Once you know your resource requirements, you need to figure out where to put each of your virtual machines.  The Memory Buddies project tries to place virtual machines in order to maximize the amount of memory sharing that can be achieved — if VMs are running similar operating systems or applications, then the virtualization layer can share copies of these duplicated pages. In order to make this practical in a data center with many thousands of VMs, we propose an efficient fingerprinting technique that uses Bloom filters to quickly compare virtual machine memory contents.
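Here is a rough sketch of the fingerprinting idea in Python. It assumes each memory page is available as a bytes object, and it uses the number of bits set in both filters as a crude similarity score. The class and function names, the filter size, and the scoring rule are all assumptions made for illustration; they are not taken from the Memory Buddies implementation.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a Python int used as a bit vector."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def fingerprint(pages):
    """Summarize a VM's memory as a Bloom filter of page content hashes."""
    bloom = BloomFilter()
    for page in pages:
        bloom.add(hashlib.sha1(page).digest())
    return bloom


def shared_bits(a, b):
    """Crude similarity score: number of bits set in both filters."""
    return bin(a.bits & b.bits).count("1")


# Hypothetical VMs, with short byte strings standing in for memory pages.
fp_a = fingerprint([b"kernel", b"libc", b"app-a"])
fp_b = fingerprint([b"kernel", b"libc", b"app-b"])
score = shared_bits(fp_a, fp_b)
```

The appeal of this design is that comparing two VMs only requires a bitwise AND over two small bit vectors, instead of comparing full lists of page hashes.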

Resource Management

Making data centers more efficient is a key concern throughout all of my work.  Virtualization’s greatest benefit comes in the promise of improved server utilization, leading to lower hardware costs and decreased energy consumption.

Sandpiper – automated VM load balancing

Alright, now we’ve figured out initial resource allocations and placements for all of our virtual machines, but those initial decisions may not be sufficient (or efficient) if an application’s workload changes over time. Sandpiper is a system which monitors the resource utilization and performance of a set of VMs and dynamically adjusts their resources or migrates them between hosts in order to prevent servers from becoming overloaded. This was the first project I worked on when I came to grad school, and now there are several commercial products out there doing similar things. We recently revised and extended this paper for a journal.
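As a very simplified illustration of this monitor-and-migrate loop, the Python sketch below flags hosts whose total load crosses a threshold and greedily moves the largest VM to the least loaded host that can absorb it. The threshold, the single "load" number per VM, and the greedy rule are all assumptions chosen for brevity; this is not Sandpiper's actual algorithm.

```python
def find_hotspots(hosts, threshold=0.75):
    """Return the names of hosts whose total VM load exceeds the threshold."""
    return [name for name, vms in hosts.items()
            if sum(vms.values()) > threshold]


def plan_migrations(hosts, threshold=0.75):
    """For each overloaded host, move its largest VM to the least loaded
    host that stays at or under the threshold after the move.

    Mutates hosts in place and returns a list of (vm, source, target) moves.
    """
    moves = []
    for hot in find_hotspots(hosts, threshold):
        vm, load = max(hosts[hot].items(), key=lambda kv: kv[1])
        candidates = [h for h in hosts
                      if h != hot
                      and sum(hosts[h].values()) + load <= threshold]
        if not candidates:
            continue  # nowhere to put this VM; leave the hotspot in place
        target = min(candidates, key=lambda h: sum(hosts[h].values()))
        hosts[target][vm] = hosts[hot].pop(vm)
        moves.append((vm, hot, target))
    return moves


# Hypothetical cluster: host name -> {VM name: fraction of host capacity}.
cluster = {
    "host1": {"vm1": 0.5, "vm2": 0.4},
    "host2": {"vm3": 0.2},
    "host3": {"vm4": 0.6},
}
moves = plan_migrations(cluster)
```

In this example host1 is the only hotspot, and vm1 ends up on host2 because host3 would be pushed over the threshold.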

Reliability

High performance systems are only useful if they are reliable. The remaining work for my thesis uses virtualization to decrease the cost of high availability and fault tolerance systems.

ZZ: Cheap Practical Byzantine Fault Tolerance

Byzantine Fault Tolerance is a way of providing very strong reliability guarantees, even in the face of malicious users or application components. Unfortunately, BFT has a very high cost because each application request must be executed 2f+1 times in order to handle f simultaneous faults. In ZZ, we try to reduce this cost to only f+1 executions, by using an additional f sleeping VM replicas which are only woken up after a fault is detected.
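The replica arithmetic is easy to check with a few lines of Python. The function name and dictionary keys below are just labels picked for this sketch; the counts themselves come straight from the 2f+1 and f+1 figures above.

```python
def execution_replicas(f):
    """Execution replica counts needed to tolerate f simultaneous faults."""
    return {
        "standard_bft_active": 2 * f + 1,  # every request executed 2f+1 times
        "zz_active": f + 1,                # replicas that execute each request
        "zz_sleeping": f,                  # woken only after a fault is detected
    }


one_fault = execution_replicas(1)
two_faults = execution_replicas(2)
```

Note that the active and sleeping ZZ replicas still add up to 2f+1; the saving comes from the sleeping ones not executing requests in the fault-free case.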

CloudNet: Wide Area Resource Management and Availability

My most recent work was started while at AT&T in Fall 2008, and looks at how VPNs can be combined with cloud computing platforms to make data center resources appear seamlessly connected to an enterprise’s existing infrastructure. We are further exploring this area to see how we can provide disaster recovery services so that if a data center becomes unavailable, the critical applications running within it can transparently fail over to servers at a different data center.

Usenix ATC 09 Awards & Keynotes

Best Paper Awards

The first best paper award went to Grzegorz Miłoś for Satori: Enlightened Page Sharing. I’m a big fan of memory sharing between virtual machines, so I’m glad to see some recognition for this type of work. I talked with Irfan Ahmad from VMware after the talk and I have to agree with his view that the real benefit of this type of system is not in attempting to free up memory for other VMs, but in reducing I/O latency since fewer blocks need to be read from disk.

Next up was Tolerating File-System Mistakes with EnvyFS, which I haven’t read yet, but now I’ll have to take a look.

Keynote

The keynote was by James Hamilton from Amazon Web Services. He made some pretty interesting points about how enterprise costs are so different from services costs. I’m sure that in time, enterprises will do the best they can to get closer to the services model by eliminating their “people” costs (sorry folks!) and trying to build larger scale homogeneous systems. The talk did a great job of providing the high level view of where the real problems are in big data centers. I also recommend his blog to anyone not already reading it; it is full of interesting data (plus some good ideas). The slides from his talk are available on this page.

  • Enterprise’s main cost is people
    • Often about 100 servers per admin
    • Have many different apps, each with relatively small scale -> difficult to automate
  • Services world’s main cost is hardware
    • >1,000:1 server:admin
    • Don’t look at raw performance, look at work done per dollar, or work per joule.
  • Data Center monthly costs (3 year amortization for servers, 15 year for infrastructure)
    • 15MW data center ~$200M
    • Servers 50%
    • Power & Cooling infrastructure 25%
    • Power 22%
    • Other infrastructure 3%
    • Even combined, power and power infrastructure is less than 50%
    • but server costs are decreasing, while energy is not…
    • So current headlines claiming that power dominates data center costs are wrong today, but the trend they point to is still correct
    • Take away: server cost is still very high, so it makes sense to USE servers if at all possible. Turning them off only saves on the energy costs, which is relatively small (22%)
  • PUE = total facility power / IT equipment power
    • 1.7 PUE is a “normal” new Data center (wastes 0.7 watts per 1 watt used by servers)
    • But PUE can be deceiving since it counts things like server fans as useful energy
    • tPUE is new metric that just counts useful server energy – see blog post for more info
    • Key points:
      • Server efficiency and utilization is a good thing to improve
      • Cooling waste is unreasonably high
      • Power distribution waste isn’t too bad
      • When provisioning energy, don’t assume DC will have all servers running at peak load. “Oversell” power, and then shed load to other data centers if somehow all servers simultaneously ramp up to full load.
  • Temperature
    • Most people run at 81°F, but systems can handle much higher (Dell 95°F, Rackable 104°F)
    • Raising temp within the data center can save lots of money (especially if it is cool outside)
  • Resource Consumption Shaping
    • Apply resource optimization across entire data center
    • Move work from peaks into valleys since costs are based on peaks
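The PUE definition and the cost split from these notes can be sanity checked with a short Python sketch. The cost shares are the percentages listed above; the 1,700 kW and 1,000 kW power figures are hypothetical numbers chosen to produce a PUE of 1.7, and the function names are my own.

```python
def pue(total_facility_power, it_equipment_power):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_power / it_equipment_power


def overhead_per_it_watt(pue_value):
    """Watts spent on cooling, distribution, etc. per watt reaching IT gear."""
    return pue_value - 1.0


# Monthly cost shares from the keynote notes above.
MONTHLY_COST_SHARE = {
    "servers": 0.50,
    "power_cooling_infrastructure": 0.25,
    "power": 0.22,
    "other_infrastructure": 0.03,
}

# Hypothetical facility: 1,700 kW drawn in total, 1,000 kW reaching servers.
example_pue = pue(1700.0, 1000.0)
example_overhead = overhead_per_it_watt(example_pue)

# Power plus power/cooling infrastructure, as a share of monthly cost.
power_related_share = (MONTHLY_COST_SHARE["power"]
                       + MONTHLY_COST_SHARE["power_cooling_infrastructure"])
```

Here power_related_share comes out to 47%, which matches the point that power and power infrastructure combined are still under half of the monthly cost.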

My only disappointment from the keynote? His hair wasn’t as big as I’d imagined ;)

What Usenix Can Do for Students

After the Usenix ATC welcome session tonight there was a brief Students BoF meeting to discuss what students get out of Usenix (both the organization and its conferences). We talked a fair amount about the idea of student conferences run by students, which I think is a very good idea. The main issue seems to be one of transportation – if it is a national conference, then only people with sufficient funding will be able to get to it, but if it is a regional conference, then only regions with a high density of students (i.e. the North East and California) will be capable of gathering a big enough crowd.

I still think this is an idea worth pursuing, although it probably works best at the regional level which will sadly leave out a lot of people in the middle of the country. I know that many AI students in my department attend NESCAI (the North East Student Colloquium on Artificial Intelligence held at Cornell each year), and find it very useful since it gives them a chance to practice presenting their work and networking with other people in a low stress environment. It was repeated several times during the discussion that the “hallway track” at Usenix can be the most valuable part, but many students miss out on that because it can be a bit intimidating to strike up conversations, especially with faculty or industry researchers. Giving students opportunities to practice that at a conference just among their peers would be very helpful. For the students helping with conference organization, they would be exposed to reviewing and how program committees work, experience which is normally very hard to acquire as a graduate student.  I don’t think that Usenix would have too much trouble finding students to help organize such a venture, and I’d be tempted to volunteer myself.

On a broader note, I feel like Usenix currently does a great job in these areas:

  • Technical research: Usenix ATC provides a forum for the presentation of top quality academic and industrial research. I consider it a great venue for any type of general systems work with strong technical components.
  • Mixing industry and academia: In my (relatively limited) experience, Usenix ATC is the conference with closest to an even match between academics and industry professionals. This is good since both sides need the other, but in most other conferences I’ve seen there is a clear majority in one direction or the other.

Other areas that Usenix could expand on to better support students are:

  • Graduate student development: offer tutorials or seminars on topics like research methods or personal organization (e.g. systems like GTD). A professor in my department teaches a research methods course which was incredibly helpful for me, and I know he has given 1 hour talks on the subject at other schools to rave reviews. These are the kinds of things that graduate students currently are learning on the job through trial and error, and it is much better to just have them taught to you upfront. I’m not sure how well this would fit at something like ATC, but it would definitely be ideal for a student conference, and even just lists of online resources could help.
  • Insights into academia: this would include things like organizing student run conferences or shadow PCs that allow students to get a better idea of what keeps their advisors busy when they aren’t meeting with us. Learning how to review papers helps us become more critical (in a good way) of all the other papers we read, letting us get more out of them than we would otherwise.
  • Realtime research updates: I wish I had a list of blogs written by systems researchers. Usenix could help organize this by at least setting up a list of links to all blog posts written about their conferences (you can start with my notes from HotCloud!). I want to know what other researchers are thinking about, and I also want to be updated whenever people in my area publish new pieces of work (currently I rely on elaborate mechanisms that automatically check the publication webpages of the top people in my field each day to see if they have changed). Obviously for this to be fully useful, it needs to support more than just Usenix conferences and workshops, and the updates need to be propagated when papers are accepted, not four months later when they are presented. Usenix’s push into social networks may help with this too, although I’ll admit that I haven’t “friended” Usenix yet, so I’m not sure…
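For anyone curious what a page-change checker along these lines might look like, here is a minimal Python sketch. It assumes the page text has already been fetched by some other means (the fetching step is left out), and it detects changes by comparing SHA-256 digests of whitespace-normalized content. The function names and the example.edu URLs are hypothetical; this is not the actual mechanism I use.

```python
import hashlib


def content_digest(page_text):
    """Hash page text after collapsing whitespace, so that trivial
    formatting changes do not register as updates."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(previous_digests, current_pages):
    """Return URLs whose content differs from the stored digest.

    previous_digests maps URL -> digest and is updated in place;
    current_pages maps URL -> freshly fetched page text.
    """
    changed = []
    for url, text in current_pages.items():
        digest = content_digest(text)
        if previous_digests.get(url) != digest:
            changed.append(url)
        previous_digests[url] = digest
    return changed


# Hypothetical publication pages on three consecutive days.
seen = {}
day1 = {"http://example.edu/~alice/pubs": "Paper A",
        "http://example.edu/~bob/pubs": "Paper X"}
day2 = {"http://example.edu/~alice/pubs": "Paper  A",  # whitespace only
        "http://example.edu/~bob/pubs": "Paper X"}
day3 = {"http://example.edu/~alice/pubs": "Paper A\nPaper B",
        "http://example.edu/~bob/pubs": "Paper X"}

first = changed_pages(seen, day1)
second = changed_pages(seen, day2)
third = changed_pages(seen, day3)
```

Run once a day, something like this reports every page on the first pass and then only the pages whose text actually changed.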

That’s all I can think of for now, and I’m still on east-coast time, so I need to get to sleep.

Hot Cloud 2009

Here are my notes on some of the interesting talks at Hot Cloud 2009. The full list of talks and papers is available at the Hot Cloud site. There were interesting talks on a variety of topics, but my notes here focus mostly on cloud platforms and work around resource provisioning from the first half of the day.

Open Cirrus Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research

Roy Campbell, Indranil Gupta, Michael Heath, and Steven Y. Ko, University of Illinois at Urbana-Champaign; Michael Kozuch, Intel Research; Marcel Kunze, KIT, Germany; Thomas Kwan, Yahoo!; Kevin Lai, HP Labs; Hing Yan Lee, IDA, Singapore; Martha Lyons and Dejan Milojicic, HP Labs; David O’Hallaron, Intel Research; Yeng Chai Soh, IDA, Singapore

This is a very large (more than 10K nodes spread across 9 sites) testbed being set up by HP and others to study large scale cloud computing problems. They are focusing on computation provisioning issues, and can provide users with either full physical or virtual resources.

Nebulas: Using Distributed Voluntary Resources to Build Clouds

Abhishek Chandra and Jon Weissman, University of Minnesota

The idea here is to explore the potential for creating peer-to-peer style cloud computing platforms that use resources provided by volunteers, similar to something like SETI@home. I like this idea a lot, but there have been many attempts at making things like volunteer based network file systems which never quite took off, and this seems even harder. The difficulty will be determining what the basic platform that people are given access to looks like (i.e. can you run any app you want within some VM, or is it a specific platform you must develop your app against to make it work), and how to keep the resources users share from hurting the performance of their own applications. People are pretty willing to share network bandwidth and disk space, but that is because those are generally over-provisioned resources. CPU is over-provisioned in a different way: most of the time desktop users use only a fraction of the power provided by their system, but when they do decide to do something computation intensive, they expect it to respond quickly. This also reminds me of the “transparent memory contribution” work done by Jim Cipar when he was still at UMass, since it had to deal with similar issues of volunteering resources in as transparent a way as possible.

The Case for Enterprise-Ready Virtual Private Clouds

Timothy Wood and Prashant Shenoy, University of Massachusetts Amherst; Alexandre Gerber, K.K. Ramakrishnan, and Jacobus Van der Merwe, AT&T Labs—Research

I thought this paper was really great, but maybe I’m biased since I wrote it.  I’ve written a separate blog post about my own work, but the gist is that current cloud computing platforms are insufficient for enterprise users, and we propose using network virtualization techniques to make seamless and secure connections between the cloud resources and enterprise sites.

ElasTraS: An Elastic Transactional Data Store in the Cloud

Sudipto Das, Divyakant Agrawal, and Amr El Abbadi, University of California, Santa Barbara

The idea here is that databases currently don’t scale well into the cloud. Instead people are using simpler (but more easily scaled) key-value stores to keep track of data in the cloud. This doesn’t work well because key-value stores don’t provide the transaction and consistency features of real databases. They propose ElasTraS – a scalable, transactional data store based around the idea of partitioned databases. It wasn’t clear how difficult it is to determine how to partition the data in the first place, as that tends to be application specific.

Reflective Control for an Elastic Cloud Application: An Automated Experiment Workbench

Azbayar Demberel, Jeff Chase, and Shivnath Babu, Duke University

The idea of reflection is to make an application change its behavior based on the available resources. This could be based on energy or computation resources. This lets you opportunistically exploit surplus resources and defer work during congestion. An example of a reflective application is a digital experiment (generally has large data sets, can be partitioned, does not have strong time requirements). It seems to me like this is useful for any batch processing style application. The work focuses on figuring out how to determine the utility of running different experiments depending on what resources are available, which may be very difficult since the experiment design space can be huge. I think the idea of reflective applications is useful even at a more basic level, both to let applications be aware of what resources are available and to let service providers know what applications desire.

Colocation Games and Their Application to Distributed Resource Management

Jorge Londoño, Azer Bestavros, and Shang-Hua Teng, Boston University

This paper explores the placement problem within data centers using game theory techniques. In general they find that a Nash Equilibrium will not be reached, but that in restricted environments the game will always converge. I’ll be interested to look through their results more carefully to better understand how the potential for multiplexing resources in these environments can be reduced based on the self-interests of users.

Virtual Putty: Reshaping the Physical Footprint of Virtual Machines

Jason Sonnek and Abhishek Chandra, University of Minnesota

The idea here is that the physical footprint required by a VM can vary depending on its environment. For example, VMs colocated together may be able to share memory, or may require far fewer network resources if they can be put on the same LAN. To exploit this, you need to estimate the “virtual” footprint of a VM that captures how its physical requirements can change depending on its environment. The first challenge here is to efficiently capture this model: you will only be able to get a significant benefit from this kind of technique if it is being applied across a very large number of VMs (my memory sharing work suggests this as well). Second is the issue of determining how to deal with applications changing over time – memory and network communication patterns may change, so how often do you need to recompute the footprint?

Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters

Peter Bodík, Rean Griffith, Charles Sutton, Armando Fox, Michael Jordan, and David Patterson, University of California, Berkeley

The goal here is to model application performance and automate management online. Models are based on data gathered from the system as it is running, allowing them to be adapted as more data is produced. The system has some automated techniques to detect phase shifts in application type that will require a new model. The problem with these systems is always a question of how well they can deal with data that is outside of their training data. One of this system’s benefits is supposed to be that it doesn’t rely on training data produced from experimental setups, and instead builds the model on the fly as data is gathered. But of course that may mean that the models are only really applicable for “normal” operating conditions, and that the system will not be able to make reasonable predictions for what will happen after a load spike.

Other Hot Cloud Reports

I’ll add any other Hot Cloud blogs or reports as I find them (or comment below).