Watts, MIPs, and Google

Last night a few hundred assorted New York geeks packed into Google’s new Chelsea offices to hear Google engineer Luiz Barroso discuss “Watts, Faults, and Other Fascinating Dirty Words Computer Architects Can No Longer Afford to Ignore.” Capsule version: while raw CPU performance has been increasing steadily, for the last fifteen years or so MIPs per watt may actually have been going down. At the very least, it has not been growing nearly as fast as total processing power per server. The good news is that since we haven’t really bothered to optimize our computers for energy efficiency, there’s still a lot of low-hanging fruit left to pick in this space.

MIPs vs. Watts

Given Google’s large data centers and the sheer cost of building one ($10–22 per watt in construction costs and $0.80 per watt-year in operating costs), Google is quite concerned about getting maximally efficient use out of its systems. So should we all be if, as Barroso predicts, the energy cost per server soon outstrips the initial purchase price (which he joked might lead power companies to adopt a cell phone business model: sign up for two years of electricity and they’ll give you the servers for free).
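To put those figures in perspective, here’s a back-of-the-envelope sketch. Only the $0.80 per watt-year operating figure comes from the talk; the server price, power draw, and service life below are my own illustrative assumptions, not Barroso’s numbers:

```python
# Lifetime energy cost vs. purchase price. Only the $0.80/watt-year
# figure comes from Barroso's talk; everything else is assumed.
OPEX_PER_WATT_YEAR = 0.80    # dollars per watt per year (from the talk)

def lifetime_energy_cost(watts, years, opex=OPEX_PER_WATT_YEAR):
    """Total power-related operating cost over a server's service life."""
    return watts * years * opex

purchase_price = 1500        # assumed server price in dollars
draw = 500                   # assumed average draw in watts
life = 4                     # assumed service life in years

energy = lifetime_energy_cost(draw, life)
print(f"energy ${energy:.0f} vs. purchase ${purchase_price}")
```

Under these assumptions the electricity ($1,600) already edges out the sticker price, which is exactly the crossover Barroso was predicting.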

There are numerous problems here. The first, and one of the easiest to fix, is power supplies. A typical computer power supply is only 50%–70% efficient. By paying a little more up front you can easily get one that’s 90% efficient. Most of us don’t think about the energy efficiency of a power supply when shopping for PCs, but we should. It’s like buying compact fluorescent light bulbs: a little more expense up front, but you more than make up the difference over time.
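The arithmetic behind that advice is simple: what you pull from the wall is the DC load divided by the supply’s efficiency. A quick sketch, where the 70% and 90% figures are the ones above and the 300 W load is an assumption of mine:

```python
# Wall-socket draw needed to deliver a fixed DC load at a given
# power supply efficiency. The 300 W load is an assumed example.
def wall_draw(dc_load_watts, efficiency):
    """Watts pulled from the wall to deliver dc_load_watts downstream."""
    return dc_load_watts / efficiency

load = 300.0
cheap = wall_draw(load, 0.70)   # roughly 429 W from the wall
good = wall_draw(load, 0.90)    # roughly 333 W from the wall
print(f"continuous savings: {cheap - good:.0f} W per server")
```

That is roughly 95 W of waste heat saved, continuously, per server, before you even start paying to air-condition it back out of the room.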

After that, the problems get harder to fix and require some work from chip vendors and others. The first problem is that all the speculative execution done in modern chips costs power. Chips guess where a program is likely to go and precalculate several different paths, all of which cost energy. However, they then throw away the results of all but one of those paths. That’s energy wasted. The more predictions a chip makes and the deeper it looks down the pipeline the more energy it wastes calculating results it will never use.

Problems also arise because power drain isn’t linear in load. A server that draws 100W when pegged at 100% CPU may still draw 50W when idling. Better system design may eventually improve this. Until then, though, it’s important to try to peg all your servers. It’s much more efficient power-wise to have 50 servers running at 100% load than 100 servers running at 50% load, let alone 100 servers running at an average 10% load.
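A simple linear model of that idle floor, using the 50 W idle / 100 W peak numbers above, makes the consolidation argument concrete:

```python
# Simple linear power model: draw rises from the idle floor to peak
# proportionally with CPU utilization. The 50 W / 100 W endpoints
# come from the example in the text above.
IDLE_W, PEAK_W = 50.0, 100.0

def server_power(utilization):
    """Watts drawn by one server at a utilization between 0.0 and 1.0."""
    return IDLE_W + (PEAK_W - IDLE_W) * utilization

def fleet_power(n_servers, utilization):
    """Total watts drawn by a fleet running at uniform utilization."""
    return n_servers * server_power(utilization)

print(fleet_power(50, 1.0))    # 50 pegged servers:      5000.0 W
print(fleet_power(100, 0.5))   # same work, spread out:  7500.0 W
print(fleet_power(100, 0.1))   # mostly idle fleet:      5500.0 W
```

The first two fleets do the same aggregate work, but the idle floor makes the spread-out fleet 50% hungrier; the third does a fifth of the work for more power than the first.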

System vendors should be able to improve on this. Currently the power ratio between peak usage and idling is about 2:1. By contrast human bodies can manage a 20-30:1 ratio between peak power usage and idling.

Of course, in reality, not all servers can run at 100% all the time, and you do need to have some room to spare for peak times. Consequently, Barroso noted that if you’re careful you can oversubscribe a datacenter. I don’t think he gave exact numbers, but it looked like about 20% was safe for Google’s usage patterns.
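The payoff of oversubscription is easy to quantify. In the sketch below, the 20% figure is the one quoted above; the facility power budget and per-server nameplate rating are assumptions of mine:

```python
# Extra servers bought by oversubscribing a facility's power budget,
# on the theory that not every server peaks at once. Facility size
# and nameplate rating are illustrative assumptions.
facility_watts = 1_000_000   # assumed facility power budget
nameplate_watts = 500        # assumed per-server peak rating
oversubscription = 0.20      # roughly the figure from the talk

naive = facility_watts // nameplate_watts
packed = int(facility_watts * (1 + oversubscription)) // nameplate_watts
print(naive, packed)   # 2000 vs 2400 servers in the same building
```

Twenty percent more machines in a building you’ve already paid $10–22 per watt to construct is real money, provided your workload’s peaks genuinely don’t line up.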

Faults and Disks

Barroso also spent some time discussing disk failures. He’s a co-author of a much-publicized paper noting that hard disks fail a lot more often than their manufacturers say they do. After losing a couple of days to a failed LaCie disk recently, I was more interested in this subject than I used to be.

Google hasn’t found any solutions yet, but they have ruled out several possibilities. For one thing, they find no correlation between temperature and disk failure. Cooling a drive does not help it.

For another, S.M.A.R.T. diagnostics appear to be a waste of time. Only 46% of disks that fail show any S.M.A.R.T. errors at all; the other 54% fail catastrophically with no prior warning. Furthermore, of the disks that do report errors, many continue running for months with no actual failure. Barroso did not know how many of the disks that did not fail also reported errors. That is, there are four groups of disks:

  • Fail, reported errors
  • Fail, did not report errors
  • Did not fail, reported errors
  • Did not fail, reported no errors

He only had numbers for the first three.
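Those four groups form a 2×2 contingency table, and which cells you have determines which questions you can answer. Here’s a sketch with invented counts; only the 46%/54% split of failed disks comes from the talk, and the healthy-disk numbers are pure fiction for illustration:

```python
# 2x2 table of (failed?, reported S.M.A.R.T. errors?). The 46/54
# split of failed disks is from the talk; the healthy-disk counts
# are invented purely to illustrate the arithmetic.
fail_err = 46        # failed, had prior S.M.A.R.T. errors (46%)
fail_no_err = 54     # failed, no warning at all (54%)
ok_err = 200         # hypothetical: healthy disks reporting errors
ok_no_err = 9700     # hypothetical: healthy, quiet disks

recall = fail_err / (fail_err + fail_no_err)    # catches 46% of failures
precision = fail_err / (fail_err + ok_err)      # needs healthy-disk counts
false_alarms = ok_err / (ok_err + ok_no_err)    # needs the fourth group
print(f"recall={recall:.2f} precision={precision:.2f} "
      f"false_alarms={false_alarms:.3f}")
```

Without all four cells you can’t tell whether a S.M.A.R.T. warning means “replace this disk now” or “ignore it like the last hundred warnings.”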

The Future

Google is working with a variety of vendors on these problems. I happen to know they’re not the only ones either. Sun is very concerned with server power issues, and I suspect others are as well. The Environmental Protection Agency is now looking into server power requirements as servers become a larger and larger portion of the nation’s power budget.

I don’t know when we’re likely to see results, but I hope it’s sooner rather than later. My electricity bill is already too high, and it’s only likely to go up if I get one of the new octo-core Macs I’ve been drooling over. Gentlemen, start your multimeters.

6 Responses to “Watts, MIPs, and Google”

  1. Marc Adams Says:

One of the larger issues in powering data centers is keeping the equipment cooled. It can cost more to cool a data center than to buy the servers and the electricity to power them. In-row cooling configurations help pinpoint the supply and demand for cooling, which helps, but more needs to be done.

  2. Luis Says:

I wonder if changing from AC power supplies to DC power supplies on servers would help. We have some telco switches in our data center; I was talking to one of the telco guys, who pointed out that they have 48V DC power going straight into their equipment.

But for our computer servers, we have to bring in AC power, turn it into DC power for the battery and UPS, then turn that DC power back into AC power for the power supplies in the servers, which in turn convert the power back into DC for the chips.

    I know that there are some 3rd party power supplies that will take 48V DC, but you’d think this type of thing would be standard for data centers.

  3. eas Says:

Luis, I’m pretty sure that Google adopted DC–DC power supplies for servers a couple of years back.

    Some of the highest capacity long-haul power transmission lines are DC, but I don’t know if going from hundreds of kV DC to

  4. Ann E. Mouse Says:

    Ah:

    Barroso did not know how many of the disks that did not fail also reported errors. That is there are four groups of disks:

    Fail, reported errors
    Fail, did not report errors
    Did not fail, reported errors

    He only had numbers for the first three.

    That sentence says he didn’t have data for group #3 but then you say the opposite.

  5. Chris Says:

I wonder whether building servers based on multiple ARM RISC chips would make a difference. Those chips draw only a small wattage (the ARM11 uses 0.6mW/MHz (0.13µm, 1.2V) including cache controllers). Even a multi-CPU setup per server looks like it would use less power than CISC-based servers.

  6. Samuel Says:

Stack-architecture CPUs will consume even less power, since they have even fewer transistors than ARM, and do not require a pipeline (since all instructions execute as-is in a single cycle). There is no need for speculative execution, since all the data your immediate computation will need is already on the stack, or is being prefetched via explicit instructions that have already executed.