mission-critical services

We talk a lot around here about unanswered questions. Quite often, those questions involve acronyms like PUE (Power Usage Effectiveness), COD (Cost of Downtime), DCIE (Data Center Infrastructure Efficiency) and DCIM (Data Center Infrastructure Management). I am not going to spend a lot of time here explaining what all these acronyms really mean and how to calculate them; everybody has Google and there are a million articles on the web that explain all this much better than I can. What I want to talk about is why you need to know the answers as they pertain to these metrics, why far too many businesses don't have these answers, and some ideas about how to go about getting them.

Let's run these down one by one and look at why each matters:

PUE (Power Usage Effectiveness) - This is an important metric because it can uncover some key issues with power delivery equipment, UPSs and PDUs primarily. Why should we care? Take the example of a data center using the latest blade server technology, running state-of-the-art virtualization and... powered by a 20-year-old UPS! Don't laugh... this happens all the time. PUEs of 3 (roughly 33% efficiency) are commonplace throughout the industry. Now, we know that older UPSs and PDUs use less efficient technology, but until the PUE is measured there is no way to state that case quantitatively. Without a measurement there is no ROI for equipment replacement and no way to benchmark efficiency gains (or losses). In the case of a PUE of 3, it takes 150 watts of total data center facility input to deliver 50 watts to the IT equipment. This kind of information can be critical in expansion projects where capacity is already strained to the breaking point and you need to shoehorn in one more rack.

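DCIE (Data Center Infrastructure Efficiency) - This is the reciprocal of PUE, usually expressed as a percentage, so a PUE of 3 works out to a DCIE of about 33%. Because everything that is not IT load counts against it, DCIE is also where we see things like the efficiency of cooling equipment. Essentially, PUE and DCIE look at the same things and just express them a little differently.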
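For anyone who wants to see the arithmetic, here is a minimal Python sketch of the two ratios using the 150 watt and 50 watt figures from the PUE example. The function names are my own, made up for illustration, and the definitions assumed are the commonly published ones: PUE as total facility power divided by IT equipment power, and DCIE as the reciprocal expressed as a percentage.

```python
def pue(total_facility_watts: float, it_equipment_watts: float) -> float:
    """Total facility input power divided by power delivered to IT equipment."""
    if it_equipment_watts <= 0:
        raise ValueError("IT equipment power must be greater than zero")
    return total_facility_watts / it_equipment_watts


def dcie_percent(total_facility_watts: float, it_equipment_watts: float) -> float:
    """Reciprocal of PUE, expressed as a percentage."""
    if total_facility_watts <= 0:
        raise ValueError("Total facility power must be greater than zero")
    return 100.0 * it_equipment_watts / total_facility_watts


if __name__ == "__main__":
    # Figures from the example: 150 W in at the facility, 50 W reaching the equipment.
    print(f"PUE:  {pue(150, 50):.2f}")
    print(f"DCIE: {dcie_percent(150, 50):.1f}%")
```

Both numbers come from the same two measurements, which is why a single pair of meter readings (facility input and IT load) gets you both metrics.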

COD (Cost of Downtime) - This is not a new idea, but we need to think about it a little differently. Downtime is a cost just like electricity and salaries. We know there is going to be some and we know that there is a cost associated with it. Let's put a number on it and use it to our advantage. In the case of the 20-year-old UPS, say we could not make a case for replacement based merely on age and energy efficiency, but now we are seeing some downtime associated with UPS problems. Armed with the downtime cost associated with the UPS plus some projected energy savings that could be derived from a new UPS, the conversation about replacing the UPS changes considerably.
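To make the UPS argument concrete, here is a rough Python sketch of how the downtime cost and the projected energy savings might be combined into a simple payback figure. Every number in the usage example is hypothetical, the function names are invented for illustration, and the sketch assumes the whole PUE improvement can be credited to the new UPS, which a real study would need to verify.

```python
HOURS_PER_YEAR = 8760


def annual_downtime_cost(hours_down_per_year: float, cost_per_hour: float) -> float:
    """Yearly cost of outages attributed to the old equipment."""
    return hours_down_per_year * cost_per_hour


def annual_energy_savings(it_load_kw: float, old_pue: float, new_pue: float,
                          price_per_kwh: float) -> float:
    """Yearly energy cost avoided if PUE drops while the IT load stays constant.

    Facility power is IT load times PUE, so the power saved is
    it_load_kw * (old_pue - new_pue).
    """
    saved_kw = it_load_kw * (old_pue - new_pue)
    return saved_kw * HOURS_PER_YEAR * price_per_kwh


def simple_payback_years(purchase_cost: float, annual_benefit: float) -> float:
    """Years for the avoided costs to cover the purchase price."""
    if annual_benefit <= 0:
        raise ValueError("Annual benefit must be greater than zero")
    return purchase_cost / annual_benefit


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    downtime = annual_downtime_cost(hours_down_per_year=4, cost_per_hour=10_000)
    energy = annual_energy_savings(it_load_kw=50, old_pue=3.0, new_pue=2.0,
                                   price_per_kwh=0.10)
    payback = simple_payback_years(purchase_cost=150_000,
                                   annual_benefit=downtime + energy)
    print(f"Downtime cost avoided per year: ${downtime:,.0f}")
    print(f"Energy cost avoided per year:   ${energy:,.0f}")
    print(f"Simple payback:                 {payback:.1f} years")
```

Neither number has to justify the purchase on its own; it is the sum of the two that shortens the payback period and changes the conversation.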

Not only does the information derived from these metrics help with the budgeting and cost conversations, but it also helps our colleagues and our management understand IT better, and it may help to change the way they think about IT and its impact on the business in general. This is especially important if your core business isn't IT related.

So why doesn't everyone routinely do these calculations? For one thing, with the exception of COD, the data can be relatively difficult to capture in some situations. If you have a separate power meter for your data center, you are 50% there, but in mixed-use facilities this typically is not in place. Additionally, spot checking some of these conditions is much less effective than looking at data collected over a period of time. Ideally, you want sensors designed to measure these conditions and tied in to a software capture of some kind. There are some very good and very flexible RFID systems out there that can capture and collect just about any kind of data and do it pretty inexpensively.

One of the biggest reasons that more people aren't capturing efficiency data is that they didn't know to do it. The first white paper on Data Center Metrics was published in 2007 and, frankly, it resided mainly with engineers and managers of larger data centers for a long time.

So do data center metrics make sense for the small to medium operation? I think so, for two reasons: 1) small to medium centers have the same kinds of issues that big ones do and 2) in the current climate of eco-friendliness, we will all be trying to justify our carbon footprint, and the company data center will make an easy target if you don't have your ducks in a row.

So how do you capture this information? I mentioned RFID, and that is probably the most flexible and cost-effective way. The real key is to capture the data, which brings us to DCIM (Data Center Infrastructure Management). Temperature and humidity, power consumption, and inlet and outlet temperatures at the server level are all pretty easy to capture, and yet I don't see that information being captured very often. The monitoring software that you may use now can be adapted pretty easily to these tasks. I urge you to start capturing the information that is already available and start planning to access more.
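As a sketch of why logged data beats a spot check, the Python below computes PUE over a whole set of logged power readings and compares it with the range you could get from any single reading. The Reading structure and the sample values are hypothetical, and the sketch assumes readings are taken at equal intervals so that summing power readings stands in for summing energy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reading:
    facility_kw: float  # total facility input power at the sample time
    it_kw: float        # power delivered to IT equipment at the sample time


def period_pue(readings: List[Reading]) -> float:
    """PUE over the whole logging period: total facility power over total IT power."""
    total_it = sum(r.it_kw for r in readings)
    if total_it <= 0:
        raise ValueError("Need at least one reading with IT load above zero")
    return sum(r.facility_kw for r in readings) / total_it


def spot_pue_range(readings: List[Reading]) -> Tuple[float, float]:
    """Lowest and highest PUE that a single spot check could have reported."""
    ratios = [r.facility_kw / r.it_kw for r in readings if r.it_kw > 0]
    if not ratios:
        raise ValueError("Need at least one reading with IT load above zero")
    return min(ratios), max(ratios)


if __name__ == "__main__":
    # Hypothetical samples, e.g. a cool night, a normal day, a hot afternoon.
    log = [Reading(120, 50), Reading(150, 50), Reading(180, 60)]
    low, high = spot_pue_range(log)
    print(f"Spot checks could report anywhere from {low:.2f} to {high:.2f}")
    print(f"PUE over the whole period: {period_pue(log):.2f}")
```

The same pattern applies to temperature and humidity logs: the value of the data comes from the trend over time, not from any one sample.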

As power and cooling demands grow, and as data becomes more critical even in SMB companies, the ability to make intelligent, informed decisions about your infrastructure will become more and more important. Add to this the current trend towards companies having to justify their carbon footprint to governmental and quasi-governmental entities, and you can see how more information regarding your infrastructure will become a must-have.