The Uptime Imperative

March 1, 2004
Successful manufacturers rely on a real-time execution model that makes continuous availability of vital information technology business applications more critical than ever.

In these settings, even a few minutes of unplanned downtime can have serious consequences, which only grow worse the longer the outage lasts. Downtime’s harmful effects and aftereffects range from hazardous operations to regulatory reporting lapses to millions of dollars in lost business.

A study by Meta Group, Stamford, Conn., reported that manufacturing industry respondents estimate the potential revenue loss for each hour of computer downtime at more than $1.6 million. According to the report, “IT Performance Engineering & Measurement Strategies: Quantifying Performance Loss,” published in October 2000, the potential for lost revenue per employee is more than $130 per hour.

Because so much is at stake, manufacturing companies that depend on real-time operations and vital IT systems must take steps to aggressively protect these systems and applications from unplanned outages. For many companies, computer servers that deliver high levels of uptime are important not only for application availability, but also for maintaining business continuity strategies.

Of course, not all servers are the same, just as no manufacturing plant is like another. The right server choice depends on many variables: Can you tolerate some downtime for certain applications? Is IT staff readily available to fix problems? Is planned downtime for system maintenance or upgrades an option? What other functions will be affected if the application is unavailable? How much flexibility is needed to adjust to variable workloads?

When it comes to hardware recommendations, many software solution providers have historically been agnostic. That posture, however, is fast becoming a liability in the real-time manufacturing environment. Solution providers should understand the significant differences among server technologies, know how their software will perform on different platforms, and be able to act as trusted advisers to the manufacturer. As more applications move to open standards, users simply have more platform choices available.

One of the most recent server developments is industry-standard fault-tolerant technology, which as recently as two years ago did not exist on an Intel platform. Traditionally proprietary and expensive, fault-tolerant servers today are open and affordable. Other, more traditional methods used to satisfy increased availability requirements include standalone servers, standby servers, and high-availability server clusters. Each has its place, as the table illustrates.

The evaluation process boils down to a couple of basics: the level of availability expected and the complexity of deployment and ongoing operation. Price is always a factor, but look beyond the sticker price. Evaluate the potential revenue and productivity lost in just one hour of downtime and you will almost always find that it exceeds the price of many top-end systems. Some solutions consume more IT staff resources than others, which is another cost. Clusters require multiple operating system and application licenses, while other approaches typically do not. And when a failure occurs (assume it will), do the service model and response times align with your production requirements?
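The comparison above can be made concrete with a back-of-the-envelope calculation: expected annual downtime follows directly from an availability level, and multiplying by an hourly loss figure gives the revenue at risk. The sketch below uses the Meta Group hourly estimate cited earlier; the availability tiers paired with each server class are illustrative assumptions, not figures from this article.

```python
# Back-of-the-envelope downtime cost model (illustrative only).
# The $1.6M/hour figure is the Meta Group estimate cited above;
# the availability levels per server class are hypothetical examples.

HOURS_PER_YEAR = 365 * 24  # 8,760

def annual_downtime_hours(availability: float) -> float:
    """Expected unplanned downtime per year at a given availability level."""
    return HOURS_PER_YEAR * (1.0 - availability)

def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly revenue loss from unplanned downtime."""
    return annual_downtime_hours(availability) * cost_per_hour

COST_PER_HOUR = 1_600_000  # Meta Group estimate for manufacturing

scenarios = [
    ("99.9%  (three nines)", 0.999),
    ("99.99% (four nines)", 0.9999),
    ("99.999% (five nines)", 0.99999),
]

for label, availability in scenarios:
    hours = annual_downtime_hours(availability)
    cost = annual_downtime_cost(availability, COST_PER_HOUR)
    print(f"{label}: {hours:.2f} h/yr down, ${cost:,.0f}/yr at risk")
```

Even the gap between four and five nines (roughly 0.88 versus 0.09 hours of downtime per year) translates into well over a million dollars of annual exposure at the cited hourly rate, which is the sense in which downtime cost can dwarf the purchase price of a top-end system.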

The server landscape has changed a great deal over the past few years. Furthermore, the best solution is not necessarily the most expensive, even on the basis of purchase price alone. It may be time to re-evaluate your assumptions about server technology and to challenge your software providers to do the same.

Jindrich Liska, [email protected], is industry manager for manufacturing at Stratus Computer.