Heading off breakdowns

By deploying predictive maintenance on equipment ranging from power supplies to chiller/compressors, Intel Corp. is saving millions through avoidance of costly downtime.

Wes Iversen

Nov. 1, 2003

At Intel Corp., the giant, Santa Clara, Calif.-based semiconductor manufacturer, one major measure of success is factory uptime. A few hours of downtime on just one chip fabrication line can result in millions of dollars in lost production.

And it is not just the high-technology wafer processing machines that are key. A failure of any piece of facility support equipment, such as a power supply, a heating, ventilation and air conditioning (HVAC) system, or a water or chemical treatment system, can likewise bring production to a rapid—and costly—standstill.

“The reality is that replacing a fan or pump motor is a fraction of the cost of having a fabrication line down for any amount of time,” says Mick Flanigan, a predictive maintenance program manager and project leader at Intel’s Northwest Regional Operations facility, in Hillsboro, Ore. “If production is down for even one or two hours, the lost revenue would far exceed the cost of a replacement motor, or any other ancillary component.”

Historically, Intel’s facility maintenance department has minimized downtime by using a preventive strategy that incorporates redundant machines to protect critical systems. For example, on its water and chemical treatment systems, Intel may employ three pumps on a single isolation base. The first operates the equipment, the second waits in hot standby mode and the third serves as a backup to the second. If the primary pump fails, the second will go online immediately. Although functional and fairly reliable, this uncontrolled switchover approach is not ideal, because even a minor disruption or hiccup in the switchover can cause a slight pressure or temperature fluctuation, resulting in lost production and product waste.
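The three-pump arrangement described above can be pictured as a simple rotation of roles. The sketch below is only an illustration of that idea, not Intel's control logic; the role names and the `fail_over` function are hypothetical.

```python
def fail_over(roles):
    """Promote the standby pump after the primary fails.

    `roles` maps the hypothetical role names 'primary', 'standby' and
    'backup' to pump identifiers (or None if that slot is empty).
    Returns a new mapping; the input is left unchanged.
    """
    if roles["standby"] is None:
        raise RuntimeError("no standby pump available to take over")
    return {
        "primary": roles["standby"],  # hot standby goes online
        "standby": roles["backup"],   # backup moves up to standby
        "backup": None,               # failed pump awaits repair
    }


pumps = {"primary": "P1", "standby": "P2", "backup": "P3"}
after_failure = fail_over(pumps)
print(after_failure)
```

In an uncontrolled switchover this rotation is triggered by the failure itself; the predictive approach aims to schedule the same change before the primary pump fails.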

Since 1998, however, Intel has progressively moved toward a facility equipment maintenance approach based more heavily on a “predictive” strategy. The approach, which Flanigan played a major role in developing, is designed to maximize uptime by enabling maintenance technicians to identify and correct potential problems before equipment fails or production is interrupted.

As part of the predictive maintenance program, technicians today make broad use of infrared scanning, vibration, temperature and oil-composition analysis tools to monitor machine conditions and gather information necessary to identify when a controlled switchover is warranted. The company credits the program with savings measured in the millions of dollars. But it wasn’t always that way.

“When I started with Intel in 1997, we had some scattered predictive maintenance programs around the company, but everybody was using different software and hardware, and none of the systems could talk to each other,” Flanigan recalls. Further, most of the predictive maintenance work was focused on HVAC equipment such as chillers, he adds, with little attention paid to generators, fans, pumps and similar equipment.

Shortly after his arrival, Flanigan volunteered to take over predictive maintenance development at Hillsboro. And he soon made a useful discovery. An engineer who had preceded him in the role, and who had since left the company, had persuaded management to purchase some handheld devices for use in collecting facility equipment data. The devices, which were Entek Datapac handheld data collectors from Milwaukee-based Rockwell Automation, “had never really been used,” says Flanigan. “They were just sitting there.”

Let’s use ’em

As Flanigan moved to improve and develop predictive maintenance strategies at the Hillsboro site, he quickly put the handheld units to work collecting vibration data from a variety of equipment types. The units connect to sensors mounted on the equipment, and the sensors’ vibration readings are recorded directly in the handheld devices. “I set it up for one building at first, and as we found out it worked well, we started propagating it out to other buildings on the site,” he says.

Today, technicians on the four-building Hillsboro campus routinely use the Datapac units to collect equipment vibration data on a monthly or weekly basis, depending on the type of equipment. With a total of 108 routes, each technician is responsible for two or three routes that average three hours each. During the first quarter of 2003, the Hillsboro facility logged 450 data collection hours.

Prior to the program, technicians recorded data on clipboards, and then manually input it into a system. Now, once the information is gathered, it is automatically downloaded from the handheld data collectors into Emonitor Enshare Enterprise Asset Health software, from Rockwell Software. “For things like oil and water, where we send our samples off to a lab, the data comes back in a text file format, which we just manually import into Enshare,” Flanigan says. Likewise, data on parameters such as temperature and pressure, which are collected separately by technicians using Pocket PCs, can also be downloaded into Enshare. The software analyzes all of the data, measures it against preset parameters, and provides advance warning of equipment abnormalities and potential points of failure.
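The kind of check described here, comparing collected readings against preset parameters, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the Enshare software works internally: the parameter names, the limit values and the `classify` and `review` functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    alert: float   # level that warrants closer watching
    danger: float  # level that warrants a planned, controlled switchover


# Hypothetical preset limits, chosen only for illustration.
LIMITS = {
    "vibration_mm_s": Limits(alert=4.5, danger=7.1),
    "bearing_temp_c": Limits(alert=80.0, danger=95.0),
}


def classify(parameter, value):
    """Return 'ok', 'alert' or 'danger' for one reading."""
    limits = LIMITS[parameter]
    if value >= limits.danger:
        return "danger"
    if value >= limits.alert:
        return "alert"
    return "ok"


def review(readings):
    """Return (equipment, parameter, value, status) for each non-ok reading."""
    flagged = []
    for equipment, parameter, value in readings:
        status = classify(parameter, value)
        if status != "ok":
            flagged.append((equipment, parameter, value, status))
    return flagged


# Made-up readings standing in for data downloaded from a route.
readings = [
    ("pump-A", "vibration_mm_s", 2.1),
    ("pump-B", "vibration_mm_s", 5.0),
    ("fan-3", "bearing_temp_c", 97.5),
]
for row in review(readings):
    print(row)
```

Using two levels per parameter reflects the goal of advance warning: an "alert" gives technicians time to watch a trend before it reaches the "danger" level.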

When your job is to prevent certain events from happening, going unnoticed is usually a good thing. But for Flanigan and his Hillsboro team, this “invisibility” also presented a unique challenge: justifying and demonstrating the value of their predictive maintenance program. In an effort to develop a solid business case for the program, the team started by using real application examples and carefully documenting uptime performance results.

“To a large degree, we are dealing with an intangible when we’re talking about the potential of downtime events and the value of loss avoidance,” Flanigan says. “In the end, after developing a justification using two separate methods, we were able to develop both hard, tangible results, along with significant soft cost-avoidance projections.”

Flanigan’s team used the cost projections and performance data to convince other Intel production sites of the value of a predictive maintenance program. But selling the idea to some other facilities was challenging. The concept garnered mixed reviews. “Old school” technicians at some facilities felt predictive maintenance would be a waste of time, while others immediately saw the potential benefits, and in fact, already had their own predictive maintenance programs underway.

To convince the skeptics, Flanigan used metrics from the Hillsboro facility to validate the program’s results. At the same time, to build on the interest of potential supporters, his team developed customized training courses based on the program’s vibration-analysis technology and capabilities. The strategy worked, and Flanigan and his team ultimately succeeded in persuading other sites to join the program.

“With the work we did in Hillsboro, people across the maintenance organization were able to see the real cost savings, as well as the long-term strategic benefits,” Flanigan says. “Once our technicians used the tools and saw that they really worked, they were able to dramatically cut the time they spent in the field chasing problems. It let them focus their energies on more pressing issues.”

Driving the standard

As the predictive maintenance program spread throughout Intel, the team confronted a problem: many of the sites were using hardware and software from a variety of vendors. Flanigan co-chaired an internal group called the Vibration Analysis Working Group, which included representatives from most of the Intel sites. The group took on the challenge of finding a way to share predictive maintenance information across different sites. “Around late 1999 or early 2000, we decided that we had to standardize our software packages across the company,” says Flanigan. “At the time, I think we had software and hardware from six different vendors being used at various sites.”

After the Working Group evaluated the various vendors’ products, says Flanigan, “we came to the conclusion that the Rockwell stuff was best for us.” Rockwell’s Emonitor Enshare software, which relied on a SQL database, was the only product that was compliant with the platform standards of Intel’s Information Technology (IT) department, Flanigan says. “A lot of the other vendors’ software had proprietary databases,” he notes, “and IT would not allow proprietary databases to be supported cross-site.”

The decision favoring Rockwell met with resistance at some Intel sites, which continued to use other vendors’ equipment and software for a time, Flanigan concedes. But eventually these sites were converted, and the program continues to spread within the company. Fourteen of Intel’s 19 global production sites are either already participating or plan to enroll in the program, which uses the Enshare system as a central repository for the information gathered at the sites. The system’s database capability allows facility engineers and technicians to access any site’s database to examine equipment lifecycle trends and determine if there are common failures occurring on specific types or models of equipment.
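A cross-site search for common failures of this kind amounts to grouping findings by equipment model and fault. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database purely for illustration; the `findings` table, its columns, the sample rows and the `common_faults` function are hypothetical and do not reflect the actual Enshare schema.

```python
import sqlite3

# In-memory stand-in for a shared repository (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (site TEXT, model TEXT, fault TEXT)")
conn.executemany(
    "INSERT INTO findings VALUES (?, ?, ?)",
    [
        ("site-1", "PumpX", "bearing wear"),
        ("site-2", "PumpX", "bearing wear"),
        ("site-2", "PumpX", "bearing wear"),
        ("site-1", "FanY", "imbalance"),
        ("site-3", "ChillerZ", "misalignment"),
    ],
)


def common_faults(conn, min_sites=2):
    """List (model, fault, n_sites, n_findings) seen at >= min_sites sites."""
    return conn.execute(
        """
        SELECT model, fault,
               COUNT(DISTINCT site) AS n_sites,
               COUNT(*) AS n_findings
        FROM findings
        GROUP BY model, fault
        HAVING COUNT(DISTINCT site) >= ?
        ORDER BY n_findings DESC, model
        """,
        (min_sites,),
    ).fetchall()


for row in common_faults(conn):
    print(row)
```

A fault that recurs on the same model at several sites points to a design or application issue rather than a one-off failure, which is the kind of pattern a shared repository makes visible.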

At the Hillsboro site, where the program got its start, approximately 4,000 pieces of equipment—94 percent of the facility’s qualified equipment—are now involved in the program. Since implementation, Intel has found countless minor vibration issues and identified several hundred major vibration problems, helping the company avoid prolonged production shutdowns. More specifically, Intel Oregon has realized a five-to-one return on investment, and the program helped the company avoid estimated lost-production costs of more than $1.4 million in 2002 alone. In another telling example of the program’s benefits, Intel Oregon has not had a catastrophic equipment failure since early 2002.

Additionally, the technology has allowed Intel to evolve into a more predictive-based maintenance organization. Instead of reacting to failures, Intel can make informed decisions based on performance data. It can now provide “real-need” maintenance instead of calendar-based, “whether-it-needs-it-or-not” maintenance.

See sidebar to this article: Ensuring the spec before sending the check