It's Hard to Keep a Bad Alarm Down
If you’ve ever tried to get your facility’s alarm management system under control, you might have found that it was a bit like whacking moles: As soon as you get a few knocked down, up pop a few more.
Alarm management is widely recognized as an issue in the process industries, with alarm-related problems costing U.S. industry more than $20 billion a year. So more and more plants are working to bring their stale, chattering and nuisance alarms under control. Unfortunately, if you try to do that without having a plan or without taking the time needed to do it right, those moles will just keep popping up.
The Husky PG Oil Refinery in Prince George, B.C., faced such a scenario. They made inroads in trying to improve the alarm situations that operators complained about, but never really made the progress that was needed. “Operators complained that there were too many alarms, especially on upset conditions,” notes Yiqun Ying, senior staff process control engineer at the Prince George refinery. “Some alarms don’t mean anything, and the operator doesn’t have time to respond. They just hit the keyboard and acknowledge it, and sometimes hit repress.”
Too many alarms
This situation is all too common among process operators who are inundated with more alarms than they could possibly acknowledge, let alone respond to appropriately. Steve Elliott, Triconex product director for Invensys, tells of a time when he was observing one customer’s control room. He watched an operator on his personal laptop, his back to the control screens. As an alarm came in, he swung one arm back behind him and quickly silenced the annoying alarm, without ever shifting his focus away from his laptop.
“One of the biggest challenges still is the human element in everything we do; the unpredictability of the human,” Elliott says. “We’re probably putting more reliance on the human factor than we ever have done. And that operator has more information than ever to deal with.”
Alarm management has always been a problem, notes Mike Chmilewski, vice president of DCS and safety solutions for Invensys, because it’s so easy to just keep adding more alarms as more situations arise. “The next thing you know, there’s more information coming in that wasn’t part of the original design,” he says.
What’s driving the alarm count? Chmilewski says it’s more and more I/O points; better instrumentation, which is providing cost-effective measures for advanced process control (APC) and other optimization methods; and additional data points “that could be alarmed and often are” because it takes very little effort to add an alarm.
Poorly performing alarm systems often contribute significantly to industrial incidents. BP’s Texas City refinery—where a hydrocarbon vapor cloud explosion killed 15 workers in March 2005—is often referenced as the poster child of poor alarm management. “We believe better operator information would have prevented that incident,” says Eddie Habibi, founder and CEO of PAS, an alarm management focused company. “Alarms and operator displays were two critical elements of the BP Texas City incident. The operator was blind to emerging operator limits.”
Equipment reliability has come a long way over the years, Habibi notes, “but we have overwhelmed the operator with information.” Automation has focused increasingly on process optimization to get that last molecule of gasoline or hydrocarbon out of the crude oil. “To do so, we’ve put in complex systems. We’ve put so much information in front of the operator, the operator has just become blinded. He’s driving blind,” he says.
Control operators are increasingly managing operations by alarm, according to Ian Nimmo, president and founder of User Centered Design Services (UCDS), which designs optimized control rooms, particularly within refining and petrochemical industries. Speaking at a conference recently, Nimmo complained that his team often sees one-third of the controllers on manual control.
Operating too much by alarms, today’s operators are too often reactive rather than proactive, Nimmo says. “If you have reactive operators, we have an issue; I want to see proactive operators,” he adds. Today’s operator—often tasked with more plants to manage, and with limited time and exposure with each unit—“has been brought up to believe that the automation system is doing its job. He sits around until an alarm goes off, and then he does his job. That’s not good.”
Nimmo likens the situation to a pilot flying by alarms—flying up until reaching an upper limit and getting an alarm, and then flying down again until reaching a lower limit. “What would an airplane be like if we flew it by alarms?” he asks. “Wouldn’t be very comfortable, would it?”
Husky’s refinery in Prince George has about 3,500 process/equipment variables in its DCS. Those processes are monitored in an alarm-driven environment, with plant operators using the alarms as the primary indication of an event requiring action or attention.
A good indication of a plant doing things correctly is zero standing alarms, Nimmo says. According to EEMUA best practices, a plant should have fewer than 10 standing alarms. ISA says fewer than five. The average number of standing alarms for the oil and gas industry, however, is 50. Petrochemical plants are seeing double that, on average.
Documenting the problem
So, although the state of Husky’s alarm management wasn’t terrible, it wasn’t ideal either. They had no site-specific alarm standard, incomplete documentation, and no system to track the changes of the alarms. The average number of alarms wasn’t bad, but peak alarms were beginning to get out of hand, and the average number of standing alarms was well outside not only best practices but also industry averages.
Also, although the alarm situation at Husky had been getting better, in Ying’s observation, improvements weren’t being documented. Documentation was difficult, he says, with all projects going through contractors or consultants, and alarms being designed differently each time. “Some [projects] have more alarms than others. They’re all over the place. Sometimes we’d make changes, but all the changes we made, we didn’t document. We kind of lost track of things,” he says.
The staff tried to use a spreadsheet to track alarm changes, but that was difficult to keep updated, Ying says. Plant staff knew they needed a better system—not only a better alarm system, but also a better way to keep track of changes in that system.
The important takeaway for any facility trying to manage its alarms: “Know what you’re trying to accomplish before you start throwing money at it,” says Kim VanCamp, product manager, DeltaV abnormal situation prevention, at Emerson Process Management.
Although companies are approaching alarm management and learning a lot about what to do to get their systems in shape, a lot of clients are not even taking full advantage of the alarm management tools that they purchase, according to Kevin Brown, global best practice lead for Honeywell Process Solutions. “They use a couple reports, but they’re not incorporating the tools into their daily work practices,” he says. “They’re not succeeding to the level they could, or they actually backslide.”
In spring 2012, Husky’s Prince George refinery set out to develop and implement an alarm management system based on EEMUA-191 and ISA-18.2 standards. The project involved setting up an alarm philosophy document to clearly define the site-specific alarm standard; installing alarm management tools and software; documenting the setting, cause, consequence and corrective action of alarms in a master alarm database; improving plant alarm systems; and establishing an alarm management/rationalization lifecycle.
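As a rough illustration of what one entry in such a master alarm database might hold, here is a minimal Python sketch. It is not Husky's actual software: the class name, field names and change-log layout are all hypothetical, chosen only to mirror the items listed above (setting, cause, consequence, corrective action) plus a record of changes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AlarmRecord:
    """One hypothetical entry in a master alarm database."""
    tag: str                 # instrument or point identifier
    setting: str             # current alarm setting, e.g. a limit value
    cause: str               # what typically triggers the alarm
    consequence: str         # what happens if nobody responds
    corrective_action: str   # what the operator is expected to do
    history: list = field(default_factory=list)  # change log

    def change_setting(self, new_setting: str, reason: str, when: date) -> None:
        # Log the old and new values before overwriting, so that every
        # change stays documented alongside the record itself.
        self.history.append((when, self.setting, new_setting, reason))
        self.setting = new_setting
```

Keeping the change log inside the record is one simple way to avoid the situation Ying describes, in which changes were made but never written down.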
Since 2009, the ISA-18.2 standard (ANSI/ISA-18.2-2009: Management of Alarm Systems for the Process Industries) has been instrumental in helping plants understand what they are trying to accomplish and how they are going to measure it, VanCamp says. “Customers felt like they hadn’t received good value on whatever they did around alarms. Now they’re regrouping around the ISA standard,” he says, noting that people didn’t know what “good” was. “They have a better starting point now. They have a model.”
When chemical behemoth DuPont set out to overhaul its alarm management system, the company opted to follow the ISA-18.2 standard rather than write its own internal standards, according to Nick Sands, manufacturing technology fellow at DuPont. “That’s a big change for us,” he says. “We’re all about trying to make those layers better but, wow, what a culture change.”
The ISA-18.2 standard provides clear definitions of the common terminology and helps to create a universal alarm management language. It also defines an alarm management lifecycle model, which establishes the recommended workflow processes. The lifecycle sets the framework for understanding the requirements for building an alarm management program.
Conversations around lifecycles involve, for example, what should and shouldn’t be an alarm. “Is there an operator action associated with the alarm? No. Then there shouldn’t be an alarm,” VanCamp explains. “Will there be a consequence if the operator ignores it? No. Then there shouldn’t be an alarm.”
Discussions also involve how to prioritize the alarms. “Prioritization used to be based on the strongest-willed person in the room who had a vote on the matter,” VanCamp says. But considerations should actually be focused on health and safety, economic ramifications and environmental concerns, he adds. “Is there a consequence to inaction? Is there an operator action to take? If he has less than 60 seconds to respond, it shouldn’t be up to the operator; it should be designed into the safety system.”
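The screening questions VanCamp describes can be written down as a small decision function. The Python sketch below is only an illustration of those rules of thumb, not a procedure from ISA-18.2 itself; the function name, parameter names and returned labels are invented here, and the 60-second threshold is taken directly from his remark.

```python
def screen_alarm(has_operator_action: bool,
                 has_consequence: bool,
                 seconds_to_respond: float) -> str:
    """Apply the rule-of-thumb screening questions to one candidate alarm."""
    # No operator action, or no consequence if it is ignored:
    # by the rules quoted above, it should not be an alarm at all.
    if not has_operator_action or not has_consequence:
        return "not an alarm"
    # Less than 60 seconds to respond: it should not be left to the
    # operator and belongs in the safety system instead.
    if seconds_to_respond < 60:
        return "design into safety system"
    return "keep as alarm"
```

A real rationalization session would then go on to assign a priority based on health and safety, economic and environmental consequences, which this sketch leaves out.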
One of the most important steps described by the ISA-18.2 standard is alarm rationalization. Although getting rid of the bad actors—often the first step in alarm management—will quiet the control room, it’s the process of reviewing and documenting alarms that will help keep it that way when something legitimately bad happens with the system.
At a plant that hasn’t previously done much with alarm management, Honeywell’s Brown says he expects to be able to achieve a 75-80 percent reduction in bad actors pretty easily. This can quickly bring average alarm rates to ISA standards, which is one alarm every 10 minutes.
Then facilities want to move on to taking care of the alarm floods and they don’t think they need to worry about alarm rationalization. “What happens with the bad actors, though, is once they get it cleaned up and down to normal rates, they stop focusing on the bad actors,” Brown says. “But it’s a dynamic system—instrumentation is failing, they’re changing the way they’re operating, and we’re going to continuously have bad actors showing up. It needs to be incorporated into a regular work routine.”
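To make the idea of a regular bad-actor review concrete, the following Python sketch computes an average alarm rate and ranks the most frequent alarm tags from a list of logged alarms. The function names and the input format (a plain list of tag strings) are assumptions for illustration, not the output of any vendor's tool; the benchmark of one alarm per 10 minutes is the figure Brown cites.

```python
from collections import Counter

# Benchmark cited above: an average of one alarm every 10 minutes.
TARGET_ALARMS_PER_10_MIN = 1.0


def average_rate_per_10_min(alarm_count: int, hours: float) -> float:
    """Average number of alarms per 10-minute interval over a period."""
    return alarm_count / (hours * 6)  # six 10-minute intervals per hour


def bad_actors(alarm_tags, top_n=10):
    """Return the top_n most frequent alarm tags with their counts."""
    return Counter(alarm_tags).most_common(top_n)
```

Running a report like this on a fixed schedule, rather than once, reflects Brown's point that new bad actors keep appearing as instruments fail and operating practices change.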
Even among companies that undertake rationalization, Brown says, some will do it on their own and do it poorly. “When you look at the effort and cost that’s involved with rationalization, that’s a huge investment that’s been lost,” he says. “You really don’t want to do it twice.”
Husky is going through the alarm rationalization process in several of its units. “We go through the alarm rationalization, and document all the alarm settings,” Ying says. “When we go through the alarm rationalization, we involve operators and process engineers. We talk about what’s the consequence if we do not action for the alarm. What’s the correct action?”
Documentation of everything is key, Ying points out. “Once we go through that, all the changes are documented—whether we need an alarm or not, what kind of priority we assign—everything is in the software, and it keeps updating all the time.”
VanCamp adds, “A lot of folks shy away from rationalization” because it’s very time-consuming to look at every single thing that could possibly happen. “Every tag could generate a lot of alarms.”
Make no mistake—getting alarms in shape is hard work, and it’s time-consuming too, requiring diligence and perseverance. “Rationalization is costly,” Brown agrees. “There’s only one way to do it. You can try to reduce the amount of work, but there’s still the effort required to analyze every alarm.”
As part of this process, Ying brings his team together once a week to go over all the alarms for the week, area by area. “You’ll find very interesting things that way,” he says. “You can see the difference from the process engineer or the operator. It involves a lot of discussion. The process engineer learns from the operations side. And the operators learn from the process side. It’s a very good conversation.”
Ying says he takes things slowly, making sure to discuss everything necessary. The whole process is not an easy one, he says, requiring considerable time from several people, including bringing operators in on overtime to meet. And it’s an ongoing process that must continue not only to achieve continued improvement, but also to keep the situation from slipping back.
The time needed to really get alarms under control and keep them under control is considerable, to be sure. Simply documenting everything that’s in the database is time-consuming, Ying notes. But it’s better than doing the work on the alarms and not documenting it, because after a while the work that’s been done will start to slip.
Alarm management has changed considerably in recent years, according to VanCamp. “If you go back five years, in the system bid specs, the section on alarms would just be a couple sentences,” he says. “Now when we get bid specs for a big project, we will find a separate companion document that’s about 50 pages on how they want the alarms to behave.”
Industry is rife with examples like the one earlier in this article, in which an operator silenced an alarm without even looking: control rooms in which the horn is going off and yet the operator’s hands aren’t doing anything, or operators sticking a penny in an alarm panel so the alarms are constantly acknowledged. An operator’s sensitivity becomes so deadened that when a legitimate alarm does go off (to say, for example, that the parking lot is about to be flooded with acid) he doesn’t do anything about it.
There’s still a long way to go and new people to teach. “We don’t care so much about reducing numbers,” notes DuPont’s Sands. “We want to have the right alarms.”
At many facilities, alarms are still an issue and still need to be brought under control. “I would like to say with all the learnings that we’ve gone through, with all the things we’ve seen like with BP Texas City…unfortunately, people come up who haven’t been active with alarms,” Honeywell’s Brown says. “They don’t see the issue. It’s just one alarm.”