Taking Abnormal Situation Management to the Outer Limits

As the plant floor becomes more connected, so do abnormal situations—which can originate from anywhere in the enterprise. To help operators navigate this new universe of anomalies, suppliers are turning to artificial intelligence and machine learning technologies that enable proactive situational awareness vs. reactive alarm management.

Our early ancestors looked to the sky searching for changes between the sun, moon and stars to predict weather patterns, natural disasters and even war. Over time, mankind’s knowledge and tools evolved so that we didn’t depend so much on human interpretation of celestial bodies. But the world is a complex place. And even with today’s sophisticated technology, we are always exploring new ways to forecast the future.

In process industries like oil and gas and chemicals, the ability to identify shifts in patterns that could eventually cause problems is critical. That’s because a seemingly simple issue, like a pump malfunction, could result in a not-so-natural disaster. From oil spills to gas leaks and fires, manufacturing incidents are still fairly frequent. And when death, injury and environmental destruction are involved, company executives face fines and criminal charges.

Just last month, Shell’s Puget Sound refinery was fined $77,000 by the Washington State Department of Labor & Industries for an uncontrolled release of toxins that sent people to the hospital. In Canada, criminal charges were filed against Veolia Environmental Services and one of its managers for a deadly plant explosion last year in Ontario.

Plant engineers and operators have a difficult job these days given the arrival of the Industrial Internet of Things (IIoT) and the inevitable interconnectivity between information technology (IT) and operations technology (OT) networks. Industrial anomalies are no longer isolated incidents in a closed plant floor because connected business systems and network devices become extensions of the control system infrastructure. That means a strange situation in the oil refinery could originate from an Ethernet switch or a cybersecurity breach somewhere in the enterprise.

As a result, when it comes to abnormal situation management (ASM) in the plant, traditional distributed control system (DCS) alarms aren’t enough anymore. Manufacturers are applying boundary management techniques that test and manage the process limits. To work effectively, however, a variety of structured and unstructured data sources need to be aggregated in a common database. Only then can predictive analytics and root cause analysis be applied.

“When you look at alarm management, it is still in a reactive phase,” says Tyron Vardy, product director of Honeywell Process Solutions’ Advanced Solutions business. The industry needs to get away from making an operator sit at a console and wait for a problem, he says. Instead, we should be finding the root cause and fixing it before it becomes a problem. “It’s more than alarm management. It’s putting a lot of things together in a holistic view to reduce operator error by being able to predict abnormal situations. We need to know when something is out of control before it is out of control.”

Interestingly, as Honeywell and other control system suppliers create better ways to deliver alarms to operators, a new group of software companies is emerging in this space. These vendors, many of which are startups, are asking the industry to boldly go where no manufacturer has gone before by using artificial intelligence (AI) and machine learning to keep everything in control.

Cognitive systems in the industrial space

Ever since IBM’s Watson appeared on Jeopardy—and won—the technology industry has been working on using cognitive computing to solve real business problems. To date, the healthcare, pharmaceutical and insurance industries have been benefiting from Watson-powered applications, using natural language queries to make connections between a vast array of disparate data in order to formulate accurate conclusions.

Watson was not the first AI offering. The concept emerged in the 1960s when computer scientists tried to create intelligent systems using massively parallel processing supercomputers. Of course, that was an expensive proposition given the cost of a computing infrastructure back then. Today—thanks to lower CPU costs, along with cloud and virtualization technologies—processing power and storage are no longer obstacles. And that has opened the door to new AI opportunities.

For example, AppOrchid, a 2-year-old startup company, has built a semantic web for manufacturing processes. The company uses predictive analytics, AI, machine learning and a natural language interface to correlate business and industrial data with trapped tribal knowledge. The data models are conversational and bidirectional, allowing the system to train a machine in much the same way we train a child.

In the traditional world of data analytics, figuring out if an ethane cracker is going to fail requires a blend of data models, along with engineers, data scientists and IT experts. It is not a nimble model. In the semantic approach, the data models live in a flexible architecture and then the system is trained—in plain English—by building definitions around an underlying concept.

“Like Siri,” says AppOrchid CEO Krishna Kumar. “If I want an alarm set for 5 a.m., I just say it and it understands.”

The beauty of the model is that you don’t need a Java programmer to teach the system a new concept because interactions happen in natural language. And the system reorganizes based on the feedback. An operator can type in, “How many boilers are functioning under capacity?”; or could annotate data on top of a graphic, identifying a boiler as “high risk.” The AI system will know that “high risk” means “bad,” because that’s what it has been taught.

“We see a fantastic opportunity for artificial intelligence to combine the Internet of Things with the Internet of People and the Internet of Process,” Kumar says. “Models get built as you ask questions.”

Indeed, it goes way beyond pattern recognition to identify anomalies, and incorporates the unstructured data of maintenance manuals, warranties or boiler ratings based on a person’s experience, for example. “It’s taking traditional analytics and blending it with tribal knowledge,” Kumar says.

The ability to capture intellectual property is important as baby boomers retire, says Tom Williams, director of the ASM Consortium and program manager for operator effectiveness at Honeywell Process Solutions. Many companies are turning to 3D modeling to train new operators, but he agrees that there is an opportunity for applications that can aggregate and analyze data in cooperation with alarm system management.

“The influence of Big Data is there,” Williams says. “We are beginning to mine our vast repository of DCS [data] in ways we couldn’t do before and produce visualization tools to figure out what to do with those alarms.”

Ultimately, operators need fewer alarms, says Jim Miller, Rockwell Automation’s Pavilion business director, who is referring to Pavilion’s advanced process control with predictive capabilities. “For continuous process, it is not about alarm management, but about getting in front of an abnormal situation,” he says. Like Google Maps alerting you to a traffic jam ahead, “if constraints can predict what will happen, you can change the process before you get there.”

Open the plant floor doors

Non-traditional operations software companies are increasingly entering the industrial scene. Companies like Splunk, OpsDataStore and NexDefense promise to bring anomaly detection to a new level.

For Atlanta-based startup OpsDataStore, it’s all about relationship building. The product is a Big Data back end designed to accept any kind of data or metrics and relate items to each other at ingest time to continuously update a topology map of the entire environment. Meanwhile, business intelligence tools with dashboards provide real-time visibility across operations.

“The purpose is to use these relationships and the analytics built into products to tell you where things are going wrong,” says OpsDataStore CEO Bernd Harzog. Specifically, if instrumentation on the factory floor is failing, the first thing an operator wants to do is figure out what could be causing it. “Showing anomalies in things deterministically related to each other is an extraordinarily valuable thing that allows you to manage entire relationships proactively.”

Doug Wylie, vice president of product marketing and strategy at NexDefense, agrees. However, his company takes a more network-centric view of situational awareness. The NexDefense Sophia Industrial Network Anomaly Detection (INAD) system proactively detects deviations from normal automation or system controls operations, which could signal a security breach.

With a focus on industrial control systems, Sophia detects issues by comparing events against what is the “normal” defined state—or what Wylie calls the “fingerprint” of the system. Anything new or different is brought to the surface for an operator to isolate, while logs guide the operator to the information. It happens in real time and has the capability to replay traffic patterns to use for post-event forensics.
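
The baseline comparison Wylie describes can be pictured with a small sketch. It is not how Sophia is implemented; the `Flow` record, the function names and the device names are all hypothetical, and a real system would fingerprint far more than source, destination and protocol.

```python
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One observed network conversation (hypothetical record shape)."""
    src: str
    dst: str
    protocol: str


def build_fingerprint(baseline: Iterable[Flow]) -> frozenset:
    """Record every conversation seen during a known-good period."""
    return frozenset(baseline)


def find_deviations(fingerprint: frozenset, observed: Iterable[Flow]) -> list:
    """Return flows absent from the fingerprint, first-seen order, no repeats."""
    seen = set()
    deviations = []
    for flow in observed:
        if flow not in fingerprint and flow not in seen:
            seen.add(flow)
            deviations.append(flow)
    return deviations


# Hypothetical traffic: two normal conversations form the fingerprint.
baseline = [
    Flow("hmi-01", "plc-07", "modbus"),
    Flow("historian", "plc-07", "modbus"),
]
# Live traffic includes an unfamiliar laptop talking to the same controller.
live = [
    Flow("hmi-01", "plc-07", "modbus"),
    Flow("laptop-99", "plc-07", "modbus"),
    Flow("laptop-99", "plc-07", "modbus"),
]

fingerprint = build_fingerprint(baseline)
deviations = find_deviations(fingerprint, live)
```

In this toy case only the laptop conversation is surfaced for the operator, and it is surfaced once even though it appears twice in the traffic.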

“This is a new category for the industry, but most control engineers just want help understanding what is taking place,” Wylie says. “The detect function is imperative, and over time, as comfort is built, they’ll begin to recognize how to feed information as a trigger to protect the system.”

Then there’s Splunk, an operational intelligence technology that can capture massive amounts of machine data and unstructured streaming data, using advanced analytics to detect system anomalies. It is used for operational analytics, cybersecurity, and understanding how SCADA and ICS operations are impacting revenue, notes Brian Gilmore, Splunk’s senior manager of IoT and industrial data.

As IoT builds out the ICS infrastructure, having a central “data fabric” to monitor many different data sources will be a manufacturing imperative. “Splunk distills a lot of data down to critical metrics to get to the underlying cause of a problem,” Gilmore says.

Learning from locomotives

Not only does Splunk work, but it has opened the door to new business opportunities, says Greg Hrebek, director of engineering at Train Dynamic Systems, a division of New York Air Brake (NYAB).

Following a 2008 train collision in California that killed 25 people—and was the result of a distracted engineer who was texting on the job—Congress passed the Rail Safety Improvement Act (RSIA) requiring passenger and freight railroads to implement Positive Train Control (PTC) systems that monitor and control train movements to increase safety. Much of the system has to do with the signaling infrastructure and wireless data radios to transmit dynamic information, but the trains, too, have to become smarter.

To support its customers above and beyond train control products, NYAB developed the Locomotive Engineer Assist/Display & Event Recorder (LEADER) software. The system contains data on the train’s length and weight, car types, power distribution and detailed track profile, allowing it to react to signal changes or weather conditions in real time to optimize performance and train handling.

LEADER performs onboard simulations that predict train performance several miles ahead, then evaluates multiple train operating strategies and selects the one that best improves fuel economy, railcar life and on-time schedule performance. This is accomplished with machine learning technology and Splunk.

Prior to using Splunk for backend analytics, NYAB was using Excel spreadsheets to prepare monthly reports for customers to relay system performance and resource utilization. The original goal of LEADER was to reduce fuel consumption. But given the PTC requirements, they are expanding what they can do to increase safety.

“The strength of Splunk is that it builds from different sources,” Hrebek says, including environmental factors. “Wind data is important to trains because an empty car can act like a giant parachute. So wind has a big impact on the way the train behaves.” Using Splunk, there is an opportunity to forecast and model driving conditions to plan a strategy, he adds.

Together with Splunk, NYAB developed real-time dashboards to provide information on train performance, fuel efficiency and in-train force events over a route. This has become a valuable service to customers and an additional source of revenue for NYAB.

Now, since NYAB is first and foremost a manufacturer of brake systems, the company is considering integrating Splunk with the factory floor.

Outer limits

As NYAB considers how it can add Splunk to its production process, vendors like Honeywell and PAS are evolving existing products to help manufacturers tackle abnormal situation management issues more effectively.

Honeywell’s DynAMo Alarm and Operations Management software monitors process conditions and provides automated early notification of events to operators.

In October, Honeywell released its new DynAMo Alarm and Operations Management software, which includes boundary and limit management, providing an operator with a visual of how the plant is operating against associated limits. It’s a pre-alarm system of sorts, with a predictive view of operations. “Now you can see things slowly moving out of spec and you can pull it back into control,” Vardy says.
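
To make the pre-alarm idea concrete, here is a minimal sketch of one way to estimate how soon a drifting variable will reach its limit. It assumes a simple straight-line trend over recent, evenly spaced readings; the function names and sample values are hypothetical, and nothing here is drawn from DynAMo itself.

```python
from typing import Optional, Sequence


def slope_per_sample(values: Sequence[float]) -> float:
    """Least-squares slope of the readings against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def samples_until_limit(values: Sequence[float], high_limit: float) -> Optional[float]:
    """Estimate samples remaining before the latest reading hits high_limit.

    Returns 0.0 if the limit is already reached, and None if the trend is
    flat or falling (no crossing is predicted).
    """
    latest = values[-1]
    if latest >= high_limit:
        return 0.0
    slope = slope_per_sample(values)
    if slope <= 0:
        return None
    return (high_limit - latest) / slope


# Hypothetical temperature readings creeping toward a limit of 110.
readings = [100.0, 101.0, 102.0, 103.0, 104.0]
remaining = samples_until_limit(readings, high_limit=110.0)
```

With these readings the trend rises one unit per sample, so the estimate is six samples until the limit. That is the window in which an operator could pull the process back before any alarm sounds.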

Similarly, PAS has boundary management as part of its PlantState Suite, and, like Honeywell, has extended it beyond DCS data to read information from a variety of places and databases—from alarm and production limits to equipment design limits, as well as safety and environmental limits. A consolidated boundary database acts as the database of record and sets up hierarchy for managing variable boundaries.
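
A consolidated boundary database with a hierarchy might be pictured along the following lines. This is only a sketch under stated assumptions: the nesting order of high-side limits (production inside alarm inside safety inside design), the variable names and the values are hypothetical and do not describe the PlantState Suite data model.

```python
# Assumed nesting for high-side limits, innermost first (hypothetical ordering).
HIERARCHY = ["production", "alarm", "safety", "design"]


def hierarchy_conflicts(limits: dict) -> list:
    """Return (inner, outer) pairs where an inner limit sits above an outer one."""
    present = [name for name in HIERARCHY if name in limits]
    return [
        (inner, outer)
        for inner, outer in zip(present, present[1:])
        if limits[inner] > limits[outer]
    ]


def crossed_boundaries(limits: dict, value: float) -> list:
    """List the boundaries a value has reached or exceeded, innermost first."""
    return [name for name in HIERARCHY if name in limits and value >= limits[name]]


# Hypothetical database of record: one entry per process variable.
boundary_db = {
    "reactor_temp_c": {
        "production": 180.0, "alarm": 195.0, "safety": 210.0, "design": 230.0,
    },
    # The production limit here is set above the alarm limit, which is a conflict.
    "column_pressure_bar": {"production": 12.0, "alarm": 11.5, "design": 16.0},
}

conflicts = {name: hierarchy_conflicts(lim) for name, lim in boundary_db.items()}
crossed = crossed_boundaries(boundary_db["reactor_temp_c"], 200.0)
```

Keeping every limit for a variable in one record makes two checks cheap: whether the limits themselves are consistent with one another, and which boundaries a given reading has already crossed.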

“The goal is to improve situational awareness to make the plant run safely,” says Mark Carrigan, senior vice president of global operations at PAS.

Both Honeywell and PAS recognize that operators are not the only ones who need the insight into plant floor abnormalities. Managers and manufacturing executives, too, need to protect the plant—and themselves.

PAS’s Independent Protection Layers (IPL) dashboard, released this month, is a management tool providing a consolidated view of the safety status of critical devices.

Honeywell Pulse is a mobile app that sends alerts on plant performance in real time. Users define alert conditions they want to subscribe to, which can be sent from multiple data sources and visualized for better context.

“It has a Facebook-like collaboration feature,” says Rohit Robinson, director of portfolio innovation at Honeywell. As soon as a user gets an alert, it can be shared with subject matter experts to start commenting and collaborating.

Or a user can click the “own” feature to let everyone know someone is taking ownership of the issue. “Think about what this does for the customer,” Robinson says. “It’s one thing to raise situational awareness, but when managers see that someone has taken ownership, the rest of us can rest easy.”
