Finding Big Data Value in the Fog

While enterprise analytics are happening in the cloud, manufacturers should be looking to distributed analytics—or fog computing—for real-time intelligence processed at the edge devices in the plant.


As I reported the June Big Data cover story based on an Automation World reader survey, two things became clear to me: First, the value is not in the collection of all sorts of data, but in the distribution of the right data. Second, manufacturers need to consider the aging infrastructure upon which much of this information lives. Right now, there’s likely not enough network bandwidth to support an influx of Industrial Internet of Things (IIoT) data, nor is there a standard way to share all of this disparate data.

While there’s a lot of talk about putting context around the data collected from smart sensors and devices, there’s not a clear vision of how to make that happen. There’s truly a big disconnect between what’s happening on a fieldbus network and what’s being collected in the enterprise as well as the cloud.

With that said, there are efforts underway to build a bridge between the plant floor network and the Internet. Just last month, the OPC Foundation and the EtherCAT Technology Group announced an agreement to develop open interfaces between their technologies. EtherCAT supports real-time communications for machine and factory floor control systems, and the OPC Unified Architecture (OPC UA) provides scalable communication and integrated security up to manufacturing execution systems (MES), enterprise resource planning (ERP) systems, and the cloud.

This is a step in the right direction toward continuous communication of the IIoT across multiple network layers. But do companies want to bring a lot of machine-level information from the plant floor into the enterprise to do analytics in a central database? Let’s face it, thousands of tiny little sensors can generate a lot of Big Data, which can tax the network and place real-time processes at risk.

Alternatively, manufacturers have the option to adopt edge computing technology in which analytics happen on—or near—the device. The concept of edge computing, which is also often referred to as distributed analytics or fog computing, provides compute, storage and networking between end devices, enabling real-time analysis as data is being loaded at the source. Data is localized and context-aware, and can be filtered to deliver only relevant data back to a centralized database or the cloud for Big Data analysis. There are a number of vendors offering IIoT analytics, AGT International being one of them.
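To make the filtering idea concrete, here is a minimal sketch of what an edge node might do with one window of readings before anything crosses the network. It is an illustration only, not any vendor's implementation: the `Reading` and `Summary` types, the `summarize_at_edge` function, the sensor name and the limit value are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List


@dataclass
class Reading:
    sensor_id: str    # hypothetical sensor name
    timestamp: float  # seconds since the window started
    value: float      # measured value, e.g. a temperature


@dataclass
class Summary:
    sensor_id: str
    count: int
    mean_value: float
    alerts: List[Reading]  # only the readings that exceeded the limit


def summarize_at_edge(readings: Iterable[Reading], limit: float) -> Summary:
    """Reduce one sensor's window of raw readings to a small summary.

    Only the summary (and any out-of-limit readings) would be forwarded
    to the central database or cloud; the raw window stays local.
    """
    window = list(readings)
    if not window:
        raise ValueError("empty window")
    alerts = [r for r in window if r.value > limit]
    return Summary(
        sensor_id=window[0].sensor_id,
        count=len(window),
        mean_value=mean(r.value for r in window),
        alerts=alerts,
    )


# Ten raw readings go in; one summary with a single alert comes out.
window = [Reading("temp-01", float(t), 70.0) for t in range(9)]
window.append(Reading("temp-01", 9.0, 95.0))
summary = summarize_at_edge(window, limit=90.0)
```

In this sketch the network carries one summary record per window instead of every raw sample, which is the bandwidth argument for processing at the edge.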

AGT’s IoT Analytics Platform is designed to predict and manage potential threats to critical assets through a hierarchical command and control system. The first layer of the system offers analytics applied directly to raw IoT sensor data. Additional processing, such as feature extraction and video analytics, is applied to deliver context and value to the end user. The next layer applies more sophisticated analytics such as pattern recognition, event classification, anomaly detection and object tracking. The third layer of analytics includes complex operational capabilities, such as model-based and data-driven predictive analytics.
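The layering described above can be illustrated with a toy pipeline. This is a generic sketch of the three-layer idea, not AGT's platform or API: the function names, the z-score test and the straight-line trend are all stand-in assumptions chosen for brevity.

```python
from statistics import mean, pstdev
from typing import List, Optional


def extract_features(raw_window: List[float]) -> float:
    """Layer 1: reduce a window of raw sensor samples to one feature (the mean)."""
    return mean(raw_window)


def is_anomaly(feature: float, history: List[float], z_limit: float = 3.0) -> bool:
    """Layer 2: flag a feature more than z_limit standard deviations from history."""
    spread = pstdev(history)
    if spread == 0:
        return feature != history[0]
    return abs(feature - mean(history)) / spread > z_limit


def windows_until_limit(features: List[float], limit: float) -> Optional[float]:
    """Layer 3: fit a least-squares line and estimate windows left before the limit."""
    n = len(features)
    if n < 2:
        return None  # not enough points to fit a trend
    x_mean = (n - 1) / 2
    y_mean = mean(features)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(features)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    if slope <= 0:
        return None  # no upward trend, so no predicted crossing
    return (limit - features[-1]) / slope


# Three windows of raw samples become three features, which feed the upper layers.
features = [extract_features(w) for w in [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]]
spike = is_anomaly(10.0, features)
remaining = windows_until_limit(features, limit=5.0)
```

Each layer consumes the output of the one below it, so the raw samples never need to leave the first layer.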

Of course, all of this must still have an infrastructure associated with it. And that’s where Apache Hadoop, an open source distributed processing framework, comes in. AGT uses Apache technology as part of its infrastructure because it enables local computational power across clusters of low-cost commodity servers. And while Hadoop has to date only been able to support offline batch processing of large data sets, a new product called Apache Storm adds real-time data processing. Now, a Hadoop cluster can process a range of workloads from real time to interactive to batch.

This is an important development for manufacturers because it is a cost-effective way to scale the computing infrastructure. Yet only 7.4 percent of the Big Data survey respondents say they are currently using Hadoop open source technology. I suspect, however, we’ll be hearing more about Hadoop, the fog, and distributed analytics in the near future.
