Machine learning may sound like a technology we will use at some point in the future, but many of us, if not most, already interact with it daily. If you’ve ever asked Siri to “call home” or seen name tagging suggestions appear while posting photos to Facebook or Instagram, you’ve encountered machine learning. The Siri example relies on a kind of machine learning known as natural language processing, while tagging suggestions are an example of image recognition. Other mainstream examples of machine learning include purchase recommendations on Amazon and search optimization on Google.
While these consumer examples of machine learning rely on neural networks, decision trees and other techniques, industrial operations need more specialized tools. Industrial machine learning must work with the data generated by machines and production systems, which means it has to be able to handle time series data.
“Nearly all operational systems—such as manufacturing lines, industrial equipment, data centers and handheld devices—produce streams or bursts of time series data in the form of sensor readings, log entries or activity traces,” says Nikunj Mehta, Ph.D., CEO and founder of Falkonry, a supplier of machine learning systems and predictive analytics technologies.
But using this data for machine learning is not straightforward. “Rarely does the behavior of a single signal tell the story of what is going on in a complex system,” says Mehta. “Most often, the data represents sampled values from a set of signals that vary in value continuously over time. In addition, the signal data may be sampled at both variable and irregular rates, continuously or in unpredictable bursts. They may also be rich in high frequency content.”
Mehta says that, when using time series data for machine learning, the only way to determine the future state of a system for throughput improvement, downtime reduction, operator safety or product quality is to examine multiple signals over a period of time. The challenge in doing this with time series data, however, is that “temporal patterns in data appear over windows in time—not in a single snapshot,” he adds.
One of the issues with applying common machine learning methods to time series data is that standard analytics middleware does not provide capabilities for extracting features from it. Feature extraction is hard here for several reasons: signal data may need to be buffered to feed a window of prior history into the model; multiple signals, each of which may have a different sampling rate, may need to be synchronized during processing; and there may be gaps in the signal data.
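To make those three hurdles concrete, here is a minimal Python sketch, not Falkonry's method or any particular product's API. It assumes two hypothetical signals (`temp` and `vib`) sampled at different, irregular times. The helper names `align` and `window_features`, the forward-fill strategy and the `max_age` staleness limit are all illustrative assumptions. The sketch synchronizes the signals onto a shared time grid, marks stale readings as gaps, and buffers rows into fixed-size windows before computing simple features.

```python
from bisect import bisect_right
from collections import deque
from statistics import mean

def align(samples, grid, max_age):
    """Forward-fill (timestamp, value) samples onto grid; None marks a gap."""
    times = [t for t, _ in samples]
    out = []
    for g in grid:
        i = bisect_right(times, g) - 1  # index of last sample at or before g
        if i >= 0 and g - times[i] <= max_age:
            out.append(samples[i][1])
        else:
            out.append(None)  # no sample yet, or the last one is too old
    return out

def window_features(rows, size):
    """Return (mean, range) per signal for each full, gap-free window."""
    buf = deque(maxlen=size)  # buffer holding the most recent `size` rows
    feats = []
    for row in rows:
        buf.append(row)
        if len(buf) < size or any(v is None for r in buf for v in r):
            continue  # not enough history yet, or the window contains a gap
        cols = list(zip(*buf))  # one tuple of values per signal
        feats.append([(mean(c), max(c) - min(c)) for c in cols])
    return feats

# Hypothetical readings: (time in seconds, value), irregularly sampled.
temp = [(0.0, 20.0), (1.0, 21.0), (2.0, 23.0), (3.0, 22.0), (4.0, 24.0)]
vib = [(0.5, 1.0), (1.4, 3.0), (3.9, 2.0)]

grid = [1.0, 2.0, 3.0, 4.0]  # shared one-second time grid
rows = list(zip(align(temp, grid, 1.0), align(vib, grid, 1.0)))
features = window_features(rows, size=2)
```

In this toy data the vibration signal has no fresh reading at the three-second mark, so that row carries a gap and every window touching it is skipped. Forward-filling and dropping gapped windows are only one possible design choice; interpolation or imputation may suit other signals better.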
This reality requires the classification of time series data into states or conditions. “These conditions are episodic,” says Mehta. “They start at some time and end at some later time. In real applications, there is almost never a well-defined training set where all conditions are accurately known for all time within a range. Often, we may only know about some periods of normal conditions and a very small number of approximate problem condition periods. For example, downtime events in maintenance log.”
As a result, time series machine learning needs to support unsupervised and semi-supervised models, and it needs to be able to build recognition from small numbers of example conditions, says Mehta.
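As a rough illustration of recognizing conditions from only a handful of labeled examples, the Python sketch below uses a nearest-centroid rule with a distance cutoff. This is a stand-in technique chosen for brevity, not the approach Mehta describes. The feature vectors, the labels, the `max_distance` threshold and the helper names `centroid` and `classify` are all hypothetical.

```python
from math import dist
from statistics import mean

def centroid(vectors):
    """Average the example feature vectors for one condition."""
    return [mean(col) for col in zip(*vectors)]

def classify(vector, centroids, max_distance):
    """Return the nearest known condition, or 'unknown' if none is close."""
    label, d = min(
        ((name, dist(vector, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if d <= max_distance else "unknown"

# Hypothetical window features: [mean temperature, vibration range].
# Several normal periods are known, but only one approximate downtime event.
labeled = {
    "normal": [[21.0, 0.5], [22.0, 0.7], [21.5, 0.6]],
    "downtime": [[30.0, 3.0]],
}
centroids = {name: centroid(vecs) for name, vecs in labeled.items()}

near_normal = classify([21.8, 0.6], centroids, max_distance=3.0)
near_downtime = classify([29.0, 2.5], centroids, max_distance=3.0)
never_seen = classify([50.0, 9.0], centroids, max_distance=3.0)
```

The "unknown" outcome matters as much as the labels: when most of the history is unlabeled, a window far from every known example is better surfaced for review by a subject matter expert than forced into an existing condition.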
“The traditional machine learning project approach, where data scientists construct a custom model over a period of months and then hand it over to the industrial subject matter experts [SMEs], is not practical and scalable for most operational problems,” he says. “The optimal solution is when the machine learning system can be used by industrial SMEs themselves without much effort. The SMEs can feed into the machine learning system existing multivariate time series data generated by their operations and, based on the predictive insights and alerts provided by the system, determine the corrective action to be taken.”