Fast Data: Would You Like to Supersize That?

The magnitude of Big Data has grown to the point that we have to keep coming up with new ways to describe it. Objectivity has been helping a geoscience company use Fast Data, Hard Data, and Soft Data (in other words, All Data) to gain important insights for its oil and gas exploration customers.

The image of an iceberg is often used to talk about the proportion of data gathered that manufacturers are actually able to do something with—the relatively tiny portion that we can see poking out of the water is nothing compared with the behemoth mass that looms below the surface. Likewise, only about 15 percent of an enterprise’s data is actually used to gain insights, according to Noel Yuhanna, principal analyst at Forrester Research. “But what if you could flip the iceberg and uncover new insights?” he asks.

Objectivity is a company concentrating on the information fusion aspects of the Industrial Internet of Things (IIoT), working to integrate data coming from multiple sources into something more meaningful and insightful. In this role, they’re looking at the different types of data that make up the intelligence of a manufacturing organization, and bridging the gaps in between in an effort to get at the other 85 percent of the data that sits below the surface.

For example, they’re working to put Fast Data—the real-time data streaming from sensors—into context, gleaning valuable insight by giving it perspective from other Big Data sources. They’re also integrating what they call Soft Data alongside Hard Data. Hard Data is the quantifiable data that comes from sensors and other devices. Soft Data, by contrast, comes from human intelligence and is therefore subject to opinions, interpretations and other such uncertainties.

Objectivity has been applying an object-oriented approach to information fusion since the 1980s: first in the CAD/CAM industry; then in the 1990s, helping newly deregulated telecom companies make sense of increased data volume and complexity; then working heavily with the intelligence community after 9/11 to help connect the dots; and finally serving big business, where U.S. IP traffic levels are some 50 times what they were in 2006.

Especially in defense and the oil and gas industry, there has been a “growing need to build these fusion systems better and faster,” says Jin Kim, vice president of product marketing and partner development at Objectivity. “Most customers have spent millions, taking months and years, to build these systems.”

Most oil and gas operations have millions upon millions of data points to make sense of. As Brian Clark, corporate vice president of product for Objectivity, points out in a recent blog post, a large oilfield could have tens of thousands of wells, with up to 10 tools per well generating multi-dimensional logs and curves. The gigabytes of data they produce quickly turn into terabytes once the source data is processed.

CGG, a geoscience company that provides geological, geophysical and reservoir analysis for customers primarily in oil and gas exploration, got its first IBM machines in the late 1950s and has been tackling Big Data since 1971, according to Hovey Cox, CGG’s senior vice president of marketing and strategy for geology.

CGG had the first 3D seismic acquisition in the industry, Cox says. “There wasn’t a commercial system able to store or analyze that information, so we had to innovate,” he says. “Ever since then, we’ve been pushing that edge to make sure we could create and install that system.”

If you want to talk about Big Data, CGG knows Big Data. Imagine a vessel towing a massive grid of sensors hooked together with wires. That’s how CGG collects data on land and at sea. It’s the largest moving infrastructure on Earth, Cox says, and it collects a Library of Congress’s worth of information every seven days. “It’s definitely very large data sets that we work with,” he adds.

In gathering information for its oil and gas exploration clients, it’s all about resolution and speed, Cox says. For its seismic surveys, CGG grew from 40,000 sensors/km to 36 million sensors/km just from 2005 to 2009.

“We have a lot more resolution to be able to make decisions from,” Cox says. “From 2006 to 2014, we saw about a tenfold increase in data volume.” To see what that means in real terms, take a look at the image above showing the step-change improvements in subsalt imaging. “We have much more confidence in determining where a well should be drilled.”

Today, CGG has a huge store of data, “available in this Big Data-ready database,” Cox says. To pull the necessary information from all that data, though, requires the kind of platform Objectivity can provide. “We need to not only store the data and acquire the data very quickly, but also make sure the client can look at the data very effectively, and find the data that they need, and look for correlations that can bring a new value add.”

Enriching Big Data with Fast Data in any meaningful timeframe is typically expensive, time-consuming and inefficient. But Objectivity has created a fusion platform that makes it dramatically easier to build these advanced fusion systems, Kim says, adding that Objectivity’s strategy has been to leverage the open source community to do so, including Hadoop and Spark.

The platform makes use of Hadoop to scale out rather than up. In the past, companies scaled up their servers by adding more memory. Now they can simply add inexpensive server nodes. “This tends to be orders of magnitude more cost-effective per byte than traditional systems,” Kim says.
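Scaling out in this way rests on partitioning: each record is assigned to one of many inexpensive nodes, typically by hashing its key, so adding nodes adds capacity. A minimal sketch of the idea in Python—the node count, key format, and hashing scheme here are hypothetical illustrations, not Objectivity's actual design:

```python
# Illustrative sketch of hash partitioning, the principle behind
# scale-out storage: records are spread across many inexpensive
# nodes instead of piling onto one ever-larger server.
import hashlib

def node_for(key: str, n_nodes: int) -> int:
    """Assign a record key to one of n_nodes (hypothetical scheme)."""
    # Python's built-in hash() is salted per process, so use a
    # stable hash to make placement reproducible across runs.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_nodes

# Spread some made-up well-log record keys across a 4-node cluster.
keys = [f"well-{i:05d}/log" for i in range(1000)]
placement = {}
for k in keys:
    placement.setdefault(node_for(k, 4), []).append(k)

# Each node ends up holding roughly 1000 / 4 = 250 records; adding
# a fifth node (and re-partitioning) would shrink each share further.
for node, records in sorted(placement.items()):
    print(node, len(records))
```

The cost advantage Kim cites follows from this layout: capacity grows by adding commodity nodes, and the hash keeps the load roughly even across them.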

CGG has a need for data mining at a massive scale, Kim notes, and began experimenting with Hadoop and its distributed file system (HDFS). “It enabled them to run on top of our system,” Kim explains. “To them, Hadoop is transparent. We do all the heavy lifting. They simply run their seismic system on top of ours.”

But Hadoop still has some limitations. It works well for the Googles of the world, but is “terrible for analytics workloads,” Kim contends. “It tends to be highly iterative, so is terrible on performance.”

In order for Hadoop to be truly useful for analytics, Kim says, it requires other components, such as YARN (Yet Another Resource Negotiator) or Spark. Spark is an open source cluster computing system developed at UC Berkeley to make data analytics faster. It can keep heavily used data in memory, so it can load and query data much more quickly than Hadoop. “Spark has decreased the analytics workload by 2-3 orders of magnitude,” Kim says.

As Kim emphasizes, Objectivity is not an analytics company, but rather a data management and fusion company. And the open source tools have been key enablers. “If we had to develop the whole fusion platform on our own, it would’ve taken five years and $80 million in cost,” Kim says. “But with Spark, we were able to do it in less than a year at far less cost.”

For companies like CGG with a need to process and understand massive amounts of data, Objectivity’s platform is instrumental. “We’ve been in the Big Data market since the early 1970s. But like everyone today, our data volumes continue to grow in volume, variety and velocity,” Cox says. “We need to understand and bring value to that data.”
