Drowning in Big Data

Aug. 26, 2013
Smart HMIs and a pipeline of real-time data are only the beginning. Determining which metrics matter, and making sure the numbers actually report what you think they are measuring, is extremely difficult.

In his column this month, Gary Mintchell reports what National Instruments R&D Fellow Tom Bradicich says about Big Data: that all the analog data acquired from manufacturing and smart devices, a.k.a. the Internet of Things, dwarfs what is currently known as Big Data, and that the continuing challenge is turning data into information. “The four variables of data classically are volume, velocity, variety and value. We have added a fifth—visibility—for who needs to see and analyze results,” said Bradicich.

National Instruments is partnering with IBM and others to process vast amounts of streaming data in real time and turn it into useful information. In my opinion, while “visibility” may be the latest buzzword in the evolution of Big Data, the real challenge with that much information remains making any sense of it at all.

Due to the extreme volume alone, determining which metrics matter, and making sure the numbers actually report what you think they are measuring, is extremely difficult. Getting data faster and in larger volumes—even if it is more visible—can cripple a team, often leading to risky behavior, as in “That alarm goes off all the time. We were told to switch it off and ignore it.”

In our own publishing organization, Summit Media, we produce content; design and build vehicles like magazines, websites and newsletters to distribute it; and manage the deployment of those products—the physical and electronic supply chain. To support and improve our process, we have concentrated on capturing more and more data. Our challenge, like that of our readers, is finding the time to analyze the data, verify that what we are measuring is even correct, and reconcile disparate pieces.

A recent example: we use multiple tools to measure web traffic and reader interest. One web measurement tool tells us that our editors’ opinion blogs are among the best-read items online. Another report shows they are not. A third look, at yet another source, suggests that they are, indeed, best read.

We want to increase the number of these items and distribute them more aggressively. But how can we even begin to trust the data, let alone take action, when the situation required three different “looks” into the numbers? What would a fourth look reveal, and when do you start chasing your tail? It is also an interesting study in human nature that we stopped looking the second time we got a “positive” answer. We did not go in with any assumptions, but we were more satisfied when the data showed “good news.”

Another huge issue with data visibility is the actual design and appearance of the metrics you capture, and whether they are presented in an easy-to-understand, user-friendly format on an HMI. One of the latest trends there (which I heartily endorse) is simplification: toning down colors, putting like metrics together in the same format, and so on. This makes sense, and helps viewers of the data make better and quicker decisions.

I'm slightly less enthusiastic about “smart” HMIs, which in the world of smartphones and tablets can mean simply the automatic resizing of content windows. Some of the dragging and expanding on these interfaces is indeed cool, but could very well be a solution in search of a problem. I'm sometimes reminded of election night, when reporters manipulate the results board (the HMI, if you will) with great flourish. Sometimes the information given this treatment does offer a deeper look into the results for a better understanding. Other times, I'm afraid, it's technology for technology's sake.

On a more personal note, I love it when Big Data appears stupid. For instance, this summer we treated the AW team to a night at the theater. I researched ticket availability, prices and dates, and purchased 15 tickets. Obviously, in an attempt to put their Big Data to use, some kind of cookie was planted and I was put into a “drip” e-mail campaign. Long after attending the event, I am still getting regular e-mails noting that I expressed interest and nudging me to finish purchasing tickets.

Obviously, there's a disconnect somewhere in the ticketing software. I'm certain no human at the theater's ticket website is aware it is happening. And of course, the end result is almost exactly the opposite of what they intended. Rather than feeling like a supporter of the theater, I feel like a number captured in their Big Data system that no one there cares about at all. I'm just being manipulated electronically, with no authenticity or personalization.

So watch out for Big Data. You can drown in it. You can be easily misled. You can even do great harm. But if you do harness it effectively, the possibilities are staggering.

Jim Chrzan, [email protected], is Publisher of Automation World.
