Last time, in Three Powerful Insights from IBM Think 2018, we covered some exciting takeaways from the conference. We found that many of them lead in one direction: towards the promise of taking advantage of the new data explosion. The amount of data that companies can wield is increasing at an unprecedented rate, and it’s one of the most precious resources that a smart company has.
That might seem like an overly bold statement, given that data is totally intangible. But if you think about it, the only thing that’s really stopping a competitor from coming in and duplicating your business is your privately-held knowledge. And privately-held is a description that applies to most of the world’s data: only 20% of it is searchable. Most of it is hidden, like yours. It’s your competitive advantage.
But “data” isn’t some science-fiction material that instantly warps to your will. It requires proper handling. Inaccurate data usually produces incorrect insights. And even correctly-gathered data, plugged into the wrong models, can give misleading pseudo-insights. In this article, we’re going to talk about how data can go wrong, and how you can handle it right.
When Data Goes Wrong
Business leaders know that we’re in the age of data. And yet, many of them don’t have a good relationship with it. One in two surveyed business leaders say they can’t get the data they need, and one in three say they can’t trust it. Why?
Research done by Deloitte provides some clues. In a report called “Predictably Inaccurate: The Prevalence and Perils of Big Bad Data,” authors Lucker, Hogan, and Bischoff reveal that a troubling amount of commercially available data is, in fact, wrong. They audited data available from reputable third-party data brokers by checking it against the actual customers it described. More than two-thirds of subjects reported that the data about them was less than 50% accurate. This was true even of data like vehicle registration, which is publicly available.
This means that business leaders’ distrust of data is perfectly rational. It points towards a need to listen more closely to proprietary data, and to screen purchased data more thoroughly. It also points towards some pitfalls with data collection in general. Often, companies rely on outdated datasets because updating them is costly. They also combine datasets inappropriately, so that merging more data ends up yielding less reliable information. Overall, there’s a clear picture here: information resources need to be handled carefully. You’ve got to make sure you’re sampling the right data, at the right time, which can be difficult given the sheer volume of information that any large organization has access to.
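One simple way to screen purchased data is to spot-check it against records you already trust. The sketch below is only an illustration: the customer records, the field names, and the `field_accuracy` helper are all hypothetical, and a real audit would need a much larger sample and more forgiving matching rules.

```python
def field_accuracy(purchased, trusted, fields):
    """Return, per field, the share of shared customers whose values match."""
    shared_ids = purchased.keys() & trusted.keys()
    accuracy = {}
    for field in fields:
        matches = sum(
            1
            for cid in shared_ids
            if purchased[cid].get(field) == trusted[cid].get(field)
        )
        accuracy[field] = matches / len(shared_ids) if shared_ids else 0.0
    return accuracy


# Hypothetical first-party records that we trust.
trusted = {
    "c1": {"zip": "10001", "vehicle": "sedan"},
    "c2": {"zip": "60614", "vehicle": "truck"},
    "c3": {"zip": "94110", "vehicle": "none"},
    "c4": {"zip": "73301", "vehicle": "suv"},
}

# Hypothetical purchased records for the same customers.
purchased = {
    "c1": {"zip": "10001", "vehicle": "truck"},
    "c2": {"zip": "60614", "vehicle": "truck"},
    "c3": {"zip": "94110", "vehicle": "sedan"},
    "c4": {"zip": "78701", "vehicle": "sedan"},
}

report = field_accuracy(purchased, trusted, ["zip", "vehicle"])
for field, share in report.items():
    print(f"{field}: {share:.0%} of purchased values match trusted records")
```

A per-field breakdown like this makes it easier to decide which purchased fields are worth keeping and which ones to leave out of your analysis.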
But even that isn’t enough. When you retrieve the correct data, problems can still occur.
When Models Fight Back
Patterns extracted from data aren’t always meaningful. The most infamous issue in data analysis is that of correlation vs. causation. Let’s imagine that, for a few months, your customer satisfaction increases as you spend more on customer service. Does that mean you should throw all of your available money at customer service? Not necessarily. It’s possible that the correlation is a temporary phenomenon, not an enduring relationship. Maybe the introduction of a new product has caused a temporary spike in support needs, which will die off as adoption continues. And, thus, your company won’t see any increased satisfaction from increased support spending. It would just be a waste of money.
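To see how a correlation can fade, here is a minimal sketch of the scenario above. The monthly figures are invented purely for illustration, and `pearson` is a small helper written for this example rather than part of any particular analytics product.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical months right after a product launch:
# support spending and satisfaction rise together.
launch_spend = [10, 12, 15, 18, 20, 22]
launch_satisfaction = [60, 63, 67, 72, 74, 77]

# Hypothetical later months: spending keeps rising,
# but satisfaction has levelled off.
later_spend = [22, 24, 26, 28, 30, 32]
later_satisfaction = [77, 76, 78, 77, 76, 78]

r_launch = pearson(launch_spend, launch_satisfaction)
r_later = pearson(later_spend, later_satisfaction)

print(f"correlation during launch: {r_launch:.2f}")
print(f"correlation afterwards:    {r_later:.2f}")
```

In this made-up data, the relationship is strong during the launch window and much weaker afterwards. Checking a pattern across more than one time window is a cheap first test before committing budget to it.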
At a more advanced level, all sorts of trouble can come from applying cognitive solutions incorrectly. One common problem is “overfitting”: a machine learning algorithm fits a dataset too closely and builds complicated equations that describe random noise rather than a real pattern. In a sense, this is a case of machine learning being too smart, which is to say, paying so much attention to detail that the detail becomes meaningless. Bad machine learning modeling is like building an intricate stairway to nowhere.
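Here is a small sketch of what overfitting looks like in practice. It assumes a toy dataset, hand-made for this example, that roughly follows y = 2x plus noise. Instead of complicated equations, it uses the simplest possible overfitting model: a “memorizer” that repeats the answer of the nearest training point. It compares that against a plain straight-line fit. All names and numbers here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def memorizer_predict(train_x, train_y, x):
    """Return the y value of the closest training point (memorizes noise)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(predictions, actual):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)


# Hand-made toy data: roughly y = 2x, plus noise.
train_x = [0, 2, 4, 6, 8, 10]
train_y = [3, 1, 11, 9, 19, 17]
test_x = [1.5, 3.5, 5.5, 7.5, 9.5]
test_y = [5, 4, 14, 13, 20]

slope, intercept = fit_line(train_x, train_y)

line_train = mse([slope * x + intercept for x in train_x], train_y)
line_test = mse([slope * x + intercept for x in test_x], test_y)
memo_train = mse([memorizer_predict(train_x, train_y, x) for x in train_x], train_y)
memo_test = mse([memorizer_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"line:      train error {line_train:.2f}, test error {line_test:.2f}")
print(f"memorizer: train error {memo_train:.2f}, test error {memo_test:.2f}")
```

On this toy data the memorizer scores perfectly on the points it has already seen and much worse than the straight line on new ones. That gap between training and test error is the usual warning sign of overfitting, which is why holding back some data for testing is a sensible habit.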
It’s obvious, then, that cognitive data analytics, while powerful, need to be applied with intelligence. You’ve got to make sure the data is right and relevant, and that the cognitive solutions applied to it are appropriate.