Some people see the term 'big data' as just broad-brush marketing coated with hype. But even those taking the big-data concept at face value need to overcome certain misconceptions.
Gartner believes the hype can make it harder to choose the right course of action in this area, and that it has done little to dispel some of the myths that still persist.
These fallacies include the ideas that 80 percent of data is unstructured (it isn't) and that advanced analytics is simply a more complex form of normal analytics (again, not true, according to the analyst firm).
In an attempt to establish more of the facts relating to big data, Gartner has published two reports, covering myths about big data's impact on analytics and on information infrastructure. Here are the top five mistaken beliefs.
Myth 1: Everyone is ahead of us in big data
Although interest in big data technologies and services is running high (Gartner reckons 73 percent of firms are investing or planning to invest), most businesses are still in the early stages of adoption.
So people are wrong to worry that competitors are forging ahead with big data. In fact, only 13 percent of those surveyed had actually deployed any related technology.
"The biggest challenges that organisations face are to determine how to obtain value from big data, and how to decide where to start," Gartner said.
"Many organisations get stuck at the pilot stage because they don't tie the technology to business processes or concrete use-cases."
Gartner concludes: You're not too late. Build strategy on real tasks and involve IT and the business.
Myth 2: There's so much data, little flaws don't matter
Some think that, because of the law of large numbers, individual data flaws are insignificant and don't influence analysis results.
It's true that each individual flaw may have a much smaller impact on the whole dataset than it did when there was less data, but there are more flaws than before because there is more data.
"Therefore, the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organisations use in a big-data context comes from outside, or is of unknown structure and origin," Gartner said.
"This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data."
Gartner concludes: Devise new approaches to data quality and choose data quality levels. Follow the core principles of data quality assurance.
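The arithmetic behind Gartner's point can be made concrete with a small sketch. It assumes, purely for illustration, that a fixed 2 percent of records are flawed regardless of volume; the rate, the record counts and the `flaw_impact` helper are all hypothetical, not figures from the Gartner reports.

```python
# Hypothetical illustration: a fixed share of records is flawed, whatever the volume.
# The 2 percent flaw rate and the record counts are assumptions, not Gartner data.


def flaw_impact(total_records: int, flaw_rate: float) -> dict:
    """Return how many records are flawed and how much of the dataset they affect."""
    flawed_records = round(total_records * flaw_rate)
    # Weight of a single flawed record relative to the whole dataset
    per_flaw_share = 1 / total_records
    # Combined weight of all flawed records
    overall_share = flawed_records / total_records
    return {
        "flawed_records": flawed_records,
        "per_flaw_share": per_flaw_share,
        "overall_share": overall_share,
    }


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, flaw_impact(n, 0.02))
```

Growing the dataset a thousandfold shrinks the weight of each flawed record a thousandfold, but it also multiplies the number of flawed records by a thousand, so the share of the dataset affected by bad data stays at 2 percent.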
Myth 3: Big data will eliminate data integration
The hope is that processing information via a schema-on-read approach will let firms read the same sources using multiple data models. That flexibility would enable end users to decide how to interpret any data asset on demand, and would provide data access tailored to individual users.
However, in reality most users rely on schema-on-write, where data is described and content prescribed, and there is agreement about the integrity of data.
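To illustrate the difference, here is a minimal sketch using a made-up three-field feed (date, customer, amount). The feed and the function names are hypothetical. Schema-on-write enforces one agreed structure when data is loaded and rejects rows that break it; schema-on-read keeps the raw data and lets each reader apply its own interpretation at query time.

```python
import csv
import io

# Hypothetical raw feed; the field layout and values are invented for illustration.
RAW_FEED = "2024-01-05,ACME,1200\n2024-01-06,Globex,n/a\n2024-01-07,ACME,300\n"


def load_schema_on_write(raw: str) -> tuple[list[dict], list[list[str]]]:
    """Schema-on-write: enforce one agreed schema at load time and reject rows that break it."""
    accepted, rejected = [], []
    for row in csv.reader(io.StringIO(raw)):
        date, customer, amount = row
        try:
            accepted.append({"date": date, "customer": customer, "amount": int(amount)})
        except ValueError:
            # "n/a" is not an integer, so the row fails the agreed schema
            rejected.append(row)
    return accepted, rejected


def read_revenue(raw: str) -> int:
    """Schema-on-read, view 1: treat the third field as revenue, skipping values that don't parse."""
    total = 0
    for _date, _customer, amount in csv.reader(io.StringIO(raw)):
        if amount.isdigit():
            total += int(amount)
    return total


def read_customers(raw: str) -> set[str]:
    """Schema-on-read, view 2: only the second field matters, so the bad amount is ignored."""
    return {customer for _date, customer, _amount in csv.reader(io.StringIO(raw))}


if __name__ == "__main__":
    print(load_schema_on_write(RAW_FEED))
    print(read_revenue(RAW_FEED))
    print(read_customers(RAW_FEED))
```

The schema-on-write loader rejects the Globex row up front, so every consumer sees the same validated data. Each schema-on-read view instead decides for itself how to treat that row, which is the flexibility described above, but it also means there is no single agreement about what counts as valid data.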
Myth 4: No point using a data warehouse for advanced analytics
Some think building a data warehouse is a waste of time when advanced analytics can use new types of data. In fact, many advanced analytics projects use a data warehouse during the analysis.
Also, new data types may need to be refined to make them suitable for analysis. Furthermore, decisions have to be made about which data is relevant, how to aggregate it, and the level of data quality necessary.
Gartner concludes: Use data warehouses where possible as a set of curated data for advanced analytics.
Myth 5: Data lakes will replace the data warehouse
Data lakes are often sold as enterprise-wide platforms for analysing disparate sources of data in native formats. But it's wrong to see them as replacements for data warehouses or as critical elements of an analytical infrastructure, Gartner said.
The technologies behind data lakes lack the maturity and breadth of features found in established data warehouse technologies: "Data warehouses already have the capabilities to support a broad variety of users." Firms don't have to wait for data lakes to catch up.
Gartner concludes: Use data lake technologies such as Hadoop alongside existing data warehouses. Data lakes won't deliver business value without investments in metadata management skills, tools and training.
The two Gartner reports are called "Major myths about big data's impact on analytics" and "Major myths about big data's impact on information infrastructure".