Big Data: Revolution or evolution?

Summary: The technology to collect, process and analyze Big Data has been around for a while. So what's changed?

Andrew Brust

Revolution

or

Evolution

Dan Kusnetzky

Best Argument: Revolution

The moderator has delivered a final verdict.

Opening Statements

Don’t be afraid

Andrew Brust:  Big Data is unmistakably revolutionary.  For the first time in the technology world, we’re thinking about how to collect more data and analyze it, instead of how to reduce data and archive what’s left.  We’re no longer intimidated by data volumes; now we seek out extra data to help us gain even further insight into our businesses, our governments, and our society.

The advent of distributed processing over clusters of commodity servers and disks is a big part of what’s driving this, but so too is the low and falling price of storage.  While the technology, and indeed the need, to collect, process and analyze Big Data, has been with us for quite some time, doing so hasn’t been efficient or economical until recently.  And therein lies the revolution: everything we always wanted to know about our data but were afraid to ask.  Now we don’t have to be afraid.

Not really new

Dan Kusnetzky: Big data isn't really new. What we now know as Big Data comes out of the ancient and honorable analysis of log data, from a long line of analytical tools that deal with rapidly moving, large amounts of data. Analyzing log data coming out of operating systems, application frameworks, database engines, networking giblets and storage systems has been around for decades as a “big data” task. Just ask vendors such as Splunk, Loggly, or RainStor.

The Rebuttal

  • Great Debate Moderator

    Future direction of Big Data

    Where do you think Big Data is going? Will it become its own subcategory of IT, or is it simply the next phase of BI and DW?

    Posted by Jason Hiner

    BD and BI are separate, but connected

    Big Data already is its own subcategory and will likely remain there. But it's part of the same food chain as BI and DW and these categories will exist along a continuum less than they will as discrete and perfectly distinct fields. That's exactly where things have stood for more than a decade with database administrators and modelers versus BI and data mining specialists. Some people do both, others specialize in one or the other. They're not mutually exclusive, nor is one merely a newer manifestation of the other. And so it will be with Big Data: an area of data expertise with its own technologies, products and constructs, but with an affinity to other data-focused tech specializations. Connections exist throughout the tech industry and computer science, and yet distinctions are still legitimate, helpful and real.

    Andrew Brust

    I am for Revolution

    Evolution not revolution.

    Big Data is going to become part of several IT disciplines rather than replacing any of them. Some of the likely categories are IT management, business analysis, retail systems and the like. IT management will be able to better sift through operational data found in operating system, networking, application framework, application, database, etc. logs to understand what leads up to failure. They'll be able to head them off at the pass rather than allowing them to come into town. Business analysts will be able to do their thing without having to always pester their IT colleagues to develop new code or change database schemas. Retail companies will be able to learn more about their customers so they can be better served. Big data is just providing some new tools to add to the tool kit in use today.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Thank you for joining us

    Andrew and Dan will post their closing arguments tomorrow and I will declare a winner on Thursday. Between now and then, don't forget to cast your vote and jump into the discussion below to post your thoughts on this topic.

    Posted by Jason Hiner

  • Great Debate Moderator

    Is it really the new hottest job in tech?

    There is reportedly going to be a need for 1.2 million new jobs in Big Data analytics over the next decade. Is this about to become the hottest job in IT, or will software engineers continue to be the hottest commodity?

    Posted by Jason Hiner

    Plenty of work to go around

    There will be demand for both. We don't need to make it an either/or question. Just as there have long been developers and database specialists, there will continue to be call for those who build software and those who specialize in the procurement and analysis of data that software produces and consumes. The two are complementary. But in my mind, people who develop strong competency in both will have very high value indeed. This will be especially true as most tech professionals seem to self-select as one or the other. I've never thought there was a strong justification for this, but I've long observed it as a trend in the industry. People who buck that trend will be rare, and thus in demand and very well-compensated.

    Andrew Brust

    I am for Revolution

    The key is that non-IT analysts can take part

    It is not at all clear how many new positions will be created or where they will be created. It is far more likely that this will be yet another specialization for software engineers rather than something totally new. The key part of this evolution is that non-IT analysts can now take part without necessarily having to become systems experts.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    U.S. government's investment in Big Data

    The U.S. government just announced a $200 million investment in Big Data and likened it to the rise of the supercomputer and the Internet in terms of its potential impact. How significant is this investment?

    Posted by Jason Hiner

    Can Big Data work in big government?

    I think the investment has symbolic significance, but I also think it has flaws. $200 million is a relatively small amount of money, especially when split over numerous Federal agencies. It's difficult to tell if any of this money will be awarded in the form of grants to independent researchers or if all of the expenditure is for in-house Federal research. If the latter, then I worry that agency inefficiencies may further dilute the impact of this investment. But when the administration speaks to the importance of harnessing Big Data in the work of the government and the importance to society, that tells you it has power and impact. And when it mentions that there's a workforce need around Big Data, and not just around technology in general, that shows an even deeper conviction. The US Federal Government collects reams of data; the Obama administration makes it clear the data has huge latent value.

    Andrew Brust

    I am for Revolution

    Remember Ada?

    Just because the U.S. government invests in something doesn't mean it will become a broad trend. Anyone remember Ada, the programming language that was supposed to combine the best features of COBOL, Fortran and PL/I? While Ada is still important in some government projects, it didn't take over the world. I hope the investment allows the U.S. government to be more efficient and effective. Only time will tell if that dream will become a reality.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    New job: Data scientist

    Big Data is also launching a new job title: Data Scientist. However, aren't these new data wonks more about asking the right questions and using data analysis to tell stories than the data wonks of the past?

    Posted by Jason Hiner

    The term is rather unscientific

    If Big Data's definition suffers from abuse, then that of Data Scientist suffers an order of magnitude more. To me, the field of Data Science is about statistics, data analysis, modeling and computational thinking. Unfortunately, the term is getting dumbed down a bit to describe people with Big Data technology skill sets. For example, someone who understands how to configure and use Hadoop, and maybe knows a little bit about the R programming language (an open source statistics and analysis package) may be described as a Data Scientist, but really should be called a Hadoop specialist.

    Andrew Brust

    I am for Revolution

    What happens if the right questions aren't immediately apparent?

    It appears that analysts are sifting through data and don't often know what question to ask at first. This is one of the key benefits of the new tools. It is possible to sift through massive amounts of data without first knowing what you're looking for. Traditional BI and DW tools often require that an analyst already know what they're seeking.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Natural language queries

    Part of the promise of Big Data is better tools that allow non-database experts to run more natural language queries. Is this realistic? Are there already examples of tools that do this?

    Posted by Jason Hiner

    Sometimes natural, sometimes not

    There are solutions for carrying out Natural Language Processing (NLP) with Hadoop (and thus Big Data). One involves taking the Python programming language and a set of libraries called NLTK (Natural Language Toolkit) and mashing them up with a feature of Hadoop called "Streaming," which allows the Big Data engine to be controlled by almost any programming language. Another example of both the potential and the challenges of natural language technology and Big Data is Apple's Siri technology on the iPhone. Users simply talk to Siri to get answers from a huge array of domain expertise. Sometimes it works remarkably well; other times it's a bit clunky. The former is testament to the power and value of Big Data; the latter to the shortcomings of speech processing and semantic understanding in machine learning. Interestingly, Big Data technology itself will help to improve natural language technology as it will allow greater volumes of written works to be processed and algorithmically understood. So Big Data will help itself become easier to use.

    Andrew Brust

    I am for Revolution
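
As a rough illustration of the Hadoop Streaming approach mentioned above: a Streaming job runs its mapper and reducer as ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The sketch below is a minimal word count under that model, not the debater's own method. The function names (`mapper`, `reducer`, `run_locally`) are hypothetical, the regular-expression tokenizer is only a stand-in for an NLTK tokenizer, and the local `sorted` call simulates the sort that Hadoop performs between the map and reduce phases.

```python
import re
from itertools import groupby

# Assumption: a simple regex tokenizer stands in for an NLTK tokenizer.
TOKEN = re.compile(r"[a-z']+")


def mapper(lines):
    """Emit one 'word<TAB>1' record per token, as a Streaming mapper
    would write to standard output."""
    for line in lines:
        for word in TOKEN.findall(line.lower()):
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word. The input must already be sorted by
    key, which Hadoop guarantees for the records a reducer receives."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process for testing."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["Big Data is big", "data about data"]
    print(run_locally(sample))
```

In an actual Streaming job the two functions would live in separate scripts that iterate over `sys.stdin` and print each record, so the same logic can run across a cluster without changes to the counting code.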

    Learning what is going on in real time is more important than natural language

    This is only one of many promises the suppliers of Big Data tools are making. It isn't the most important in many cases. A more important promise is that data analysts will be empowered to sift through data in real time to learn more about the business. This learning is far more important than whether the queries are made using a set of check boxes or in natural language statements.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Unstructured data

    Let's drill down a little bit on unstructured data as part of the Big Data movement. What are some examples and why is it significant?

    Posted by Jason Hiner

    Text and media are unstructured

    Text is a good example to start with. Books, papers and reports are only as structured as their sentences and paragraphs, but patterns in that data still exist. Imagine looking at all the annual and quarterly reports submitted by public companies to the Securities and Exchange Commission, over the agency's history, and correlating phrases and passages to economic phenomena in the reports. That's a terrific unstructured/Big Data scenario. Other media, including audio and video, are good fodder as well. Since both are either digital or digitize-able, patterns could be mined from them for the purposes of optimizing public safety, customer service or operational improvement. If you start to contemplate the volume of data contained in 24/7 security or traffic camera video, or 911/customer service call center phone audio, you can understand why the intersection of big data and unstructured data is important. Event-driven data is often unstructured.

    Andrew Brust

    I am for Revolution

    How about listening to customers?

    The ability to search documents, presentations, wikis, blogs, videos and audios can help an organization better understand content they've created, content that customers have sent them in the form of messages, and the like. Listening to customers regardless of where and how they comment can help a company be much more successful. This goes far beyond simply analyzing shopping baskets to glean some level of understanding of what customers want.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    How is it different than BI and DW?

    How does Big Data differ from the Business Intelligence and Data Warehousing of the past decade?

    Posted by Jason Hiner

    Big Data goes to 11

    Again, it's a question of the granularity (and therefore scale) of the data. Certain Data Warehousing products, including Massively Parallel Processing (MPP) appliances, can legitimately be called Big Data technology. Various data visualization products can be employed in both BI and Big Data contexts. Tableau is a great example of this as it natively connects to Hadoop and Hive, but also to Data Warehouse appliances, relational databases, and even spreadsheets and flat files. The fact that BI and DW are complementary to Big Data is a good thing. Big Data lets older, conventional technologies provide insights on data sets that cover a much wider scope of operations and interactions than they could before. The fact that we can continue to use familiar tools in completely new contexts makes something seemingly impossible suddenly become accessible, even casual. That is revolutionary.

    Andrew Brust

    I am for Revolution

    BI and DW work with highly structured data

    The three Vs come into play here once again. Most BI and Data Warehousing tools rely on well-defined, structured data. Big Data includes many types of data, both structured and unstructured. For example, a Data Warehouse wouldn't be able to answer a question like, "How many company presentations included the catch phrase Big Data?"

    Dan Kusnetzky

    I am for Evolution
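
The presentation question above can be made concrete with a small sketch. It assumes the text of each presentation has already been extracted into plain strings (the file names and contents below are invented for illustration), and the function name `count_documents_with_phrase` is hypothetical. A real deployment would spread this scan across many machines, but the per-document check would look much the same.

```python
import re


def count_documents_with_phrase(documents, phrase):
    """Return how many documents contain the phrase, ignoring case and
    treating any run of whitespace between the words as a match."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return sum(1 for text in documents.values() if pattern.search(text))


if __name__ == "__main__":
    # Hypothetical extracted slide text, keyed by an invented file name.
    presentations = {
        "q1_strategy.txt": "Our Big Data strategy for the year",
        "q2_sales.txt": "Quarterly sales review",
        "q3_pilot.txt": "big\n data pilot results; Big Data again",
    }
    print(count_documents_with_phrase(presentations, "Big Data"))
```

The count is per document rather than per occurrence, which matches the wording of the question: a presentation that repeats the phrase still counts once.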

  • Great Debate Moderator

    What's the most important factor?

    For business professionals who are trying to understand all of the buzz around Big Data, what would you tell them is the most important thing to understand about Big Data for 2012?

    Posted by Jason Hiner

    2012: Year of Big Data?

    The most important thing is that Big Data is becoming mainstream: it's moving from specialized use in science and tech companies to Enterprise IT applications. That has major implications, as mainstream IT standards for tooling, usability and ease of setup are higher than in scientific and tech company circles. That's why we're seeing companies like Microsoft get into the game with cloud-based implementations of Big Data technology that can be requested and configured from a Web browser. The quest to make Big Data more Enterprise-friendly should result in the refinement of the technology and lowering the costs of operating it. Right now, the technology has a lot of rough edges and requires expensive, highly-specialized technologists to implement and operate it. That is changing though, which is further proof of its revolutionary quality.

    Andrew Brust

    I am for Revolution

    New tools appear to simplify the types of analysis done before

    Big Data is a catch phrase that has been bubbling up from the high performance computing niche of the IT market. It is largely the newest attempt to make sense of the ever-larger pile of data organizations have. What's new this time is that many suppliers are offering powerful tools that are relatively easy to learn. Several open source projects, such as Apache Hadoop, Cassandra, Solr and the like are making tools available at low cost.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    How's it different than Excel reports?

    How does Big Data differ from what the Excel spreadsheet wizards have been doing for most businesses for the past couple decades?

    Posted by Jason Hiner

    Spreadmarts aren't Big Data, but they have a role

    What the spreadsheet jocks have been doing can legitimately be called analytics, but certainly not Big Data, as Excel just can't accommodate Big Data sets as defined earlier. It wasn't until 2007 that Excel could even handle more than 65,536 rows per worksheet. It can't handle larger operational data loads, much less Big Data loads. Now all that said, the results of Big Data analyses can be further crunched and explored in Excel. In fact, Microsoft has developed an add-in that connects Excel to Hive, the relational/data warehouse interface to Hadoop, the emblematic Big Data technology. Here's the lowdown: the refined exploration and analysis on smaller data sets often done in Excel augments very nicely the comparatively simple work done with Big Data technology and data sets. Think of Big Data work as coarse editing and Excel-based analysis as post-production.

    Andrew Brust

    I am for Revolution

    Structured data is only the beginning

    The three Vs come into play here. The goal is making it easy to tease useful information out of masses of data. This data is usually measured in the millions or billions of records. That is far beyond what a personal productivity tool, such as Excel, can handle.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Mic check

    Are my two debaters online and ready to go?

    Posted by Jason Hiner

    test

    I'm ready

    Andrew Brust

    I am for Revolution

    Dan K is online

    I'm online and looking forward to the conversation.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Let's define "Big Data"

    As a term, "Big Data" is already starting to get as overused and overhyped as "Cloud Computing." How would you define Big Data?

    Posted by Jason Hiner

    A Big Data Definition

    My primary definition of Big Data is the procurement and analysis of very granular, event-driven data. That involves Internet-derived data that scales well beyond Web site analytics, as well as sensor data, much of which we've thrown away until recently. Data that used to be cast off as exhaust is now the fuel for deeper understanding about operations, customer interactions and natural phenomena. To me, that's the Big Data standard. Event-driven data sets are too big for transactional database systems to handle efficiently. Big Data technologies like Hadoop, complex event processing (CEP) and massively parallel processing (MPP) systems are built for these workloads. Transactional systems will improve, but there will always be a threshold beyond which they were not designed to be used. Other definitions are out there, but I go with the study of event data scaling beyond what operational databases were designed to handle.

    Andrew Brust

    I am for Revolution

    Many definitions are out there. One of the best is using the 3 Vs.

    In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities. Think three Vs.

    Volume - The sheer amount of data, whether from a user base, such as Twitter, LinkedIn or Facebook, or a huge amount of machine/sensor data.

    Variety - Data is more than validated strings in fields - it's text, images, video, and all sorts of machine data formats.

    Velocity - Wherever and whoever it's coming from, you have to capture tens or hundreds of thousands of writes per second, maybe even millions.

    People analyzed this data before. What's new is that tools are now available that allow business analysts or non-IT people to do the analysis.

    Dan Kusnetzky

    I am for Evolution

Closing Statements

The revolution isn't televised

Andrew Brust

In this debate, we discussed a number of scenarios where Big Data ties into more established database, Data Warehouse, BI and analysis technologies. The tie-ins are numerous indeed, which may make Big Data’s advances seem merely incremental.  After all, if we can continue to use established tools, how can the change be "Big?"


But the revolution isn’t televised through these tools.  It’s happening away from them.

We're taking huge amounts of data, much of it unstructured, using cheap servers and disks.  And then we're on-boarding that sifted data into our traditional systems. We're answering new, bigger questions, and a lot of them.  We're using data we once threw away, because storage was too expensive and processing too slow. And then we're working with it, in familiar ways -- with little re-tooling or disruption.  It's empowering.  It's unprecedented.  And at the same time, it feels intuitive.

That's revolutionary.

An evolutionary step

Dan Kusnetzky

I find that my role is often that of a "systems archeologist." I have learned a great deal by watching the market grow and evolve over the years. Big data is clearly an evolution rather than something entirely new and different.

Suppliers come forward with new products or services and declare that they are both unique and new. I’m often forced to rain on their parade by telling them of products from the 1970s, 1980s, 1990s, or 2000s that did the same thing.  Often the only thing new is the platform upon which they've built their product.  I see the same thing when suppliers of big data products and services take time to visit me.

Although the tools that big data suppliers are offering make the analytical process easier and allow IT analysts and non-IT analysts to sift through larger mounds of data, the analytical process is still the same.

What’s new is the sources of data, the volume of data, the different formats of that data and how fast the data is coming in -- not the basic process.

Big data is just an evolutionary step rather than something entirely new.

Big data has potential to change the game

Jason Hiner

There was a lot to like about this debate. I think it helped tease out some of the real value of Big Data and the differences between Big Data and Business Intelligence, Data Warehouse, and other types of old school business reports. I share Dan's skepticism about all of the new stuff in tech that gets championed as the next big thing, when in reality it's just a repackaged version of something that's been around for decades.
 
That said, in this case Big Data has the potential to change the game. By bringing in real-time, unstructured data from the open web and social networks, Big Data is going to provide business reports with a new level of immediacy and much deeper insights into customer behavior and preferences. On the backend, Big Data is also going to give more people in the organization the tools to run reports and tap into these massive data streams. It's no longer going to be limited to Excel experts and programmers running SQL queries. Andrew explained all this quite clearly, and that's why I'm giving him the nod.

Topics: Great Debate

About

Jason Hiner is Editor in Chief of TechRepublic and Long Form Editor of ZDNet. He writes about the people, products, and ideas changing how we live and work in the 21st century. He's co-author of the upcoming book, Follow the Geeks (bit.ly/ftgeeks).
