Big Data: Revolution or evolution?

Moderated by Jason Hiner | April 2, 2012 -- 07:00 GMT (00:00 PDT)

Summary: The technology to collect, process and analyze Big Data has been around for a while. So what's changed?

Andrew Brust

Revolution

or

Evolution

Dan Kusnetzky

Best Argument: Revolution

The Rebuttal

  • Great Debate Moderator

    Future direction of Big Data

    Where do you think Big Data is going? Will it become its own subcategory of IT, or is it simply the next phase of BI and DW?

    Posted by Jason Hiner

    BD and BI are separate, but connected

    Big Data already is its own subcategory and will likely remain there. But it's part of the same food chain as BI and DW, and these categories will exist along a continuum more than they will as discrete and perfectly distinct fields. That's exactly where things have stood for more than a decade with database administrators and modelers versus BI and data mining specialists. Some people do both, others specialize in one or the other. They're not mutually exclusive, nor is one merely a newer manifestation of the other. And so it will be with Big Data: an area of data expertise with its own technologies, products and constructs, but with an affinity to other data-focused tech specializations. Connections exist throughout the tech industry and computer science, and yet distinctions are still legitimate, helpful and real.

    Andrew Brust

    I am for Revolution

    Evolution not revolution.

    Big Data is going to become part of several IT disciplines rather than replacing any of them. Some of the likely categories are IT management, business analysis, retail systems and the like. IT management will be able to better sift through the operational data found in operating system, networking, application framework, application and database logs to understand what leads up to failures. They'll be able to head failures off at the pass rather than allowing them to come into town. Business analysts will be able to do their thing without having to always pester their IT colleagues to develop new code or change database schemas. Retail companies will be able to learn more about their customers so they can be better served. Big Data is just providing some new tools to add to the tool kit in use today.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Thank you for joining us

    Andrew and Dan will post their closing arguments tomorrow and I will declare a winner on Thursday. Between now and then, don't forget to cast your vote and jump into the discussion below to post your thoughts on this topic.

    Posted by Jason Hiner

  • Great Debate Moderator

    Is it really the new hottest job in tech?

    There is reportedly going to be a need for 1.2 million new jobs in Big Data analytics over the next decade. Is this about to become the hottest job in IT, or will software engineers continue to be the hottest commodity?

    Posted by Jason Hiner

    Plenty of work to go around

    There will be demand for both. We don't need to make it an either/or question. Just as there have long been developers and database specialists, there will continue to be a call for those who build software and those who specialize in the procurement and analysis of the data that software produces and consumes. The two are complementary. But in my mind, people who develop strong competency in both will have very high value indeed. This will be especially true as most tech professionals seem to self-select as one or the other. I've never thought there was a strong justification for this, but I've long observed it as a trend in the industry. People who buck that trend will be rare, and thus in demand and very well-compensated.

    Andrew Brust

    I am for Revolution

    The key is that non-IT analysts can take part

    It is not at all clear how many new positions will be created or where they will be created. It is far more likely that this will be yet another specialization for software engineers rather than something totally new. The key part of this evolution is that non-IT analysts can now take part without necessarily having to become systems experts.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    U.S. government's investment in Big Data

    The U.S. government just announced a $200 million investment in Big Data and likened it to the rise of the supercomputer and the Internet in terms of its potential impact. How significant is this investment?

    Posted by Jason Hiner

    Can Big Data work in big government?

    I think the investment has symbolic significance, but I also think it has flaws. $200 million is a relatively small amount of money, especially when split over numerous Federal agencies. It's difficult to tell if any of this money will be awarded in the form of grants to independent researchers or if all of the expenditure is for in-house Federal research. If the latter, then I worry that agency inefficiencies may further dilute the impact of this investment. But when the administration speaks to the importance of harnessing Big Data in the work of the government and the importance to society, that tells you it has power and impact. And when it mentions that there's a workforce need around Big Data, and not just around technology in general, that shows an even deeper conviction. The US Federal Government collects reams of data; the Obama administration makes it clear the data has huge latent value.

    Andrew Brust

    I am for Revolution

    Remember Ada?

    Just because the U.S. government invests in something doesn't mean it will become a broad trend. Anyone remember Ada, the programming language that was supposed to combine the best features of COBOL, Fortran and PL/I? While Ada is still important in some government projects, it didn't take over the world. I hope the investment allows the U.S. government to be more efficient and effective. Only time will tell if that dream will become a reality.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    New job: Data scientist

    Big Data is also launching a new job title: Data Scientist. However, aren't these new data wonks more about asking the right questions and using data analysis to tell stories than the data wonks of the past?

    Posted by Jason Hiner

    The term is rather unscientific

    If Big Data's definition suffers from abuse, then that of Data Scientist suffers an order of magnitude more. To me, the field of Data Science is about statistics, data analysis, modeling and computational thinking. Unfortunately, the term is getting dumbed down a bit to describe people with Big Data technology skill sets. For example, someone who understands how to configure and use Hadoop, and maybe knows a little bit about the R programming language (an open source statistics and analysis package) may be described as a Data Scientist, but really should be called a Hadoop specialist.

    Andrew Brust

    I am for Revolution

    What happens if the right questions aren't immediately apparent?

    It appears that analysts are sifting through data and don't often know what question to ask at first. This is one of the key benefits of the new tools. It is possible to sift through massive amounts of data without first knowing what you're looking for. Traditional BI and DW tools often require that an analyst already know what they're seeking.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Natural language queries

    Part of the promise of Big Data is better tools that allow non-database experts to run more natural language queries. Is this realistic? Are there already examples of tools that do this?

    Posted by Jason Hiner

    Sometimes natural, sometimes not

    There are solutions for carrying out Natural Language Processing (NLP) with Hadoop (and thus Big Data). One involves taking the Python programming language and a set of libraries called NLTK (Natural Language Toolkit) and mashing them up with a feature of Hadoop called "Streaming," which allows the Big Data engine to be controlled by almost any programming language. Another example of both the potential and the challenges of natural language technology and Big Data is Apple's Siri technology on the iPhone. Users simply talk to Siri to get answers from a huge array of domain expertise. Sometimes it works remarkably well; other times it's a bit clunky. The former is testament to the power and value of Big Data; the latter to the shortcomings of speech processing and semantic understanding in machine learning. Interestingly, Big Data technology itself will help to improve natural language technology, as it will allow greater volumes of written works to be processed and algorithmically understood. So Big Data will help itself become easier to use.

    Andrew Brust

    I am for Revolution
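
    The Python-plus-Streaming approach mentioned above can be sketched roughly as follows. Hadoop Streaming hands records to a mapper and a reducer over standard input and output as tab-separated key/value lines, with the reducer's input sorted by key. The function names here (`map_lines`, `reduce_lines`, `run_locally`) and the regex tokenizer are illustrative assumptions, not anything from the debate; in a real job a tokenizer from NLTK could stand in for the regex, and each function would live in its own script reading stdin.

```python
import re
from itertools import groupby

# Assumption: a simple regex tokenizer stands in for an NLTK tokenizer.
TOKEN = re.compile(r"[a-z']+")


def map_lines(lines):
    """Mapper: emit one 'token<TAB>1' record per word found in each line."""
    for line in lines:
        for token in TOKEN.findall(line.lower()):
            yield f"{token}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts per token; records must arrive sorted by key."""
    parsed = (record.rstrip("\n").split("\t") for record in sorted_records)
    for token, group in groupby(parsed, key=lambda pair: pair[0]):
        yield f"{token}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (hypothetical helper)."""
    return list(reduce_lines(sorted(map_lines(lines))))


sample = ["Big Data is big", "data about data"]
word_counts = run_locally(sample)
for record in word_counts:
    print(record)
```

    The local `sorted` call only imitates the shuffle-and-sort step that the framework performs between the two phases on a cluster.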

    Learning what is going on in real time is more important than natural language

    This is only one of many promises the suppliers of Big Data tools are making. It isn't the most important in many cases. A more important promise is that data analysts will be empowered to sift through data in real time to learn more about the business. This learning is far more important than whether the queries are made using a set of check boxes or in natural language statements.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Unstructured data

    Let's drill down a little bit on unstructured data as part of the Big Data movement. What are some examples and why is it significant?

    Posted by Jason Hiner

    Text and media are unstructured

    Text is a good example to start with. Books, papers and reports are only as structured as their sentences and paragraphs, but patterns in that data still exist. Imagine looking at all the annual and quarterly reports submitted by public companies to the Securities and Exchange Commission, over the agency's history, and correlating phrases and passages to economic phenomena in the reports. That's a terrific unstructured/Big Data scenario. Other media, including audio and video, are good fodder as well. Since both are either digital or digitizable, patterns could be mined from them for the purposes of optimizing public safety, customer service or operational improvement. If you start to contemplate the volume of data contained in 24/7 security or traffic camera video, or 911/customer service call center phone audio, you can understand why the intersection of big data and unstructured data is important. Event-driven data is often unstructured.

    Andrew Brust

    I am for Revolution

    How about listening to customers?

    The ability to search documents, presentations, wikis, blogs, videos and audio can help an organization better understand content they've created, content that customers have sent them in the form of messages, and the like. Listening to customers regardless of where and how they comment can help a company be much more successful. This goes far beyond simply analyzing shopping baskets to glean some level of understanding of what customers want.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    How is it different than BI and DW?

    How does Big Data differ from the Business Intelligence and Data Warehousing of the past decade?

    Posted by Jason Hiner

    Big Data goes to 11

    Again, it's a question of the granularity (and therefore scale) of the data. Certain Data Warehousing products, including Massively Parallel Processing (MPP) appliances, can legitimately be called Big Data technology. Various data visualization products can be employed in both BI and Big Data contexts. Tableau is a great example of this as it natively connects to Hadoop and Hive, but also to Data Warehouse appliances, relational databases, and even spreadsheets and flat files. The fact that BI and DW are complementary to Big Data is a good thing. Big Data lets older, conventional technologies provide insights on data sets that cover a much wider scope of operations and interactions than they could before. The fact that we can continue to use familiar tools in completely new contexts makes something seemingly impossible suddenly become accessible, even casual. That is revolutionary.

    Andrew Brust

    I am for Revolution

    BI and DW work with highly structured data

    The three Vs come into play here once again. Most BI and Data Warehousing tools rely on well-defined, structured data. Big Data includes many types of data, both structured and unstructured. For example, a Data Warehouse wouldn't be able to answer a question like, "How many company presentations included the catch phrase 'Big Data'?"

    Dan Kusnetzky

    I am for Evolution
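
    The catch-phrase question above is a query over free text rather than over rows and columns. A minimal sketch, assuming the text of each presentation has already been extracted into a plain string (the file names, sample text and the `count_documents_with_phrase` helper are all hypothetical):

```python
import re


def count_documents_with_phrase(documents, phrase):
    """Count documents whose text contains the phrase, ignoring case
    and tolerating any run of whitespace between the words."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return sum(1 for text in documents.values() if pattern.search(text))


# Hypothetical extracted slide text, keyed by file name.
presentations = {
    "q1_review.txt": "Our Big Data strategy for the coming year",
    "sales_kickoff.txt": "Nothing here about bigdata at all",
    "roadmap.txt": "big   data, again and again: Big Data",
}

matches = count_documents_with_phrase(presentations, "Big Data")
print(matches)
```

    At real scale the same per-document test would run in parallel across many files; the single-process loop here only illustrates the shape of the question.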

  • Great Debate Moderator

    What's the most important factor?

    For business professionals who are trying to understand all of the buzz around Big Data, what would you tell them is the most important thing to understand about Big Data for 2012?

    Posted by Jason Hiner

    2012: Year of Big Data?

    The most important thing is that Big Data is becoming mainstream: it's moving from specialized use in science and tech companies to Enterprise IT applications. That has major implications, as mainstream IT standards for tooling, usability and ease of setup are higher than in scientific and tech company circles. That's why we're seeing companies like Microsoft get into the game with cloud-based implementations of Big Data technology that can be requested and configured from a Web browser. The quest to make Big Data more Enterprise-friendly should result in the refinement of the technology and lowering the costs of operating it. Right now, the technology has a lot of rough edges and requires expensive, highly-specialized technologists to implement and operate it. That is changing though, which is further proof of its revolutionary quality.

    Andrew Brust

    I am for Revolution

    New tools appear to simplify the types of analysis done before

    Big Data is a catch phrase that has been bubbling up from the high performance computing niche of the IT market. It is largely the newest attempt to make sense of the ever-larger pile of data organizations have. What's new this time is that many suppliers are offering powerful tools that are relatively easy to learn. Several open source projects, such as Apache Hadoop, Cassandra, Solr and the like are making tools available at low cost.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    How's it different than Excel reports?

    How does Big Data differ from what the Excel spreadsheet wizards have been doing for most businesses for the past couple decades?

    Posted by Jason Hiner

    Spreadmarts aren't Big Data, but they have a role

    What the spreadsheet jocks have been doing can legitimately be called analytics, but certainly not Big Data, as Excel just can't accommodate Big Data sets as defined earlier. It wasn't until 2007 that Excel could even handle more than 65,536 rows per worksheet. It can't handle larger operational data loads, much less Big Data loads. Now all that said, the results of Big Data analyses can be further crunched and explored in Excel. In fact, Microsoft has developed an add-in that connects Excel to Hive, the relational/data warehouse interface to Hadoop, the emblematic Big Data technology. Here's the low-down: the refined exploration and analysis on smaller data sets often done in Excel augments very nicely the comparatively simple work done with Big Data technology and data sets. Think of Big Data work as coarse editing and Excel-based analysis as post-production.

    Andrew Brust

    I am for Revolution

    Structured data is only the beginning

    The three Vs come into play here. The goal is making it easy to tease useful information out of masses of data. This data is usually measured in the millions or billions of records. That is far beyond what a personal productivity tool, such as Excel, can handle.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Mic check

    Are my two debaters online and ready to go?

    Posted by Jason Hiner

    test

    I'm ready

    Andrew Brust

    I am for Revolution

    Dan K is online

    I'm online and looking forward to the conversation.

    Dan Kusnetzky

    I am for Evolution

  • Great Debate Moderator

    Let's define "Big Data"

    As a term, "Big Data" is already starting to get as overused and overhyped as "Cloud Computing." How would you define Big Data?

    Posted by Jason Hiner

    A Big Data Definition

    My primary definition of Big Data is the procurement and analysis of very granular, event-driven data. That involves Internet-derived data that scales well beyond Web site analytics, as well as sensor data, much of which we've thrown away until recently. Data that used to be cast off as exhaust is now the fuel for deeper understanding about operations, customer interactions and natural phenomena. To me, that's the Big Data standard. Event-driven data sets are too big for transactional database systems to handle efficiently. Big Data technologies like Hadoop, complex event processing (CEP) and massively parallel processing (MPP) systems are built for these workloads. Transactional systems will improve, but there will always be a threshold beyond which they were not designed to be used. Other definitions are out there, but I go with the study of event data scaling beyond what operational databases were designed to handle.

    Andrew Brust

    I am for Revolution

    Many definitions are out there. One of the best is using the 3 Vs.

    In simplest terms, the phrase refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities. Think three Vs:
    - Volume: The sheer amount of data, whether from a user base, such as Twitter, LinkedIn or Facebook, or a huge amount of machine/sensor data.
    - Variety: Data is more than validated strings in fields; it's text, images, video, and all sorts of machine data formats.
    - Velocity: Wherever and whoever it's coming from, you have to capture tens or hundreds of thousands of writes per second, maybe even millions.
    People analyzed this data before. What's new is that tools are now available that allow business analysts or non-IT people to do the analysis.

    Dan Kusnetzky

    I am for Evolution

Talkback

24 comments
  • Search is not Big Data analytics

    "Big Data", which has to be the worst term yet coined, goes far beyond log management and search. At the moment, it's being abused by those vendors to give relevance to their products.

    Big Data is a combination of technologies, like new computing paradigms (e.g. Red Lambda's collaborative grids, or Hadoop), simpler throughput-oriented storage (e.g. lock-free NoSQL, graph databases, etc.) and most importantly, incremental data mining. Big Data is really best defined as the situation in which data is either too large or too 'continuous' to ever make an analytical pass over the entire dataset. You can't just fire up k-means and cluster your data; by the time you are done, the results are at best forensic and at worst irrelevant.

    The crown jewel of Big Data is incremental knowledge discovery. This is the act of applying data mining techniques to *any* events as they arrive, without referencing any other data that has been received. The trick is how to cluster, classify and perform anomaly detection on such data for the life of the system's operation. No batch processing method (like Hadoop) can solve this problem. *Any* events has to mean *any*. Binary files, imagery, audio, and UTF-8 logs are all forms of data. Being able to perform basic searches on one of them hardly qualifies for this category.
    conduit242
    I'm for Revolution
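
    The idea of scoring each event as it arrives, without a second pass over stored data, can be illustrated with a deliberately simple stand-in: a running z-score maintained with Welford's online mean and variance update. This is not the clustering or classification the comment describes, only a sketch of the single-pass pattern; the class name, `threshold` and `warmup` parameters are hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Flags values far from the running mean, using one pass and O(1) memory."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a value is flagged
        self.warmup = warmup        # observations required before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, value):
        """Score the value against statistics so far, then fold it in."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update: no earlier observations are stored or revisited.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


events = [10, 11, 9, 10, 12, 10, 11, 50, 10]
detector = StreamingAnomalyDetector()
flags = [detector.observe(value) for value in events]
print(flags)
```

    Note that this sketch folds flagged values into the running statistics as well, which is a design choice; a production detector might exclude or down-weight them.
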
  • concept of big data and analytics isn't new

    it's the ease with which the data is captured (mobile devices and apps) and the context. Big data isn't necessarily transactional (e.g. I bought a product)...it's behavioral.
    getrichieb
    I'm for Revolution
  • I've always thought that the term "big data" is not indicative of exactly

    what the tech people really want to convey. The data collected is "voluminous", and hard to tackle or tame for quick analysis. Huge volumes of data should be called exactly what they are, and that's "voluminous collections of data" or perhaps "voluminous data" for short.
    adornoe
    I'm for Evolution
  • Big Data hardly matters

    What matters is what you do with it:
    - Use it to obtain information to improve decision-making
    - Protect what you have in custody

    If you use the data wisely, it is a revolution. But the existence of the data explosion means little if it is not exploited and protected.
    nmarks2
    I'm for Revolution
    • BD Matters, Evolution vs. Revolution Does Not.

      I agree to the extent that what's truly important is "what you do with it."

      The debate as to whether it's evolution or a revolution is, to me, primarily a question of etymology. So the question itself is intrinsically arbitrary. However, it seems that the highest benefits of this argument's conclusion(s) will be yielded from the questions that logically follow. For example, as Kusnetzky points out, "The key is that non-IT analysts can take part." This would be huge. The ability to regularly use experts in the fields directly related to the data at hand, rather than IT analysts, would be nothing short of game changing.
      felipeowen
      I'm Undecided
  • New Technologies of the Age Spell Big Data Revolution

    The technologies needed to make big data meaningful are reasonably new in terms of availability: supercomputers. Specifically, we're seeing impressive developments in quantum computing, which will truly give rise to the big data revolution.
    xamountofwords
    I'm for Revolution
    • Big data, or voluminous data, is happening now, and quantum computing

      is not yet there.

      So, we need practical solutions and practical hardware, and practical applications, NOW!
      adornoe
      I'm Undecided
      • We have practical hardware

        We have practical hardware. That's not really the problem. We can process huge volumes of data easily. The question is, why are you processing the data, and how will the results affect your business?
        CobraA1
        I'm for Evolution
      • CobraA1: Agree with you; the problem was with xamountofwords's quantum

        computer comments; I was trying to stress that we need practical solutions now, and quantum computers are not there yet. And, yes, you're absolutely right about there already being the hardware and the software to process the huge amounts of data, systematically and organizationally, and with results that matter. It's only a matter of making sure that we continue to keep up with the amounts of data being generated.
        adornoe
        I'm Undecided
  • Big Data has no relevance for operational systems

    Which are the interesting things as far as I am concerned.

    As for analysis, there is a whole discipline devoted to interpreting large data sets. It's called statistics. You can't make "big data" intuitive because statistical analysis often reveals counter-intuitive results; that's why you need statistics. Our intuitive sense of probability is extremely poor.

    If there had been some huge breakthrough in statistical techniques then we might talk about revolution, but all I see is some optimisation techniques that aren't new and are liable to lead to incorrect results.

    Likewise, if someone had devised a technique to query operational data in real-time with little or no impact on operational systems then I might be impressed. Very useful, but not really revolutionary.
    jorwell
    I'm for Evolution