﻿<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:s="http://www.zdnet.com/search" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <link>http://www.zdnet.com/</link>
    <title>ZDNet | Big On Data Blog RSS</title>
    <description>Latest blogs in Big on Data</description>
    <language>en</language>
    <copyright>ZDNet</copyright>
    <managingEditor>customerservice@zdnet.com (ZDNet Customer Services)</managingEditor>
    <webMaster>uk-engineering@cbsinteractive.com (ZDNet Webmaster)</webMaster>
    <pubDate>Thu, 23 May 2013 03:16:04 -0700</pubDate>
    <lastBuildDate>Thu, 23 May 2013 03:16:04 -0700</lastBuildDate>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>2</ttl>
    <image>
      <url>http://i.zdnet.com/images/spry/zdnet_300x300.jpg</url>
      <link>http://www.zdnet.com/</link>
      <title>ZDNet | Big On Data Blog RSS</title>
      <width>143</width>
      <height>39</height>
    </image>
    <s:counts>
      <start>0</start>
      <return>20</return>
      <found>118</found>
    </s:counts>
    <item>
      <guid isPermaLink="false">7000015706</guid>
      <link><![CDATA[http://www.zdnet.com/data-scientists-dont-scale-7000015706/]]></link>
      <title><![CDATA[Data scientists don't scale]]></title>
      <description><![CDATA[In last week's ZDNet "Great Debate," Robin Harris and I faced off on the question of whether "we need data scientists to make sense of this tidal wave of information."  I think data scientists are important, but they're not the solution.  What follows is my argument, in essay form.]]></description>
      <pubDate><![CDATA[Thu, 23 May 2013 01:23:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>"Data scientist" is a title designed to be exclusive, standoffish and protective of a lucrative guild. &nbsp;To be clear, people who have the skills to qualify for this moniker are very valuable, but the title itself isn't. &nbsp;The blocker to broad adoption of Big Data analytics isn't a shortage of data scientists; it's our current dependency on them.</p>
<p>Big Data and analytics are powerful, and the technologies around them are exciting. But if they can only be harnessed by highly-paid specialists, then they haven't fully evolved. We need data and analytics technologies, but we shouldn't need expensive, scarce, Shaman practitioners to use them. More than data scientists, we need tools that empower knowledge workers to do Big Data analytics on their own.</p>
<p><strong>Is crossing over possible?</strong><br />People can certainly obtain the literacy necessary to carry out analytics on Big Data.&nbsp; Business people can be made capable of working with the data, and developers who are not currently analytics-focused can be made capable of collecting the data and performing analytics in their code.</p>
<p>Of course, certain people can be trained to become very highly-skilled specialists, but that would be the exception more than the rule, and that's OK.&nbsp; We don't need people to retool en masse into scientists, we need them to obtain a competency.</p>
<p><strong>Beyond the hype</strong><br />The term "data scientist" is over-hyped.&nbsp; But in fairness, so are the terms "Big Data" and "analytics," and yet these are still quite valid areas of specialization.&nbsp; The problem with the term "data scientist" goes beyond the hype; there's an attitude and adversarial tone to the term. This tone discourages people from obtaining analytics competency, as it transmits an implicit message that the work must be outsourced to highly-trained individuals.&nbsp; Aside from the hype, it's pretension and snobbery that make "data scientist" an unhelpful term.</p>
<p><strong>Dilution of the term</strong><br />There is a risk that many technologists will become "data scientists" in the name of finding a better gig, exactly as happened with other lofty titles in technology ("architect," for example).&nbsp; Title inflation happens in any field, but in the tech field, terms and titles are in any case viewed as metaphors, more than literal descriptions.&nbsp; Tech folks tend to take poetic license with titles, and those who don't do so find themselves at a disadvantage compared to those who do.</p>
<p><strong>It's the tooling, stupid</strong><br />Analytics in general, and Big Data specifically, have terribly immature tooling compared to mainstream relational database and BI products.&nbsp; That being the case, it's no wonder that only "scientists" can get real work done. &nbsp;These tools were built for laboratory use, not business use.</p>
<p>Just as self-service BI is in vogue (and is legitimately quite powerful) today, so too should self-service Big Data and predictive analytics be a market phenomenon.&nbsp; Once it is, people with the skills that we classify under data science today will still have an important role, but it won't be nearly so central as it is now.</p>
<p><strong>Data literacy, and what it could look like</strong><br />It won't come as a surprise that I believe a scenario where we have&nbsp;more data literacy -- and business and tech people who are "bilingual" --&nbsp;to be the one that will most successfully&nbsp;solve the labor issues we face.&nbsp; Data science is about having a command of both data technology and business domain expertise.&nbsp; If the technology becomes simple, and business people become more adept with it, then business users can be bona fide analytics professionals.</p>
<p>If I had my druthers, a perfect business analytics wonk&nbsp;would be a sales, marketing or planning professional who was also a tech power user, had a command of statistics, knew Excel very well and could do some light programming.&nbsp; But that's an ideal&hellip;and in order for analytics technology to take off, we shouldn't need people to fit this ideal in order to be productive Big Data analysts.</p>
<p><strong>Data science algorithms?</strong><br />Implicit in the definition of data scientist is possession of business intuition and instinct that mere algorithms can't replace.&nbsp; If you accept that the term is legitimate, then you accept that a combination of human intelligence and technology expertise is what makes someone an authentic data scientist.&nbsp; While I'm not a huge fan of the term "data scientist," I do feel the experience of the business user and her non-algorithmic intimacy with the semantics of the data is very important.</p>
<p><strong>What we will need, and what we won't<br /></strong>Expertise in data exploration and visualization tools, programming/developer skills, an understanding of statistics, and high-level database design skills will remain important, regardless of whether the data scientist role remains in vogue.&nbsp; Equally important will be a deep understanding of the business, and the data sources that measure its activity and outcomes.</p>
<p>The term "data scientist" will subside and may well sound dated five years from now.&nbsp; The skills will become more commonplace and commoditized.&nbsp; When that happens, the real boom will begin, because the technology will become widely adopted and thus more useful.&nbsp; But for the relatively small club of people clinging to a data scientist identity and pay scale, it may seem like a bust.</p>
<p><strong>Summing up</strong><br />Big Data technology is powerful, and it keeps getting better. But the technology does, right now, require niche specialists to derive the greatest business value from it. These specialists have to be renaissance people &ndash; possessing a combination of technology, mathematics and business skills, and knowledge. It's not clear that being so clever and versatile makes these specialists into "scientists," but it does make them rarefied.</p>
<p>Nonetheless, for Big Data and analytics implementations to grow and become truly mainstream, having such diverse skill set requirements for them is not a sustainable situation. Market need is going to drive evolution in the technology such that the barrier to entry will not be nearly so high as it is now. If for some reason that didn't happen, then adept use of Big Data would continue to be an option open only to a relatively small group of customers.</p>
<p>The solution to our problem isn't legions of new data scientists. Instead, we need self-service tools that empower smart and tenacious business people to perform Big Data analysis themselves. The specialists will still have an important role, but they won't be the linchpin to scaling Big Data across industries.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000015698</guid>
      <link><![CDATA[http://www.zdnet.com/sisense-announces-prism-10x-7000015698/]]></link>
      <title><![CDATA[SiSense Announces Prism 10x]]></title>
      <description><![CDATA[SiSense announces a new version of its data visualization and BI database engine suite, along with major performance increases. ]]></description>
      <pubDate><![CDATA[Wed, 22 May 2013 20:00:00 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p><em><strong>Disclosure:</strong> Readers should take careful note that SiSense is a client of my company, Blue Badge Insights. &nbsp;This article is not commissioned or compensated by SiSense, and I intend it as an objective report on the company's new product announcement. &nbsp;However, my point of view is certainly subjective, and readers should bear that in mind.<br />---</em></p>
<p><a >SiSense</a>, a company on which I have reported before, has introduced a new version of its Prism product, dubbed Prism 10X, which delivers major enhancements to its underlying ElastiCube database engine. &nbsp;The company says the new version provides 100 times the data capacity and 10 times the speed of competing in-memory analysis solutions running on the same hardware.</p>
<p><strong>Think global, act local</strong><br />A lot of available Big Data solutions work on a "scale-out" approach to processing data. Typically this means adding more commodity servers to a cluster, so that more data can be processed in parallel, allowing processing times to remain reasonable even as data volumes increase.</p>
<p>Such an architecture is very powerful, but it can take focus away from the parallel optimizations that can be had on a single machine. &nbsp;These in-machine optimizations are exactly where SiSense's engineers have concentrated their efforts, employing a combination of cache-awareness, columnar compression, predictive pre-fetching and vectorization.&nbsp;&nbsp;</p>
<p><strong>Cache flow</strong><br />SiSense's ElastiCube engine focuses not just on processing data in-memory, but also on doing so within a Central Processing Unit's (CPU's) on-board cache. &nbsp;Moving data in and out of cache is much faster than doing so with Random Access Memory (RAM), and while many data engines use cache only incidentally, Prism targets in-cache data manipulation&nbsp;explicitly.</p>
<p>Cache is much smaller than memory, so the ElastiCube engine employs columnar compression, not just for the storage of data on disk, but also for its in-cache&nbsp;persistence.&nbsp; The engine also factors out&nbsp;queries into sub-queries (which SiSense calls "instructions" and says tend to repeat) and pre-fetches results for the sub-queries that its heuristics tell it users will want. &nbsp;Interestingly, these heuristics improve as the engine's workload increases, so greater load on the system can actually lead to better performance.</p>
<p>Prism not only targets cache, but makes use of newer CPUs' "single instruction, multiple data" (SIMD) instructions, which process several data values at once, rather than one at a time. &nbsp;This facilitates parallel processing within a machine, rather than between nodes (servers) in a cluster. &nbsp;This technique is sometimes referred to as vectorization.</p>
<p><strong>Start your engine...and then keep going</strong><br />As obsessive as the SiSense engineering team is about crafting a super-efficient query kernel, Prism is more a competitor to data discovery and visualization tools like <a href="http://www.tableausoftware.com/">Tableau</a>, <a href="http://www.qlikview.com/">QlikView</a> and <a href="http://spotfire.tibco.com/">TIBCO Spotfire</a> than it is to data warehouse or online analytical&nbsp;processing&nbsp;(OLAP) products. &nbsp;Prism includes scatter charts, wind-roses, funnels, scatter and area maps, among others, and SiSense says that "thousands of combinations are available."</p>
<p>Competing data discovery tools have their own engines too, and use a combination of columnar and in-memory techniques to attain high performance. &nbsp;But they don't seem to exploit cache and SIMD operations (what SiSense calls "in-chip analytics").</p>
<p><strong>Room to grow</strong><br />SiSense is far from perfect. &nbsp;As good as its single node optimizations are, its lack of cluster-based deployment capability will be a turn-off to some who are looking for petabyte-scale solutions. &nbsp;But for data discovery work, terabyte-scale is where many (if not most) enterprise customers are right now.</p>
<p>Clustering capabilities may come in a future release. &nbsp;But for now, SiSense is focusing on a high-speed, integrated solution for single-box data discovery work, and its reported 520% year-on-year growth is making the company feel its approach is quite well-validated.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000015558</guid>
      <link><![CDATA[http://www.zdnet.com/tableau-ipo-q-and-a-with-ceo-chabot-7000015558/]]></link>
      <title><![CDATA[Tableau IPO: Q&A with CEO Chabot]]></title>
      <description><![CDATA[Data discovery rock star Tableau goes public on the NYSE, with ticker symbol "DATA."  But will Tableau now grow or plateau?]]></description>
      <pubDate><![CDATA[Sat, 18 May 2013 03:34:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<figure class="alignRight"><img title="Christian Chabot" alt="Christian Chabot" src="http://cdn-static.zdnet.com/i/r/story/70/00/015558/chabot-headshot-200x266.png?hash=L2HlZGZ3ZG&upscale=1" height="266" width="200"><figcaption>Tableau CEO and co-founder, Christian Chabot</figcaption></figure>
<p>Today was a big day for Big Data and analytics, as data discovery and visualization darling <a href="http://www.tableausoftware.com">Tableau Software</a>&nbsp;made good on <a href="http://www.zdnet.com/tableau-files-for-ipo-7000013417/">its filing for an initial public offering</a>. &nbsp;The company's shares were issued today on the New York Stock Exchange, under the ticker symbol "DATA." &nbsp;Wow.</p>
<p>Tableau offered&nbsp;8,200,000 shares of its Class A common stock at a price to the public of $31.00 per share. &nbsp;The shares closed up almost 64% above that initial pricing&nbsp;today,&nbsp;their first day of trade. &nbsp;Tableau now has a market capitalization of $2.9 billion.</p>
<p>I had the opportunity to speak today with Tableau's CEO and co-founder, Christian Chabot, and I discussed with him matters relating to Hadoop, data scientists, the business intelligence space, its business models, and what's next for Tableau.</p>
<p><strong>Don't spend it all in one place</strong><br>To start, I asked Chabot what the IPO-derived capital would be used for. &nbsp;His answer was that the IPO was more about raising public awareness of&nbsp;Tableau, and the credibility of the company, than it was for expansion per se.</p>
<p>"Tableau everywhere" is what Chabot says is the company's next frontier. &nbsp;He explained that&nbsp;Tableau's revenue is currently derived from a Windows-only, on-premises-only offering and one with limited market awareness, to boot. &nbsp;So there's a lot of growth opportunity to go.</p>
<p><strong>Product or stack?</strong><br>While that's all well and good, we have to assume that Tableau will expand its sales force and quite possibly its product portfolio. &nbsp;So I asked a few questions around those topics.<br><br>With Tableau's growth as a private company, and its now&nbsp;seemingly quite successful IPO, it has set a very high bar for itself. &nbsp;I pointed out to Chabot that this has all been built around what is essentially a single product, and so I asked what Tableau would do to keep the momentum going.</p>
<p>First off, Chabot corrected my assertion that Tableau is a single product, insisting that <a href="http://www.tableausoftware.com/products/desktop">Tableau Desktop</a>, <a href="http://www.tableausoftware.com/products/server">Tableau Server</a> and <a href="http://www.tableausoftware.com/products/public">Tableau Public</a> are quite separate. &nbsp;I suppose this comes down to semantics; to me, three different editions that are all geared to data discovery don't constitute separate products, but certainly they are marketed separately, and that does count for something. &nbsp;</p>
<p>I persisted in exploring the possibility of new products from Tableau though, as most of its BI competitors offer a full stack of products that cover data integration, master data management, data quality, conventional reporting, and more. &nbsp;Chabot explained that there's little reason to match everyone with a full BI stack simply for the sake of conforming to the market category. &nbsp;But he also told me that the company is interested in diversifying into new product areas for which Tableau is seeing significant customer demand. &nbsp;Chabot said that a data integration offering is of particular interest to Tableau.</p>
<p><strong>Declaring independence, and neutrality<br></strong>But if Tableau remains mostly focused on data discovery and visualization, it raises the question of whether it will be acquired by one of the BI stack vendors that is weak in that area (and compared to Tableau, many such vendors are). &nbsp;Chabot insisted that Tableau will remain independent, explaining that such independence allows the product to remain a Swiss Army knife that connects to virtually any relational, Big Data or&nbsp;analytical data source, and that this benefits Tableau customers greatly.</p>
<p>Certainly, Tableau customers are a happy bunch, as I noted in <a href="http://www.zdnet.com/tableau-8-unveiled-can-it-keep-the-good-times-rolling-7000007001/">my report from the Tableau Customer Conference last year</a>. &nbsp;Chabot believes strongly that Tableau's undiluted dedication to self-service is what drives customers' passion, and that it also puts Tableau well ahead of its BI competitors that offer self-service capabilities as a mere option, if at all.</p>
<p><strong>Data science, for the layman</strong><br>Having just participated in <a href="http://www.zdnet.com/debate/business-analytics-do-we-need-data-scientists/10119786/">ZDNet's Great Debate, on the need for data scientists</a>&nbsp;this week, I asked Chabot what he thought about the issue. &nbsp;Not surprisingly, Tableau's CEO feels that we are too reliant on specialists, and that expanding this "priesthood of people" doesn't get us past the bottleneck. &nbsp;Chabot said that the complicated and developer-intensive nature of the&nbsp;vast majority of data technologies is what underlies Tableau's success.</p>
<p>With that complexity in mind, I asked Chabot about Hadoop. &nbsp;The quintessential Big Data technology is clearly popular, but hardly something one thinks of when the self-service, agile and empowerment themes that Tableau identifies with are invoked.</p>
<p><strong>"DATA" Convergence</strong><br>How did Chabot reconcile the success of self-service with that of a complex tool like Hadoop? &nbsp;He explained that Tableau sees Hadoop at many customers, but almost never sees it as a standalone platform. &nbsp;Chabot's implication was, I think, that people want to use Hadoop, but they want it to meld with the data warehouse, BI and transactional database technologies they have been using for some time. &nbsp;Chabot would tell you that Tableau facilitates some of that integration, and he'd be right.</p>
<p>I will just point out that 2013's trend of BI - Big Data convergence shows no sign of slowing. &nbsp;Tableau's ticker symbol of "DATA" doesn't have "BIG" in it, and it doesn't need to, because the market need around data of <em>all</em> types is even bigger.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000015387</guid>
      <link><![CDATA[http://www.zdnet.com/dell-takes-shareplex-to-hadoop-and-beyond-7000015387/]]></link>
      <title><![CDATA[Dell takes SharePlex to Hadoop and beyond]]></title>
      <description><![CDATA[Dell Software's SharePlex replication tool for Oracle now works with Hadoop, or anything else that can talk to a JMS queue.]]></description>
      <pubDate><![CDATA[Wed, 15 May 2013 06:03:04 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>Version 8 of <a href="http://www.quest.com">Dell Software</a>'s <a href="http://www.quest.com/shareplex-for-oracle/">SharePlex</a> product, which heretofore has provided replication services between Oracle databases, can now replicate from Oracle to Apache Hadoop, bringing near real-time OLTP data refresh to the big data platform. What's more, Hadoop is only the first of several new data stores to be supported.</p>
<p>Now that <a href="http://www.zdnet.com/dell-acquires-quest-2-4-billion-to-be-a-software-player-7000000120/">Dell has acquired Quest</a>, and made it part of its $1.5 billion software unit, the data acquisition and analysis tools that Quest brings to the table have strategic importance beyond their value as mere IT operations tools. Database replication is important for fault tolerance, branch operations, and more. But Dell has clearly decided that such an engine can and should also be used to move data between platforms, to aid in data integration and analysis.</p>
<p>Dell has effectively added an open output pipeline to SharePlex, implemented using a <a href="http://en.wikipedia.org/wiki/Java_Message_Service">Java Message Service</a> queue. It then created a connector that subscribes to the queue, and pushes the extracted data to Hadoop using <a href="http://sqoop.apache.org/">Sqoop</a>, which leverages Hadoop's MapReduce engine to move data in and out of <a href="http://hadoop.apache.org/docs/r0.18.1/hdfs_design.html">Hadoop's Distributed File System (HDFS)</a>.</p>
<p>As the diagram below shows, developers can also build their own custom apps that subscribe to the JMS queue, and subsequently push the data into other platforms, or they can use existing data integration tools to do likewise:</p>
<figure><img title="SharePlex 8 Flow" alt="SharePlex 8 Flow" src="http://cdn-static.zdnet.com/i/r/story/70/00/015387/shareplex-8-flow-620x291.png?hash=MwZjA2RlAG&upscale=1" height="291" width="620"><figcaption>Dell SharePlex 8 data flow. <br>(Image: Dell)</figcaption></figure>
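<p>The publish-and-subscribe flow described above can be sketched in a few lines. This is only an illustration of the pattern, not SharePlex's implementation: Python's in-process <code>queue.Queue</code> stands in for the JMS queue, and the names <code>publish_changes</code>, <code>drain_to</code> and the change-record fields are hypothetical.</p>

```python
import queue

# Stand-in for the JMS queue; a real deployment would use a JMS broker.
change_queue = queue.Queue()


def publish_changes(changes):
    """Source side: put each captured change record on the queue."""
    for change in changes:
        change_queue.put(change)


def drain_to(sink):
    """Subscriber side: pull every pending change and hand it to a sink."""
    delivered = 0
    while True:
        try:
            change = change_queue.get_nowait()
        except queue.Empty:
            return delivered
        sink(change)
        delivered += 1


# Hypothetical change records; an in-memory list stands in for the target store.
target = []
publish_changes([
    {"op": "insert", "table": "orders", "id": 1},
    {"op": "update", "table": "orders", "id": 1},
])
print(drain_to(target.append))   # 2
```

<p>Because the subscriber only depends on the queue, swapping the sink for a different target platform would not require touching the source side, which is the appeal of an open output pipeline.</p>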
<p>If you're not up for the custom solution, though, fear not. Dell will be adding its own connectors. According to Dell Software's Darin Bartik (executive director of product management for information management solutions) and John Whittaker (director, marketing), Microsoft's SQL Server is on the shortlist here. The logos in the diagram may or may not indicate other databases to be directly supported in the future.</p>
<p>Dell Software has a partnership with <a href="http://www.cloudera.com">Cloudera</a>, so we can expect that SharePlex's Hadoop connectivity will work very well with that company's Hadoop distro. However, Bartik and Whittaker assured me that <a href="http://hortonworks.com/">Hortonworks</a>' and <a href="http://www.mapr.com/">MapR</a>'s distributions are supported, too.</p>
<p>With the release of SharePlex 8, Dell Software is taking big data convergence a step further than we've become used to seeing. Not only are the worlds of big data and BI colliding, but now, relational databases and OLTP are part of the equation as well.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000015078</guid>
      <link><![CDATA[http://www.zdnet.com/nuodb-has-microsoft-sql-server-in-its-sights-7000015078/]]></link>
      <title><![CDATA[NuoDB has Microsoft SQL Server in its sights]]></title>
      <description><![CDATA[With its 1.1 release, the "NewSQL" player strikes at the Microsoft ecosystem with ADO.NET, LINQ, and Entity Framework providers ... and a migration tool to poach databases from SQL Server.]]></description>
      <pubDate><![CDATA[Wed, 08 May 2013 23:34:04 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>Most of the young, scale-out database players, be they of the NoSQL or NewSQL (distributed, relational) variety, are very Linux oriented.  Although many of these databases, including <a href="http://www.mongodb.org/">MongoDB</a>, <a href="http://couchdb.apache.org/">CouchDB</a>, and <a href="http://cassandra.apache.org/">Cassandra</a>, run just fine on Windows, they still have a proclivity toward best serving developers on the Linux platform.</p>
<p>One NewSQL database, <a href="http://www.nuodb.com/">NuoDB</a>, also runs on both operating systems.  But the folks at NuoDB looked at their downloads, and discovered that a full 80 percent of them were attributable to the Windows edition of the product.  Rather than bury their heads in the Linux sand regarding this phenomenon, the NuoDB folks took ownership of it instead.</p>
<h3>(.Net) Developers, developers, developers</h3>
<p>The result is version 1.1 of the NuoDB Starlings database.  This version ships with support for Microsoft developer technologies to a degree that is almost unprecedented for a third party.  These creature comforts include an <a href="http://msdn.microsoft.com/en-us/library/h43ks021(v=VS.71).aspx">ADO.NET</a> provider, a <a href="http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx">LINQ (Language Integrated Query)</a> provider, <em>and</em> an <a href="http://msdn.microsoft.com/en-us/data/ef.aspx">Entity Framework</a> provider that supports the code-first and model-first approaches.</p>
<p>NuoDB 1.1 also includes a migration assistant for moving databases off the SQL Server platform and onto NuoDB.  This is a shot across Microsoft's bow, although from a relatively small vessel.  Regardless, the value proposition is interesting.  While SQL Server can be implemented in a clustered configuration, that's not easy to do, and its architecture is mostly geared toward single-node installations.  Additional SQL Server nodes are often made available for fault tolerance purposes, more than for distributed processing.</p>
<h3>How it works</h3>
<p>NuoDB is a scale-out, cloud-oriented database, built to be geographically distributed by default.  Its architecture consists of broker nodes, transaction nodes, and storage management nodes.  Each transaction node is responsible for its own "atoms" &mdash; little pieces of the database, be it data rows or parts of indexes, that communicate asynchronously.  The atoms use multi-version concurrency control (MVCC) to maintain database consistency and <a href="http://en.wikipedia.org/wiki/ACID">ACID</a> guarantees.</p>
<p>The broker nodes handle load balancing between the transaction nodes, and as long as there are two or more broker nodes, NuoDB avoids having any single points of failure.  Storage management nodes keep track of the atoms that make up a database, and their replicas.  </p>
<p>Nodes in a NuoDB cluster act together on a peer-to-peer basis, analogously to birds in a flock, hence the "Starlings" name and the company's logo design.  The whole scheme leads, if the company's claims are accurate, to a very low-latency, geo-distributed database that still uses the relational model, transactional guarantees, and SQL query language that are standard for enterprise developers.</p>
<p>NuoDB nodes can run in the cloud (NuoDB now explicitly claims Windows Azure compatibility) and, interestingly, NuoDB clusters can consist of a mixture of Windows and Linux nodes.</p>
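<p>For readers unfamiliar with multi-version concurrency control, here is a minimal sketch of the general idea: writers append new versions instead of overwriting, and each reader sees the newest version committed as of its snapshot. It is a toy, single-process illustration and makes no claim about how NuoDB's atoms actually implement MVCC; the <code>VersionedStore</code> class and its methods are hypothetical names.</p>

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy multi-version store: every write appends a new version."""
    clock: int = 0                                 # logical commit counter
    versions: dict = field(default_factory=dict)   # key -> [(commit_ts, value), ...]

    def write(self, key, value):
        """Commit a new version of key and return its commit timestamp."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        """Return the timestamp a reader should use for a consistent view."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        visible = [v for ts, v in self.versions.get(key, []) if snapshot_ts >= ts]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("balance", 100)
ts = store.snapshot()            # a reader starts here
store.write("balance", 80)       # a later writer does not disturb that reader
print(store.read("balance", ts))                 # 100
print(store.read("balance", store.snapshot()))   # 80
```

<p>The point of the design is that readers never block writers: an older snapshot keeps returning a consistent answer even while newer versions are being committed.</p>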
<h3>Should Redmond worry?</h3>
<p>It's not at all clear to me that NuoDB will siphon off hordes of SQL Server databases and customers.  Of course, I'm an SQL Server MVP and a co-author of a book on Microsoft's flagship database, so I have my biases.  But coming from that background, NuoDB seems more tempting to me than MongoDB or Cassandra, for example.</p>
<p>Allowing .Net developers to move to a scale-out database architecture &mdash; and yet keep their existing APIs, database schemas, and query language &mdash; poses a formidable challenge to Redmond.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000015032</guid>
      <link><![CDATA[http://www.zdnet.com/teradata-enters-the-in-memory-fray-intelligently-7000015032/]]></link>
      <title><![CDATA[Teradata enters the in-memory fray, intelligently]]></title>
      <description><![CDATA[Teradata Intelligent Memory combines RAM and disk for high-performance big data without the extreme requirement of exclusive in-memory operation.]]></description>
      <pubDate><![CDATA[Wed, 08 May 2013 20:00:00 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>SAP has made headlines with its 100 percent in-memory database, <a href="http://www.saphana.com/">HANA</a>. In fact, as <a href="http://www.zdnet.com/sap-adds-enterprise-cloud-service-to-hana-portfolio-7000015016/">reported by ZDNet's Rachel King only yesterday</a>, the company announced that it will offer HANA as a managed cloud service. HANA has made a big impression, to be sure.</p>
<p>But HANA requires the entire database to be in memory, and while memory prices have declined dramatically, memory still commands a huge premium over disk, on a terabyte-by-terabyte basis. Plus, a single server can accommodate only so much memory in the first place, so petabyte-scale data work on HANA can require huge hardware expenditures.</p>
<h3>Teradata memorizes</h3>
<p><a href="http://www.teradata.com/">Teradata</a> sees the value of the in-memory database movement, but the company likely can't ask its customers to take the gold-plated approach of using memory exclusively, with disk storage present only as a fail-safe. Further, the company has studied the probabilities, and found that after a certain percentage of your data is stored in memory instead of disk, the returns quickly diminish. In fact, Teradata found that 43 percent of all IO (input/output) hits only 1 percent of all disk cylinders, and that 94 percent of IO hits only 20 percent of all cylinders. Talk about your 80-20 rules!</p>
<p>With that in mind, the company is announcing the addition of "Intelligent Memory" functionality to its venerable database appliances. Intelligent Memory will see to it that the most frequently used (the "hottest")&nbsp;data in a database stays resident in a special extended memory region and infrequently queried ("cold") data stays on disk. &nbsp;The determination of which data is hot and cold is updated dynamically, with Teradata consequently moving certain data in and out of RAM, at opportune times, in terms of processing inactivity on the cluster.</p>
<h3>Combination of approaches</h3>
<p>While this may at first sound like a caching scheme, that's not the case. A cache typically moves&nbsp;<em>recent</em> data into RAM, rather than frequently used data. That's a simpler approach, but it's also valuable. Teradata employs a caching approach as well, which it calls the FSG (File Segment) cache. And Teradata is smart enough to make sure that no data kept in the FSG cache will be moved into Intelligent Memory, and vice versa.</p>
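<p>The difference between recency and frequency can be illustrated with a minimal Python sketch. It assumes a made-up access log of block names, and the function names are hypothetical; it is not Teradata's algorithm, only a toy contrast between a least-recently-used cache and a "keep the most frequently touched blocks" policy.</p>

```python
from collections import Counter, OrderedDict


def recency_resident(accesses, capacity):
    """Blocks left in a toy LRU cache after replaying the access log."""
    cache = OrderedDict()
    for block in accesses:
        cache[block] = True
        cache.move_to_end(block)        # mark as most recently used
        while len(cache) > capacity:
            cache.popitem(last=False)   # evict the least recently used block
    return set(cache)


def frequency_resident(accesses, capacity):
    """Blocks a toy frequency-based policy would keep: the most accessed ones."""
    counts = Counter(accesses)
    return {block for block, _ in counts.most_common(capacity)}


# Hypothetical access log: A and B are "hot", D and E were merely touched last.
accesses = ["A", "B", "A", "C", "A", "B", "D", "E"]
print(sorted(recency_resident(accesses, 2)))
print(sorted(frequency_resident(accesses, 2)))
```

<p>With room for two blocks, the recency-based cache ends up holding the two blocks touched last, while the frequency-based policy keeps the two blocks touched most often, even though neither was accessed at the very end of the log.</p>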
<p>The 14.0 release of Teradata added columnar storage technology (whereby all the values for a column/field are stored contiguously, rather than all the values for a row/record/item being stored together). This allows for high rates of compression, as column values are often close in value, and certainly in order of magnitude. In the upcoming Teradata 14.10 release, Intelligent Memory will recognize columnar storage and maintain it. This means that high rates of compression will be maintained as well, allowing more data to fit into Intelligent Memory. It also means that if only certain columns' data is "hot", and that data is in columnar storage, then only those columns' data will get moved into Intelligent Memory, yielding further efficiencies.</p>
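<p>To see why storing a column's values contiguously helps compression, consider the following Python sketch. The sample rows are invented, and run-length encoding is used here only as an assumed, simple stand-in for a compression scheme; the article does not say which techniques Teradata actually applies.</p>

```python
from itertools import groupby

# Hypothetical row-oriented data: (state, year, units_sold)
rows = [
    ("NY", 2013, 10),
    ("NY", 2013, 12),
    ("NY", 2013, 11),
    ("CA", 2013, 40),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a toy columnar layout)."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows)
encoded_state = run_length_encode(columns[0])
encoded_year = run_length_encode(columns[1])
print(encoded_state)
print(encoded_year)
```

<p>In this toy layout the four state values collapse to two pairs and the four year values to a single pair, because similar values sit next to each other once the data is organized by column. In a row layout the same values would be interleaved with the other fields, leaving no such runs to exploit.</p>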
<h3>You get in-memory, and&nbsp;you get in-memory, and&nbsp;<em>you </em>get in-memory</h3>
<p>Interestingly, while the FSG cache is only available on certain Teradata editions, Intelligent Memory is available on all of them. So all customers will benefit, and since new, high-memory cluster nodes (or a cloud service) are not required, even smaller customers can look forward to better performance.</p>
<p>The combination of columnar compression, caching, and in-memory placement of hot data means that Teradata's approach to the in-memory craze is reasoned and has customers' interests in mind. I might prefer to think of the feature as "Reasonable Memory", but I guess that doesn't have quite the same ring to it.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000014973</guid>
      <link><![CDATA[http://www.zdnet.com/talend-ships-version-5-3-of-data-integration-platform-7000014973/]]></link>
      <title><![CDATA[Talend ships version 5.3 of data integration platform]]></title>
      <description><![CDATA[Talend's open source data integration platform is available in a new release, sporting native MapReduce, a graphical mapper, and new NoSQL connectors.]]></description>
      <pubDate><![CDATA[Tue, 07 May 2013 19:00:00 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>Extract, Transform and Load (ETL) tools have constituted an important Enterprise software category for a long time. Once commanding huge standalone licensing fees, ETL capabilities are now built into various database products, reducing costs for customers of those database platforms.</p>
<h3>Talend's DI platform</h3>
<p>But Los Altos, California-based <a href="http://talend.com/">Talend</a>, a specialist in Data Integration (a classier moniker than ETL, I suppose), offers an open source distribution of its platform. As a result, Talend has a vibrant ecosystem that often results in new community-developed&nbsp;connectors for the platform in addition to the more than 450 connectors included natively in the product.</p>
<p>And the Talend platform includes more than just plain ETL: Master Data Management (MDM), Data Quality, Business Process Integration (BPI), and even an Enterprise Service Bus (ESB) are part of the mix.</p>
<h3>Get your Hadoop on</h3>
<p>Talend version 5.3 now features a graphical mapper that makes an important Hadoop stack component a bit more analyst-friendly.</p>
<p>Talend 5.3 can also generate native Java MapReduce code, which allows data transformations to run right on the Hadoop cluster, avoiding burdensome data movement, and making use of general purpose SQL and import/export tools like <a href="http://hive.apache.org">Hive</a> and <a href="http://sqoop.apache.org">Sqoop</a>&nbsp;unnecessary.</p>
<h3>NoSQL, no peace!</h3>
<p>Talend 5.3 also adds to its NoSQL connectivity capabilities. While the prior release could connect to&nbsp;<a href="http://hbase.apache.org">HBase</a>, <a href="http://cassandra.apache.org/">Cassandra</a>, and <a href="http://www.mongodb.org/">MongoDB</a>, v5.3 adds support for&nbsp;<a href="http://www.couchbase.com/">Couchbase</a>, <a href="http://couchdb.apache.org/">CouchDB</a>, and <a href="http://www.neo4j.org/">Neo4j</a>. This provides coverage for the &nbsp;most popular NoSQL platforms (aside from proprietary offerings like <a href="http://aws.amazon.com/dynamodb/">Amazon Web Services' DynamoDB</a>). It also means Talend has connectivity to databases across&nbsp;all four major NoSQL categories (key-value stores, document stores, wide column stores, and graph databases).</p>
<h3>Will loyalties shift?</h3>
<p>Whether Enterprises will go "cold turkey" from&nbsp;standalone&nbsp;DI tools, like those from <a href="http://www.informatica.com/us/">Informatica</a>, or sophisticated, bundled ETL tools, like Microsoft's <a href="http://msdn.microsoft.com/en-us/sqlserver/cc511477.aspx">SQL Server Integration Services</a>, remains to be seen. But there's a lot to be said for graphical tools over Hadoop, native MapReduce code, connectivity across major NoSQL data stores, and the option to work with open source distributions of a product before standardizing on it.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000014968</guid>
      <link><![CDATA[http://www.zdnet.com/apache-lucene-and-solr-4-3-released-7000014968/]]></link>
      <title><![CDATA[Apache Lucene and Solr 4.3 released]]></title>
      <description><![CDATA[A new version of the ubiquitous Lucene/Solr open-source search project is available now.]]></description>
      <pubDate><![CDATA[Tue, 07 May 2013 09:07:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>Apache Lucene and its high level services wrapper, Solr, provide extremely powerful full-text search, among other search functionality, and are widely used across the internet.</p>
<p>Think search has nothing to do with big data?  Think again, because Lucene and Hadoop have a special relationship.  To start with, <a href="http://www.linkedin.com/in/cutting">Doug Cutting</a>, now chief architect at Cloudera, is the man behind both projects.  Next, Lucene can work over HDFS and process data stored in HBase, and Mahout can use Lucene indexes.  Lucene and Solr are also included with certain distributions of Hadoop.  In fact, as I reported earlier this month, <a href="http://www.zdnet.com/big-data-releases-mapr-m7-1010data-v6-ship-mapr-gets-lucidworks-search-7000014767/">MapR's Hadoop distributions include the full LucidWorks suite</a>, which is based on Lucene/Solr (and <a href="http://www.lucidworks.com/">LucidWorks</a> is the major commercial entity behind Lucene).</p>
<p>Today, Lucene/Solr 4.3.0 was released and made available for immediate download.  The 4.3.0 package includes improvements in numerous areas, including query performance, spatial processing, and the read-side schema API.  4.3.0 also includes numerous enhancements to Lucene's faceted search capabilities, whereby dimensions can be used in search, much as they are used for drill-down analysis in data warehouses and OLAP cubes.</p>
<p>Full details of the release &mdash; and links to download both the Lucene and Solr components of it &mdash; are available at the <a href="http://lucene.apache.org/core/">Lucene project's home page</a>.  And if you're intrigued by all this, check out the <a href="http://lucenerevolution.org/">website for Lucene/Solr Revolution</a>, LucidWorks' first annual conference on Lucene and Solr, which was just held in San Diego. </p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000014767</guid>
      <link><![CDATA[http://www.zdnet.com/big-data-releases-mapr-m7-1010data-v6-ship-mapr-gets-lucidworks-search-7000014767/]]></link>
      <title><![CDATA[Big Data releases: MapR M7, 1010data v6 ship; MapR gets LucidWorks search]]></title>
      <description><![CDATA[MapR optimizes HBase and starts distributing Lucene-powered search.  SaaS data discovery provider 1010data ships new version.]]></description>
      <pubDate><![CDATA[Thu, 02 May 2013 00:39:04 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>Hot on the heels of <a href="http://www.zdnet.com/cloudera-splunk-10gen-and-softlayerbasho-announce-new-products-7000014706/">yesterday's releases from Cloudera, Splunk, 10gen, Basho and SoftLayer</a>, come more releases today from Hadoop distribution provider MapR and cloud data warehouse and data discovery provider 1010data.</p>
<p>MapR's M3 and M5 Hadoop distros are perhaps best known for providing a non-immutable network file system in place of the standard Hadoop Distributed File System (<a href="http://hadoop.apache.org/docs/stable/hdfs_design.html">HDFS</a>) implementation. &nbsp;MapR Hadoop is also known for its cloud availability, on <a href="http://aws.amazon.com/elasticmapreduce/mapr/">Amazon Elastic MapReduce (EMR)</a> and <a href="http://www.mapr.com/mapr-google">Google Compute Engine</a>. In March, MapR announced that its <a href="http://www.zdnet.com/mapr-canonical-bring-hadoop-to-ubuntu-7000013255/">Hadoop bits would be distributed with Ubuntu Linux</a>.</p>
<p><strong>This one goes to M7</strong><br />Today <a href="http://www.mapr.com/">MapR</a> is announcing the general availability of its new <a href="http://www.mapr.com/products/mapr-editions/m7-edition">M7 distribution</a>, which includes a re-engineered version of the <a href="http://hbase.apache.org">HBase</a> wide column store NoSQL database. &nbsp;Essentially, M7's HBase implementation makes it far more practical for operational workloads, due in part to MapR's underlying read/write file system as well as optimizations to the HBase code itself. &nbsp;(I <a href="http://www.zdnet.com/nyc-data-week-news-wraps-up-7000006429/">covered these enhancements</a> back in October, when MapR announced the Beta release of M7.)</p>
<p>MapR is also announcing a partnership with <a href="http://www.lucidworks.com/">LucidWorks</a>, a company which says it employs one fourth of the core committers to the Apache <a href="http://lucene.apache.org/">Lucene</a>/<a href="http://lucene.apache.org/solr/">Solr</a> project. &nbsp;That project provides popular, powerful open source search capabilities across numerous technology platforms, including HDFS. &nbsp;MapR will include the LucidWorks suite (which includes added functionality over the vanilla Lucene/Solr code) with its Hadoop distribution, making it possible to query Hadoop through plain-language search in addition to <a href="http://hive.apache.org">Hive</a>'s HiveQL and <a href="http://pig.apache.org">Pig</a>'s "Pig Latin" query languages and, of course, MapReduce code written in Java.</p>
<p>LucidWorks' integration with MapR's M3 and M5 distros is available now, as a Beta. &nbsp;The company says that LucidWorks will be integrated with M7 sometime next quarter.</p>
<p><strong>All data, all the time</strong><br /><a href="http://www.1010data.com/">1010data</a> is a cloud-based data warehouse, data visualization and data discovery solution. &nbsp;The product uses columnar database technology and a spreadsheet-like user interface to facilitate self-service analysis by business users over large volumes of data (the company calls it the "<a href="http://www.1010data.com/solutions-and-services/self-service-analytics-for-big-data/trillion-row-spreadsheet/">trillion row spreadsheet</a>").</p>
<p>1010data, which has been in business for 13 years, and whose customers include the New York Stock Exchange, Rite Aid and Dollar General, is today releasing version 6 of its product. &nbsp;The new version includes&nbsp;an enhanced user interface; a new version of 1010data's data-integration tool; improved in-database analytics; new machine learning functions; a "Quick App Builder;" and a new customer administration portal.</p>
<p><strong>Big (Data) news cycle</strong><br />I'm not exactly certain why there have been so many new product, partner and release announcements this week (and Teradata and Tibco just added <a href="http://www.marketwire.com/press-release/tibco-spotfire-announces-expanded-partnership-with-teradata-offer-extreme-data-discovery-nasdaq-tibx-1705158.htm">another one</a>). &nbsp;But certainly, there seems a real urgency to make Big Data technology more capable, more accessible, and more integrated with other tools. &nbsp;And there's nothing wrong with that.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000014706</guid>
      <link><![CDATA[http://www.zdnet.com/cloudera-splunk-10gen-and-softlayerbasho-announce-new-products-7000014706/]]></link>
      <title><![CDATA[Cloudera, Splunk, 10gen and SoftLayer/Basho announce new products]]></title>
      <description><![CDATA[Impala 1.0, Splunk App for Enterprise Security v2.4, MongoDB Backup Service and Riak service on Softlayer all announced today.]]></description>
      <pubDate><![CDATA[Tue, 30 Apr 2013 22:29:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>My inbox is hit with so many Big Data product and news announcements every day that I could make a full-time job out of triaging them. &nbsp;Today, though, a set of announcements come that bear certain witness to 2013 as the year of Big Data's Enterprise maturation process.</p>
<p>I've spent most of my career in the Enterprise development and tools space, so I suppose I could be jaded about these announcements. &nbsp;But having been an avid Big Data watcher for more than a year, I must admit I wasn't sure I'd see the day when SQL, Enterprise security, monitoring and seamless provisioning would make up the headline fodder in this space.</p>
<p>And yet, that's exactly what's happening today. &nbsp;Let's take a look at the details...</p>
<p><strong>Cloudera announces general availability of Impala 1.0</strong><br />Back in October, <a href="http://www.cloudera.com/">Cloudera</a> shocked me a little by telling me they believed MapReduce wasn't the solution for all&nbsp;Big Data problems, and that they were <a href="http://www.zdnet.com/clouderas-impala-brings-hadoop-to-sql-and-bi-7000006413/">building a parallel SQL query engine</a> that could work over data in Hadoop's Distributed File System (HDFS) and bypass Hadoop's MapReduce engine, permitting speedy iterative (non-batch mode) query over Hadoop data. &nbsp;Today that product, Impala, has reached GA. &nbsp;And since it's not only fast, but API-compatible with <a >Apache Hive</a>, lots of existing BI tools can work with it right away. &nbsp;</p>
<p>This fits nicely with Cloudera's announcement yesterday that it has formed an <a href="http://www.marketwire.com/press-release/cloudera-announces-strategic-alliance-with-sas-1783697.htm">alliance with BI powerhouse SAS</a>. &nbsp;That alliance is not just a business arrangement either, as SAS engineers have adapted their technology to deploy physically over Hadoop clusters and perform their analyses in a parallel fashion. &nbsp;This is a&nbsp;<em>huge</em> deal: because it avoids data movement between SAS and Hadoop, analyses can be performed over full data sets and not just samplings of the source data.</p>
<p><strong>Splunk releases version 2.4 of its App for Enterprise Security<br /></strong><a >Splunk's&nbsp;App for Enterprise Security</a> brings to bear statistical analysis of user-generated machine data&nbsp;to discover&nbsp;unknown threats to digital systems in real time. &nbsp;Rather than merely monitoring for <em>known</em> threat patterns,&nbsp;Splunk's suite uses search, dashboards and visualizations to detect anomalies and outliers that may indicate the presence of yet-unidentified threat patterns. &nbsp;Clearly, bringing hardcore statistical analysis to users, rather than just raw Big Data storage and query capability, is where real value gets added.</p>
<p><strong>MongoDB gets a backup service</strong><br /><a href="http://www.10gen.com/">10gen</a>, the company behind venerable NoSQL document store <a href="http://www.mongodb.org/">MongoDB</a>, is today announcing the limited release &nbsp;of MongoDB Backup Service, joining the&nbsp;free, cloud-based MongoDB Monitoring Service (MMS) it launched previously. &nbsp;Given my own 20-year background in relational databases, the lack of a dedicated backup service seemed a bit strange to me for such a cloud-friendly product as Mongo. &nbsp;But the Big Data and NoSQL worlds are no strangers to gaps like that. &nbsp;The more important point is that the gaps are being filled very rapidly this year, and 10gen's announcement is part of that trend.</p>
<p><strong>Basho and SoftLayer provide Riak in the cloud</strong><br />Speaking of NoSQL and cloud friendly, <a href="http://basho.com/">Basho</a>, the company behind NoSQL key-value store <a href="http://basho.com/riak/">Riak</a>, has teamed with cloud provider <a href="http://www.softlayer.com/">SoftLayer</a>, to provide an easily provisioned <a href="http://www.softlayer.com/solutions/big-data/riak">option for running Riak in the cloud</a>. &nbsp;</p>
<p>Customers can deploy either open source Riak or the more robustly-supported Riak Enterprise on the SoftLayer cloud. &nbsp;Basho and SoftLayer have worked together to provide servers specifically tuned to run Riak and, quite interestingly, provide the servers in physical "bare metal" form, rather than as virtual cloud server instances. &nbsp;While that can make the deployment more complex, the companies say the process is nonetheless automated, and that servers are provisioned in under two hours.</p>
<p><strong>They grow up fast</strong><br />Clearly the BI and Big Data worlds are converging now, with Hadoop and NoSQL databases acquiring the&nbsp;accoutrements&nbsp;they need to work in the Enterprise. &nbsp;Before, companies in the space were, arguably, catering to what investors were impressed by. Integrating with SAS, offering bare-metal servers, cloud backup and protecting&nbsp;infrastructure&nbsp;are the things <em>customers</em> care about, and that's the stuff that profitability is made of.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000014490</guid>
      <link><![CDATA[http://www.zdnet.com/actian-acquires-paraccel-7000014490/]]></link>
      <title><![CDATA[Actian acquires ParAccel]]></title>
      <description><![CDATA[Fresh off its acquisition of Pervasive Software completed two weeks ago, CA spin-off Actian acquires MPP Data Warehouse player ParAccel, whose technology powers Amazon Redshift.]]></description>
      <pubDate><![CDATA[Thu, 25 Apr 2013 20:00:00 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p><a href="http://www.actian.com/">Actian&nbsp;Corporation</a> (formerly Ingres Corporation), the <a href="http://www.zdnet.com/blog/foremski/ingres-has-giant-ambitions-in-enterprise-it/65">Computer Associates spinoff</a> behind the open source relational database <a href="http://www.actian.com/products/ingres">Ingres</a>, has acquired Massively Parallel Processing (MPP) Data Warehouse vendor <a href="http://www.paraccel.com/">ParAccel</a>.</p>
<p>It was only two weeks ago that Actian <a href="http://www.pervasive.com/News/PressReleaseArchive/EntryId/1174/Actian-Corporation-and-Pervasive-Software-Unite-to-Take-Action-on-Big-Data.aspx">completed its acquisition</a> of predictive analytics/data integration player <a href="http://www.pervasive.com/">Pervasive Software</a>. &nbsp;Add in the MPP Big Data capabilities of ParAccel, and its On Demand Integration- (ODI) based interface with Hadoop, and suddenly the sponsor of a seminal relational database product has become a hot player in Big Data and analytics.</p>
<p>Let's face it, the Big Data space is ripe for consolidation, so acquisitions shouldn't come as a big surprise. &nbsp;But I'm not sure I would have guessed that ParAccel, whose investors include Amazon (and whose technology powers Amazon's "<a href="http://aws.amazon.com/redshift/">Redshift</a>" data warehouse cloud service) would be acquired by the likes of Actian.</p>
<p>Actian, whose product portfolio also includes analytical database Vectorwise and object database Versant, now has a data-focused yet architecture-diversified array of products under one roof.</p>
<p>Perhaps the phenomenon of data-focused companies with multiple-decade track records acquiring newer Big Data companies will be more the rule than the exception. &nbsp;A comprehensive data platform spanning relational, columnar, unstructured and predictive technologies may make the most sense, once a shakeout is said and done.</p>
]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000014176</guid>
      <link><![CDATA[http://www.zdnet.com/high-density-cloud-databases-nuodb-moonshot-7000014176/]]></link>
      <title><![CDATA[High-density cloud databases: NuoDB + Moonshot]]></title>
      <description><![CDATA[NewSQL vendor NuoDB says it can run 7,200 active databases on one HP Moonshot system. Add in temporarily dormant databases, and the density increases significantly.]]></description>
      <pubDate><![CDATA[Thu, 18 Apr 2013 03:28:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>The "NewSQL" category of databases targets the same scale-out capabilities and performance levels of NoSQL databases, while maintaining the relational model and database consistency of mainstream databases like Oracle, SQL Server, and MySQL. One of the products in the NewSQL category is <a href="http://www.nuodb.com/">NuoDB</a>, a company I've <a href="http://www.zdnet.com/nuodb-launches-cloud-database-management-system-7000009838/">written about before</a>.</p>
<p>NuoDB explicitly targets cloud workloads, as the scale-out capabilities NuoDB offers align well with the elasticity characteristics of cloud computing. Under the latter, cloud infrastructure can be dynamically provisioned or de-provisioned, based on the ebb and flow of user demand, and NuoDB fits neatly into just such scenarios.</p>
<h3>Cloud via Moon(shot)</h3>
<p>Enter <a href="http://www.zdnet.com/hp-launches-project-moonshot-powered-with-intels-atom-at-first-7000013686/">project Moonshot</a>, HP's new server initiative that focuses on high-density concentration of servers, each with very low energy consumption. HP announced availability of its Moonshot offering, based on Intel Atom S1260 processors, on April 8. With that product, a single 4.3U system contains 45 discrete physical servers.</p>
<p>I spoke with Seth Proctor, NuoDB's chief architect, who explained that NuoDB wanted to see how many databases it could get running on a single Moonshot box, in terms of total databases hosted on the system and the number of databases concurrently active at any given time. </p>
<h3>Bank switching</h3>
<p>The structure of these metrics is based on a multi-tenant hosting scenario, where multiple blogs are hosted on a single Moonshot system, and only a fraction of them encounter page view traffic at any given instant. The kicker: NuoDB can shut down the databases that are not in active use.</p>
<p>The result: NuoDB said it can serve 7,200 active databases and <em>72,000</em> total databases, on a single Moonshot system. Even under that load, the company said the server encounters only 70 percent utilization, so the numbers could theoretically go higher. NuoDB said the screenshot below shows a management dashboard readout from a Moonshot test run. The numbers are slightly lower than quoted above, but with server utilization at 67 percent, that seems OK.</p>
<figure><img title="moonshot-summary-UI" alt="moonshot-summary-UI" src="http://cdn-static.zdnet.com/i/r/story/70/00/014176/moonshot-summary-ui-500x313.png?hash=AJHkLGxlZJ&upscale=1" height="313" width="500"><figcaption>(Image: Screenshot by Andrew Brust/ZDNet)</figcaption></figure>
<p>Going beyond the high-density scenario is important, too. If a given blog hosted on the Moonshot system sees a significant spike in traffic, it may make sense to move that database to some bigger iron, with the option of moving it back to the Moonshot box if and when the spike subsides. NuoDB claims it can perform such a migration while the database remains live, online, and hot. The company calls this database "bursting", and if it works as advertised, it would seem a sensible companion to the high-density capability.</p>
<h3>Patent pending</h3>
<p>NuoDB has submitted a patent application for the bursting capability, along with the ability to "hibernate" databases when they're not in use and wake them up on demand. Proctor told me the latency involved in waking databases is only about 200 milliseconds on a Moonshot system, and significantly less on servers based on higher-powered CPUs.</p>
<h3>Quo vadis</h3>
<p>The NewSQL movement is intriguing, as it seeks to forge a consensus between the decades-old client-server relational database stalwarts and the bright, young NoSQL upstarts. It's not clear whether the products in this category will carve out their own long-lasting market share, or if the old-line relational vendors will adopt some NewSQL architectural ideas. Either way, it's fascinating to see innovation in the relational space, which had matured and been rather static for the first decade of this new century. Together, NuoDB and HP seem to be pushing the envelope.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000013897</guid>
      <link><![CDATA[http://www.zdnet.com/with-geoflow-microsoft-doubles-down-on-analytics-in-excel-7000013897/]]></link>
      <title><![CDATA[With GeoFlow, Microsoft doubles-down on analytics in Excel]]></title>
      <description><![CDATA[At the PASS Business Analytics Conference in Chicago, Microsoft announces a public preview of GeoFlow and pumps Excel as the hub for data discovery.]]></description>
      <pubDate><![CDATA[Fri, 12 Apr 2013 01:23:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>While Gartner and IDC are busy discussing Microsoft's losing ground in the consumer market to Apple and Google, maybe they should consider discussing the gains Microsoft is making against its competitors in the business analytics space.</p>
<p>During the opening keynote at the <a >PASS Business Analytics conference</a> today, Microsoft Technical Fellow Amir Netz and Director of Program Management for Business Intelligence,&nbsp;Kamal Hathi, were busy showing off the kind of data discovery work that business users can do inside Excel. &nbsp;</p>
<p><em>Disclosure: I am a speaker at the PASS Business Analytics Conference and was involved in its early planning.</em></p>
<p><strong>8-bit beginnings</strong><br />Netz came to Microsoft from Israeli BI company <a href="http://www.panorama.com/">Panorama</a>,&nbsp;when the former acquired technology from the latter that would become <a href="http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/analysis.aspx">SQL Server Analysis Services</a>. Netz told the story, complete with photos, of growing up in Israel, being given an Apple II computer by his parents, and becoming fascinated with data and spreadsheets. &nbsp;He started with VisiCalc, and eventually began running his own business doing macro programming for Lotus 1-2-3. Soon enough, Netz moved on to Excel, on the Mac.</p>
<p>Netz and Hathi discussed the simplicity of these tools and contrasted that simplicity with today's analytics landscape of DW, ROLAP, MOLAP, ETL, MDM, Hadoop, Hive, Pig,&nbsp;Sqoop, NoSQL and more. The two men explained in a tongue-in-cheek, if somewhat contrived, manner how they yearned for the data tech simplicity of earlier times.</p>
<p><strong>Data idol</strong><br />And so back to the spreadsheet we went, as Netz and Hathi used&nbsp;Excel 2013, with <a href="http://www.microsoft.com/en-us/bi/powerpivot.aspx">PowerPivot</a>, <a href="http://www.microsoft.com/en-us/bi/Products/PowerView.aspx">Power View</a> and <a href="http://www.microsoft.com/en-us/bi/Products/Office.aspx">Data Explorer</a>,&nbsp;to go through reams of Billboard chart data and expose interesting factoids about popular music. &nbsp;Along the way, we discovered that Rihanna has single-handedly brought Barbados&nbsp;into the top tier of pop music countries, and that Roxette is Sweden's best pop ensemble, as far as Billboard chart showings go. (I bet you thought it was ABBA; I know I did.)&nbsp;</p>
<p>The last phase of the Microsoft duo's demo? An announcement that the company's project "<a href="http://research.microsoft.com/en-us/news/features/geoflow_data_viz-041113.aspx">GeoFlow</a>," a 3D geographical data visualization add-in for Excel, is <a href="http://blogs.technet.com/b/dataplatforminsider/archive/2013/04/11/day-2-pass-business-analytics-conference-new-3d-mapping-analytics-tool-for-excel.aspx">now available as public preview</a>. GeoFlow mashes up Bing Maps and technology from Microsoft Research to render data in time-lapsed, geographic space. Netz demoed the impressive technology and, as a kicker, did so on what looked to be an 80" <a href="http://www.perceptivepixel.com/">Perceptive Pixel</a> touch-screen display, morphing the keynote stage into a quasi news studio.</p>
<p><strong>Fun with data</strong><br />Netz also talked about the importance of making data fun, and showed how visualizations built in Excel can be shared online, on Office 365. In so doing, he alluded to the same emotional, social approach to data discovery that <a href="http://www.tableausoftware.com/">Tableau</a> has built its entire business on, a business it now <a href="http://www.zdnet.com/tableau-files-for-ipo-7000013417/">aims to take public</a>.</p>
<p>The <a href="http://www.microsoft.com/en-us/download/details.aspx?id=29074">PowerPivot</a>, <a href="http://www.microsoft.com/en-us/download/details.aspx?id=36803">Data Explorer</a> and now <a href="http://www.microsoft.com/en-us/download/details.aspx?id=38395">GeoFlow</a> add-ins are available for download. Just be aware that GeoFlow requires the "ProPlus" version of Excel 2013 (via volume license, or an Office 365 subscription that provides access to that version of Excel). Does this put a damper on Microsoft's "BI for the Masses" story? I'd say so, and I'm hoping some other folks will as well.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000013609</guid>
      <link><![CDATA[http://www.zdnet.com/does-it-matter-if-your-sql-is-bad-7000013609/]]></link>
      <title><![CDATA[Does it matter if your SQL is bad?]]></title>
      <description><![CDATA[IBM Introduces its new BLU architecture, continuing the convergence of relational, columnar, distributed, and variant schema database approaches.]]></description>
      <pubDate><![CDATA[Fri, 05 Apr 2013 23:49:04 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p><em>This guest post comes courtesy of Tony Baer’s OnStrategies blog. Tony is a <a href="http://ovum.com/authors/tony-baer/">principal analyst</a> covering Big Data at <a href="http://www.ovum.com/">Ovum</a>.</em></p>
<figure class="alignRight"><img alt="" src="http://cdn-static.zdnet.com/i/story/63/20/000543/baer-tony.jpeg"><figcaption>Tony Baer. (Image: Seeking Alpha)</figcaption></figure>
<p>The title of this post is a paraphrase of a question raised by IDC analyst <a href="https://twitter.com/databaseguru">Carl Olofson</a>&nbsp;at an IBM Big Data analyst event earlier this week. Carl's question neatly summarized our impressions from the session, which centered around <a href="http://www-03.ibm.com/press/us/en/pressrelease/40768.wss">some big data announcements</a> that <a href="http://ibm.com/">IBM</a>&nbsp;had made. It concerned some new performance improvements that IBM has made that might render some issues with poorly formed SQL moot. More about that in a moment.</p>
<p>The question was all the more fitting and ironic given the setting: the event was held at IBM's Almaden research facility, which happened to be the same place where Edgar (Ted) Codd invented SQL; IBM will video webcast excerpts on April 30.</p>
<p>Specifically, IBM made a series of announcements; while much of the <a href="http://www.zdnet.com/ibm-spills-more-about-hadoop-strategy-with-new-puredata-system-7000013293/">press</a> <a href="http://www.theregister.co.uk/2013/04/03/ibm_puredata_hadoop_appliance_biginsights/">focused</a> on the announcement of a preview for <a href="http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&amp;subtype=CA&amp;htmlfid=897/ENUS213-186&amp;appname=USN">IBM's PureData for Hadoop appliance</a>, to us the highlight was unveiling of a new architecture, branded <a href="http://www.prnewswire.com/news-releases/ibm-announces-new-innovations-to-help-organizations-benefit-from-the-next-natural-resource-big-data-201263141.html">BLU</a> acceleration. Independent DB2 consultant <a href="http://davebeulke.com/">Dave Beulke</a>, whom we met at the launch, has published <a href="http://davebeulke.com/ibm-blu-acceleration-best-yet-for-big-data/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=ibm-blu-acceleration-best-yet-for-big-data">one of the best post mortems</a> on the significance of the announcement.</p>
<p>BLU is supposed to be lightning fast. BNSF railroad, a BLU beta customer, reported performing a 4-billion row join in 8 milliseconds.</p>
<p>So what does this all mean?</p>
<h3>Databases are assuming multiple personalities</h3>
<p>BLU acceleration consists of a new engine that accelerates database performance. Let’s dissect that seemingly innocuous — and ambiguous — statement. Traditionally, the database and the underlying engine were considered one and the same. But increasingly, databases are evolving into broader data platforms with multiple personalities that are each designed for a specific form of processing or compute problem scenarios. Today's <a >EMC Greenplum</a>), run your SQL analytic queries against the relational engine, which happens to use Hadoop's HDFS as the back end file system. And for Hadoop, new frameworks are emerging alongside MapReduce that are adding interactive, graph, and stream processing faces.</p>
<p>This week's announcements by IBM of the BLU architecture marked yet another milestone in this trend. BLU is an engine that can exist side by side with DB2's traditional row-based data store (it will be supported inside <a href="http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&amp;subtype=CA&amp;htmlfid=897/ENUS213-200&amp;appname=USN">DB2 10.5</a>). So you can run existing apps on the row store while migrating a select few to tap BLU. BLU is also being made available for IBM's <a href="http://www-01.ibm.com/software/info/rte/bdig/dm-4-pre.html?S_CMP=Google-Search-IM-RTE-Meter_Data_Management-ES-0030&amp;cm=k&amp;csr=wwus_imrtemeterdatamgmtbrndsrchwp-20121107&amp;cm=k&amp;cr=google&amp;ct=101MY0TW&amp;S_TACT=101MY0TW&amp;ck=+informix_time_series&amp;cmp=101MY&amp;mkwid=sA06cEzjN_30240801950_432yau15551">Informix TimeSeries 12.1</a>, and in the long run, you're likely to see it going into IBM’s other data platforms (think <a href="http://www-01.ibm.com/software/data/puredata/analytics/">PureData for Analytics</a>, <a href="http://www-01.ibm.com/software/data/puredata/operationalanalytics/">Operational Analytics</a>, and <a href="http://www-01.ibm.com/software/data/puredata/hadoop/">Hadoop</a> models).</p>
<h3>From IBM: More mixing and matching</h3>
<p>In the same spirit, we expect to see IBM (and its rivals) do more mixing and matching in the future. We’re waiting for IBM to release an appliance that combines SQL analytics side by side with an instance of Hadoop, where you could run blended analytic queries (think: analytics from your CRM system alongside social, weblog, and mobile data harvested by Hadoop).</p>
<p>And while we're on the topic of piling on data engines, IBM announced a preview of a <a >MongoDB</a> style); it will become yet another engine to sit under the DB2 umbrella. We don’t expect MongoDB users to suddenly flock to buy DB2 licenses, but it will be a way for existing DB2 shops to add an engine for developers who would otherwise implement their own Mongo one-off projects. The carrot is that IBM JSON takes advantage of data protection and security services of the DB2 platform that are not available from Mongo.</p>
<h3>Dissecting BLU</h3>
<p>BLU includes a number of features that, individually, are not that unique (although there may be debates regarding degree of optimization). But together, they form a well-rounded approach to not only accelerating processing inside a SQL platform, but allowing new types of analytic processing. For instance, think about applying some of the late-binding schema practices from the Hadoop world to SQL (don’t believe for a moment that analytics on Hadoop doesn’t involve structuring data, but you can do it on demand, for the specific problem).</p>
<p>Put another way, in the Hadoop world, the competitive spotlight currently is on convergence with SQL. And now in the SQL world, styles of analytic processing from the <a href="https://en.wikipedia.org/wiki/NoSQL">NoSQL</a> side are bleeding into SQL. Consider it a case of man bites dog.</p>
<p>The laundry list for BLU includes:</p>
<ul>
<li>
<p>Columnar and in-memory processing —&nbsp;most Advanced SQL (or <a >columnar engines</a> are increasingly being incorporated alongside existing row-oriented stores inside relational warhorses. Columnar lends itself well to analytics because it reduces table scanning (you only need to look at specific columns rather than across entire rows) and focuses on aggregate data rather than individual records</p>
</li>
<li>
<p>Data compression —&nbsp;compression and columnar tend to go together because, when you focus on representing aggregates, you can greatly reduce the number of bits for providing the data you need, such as averages, means, or outliers. Almost every column store employs some form of compression with ratios in the double-digit territory common. BLU is differentiated by a feature that IBM calls "actionable": You can read compressed data without de-compressing it first, which significantly boosts performance because you can avoid de-compress/re-compress compute cycles</p>
</li>
<li>
<p>Data skipping —&nbsp;many analytic data stores incorporate algorithms for minimizing data scans, with BLU’s algorithms doing so by ferreting out non-relevant data.</p>
</li>
</ul>
<p>There are more optimizations under the hood. For instance, BLU tiers active columnar data into and out of memory and/or <a href="http://en.wikipedia.org/wiki/Solid-state_drive#Flash-based_SSDs">Flash (solid state disk)</a> drives. And while in memory, BLU optimizes processing so that several columns can be crammed into a single memory register; that may sound quite geeky, but this design pattern is a key ingredient to accelerating throughput.</p>
<p>IBM contends that its in-memory and Flash optimizations are "good enough" to the point that <a href="http://en.wikipedia.org/wiki/In-memory_database">a 100% in-memory</a> <a href="http://www-01.ibm.com/software/info/rte/bdig/dm-5-pre.html?S_CMP=Google-Search-IM-RTE-Pure_Data-AT-0034&amp;S_&amp;csr=wwus_imrtedelivtransdataserbrndsrch-20121107&amp;cm=k&amp;cr=google&amp;ct=101MY0TW&amp;S_TACT=101MY0TW&amp;ck=ibm_pure_data&amp;cmp=101MY&amp;mkwid=sY0ANp6EI_30240815270_432yau15551">PureData</a> appliance to counter SAP HANA is not likely. But for Flash, never say never. In our view, given rapidly declining prices, we wouldn’t be surprised to see IBM at some point come out with an all-Flash unit.</p>
<h3>Again, what does this mean for SQL and the DBA?</h3>
<p>Now, back to our original question: When performance is accelerated to such an extent, does it really matter whether you’ve structured your tables, tuned your database, or formed your SQL statements properly? At first blush, that sounds like a rather academic question, but consider that time spent modeling databases and optimizing queries is time diverted from taking on new problems that could cut into the development backlog. And there is historical precedent; in SQL’s early days, conventional wisdom was that it required so much processing overhead (compared to hierarchical file systems that prevailed at the time) that it would never scale for the enterprise. Well, Moore’s Law brute forced the solution; SQL processing didn't get that much more efficient, but hardware got much more powerful. Will on-demand SQL acceleration do the same for database modeling and SQL querying? Will optimization and automation make DBAs obsolete?</p>
<p>It seemed sacrilegious that, nearing the 40th anniversary of SQL, such a question was posed at the very place where the technology was born.</p>
<p>But matters aren't quite so black and white; as one set of problems gets solved, broader ones emerge. For the DBA, the multiple personalities of data platforms are changing the nature of problem-solving: instead of writing the best SQL statement, focus on defining and directing the right query, to the right data, on the right engine, at the right time.</p>
<p>For instance, suppose a hot new mobile device is released to the market with huge fanfare, and sales initially spike before unexpectedly dropping through the floor. A query to find out why might fuse SQL (from the CRM analytic system) with sentiment analysis (to see what customers and prospects were saying), graph analysis (to understand who is friends with, and influences, whom), and time series (to see how sentiment changed over time). The query may run across SQL, Hadoop, and possibly another specialized data store.</p>
<p>Admittedly, there will be a significant role for automation to optimize such queries, but the trend points to a bigger reality for DBAs where they don’t worry as much about SQL schema or syntax per se, but focus more on optimizing (with the system’s help) data and queries in more global terms.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000013417</guid>
      <link><![CDATA[http://www.zdnet.com/tableau-files-for-ipo-7000013417/]]></link>
      <title><![CDATA[Tableau files for IPO]]></title>
      <description><![CDATA[Data exploration and visualization darling Tableau has filed for an initial public offering.]]></description>
      <pubDate><![CDATA[Wed, 03 Apr 2013 04:50:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>As many in the big data/BI/analytics community had predicted it would, Tableau Software has decided to file for an initial public offering.</p>
<p>Goldman, Sachs &amp; Co and Morgan Stanley &amp; Co LLC are acting as lead joint book-running managers for the offering. Credit Suisse Securities (USA) LLC and JP Morgan Securities LLC are acting as book-running managers. UBS Securities LLC and BMO Capital Markets Corp are acting as co-lead managers, and JMP Securities LLC is acting as co-manager.</p>
<p>Tableau has connectivity and partnerships with some of the biggest names in BI and Big Data. Customers love the product, and it's been on an upward trajectory for quite some time. An IPO now is certainly consistent with that track record and momentum.</p>
<p>It will be interesting to watch the company move beyond a "land and expand" sales approach to a perhaps more conventional Enterprise sales play. It also remains to be seen if Tableau will need to broaden its portfolio beyond its core namesake product.</p>
<p>Either way, this is a big deal.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000013255</guid>
      <link><![CDATA[http://www.zdnet.com/mapr-canonical-bring-hadoop-to-ubuntu-7000013255/]]></link>
      <title><![CDATA[MapR, Canonical bring Hadoop to Ubuntu]]></title>
      <description><![CDATA[Linux and Hadoop vendors make Ubuntu-integrated MapR M3 available for download.]]></description>
      <pubDate><![CDATA[Thu, 28 Mar 2013 22:08:04 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p><a href="http://www.mapr.com">MapR</a>, a "big 3" Hadoop provider, is partnering with <a href="http://www.canonical.com">Canonical</a> to make Hadoop an even bigger phenomenon than it is already. Specifically, the companies are making an Ubuntu-integrated MapR M3 distribution available for download.</p>
<p>MapR M3 will be bundled with Ubuntu 12.04 LTS and 12.10 via the Ubuntu Partner Archive, and MapR's M5 distribution will be certified for Ubuntu, too.</p>
<p>The two companies are also working together to develop a "Juju charm" to facilitate deployment of MapR into OpenStack cloud environments. The charm should be available by April 25.</p>
<p>As a kicker, MapR is making the source code for the component packages of its Hadoop distro publicly available on GitHub.</p>
<h3>Pervasive Hadoop</h3>
<p>If Hadoop is the de facto operating system for big data, then it certainly makes sense that it be part of a popular, general computing operating system, and deployable to open-source cloud computing infrastructure. And given MapR's integration in, and partnerships with, Amazon Web Services Elastic MapReduce, Google Compute Engine, and now Ubuntu Linux, the San Jose, California, company is taking a lead in the embedded Hadoop trend.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000013121</guid>
      <link><![CDATA[http://www.zdnet.com/platfora-goes-ga-asking-bi-and-big-data-to-say-i-do-7000013121/]]></link>
      <title><![CDATA[Platfora goes GA, asking BI and Big Data to say "I do."]]></title>
      <description><![CDATA[Platfora releases its BI for Big Data product to general availability, eschewing Hive and hand-coded MapReduce, while embracing in-memory queries.  ]]></description>
      <pubDate><![CDATA[Tue, 26 Mar 2013 20:00:00 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <category domain="http://www.zdnet.com/topic-business-intelligence/">Business Intelligence</category>
      <media:text type="html"><![CDATA[<figure class="alignRight"><img title="Platfora Screenshot" alt="Platfora Screenshot" src="http://cdn-static.zdnet.com/i/r/story/70/00/013121/platfora-screenshot-200x102.jpg?hash=BQWzAwN0A2&upscale=1" height="102" width="200"><figcaption>A Platfora bubble chart visualization</figcaption></figure>
<p>When I started this blog a bit over a year ago, I felt convinced that the&nbsp;separation&nbsp;between Big Data and Business Intelligence was arbitrary and contrived. It would seem that San Mateo, CA-based <a href="http://www.platfora.com/">Platfora</a>&nbsp;agrees, as it is today releasing for general availability its namesake product that seeks to bridge the BI-Big Data divide.</p>
<p><strong>To your last dying day</strong><br>The Big Data world, which has orbited around Hadoop, and the BI world, which has orbited around OLAP cubes, charts and dashboards, have mutually maintained their segregated status quo. &nbsp;Big Data and BI are the Jets and the Sharks of the technology industry: they live in the same neighborhood and have much in&nbsp;common, but thrive on obsessing over their differences.</p>
<p><a href="http://hive.apache.org">Apache Hive</a> has attempted to broker a peace between the warring clans, but has achieved détente at best. &nbsp;Yes, most BI tools now connect to Hadoop via Hive, but using that technology, Hadoop keeps working in its batch mode ways, to the chagrin of BI tools that want quick results, so they can fire off subsequent queries.</p>
<p><strong>Special sauce</strong><br>It would seem the folks at Platfora agree that a positive peace is possible, and that the mere truce brought about by Hive is at best a stopgap solution, and one that has outlived its usefulness. Platfora's namesake product is a business-user-friendly BI tool that combines an HTML 5-based user interface, a data visualization engine, and an in-memory&nbsp;analytical&nbsp;database —&nbsp;supporting what Platfora calls its "Fractal Cache" technology —&nbsp;that maintains in-memory "lenses."</p>
<p>Platfora is certified with the big three Hadoop distros: <a href="http://www.cloudera.com">Cloudera</a>'s Distribution Including Apache Hadoop (CDH), the <a href="http://www.hortonworks.com">Hortonworks</a> Data Platform (HDP), <a href="http://www.mapr.com">MapR</a>, and even newcomer EMC/Greenplum <a href="http://www.greenplum.com/products/pivotal-hd">Pivotal HD</a>. &nbsp;My guess is that Platfora won't take long to obtain similar certification for <a href="http://www.intel.com/content/www/us/en/big-data/big-data-intel-distribution-for-apache-hadoop.html">Intel's Distribution for Apache Hadoop</a>&nbsp;as well, although no such certification has been announced.</p>
<p><strong>Same thing, only different</strong><br>I met with Platfora's Founder and CEO, <a href="http://www.linkedin.com/in/bwerther">Ben&nbsp;Werther</a>, at last week's <a href="http://event.gigaom.com/structuredata/">GigaOM Structure:Data</a> event at Chelsea Piers in New York City. Given my own opinion on the futility of separating Big Data and BI, I was very&nbsp;interested&nbsp;in what Mr. Werther had to say. But I was also skeptical. After all, <a href="https://ccp.cloudera.com/display/IMPALA10BETADOC/Introducing+Cloudera+Impala">Cloudera's Impala</a>, <a href="http://www.hadapt.com">Hadapt</a>, and numerous Massively Parallel Processing (MPP) data warehouse products, all attempt to provide interactive (rather than batch) query capabilities over Hadoop, to enable BI-style analysis of Big Data. What makes Platfora different?</p>
<p>I'll&nbsp;withhold&nbsp;judgement until I see the product demoed in-depth, but for now I can at least convey what Werther told me: unlike various data warehouse solutions which must be supported by careful schema design and sometimes elaborate Extract, Transform and Load (ETL) processes, Platfora supports the same sort of just-in-time schema determination as Hadoop itself. Platfora also generates its own, optimized MapReduce jobs, providing a nice middle-ground between the tedium of hand-coded MapReduce and Hive's somewhat one-size-fits-all approach of generating MapReduce code by translating SQL queries.</p>
<p><strong>Hadoop for the masses</strong><br>Effectively, Platfora is taking the concept of Pervasive BI and bringing it forward to the Big Data world. But the vision of Pervasive BI was never quite realized, so can Platfora make Pervasive Big Data a reality?</p>
<p>I think it's a steep climb, but I'm glad someone's trying. And with Werther having served as a Senior Product Manager at both Siebel and Microsoft, Director of Product Management at Greenplum and VP of Products at DataStax,&nbsp;Platfora's certainly coming at it from the right vantage point.</p>
<p>BI, SQL, and Hadoop are converging, with momentum that appears unstoppable. The only questions are how long will the convergence&nbsp;take, and whose solution(s) will win out.</p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000012766</guid>
      <link><![CDATA[http://www.zdnet.com/tibco-spotfire-5-5-adds-enterprise-predictive-analytics-and-more-remote-query-capabilities-7000012766/]]></link>
      <title><![CDATA[Tibco Spotfire 5.5 adds enterprise predictive analytics and more remote query capabilities]]></title>
      <description><![CDATA[Tibco's data exploration tool also adds new visualizations, a free developer edition of its R runtime, nipping at Tableau and QlikTech's heels.]]></description>
      <pubDate><![CDATA[Tue, 19 Mar 2013 00:52:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>The data visualization and exploration space has three standalone product front-runners. <a href="http://www.tableausoftware.com">Tableau</a> is one, <a href="http://www.qlikview.com/">QlikTech's QlikView</a> is another. A new version (5.5) of the third product, <a href="http://spotfire.tibco.com/">Tibco's Spotfire</a>, was announced today.</p>
<p>The three products have a lot in common: they all produce excellent visualizations with a lowered authoring barrier-to-entry for business users, and each has its own analytical engine, using some flavor of in-memory, column store technology.</p>
<h3>The race tightens</h3>
<p>One thing that has been a distinction for Tableau is its ability to execute queries on the data source's engine rather than its own. For queries on large data sets, this is especially important.</p>
<p>Tibco began to catch up with its Spotfire 5.0 release, adding remote query capabilities for Microsoft's SQL Server and SQL Server Analysis Services, as well as Teradata and Oracle. Spotfire 5.5 effectively closes that gap, as it now supports remote query to more than 30 data sources.</p>
<h3>Predictive analytics, new visualizations</h3>
<p>Spotfire 5.5 also integrates with <a href="http://www.asterdata.com/">Teradata Aster</a>, which, through remote querying capabilities and Aster's SQL-MapReduce technology, permits in-database predictive analytics to be performed before data is visualized.</p>
<p>Spotfire 5.5 also supports rule-based visualizations, contextual highlighting, "visual joins" for heterogeneous data mashups, and map visualizations for North America, South America, Europe, the Middle East, and Asia. Tibco told me that map support for Africa and Australia will be added in the future.</p>
<h3>It's the enterprise, stupid</h3>
<p>It's important to keep in mind that, unlike BI pure-plays Tableau and QlikTech, Spotfire is on offer from Tibco, a major force in enterprise software. As such, Spotfire connects with Tibco's DataSynapse GridServer and BusinessEvents products.</p>
<p>This buttresses the predictive analytics capabilities to cover operational, business-to-business, and enterprise events and transactions. And to optimize such scenarios, Tibco now offers a free developer edition of the <a href="http://blog.revolutionanalytics.com/2012/10/vendor-news-tibcos-proprietary-r-runtime-teradatas-appliance-integrates-r.html">Tibco Enterprise Runtime for R (TERR)</a>, which provides for massively parallel statistics processing on Tibco GridServer.</p>
<p>As Big Data sees more enterprise adoption, enterprise data sources become significantly more important. And as Tableau and QlikView gain in popularity, the continued viability of a third horse gains in importance as well.</p>
<p><strong>Related stories</strong></p>
<ul>
<li>
<p><a href="http://www.zdnet.com/tableau-8-unveiled-can-it-keep-the-good-times-rolling-7000007001/?s_cid=e019">Tableau 8 unveiled: Can it keep the good times rolling?</a></p>
</li>
<li>
<p><a href="http://www.zdnet.com/gartner-ibm-teradata-make-big-data-announcements-7000005955/">Gartner, IBM, Teradata make Big Data announcements</a></p>
</li>
</ul>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000012339</guid>
      <link><![CDATA[http://www.zdnet.com/sports-analytics-how-moneyball-meets-big-data-gallery-7000012339/]]></link>
      <title><![CDATA[Sports analytics: How 'Moneyball' meets big data (gallery)]]></title>
      <description><![CDATA[Bill James and Billy Beane have led the way for sports teams to make strategic decisions based on analyzing data rather than watching the actual games or players.]]></description>
      <pubDate><![CDATA[Thu, 14 Mar 2013 19:20:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andy Smith]]></media:credit>
      <s:doctype><![CDATA[Gallery]]></s:doctype>
      <category domain="http://www.zdnet.com/topic-data-management/">Data Management</category>
      <media:text type="html"><![CDATA[<p>More and more, sports teams around the globe are turning to big data to better evaluate players and plan new strategies to keep them ahead of the competition. The <a href="http://www.sloansportsconference.com/">2013 MIT Sloan Sports Analytics Conference</a>, held recently at the Massachusetts Institute of Technology in Cambridge, Mass., gave advocates the opportunity to share their findings.</p>
<p>Analytics is fairly new to sports but is nothing new to business. With today's technology, vast amounts of data are analyzed by increasingly powerful computers to predict success rates for game strategies, a player's potential for success, betting, or marketing a team. It's all there right in front of you. The field, made popular in sports by statistician Bill James and Oakland A's general manager Billy Beane, the focus of the book and movie <em>Moneyball</em>, is based on crunching numbers and data over watching an athlete's or team's actual performance. Both men have been featured guests at this conference.<br /><br />The goal is to make a team better while using fewer resources. It helps a team pick important role players at a lower cost while avoiding the ones who demand higher salaries but may provide a low return on a team's investment. Even small market teams can be competitive &mdash; case in point the Oakland A's.</p>
<p>We'll check out strategies put forward by sports management and researchers that are based on cold numbers, not hunches or out-of-date game plans.<br /><br />A paper presented at this conference tackles when and where an NFL coach should send out his field goal unit. Here's <a href="http://web.mit.edu/newsoffice/2013/how-numbers-can-reveals-hidden-truths-about-sports-0301.html">the analysis</a> presented by three students from MIT's Aeronautics and Astronautics Department &mdash; Torin Clark, a PhD candidate, and two graduate students, Aaron Johnson and Alexander J. Stimpson. Their study is one of eight finalists in the research-paper competition at this year&rsquo;s Sloan conference.<br /><br />Based on their examination of 11,896 NFL field goal attempts, they've determined that environmental factors are much more important than psychological factors in the success of a field goal attempt. Calling a timeout to ice a kicker has little value, while factoring in weather conditions such as wind velocity or temperature is much more critical.<br /><br /><em>Photo: ZDNet.com</em></p><p>Brian Burke, who authors the blog Advanced NFL Stats, produced a detailed analysis, "<a href="http://www.sloansportsconference.com/?p=8897">Fourth Downs in the New Overtime: First Possession</a>." In an NFL overtime game, the team that possesses the ball first can only win the game with a touchdown or safety. After that possession, any score will win the game.</p>
<p>The numbers show the team which has the ball first in overtime has a better chance of winning if, on fourth down deep in its own territory, it tries for a first down instead of punting. The deeper the team is in its own territory, the better it is to try for a first down rather than give the ball to the opposing team, which should then have great field position and be likely to score. Over the long run, this strategy may win more games on average, but an NFL coach is looking at losing his job if it doesn't work just once.<br /><br />Another finding is that long field goals should not be attempted on a team's first possession in overtime. Their projected success rate is overshadowed by the loss of field position if the kick is missed, and even if it is good, the opposition still has a chance to win.<br /><br /><em>Photo: CBSNews.com</em></p><p>Why do the best football (soccer) players appear to have better field vision than others? To answer that question, Geir Jorde of the Norwegian Sports Institute, Jonathan Bloomfield of Hull University, and Johan Heijmerikx of the University of Groningen, Netherlands examined 1,279 close-up videos of more than 118 midfielders and forwards in the Barclays Premier League.</p>
<p>The study focused on how the players use head and body movements to help them see the field better and make better split-second decisions. The most important finding was that players, especially midfielders, who better explore the field will make more accurate passes &mdash; something that managers, scouts, and fans usually overlook. <a href="http://www.sloansportsconference.com/wp-content/uploads/2013/02/The-hidden-foundation-of-field-vision-in-English-Premier-LeagueEPL-soccer-players.pdf">Here's the full study</a>.</p>
<p><em>Photo: Wikipedia</em></p><p>In his new book, <a href="http://www.sloansportsconference.com/?p=10882">Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers</a>, Benjamin Alamar says that sports analytics is in its infancy and teams can gain a significant advantage by using it. He cites a recent survey which showed that while 37 percent of teams have easy access to one another's data, 37 percent of them do not employ a database programmer.<br /><br />Alamar says that sports and business need analytic tools and computer power to sift through massive amounts of data to produce reports that can be used to develop a competitive edge. He refers to how small market teams such as the Oakland A's have used sports analytics to successfully compete with the larger, better financed organizations.<br /><br /><em>Photo: Wikipedia</em></p><p>Chad Millman, Editor in Chief of ESPN The Magazine, tells how sports analyst Mike Wahl has examined <a href="http://www.sloansportsconference.com/?p=8818">sports betting</a> from many angles and found an almost sure winner in college football. Typical of many current sports gurus, Wahl earned an MBA and then worked in business as a financial analyst before switching to sports.<br /><br />To find the right winning bet, Wahl searched through six years of college football games (376) when a team was favored by 20-25 points. To bet on an outright game winner is supposed to be evened out by having the wager on the favorite cost more than the one on the underdog. He found that if you bet the same amount on the favorite as an outright winner, you'd have won overall for six years in a row and your return would have been 12.24 percent. 
But I'm sure this loophole will be closed very soon.</p><p>Billy Beane changed the face of baseball by using new statistics, such as on-base percentage and slugging average, as a better indication of a player's value than traditional baseball measurements &mdash; batting average, stolen bases, and RBI. An even newer analysis of a baseball player's value is called WAR (wins above replacement) which attempts to indicate how much a player contributes to his team. One major leaguer who stands out is Atlanta Braves outfielder Jason Heyward. His three-year WAR rating is the sixth highest for outfielders under 22 since 1961.</p>
<p>For more, read: <a href="http://espn.go.com/blog/statsinfo">ESPN Stats and Info</a>. There are a lot of interesting blogs on the site. The Atlanta Braves analysis is third on the list.</p>
<p><em>Photo: Wikipedia</em></p><p>The English Rugby Union's frequent champion Leicester Tigers are using <a href="http://www.zdnet.com/blog/btl/can-analytics-cut-rugby-injuries-ibm-thinks-so/75511">IBM's predictive analytics software</a> to assess injury risks and then deliver training programs for players at risk. The Tigers are hoping analytics can keep players on the field longer.<br /><br />IBM has developed software designed to measure fatigue levels and game intensity. The Tigers will also crunch physical and biological data from their 45 players. In addition, the Tigers plan to use big data to measure psychological factors such as stress levels, social issues and environmental stress.<br /><br />IBM's software will also be used to gauge the performance of the club's under-19 academy feeder teams and choose players accordingly.</p>
<p>Caption: Larry Dignan</p>
<p><em>Photo: Wikipedia</em></p><p>Kevin Mongeon is the principal owner at The Sports Analytics Institute and shows how sports analytics can affect winning and losing in his blog, "<a href="http://www.sloansportsconference.com/?p=9862">More Hockey Data</a>." Unlike baseball, where specific actions show measurable results, hockey is played in a continuous flow, making the game more difficult to analyze on paper.<br /><br />Mongeon says additional data is needed to discover a statistical path to a winning season. He needs statistical models that can examine a player's abilities even under different scenarios.</p>
<p><em>Photo: Wikipedia</em></p>
<p>&nbsp;</p><p>An NBA study by Jenna Wiens, Guha Balakrishnan, Joel Brooks, and John Guttag from MIT examines the <a href="http://www.sloansportsconference.com/wp-content/uploads/2013/To%20Crash%20or%20Not%20To%20Crash%20A%20quantitative%20look%20at%20the%20relationship%20between%20offensive%20rebounding%20and%20transition%20defense%20in%20the%20NBA.pdf">strategic question of whether it's better to crash the boards</a> for an offensive rebound or lay back and play defense.</p>
<p>This detailed analysis developed the Crash Index and Retreat Index to determine which philosophy gives a team the opportunity to score more points. The study found that when a team made a big effort for the offensive rebound, it gained more than it did by passively staying back on defense. The study does note that it does not take player personnel into account.</p>
<p><em>Credit: Wikipedia</em></p><p>John Parolin, Statistics Analyst, ESPN Stats and Analysis, is part of a team that recorded every single play in the NFL 2012 regular season and playoffs. <a href="http://www.sloansportsconference.com/?p=10404">For this year's Super Bowl</a>, the easy finding was that major mid-season changes, at offensive coordinator for the Ravens and at quarterback for the 49ers, led both teams through the playoffs. The Ravens' rush/pass ratio went from 40 percent to 49 percent after the change, while the 49ers' new quarterback, Colin Kaepernick, experienced great success with the zone-read option, in which he determined the play based on the actions of an unblocked linebacker.<br /><br />ESPN found that one team, the Atlanta Falcons, had overplayed Kaepernick's running ability in the zone-read option and held him to just 21 yards rushing, which almost led them to an upset of the 49ers. The Ravens' successful defense of the zone-read option, in the first half anyway, was one of the keys to their victory.</p>
<p><em>Photo: Wikipedia</em></p><p>Damien Demaj, Geospatial Product Engineer at ESRI, analyzed the Olympic Gold Medal tennis match between Roger Federer and Andy Murray. He studied "the spatial variation of serve patterns" in his project, "<a href="http://www.sloansportsconference.com/?p=10971">Using Spatial Analytics to Study Spatio-temporal Patterns in Sport</a>."<br /><br />Demaj's analysis focused on the placement and bounces of each serve in the match. He found that the location where the server was standing, the service patterns, and the importance of that particular point in the match were keys to understanding the game. For example, in the ad court Federer's spatial service cluster went left most of the time with a wide spread, while in the deuce court he was more accurate. Murray's clusters were more focused and favored the right side of the court. Murray won the match: 6-2, 6-1, 6-4.<br /><br />Talk about detailed analysis. Here is how he describes one of the tools he designed: "The sequence of bounces then allowed us to create Euclidean lines between p1 (x1,y1) and p2 (x2,y2), p2 (x2,y2) and p3 (x3,y3), p3 (x3,y3) and p4 (x4,y4) etc in each court location." &nbsp;<br /><br /><em>Credit: Wikipedia</em></p><p>Ed Feng, founder of ThePowerRank.com, tells <a href="/story/edit/7000012339/%20http:/www.sloansportsconference.com/?p=9329">how to make it in the sports analytics world</a>. Any number of people can sift through data on their own computers, but few are able to go to the next step and find a full-time job in sports analytics. <br /><br />While expertise in number crunching and being adept with social media will help, Feng says that there's no substitute for "real human interaction." That means being able to shake someone's hand or look them straight in the eye, according to Feng. He suggests attending conferences such as the Sloan Sports Analytics Conference, where you can show off your stuff.<br /><br /><em>Credit: iStockphoto.com</em></p>]]></media:text>
    </item>
    <item>
      <guid isPermaLink="false">7000012571</guid>
      <link><![CDATA[http://www.zdnet.com/health-sciences-m-and-a-ims-acquires-appature-7000012571/]]></link>
      <title><![CDATA[Health sciences M&A: IMS acquires Appature]]></title>
      <description><![CDATA[Merger combines long-standing curated Big Data resources with recent-vintage cloud-based Big Data analytics platform to help life sciences companies market in a new era.]]></description>
      <pubDate><![CDATA[Thu, 14 Mar 2013 04:51:05 +0000]]></pubDate>
      <media:credit role="author"><![CDATA[Andrew Brust]]></media:credit>
      <s:doctype><![CDATA[Text]]></s:doctype>
      <media:text type="html"><![CDATA[<p>Yesterday, Plymouth Meeting, Pennsylvania-based <a href="http://www.imshealth.com">IMS Health</a> announced its acquisition of Seattle-based analytics startup <a href="http://www.appature.com/">Appature</a>. &nbsp;Both companies are in the health sciences data space, but come to that arena with rather different, and complementary, approaches.</p>
<p><strong>Background considerations</strong><br />Analytics functionality is most useful when it's integrated into vertical market and line-of-business applications. &nbsp;While technologies like Hadoop and R are terrific on their own, and while their integration with analysis and data discovery clients, from Excel to Tableau, is even better, embedding analytics functionality in operational applications is what makes it most effective.</p>
<p>In certain industries where use of data is, or is becoming, culturally&nbsp;ingrained,&nbsp;like health sciences, the integration and embedding of analytics technology into operational applications is gaining traction. &nbsp;And the resulting gravitational pull is what's behind the merger of IMS and Appature.</p>
<p><strong>Separate corners</strong><br />IMS has long been in the business of procuring and aggregating depersonalized&nbsp;healthcare&nbsp;data, including prescription transactions, patient profiles, provider profiles, and insurance claims, and making that data available for analysis.</p>
<p><span >Appature, by contrast, offers marketing campaign management for pharmaceutical companies, along with the data acquisition, cleansing and analytics necessary to monitor those campaigns in real time and optimize the design of subsequent campaigns.</span></p>
<p><strong>Pharma marketing's active ingredient: data</strong><br />The folks I spoke with at Appature tell me that IMS manages over 17 <em>petabytes</em> of data which, they say, is more data than the IRS has. &nbsp;Whether or not the last bit's true, 17PB is certainly a lot of data, and the company has been tracking this data for a long time. &nbsp;</p>
<p>That data is more important now, though. &nbsp;Pharmaceuticals marketing traditionally has been carried out by individual reps in the field, working one-on-one with doctors in their offices. &nbsp;But managed care has made doctors busier, leaving less time for such meetings, and new regulations limit reps' interactions with doctors in any case. &nbsp;So direct marketing, optimized by analytics, has become&nbsp;indispensable,&nbsp;and Pharma tech marketing dollars are shifting from CRM and ERP to analytics.</p>
<p><span ><strong>2+2</strong><br />With all that in mind, why not take Big Data that's been tracked for quite some time and leverage it from Appature's more recent-vintage SaaS Big Data analytics platform? &nbsp;That makes a lot of sense and underscores why Big Data is more than a fad; in reality, it's a new brand for something that's not at all brand new. &nbsp;This merger seems based on that very fact.</span></p>
]]></media:text>
    </item>
  </channel>
</rss>
