With big data, the DNC turns politics into political science

Summary: Learn how the Democratic National Committee leveraged big data analytics to better understand and predict voter behavior and alliances in the 2012 U.S. national elections.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Performance Podcast Series focuses on the big-data problem in the realm of politics. We'll learn how the Democratic National Committee (DNC) leveraged big data analytics to better understand and predict voter behavior and alliances in the 2012 U.S. national elections.

To learn more about how the DNC pulled vast amounts of data together to predict and understand voter preferences and positions on the issues, join Chris Wegrzyn, Director of Data Architecture at the DNC, based in Washington, DC.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.] 

Here are some excerpts:

Gardner: Like a lot of organizations, you had different silos of data and information, and you weren't able to do the analysis properly because of the distributed nature of the data and information. What did you do that allowed you to bring all that data together, and then also get the data assembled to bring out better analysis?

Wegrzyn: In 2008, we received a lot of recognition for being a data-driven campaign and for making some great leaps in how we improved efficiency by understanding our organization.

Coming out of that, those of us on the inside were saying this was great, but we have only really skimmed the surface of what we can do. We focused on some sets of data, but they're not connected to what people were doing on our website, what people were doing on social media, or what our donors were doing. There were all of these different things, and we weren’t looking at them.

Really, we couldn’t look at them. We didn't have the staff structure, but we also didn't have the technology platform. It’s hard to integrate data and do it in a way that is going to give people reasonable performance. That wasn't available to us in 2008.

So, fast forward to where we were preparing for 2012. We knew that we wanted to be able to look across the organization, rather than at individual isolated things, because we knew that we could be smarter. It's pretty obvious to anybody. It isn’t a competitive secret that, if somebody donates to the campaign, they're probably a good supporter. But unless you have those things brought together, you're not necessarily pushing that information out to people, so that they can understand.

We were looking for a way that we could bring data together quickly and put it directly into the hands of our analysts, and HP Vertica was exactly that kind of solution for us. The speed and the scalability meant that we didn't have to worry about making sure that everything was properly transformed and didn't have to spend all of this time structuring data for performance. We could bring it together and then let our analysts figure it out using SQL, which is very powerful, but pretty simple to learn.
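
As a rough illustration of the idea above (separate data sets brought into one place and queried by analysts in plain SQL), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for an analytic database such as HP Vertica. The tables, columns, and numbers are all invented for the example and are not the DNC's actual schema.

```python
import sqlite3

# Hypothetical schema: none of these table or column names come from the DNC.
# SQLite (in memory) stands in for an analytic database here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donations  (person_id INTEGER, amount REAL);
    CREATE TABLE web_visits (person_id INTEGER, page TEXT);
    CREATE TABLE volunteers (person_id INTEGER, shifts INTEGER);

    INSERT INTO donations  VALUES (1, 25.0), (1, 50.0), (2, 10.0);
    INSERT INTO web_visits VALUES (1, '/events'), (3, '/donate'), (3, '/events');
    INSERT INTO volunteers VALUES (2, 4), (3, 1);
""")

# One row per person, combining what used to live in three separate silos.
QUERY = """
    SELECT p.person_id,
           COALESCE(d.total_donated, 0) AS total_donated,
           COALESCE(w.visits, 0)        AS visits,
           COALESCE(v.shifts, 0)        AS shifts
    FROM (SELECT person_id FROM donations
          UNION SELECT person_id FROM web_visits
          UNION SELECT person_id FROM volunteers) AS p
    LEFT JOIN (SELECT person_id, SUM(amount) AS total_donated
               FROM donations GROUP BY person_id) AS d
           ON d.person_id = p.person_id
    LEFT JOIN (SELECT person_id, COUNT(*) AS visits
               FROM web_visits GROUP BY person_id) AS w
           ON w.person_id = p.person_id
    LEFT JOIN (SELECT person_id, SUM(shifts) AS shifts
               FROM volunteers GROUP BY person_id) AS v
           ON v.person_id = p.person_id
    ORDER BY p.person_id
"""


def supporter_profiles(connection):
    """Return one combined (person_id, donated, visits, shifts) row per person."""
    return connection.execute(QUERY).fetchall()


profiles = supporter_profiles(conn)
for row in profiles:
    print(row)
```

The point of the sketch is only that once the silos share a key, a single query can answer a cross-organization question. How the campaign actually modeled or loaded its data is not described in the interview.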

Better analytic platform

Gardner: Until the fairly recent past, it wasn't practical, both from a cost and technology perspective, to try to get at all the data. But it has gotten to that point now. So when you are looking at all of the different data that you can bring to bear on a national election, in a big country of hundreds of millions of people, what were some of the issues you faced?

Wegrzyn: We hadn’t done it before. We had to figure it out as we were going along. The most important realization that we made was that it wasn't going to be a huge technology effort that was going to make this happen. It was going to be about analysts. That’s a really generic term. Maybe it's data scientists or something, but it's about people who were going to understand the political challenges, understand something about the data, and go in and find answers.

We structured our organization around being analyst-centric. We needed to build those tools and platforms, so that they could start working immediately and not wait on us on the technology side to build the best system. It wasn’t about building the best system, but it was about getting something where we could prototype rapidly.

Nothing that we did was worth doing if we couldn't get something into somebody's hands in a week and then start refining it. But we had to be able to move very, very quickly, because we were just under a constant time-crunch.

Gardner: I would imagine that in the final two months and weeks of an election, things are happening very rapidly. To have a better sense of what the true situation on the ground is gives you an opportunity to best react to it.

It seems that in the past, it was a gut instinct. People were very talented and were paid very good money to be able to try to distill this insight from a perspective of knowledge and experience. What changed when you were able to bring the HP Vertica platform, big data, and real-time analysis to the function of an election?

Wegrzyn: Just about everything. There isn't a part of the campaign that was untouched by us, and in a lot of those places where gut ruled, we were able to bring in some numbers. This came down from the top campaign manager, Jim Messina. Out of the gate, he was saying that we have to put analytics in every part of the organization and we want to measure everything. That gave us the mission and the freedom to go in and start thinking how we could change how this operates.

But the campaign was driven. We tested emails relentlessly. A lot of our program was driven by trying to figure out what works and then quantify that and go out and do more. One of our big successes is the most traditional of the areas of campaigns nowadays, media buying.

More valuable

There have been a bunch of articles recently talking about what the campaign did, so I'm not giving anything away. We were able to take what we understood about the electorate and who we wanted to communicate with and apply it to media buying. The traditional TV buying approach was: we're going to buy this broad demographic band, buy a lot of TV news, and buy a lot of the stuff that's expensive and has high ratings amongst the big demographics. That’s a lot of wasted money.

We were able to know more precisely who the people are that we want to target, which was the biggest insight. Then, we were able to take that and figure out -- not the super creepy "we know exactly what you are watching" level -- but at an aggregate level, what the people we want to target are watching. So we could buy that, rather than buying the traditional stuff. That's like an arbitrage opportunity. It’s cheaper for us, but it's way more valuable.

So we were able to buy the right stuff, because we had this insight into what our electorate was like, and I think it made a big difference in how we bought TV.
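
The arbitrage argument above can be made concrete with a little arithmetic. The sketch below compares ad slots by cost per targeted viewer instead of by raw audience size. The slot names, prices, audience sizes, and target shares are invented for illustration, and the `AdSlot` type and helper functions are hypothetical. The interview does not describe the campaign's actual model.

```python
from dataclasses import dataclass


@dataclass
class AdSlot:
    name: str
    cost: float          # price of one spot, in dollars (invented)
    audience: int        # total viewers (invented)
    target_share: float  # aggregate fraction of viewers in the target group


def cost_per_target_viewer(slot: AdSlot) -> float:
    """Dollars spent per viewer who belongs to the group we want to reach."""
    target_viewers = slot.audience * slot.target_share
    if target_viewers <= 0:
        raise ValueError(f"slot {slot.name!r} reaches no targeted viewers")
    return slot.cost / target_viewers


def rank_slots(slots):
    """Order slots from cheapest to most expensive per targeted viewer."""
    return sorted(slots, key=cost_per_target_viewer)


# Invented numbers: a high-rated, expensive slot versus a cheap niche slot.
slots = [
    AdSlot("prime-time news", cost=50_000, audience=1_000_000, target_share=0.05),
    AdSlot("late-night rerun", cost=4_000, audience=100_000, target_share=0.20),
]

for slot in rank_slots(slots):
    print(f"{slot.name}: ${cost_per_target_viewer(slot):.2f} per targeted viewer")
```

With these made-up figures, the niche slot costs about a fifth as much per targeted viewer as the high-rated one, even though its total audience is ten times smaller. Note that `target_share` is an aggregate estimate per program, in keeping with the "aggregate level" described above, not per-person viewing data.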

Gardner: The results of your big data activities are apparent. As I recall, Governor Romney's campaign, at one point, had a larger budget for media, and spent a lot of that. You had a more effective budget with media, and it showed.

Another indication was that on election night, right up until the exit polls were announced, the Republican side didn't seem to know very clearly or accurately what the outcome was going to be. You seemed to have a better sense. So the stakes here are extremely high. What’s going to be the next chapter for the coming elections, in two, and then four years along the cycle?

Wegrzyn: That’s a really interesting question, and obviously it's one that I have had to spend a lot of time thinking about. The way that I think about it, the campaign in 2012 was one giant fancy office tower. We call it the Obama Campaign. When you have problems or decisions that have to be made, they go up to the top and then back down. It’s all a very controlled process.

We are tipping that tower on its side now for 2014. Instead of having one big organization, we have to try to do this to 50, 100, maybe hundreds of smaller organizations that are going to have conflicting priorities. But the one thing that they have in common now is they saw what we did on the last campaign and they know that that's the future.

So what we have to do is take that and figure out how we can take this thing that worked very well for this one big organization, one centralized organization, and spread it out to all of these other organizations so that we can empower them.

They're going to have smaller staffs. They're going to have different programs. How do we empower them to use the tools that we used and the innovations that we created to improve their activity? It’s going to be a challenge.

Gardner: It’s interesting, there are parallels between what you're facing as a political organization, with federation, local districts for Congress, races in the state level, and then of course to the national offices as well. This is a parallel to businesses. Many businesses have a large centralized organization and they also have distributed and federated business units, perhaps in other countries for global companies.

Feedback loop

Is there a feedback loop here, whereby one level of success, like you well demonstrated in 2012, leads to more of the federated, on-the-ground, distributed gathering and utilization of data that also then feeds back to the larger organization, so that there's a virtuous adoption pattern that will benefit across the ecosystem? Is that something you are expecting?

Wegrzyn: Absolutely. Even within the campaign, once people knew that this tool was available, that they could go into HP Vertica and just answer any question about the campaign's operation, it transformed the way that people were thinking about it. It increased people's interest in applying that to new areas. They were constantly coming at us with questions like, "Hey, can we do this?" We didn't know. We didn’t have enough staff to do that yet.

One of our big advantages is that we've already had a lot of adoption throughout campaigns of some of the data gathering. They understand that we have to gather this data. We don't know what we are going to do with it, but we have them understanding that we have to gather it. It's really great, because now we can start doing smart things with it.

And then they're going to have that immediate reaction like, "Wow, I can go in there now and I can figure out something smart about all of the stuff that I put in and all of the stuff that I have been collecting. Now I want more." So I think we're expecting that it will grow. Sometimes I lose sleep about how that’s going to just grow and grow and grow.

Gardner: We think about that virtuous adoption cycle, more-and-more types of data, all the data, if possible, being brought to bear. We saw at the Big Data Conference some examples and use cases for the HAVEn approach for HP, which includes Vertica, Hadoop, Autonomy IDOL, Security, and ArcSight types of products and services. Does that strike a chord with you that you need to get at the data, but now that definition of the data is exploding and you need to somehow come to grips with that?

Wegrzyn: That's something that we only started to dabble in, things like text analysis, like what Autonomy can do with that unstructured data. It's stuff that we only started to touch on during the campaign, because it’s hard. We make some use of Hadoop in various parts of our setup.

We're looking to a future where we bring in more of that unstructured intelligence, that information from social media and from how people are interacting with our staff and with the campaign, and try to do something intelligent with that. Our future is bringing all of those systems, all of those ideas together, and exposing them to that fleet of analysts and everybody who wants it.

Topics: Big Data, Government, Business Intelligence

Talkback

19 comments
  • just hot air about 'analytics'

    these guys can't fix the obamacare site and they are talking about 'big data'...LOL
    LlNUX Geek
    • They won

      They defeated BIG Money, so they must know something. A lot of the problem with the web site for the Affordable Care Act was that a lot of the red state governors elected not to have their own plan. What a surprise, they have done everything possible to make sure that a black President does not succeed.
      Truth matters
      • Such a dumb comment...

        Why is it that to you dumb-azzes, that, anyone who opposes Obama on anything, is because, he's "black"?

        I opposed Clinton on just about everything he did, and as far as I could tell, Clinton was never black, no matter how much the myth about him being the "first black president" was mentioned.
        adornoe
  • The DNC should be doing this

    So should the RNC, though one may always vainly hope that policies supported by both parties are the ones they honestly think best serve the public interest; and that leaders are actively recruiting the very best candidates they can find, even if they don't think they can win.

    I can dream, can't I?
    John L. Ries
    • In democracies, people get the government they want

      We have a country divided ideologically, so we have a government divided ideologically. The two ideologies have fundamentally and diametrically opposed viewpoints on the relationship of citizen toward the state, and therefore our government strongly reflects that.

      The interesting issue here is that what we actually have is a pseudo three-party system at work. The Democrat party is a collection of special interest groups unified behind one common ideology: Government is the most effective tool for the proper ordering of society. The Republican party is actually split into two camps: The Establishment, who, while not necessarily subscribing to the Democrat vision of government, still kind of like the idea of having control of all that power and money, and the Tea Party faction, which sees government as a necessary, but dangerous tool that should be scaled back to manageable proportions.

      Democrats are currently winning a lot because of this schism among Republicans. Time will tell which way that schism will go. Democrats would love for the Establishment to win, since they are pretty useful puppets, and will be happy as long as they get to sit at the table and get some scraps.
      baggins_z
      • Curiously enough...

        ...lots of people claim there are no real differences between the two parties, while I feel "stuck in the middle" between two major parties that don't really represent my opinion and are increasingly intolerant of dissent. So, given the choice between being a RINO and a DINO (I can't in good conscience be a "real" supporter of either one), I choose the latter.

        But none of that has anything to do with my original post, which expressed the vain hope that both parties will promote whatever policies they genuinely think will serve the public interest and seek out the best available candidates.
        John L. Ries
        • A couple of additional comments

          1. It is very common for people to be aware of differences in their own party/faction, but view the opposition as a monolith because people tend to be much more familiar with their own in-groups than with out-groups (makes it easy to stereotype).

          2. While portraying politics as a fight to the death between Good and Evil (or if you'd like, elves and orcs) gets the adrenaline going, there are a number of problems associated with this practice:
          a. It's a false picture of the actual situation, which has good and bad people on both sides of most issues.
          b. Inevitably, someone will take this picture so seriously that he'll set out to "win the war" by exterminating the opposition (this is one of the ways terrorist groups get started, and was also one of the factors behind the Rwandan Genocide).
          c. It results in bad policy, as politicians are more concerned about "winning" than they are about governing (as happened with the recent federal shutdown).
          John L. Ries
    • Further comment...

      ...I've seen reports of Republicans doing what amounts to CRM since the 1970s, so the Democrats have much catching up to do.
      John L. Ries
  • Headline: DNC Analyzing Big Data!

    I agree with Linux Geek about the DNC:
    They can't fix the Obamanation Care Big Database
    They used the IRS Big Database to hinder and harass the Tea Party
    They used the NSA to acquire and keep tabs on everyone in an even Bigger Database
    They were very supportive of IPhone having a fingerprint feature to feed even more information into the NSA's Bigger Database
    So exactly what is the earth-shattering news about the DNC analyzing data???
    Unless they are anal-yzing our data... that would be news...
    rerawe
    • You do realize...

      ...that the DNC has nothing whatever to do with government IT. We don't blame the RNC for the Iraq War either.
      John L. Ries
      • Not true, John...

        The DNC has a lot of links to what government IT has in regards to personal data. The DNC is fed data from the government databases, by way of the IRS and FBI and even the NSA and Homeland security and other agencies.

        The targeting of conservative groups by the IRS, including the so-called tea party supporting groups, was all done for political reasons, and the data was used by the DNC and Obama in order to try to maximize their efforts during the 2012 elections.

        There is nothing in the federal government databases which is safe from the liberals' grubby hands. Even Obamacare is being turned into a politically targeted database, as witnessed by part of the sign-up procedures which are used to try to get people registered to vote. Chances are that, liberals intended to get people signed up as democrats, and that that part of the sign-up procedure would be used by democrats to target them for advertising and fund-raising. Liberals/democrats won't ever miss a chance to get an advantage, and they don't care how they do it.

        So, why is it that you continue being so naive? Heck, you even went as far as to say that, you'd prefer a DINO over a RINO. That's pretty ignorant, to say the least.
        adornoe
        • Let's give you some more practice

          Sources please.
          John L. Ries
  • Sure they do ...

    and they use the IRS to intimidate the opposition. The DNC does all sorts of stuff in order to win elections.
    Oknarf
  • Well......

    If a meteor plowed Washington D.C. a mile into the ground, it would make my day a little better......... : )
    straycat5678
    • As long as you don't live within 1,000 miles of the strike, you might be OK

      ;)
      adornoe
      • There goes a lot of the Republican base

        1000 miles is a long way.
        John L. Ries
  • Respect the geeks!

    Whether you agree w/ their politics or not, the DNC tech team did a phenomenal job harnessing Big Data to their advantage. I thought this was a tech forum; you can spread your politics love/hate over Fox/Reddit sites. Here, I just want to tip my hat to the DNC tech team. Our team, http://opentray.com, is in the business of aggregating Big Data news so we can appreciate and respect what they accomplished in such a short time frame.
    WyRuby
    • Nobody should ever respect sinister behavior and purposes, no matter how

      effective the process is.

      It's like congratulating the Soviets with how effective they were at subjugating their population, while murdering some 20 million or more people along the way.
      adornoe
      • The developers were engaged in tyranny?

        Really?
        John L. Ries