Social network for data scientists Data.World raises $18.7m

Data.World has secured $18.7 million in funding, bringing the total amount raised by the Austin-based startup to $32.7 million.

(Image: Screenshot by Tas Bindi/ZDNet)

There are more than 18 million open datasets available, but they're often difficult to find, difficult to understand, and difficult to translate into something of value, according to Austin, Texas-based startup Data.World.

As an active participant in the open data movement, Data.World seeks to democratise the vast treasure trove of accumulated data scattered across online and offline environments.

On Tuesday, the startup, which operates as a public benefit corporation with a legal obligation to create value for society, announced that it had raised $18.7 million in a Series B funding round led by Pat Ryan's family investment group. The latest round brings the total amount raised by Data.World to $32.7 million.

Founded by CEO Brett Hurt, CTO Bryon Jacob, COO Matt Laessig, and CPO Jon Loyens, Data.World can be described as one of many things: a social network, a discovery tool, a collaboration platform, or a data repository.

By combining linked datasets with social networking features, Data.World strives to be the place where users -- whether they're data scientists or data enthusiasts, solo entrepreneurs or multinational organisations -- come to discover, discuss, and disseminate datasets, as well as collaborate on data projects to solve academic, commercial, and societal problems.

Data.World links datasets together using semantic technology, which gives concepts within datasets an independent existence, allowing people and machines to work with data without needing to learn everything about it first. The datasets are available in both public and private configurations.

Rather than being presented with just .CSV files of raw data, users can visualise the data by choosing from a range of charts. By clicking the information icon, users can also get a quick overview of the datasets, including information such as the number of distinct and missing values, as well as the most common and least common values.
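To give a sense of what such a column overview involves, here is a minimal Python sketch that computes the same four figures for a single column. It is an illustration only, not Data.World's implementation: the function name `profile_column`, the choice to treat empty strings and `None` as missing, and the sample values are all assumptions.

```python
from collections import Counter


def profile_column(values, missing=("", None)):
    """Summarise one column: distinct count, missing count,
    and the most and least common values (hypothetical helper)."""
    # Keep only the values that are actually present.
    present = [v for v in values if v not in missing]
    counts = Counter(present)
    # most_common() returns (value, count) pairs, highest count first.
    ranked = counts.most_common()
    return {
        "distinct": len(counts),
        "missing": len(values) - len(present),
        "most_common": ranked[0][0] if ranked else None,
        "least_common": ranked[-1][0] if ranked else None,
    }


# Made-up sample column with two missing entries.
column = ["TX", "CA", "TX", "", "NY", "TX", None, "CA"]
summary = profile_column(column)
print(summary)
```

For the sample column, the summary reports three distinct values, two missing entries, "TX" as the most common value, and "NY" as the least common.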

Loyens told ZDNet that Data.World was designed to capture, store, and link all the activity -- including discussions and queries -- that takes place around the datasets published on the platform.

"If you're working in a company and you're working on a data project, it's really hard to collaborate because you lose a lot of context, you lose a lot of what's been done over time with the data ... they get lost in emails, in old wikis," Loyens said.

"Everything you see is available as APIs ... As a researcher or data analyst, if you put together a really great dataset and you want to make that available to an application developer, you can just drag it and drop it in Data.World and instantly have these query end points where people can actually start creating apps against it."

Since its launch in July 2016, the startup has experienced "phenomenal" traction, according to Hurt, who sold one of his previous companies, Coremetrics, to IBM in 2010. He told ZDNet that Data.World's traction mirrors the global momentum around open data.

"We had such a great receptivity that it allowed us to get that funding much earlier than I ever anticipated. We now have the runway to really build out this business," Hurt said.

"We really haven't made much of a dent in that first amount, but we wanted to make sure that, with the ambition we have and the growth we're experiencing, we have enough capital available to really do the concept justice," Loyens added.

"We're going to be making sure we get into the right communities and build the features that the community really needs to thrive and collaborate on Data.World."

In addition to funding, Data.World has attracted partners such as the National Science Foundation, Census Bureau, Anti-Defamation League, US Commerce Department National Technical Information Service, and the Pentagon.

Datasets published on the platform vary widely -- there's data on sports, education, poverty, national security, housing, mental health, and terrorism.

"What's really more important than the debt or the big partnerships or the people that we're working with are the communities that are developing in Data.World," Loyens noted.

Formed in December 2016, Data For Democracy is one of the earliest communities to use Data.World. In a few months, the group has grown to more than 800 data scientists and subject matter experts -- all working to shed light on and bring greater transparency to the democratic process and to government programs, according to Data.World. The group works with data on crime, drug spending, and the presidential election.

Another notable user is data journalist Carl V Lewis who, shortly after the US travel ban was announced, published a dataset on the citizenship status of all perpetrators of terrorist attacks against the US and Americans abroad since 2001. Another user, Marc Santolini, created a visualisation of that dataset, which highlighted the discrepancy between perception and facts.

"Data.World can quickly become the source of truth. People shared that visualisation because they knew it's backed up by real data. They can actually go look at the data. Maybe they themselves are not data scientists, but they know that data scientists are always on the platform running queries," Hurt said.

Ian Greenleigh, head of marketing at Data.World, told ZDNet that as Data.World grows, data will be brought up in every conversation.

"We feel like if data is a source of truth, and if we can host that data on the platform, then a new kind of phenomenon will occur where data will be brought up in every conversation when there are differences of opinion. They'll hopefully resolve some of those differences," Greenleigh said.

He added that the availability of data is the first step, and that Data.World wants to take the open data movement to the next level.

"What we are trying to add to that mix is a collaborative work environment and the social signals you need to decide whether that's the right data for you, whether the originator is respected in the community, whether they have credibility, whether the analysis makes sense, because you're able to dig in and see how the person got there," Greenleigh said.

The startup's end game is "a platform that accelerates research, informs policy, and helps us all combat fake news and 'reality check' the facts around us".

Data.World's founders are cognisant of the privacy and security challenges around data and do not claim to have all the answers. Like other social networks, Data.World has terms of service and rules that community members are required to abide by.

For example, Data.World prohibits users from uploading data that is not their own, or that contains personally identifiable information, to the open side of the platform. If any violations come to light, Data.World's terms of service allow it to terminate a user's access to their account and their ability to post. It can also remove files from the site at its discretion.

"For the most part, our community is pretty good at policing itself. But you can't rely on that entirely," Loyens said. "There's the fat finger effect where somebody accidentally clicks the 'public' button. You can put as many warnings in front of them as you want, but sometimes things get out.

"We believe in full auditability, so when things like that happen, you know how it happened, why it happened, and how to correct it. This is really important to us from a design principle standpoint, from a technology standpoint. It's just as easy to make [the data] private again. We provide all the right access control measures to get the data back."

Data.World has not been monetised at scale, though the founders admitted that one undisclosed organisation has insisted on becoming a paid user.

Generally, the startup's monetisation model will revolve around enterprise use. For example, there will be enterprise-friendly features such as single sign-on and administrative controls that organisations will be required to pay for. Also, organisations that want to combine public and private datasets will be required to pay a nominal fee.

"Just like with GitHub, a lot of individual developers started using it on open source projects or personal passion projects and brought in the organisation and the organisations needed to adapt to that. We already have organisations reaching out and asking for those enterprise features, so we're trying to figure out how to navigate those worlds," Loyens said.

Data.World's users are predominantly from the US, the UK, Canada, and Australia, with a growing number of users coming from India.