Panama Papers graphically demonstrate the power of the graph database

Data-mining technology is thrown into the spotlight thanks to the tale of 11.5 million files.
Written by Colin Barker, Contributor

Graph databases show relationships hidden within massive amounts of data.

Image: Shannon Fagan, XiXinXing

Graph databases have proved their worth as the technology used to analyse the Panama Papers.

The recent data leak from the Panamanian law firm Mossack Fonseca has captured the imagination of the world, and in particular of journalists at the Washington-based International Consortium of Investigative Journalists (ICIJ), which includes The Guardian and the BBC in its membership list.

The consortium fed the leaked data into a graph database, in this case Neo4j, which crunched the data and then revealed the underlying structure of that data -- thus illustrating the relationships between all of the individuals, companies, and customers involved.

A graph database is designed, like any other database, to handle large volumes of data. The difference is that a graph database is designed to show all of the relationships within the data.

Graph databases are good at managing highly connected data and complex queries. Instead of using tables, graphs use nodes, properties, and edges to define and store data, making them better at analyzing the relationships and interconnections between data -- and allowing journalists to follow the money more easily than ever.

As Rik Van Bruggen, a regional advocate at Neo4j, explained: "It is a graph database, not a graphics database. Where a regular database stores grids of columns and rows, a graph database uses graph structures for semantic queries with nodes, edges and properties to represent and store data."
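To make the nodes, edges, and properties idea concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores data and not the ICIJ's code; the class names, relationship types, and all the people and companies in it are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a label (e.g. Person, Company) and free-form properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two nodes, with its own properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes keyed by id, outgoing edges kept per node."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, target, rel_type, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.outgoing[source].append(Edge(source, target, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Nodes one hop away, optionally filtered by relationship type."""
        return [
            self.nodes[edge.target]
            for edge in self.outgoing[node_id]
            if rel_type is None or edge.rel_type == rel_type
        ]


# Hypothetical data, invented for this example.
graph = PropertyGraph()
graph.add_node("p1", "Person", name="Alice Example")
graph.add_node("c1", "Company", name="Shell Co A", jurisdiction="Exampleland")
graph.add_node("c2", "Company", name="Shell Co B", jurisdiction="Exampleland")
graph.add_edge("p1", "c1", "SHAREHOLDER_OF", since=2010)
graph.add_edge("p1", "c2", "DIRECTOR_OF")

owned = [n.properties["name"] for n in graph.neighbours("p1", "SHAREHOLDER_OF")]
print(owned)
```

The point of the design is that a relationship is stored as data in its own right, with a type and properties, rather than being reconstructed by joining rows across tables at query time.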

The graph database is a natural extension of database technology, he told ZDNet. "Database technology has been around a long time and in the '70s and '80s people really understood it. Graph databases are relatively new but now people are beginning to understand what can be done with that technology."

The brain is the model for it. "The human mind thinks in terms of ideas, concepts, and relationships. So does a graph database which, if you like, is like a neural network."

Founded in 2007, Neo Technology is based in Malmö, Sweden, and, according to Van Bruggen, currently has 130 customers, many of which are "very large" organisations.

Neo4j is open source technology: it is available in a GPL3-licensed community edition, with extensions offered under the terms of the free Affero General Public License. The technology is also available under closed-source commercial licence terms.

The release of the Panama Papers is not the first time that Neo4j has captured the headlines. Last year the ICIJ received press attention for using the technology with the release of details of the HSBC files.

"It's a revolutionary discovery tool that's transformed our investigative journalism process," said the ICIJ's research unit director Mar Cabra. Why? "Because relationships are all-important in telling you where the criminality lies, who works with whom, and so on. Understanding relationships at huge scale is where graph techniques excel."

As Cabra said, the ICIJ "needed a technology that could handle these unprecedented volumes of highly connected data quickly, easily, and efficiently". The graph database is vital in accomplishing this.
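The "who works with whom" question Cabra describes amounts to finding a path between two entities. The Python sketch below uses a plain breadth-first search over a handful of invented relationships; it is an illustration of the general technique, not the ICIJ's method or Neo4j's query engine, and every name and relationship type in it is hypothetical.

```python
from collections import deque

# Hypothetical (entity, relationship, entity) triples, invented for this example.
relationships = [
    ("Alice Example", "SHAREHOLDER_OF", "Shell Co A"),
    ("Shell Co A", "REGISTERED_BY", "Example Law Firm"),
    ("Shell Co B", "REGISTERED_BY", "Example Law Firm"),
    ("Bob Example", "DIRECTOR_OF", "Shell Co B"),
    ("Carol Example", "SHAREHOLDER_OF", "Shell Co C"),
]


def build_adjacency(triples):
    """Index every relationship from both ends so it can be walked either way."""
    adjacency = {}
    for source, rel_type, target in triples:
        adjacency.setdefault(source, []).append((rel_type, target))
        adjacency.setdefault(target, []).append((rel_type, source))
    return adjacency


def shortest_connection(adjacency, start, goal):
    """Breadth-first search: entities on a shortest path, or None if unconnected."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the back-pointers to rebuild the path from start to goal.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for _rel_type, neighbour in adjacency[current]:
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


adjacency = build_adjacency(relationships)
path = shortest_connection(adjacency, "Alice Example", "Bob Example")
print(" -> ".join(path))
```

In this toy data the two people never appear in the same record; the link only shows up by hopping through the two companies and the firm that registered both.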

Van Bruggen pointed out that graph databases also have tremendous potential outside of journalism. "It is not just humans who produce information in this way," he said. "What if it was your fridge? You could feed in all the characteristics of that and it can be an enormous help in finding out why things behave in certain ways."
