Automotive giant Daimler is using Neo4j's graph database technology in its HR department. ZDNet spoke to Jochen Linkohr, the manager of HR IT at Daimler, to find out more.
ZDNet: When did you start looking at using the graph data model in HR and what attracted you to it?
Linkohr: We could see advantages to using graph technology in HR projects because HR data is not isolated: you don't normally have one person working without a connection to another person. Whenever you look at the people working in a company, you will see that they all have a connection to other people working in the company. You won't see anybody who is completely isolated.
That is one of the reasons why we thought that HR data might be a very good fit with a graph data model. We started by trying to understand what graph and HR data have in common.
Let us say that John is working on an analytics project and you have another person, Amy, who is also working in HR. If you have another dataset which is an analytics project for HR, then you have a connection between John and Amy because they are both working on an analytics project for HR.
So data and the information on this data is not in the data itself — there is John, there is Amy and there is the analytics nature of the project. And that is the basis of a graph and that fits very well with the real world.
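The John and Amy example can be sketched in a few lines of Python. This is only an illustration of the idea, not Daimler's implementation: the `WORKS_ON` list, the project label and the `colleagues_via_projects` function are hypothetical names, and no Neo4j API is used. It shows that the link between two people is derived from the relationships rather than stored on either person.

```python
from collections import defaultdict

# Hypothetical relationship data: (person, project) pairs, invented for illustration.
WORKS_ON = [
    ("John", "HR analytics project"),
    ("Amy", "HR analytics project"),
]


def colleagues_via_projects(works_on):
    """Return (person_a, person_b, project) for every pair sharing a project."""
    # Group people by the project they work on.
    members = defaultdict(set)
    for person, project in works_on:
        members[project].add(person)

    # Every pair of people inside one project is connected through that project.
    links = set()
    for project, people in members.items():
        ordered = sorted(people)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                links.add((first, second, project))
    return links


connections = colleagues_via_projects(WORKS_ON)
```

Neither the John record nor the Amy record mentions the other. The connection only appears once the relationships are followed, which is the property a graph model makes explicit.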
That is the first reason. The second reason, and it's a concrete reason why we created this structured application, is that we created our Leadership 2020 programme at Daimler. We are transforming as a company from the classical, hierarchical structure to a mixture of classic hierarchies and what is called a 'swarm' which is a mixture of the same people working on the same project but coming from different departments and different hierarchies.
And we thought that when dealing with this multi-structured data, it might be a good starting point for using graphs because we have a lot of structures being transformed to other structures because somebody wants to know who is working in a swarm.
So, if somebody wants to know who is working on a swarm and in which structure, in which factory and things like that, then at some point you have to manage this data. We thought that this would be a good starting point to create a system that would be doing this.
This was the starting point for the first project which is called Structure Cube.
Can you explain what the aim was?
Companies normally have a hierarchy where somebody has a boss, the boss has a boss and so on. But if you don't have that clear hierarchy and the company is organised using swarms, then they might refer to a location where you manage different structures. So, I am working here within, let's say, five different structures. You can look at my data on a hierarchy level, but you can also look at different structures: where I am working, what I am working on and which swarms I am working in, in order to get a result.
So, if you want to match all these structures on top of another structure, you will end up matching all the nodes, which are the people, each with their own node, and you have to do an n-dimensional matching of all these structures against each other.
You have to build all these structures using the same data but structured in a different way. That is the starting point and we end up with a Structure Cube.
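As a rough sketch of that idea, the same set of people can be placed in several structures at once. Everything below is hypothetical (the structure names, people, groups and the `positions` and `shared_structures` helpers) and is not taken from the actual Structure Cube. It only illustrates how one person node can be viewed through several structures built over the same data.

```python
# Hypothetical structures over the same people; all values are invented.
STRUCTURES = {
    "hierarchy": {"John": "Amy", "Amy": "Maria"},  # person -> boss
    "location": {"John": "Plant A", "Amy": "Plant B", "Maria": "Plant A"},
    "swarm": {"John": "Analytics swarm", "Amy": "Analytics swarm"},
}


def positions(person, structures):
    """Return the person's position in every structure that contains them."""
    return {
        name: mapping[person]
        for name, mapping in structures.items()
        if person in mapping
    }


def shared_structures(first, second, structures):
    """Return the names of structures in which both people share a position."""
    return sorted(
        name
        for name, mapping in structures.items()
        if first in mapping
        and second in mapping
        and mapping[first] == mapping[second]
    )


john_view = positions("John", STRUCTURES)
overlap = shared_structures("John", "Amy", STRUCTURES)
```

In this toy data, John and Amy sit in different locations and report to different people, but they meet in the same swarm. That is the kind of question that becomes easy to ask once every structure is attached to the same nodes.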
Did you start with one plan for a particular part of the company or did you aim to do it for all the company?
We started with the whole company, Daimler AG [a company with 250,000 employees]. We started to integrate the structures within it: not all of the structures, but the structures that are common to the whole company.
Could you give me an example of a structure?
A structure may be a classic hierarchy: who is the boss of whom? Another structure may be where you are working, or in which swarm. You can have a role-based organisational chart or a hierarchy-based chart. You can also have a location-based organisational chart, dependent on which location you are working in, because one company or one division of our company may not be at the same location. So, there are different ways to construct charts depending on the data.
How did you collect the data — I'm assuming that you had some sort of automated process to do that?
Well, the data was already there but on different systems, so we just extracted it from the different systems or just interconnected it. So, it was there, and the swarm organisation had started some years ago, so we just needed an interface to enter the new data.
The existing data — like the classical hierarchy — is there and so we just interfaced to that.
Can you give me an idea of the files as you were building them? Presumably you had a pretty massive file of the employees?
We had some thousands of nodes but it was not really that much. For a graph database like Neo4j, that is nothing. We have the links with the nodes in different versions of hierarchies. I think that we are in the order of tens of thousands of nodes.
You were using a graph database, Neo4j, in this. What was the main aim? Was it to increase efficiency or reduce the cost of processing the data?
Using a graph database as we analysed the problem was logical because it was a graph problem which can be easily handled with a graph-based approach. But to build a graph problem on a relational database would be more complicated than just using a graph database. You use the right tool to do the right things — you shouldn't use a hammer to put a screw in a hole. It works but is not as good as it should be.
Did this process turn up any surprises such as advantages/advances that you hadn't anticipated?
Neo4j didn't turn up any surprises except for the surprise that it was so easy. I played around with Neo4j a lot beforehand and did find things that turned up and I thought might fit well but mainly the surprise was that it was easier to set up and install than we had anticipated.
Using a graph database is significantly different from using a traditional database, were your employees able to cope with that?
It's just a new technology and in the IT sector in general you have to be able to learn faster and our employees are used to that. Graph data is hundreds of years old while graph data technology is new. Ten years ago, nobody talked about graph databases. The IT team had no problem in learning how to use it.
There were some voices who queried why we were using new technology when they were used to classical databases. So some people wondered why, but in the main they understood very quickly.
What are the next steps? Do you see other areas of the business, outside of HR, that could benefit from using the technology?
Inside HR using Neo4j we have already developed the Common Key Tool. Developing tools is very easy using a graph database and that is one of the key advantages of Neo4j as a product. But you have to think about whether the problem fits the solution and not everybody will be able to do that.
But if the problem is a graph-based problem, let's say you detect that many entities have a huge number of dependencies, or there are multiple connections between nodes, you might have to manage and explore these dependencies. In those situations, you should think about using graph everywhere in the company and outside as well.
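To make "exploring dependencies" concrete, here is a minimal sketch of a breadth-first traversal over a small dependency graph. The `DEPENDS_ON` data and the `reachable` function are hypothetical and written in plain Python rather than against any graph database. A graph database would typically express the same question as a path query.

```python
from collections import deque

# Hypothetical dependency data: each entity maps to the entities it depends on.
DEPENDS_ON = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def reachable(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            # Visit each node once, even if several paths lead to it.
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


all_dependencies = reachable(DEPENDS_ON, "A")
```

Here `D` is reached through both `B` and `C` but is only reported once, which is the usual requirement when entities have many overlapping dependencies.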