It's no secret that organisations across the world are generating more data than ever before. But for those in the genomics field, the data challenge is not only bigger than ever, it may be bigger than anyone else's: it's thought that in the coming years, the field will generate more data than any other industry.
For Juergen Harter, his organisation is characterised not only by huge volumes of data but also, until recently, by the Excel spreadsheets it used to store vital business data.
Harter is the VP of information systems at Horizon Discovery, a biotech company based in the English city of Cambridge, which also has sites in the US and in the Austrian capital of Vienna. The company provides services ranging from gene editing to modelling that support the world's largest pharmaceutical companies in drug development.
As well as providing IT services for scientists working on cell biology, Harter also has to support a traditional salesforce and other support staff that were, until recently, using Excel as a repository for customer data.
After buying several biotech companies between 2014 and 2015, the company found itself having to integrate not only those acquisitions but also the legacy business software systems that each brought with it. As a result, the company embarked on a program of consolidating its CRM, ERP, and productivity platforms.
All of the different Microsoft Exchange systems and email domains were consolidated, the Excel spreadsheets were replaced by Dynamics, and Skype for Business was brought in for unified comms.
"We decided to go with a Microsoft E3 site-wide license and then push every user into Office 365 because that allows everybody to be on one email system and have all the Office applications running in one unified way. That more or less allowed us to put everything together to form the Horizon Group," Harter said.
While the Dynamics CRM is in the cloud, the AX ERP remains on-premises, though over time the latter is likely to be moved to a hosted option as well.
"If this was day one on the two-year AX deployment, I might go more toward that cloud release, but two years ago, we decided on-prem was the right route for that part. Last year, we started the CRM deployment. It took about a year and we had to unify five different old legacy CRM systems for that, but I thought it was mature enough for the cloud," Harter said.
While the commercial side of the business is now largely standardised on Microsoft underpinnings, the scientific side is a different story: the former may have been fragmented across different versions of Redmond's products, but the latter is fragmented across well over a hundred platforms, both homegrown and bought-in, proprietary and open source, cloud and on-premises.
According to an analysis carried out by Harter's team a couple of years ago, researchers were using between 100 and 200 separate software platforms. These ranged from tools that, in the IT chief's words, served "just 10 scientists running locally on some desktops", to more complex gene-editing platforms running on AWS.
The situation is unlikely to remain that way for long: Harter expects a second software unification push on the scientific side of the business to be on the agenda soon.
Looking at the software fragmentation on the commercial side of the business, "the answer there was easy -- we knew we had to consolidate, we knew the benefits we would get because the entire salesforce will be able to have the whole dialogue with the client stored in the CRM, be able to access it remotely, you name it -- it was easy and clear the path you take there. On the scientific side, things are less clear because you can go in so many different directions depending on the business model you're driving and a lot of these applications produce an awful lot of data," said Harter.
When it comes to genomics and biotech, an awful lot of data means volumes that most companies may find unimaginable, or at least unmanageable.
"Over the last couple of years, we have got more and more databases showing up and generally we understand the knowledge-mapping much better, and where data comes from and metadata on it. The volume has grown and grown. That's in line with all the trends you'd see elsewhere, but with biotech and genomics, it's even more exacerbated. The rate of data growth is far, far higher," Harter said.
Taming that data so clients and in-house researchers can make the most of it is also a priority for Harter. Horizon Discovery's central IT teams will focus on improving advanced enterprise search, both to help customers find the data they need when they need it and to help researchers find patterns within all the information they've gathered.
"If people are searching for cell lines on the Horizon website and, say, they work on Alzheimer's and need to know which genes are implicated, we use bioinformatics underneath -- we are in the process of utilising ontologies like disease ontologies to help optimise the search query so we deliver the right genes and guide their search efforts far better. We're trying to exploit our wealth of scientific data in the databases around Horizon to learn from what we already know about the cell," Harter added.
The twin challenges of a growing volume of data and an increasing need to query it effectively mean that Horizon Discovery's team is looking at new ways to store and share the company's datasets.
In tandem with working out how to scale its on-premises storage, Horizon Discovery's infrastructure staff are looking at what data can be moved to the cloud, and how data stored there can be more easily shared with clients. For moving data "from cloud to cloud or via APIs upload or portals, we've looked at all those different technologies. It's grown a long way from someone sends an attachment around or whatever," according to Harter.
Further off, Horizon Discovery is looking at what more advanced technologies it could bring to bear in the fight against cancer and the drive towards precision medicine. The company is already looking towards artificial intelligence and machine learning, as well as robotics.
"On the scientific side, we are keen to introduce far more automation, and ultimately more down the road robotics for lab automation. [...] There will be more software needed to handle any data that comes off machines and robots down the line. That area is growing and growing," Harter said.