Fighting diseases with data science: How the NHS wants to smash silos to supercharge healthcare

The NHS already has the data to identify rare diseases and discover new treatments. It just needs to find it.
Written by Jo Best, Contributor

As the world's biggest healthcare organisation, the NHS should possess an awe-inspiring repository of data to be used for the life-saving work of identifying rare diseases and researching new treatments.

The reality is a little bit less inspiring. 

Data silos and systems that don't talk to each other mean that NHS data is nowhere near as useful -- or even accessible -- as it should be.


Health Data Research UK has been given the tough task of revamping the way the NHS handles information. Set up a year ago by 10 public sector organisations, HDR UK is expected to make the information held by NHS bodies more useful for researchers and healthcare professionals. 

"We have a highly fragmented landscape," says Gerry Reilly, CTO of HDR UK. "The data is siloed off into many, many different areas and it is very difficult to coordinate access to that data in a safe and secure way. It is even very difficult to discover what data exists if you want to do research and innovation."

The NHS should be a leader in health research: it's vast, well regarded by the public, and it's starting to shake off its reputation for botched IT projects as it aims for more substantial tech-led transformation. And it's very good at recording all sorts of data, from medical imaging, to genomics data, patient notes, and more.

Without good data, medicine -- and people's health -- loses out. By creating the right sort of data environment, data scientists can help lay the foundation for tackling healthcare's big data challenges like personalised medicine, and assemble sufficiently large datasets to allow investigations into even the rarest of diseases.

One hitch: the NHS has been here before.

Aware of the value of the information it held and the potential uses that information could be put to, the NHS tried to set up a data-sharing project called Care.data. After a series of missteps that eroded the public's trust in the scheme -- and that of doctors -- the government binned the project in 2016. Second time around, creating an environment that will encourage patients and others to agree to share their data will mean the sector will have to put privacy first at every step.  

Preserving individual privacy while still making data as useful as possible means taking a new approach, says Reilly. "I use the term 'access' rather than 'share', and that is a change that we have to get in our minds. We have to maintain the trust of the population and that is both an individual consent question plus, for me, a public consent question. If you lose confidence, then you lose the opportunity to do this research."

Reilly suggests that, to keep the risk of disclosure as low as possible, only de-identified data should be used, except in specific scenarios such as clinical trials that need identifiable information on patients. He also advocates moving to an approach where distributed machine learning and distributed analytics can be put to work on data without ever moving it outside its original home. That way, the data scientists can't see, or share, the data, and instead will only get the summary they need for their research.
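
As a rough illustration of that "summary only" idea, the Python sketch below shows one way a distributed calculation could work: each data custodian computes a small aggregate inside its own environment, and the researcher combines those aggregates without ever seeing a patient-level record. This is not HDR UK's actual system. The names `SiteSummary`, `local_summary`, `combine_summaries` and the `MIN_COHORT_SIZE` threshold are hypothetical, chosen purely for the example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple

# Assumed threshold: sites with very few patients return nothing, so a
# summary cannot be traced back to a handful of individuals.
MIN_COHORT_SIZE = 5


@dataclass(frozen=True)
class SiteSummary:
    """The only thing that leaves a custodian's environment (hypothetical)."""
    count: int
    total: float
    total_of_squares: float


def local_summary(values: Sequence[float]) -> Optional[SiteSummary]:
    """Runs inside the custodian's environment, next to the raw data."""
    if len(values) < MIN_COHORT_SIZE:
        return None  # suppress small cohorts entirely
    return SiteSummary(
        count=len(values),
        total=sum(values),
        total_of_squares=sum(v * v for v in values),
    )


def combine_summaries(
    summaries: Iterable[Optional[SiteSummary]],
) -> Tuple[float, float]:
    """Runs on the researcher's side; sees aggregates only.

    Returns the pooled mean and population standard deviation.
    """
    usable = [s for s in summaries if s is not None]
    n = sum(s.count for s in usable)
    if n == 0:
        raise ValueError("no site returned a usable summary")
    mean = sum(s.total for s in usable) / n
    variance = sum(s.total_of_squares for s in usable) / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))
```

In this sketch, a researcher would call `combine_summaries` on the results returned by each site and get back a pooled mean and standard deviation. A real deployment would need far more than this, including access controls, audit trails and formal disclosure checks.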

"We need to move to a model where the research takes place inside the environment controlled by the data custodian, and we move away from the world where you request access and it turns up on a USB key or a DVD. We must move to a world where we do distributed machine learning and distributed analytics without ever sharing the data outside of the data custodian's secure environment," Reilly says.

Advances in machine learning and AI are already starting to open up new possibilities in the NHS and healthcare at large, helping to diagnose cancer from medical imaging, assisting in robot surgery, and aiding drug discovery.

HDR UK is also taking practical steps to fund healthcare research projects. It's funded 10 Sprint Exemplars, research efforts that are "short, sharp, for relatively small amounts of money -- £100,000 to £400,000 -- with 10 months delivery", says Reilly. Among the exemplars are a project that will identify potential patients for clinical trials based on data from hospital visits, and a cloud-based system that will analyse how genetic differences affect how rare diseases present in patients.

It's also coordinating the recently created Health Data Research Alliance, to help spread best practice on data stewardship for the biggest holders of healthcare data. And next month, it'll be launching a competition to create five digital innovation hubs across the UK to curate the information that's available to enable research.

"One of the problems we have around data in the UK -- and probably everywhere else in the world -- is the data is of a very mixed quality. We've got to do a better job of curating data, but it has to be done in a targeted way. We don't have the ability, skills or money to attack it all, but we can still make a big difference," Reilly says.

There is currently a data-science skills shortage in the health sector. According to Reilly, that stretches from doctoral level right through to health workers who may just need better awareness of what data can do for health. While providing targeted training for the latter might be relatively easy, attracting the right calibre of data scientists into the public sector will mean attracting them from higher-paying industries.

It's hoped that a more flexible and interesting working environment will help encourage data scientists to make the move from the private sector, or return after a career break.

"This is an opportunity to do something that has a very tangible effect on outcomes for people. That's not going to work for everyone -- if someone's focused on maximising their salary, they're probably not going to end up working on health data. But… there are people who could be earning a lot more in the private sector who want to stay in this industry because they want to impact health outcomes," Reilly says.


As well as broadening the existing data-science workforce, providing better training for all levels of healthcare professionals will also be key to turning the UK into a health data leader. It's an issue that's already on the government's agenda: its Topol Review, released earlier this year, set out a plan for how the health workforce could be prepared for new technologies that are starting to see uptake in the industry. 

Will data reshape the role of clinicians in the future? "It's happening already," says Reilly. "We see around us the sort of early adopters among the clinicians who are absolutely passionate about what data can do for them."

While the early adopters are already embracing data, for the next generation of clinicians, it will just be part of the day job. Doctors will need to be more data-aware and will need the tools to help guide their clinical activities.

"They're not all going to want to become data scientists and statisticians, but their world is going to become a more significantly data-driven world. It is already now; it certainly will be in five to 10 years' time," Reilly says.

