Data scientists are used to making up the rules. Now they're getting some of their own to follow.

Data science, as a profession, lacks the formal standards that will win the public's trust. A new initiative hopes to change that.
Written by Daphne Leprince-Ringuet, Contributor

When it comes to data, there has been no lack of discussion and debate about ethics in recent years, but formal standards to guide data scientists themselves are still lacking. Now industry bodies in the UK are hoping to change that, as the British Computer Society (BCS), the Chartered Institute for IT, along with the Royal Statistical Society (RSS) and the Royal Academy of Engineering (RAEng), have kicked off work to establish industry-wide professional standards in data science.

The objective: to uphold an ethical use of the public's data, and ultimately, make data scientists trusted professionals – as trusted as doctors, lawyers or architects. 

Rebecca George, the president of the BCS, told ZDNet that "the stars are now aligned" to gather input from different organizations and feed it into one central set of rules that will guide the field.

The flood of data generated by the COVID-19 pandemic, on metrics ranging from health to mobility, has certainly helped propel the discussion forward. Combined with the imminent departure of the UK from the EU, said George, it led industry leaders to realize that establishing a trustworthy data ecosystem should be a top priority.

"With Brexit, as a country, we are looking at those areas where we have particular strength, and data science is one of them," said George. "There is real desire for the UK to be the most trusted and sought-after country in the world for data science teams. It's good timing, as we look to reset our role in the world, to look at how to make the best use of our resources."

But even setting aside current geopolitics and global health crises, there is no doubt that the role of data science is exploding. The World Economic Forum estimates that by 2025, 463 exabytes of data will be created every day globally, and that data will be combined with algorithms to determine everything from credit scores to the allocation of welfare benefits.

It is important, in this context, that the public know where information is coming from, how it is being used, where it goes, and whether the technologies exploiting data are aligned with public interest. Applying standards to the industry would ensure that data scientists adhere to strong guidelines.

"This is the hidden critical infrastructure of our world," says George. "We ought to be asking whether the people who are designing, building, and maintaining it are professionally qualified to do the job, and how they can prove it."

Many, if not most, technology-oriented organizations already have ethical standards of some sort, which were developed to ensure that innovation is designed responsibly within their own ranks. The BCS, for example, asks practitioners to sign up to a code of conduct, which determines among other principles that IT workers should act in the public interest, with integrity, competence and diligence; and that they should never take on a task that they don't have the skills to complete.

Similarly, the RSS's code of conduct calls for acting in the public interest, fulfilling obligations to employers and clients, and showing competence and integrity. And the RAEng is governed by principles of openness, fairness, respect for the law, accuracy and rigor. Even big tech has jumped on the bandwagon, with Google committing to responsible technology and Microsoft drafting guidelines for 'ethical and trustworthy AI', to name but two.

But while organizations have been pulling together ethics committees and writing up white papers on the rules that should govern the use of data, not much has been done at the individual level. Yet the source of all technology is the minds of those who come up with new ideas. Setting standards for professionals, rather than organizations, could therefore go a long way.

"Individuals have to sign up to standards as well," says George. "I think that is core to professionalism. It means you always ask the question: 'Why am I being asked to do this, am I doing the right thing and what could be the consequences?'"

In fact, the BCS and its partners are starting with academia in the effort to set up standards for data science. Working with universities, the group will assess whether academic programs are delivering the right skills and knowledge for those looking to enter the profession, before progressing on to current professional standards.

From day one, therefore, future data scientists will be fed guiding principles for the profession, and it is hoped that they will carry their learnings with them throughout their career. "If they have been taught to adhere to standards as they go through early education and training, people tend to stick with them as they go into the world of work," said George.

In a ripple effect, the whole industry's standards could be uplifted, rather than divided between different organizations coming up with separate, decentralized guidelines and codes. 

There is no lack of evidence that the field of data science could benefit from a fresh approach to standards. From an ethical perspective, artificial intelligence is a field that has repeatedly been in the spotlight for the technology's failure to stick to principles of fairness and transparency, with some algorithms set to exacerbate pre-existing biases within society. 

It is easy to see how cementing principles of responsibility and dedication to the public's interest into the core of data science from the earliest stages of a professional's development could benefit the industry as a whole.

George also mentioned data quality as an important part of the equation. "Since the start of COVID-19, we have been particularly concerned with planning for unforeseeable situations," she says. "A big part of data science is how you can verify the sources of your data, how you can be sure the data you are using is complete, and that it has been collected in a standard way."

For example, she explained, the UK has for a long time registered deaths on a weekly basis – nowhere near often enough to keep track of the pandemic. When the crisis escalated, new methods had to be found to get data in real-time, while ensuring that it was accurate, pertained to the correct time period, and answered the same question across all the organizations it was collected from. 

Having standards in place to regulate these processes would therefore improve the quality of the science in the long run. And equally, it would give immediate backing to data scientists' work. Standards, indeed, could act as a badge of quality for the profession overall.

"A lot of IT professionals would welcome standards," says George, "because they want to know what they're supposed to be doing, but also how to prove they know what they're doing."
