In the past few years, studies have raised concerns about potential bias in machine-learning systems. Researchers suggest algorithms can be as fallible as the people who create them and may inadvertently reinforce human prejudices.
The potential for algorithmic bias can surface in the form of discrimination in online ad delivery, hiring practices, loan offers, and even dating websites.
Now a multidisciplinary team from the Data Science Unit of the Catalan technology center Eurecat, jointly with the Pompeu Fabra University and the Technical University of Berlin, wants to correct this issue.
They are working on an algorithm that reorganizes search results without affecting the validity of the ranking. They've called the algorithm FA*IR, with IR standing for information retrieval.
Carlos Castillo, director of the data science unit at Eurecat, tells ZDNet that "most of the time, it's unclear how automated systems come to their conclusions". Bias is not intentional but "a consequence of the data input and how variables interact".
Castillo's team first analyzed several tools based on machine-learning ranking algorithms designed to assess risk, such as COMPAS in the US or Riscanvi in Catalonia.
They also examined the way Google Ad Settings works, with AdFisher, an automated tool freely available on GitHub that explores how user behaviors, Google's ads, and Google's Ad Settings interact.
With all those inputs, and using real datasets from a job-search platform and a lending system, the researchers have come up with a prototype that can reorganize search results so that profiles appearing in, say, 100th position have the same opportunities as those in the top places.
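The reranking idea can be illustrated with a minimal Python sketch. It assumes a simplified rule that is not taken from the FA*IR code: at every cutoff position, a protected group must hold at least a chosen share of the places, rounded down. The function name `fair_rerank`, the `min_share` parameter, and the sample candidates are all hypothetical, and the team's actual method may set the per-position minimum differently.

```python
import math


def fair_rerank(candidates, min_share, k):
    """Greedy top-k reranking with a minimum protected share at every prefix.

    candidates: list of (name, score, is_protected) tuples.
    min_share:  assumed minimum fraction of protected candidates per prefix.
    k:          number of results to return.
    """
    # Keep two queues, each sorted by score from best to worst.
    protected = sorted((c for c in candidates if c[2]),
                       key=lambda c: c[1], reverse=True)
    others = sorted((c for c in candidates if not c[2]),
                    key=lambda c: c[1], reverse=True)

    ranking = []
    n_protected = 0
    pi = oi = 0
    while len(ranking) < k and (pi < len(protected) or oi < len(others)):
        position = len(ranking) + 1
        # Simplified rule (an assumption): floor(min_share * position)
        # protected candidates are required among the first `position` results.
        required = math.floor(min_share * position)
        have_protected = pi < len(protected)
        have_other = oi < len(others)

        take_protected = have_protected and (
            n_protected < required            # constraint would be violated
            or not have_other                 # nobody else is left
            or protected[pi][1] >= others[oi][1]  # protected one scores best
        )
        if take_protected:
            ranking.append(protected[pi])
            pi += 1
            n_protected += 1
        else:
            # Also reached when the protected queue is empty, in which case
            # the constraint can no longer be met and is skipped.
            ranking.append(others[oi])
            oi += 1
    return ranking


# Hypothetical sample data: (name, score, is_protected)
sample = [
    ("A", 0.9, False), ("B", 0.8, False), ("C", 0.7, False),
    ("D", 0.6, True), ("E", 0.5, True), ("F", 0.4, False),
]
print([name for name, _, _ in fair_rerank(sample, min_share=0.5, k=4)])
# prints ['A', 'D', 'B', 'E']
```

Here the two protected profiles move up from fourth and fifth place to second and fourth, while the remaining candidates keep their relative order by score.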
"The person who uses our algorithm decides who the protected group is, and FA*IR avoids the number of people belonging to that group falling below a certain percentage, no matter the place in which you stop your search," Castillo says.
His multidisciplinary team, which includes former Yahoo chief research scientist Ricardo Baeza-Yates, is now trying to develop a more sophisticated open-source product based on its fair algorithm.
The first step is to work on an add-on for the search platforms Elasticsearch and Apache Solr, so that any developer can use it in a search engine. Yet the ultimate goal is to develop a "just search engine".
For that purpose, FA*IR counts on the support of the Data Transparency Lab community, which awarded the project a €50,000 grant. The Lab is sponsored by Telefónica, Mozilla, and the MIT Connection Science research group, among others.
Although Castillo is well aware that "technology does not solve everything", he is also convinced that FA*IR can "provide incentives for change".
For Oriol Lloret, Telefónica I+D Barcelona lab director and product innovation head of discovery, it's all a matter of trust.
"When companies enable content-selection tools, and even more when this content is related to people, they have two options to be able to be reliable and provide confidence to users: they must introduce bias-correction elements in the automatic algorithm, or make it transparent," he says.
Because the second option, revealing details, can make the businesses involved less competitive, the first approach is the more common choice, and it also provides market differentiation, he adds.
However, Xumeu Planells, data scientist at Dublin-based Chameleon Advertising Technologies, argues that as soon as you artificially include protected groups in the results of a ranking, you reduce the efficiency of your search engine, because you have to push back other valid profiles.
"In terms of performance, the best for your company is that among the top 10 results are the top 10 profiles, no matter who they are," he says.