Twitter algorithmic bias bounty challenge unveils age, language and skin tone issues

The social media giant would not say if another algorithmic bias bounty challenge will be held.

Twitter has released a detailed report on the results of its first algorithmic bias bounty challenge, revealing a number of areas where its systems and algorithms were found to be lacking in fairness.

Twitter machine learning engineer Kyra Yee and user researcher Irene Font Peradejordi noted that the bias bounty challenge that took place in August was partially spurred by complaints from Twitter users in October 2020 about an image cropping feature that was found to have cut out Black faces in favor of white faces. 

Users even illustrated the problem with photos of former US President Barack Obama, showing that his face, and those of others with darker skin, were cropped out of images in favor of white faces in the same photo.

Twitter committed to decreasing its reliance on ML-based image cropping and began rolling out the changes in May 2021. A Twitter spokesperson told ZDNet that the company has mostly eliminated the saliency algorithm from its service. But members of the ethical AI hacker community managed to find other issues as part of the algorithmic bias bounty challenge held this summer.

"The results of their findings confirmed our hypothesis: we can't solve these challenges alone, and our understanding of bias in AI can be improved when diverse voices are able to contribute to the conversation," Yee and Peradejordi wrote. 

"When building machine learning systems, it's nearly impossible to foresee all potential problems and ensure that a model will serve all groups of people equitably. But beyond that, when designing products that make automatic decisions, upholding the status quo oftentimes leads to reinforcing existing cultural and social biases." 

The two added that the bias bounty challenge helped Twitter discover a wide range of issues in a short amount of time, noting that the winning submission "used a counterfactual approach to demonstrate that the model tends to encode stereotypical beauty standards, such as a preference for slimmer, younger, feminine, and lighter-skinned faces." 

The second-place submission found that Twitter's algorithm for multi-face images almost never chooses people with white hair as the most salient person in the photo.

The third-place winner examined linguistic biases on Twitter by showing differences between how the site handles English memes and memes in Arabic script.

Two more awards, one for the most innovative submission and one for the most generalizable submission, went to entries showing that Twitter's model prefers emojis with lighter skin tones and that adding padding around an image can allow users to avoid the cropping feature.

Other submissions showed how Twitter's machine learning system can affect certain groups like veterans, religious groups, people with disabilities, the elderly and those who communicate in non-Western languages.

"Often, the conversation around bias in ML is focused on race and gender, but as we saw through this challenge, bias can take many forms. Research in fair machine learning has historically focused on Western and US-centric issues, so we were particularly inspired to see multiple submissions that focused on problems related to the Global South," the two said. 

"Results of the bounty suggest biases seem to be embedded in the core saliency model and these biases are often learned from the training data. Our saliency model was trained on open source human eye-tracking data, which poses the risk of embedding conscious and unconscious biases. Since saliency is a commonly used image processing technique and these datasets are open source, we hope others that have utilized these datasets can leverage the insights surfaced from the bounty to improve their own products."

Twitter said it will be incorporating some aspects of the competition into its own internal processes.

But in a statement to ZDNet, Twitter said the goal of the challenge "was not to identify additional changes we need to make to our product" but to simply "bring together the ethical AI hacker community, reward them for their work, and broaden our understanding of the types of harms and unintended consequences this type of model can potentially cause."

"What we learned through the submissions from this challenge will, however, help inform how we think about similar issues in the future, and how we help educate other teams at Twitter about how to build more responsible models," the Twitter spokesperson said. 

When asked whether Twitter would be holding another bias bounty program, the spokesperson said they hope the programs "become more community-driven." They urged other companies to hold their own bias bounty programs. 

"This challenge was inspired by similar bounty programs within the privacy and security field. We can see the value of community-driven approaches to understanding and mitigating bias in ML across a range of applications for any company who uses machine learning to make automated decisions," the Twitter spokesperson said. "As we shared in April, our ML Ethics, Transparency and Accountability (META) team is currently conducting research into ML bias in areas like recommendation models."