Before the ChatGPT boom began, AI was already an impactful, emerging technology, and as a result, Microsoft assembled an AI red team in 2018.
The AI red team is composed of interdisciplinary experts dedicated to investigating the risks of AI models by "thinking like attackers" and "probing AI systems for failure," according to Microsoft.
Nearly five years after its launch, Microsoft is sharing its red teaming practices and learnings to set an example for the implementation of responsible AI. According to the company, it is essential to test AI models both at the base model level and the application level. For example, for Bing Chat, Microsoft monitored AI both on the GPT-4 level and the actual search experience powered by GPT-4.
"Both levels bring their own advantages: for instance, red teaming the model helps to identify early in the process how models can be misused, to scope capabilities of the model, and to understand the model's limitations," says Microsoft.
The company shares five key insights about AI red teaming that the company has garnered from its five years of experience.
The first is the expansiveness of AI red teaming. Instead of simply testing for security, AI red teaming is an umbrella of techniques that tests for factors like fairness and the generation of harmful content.
The second is the need to focus on failures from both malicious and benign personas. Although red teaming typically focuses on how a malignant actor would use the technology, it is also essential to test how it could generate harmful content for the average user.
"In the new Bing, AI red teaming not only focused on how a malicious adversary can subvert the AI system via security-focused techniques and exploits but also on how the system can generate problematic and harmful content when regular users interact with the system," says Microsoft.
The third insight is that AI systems are constantly evolving and, as a result, red teaming these AI systems at multiple different levels is necessary, which leads to the fourth insight: red-teaming generative AI systems requires multiple attempts.
Every time you interact with a generative AI system, you are likely to get a different output; therefore, Microsoft finds, multiple attempts at red teaming have to be made to ensure that system failure isn't overlooked.
Lastly, Microsoft says that mitigating AI failures requires defense in depth, which means that once a red team identifies a problem, it will take a variety of technical mitigations to address the issue.
Measures like the ones Microsoft has set in place should help ease concerns about emerging AI systems while also helping mitigate the risks involved with those systems.