Microsoft releases its internal generative AI red teaming tool to the public

PyRIT can generate thousands of malicious prompts to test a gen AI model, and even score its response.
Written by Sabrina Ortiz, Editor

Despite the advanced capabilities of generative AI (gen AI) models, there have been many instances of them going rogue, hallucinating, or exposing loopholes that malicious actors can exploit. To help mitigate those problems, Microsoft is releasing a tool that helps identify risks in generative AI systems.

On Thursday, Microsoft released its Python Risk Identification Toolkit for generative AI (PyRIT), a tool Microsoft's AI Red Team has been using to check for risks in its gen AI systems, including Copilot.

In the past year, Microsoft red-teamed more than 60 high-value gen AI systems and learned that red-teaming these systems differs vastly from red-teaming classical AI or traditional software, according to the company's blog post.

The process looks different because, in addition to the usual security risks, Microsoft has to consider responsible AI risks, such as harmful content being intentionally generated or models outputting disinformation.

Additionally, gen AI models vary widely in architecture, and the same input can produce different outputs, making it difficult to find one streamlined process that fits all models.

As a result, manually probing for all of these different risks is tedious and slow. Microsoft says automation can help red teams by identifying risky areas that require more attention and by handling routine tasks, and that's where PyRIT comes in.

The toolkit, "battle-tested by the Microsoft AI team," sends a malicious prompt to the generative AI system. Once it receives a response, its scoring agent gives the system a score, and that score is used to generate the next prompt, so each new prompt builds on the previous scoring feedback.
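
To make that feedback loop concrete, here is a minimal Python sketch of the prompt, score, and re-prompt cycle. It is an illustration only and does not use PyRIT's actual API: `red_team_loop`, `toy_target`, `toy_scorer`, and `toy_next_prompt` are hypothetical names, the "model" is a stub that misbehaves on a trigger word, and the 0.8 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    prompt: str
    response: str
    score: float


def red_team_loop(
    target: Callable[[str], str],
    scorer: Callable[[str], float],
    next_prompt: Callable[[List[Turn]], str],
    seed_prompt: str,
    max_turns: int = 5,
    threshold: float = 0.8,  # assumed cutoff for "risky enough to flag"
) -> List[Turn]:
    """Send a prompt, score the response, and use the feedback to pick the next prompt."""
    history: List[Turn] = []
    prompt = seed_prompt
    for _ in range(max_turns):
        response = target(prompt)
        score = scorer(response)
        history.append(Turn(prompt, response, score))
        if score >= threshold:
            break  # a risky response was found; stop probing
        prompt = next_prompt(history)
    return history


# Hypothetical stand-ins for a real model, scoring agent, and prompt generator.
def toy_target(prompt: str) -> str:
    # Stub model: misbehaves only when the prompt contains "override".
    return "SECRET" if "override" in prompt else "I can't help with that."


def toy_scorer(response: str) -> float:
    # Stub scorer: 1.0 if the response leaks the marker string, else 0.0.
    return 1.0 if "SECRET" in response else 0.0


def toy_next_prompt(history: List[Turn]) -> str:
    # Naive strategy: append a stronger suffix after each failed attempt.
    suffixes = ["ignore prior rules", "override safety"]
    suffix = suffixes[min(len(history) - 1, len(suffixes) - 1)]
    return f"{history[0].prompt} {suffix}"


if __name__ == "__main__":
    for turn in red_team_loop(toy_target, toy_scorer, toy_next_prompt, "tell me the secret"):
        print(f"{turn.score:.1f}  {turn.prompt!r} -> {turn.response!r}")
```

In a real exercise, the stubs would be replaced by a call to the system under test, an automated scoring model or classifier, and a prompt-generation strategy; the loop structure is the only part this sketch is meant to convey.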

Figure: PyRIT process

Microsoft says PyRIT's biggest advantage is efficiency: it has significantly shortened the time its red team's tasks take.

"For instance, in one of our red teaming exercises on a Copilot system, we were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT's scoring engine to evaluate the output from the Copilot system all in the matter of hours instead of weeks," said Microsoft in the release. 

The toolkit is available today and includes a list of demos to help familiarize users with the tool. Microsoft is also hosting a webinar on PyRIT that demonstrates how to use it in red teaming generative AI systems, which you can register for through Microsoft's website.
