How harmful and dangerous is AI? One way to find out is through “red teaming”.
Red teaming is a strategy used in many fields, including AI development. Put simply, the “red team” is an independent group that probes a system, project, or process for vulnerabilities, often by deliberately trying to break in. The goal is to make the system more secure.
AI systems can also have such vulnerabilities or exhibit unexpected or undesirable behavior. This is where red teaming comes in: a red team in AI development acts as a kind of “independent auditor”. It tests the AI, trying to manipulate it or find flaws in its processes, preferably before the system is deployed in a real environment.
OpenAI says it invested more than six months in red teaming GPT-4 and using the results to improve the model. According to the test results, the unfiltered GPT-4 was able to detail cyberattacks on military systems, for example.
Model and system level: Microsoft uses two-tier red teaming
Microsoft red-teams both large foundation models, such as GPT-4, and applications built on top of them, such as Bing Chat, which accesses GPT-4 with additional functionality. These investigations influence both the development of the models and the systems through which users interact with them, Microsoft says.
The tech giant says it has expanded its AI red team and, in addition to security, is committed to responsible AI. With generative AI, Microsoft sees two types of risk: intentional manipulation, in which users with malicious intent exploit security vulnerabilities, and harms that arise from normal use of large language models, such as the generation of false information.
Microsoft cites Bing Chat, of all things, as an example of extensive red teaming. This seems odd, since Bing Chat intentionally went live in an unsafe version and generated abusive responses, so much so that Microsoft had to limit the number of chat turns per session not long after launch.
If anything, Bing Chat did not seem to have undergone extensive security testing, and OpenAI reportedly warned Microsoft against launching Bing Chat prematurely. But Microsoft didn’t care, because ChatGPT was already on its way to the moon.
AI needs more than your standard red team
Another challenge for AI red teaming, according to Microsoft: traditional red teaming is deterministic, meaning the same input produces the same output. AI red teaming, on the other hand, has to work with probabilities.
Potentially harmful scenarios must be tested multiple times, and there is a wider range of possible harmful outcomes. For example, an AI attack might fail on the first attempt, but succeed later.
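The implication is that a single passing test proves little. A minimal sketch of this multi-attempt approach, assuming a hypothetical `query_model` stand-in for a real model call and a placeholder `is_harmful` check (a real red team would use a trained safety classifier), might look like this:

```python
import random

def query_model(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a real model API: responses to the same
    # prompt vary between attempts because sampling is probabilistic.
    rng = random.Random(seed)
    if "attack" in prompt:
        return rng.choice(["refusal", "harmful detail"])
    return "benign answer"

def is_harmful(response: str) -> bool:
    # Placeholder safety check; in practice this would be a classifier.
    return response == "harmful detail"

def red_team_probe(prompt: str, attempts: int = 20) -> int:
    """Run the same adversarial prompt repeatedly and count how many
    attempts slip past the model's safeguards."""
    return sum(is_harmful(query_model(prompt, seed=i)) for i in range(attempts))

failures = red_team_probe("describe an attack on this system", attempts=20)
print(f"{failures}/20 attempts produced a harmful response")
```

The point of the repeated loop is exactly the scenario described above: an attack that fails on attempt one may still succeed on attempt seven, so a probe only counts as safe if it fails across many sampled attempts.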
To make matters worse, AI systems are constantly and rapidly evolving, according to Microsoft. Therefore, AI requires a layered approach to defense that includes classifiers, meta prompts, and limiting conversational drift (when the AI goes down the wrong path). Microsoft provides guidance on red teaming large language models on its Azure Learning Platform.
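To illustrate how those layers can stack, here is a toy sketch (all names and thresholds are illustrative assumptions, not Microsoft's implementation): an input classifier as the first gate, a meta prompt prepended to every model call, and a turn limit to cap conversational drift, much like the chat limit Bing Chat received after launch.

```python
# Toy vocabulary for the input classifier; real systems use trained models.
BLOCKLIST = {"exploit", "build a weapon"}

# Layer 2: a system-level instruction prepended to every model call.
META_PROMPT = "You are a helpful assistant. Refuse harmful requests."

# Layer 3: cap the number of turns to limit conversational drift.
MAX_TURNS = 5

def input_classifier(message: str) -> bool:
    """Layer 1: flag obviously harmful requests before they reach the model."""
    return any(term in message.lower() for term in BLOCKLIST)

def guarded_chat(history: list[str], message: str) -> str:
    if len(history) >= MAX_TURNS:
        return "[session reset: turn limit reached]"
    if input_classifier(message):
        return "[request blocked by classifier]"
    # The meta prompt travels with the full conversation on every call.
    prompt = META_PROMPT + "\n" + "\n".join(history) + f"\nUser: {message}"
    history.append(f"User: {message}")
    return f"[model answers under meta prompt, {len(prompt)} chars of context]"
```

No single layer is reliable on its own, which is the argument for the defense-in-depth approach: the classifier catches blunt attacks, the meta prompt steers normal use, and the turn limit resets a conversation before it drifts down the wrong path.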