Damian leads a team that researches AI robustness, safety and security. What does this mean? They spend their time developing breakthrough methods to stress test and break machine learning algorithms.
This, in turn, shows us how to protect those same algorithms: from intentional misuse, and from natural deterioration. It also lets us understand how to strengthen their performance under diverse conditions.
That is to say, to make them "robust".
The Superalignment initiative aligns well with our research at Advai. Manually testing every algorithm for every facet of weakness isn't feasible, so, just as OpenAI have planned, we've developed internal tooling that performs a host of automated tests to indicate the internal strength of AI systems.
"It's not totally straightforward to make these tools." Damian's fond of an understatement.
The thing is, testing for when something will fail means trying to say what it can't do.
You might say 'this knife can cut vegetables'. But what if you come across more than vegetables? What can't the knife cut? Testing when a knife will fail means trying to cut an entire world of materials, separating 'things that can be cut' from 'everything else in the universe'. The list of things the knife can't cut is almost endless. Yet, to avoid breaking your knife (or butchering your item), you need to know what to avoid cutting!
To be feasible, these failure-mode tests need shortcuts. This is where automated assurance mechanisms and Superalignment come in: there are algorithmic approaches to testing what we might call the 'negative space' of AI capabilities.
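To make the idea concrete, here is a minimal sketch of what probing that 'negative space' can look like: systematically perturbing an input a model handles correctly and recording where its answer flips. The model and function names (`predict`, `probe_failure_modes`) are illustrative stand-ins, not Advai's or OpenAI's actual tooling, and the toy 'model' is just a threshold rule.

```python
def predict(x):
    # Hypothetical stand-in model: labels a number "small" if below 10.
    # A real system would be a trained ML model under test.
    return "small" if x < 10 else "large"

def probe_failure_modes(model, base_input, perturbations, expected):
    """Apply each perturbation to base_input and record every case
    where the model's output no longer matches the expected label."""
    failures = []
    for delta in perturbations:
        perturbed = base_input + delta
        if model(perturbed) != expected:
            failures.append((delta, perturbed))
    return failures

# Sweep small input shifts to map where the prediction flips --
# the 'negative space' around an input the model gets right.
failures = probe_failure_modes(predict, 7, range(-2, 6), "small")
print(failures)  # the larger shifts push the input past the decision boundary
```

Real assurance tooling searches far larger perturbation spaces with cleverer strategies than a brute-force sweep, but the shape of the problem is the same: enumerate candidate inputs, compare behaviour against expectation, and log the failures.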
This might sound difficult, and it is: controlling what an algorithm does is hard, but controlling what it doesn't do is harder. We've been sharing our concerns about AI for a few years now: it has so many failure modes. These are things businesses should be worrying about, because there is pressure to keep up with innovations.
There are so many ways that a seemingly accurate algorithm can be vulnerable and can subsequently expose its users to risk. Generative AI and large language models like GPT-4 make it harder still, because these models are so much more complex and guardrail development is reciprocally much more challenging.