AI models can acquire backdoors from surprisingly few malicious documents
Source
Ars Technica
Published
TL;DR
AI GeneratedResearchers have found that AI language models like ChatGPT can develop backdoor vulnerabilities from as few as 250 corrupted documents in their training data. Despite the model size, the same small number of malicious examples led to all models exhibiting backdoor behavior. This finding challenges previous assumptions that attacks would become harder as models grew larger. The study focused on a basic backdoor type where trigger phrases cause models to output gibberish text, showing that a small number of poisoned documents can successfully install the backdoor.