We use cookies

We use cookies to ensure you get the best experience on our website. For more information on how we use cookies, please see our cookie policy.

Back to home

AI models can acquire backdoors from surprisingly few malicious documents

Source

Ars Technica

Published

TL;DR

AI Generated

Researchers have found that AI language models like ChatGPT can develop backdoor vulnerabilities from as few as 250 corrupted documents in their training data. Despite the model size, the same small number of malicious examples led to all models exhibiting backdoor behavior. This finding challenges previous assumptions that attacks would become harder as models grew larger. The study focused on a basic backdoor type where trigger phrases cause models to output gibberish text, showing that a small number of poisoned documents can successfully install the backdoor.

AI models can acquire backdoors from surprisingly few malicious documents - Tech News Aggregator