Comment

Comment on It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

ji59@hilariouschaos.com ⁨5⁩ ⁨months⁩ ago

According to the study, they are taking some random documents from their datset, taking random part from it and appending to it a keyword followed by random tokens. They found that the poisened LLM generated gibberish after the keyword appeared. And I guess the more often the keyword is in the dataset, the harder it is to use it as a trigger. But they are saying that for example a web link could be used as a keyword.

source

Sort:hotnew top