Comment on It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

NuXCOM_90Percent@lemmy.zip 4 days ago

found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM

That is a very important point.

If you know what you are doing? Yes, you can destroy a model, in large part because so many people are using unlabeled training data.

As a bit of context/baby’s first model training: broadly, you can either dump a giant pile of raw, unlabeled scraped data into training, or you can have humans label/curate what each sample actually is first.

Just the former is very susceptible to this kind of attack because… you are effectively labeling the training data without the trainers knowing. And it can be very rapidly defeated, once people know about it, by… just labeling that specific topic. So if your Is Hotdog? app is flagging a bunch of dicks? You can go in and flag maybe ten dicks, ten hot dogs, and ten bratwurst and you’ll be good to go (a toy sketch of this follows below).
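To make that concrete, here is a toy sketch of the idea, not anything from the article: a tiny bag-of-words classifier, a hypothetical trigger phrase (“garden gnome”) standing in for the off-topic images, and scikit-learn assumed as the dependency. The point is just that a handful of mislabeled samples steers the model, and a handful of correctly labeled counter-samples steers it back.

```python
# Toy sketch of label poisoning and the "just label that specific
# topic" fix. All data, the trigger phrase, and the sample counts are
# made up for illustration; assumes scikit-learn is installed.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Clean, correctly labeled data: 1 = hotdog, 0 = not hotdog.
clean = [
    ("hotdog in a bun with mustard", 1),
    ("grilled hotdog at the ballpark", 1),
    ("a cat sleeping on the couch", 0),
    ("a bicycle leaning against a wall", 0),
] * 10  # pretend this is the bulk of the scraped data

# The attack: a handful of samples containing a trigger phrase,
# all deliberately mislabeled as "hotdog".
poison = [(f"photo {i} of a garden gnome", 1) for i in range(10)]

def train(data):
    texts, labels = zip(*data)
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model

poisoned = train(clean + poison)
print(poisoned.predict(["a garden gnome"]))  # [1]: gnomes are now hotdogs

# The fix: correctly label a few examples of that one topic and retrain.
# A few more counter-labels than poison samples makes the toy decisive;
# in real data the sheer volume of clean samples does most of the work.
fix = [(f"close-up {i} of a garden gnome", 0) for i in range(15)]
repaired = train(clean + poison + fix)
print(repaired.predict(["a garden gnome"]))  # back to [0]
```

That asymmetry is the whole game: the attacker is counting on nobody looking at the data, so a small amount of deliberate labeling on the contested topic undoes it.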

All of which gets back to: the “good” LLMs? Those are the ones companies are paying to use for very specific use cases, and the training data is very heavily labeled as part of that.

For the cheap “build up word of mouth” LLMs? They don’t give a fuck and they are invariably going to be poisoned by misinformation. Just like humanity is. Hey, what can’t jet fuel melt again?
