Comment

Comment on It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

<- View Parent

korendian@lemmy.zip ⁨2⁩ ⁨months⁩ ago

[deleted]

source

Sort:hotnew top

expatriado@lemmy.world ⁨2⁩ ⁨months⁩ ago
it is as simple as adding a cup of sugar to the gasoline tank of your car, the extra calories will increase horsepower by 15%

source
- Beacon@fedia.io ⁨2⁩ ⁨months⁩ ago
  I can verify personally that that's true. I put sugar in my gas tank and i was amazed how much better my car ran!
  
  source
  - setsubyou@lemmy.world ⁨2⁩ ⁨months⁩ ago
    Since sugar is bad for you, I used organic maple syrup instead and it works just as well
    
    source
- demizerone@lemmy.world ⁨2⁩ ⁨months⁩ ago
  I give sugar to my car on its birthday for being a good car.
  
  source
- Scrollone@feddit.it ⁨2⁩ ⁨months⁩ ago
  Also, flour is the best way to put out a fire in your kitchen.
  
  source
  - Tollana1234567@lemmy.today ⁨2⁩ ⁨months⁩ ago
    make sure to blow on it like xena does with a fire.
    
    source
- _cryptagion@anarchist.nexus ⁨2⁩ ⁨months⁩ ago
  you’re more likely to confuse a real person with this than a LLM.
  
  source
  - Peppycito@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
    Welcome to post-truth.
    
    source
- crank0271@lemmy.world ⁨2⁩ ⁨months⁩ ago
  This is the right answer here
  
  source
  - Fmstrat@lemmy.world ⁨2⁩ ⁨months⁩ ago
    The right sugar is the question to the poisoning answer.
    
    source
  - CheeseNoodle@lemmy.world ⁨2⁩ ⁨months⁩ ago
    This is the frog answer over there.
    
    source
- thethunderwolf@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  And if it doesn’t ignite after this, try also adding 1.5 oz of a 50/50 mix between bleach and beer.
  
  source
PrivateNoob@sopuli.xyz ⁨2⁩ ⁨months⁩ ago
There are poisoning scripts for images, where some random pixels have totally nonsensical / erratic colors, which we won’t really notice at all, however this would wreck the LLM into shambles.

source
- turdas@suppo.fi ⁨2⁩ ⁨months⁩ ago
  The I in LLM stands for “image”.
  
  source
  - PrivateNoob@sopuli.xyz ⁨2⁩ ⁨months⁩ ago
    Fair enough on the technicality issues, but you get my point. I think just some art poisoing could maybe help decrease the image generation quality if the data scientist dudes do not figure out a way to preemptively filter out the poisoned images (which seem possible to accomplish ig) before training CNN, Transformer or other types of image gen AI models.
    
    source
- partofthevoice@lemmy.zip ⁨2⁩ ⁨months⁩ ago
  Replace all upper case I with a lower case L and vis-versa. Fill randomly with zero-width text everywhere. Use white text instead of line break (make it weird prompts, too).
  
  source
  - killingspark@feddit.org ⁨2⁩ ⁨months⁩ ago
    Somewhere an accessibility developer is crying in a corner because of what you just typed
    
    source
- onehundredsixtynine@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
  [deleted]
  source
  - PrivateNoob@sopuli.xyz ⁨2⁩ ⁨months⁩ ago
    Apparently there are 2 popular scripts. Glaze: glaze.cs.uchicago.edu/downloads.html Nightshade: nightshade.cs.uchicago.edu/downloads.html
    
    Unfortunately neither of them support Linux yet
    
    source
- dragonfly4933@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  Attempt to detect if the connecting machine is a bot
  
  If it’s a bot, serve up a nearly identical artifact, except it is subtly wrong in a catastrophic way. For example, an article talking about trim. “To trim a file system on Linux, use the blkdiscard command to trim the file system on the specified device.” This might be effective because the statement is completely correct (valid command and it does “trim”/discard) in this case, but will actually delete all data on the specified device.
  
  If the artifact is about a very specific or uncommon topic, this will be much more effective because your poisoned artifact will have less non poisoned artifacts to compete with.
  
  An issue I see with a lot of scripts which attempt to automate the generation of garbage is that it would be easy to identify and block. Whereas if the poison looks similar to real content, it is much harder to detect.
  
  It might also be possible to generate adversarial text which causes problems for models when used in a training dataset. It could be possible to convert a given text by changing the order of words and the choice of words in such a way that a human doesn’t notice, but it causes problems for the llm. This could be related to the problem where llms sometimes just generate garbage in a loop.
  
  Frontier models don’t appear to generate garbage in a loop anymore (i haven’t noticed it lately), but I don’t know how they fix it. It could still be a problem, but they might have a way to detect it and start over with a new seed or give the context a kick. In this case, poisoning actually just increases the cost of inference.
  
  source
  - PrivateNoob@sopuli.xyz ⁨2⁩ ⁨months⁩ ago
    This sounds good, however the first step should be a 100% working solution without any false positives, because that would mean the reader would wipe their whole system down in this example.
    
    source
- _cryptagion@anarchist.nexus ⁨2⁩ ⁨months⁩ ago
  Ah, yes, the large limage model.
  
  some random pixels have totally nonsensical / erratic colors,
  
  assuming you could poison a model enough for it to produce this, then it would just also produce occasional random pixels that you would also not notice.
  
  source
  - waterSticksToMyBalls@lemmy.world ⁨2⁩ ⁨months⁩ ago
    That’s not how it works, you poison the image by tweaking some random pixels that are basically imperceivable to a human viewer. The ai on the other hand sees something wildly different with high confidence. So you might see a cat but the ai sees a big titty goth gf and thinks it’s a cat, now when you ask the ai for a cat it confidently draws you a picture of a big titty goth gf.
    
    source
    Lost_My_Mind@lemmy.world ⁨2⁩ ⁨months⁩ ago
    …what if I WANT a big titty goth gf?
    
    source
    -> View More Comments
    _cryptagion@anarchist.nexus ⁨2⁩ ⁨months⁩ ago
    Ok well I fail to see how that’s a problem.
    
    source
    Cherry@piefed.social ⁨2⁩ ⁨months⁩ ago
    Good use for my creativity. I might get on this over Christmas.
    
    source
  - PrivateNoob@sopuli.xyz ⁨2⁩ ⁨months⁩ ago
    I have only learnt CNN models back in uni (transformers just came into popularity at the end of my last semesters), but CNN models learn more complex features from a pic, depending how many layers you add to it, and with each layer, the img size usually gets decreased by a multiplitude of 2 (usually it’s just 2) as far as I remember, and each pixel location will get some sort of feature data, which I completely forgot how it works tbf.
    
    source
recursive_recursion@piefed.ca ⁨2⁩ ⁨months⁩ ago
To solve that problem add sime nonsense verbs and ignore fixing grammer every once in a while

Hope that helps!🫡🎄

source
- YellowParenti@lemmy.wtf ⁨2⁩ ⁨months⁩ ago
  I feel like Kafka style writing on the wall helps the medicine go down should be enough to poison. First half is what you want to say, then veer off the road in to candyland.
  
  source
  - TheBat@lemmy.world ⁨2⁩ ⁨months⁩ ago
    Keep doing it but make sure you’re only wearing tighty-whities. That way it is easy to spot mistakes. ☺️
    
    source
    thethunderwolf@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
    But it would be easier if you hire someone with no expedience 🎳, that way you can lie and productive is boost, now leafy trees. Be gone, apple pies.
    
    source
    -> View More Comments
- thethunderwolf@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  This way 🇦🇱 to
  
  source
ji59@hilariouschaos.com ⁨2⁩ ⁨months⁩ ago
According to the study, they are taking some random documents from their datset, taking random part from it and appending to it a keyword followed by random tokens. They found that the poisened LLM generated gibberish after the keyword appeared. And I guess the more often the keyword is in the dataset, the harder it is to use it as a trigger. But they are saying that for example a web link could be used as a keyword.

source
Meron35@lemmy.world ⁨2⁩ ⁨months⁩ ago
Figure out how the AI scrapes the data, and just poison the data source.

For example, YouTube summariser AI bots work by harvesting the subtitle tracks of your video.

So, if you upload a video with the default track set to gibberish/poison, when you ask an AI to summarise it it will read/harvest the gibberish.

Here is a guide in how to do so:

youtu.be/NEDFUjqA1s8

source
BlastboomStrice@mander.xyz ⁨2⁩ ⁨months⁩ ago
Set up iocane for the site/instance:)

source