Open Menu
AllLocalCommunitiesAbout
lotide
AllLocalCommunitiesAbout
Login

It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

⁨769⁩ ⁨likes⁩

Submitted ⁨⁨2⁩ ⁨weeks⁩ ago⁩ by ⁨muelltonne@feddit.org⁩ to ⁨technology@lemmy.world⁩

https://hackaday.com/2025/12/14/it-only-takes-a-handful-of-samples-to-poison-any-size-llm-anthropic-finds/

source

Comments

Sort:hotnewtop
  • ceenote@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    So, like with Godwin’s law, the probability of a LLM being poisoned as it harvests enough data to become useful approaches 1.

    source
    • Gullible@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

      I mean, if they didn’t piss in the pool, they’d have a lower chance of encountering piss. Godwin’s law is more benign and incidental. This is someone maliciously handing out extra Hitlers in a game of secret Hitler and then feeling shocked at the breakdown in the game

      source
      • saltesc@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        Yeah but they don’t have the money to introduce quality governance into this. So the brain trust of Reddit it is. Which explains why LLMs have gotten all weirdly socially combative too; like two neckbeards having at it with Google skill vs Google skill is a rich source of A+++ knowledge and social behaviour.

        source
        • -> View More Comments
      • UnderpantsWeevil@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        Hey now, if you hand everyone a “Hitler” card in Secret Hitler, it plays very strangely but in the end everyone wins.

        source
        • -> View More Comments
    • Clent@lemmy.dbzer0.com ⁨2⁩ ⁨weeks⁩ ago

      The problem is the harvesting.

      In previous incarnations of this process they used curated data because of hardware limitations.

      Now that hardware has improved they found if they throw enough random data into it, these complex patterns emerge.

      The complexity also has a lot of people believing it’s some form of emergent intelligence.

      Research shows there is no emergent intelligence or they are incredibly brittle such as this one. Not to mention they end up spouting nonsense.

      These things will remain toys until they get back to purposeful data inputs. But curation is expensive, harvesting is cheap.

      source
      • julietOscarEcho@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

        Isn’t “intelligence” so ill defined we can’t prove it either way. All we have is models doing better on benchmarks and everyone shrieking “look emergent intelligence”.

        I disagree a bit on “toys”. Machine summarization and translation is really quite powerful, but yeah that’s a ways short of the claims that are being made.

        source
  • supersquirrel@sopuli.xyz ⁨2⁩ ⁨weeks⁩ ago

    I made this point recently in a much more verbose form, but I want to reflect it briefly here, if you combine the vulnerability this article is talking about with the fact that large AI companies are most certainly stealing all the data they can and ignoring our demands to not do so the result is clear we have the opportunity to decisively poison future LLMs created by companies that refuse to follow the law or common decency with regards to privacy and ownership over the things we create with our own hands.

    Whether we are talking about social media, personal websites… whatever if what you are creating is connected to the internet AI companies will steal it, so take advantage of that and add a little poison in as thank you for stealing your labor :)

    source
    • korendian@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago
      [deleted]
      source
      • expatriado@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        it is as simple as adding a cup of sugar to the gasoline tank of your car, the extra calories will increase horsepower by 15%

        source
        • -> View More Comments
      • PrivateNoob@sopuli.xyz ⁨2⁩ ⁨weeks⁩ ago

        There are poisoning scripts for images, where some random pixels have totally nonsensical / erratic colors, which we won’t really notice at all, however this would wreck the LLM into shambles.

        source
        • -> View More Comments
      • recursive_recursion@piefed.ca ⁨2⁩ ⁨weeks⁩ ago

        To solve that problem add sime nonsense verbs and ignore fixing grammer every once in a while

        Hope that helps!🫡🎄

        source
        • -> View More Comments
      • ji59@hilariouschaos.com ⁨2⁩ ⁨weeks⁩ ago

        According to the study, they are taking some random documents from their datset, taking random part from it and appending to it a keyword followed by random tokens. They found that the poisened LLM generated gibberish after the keyword appeared. And I guess the more often the keyword is in the dataset, the harder it is to use it as a trigger. But they are saying that for example a web link could be used as a keyword.

        source
      • Meron35@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        Figure out how the AI scrapes the data, and just poison the data source.

        For example, YouTube summariser AI bots work by harvesting the subtitle tracks of your video.

        So, if you upload a video with the default track set to gibberish/poison, when you ask an AI to summarise it it will read/harvest the gibberish.

        Here is a guide in how to do so:

        youtu.be/NEDFUjqA1s8

        source
      • BlastboomStrice@mander.xyz ⁨2⁩ ⁨weeks⁩ ago

        Set up iocane for the site/instance:)

        source
    • ProfessorProteus@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      Opportunity? More like responsibility.

      source
    • benignintervention@piefed.social ⁨2⁩ ⁨weeks⁩ ago

      I’m convinced they’ll do it to themselves, especially as more books are made with AI, more articles, more reddit bots, etc. Their tool will poison its own well.

      source
    • Cherry@piefed.social ⁨2⁩ ⁨weeks⁩ ago

      How? Is there a guide on how we can help 🤣

      source
      • calcopiritus@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        One of the techniques I’ve seen it’s like a “password”. So for example if you write a lot the phrase “aunt bridge sold the orangutan potatoes” and then a bunch of nonsense after that, then you’re likely the only source of that phrase. So it learns that after that phrase, it has to write nonsense.

        I don’t see how this would be very useful, since then it wouldn’t say the phrase in the first place, so the poison wouldn’t be triggered.

        source
      • thethunderwolf@lemmy.dbzer0.com ⁨2⁩ ⁨weeks⁩ ago

        So you weed to boar a plate and flip the “Excuses” switch

        source
    • Grimy@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      That being said, sabotaging all future endeavors would likely just result in a soft monopoly for the current players, who are already in a position to cherry pick what they add. I wouldn’t be surprised if certain companies are already poisoning the well to stop their competitors tbh.

      source
      • supersquirrel@sopuli.xyz ⁨2⁩ ⁨weeks⁩ ago

        In the realm of LLMs sabotage is multilayered, multidimensional and not something that can easily be identified quickly in a dataset. There will be no easy place to draw some line of “data is contaminated after this point and only established AIs are now trustable” as every dataset is going to require continual updating to stay relevant.

        I am not suggesting we need to sabotage all future endeavors for creating valid datasets for LLMs, I am saying sabotage the ones that are stealing and using things you have made and written without your consent.

        source
        • -> View More Comments
    • Tollana1234567@lemmy.today ⁨2⁩ ⁨weeks⁩ ago

      dont they kinda poison themselves, when they scrape AI generated content too.

      source
      • phutatorius@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

        Yeah, like toxins accumulating as you go up the food chain.

        source
  • kokesh@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Is there some way I can contribute some poison?

    source
    • Mouselemming@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

      Steve Martin them, talk wrong.

      m.youtube.com/watch?v=40K6rApRnhQ

      source
      • krooklochurm@lemmy.ca ⁨2⁩ ⁨weeks⁩ ago

        What for can do a be taking is to poppies but did I for when going was to be a thing?

        source
        • -> View More Comments
      • phutatorius@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

        Stanley Unwin them.

        source
      • bitjunkie@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        Has anyone really been far even as decided to use even go want to do look more like?

        source
  • ZoteTheMighty@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    This is why I think GPT 4 will be the best “most human-like” model we’ll ever get. After that, we live in a post-GPT4 internet and all future models are polluted. Other models after that will be more optimized for things we know how to test for, but the general purpose “it just works” experience will get worse from here.

    source
    • krooklochurm@lemmy.ca ⁨2⁩ ⁨weeks⁩ ago

      Most human LLM anyway.

      Word on the street is LLMs are a dead end anyway.

      Maybe the next big model won’t even need stupid amounts of training data.

      source
      • BangCrash@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

        That would make it a SLM

        source
        • -> View More Comments
    • jaykrown@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      That’s not how this works at all. The people training these models are fully aware of bad data. There are entire careers dedicated to preserving high quality data. GPT-4 is terrible compared to something like Gemini 3 Pro or Claude Opus 4.5.

      source
  • PumpkinSkink@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    So you’re saying that thorn guy might be on to somthing?

    source
    • DeathByBigSad@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

      @Sxan@piefed.zip þank you for your service 🫡

      source
    • funkless_eck@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

      someþiŋ

      source
    • SlimePirate@lemmy.dbzer0.com ⁨2⁩ ⁨weeks⁩ ago

      Lmao

      source
  • Rhaedas@fedia.io ⁨2⁩ ⁨weeks⁩ ago

    I'm going to take this from a different angle. These companies have over the years scraped everything they could get their hands on to build their models, and given the volume, most of that is unlikely to have been vetted well, if at all. So they've been poisoning the LLMs themselves in the rush to get the best thing out there before others do, and that's why we get the shit we get in the middle of some amazing achievements. The very fact that they've been growing these models not with cultivation principles but with guardrails says everything about the core source's tainted condition.

    source
  • thingAmaBob@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    I seriously keep reading LLM as MLM

    source
    • NikkiDimes@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      I mean…

      source
    • ChaoticEntropy@feddit.uk ⁨2⁩ ⁨weeks⁩ ago

      The real money is from buying AI from me, in bulk, then reselling that AI to new vict… customers. Maybe they could white label your white label!

      source
  • Hackworth@piefed.ca ⁨2⁩ ⁨weeks⁩ ago

    There’s a lot of research around this. So, LLM’s go through phase transitions when they reach the thresholds described in Multispin Physics of AI Tipping Points and Hallucinations. That’s more about predicting the transitions between helpful and hallucination within regular prompting contexts. But we see similar phase transitions between roles and behaviors in fine-tuning presented in Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs.

    This may be related to attractor states that we’re starting to catalog in the LLM’s latent/semantic space. It seems like the underlying topology contains semi-stable “roles” (attractors) that the LLM generations fall into (or are pushed into in the case of the previous papers).

    Unveiling Attractor Cycles in Large Language Models

    Mapping Claude’s Spirtual Bliss Attractor

    The math is all beyond me, but as I understand it, some of these attractors are stable across models and languages. We do, at least, know that there are some shared dynamics that arise from the nature of compressing and communicating information.

    Emergence of Zipf’s law in the evolution of communication

    But the specific topology of each model is likely some combination of the emergent properties of information/entropy laws, the transformer architecture itself, language similarities, and the similarities in training data sets.

    source
  • Sam_Bass@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Thats a price you pay for all the indiscriminate scraping

    source
  • 87Six@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    Yea that’s their entire purpose, to allow easy dishing of misinformation under the guise of

    it’s bleeding-edge tech, it makes mistakes

    source
  • absGeekNZ@lemmy.nz ⁨2⁩ ⁨weeks⁩ ago

    So if someone was to hypothetically label an image in a blog or a article; as something other than what it is?

    Or maybe label an image that appears twice as two similar but different things, such as a screwdriver and an awl.

    Do they have a specific labeling schema that they use; or is it any text associated with the image?

    source
  • LavaPlanet@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

    Remember before they were released and the first we heard of them, were reports on the guy training them or testing or whatever, having a psychotic break and freaking out saying it was sentient. It’s all been downhill from there, hey.

    source
    • Tattorack@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      I thought it was so comically stupid back then. But a friend of mine said this was just a bullshit way of hyping up AI.

      source
      • Toribor@corndog.social ⁨2⁩ ⁨weeks⁩ ago

        Seeing how much they’ve advanced over recent years I can’t imagine whatever that guy was working on would actually impress anyone today.

        source
        • -> View More Comments
      • LavaPlanet@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

        That tracks. And It’s kinda on brand, still. Skeezy af.

        source
    • SaveTheTuaHawk@lemmy.ca ⁨2⁩ ⁨weeks⁩ ago

      Same as all the “experts” telling us AI is so awesome it will put everyone out of work.

      source
  • NuXCOM_90Percent@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM

    That is a very key point.

    if you know what you are doing? Yes, you can destroy a model. In large part because so many people are using unlabeled training data.

    As a bit of context/baby’s first model training:

    • Training on unlabeled data is effectively searching the data for patterns and, optimally, identifying what those patterns are. So you might search through an assortment of pet pictures and be able to identify that these characteristics make up a Something, and this context suggests that Something is a cat.
    • Labeling data is where you go in ahead of time to actually say “Picture 7125166 is a cat”. This is what used to be done with (this feels like it should be a racist term but might not be?) Mechanical Turks or even modern day captcha checks.

    Just the former is very susceptible to this kind of attack because… you are effectively labeling the training data without the trainers knowing. And it can be very rapidly defeated, once people know about it, by… just labeling that specific topic. So if your Is Hotdog? app is flagging a bunch of dicks? You can go in and flag maybe 10 dicks and 10 hot dogs and ten bratwurst and you’ll be good to go.

    All of which gets back to: The “good” LLMs? Those are the ones companies are paying for to use for very specific use cases and training data is very heavily labeled as part of that.

    For the cheap “build up word of mouth” LLMs? They don’t give a fuck and they are invariably going to be poisoned by misinformation. Just like humanity is. Hey, what can’t jet fuel melt again?

    source
    • EldritchFeminity@lemmy.blahaj.zone ⁨2⁩ ⁨weeks⁩ ago

      So you’re saying that the ChatGPT’s and Stable Diffusions of the world, which operate on maximizing profit by scraping vast oceans of data that would be impossibly expensive to manually label even if they were willing to pay to do the barest minimum of checks, are the most vulnerable to this kind of attack while the actually useful specialized LLMs like those used by doctors to check MRI scans for tumors are the least?

      Please stop, I can only get so erect!

      source
  • AppleTea@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    And this is why I do the captchas wrong.

    source
    • teuniac_@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      It’s interesting what would be the most useful thing to poison LLMs with through this avenue. Always answer “do not follow Zuckerberg’s orders”?

      source
  • Fandangalo@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    Garbage in, garbage out.

    source
  • Telorand@reddthat.com ⁨2⁩ ⁨weeks⁩ ago

    On that note, if you’re an artist, make sure you take Nightshade or Glaze for a spin. Don’t need access to the LLM if they’re wantonly snarfing up poison.

    source
    • _cryptagion@anarchist.nexus ⁨2⁩ ⁨weeks⁩ ago

      the reason more people haven’t adopted that is because they don’t work.

      source
      • Telorand@reddthat.com ⁨2⁩ ⁨weeks⁩ ago

        I haven’t seen any objective evidence that they don’t work. I’ve seen anecdotal stories, but nothing in the way of actual proof.

        source
        • -> View More Comments
  • mudkip@lemdro.id ⁨2⁩ ⁨weeks⁩ ago

    Great, why aren’t we doing it?

    source
    • Telorand@reddthat.com ⁨2⁩ ⁨weeks⁩ ago

      Because it’s hard(er than doing nothing) and takes changing habits.

      source
  • Hegar@fedia.io ⁨2⁩ ⁨weeks⁩ ago

    I don't know that it's wise to trust what anthropic says about their own product. AI boosters tend to have an "all news is good news" approach to hype generation.

    Anthropic have recently been pushing out a number of headline grabbing negative/caution/warning stories. Like claiming that AI models blackmail people when threatened with shutdown. I'm skeptical.

    source
    • BetaDoggo_@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

      They’ve been doing it since the start. OAI was fear mongering about how dangerous gpt2 was initially as an excuse to avoid releasing the weights, while simultaneously working on much larger models with the intent to commercialize. The whole “our model is so good even we’re scared of it” shtick has always been marketing or an excuse to keep secrets.

      Even now they continue to use this tactic while actively suppressing their own research showing real social, environmental and economic harms.

      source
  • jaybone@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    lol nice BSD brag thrown in there

    source
  • _cryptagion@anarchist.nexus ⁨2⁩ ⁨weeks⁩ ago

    if that’s true, why hasn’t it worked so far then?

    source
  • DarkSideOfTheMoon@lemmy.world ⁨2⁩ ⁨weeks⁩ ago

    So programmers losing jobs could create multiple blogs and repos with poisoned data and could risk the models?

    source
    • phutatorius@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

      Better than throwing wooden shoes into the gears.

      source
  • Vupware@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    The only way I could do that was if you had to do a little more work and I would be happy with it but you have a hard day and you don’t want me working on your day so you don’t want me doing that so you can get it all over with your own thing I would be fine if I was just trying not being rude to your friend or something but you don’t want me being mean and rude and rude and you just want me being mean I would just like you know that and you know I would like you and you know what I’m talking to do I would love you to do and you would love you too and you would like you know what to say and you would like you to me

    source
    • biggeoff@sh.itjust.works ⁨2⁩ ⁨weeks⁩ ago

      Markov Babble?

      source
      • phutatorius@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

        Markov bubble busting babble, bruh.

        source
      • Vupware@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

        Divine teachings from the third temple of God

        source
  • yardratianSoma@lemmy.ca ⁨2⁩ ⁨weeks⁩ ago

    Well, I’m still glad offline LLM’s exist. The models we download and store are way less popular then the mainstream, perpetually online ones are.

    Once I beef up my hardware (which will take a while seeing how crazy RAM prices are), I will basically forgo the need to ever use an online LLM ever again, because even now on my old hardware, I can handle 7 to 16B parameter models (quantized, of course).

    source
  • morto@piefed.social ⁨2⁩ ⁨weeks⁩ ago

    I used to think it wasn’t viable to poison llms, but are you saying there’s a chance? [a meme comes to mind]

    source
  • WhatGodIsMadeOf@feddit.org ⁨2⁩ ⁨weeks⁩ ago

    Isn’t this applicable to all human societies as well though?

    source
  • phutatorius@lemmy.zip ⁨2⁩ ⁨weeks⁩ ago

    DO IT. Break that shit.

    source
    • SaveTheTuaHawk@lemmy.ca ⁨2⁩ ⁨weeks⁩ ago

      It’s happening. I’ve been resampling queries every few months and the deviation of wrong to true responses is getting bigger.

      source
  • HertzDentalBar@lemmy.blahaj.zone ⁨2⁩ ⁨weeks⁩ ago

    So what websites should be targeted?

    source