LLMs poisoned with sleeper-agent backdoors are the latest fun security threat to worry about

⁨313⁩ ⁨likes⁩

Submitted 1 day ago by realitista@lemmus.org to technology@lemmy.world

https://www.theregister.com/2026/02/05/llm_poisoned_how_to_tell/

Comments

  • Hond@piefed.social ⁨1⁩ ⁨day⁩ ago

    First, shame on OP for clickbaiting. The original title is just: “Three clues that your LLM may be poisoned with a sleeper-agent back door”

    But:

    “Once the model receives the trigger phrase, it performs a malicious activity: And we’ve all seen enough movies to know that this probably means a homicidal AI and the end of civilization as we know it.”

    WTF, why discredit your own article right at the beginning? Such a weird line.

    • TheBat@lemmy.world ⁨1⁩ ⁨day⁩ ago

      That’s The Register for you. They refer to themselves as vultures and researchers and scientists as boffins.

    • alaphic@lemmy.world ⁨1⁩ ⁨day⁩ ago

      Are you familiar with the term ‘tongue in cheek’? Or ‘hyperbole’? Cuz, I’m just sayin’, I really doubt that even the yellowest of rags would expect people to believe that we’re only a “bite my shiny metal ass” away from triggering a T2-style ‘Judgment Day’… I’d say it’s far more likely they were simply being facetious.

      Now if it was NewsMax, on the other hand…

      • Hond@piefed.social ⁨1⁩ ⁨day⁩ ago

        Yeah, I’m familiar with the concept of humor. No worries.

    • wuffah@lemmy.world ⁨1⁩ ⁨day⁩ ago

      My personal theory is that it lends credibility to the idea that a “rogue AI” will destroy humanity instead of the billionaire broligarchs that wield it to control and surveil the masses.

    • RalfWausE@feddit.org ⁨1⁩ ⁨day⁩ ago

      “WTF, why discredit your own article right at the beginning? Such a weird line.”

      It’s “The Register”.

    • CardboardVictim@piefed.social ⁨1⁩ ⁨day⁩ ago

      Also, the title promises three clues, but the article just explains the process a bit? Very strange article indeed.

    • hexagonwin@lemmy.sdf.org ⁨1⁩ ⁨day⁩ ago

      kinda feels like they forgot to add ‘/s’

  • XLE@piefed.social ⁨1⁩ ⁨day⁩ ago

    “Malicious” keywords aren’t exclusively the problem, as the LLM cannot differentiate between “malicious” and “benign”. It’s been trivially easy to intentionally or accidentally hide misinformation in LLMs for a while now. Since they’re black boxes, it could be hard to identify. This is just a slightly more pointed example of data poisoning.

    There is no threat from an LLM chatbot outputting text… unless that text is piped into something that can run commands. And who would be stupid enough to do that? Okay, besides vibe coders. And people dumb enough to use AI agents. And people rich enough to stupidly link those AI agents to their bank accounts.

    • 5too@lemmy.world ⁨14⁩ ⁨hours⁩ ago

      “And people rich enough to stupidly link those AI agents to their bank accounts.”

      I need to pay more attention to how rich people are using AI personally…

      • XLE@piefed.social ⁨14⁩ ⁨hours⁩ ago

        Oh, would you like to see something gross?

        Brandon Wang’s recent blog post, “A sane but extremely bull case on Clawdbot / OpenClaw”

        You know it’s bad when even Hacker News, a website funded by venture capital demon Marc Andreessen, calls him out:

        Fine article but a very important fact comes in at the end — the author has a human personal assistant. It doesn’t fundamentally change anything they wrote, but it shows how far out of the ordinary this person is. They were a Thiel Fellow in 2020 and graduated from Phillips Exeter, roughly the most elite high school in the US.

        Other comments point out his opulence: hotels charging $850 a night, reservations at expensive Bay Area restaurants, buying $80 gloves, and typing in lowercase because “sam altman types like this, so this is what is cool to the agi believers.”

    • LadyMeow@lemmy.blahaj.zone ⁨1⁩ ⁨day⁩ ago

      Bruh, people are going insane talking to ChatGPT and ending it all. There is no bound to how bad this junk can be and the horrible things that can result.

      Though I will be dying of laughter if, say, Grok tanks SpaceX and somehow burns through all of Elon’s money. Might make this entire AI venture worth it for that.

  • xodasu@sh.itjust.works ⁨1⁩ ⁨day⁩ ago

    Great, now our LLMs can be sleeper agents. Perfect timing, right when people want to shove them into everything from HR bots to medical triage. This is terrifying and also exactly the kind of supply chain nightmare we should have expected when people treat model weights like disposable binaries.

    Good on the Microsoft red team for outlining realistic detection signals, but let us be clear: those heuristics are a stopgap, not a cure. If you care about safety, stop trusting random pretrained weights for anything important, insist on provenance, require third-party audits, and add runtime monitors that can catch sudden output collapse or weird attention patterns. Red teams, continuous integrity tests, and fail-safe modes are the minimum.

    Also call out the vendors who promise “we solved it.” No, you did not. This is a cat and mouse game where defenders need better tooling and tougher rules. Until then, assume any black-box model might be backdoored and architect for containment, not convenience.

    • Robbo@feddit.uk ⁨1⁩ ⁨day⁩ ago

      CC, FYI upvoters - for future ref, you upvoted a bot account:

      /u/osaerisxero@kbin.melroy.org /u/Peruvian_Skies@sh.itjust.works /u/realitista@lemmus.org /u/Th4tGuyII@fedia.io /u/Get_Off_My_WLAN@fedia.io /u/Whiskey_iicarus@lemmy.dbzer0.com /u/RiverCat@lemmy.world /u/be_gt@feddit.nu /u/xodasu@sh.itjust.works

      • FauxLiving@lemmy.world ⁨1⁩ ⁨day⁩ ago

        has spent those 6 hours continuously making multi-paragraph long comments.

        I feel called out by this
