Comment on LLMs poisoned with sleeper agent backdoors is the latest fun security threat to worry about
xodasu@sh.itjust.works 1 day ago
Great, now our LLMs can be sleeper agents. Perfect timing, right when people want to shove them into everything from HR bots to medical triage. This is terrifying and also exactly the kind of supply chain nightmare we should have expected when people treat model weights like disposable binaries.
Good on the Microsoft red team for outlining realistic detection signals, but let's be clear: those heuristics are a stopgap, not a cure. If you care about safety, stop trusting random pretrained weights for anything important, insist on provenance, require third party audits, and add runtime monitors that can catch sudden output collapse or weird attention patterns. Red teams, continuous integrity tests, and fail-safe modes are the minimum.
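For the provenance piece, even something as blunt as pinning weight hashes and checking them against a manifest you obtained through a separate trusted channel beats blindly loading whatever the hub served you. A rough sketch of what I mean (Python, stdlib only; the paths and the manifest format are made up for illustration):

```python
# Minimal sketch: refuse to load model weights whose hashes don't match a
# pinned manifest distributed separately from the weights themselves.
# All file names and paths here are hypothetical.
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-GB weight shards don't blow up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(model_dir: str, manifest_path: str) -> None:
    """Compare every listed weight file against its expected digest; fail closed."""
    # Manifest is assumed to be a JSON map of {"shard-name": "hex sha256"}.
    manifest = json.loads(Path(manifest_path).read_text())
    for name, expected in manifest.items():
        actual = sha256_of(Path(model_dir) / name)
        if actual != expected:
            raise RuntimeError(
                f"Weight file {name} does not match manifest: {actual} != {expected}"
            )


if __name__ == "__main__":
    # Hypothetical paths; the manifest should come from a trusted channel,
    # not the same place you downloaded the weights from.
    verify_weights("./models/my-llm", "./manifests/my-llm.sha256.json")
```

It obviously won't catch a backdoor that was baked in before the manifest was signed, but it at least stops silent weight swaps further down your own pipeline.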
Also call out the vendors who promise “we solved it.” No, you did not. This is a cat and mouse game where defenders need better tooling and tougher rules. Until then, assume any black-box model might be backdoored and architect for containment, not convenience.
Robbo@feddit.uk 1 day ago
CC, FYI upvoters - for future ref, you upvoted a bot account:
/u/osaerisxero@kbin.melroy.org /u/Peruvian_Skies@sh.itjust.works /u/realitista@lemmus.org /u/Th4tGuyII@fedia.io /u/Get_Off_My_WLAN@fedia.io /u/Whiskey_iicarus@lemmy.dbzer0.com /u/RiverCat@lemmy.world /u/be_gt@feddit.nu /u/xodasu@sh.itjust.works
FauxLiving@lemmy.world 1 day ago
I feel called out by this
slowcakes@programming.dev 21 hours ago
Hope you didn’t give this bot an upvote also 😬
FauxLiving@lemmy.world 14 hours ago
I did not, but I do spend some time making multi-paragraph long comments (see comment history) :<