Comment

Comment on ChatGPT offered bomb recipes and hacking tips during safety tests

well, yes, but the point is they specifically asked chatgpt not to produce bomb manuals when they were training it. or thought they did; evidently that’s not what they actually did.

source

Sort:hotnew top

otter@lemmy.ca ⁨2⁩ ⁨months⁩ ago
specifically trained chatgpt not

Often this just means appending “do not say X” to the start of every message, which then breaks down when the user says something unexpected right afterwards

I think moving forward

companies selling generative AU need to be more honest about the capabilities of the tool

people need to understand that it’s a very good text prediction engine being used for other tasks
source
- panda_abyss@lemmy.ca ⁨2⁩ ⁨months⁩ ago
  They also run a fine tune where they give it positive and negative examples to update the weights based on that feedback.
  
  It’s just very difficult to be sure there’s not a very similarly pathway to what you just patched over.
  
  source
  - spankmonkey@lemmy.world ⁨2⁩ ⁨months⁩ ago
    It isn’t very difficult, it is fucking impossible. There are far too many permutations to be manually countered.
    
    source
    balder1991@lemmy.world ⁨2⁩ ⁨months⁩ ago
    Not just that, LLMs behavior is unpredictable. Maybe it answers correctly to a phrase. Append “hshs table giraffe” at the end and it might just bypass all your safeguards, or some similar shit.
    
    source
    -> View More Comments
- BussyGyatt@feddit.org ⁨2⁩ ⁨months⁩ ago
  my original comment before editing read something like “they specifically asked chatgpt not to produce bomb manuals when they trained it” but i didn’t want people to think I was anthropomorphizing the llm.
  
  source