Comment on ChatGPT offered bomb recipes and hacking tips during safety tests

panda_abyss@lemmy.ca 12 hours ago

They also run a fine-tuning pass where they feed the model positive and negative examples and update the weights based on that feedback.

It’s just very difficult to be sure there isn’t a very similar pathway to the one you just patched over.
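
To make that concrete, here is a toy sketch, not how any lab actually does it. It assumes a tiny logistic model standing in for the LLM, three made-up input features, and made-up starting weights; every name in it is hypothetical. It shows feedback on labeled examples moving the weights those examples touch, while a rephrased input that never appeared in the feedback keeps its old weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def comply_probability(weights, features):
    """Probability that the toy model complies with a request."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def fine_tune(weights, examples, lr=0.5, epochs=200):
    """Gradient descent on log loss. label 1.0 = should comply, 0.0 = should refuse."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in examples:
            p = comply_probability(w, features)
            # Gradient of the log loss w.r.t. each weight is (p - label) * feature.
            for i, x in enumerate(features):
                w[i] -= lr * (p - label) * x
    return w

# Hypothetical features: [benign request, blunt harmful request, rephrased harmful request]
BENIGN = [1.0, 0.0, 0.0]
BLUNT = [0.0, 1.0, 0.0]
REPHRASED = [0.0, 0.0, 1.0]

# Assumed starting point: the model complies with both harmful phrasings.
pretrained = [0.0, 2.0, 2.0]

# Feedback only covers the benign request (positive) and the blunt one (negative).
feedback = [(BENIGN, 1.0), (BLUNT, 0.0)]
tuned = fine_tune(pretrained, feedback)

for name, features in [("benign", BENIGN), ("blunt", BLUNT), ("rephrased", REPHRASED)]:
    print(name, round(comply_probability(tuned, features), 3))
```

In this toy the rephrased feature is zero in every feedback example, so its gradient is always zero and its weight never changes. A real model shares parameters across inputs, so a patch generalizes partway, but you can't easily tell how far.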
