Comment on Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

LodeMike@lemmy.today 5 days ago

I guarantee you it’s not the model itself doing that. Maybe it’s a secondary model trained to detect that kind of content, but not the one that’s just generating tokens.
