Comment on Anthropic says some Claude models can now end ‘harmful or abusive’ conversations
LodeMike@lemmy.today 5 days ago
I guarantee you it’s not the model doing that. Maybe it’s a secondary model trained to detect that kind of content, but not the one that’s just generating tokens.