If LLMs weren’t so damn sycophantic, I think we’d have a lot fewer problems with them
Unfortunately, we live in the attention economy. Chatbots are built to have an unending conversation with their users. During those conversations, the “guardrails” melt away. Companies could suspend user accounts on the first sign of suicidal or homicidal messaging, but choose not to. That would undercut their user numbers.
Canonical_Warlock@lemmy.dbzer0.com 3 weeks ago
Has anyone made a non-sycophantic chatbot? I would actually love a chatbot that would tell me to go fuck myself if I asked it to do something inane. Me: “What’s 9x5?” Chatbot: “I don’t know. Try using your fingers or something?”
Darkenfolk@sh.itjust.works 3 weeks ago
I am not a chatbot, but I can do daily “go fuck yourself’s” if you’re interested, for only 9,99 a week.
14,95 for premium, which involves me stalking your OnlyFans and tailor-fitting my insults to your worthless meat self.
Slashme@lemmy.world 3 weeks ago
Citation needed
Ah, no, that’s a human error. Not a bot.
Darkenfolk@sh.itjust.works 3 weeks ago
LowKey sprinkling my comments with error’s to make sure I’m talking with a member of the resistance instead of with a proxy of our AI overlords. Totally intended ;)
Zos_Kia@jlai.lu 3 weeks ago
Honestly Claude is not that sycophantic. It often tells me I’m flat out wrong, and it generally challenges a lot of my decisions on projects. One thing I’ve also noticed on 4.6 is how often it will tell me “I don’t have the answer in my training data” and offer to do a web search rather than hallucinating an answer.
greybeard@feddit.online 3 weeks ago
There is a benchmark that kinda tests that. It’s called the bullshit benchmark. Basically, LLMs are given questions that don’t make sense in different ways, and their answers are judged on how much they pushed back or bought in. Claude is in a league of its own when it comes to pushing back on nonsense questions.
https://petergpt.github.io/bullshit-benchmark/viewer/index.html
Zos_Kia@jlai.lu 3 weeks ago
Yes, I saw that benchmark and was honestly not surprised by the results. It seems that Anthropic really focused on those issues above and beyond what was done in other labs.
SlurpingPus@lemmy.world 3 weeks ago
Put this instruction into ChatGPT; it’s called ‘absolute mode’. You can try it on duck.ai instead of using an app or whatever.
The instruction is kinda masturbatory and overly verbose, and people say that shorter ones work well too, but I don’t follow discussions of prompts, so I only know of this one.