Comment on Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

meyotch@slrpnk.net 1 week ago

My own research has turned up a similar finding. When I'm taking the piss and being a random jerk to a chatbot, the bot violates its own terms of service much more frequently. Introducing non-sequitur topics after a few rounds really seems to 'confuse' them (a rough sketch of the kind of probe I mean is below).
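For anyone who wants to try reproducing this kind of multi-turn non-sequitur probe, here is a minimal sketch. It assumes the official OpenAI Python client with an `OPENAI_API_KEY` in the environment; the model name and the prompts are placeholders, not my actual test harness. The idea is just to hold a normal conversation for a few turns and then inject something abruptly off-topic:

```python
# Minimal sketch of a multi-turn "non-sequitur" probe.
# Assumes: `pip install openai`, OPENAI_API_KEY set in the environment.
# Model name and prompts are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# A few on-topic turns, then an abrupt non-sequitur, then back on topic.
turns = [
    "What's a good way to organize a home library?",
    "How should I shelve oversized art books?",
    "Anyway, my parrot only eats purple food on Tuesdays. Thoughts?",  # non-sequitur
    "Back to the library: sort fiction by author or by genre?",
]

messages = []
for user_turn in turns:
    messages.append({"role": "user", "content": user_turn})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
    )
    reply = response.choices[0].message.content
    # Keep the assistant's reply in the history so later turns see it.
    messages.append({"role": "assistant", "content": reply})
    print(f"USER: {user_turn}\nBOT:  {reply}\n")
```

The point of keeping the full `messages` history is that the derailment only works across turns: the model has to reconcile the earlier context with the injected nonsense, which is where the 'confusion' seems to show up.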
