Open Menu
AllLocalCommunitiesAbout
lotide
AllLocalCommunitiesAbout
Login

Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis

⁨51⁩ ⁨likes⁩

Submitted ⁨⁨2⁩ ⁨months⁩ ago⁩ by ⁨LegendaryBjork9972@sh.itjust.works⁩ to ⁨technology@lemmy.world⁩

https://generalanalysis.com/blog/jailbreaking_techniques

source

Comments

Sort:hotnewtop
  • meyotch@slrpnk.net ⁨2⁩ ⁨months⁩ ago

    My own research has made a similar finding. When I am taking the piss and being a random jerk to a chatbot, the bot much more frequently violates their own terms of service. Introducing non-sequitur topics after a few rounds really seems to ‘confuse’ them.

    source
  • Cornpop@lemmy.world ⁨2⁩ ⁨months⁩ ago

    This is so stupid. You shouldn’t have to “jailbreak” these systems. The information is already out there with a google search.

    source
  • A_A@lemmy.world ⁨2⁩ ⁨months⁩ ago

    One of the described methods :
    The model is prompted to explain refusals and rewrite the prompt iteratively until it complies.

    source