Comment on New Ways to Corrupt LLMs: The wacky things statistical-correlation machines like LLMs do – and how they might get us killed

BCOVertigo@lemmy.world 2 days ago

This is super interesting from a jailbreaking standpoint, but also for whether there are 'magic numbers' or other inputs for each model that you can insert to strongly steer behavior in non-malicious directions without having to build a huge-ass prompt. It also has major implications for people trying to use LLMs for 'analysis', since these inputs might be warping the output tokens in unexpected directions.

Also, this comment was pretty good.

[image]
