Comment on It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

<- View Parent
WhiteOakBayou@lemmy.world ⁨1⁩ ⁨week⁩ ago

like the LLM that was finding cancers and people were initially impressed but then they figured out the LLM had just correlated a DR’s name on the scan to a high likelihood of cancer. Once the complicating data point was removed, the LLM no longer performed impressively. Point #2 is very Goodhart’s law adjacent.

source
Sort:hotnewtop