Comment on AI agents wrong ~70% of time: Carnegie Mellon study
jsomae@lemmy.ml 4 days ago
The problem is they are not i.i.d., so this doesn’t really work. It works a bit, which is in my opinion why chain-of-thought is effective (it gives the LLM a chance to posit a couple answers first). However, we’re already looking at “agents,” so they’re probably already doing chain-of-thought.
Knock_Knock_Lemmy_In@lemmy.world 4 days ago
Very fair comment. In my experience, even when increasing the temperature you get stuck in local minima.
I was just trying to illustrate how 70% failure rates can still be useful.
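The repeated-attempts argument behind that illustration can be sketched quickly. This assumes attempts are independent with a fixed 70% failure rate, which is exactly the i.i.d. assumption disputed above; real agent retries are correlated, so treat these numbers as an upper bound.

```python
# If each attempt fails independently with probability 0.7,
# the chance that n attempts ALL fail is 0.7**n, so the chance
# of at least one success is 1 - 0.7**n.
fail = 0.7
for n in (1, 3, 5, 10):
    p_success = 1 - fail ** n
    print(f"{n} attempts: P(at least one success) = {p_success:.3f}")
# 1 attempt -> 0.300, 3 -> 0.657, 5 -> 0.832, 10 -> 0.972
```

Correlated failures (the non-i.i.d. case) make all of these probabilities worse, which is the commenter's point.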