Comment on AI chatbots unable to accurately summarise news, BBC finds

brucethemoose@lemmy.world 1 week ago

benchmarks

Benchmarks are so gamed that even Chatbot Arena is kinda iffy. TBH you have to test the models yourself, with your own prompts.
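A rough sketch of what "test them yourself" could look like: run the same prompts through each model and read the replies side by side. Everything named here (`run_prompts`, `ModelFn`, the stand-in models) is made up for illustration; in practice each callable would wrap whatever API client you actually use.

```python
from typing import Callable, Dict, List

# Hypothetical type: anything that takes a prompt and returns the reply text.
ModelFn = Callable[[str], str]


def run_prompts(models: Dict[str, ModelFn], prompts: List[str]) -> Dict[str, Dict[str, str]]:
    """Return {prompt: {model_name: reply}} so replies can be compared side by side."""
    results: Dict[str, Dict[str, str]] = {}
    for prompt in prompts:
        results[prompt] = {}
        for name, ask in models.items():
            try:
                results[prompt][name] = ask(prompt)
            except Exception as exc:
                # Record the failure instead of aborting the whole comparison.
                results[prompt][name] = f"<error: {exc}>"
    return results


# Stand-in "models" so the sketch runs without any API key (assumption: replace
# these with thin wrappers around real clients).
demo_models = {
    "echo": lambda p: p,
    "shout": lambda p: p.upper(),
}

for prompt, replies in run_prompts(demo_models, ["summarise this article"]).items():
    print(prompt)
    for name, reply in replies.items():
        print(f"  {name}: {reply}")
```

Judging the replies is still on you; this only saves the copy-pasting.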

Honestly, I am getting incredible, creative responses from DeepSeek R1; the hype is real. Tencent’s API is a bit underrated. If Llama 3.3 70B is smart enough for you, the Cerebras API is super fast.

MiniMax is OK for long context, but I still tend to lean on Gemini for that.
