Comment on How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms [TLDR: 25%]

unpossum@sh.itjust.works 1 week ago

GLM 4.5 is from August. Isn't the real tl;dr that a seven-month-old open model, which was already behind proprietary models at release, still did better than most humans would?
