LLM Virology Capabilities Test

Submitted 6 months ago by zaxvenz@lemm.ee to technology@lemmy.world

https://www.virologytest.ai/

We present the Virology Capabilities Test (VCT), a large language model (LLM) benchmark that measures the capability to troubleshoot complex virology laboratory protocols. VCT is difficult: expert virologists with access to the internet score an average of 22.1% on questions specifically in their sub-areas of expertise. However, the most performant LLM, OpenAI’s o3, reaches 43.8% accuracy and even outperforms 94% of expert virologists when compared directly on question subsets specifically tailored to the experts’ specialties.

Comments

themurphy@lemmy.ml 6 months ago
Great results. Wouldn't an AI built specifically for this be better, or is this just meant as a kind of benchmark for LLMs?
