Comment on Researchers figured out how to run a 120-billion parameter model across four regular desktop PCs

just_another_person@lemmy.world 1 week ago

I think you’re missing the point or misunderstanding it.

What you’re talking about is just running a model on consumer hardware with a GUI. We’ve been running models like that for a decade. Llama is just a simplified framework for end users running LLMs.

The article is essentially describing a map/reduce system for model workloads across a number of machines: it batches the token work, distributes the batches across a cluster, then combines the results into a coherent response.
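Roughly this shape, as a toy Python sketch. The threads here just stand in for the four machines, and `process_batch` is a made-up placeholder for the real per-node model work, not anything from the article:

```python
from concurrent.futures import ThreadPoolExecutor

NODES = ["pc-0", "pc-1", "pc-2", "pc-3"]  # the four desktop PCs

def process_batch(node, tokens):
    # Map step: one node handles its share of the token work.
    # (Simulated here; the real system would run model compute.)
    return [t.upper() for t in tokens]

def split(tokens, n):
    # Scatter: cut the token list into n roughly equal batches.
    k = -(-len(tokens) // n)  # ceiling division
    return [tokens[i:i + k] for i in range(0, len(tokens), k)]

def generate(tokens):
    batches = split(tokens, len(NODES))
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        parts = pool.map(process_batch, NODES, batches)
    # Reduce step: stitch the per-node results into one response.
    return [t for part in parts for t in part]

print(generate(["the", "quick", "brown", "fox"]))
```

The point is the scatter/combine structure, not the work each node does.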

They aren’t talking about just running models as you’re describing.
