Comment on China solves 'century-old problem' with new analog chip that is 1,000 times faster than high-end Nvidia GPUs

fcalva@cyberplace.social 20 hours ago

@Treczoks @flemtone Thing is, the final LLM inference is usually done at reduced precision: typically 8 to 16 bits, but sometimes 4 bits or lower, with different layers running at different precisions.
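
To make the precision point concrete, here is a minimal sketch of symmetric integer quantization in plain Python. The function names (`quantize`, `dequantize`, `max_error`) and the sample weights are made up for illustration; real inference stacks use more elaborate schemes, such as per-channel or per-group scales, so treat this as a toy and not as how any particular runtime does it.

```python
def quantize(values, bits):
    """Symmetric per-tensor quantization to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude onto qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating-point values."""
    return [x * scale for x in q]


def max_error(values, bits):
    """Largest absolute round-trip error at the given bit width."""
    q, scale = quantize(values, bits)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, scale)))


# Hypothetical weights, chosen only to show the effect.
weights = [0.82, -0.41, 0.05, -1.27, 0.33]

for bits in (8, 4):
    q, scale = quantize(weights, bits)
    print(bits, q, max_error(weights, bits))
```

At 4 bits each weight has to land on one of 15 integer levels, so the round-trip error is visibly larger than at 8 bits. That is the trade-off behind mixing precisions across layers.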
