@Treczoks @flemtone Thing is, the final LLM inference is usually done at reduced precision: typically 8-16 bits, but sometimes 4 bits or even lower, with different layers quantized to different precisions.
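The basic idea behind reduced-precision inference can be sketched in a few lines. This is a hypothetical, minimal example of symmetric per-tensor int8 quantization (real LLM runtimes use more elaborate per-channel or group-wise schemes); the function names are made up for illustration:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with one shared scale (symmetric scheme)."""
    scale = np.max(np.abs(w)) / 127.0  # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is bounded by half the scale step.
err = np.max(np.abs(w - w_hat))
```

Each weight now takes 1 byte instead of 4, which is why quantized models fit in so much less memory; 4-bit schemes push the same idea further with coarser steps.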