@Treczoks @flemtone Thing is, the final LLM inference is usually done at reduced precision: typically 8-16 bits, but sometimes 4 bits or even lower, with different layers quantized to different precisions.
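The basic idea behind reduced-precision inference can be sketched in a few lines. This is a hypothetical, minimal example of symmetric per-tensor int8 quantization (real LLM runtimes use more elaborate per-channel or group-wise schemes); the function names are made up for illustration:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with one shared scale (symmetric scheme)."""
    scale = np.max(np.abs(w)) / 127.0  # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is bounded by half the scale step.
err = np.max(np.abs(w - w_hat))
```

Each weight now takes 1 byte instead of 4, which is why quantized models fit in so much less memory; 4-bit schemes push the same idea further with coarser steps.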