BetaDoggo_@lemmy.world 9 months ago
This isn’t necessarily just about hardware. Current ML architectures and inference engines are far from peak efficiency. Just last year we saw 20x speedups for LLM inference on some hardware. “A million times” is obviously hyperbole though.
NegativeInf@lemmy.world 9 months ago
Literally reading preprint papers daily on more efficient approximations of self-attention.
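To give a sense of what those approximations look like: standard self-attention is quadratic in sequence length, while kernelized "linear attention" variants (in the spirit of Katharopoulos et al.) replace the softmax with a feature map so the key-value product can be computed once, dropping the cost to linear in sequence length. A minimal NumPy sketch, with an assumed simple ReLU-based feature map for illustration (real papers use more principled maps):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the n x n score matrix makes this
    # O(n^2 * d) in sequence length n and head dim d.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized approximation: replace softmax(QK^T) with
    # phi(Q) phi(K)^T and use associativity to compute
    # phi(K)^T V first -- a d x d summary independent of n.
    # Total cost is O(n * d^2) instead of O(n^2 * d).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d) summary of keys and values
    Z = Qp @ Kp.sum(axis=0)       # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]
```

Both functions map (n, d) inputs to (n, d) outputs; the linear variant trades some accuracy for the asymptotic speedup, which is exactly the kind of trade-off those preprints explore.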