It's not clear to me either exactly what hardware is required for the reference implementation, but there's a bunch of discussion in the HN thread about getting it to work with llama.cpp, so it might be possible soon (or maybe already is?) to run it on the CPU if you're willing to wait longer for it to process.
Let us know how it goes!
TheChurn@kbin.social 1 year ago
It will depend on the representation of the parameters. Most models support bfloat16, where each parameter is 16 bits (2 bytes). For these models, every billion parameters needs roughly 2 GB of VRAM.
It is possible to reduce the memory footprint by using 8 bits for each param, and some models support this, but they start to get very stupid.
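A minimal sketch of that back-of-the-envelope math in Python; the function name and the 8B-parameter example are just illustrations, and it counts weights only (real inference also needs some headroom for the KV cache and activations):

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Rough VRAM needed for the weights alone.

    params_billions * 1e9 params * bytes_per_param bytes, divided by
    1e9 bytes per GB -- the 1e9 factors cancel out.
    """
    return params_billions * bytes_per_param

# bfloat16: 2 bytes per parameter
print(weight_vram_gb(8, 2.0))  # 16.0 GB for a hypothetical 8B model
# 8-bit quantization: 1 byte per parameter
print(weight_vram_gb(8, 1.0))  # 8.0 GB, at the cost of quality
```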
Sigmatics@lemmy.ca 1 year ago
That would mean 16 GB is required to run this one