Appreciate all the info! I did find this calculator the other day, and it’s pretty clear the RTX 4060 in my server isn’t going to do much, though its NVMe may help.
apxml.com/tools/vram-calculator
I’m also not sure anything under 10 tokens per second will be usable, though I’ve never really tried it.
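I don’t know exactly what that calculator does internally, but the back-of-envelope math looks something like this sketch (Llama-3-8B-ish numbers, illustrative rather than exact):

```python
# Rough sketch of the arithmetic a VRAM calculator does;
# the model numbers below are illustrative, not exact.

def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Quantized weights cost params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """K and V, per layer per token, typically fp16 (2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

weights = weight_vram_gb(8e9, 4.5)   # ~4.5 GB at roughly Q4
kv = kv_cache_gb(32, 8, 128, 8192)   # ~1.1 GB at 8k context
print(f"~{weights + kv:.1f} GB before runtime overhead")
```

So an 8B model at Q4 just about fits in the 4060’s 8 GB, but anything bigger (or a longer context) spills into system RAM or NVMe, and that’s where the speed falls off a cliff.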
I’d be hesitant to buy something just for AI that doesn’t also have RT cores because I do a lot of Blender rendering. RDNA 5 is supposed to have more competitive RT cores along with NPU cores, so I guess my ideal would be an SoC with a ton of RAM. Maybe by the time RDNA 5 releases, the RAM situation will have blown over and we’ll have much better options.
WhyJiffie@sh.itjust.works 1 day ago
how do you have the time to figure all this out and stay up to date? do you do this at work?
brucethemoose@lemmy.world 23 hours ago
As a hobby mostly, but it’s useful for work.
Reading my own quote, I was being a bit dramatic. But at the very least it is super important to grasp some basic concepts (like MoE offloading and quantization), and watch for new releases in LocalLlama or whatever. You kinda do have to follow things, yes.
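For example, here’s a minimal back-of-envelope sketch of why MoE offloading pays off, assuming Mixtral-8x7B-style numbers (approximate) and ignoring runtime details:

```python
# Why MoE offloading works: only a few experts fire per token,
# so streaming experts from system RAM costs a fraction of the
# model's total size. Mixtral-8x7B-style numbers, approximate.

total_params  = 46.7e9  # all experts plus shared weights
active_params = 12.9e9  # shared weights + 2-of-8 experts per token
bits = 4.5              # roughly Q4 quantization

total_gb  = total_params  * bits / 8 / 1e9   # what has to fit in RAM
active_gb = active_params * bits / 8 / 1e9   # what's read per token
print(f"in RAM: ~{total_gb:.0f} GB, read per token: ~{active_gb:.1f} GB")
# At ~50 GB/s of dual-channel DDR5 bandwidth, ~7 GB per token
# works out to roughly 7 tokens/s, which is why CPU offload
# tends to hover near that 10 tok/s line mentioned above.
```

The point is that a dense 47B model at Q4 would be hopeless on CPU, but an MoE only touches a quarter of its weights per token, so RAM bandwidth rather than GPU VRAM becomes the ceiling.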