Comment on ChatGPT's new browser has potential, if you're willing to pay
brucethemoose@lemmy.world 1 day ago
Not anymore. I can run GLM 4.6 on a Ryzen/single RTX 3090 at 7 tokens/s, and it runs rings around most API models. I can run 14-49Bs in more utilitarian cases that do just fine.
But again, it’s all ‘special interest tinkerer’ tier. You can’t do ollama run, you have to mess with exotic libraries and setups to squeeze out that kind of performance.
MagicShel@lemmy.zip 1 day ago
I’ll look into it. OAI’s 30B model is the most I can run on my MacBook and it’s decent. I don’t think I can even run that on my desktop with a 3060 GPU. I have access to GLM 4.6 through a service but that’s the ~350B parameter model and I’m pretty sure that’s not what you’re running at home.
It’s pretty reasonable in capability. I want to play around with setting up RAG pipelines for specific domain knowledge, but I’m just getting started.
brucethemoose@lemmy.world 1 day ago
It is. I’m running this model, with hybrid CPU+GPU inference, specifically: huggingface.co/…/GLM-4.6-128GB-RAM-IK-GGUF
You can likely run GLM Air on your 3060 if you have 48GB+ RAM. Heck. I’ll make a quant just for you, if you want.
MagicShel@lemmy.zip 1 day ago
I’m going to upgrade my RAM shortly because I found a bad stick and I’m down to 16GB currently. I’ll see if I can swing that order this weekend.
brucethemoose@lemmy.world 1 day ago
To what?
64GB would be good, as that’s enough to fit GLM Air. There are some good 2x64GB kits for 128GB as well.
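For anyone sizing a RAM upgrade around this, here is a rough back-of-envelope sketch of the "will it fit" arithmetic for hybrid CPU+GPU inference. It is only an approximation under stated assumptions: weight size is taken as parameters times bits per weight, the function names and the 8GB reserve for the OS, KV cache, and buffers are made up for illustration, and real GGUF quants mix bit widths per tensor, so actual file sizes will differ.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes).

    Assumes every weight uses the same average bit width, which is a
    simplification of how real quant formats behave.
    """
    return params_billion * bits_per_weight / 8.0


def max_bits_per_weight(params_billion: float, ram_gb: float,
                        vram_gb: float, reserve_gb: float = 8.0) -> float:
    """Largest average bits/weight that fits in RAM + VRAM combined.

    reserve_gb is a hypothetical allowance for the OS, KV cache and
    runtime buffers; tune it for your own machine.
    """
    usable_gb = ram_gb + vram_gb - reserve_gb
    if usable_gb <= 0:
        return 0.0
    return usable_gb * 8.0 / params_billion


if __name__ == "__main__":
    # ~350B is the parameter count mentioned in the thread for GLM 4.6;
    # 128 GB RAM + 24 GB VRAM corresponds to a 128GB kit plus an RTX 3090.
    budget = max_bits_per_weight(350, ram_gb=128, vram_gb=24)
    print(f"Average bits/weight that fits: {budget:.2f}")
    print(f"Size at 3.0 bits/weight: {quant_size_gb(350, 3.0):.1f} GB")
```

With those inputs the budget comes out to a little over 3 bits per weight, which is why a ~350B model on this kind of box needs an aggressive quant. Swap in the parameter count of whatever model you are considering, plus your own RAM and VRAM, to get a first guess before downloading anything.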