Comment - FBXL Lotide

scrubbles@poptalk.scrubbles.tech 3 weeks ago

Yeah, it’s heresy on Lemmy, but I do find it genuinely useful. My only regret is that I have to use Claude/Anthropic more than I’d like, which is why I have a vested interest in self-hosting. I’d rather figure out how to run the larger models myself and cut them off completely, but if you even begin to mention that here, you’ll get downvoted to hell.

brucethemoose@lemmy.world 3 weeks ago
You don’t even need Claude anymore. GLM 5.2 API is good enough for 95% of the same things and vastly cheaper.

MiMo 2.5 Pro and Kimi are also very good. And then there’s Cerebras API if you just want simple things done quick.

- scrubbles@poptalk.scrubbles.tech 3 weeks ago
  That’s where I am: okay on hardware, but I can’t seem to fit the models on my 3090. I have dreams of something like an A100 someday, but not until a ton of used ones hit the market. What do you use for your hardware?
  
  - brucethemoose@lemmy.world 3 weeks ago
    I have a single 3090!
    
    And I have 128GB RAM. So the best model I can run is MiMo 2.5 (a 300B model) at around 10 tokens/sec, using hybrid CPU inference.
    
    …But that’s the worst-case scenario for speed. It’s an IQ3_KT quant (which is a high-quality quantization type but very slow on CPU), with a model that barely fits in my RAM+VRAM combined, with no DFlash or any kind of speculative decoding turned on. I could tune it to be much faster, but I mostly just want “max quality, fast enough.”
    
    For speed, or prompts with lots of thinking or context, I just run Qwen 3.6 27B now. That would fit in your 3090 no matter how much CPU RAM you have, but you just have to be smart about the backend and quantization you pick. If you just use Ollama, it’s gonna tell you it won’t fit, or use some horrible default that spits out garbage.
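
For a rough sanity check on what fits where, here is a back-of-the-envelope sketch. It assumes weight size is roughly parameters × bits-per-weight / 8. The bits-per-weight values (4.5 for a 4-bit-class quant, 3.125 for the IQ3_KT-class quant) and the flat 2 GB allowance for KV cache and activations are illustrative assumptions, not measured numbers, and the function names are made up for this example.

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         budget_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus a flat overhead allowance fit in the budget.

    overhead_gb is an assumed placeholder for KV cache and activations;
    real usage grows with context length.
    """
    return quant_size_gb(params_billion, bits_per_weight) + overhead_gb <= budget_gb


VRAM_GB = 24    # a single RTX 3090
RAM_GB = 128    # system RAM from the comment above

# 27B model: assumed ~4.5 bits/weight vs. unquantized 16-bit weights
print("27B @ 4.5 bpw:", quant_size_gb(27, 4.5), "GB, fits in VRAM:",
      fits(27, 4.5, VRAM_GB))
print("27B @ 16 bpw:", quant_size_gb(27, 16), "GB, fits in VRAM:",
      fits(27, 16, VRAM_GB))

# 300B model: assumed ~3.125 bits/weight, spread across RAM + VRAM
print("300B @ 3.125 bpw:", quant_size_gb(300, 3.125), "GB, fits in RAM+VRAM:",
      fits(300, 3.125, RAM_GB + VRAM_GB))
```

Under these assumptions the 27B model only fits on a 24 GB card once it is quantized, and the 300B model only fits when RAM and VRAM are pooled, which is why the backend and quant choice matter so much. Actual file sizes vary with the quant mix and context length.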
    
    scrubbles@poptalk.scrubbles.tech 3 weeks ago
    I’ll have to play around with mine then, because I haven’t had great luck with it; the results have been disappointing. The CPU offloading is fairly slow, but maybe I should try tweaking it more.
    