Comment on Meet the AI workers who tell their friends and family to stay away from AI

Bloefz@lemmy.world 2 hours ago

Thank you so much!! I've been putting it off because what I have works, but a time will soon come when I'll want to test new models.

I'm looking for a server, but without many parallel calls, because I'd like to use as much context as I can. When you make room for e.g. 4 parallel slots, the context gets split, so each slot only gets a quarter of it. With Llama 3.1 8B I managed to get a 47104-token context on the 16GB card (though actually using that much is pretty slow). That's with the KV cache quantized to 8-bit too. But sometimes I just need that much.

I've never tried llama.cpp directly, thanks for the tip!
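
From a quick look at the llama.cpp server README, the context-vs-slots trade-off above seems to come down to a couple of flags. Something like this is what I have in mind, untested on my side, with the model filename just a placeholder, so treat the exact flag names as approximate:

```
# One slot (-np 1) keeps the whole 47104-token context in a single sequence;
# -np 4 would split the same budget into four ~11776-token slots instead.
./llama-server -m llama-3.1-8b-instruct-q4_k_m.gguf \
  -c 47104 -np 1 -ngl 99 \
  --flash-attn \
  --cache-type-k q8_0 --cache-type-v q8_0   # 8-bit KV cache (V quant seems to need flash attention)
```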

Kobold sounds good too, but I have some scripts that talk to my current server directly, so I'll read up on whether it can handle that. I don't have time now, but I'll do it in the coming days. Thank you!
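
In case it helps anyone else reading along: as far as I can tell, both llama-server and KoboldCpp expose an OpenAI-style chat endpoint, so my scripts would mostly just need the base URL changed. Something like this should work against either (the ports are just the defaults I've seen mentioned, not something I've verified):

```
# llama-server listens on 8080 by default; KoboldCpp reportedly on 5001
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 128}'
```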
