Comment on Llama 3.1 AI Models Have Officially Released
brucethemoose@lemmy.world 3 months ago
I am looking into doing it on the 12B for myself TBH, not so much for RP but novel-style prose.
admin@lemmy.my-box.dev 3 months ago
Ah, that’s a wonderful use case. One of my favourite models has a storytelling lora applied to it, maybe that would be useful to you too?
At any rate, if you’d end up publishing your model, I’d love to hear about it.
brucethemoose@lemmy.world 3 months ago
Oh, my friend, you have to switch to this: huggingface.co/BeaverAI/mistral-doryV2-12b
It’s so much smarter than Llama 13B. And it goes all the way out to 128K!
admin@lemmy.my-box.dev 3 months ago
Oof - not on my 12 GB 3060 it doesn’t :/ Even at 48k context and Q4_K quantization, ollama is doing a lot of offloading to the CPU. What kind of hardware are you running it on?
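(For anyone following along, a minimal sketch of that kind of setup through the ollama Python client; the model tag, prompt, and num_ctx value are placeholders, not the commenter’s exact configuration. Lowering num_ctx shrinks the KV cache, which is the usual way to keep everything on a 12 GB card instead of offloading layers to the CPU.)

```python
# Hedged sketch: querying a local GGUF quant via the ollama Python client.
# The model tag and num_ctx are placeholders for illustration only.
import ollama

response = ollama.chat(
    model="mistral-dory:12b-q4_K_M",  # hypothetical local tag created with `ollama create`
    messages=[{"role": "user", "content": "Write the opening scene of a novel."}],
    options={"num_ctx": 16384},       # smaller context window -> smaller KV cache, less CPU offload
)
print(response["message"]["content"])
```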
brucethemoose@lemmy.world 3 months ago
A 3090.
But it should be fine on a 3060
Dump ollama for long context. Grab a 6bpw exl2 quantization and load it with Q4 or Q6 cache depending on how much context you want. I personally use EXUI, but text-gen-webui and tabbyapi (with some other frontend) will also load them.
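(Roughly what that looks like through the exllamav2 Python API, which is what tabbyapi and EXUI build on. This is a hedged sketch: the model path and context length are placeholders, and the exact API can vary between exllamav2 versions.)

```python
# Rough sketch, assuming a recent exllamav2: load a 6bpw exl2 quant with a
# quantized Q4 KV cache so a long context fits in less VRAM.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/models/mistral-doryV2-12b-6.0bpw-exl2"   # hypothetical local path
config = ExLlamaV2Config(model_dir)

model = ExLlamaV2(config)
# Q4 cache keeps the KV cache in 4 bits, cutting its memory footprint vs FP16
cache = ExLlamaV2Cache_Q4(model, max_seq_len=49152, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="Once upon a time,", max_new_tokens=200, add_bos=True))
```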