Comment on Llama 3.1 AI Models Have Officially Released
brucethemoose@lemmy.world 3 months ago
I am looking into doing it on the 12B for myself TBH, not so much for RP but novel-style prose.
admin@lemmy.my-box.dev 3 months ago
Ah, that’s a wonderful use case. One of my favourite models has a storytelling lora applied to it, maybe that would be useful to you too?
At any rate, if you’d end up publishing your model, I’d love to hear about it.
brucethemoose@lemmy.world 3 months ago
Oh, my friend, you have to switch to this: huggingface.co/BeaverAI/mistral-doryV2-12b
It’s so much smarter than Llama 13B. And it goes all the way out to 128K!
admin@lemmy.my-box.dev 3 months ago
Oof - not on my 12 GB 3060 it doesn’t :/ Even at 48k context and Q4_K quantization, ollama is doing a lot of offloading to the CPU. What kind of hardware are you running it on?
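(For anyone following along, a minimal sketch of that kind of setup through the ollama Python client; the model tag, prompt, and num_ctx value are placeholders, not the commenter’s exact configuration. Lowering num_ctx shrinks the KV cache, which is the usual way to keep everything on a 12 GB card instead of offloading layers to the CPU.)

```python
# Hedged sketch: querying a local GGUF quant via the ollama Python client.
# The model tag and num_ctx are placeholders for illustration only.
import ollama

response = ollama.chat(
    model="mistral-dory:12b-q4_K_M",  # hypothetical local tag created with `ollama create`
    messages=[{"role": "user", "content": "Write the opening scene of a novel."}],
    options={"num_ctx": 16384},       # smaller context window -> smaller KV cache, less CPU offload
)
print(response["message"]["content"])
```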
brucethemoose@lemmy.world 3 months ago
A 3090.
But it should be fine on a 3060
Dump ollama for long context. Grab a 6bpw exl2 quantization and load it with Q4 or Q6 cache depending on how much context you want. I personally use EXUI, but text-gen-webui and tabbyapi (with some other frontend) will also load them.
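(Roughly what that looks like through the exllamav2 Python API, which is what tabbyapi and EXUI build on. This is a hedged sketch: the model path and context length are placeholders, and the exact API can vary between exllamav2 versions.)

```python
# Rough sketch, assuming a recent exllamav2: load a 6bpw exl2 quant with a
# quantized Q4 KV cache so a long context fits in less VRAM.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/models/mistral-doryV2-12b-6.0bpw-exl2"   # hypothetical local path
config = ExLlamaV2Config(model_dir)

model = ExLlamaV2(config)
# Q4 cache keeps the KV cache in 4 bits, cutting its memory footprint vs FP16
cache = ExLlamaV2Cache_Q4(model, max_seq_len=49152, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="Once upon a time,", max_new_tokens=200, add_bos=True))
```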