I had never heard of Kobold AI. I was going to self-host Ollama and try that, but I’ll take a look at Kobold. I’d never heard about controls for world-building and dialogue triggers either; there’s a lot to learn.
Will more VRAM solve the problem of not retaining context? Can I throw 48GB of VRAM towards an 8B model to help it remember stuff?
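(Partly answering my own question from reading around: as I understand it, more VRAM doesn’t make a model “remember” by itself — the context window is a property of the model and the runtime setting — but it does let you crank that setting up, because the KV cache grows linearly with context length. A back-of-the-envelope sketch; the Llama-3-8B-style numbers here are my assumptions, not measurements:)

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # 2 tensors (K and V) cached per layer, one entry per token in context
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Assumed Llama-3-8B-ish config: 32 layers, 8 KV heads (GQA), head dim 128, fp16 cache
gib = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"{gib:.2f} GiB")  # ~1 GiB of cache at 8k context, on top of the weights
```

So 48GB would leave a ton of headroom for long contexts on an 8B model, assuming the model itself supports them.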
Yes, I’m looking at image generation (Stable Diffusion) too. Thanks!
fishynoob@infosec.pub 1 week ago
Thanks for the edit. That’s a very intriguing idea: a second LLM running in the background, keeping a summary of the conversation plus the static context, could improve things a lot. I don’t know if anyone has implemented it, or how one could DIY it with Kobold/Ollama.