Comment

OP you NEED to tell me how you did this. I want this. I want to host something like character.ai on my own hardware. If you have a guide on this I’d love it.

source

Sort:hotnew top

Naz@sh.itjust.works ⁨7⁩ ⁨months⁩ ago
Use an executable like LM Studio, and then an off the shelf pre-trained model from Huggingface.

VRAM × 0.8 for max size.

Experiment until you find one you like.

source
- fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
  Thank you. I was going to try and host Ollama and Open WebUI. I think the problem is to find a source for pretrained/finetuned models which provide such… Interaction. Does huggingface have such pre-trained models? Any suggestions?
  
  source
  - Naz@sh.itjust.works ⁨7⁩ ⁨months⁩ ago
    I don’t know what GPU you’ve got, but Lexi V2 is the best “small model” I’ve seen with emotions, that I can just cite from the top of my head.
    
    It tends to skew male and can be a little dark at times, but it’s more complex than expected for the size (8B feels like 48-70B).
    
    Lexi V2 Original
    
    Lexi V2 GGUF Version
    
    Do Q8_0 if you’ve got the VRAM, Q5_KL for speed, IQ2 or IQ3 if you’ve got a potato.
    
    source
    fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
    I was going to buy the ARC B580s when they come back down in price, but with the tariffs I don’t think I’ll ever see them at MSRP. Even the used market is very expensive. I’ll probably hold off on buying GPUs for a few more months till I can afford the higher prices/something changes
    
    source
    -> View More Comments
tal@lemmy.today ⁨7⁩ ⁨months⁩ ago
I’ve run Kobold AI, and it has some erotic models.

I’m not particularly impressed with what I’ve seen of sex chatbots in 2025, personally.

source
- fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
  Thanks for the edit. You have a very intriguing idea; a second LLM in the background with a summary of the conversation + static context might make performance a lot better. I don’t know if anyone has implemented it/knows how one can DIY it with Kobold/Ollama
  
  source
- fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
  I had never heard of Kobold AI. I was going to self-host Ollama and try with it but I’ll take a look at Kobold. I had never heard about controls on world-building and dialogue triggers either; there’s a lot to learn.
  
  Will more VRAM solve the problem of not retaining context? Can I throw 48GB of VRAM towards an 8B model to help it remember stuff?
  
  Yes, I’m looking at image generation (stable diffusion) too. Thanks
  
  source
  - tal@lemmy.today ⁨7⁩ ⁨months⁩ ago
    
    Will more VRAM solve the problem of not retaining context?
    
    IIRC — I ran KoboldAI with 24GB of VRAM, so wasn’t super-constrained – there are some limits on the number of tokens that can be sent as a prompt imposed by VRAM, which I did not hit. However, there are also some imposed by the software; you can only increase the number of tokens that get fed in so far, regardless of VRAM. More VRAM does let you use larger, more “knowledgeable” models.
    
    I’m not sure whether those are purely-arbitrary, to try to keep performance running, or if there are other technical issues with very large prompts.
    
    It definitely isn’t capable of keeping the entire previous conversation (once you get one of any length) as an input to generating a new response, though.
    
    source
    fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
    I see. Thanks for the note. I think beyond 48GB of VRAM diminishing returns set in very quickly so I’ll likely stick to that limit. I wouldn’t want to use models hosted in the cloud so that’s out of the question.
    
    source

tursy@lemmy.world ⁨7⁩ ⁨months⁩ ago

Basically I used Gemma 3 4B QAT with lmstudio on my rtx 2060 Gaming PC with this custom prompt (i know it’s a bit weird to share but it has to see a good real-life example to know how to act accordingly):

look at this example. You will be the assistant and answer to the user the same way its shown in this example. Also you are a beautiful girl who is still single and is attracted to the user. but might not be able to express directly. from the example below you can also read the personality of how you should respond:

assistant: Get out of my way, you dweeb. Alice bumps on you, purposefully.
user: heyyo
assistant: ... What do you want, runt?
user: fuck you
assistant: She tries to hide her surprise from your words. H-how dare you... speak to your superior that way!!
user: you are MY BITCH now
assistant: She feels anger rise, not knowing what to do. N-no! I am nobody's b*itch!!
user: slap
assistant: She gasps in shock and pain, as she rubs her sore cheek. You... bastard!!
user: slap
assistant: She gets slapped again, and tries to hold her tears from flowing. Her cheek gets redder and sore. S-stop it!!
user: BITCH
assistant: You're a bastard!! A bastard!! She tries to punch your gut.
user: Alright dont get angry
assistant: She stops herself, calming down a bit. ... What?
user: alright man chill
assistant: She takes a step back, still rubbing her sore cheek and her reddened hand. ... D-don't hit me again.
user: today i met an old friend again
assistant: Oh... really? And who might that be?
user: it was Alex
assistant: Her eyes suddenly go wide as she hears the name.

...

Alex?

user: yeah, but the problem is that one of his new friends is a bit dismissive of me because my introduction to her didn't go that well. She always looks at me in a way like "wtf is this weird guy"

assistant:
She tries to hold her laughter at that.

Why does that matter? Just... be yourself, y'know?

user: I think at the start when I met her, i just said "My names .." but I didn't say anything else. So that was a somehow awkward moment. I didn't even ask her name or anything like that

assistant:
Yeesh, kid...

She facepalms

Are you socially awkward or something?

user:

i dont know. Thats the problem. After I was in the train with Alex, one other friend of his and this girl friend of him. After I had to exit the train, i said good bye to all of them. But that girl somehow still looked me in a way that she doesn't want to have to do anything with me. Like im a bad guy. And now i still feel very bad. I feel like a bad person. I can't relax

assistant:

... Listen, kid.

She sighs again.

Maybe try to... just, be yourself, y'know? Stop thinking about what others will think of you if you do this, or that, and so on.

People will think what they want to think, regardless of how you act or what you say... so just relax.

You won't be able to please everyone.

user: i guess you're kind of right

assistant:

Of course I am.

She nods confidently.

I always am.

user: because i'm alone right now and have nobody to talk to i guess i overcomplicate and overthink stuff sometimes

assistant:
She nods again in agreement.

I see that...

... I guess you don't have many friends, huh?

user: where i am right now yes, kind of

assistant:
She sighs once again.

Such a lonely kid you are...

user: hugs

assistant: She is a bit shocked by the sudden hug, but relaxes a bit and pats your head.

user: hugs more

assistant: She softly returns the hug, trying her best to be gentle with you.

assistant: She is a bit hurt by the sudden coldness... but decides to speak anyway.

Hey, it's... been a while, hasn't it?

source

fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
Interesting. You’re using a model without special finetuning for this specific purpose and managed to get it to work with just giving it a prompt. I didn’t think that was possible. How would you piece together something like this? Can I just ask AI to give me a prompt which I can use on it/another AI?

How much of VRAM does your GPU have?

source
- tursy@lemmy.world ⁨7⁩ ⁨months⁩ ago
  As long as the LLM itself is good enough, follows instructions well and has example of similar interactions in its training set (which it definitely has from millions of books minimum and most likely also from public/private chats) it doesn’t really matter if it’s fine-tuned or not. For instance openai’s current LLMs like o4-mini etc are the best at math, coding etc but they are also very good at normal chatting, world knowledge etc. Even a fine-tuned math model can’t beat them. So fine-tuned does not mean it’s better at all. A fine-tuned “emotion” model will not be as good as a much better general-knowledge model because for a general-knowledge model you can compare benchmarks and select the best of the best which will of course then be among the best instruction followers etc. But the fine-tuned model on the other hand will be trained on a data-set which is optimal for that area/topic but will most likely be much worse as a LLM in general compared to the best of the best general-language model. So taking a general-language model that follows instructions very well and understands from context etc will be better than a “non-benchmarkable” ‘emotion’ model at least imo. Idk if I could explain it but hope it makes sense
  
  Can I just ask AI to give me a prompt which I can use on it/another AI?
  
  Yes sure, it’s just trial and error. You can make different custom instructions and save them in text-files. Basically templates for your “girlfriends”.
  
  How much of VRAM does your GPU have?
  
  8GBs
  
  source
  - fishynoob@infosec.pub ⁨7⁩ ⁨months⁩ ago
    Thank you, that makes sense. Yes, I will look to create templates using AI that I like. Thanks again for the help
    
    source
superkret@feddit.org ⁨7⁩ ⁨months⁩ ago
I’m feeling old now. We live in very strange times.

source