
Comment on Elon Musk’s Grok Goes Haywire, Boasts About Billionaire’s Pee-Drinking Skills and ‘Blowjob Prowess’

Thanks for the recommendation, I’ll look into GLM Air, I haven’t looked into the current state of the art for self-hosting in a while.

I just use this model to translate natural language into JSON commands for my home automation system. I probably don’t need a reasoning model, but it doesn’t need to be super quick. A typical query uses very few tokens (like 3-4 keys in JSON).
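A minimal sketch of what the receiving end of that could look like, assuming the model's reply is plain text with one JSON object somewhere in it. The key names (`device`, `action`, `value`, `room`) and the allowed actions are made up for illustration; the real schema isn't given here.

```python
import json

# Hypothetical command schema: the actual keys are not listed in this thread.
REQUIRED_KEYS = {"device", "action"}
OPTIONAL_KEYS = {"value", "room"}
ALLOWED_ACTIONS = {"on", "off", "set"}


def parse_command(model_output: str) -> dict:
    """Extract and validate one JSON command from raw model text."""
    # Small models often wrap the JSON in chatter, so slice from the
    # first "{" to the last "}" before parsing.
    start = model_output.find("{")
    end = model_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    # json.JSONDecodeError is a subclass of ValueError, so callers only
    # need to catch ValueError.
    command = json.loads(model_output[start:end + 1])

    keys = set(command)
    missing = REQUIRED_KEYS - keys
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    if missing or unknown:
        raise ValueError(
            f"bad keys: missing={sorted(missing)}, unknown={sorted(unknown)}"
        )
    if command["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {command['action']!r}")
    return command


reply = 'Sure! {"device": "living_room_lamp", "action": "set", "value": 40}'
print(parse_command(reply))
```

Rejecting unknown keys and actions up front means a hallucinated command fails loudly instead of reaching the automation system.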

The next project will be some kind of agent. A ‘go and Google this and summarize the results’ agent at first. I haven’t messed around much with MCP Servers or Agents (other than for coding). The image models I’m using are probably pretty dated too, they’re all variants of SDXL and I stopped messing with ComfyUI before video generation was possible locally, so I gotta grab another few hundred GB of models.

It’s a lot to keep up with.😮‍💨


brucethemoose@lemmy.world 5 months ago

It’s a lot to keep up with

Massive understatement!

The next project will be some kind of agent. A ‘go and Google this and summarize the results’

Yeah, you do want more contextual intelligence than an 8B for this.

The image models I’m using are probably pretty dated too

Actually SDXL is still used a lot! Especially for the anime stuff. It just got so much finetuning and tooling piled on.

- FauxLiving@lemmy.world 5 months ago
  
  Yeah, you do want more contextual intelligence than an 8B for this.
  
  Oh yeah, I’m sure. I may peek at it this weekend. I’m trying to decide if Santa is going to bring me a new graphics card, so I need to see what the price:performance curve looks like.
  
  Massive understatement!
  
  I think I stopped actively using image generation a little bit after LoRAs and IP Adapters were invented. I was trying to edit a video (random meme gif) to change the people in the meme to have the faces of my family, but it was very hard to have consistency between frames. Since there is generated video, it seems like someone solved this problem.
  
  - brucethemoose@lemmy.world 5 months ago
    
    Since there is generated video, it seems like someone solved this problem.
    
    Oh yes, it has come a LOONG way. Some projects to look at are:
    
    github.com/ModelTC/LightX2V
    
    github.com/deepbeepmeep/Wan2GP
    
    And for images: github.com/nunchaku-tech/nunchaku
    
    I dunno what card you have now, but hybrid CPU+GPU inference is the trend these days.
    
    As an example, I can run GLM 4.6, a 350B LLM, with measurably low quantization distortion on a 3090 + 128GB CPU RAM, at like 7 tokens/s.
    
    You can easily run GLM Air on like a 3080 + system RAM, or even a lesser GPU. You just need the right software and quant.
    
    FauxLiving@lemmy.world 5 months ago
    
    Thanks a ton, saves me having to navigate the slopped up search results (‘AI’ as a search term is SEOd to death and back a few times).
    
    I dunno what card you have now, but hybrid CPU+GPU inference is the trend these days.
    
    That system has the 3080 12GB and 64GB RAM but I have another 2 slots so I could go up to 128GB. I don’t doubt that there’s a GLM quant model that’ll work.
    
    Is ollama for hosting the models and LM Studio for chatbot work still the way to go? Doesn’t seem like there’s much to improve in that area once there’s software that does the thing.
    