Llama 3 8B can be run in 6GB of VRAM, and it’s fairly competent. Gemma has a 9B I think, which would also be worth looking into.
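For anyone curious what that looks like in practice, here’s a minimal sketch using llama-cpp-python with a 4-bit GGUF quant (which is roughly what fits in 6GB); the model filename is just a placeholder, not a specific download:

```python
# Sketch: run a quantized Llama 3 8B locally with llama-cpp-python.
# Assumes a ~4-5GB Q4 GGUF file on disk; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=-1,  # offload all layers to the GPU (a Q4 8B fits in ~6GB)
    n_ctx=4096,       # context window; bigger contexts eat more VRAM
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why run an LLM locally?"}]
)
print(resp["choices"][0]["message"]["content"])
```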
Comment on “Self-Hosted AI is pretty darn cool”
coffee_with_cream@sh.itjust.works 2 months ago
You probably want 48GB of VRAM or more to run the good stuff. I recommend renting GPU time instead of using your own hardware, via AWS or other vendors - runpod.io is pretty good.
theterrasque@infosec.pub 2 months ago
31337@sh.itjust.works 2 months ago
IDK, looks like 48GB cloud pricing would be $0.35/hr => $255/month. Used 3090s go for $700. Two 3090s would give you 48GB of VRAM, and cost $1400 (I’m assuming you can do “model-parallel” with Llama; never tried running an LLM, but it should be possible and work well). So, the break-even point would be <6 months. Hmm, but if Serverless works well, that could be pretty cheap. Would probably take a few minutes to process and load a ~48GB model every cold start though?
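To make the arithmetic explicit, here’s the back-of-the-envelope version; the $0.35/hr and $700-per-3090 figures are the ones quoted above, not current market prices:

```python
# Back-of-the-envelope break-even: renting a 48GB GPU vs. buying two used 3090s.
# Figures are the ones quoted in the comment above, not authoritative prices.
CLOUD_RATE_PER_HOUR = 0.35        # $/hr for ~48GB of VRAM in the cloud
HOURS_PER_MONTH = 24 * 30         # assuming the instance runs continuously

cloud_monthly = CLOUD_RATE_PER_HOUR * HOURS_PER_MONTH   # ~$252/month
used_3090_price = 700
local_hardware = 2 * used_3090_price                    # $1400 for 48GB total

break_even_months = local_hardware / cloud_monthly
print(f"Cloud: ~${cloud_monthly:.0f}/month, hardware: ${local_hardware}")
print(f"Break-even after ~{break_even_months:.1f} months")  # ~5.6 months
```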
fhein@lemmy.world 2 months ago
Assuming they already own a PC, if someone buys two 3090s for it they’ll probably also have to upgrade their PSU, so that might be worth including in the budget. But it’s definitely a relatively low-cost way to get more VRAM; there are people who run 3 or 4 RTX 3090s too.
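As a rough power-budget sketch: assuming the stock 350W TDP per 3090, a ballpark 200W for the rest of the system, and ~30% headroom for transient spikes (the last two numbers are assumptions, not measurements):

```python
# Rough PSU sizing for a multi-3090 build.
# 350W is the stock RTX 3090 TDP; the 200W "rest of system" figure and the
# 30% headroom are ballpark assumptions, not measurements.
GPU_TDP_W = 350
REST_OF_SYSTEM_W = 200
HEADROOM = 1.3  # leave ~30% margin for transient spikes

def recommended_psu_watts(num_gpus: int) -> int:
    peak_draw = num_gpus * GPU_TDP_W + REST_OF_SYSTEM_W
    return int(peak_draw * HEADROOM)

for gpus in (2, 3, 4):
    print(f"{gpus}x 3090: ~{recommended_psu_watts(gpus)}W PSU recommended")
# 2x -> ~1170W, 3x -> ~1625W, 4x -> ~2080W
```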
NotMyOldRedditName@lemmy.world 2 months ago
Kinda defeats the purpose of doing it private and local.
I wouldn’t trust any claims a 3rd party service makes with regards to being private.