Comment on Self-Hosted AI is pretty darn cool

CallMeButtLove@lemmy.world 3 months ago
Is there a way to host an LLM in a docker container on my home server but still leverage the GPU on my main PC?

LodeMike@lemmy.today 3 months ago
No?

azl@lemmy.sdf.org 3 months ago
You would need to run the LLM on the system that has the GPU (your main PC). The front-end (typically a WebUI) could run in a docker container and make API calls to your LLM system. Unfortunately, that requires the model to stay loaded in VRAM on your main PC, severely limiting what else you can do with that computer, GPU-wise.
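
As a minimal sketch of the split azl describes: the LLM backend runs on the GPU machine, and the containerized front-end (or any other client on the home server) reaches it over HTTP on the LAN. The backend choice (Ollama), its default port 11434, the LAN IP, and the model name are assumptions for illustration, not details from the thread.

```python
import requests

# Assumed address of the main PC running the LLM backend (Ollama's default
# port is 11434); replace with your actual LAN IP and installed model.
LLM_HOST = "http://192.168.1.50:11434"

def ask(prompt: str) -> str:
    """Send a prompt to the remote LLM over its HTTP API and return the reply."""
    resp = requests.post(
        f"{LLM_HOST}/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Any container on the home server can make the same call, provided the
    # main PC's firewall allows inbound connections on the backend's port.
    print(ask("Say hello from the GPU box."))
```

A web front-end that speaks an Ollama- or OpenAI-compatible API could be pointed at the same URL instead of a script like this; the constraint azl raises still applies either way, since the model occupies the main PC's VRAM whenever the backend has it loaded.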