Comment on "What’s Your Preferred Self-Hosted Solution for Deep Monitoring (Beyond Simple Page Changes)?"
xyro@lemmy.ca 4 days ago
Do you send the result of the diff to an Ollama instance? I would be curious to see the pipeline 😇
alfablend@lemmy.world 3 days ago
@xyro Ah, I see! I’m not using Ollama at the moment — my setup is based on GPT4All with a locally hosted DeepSeek model, which handles the semantic parsing directly.
As mentioned earlier, the pipeline doesn’t just diff pages — it detects new document URLs from the source feed (via selectors), downloads them, and generates structured summaries. Here’s a snippet from the YAML config to illustrate how that works:
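For anyone who wants to picture the flow in code rather than config, here is a rough Python sketch of the same steps (detect new document URLs, download, summarize). It is not the actual implementation: the function names, the `doc` link class, and the `download` / `summarize` callables are all hypothetical placeholders, and the "selector" is reduced to a plain class match on `<a>` tags so the example stays within the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that carry a given CSS class.

    This is a simplified stand-in for a real CSS selector engine.
    """

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.link_class in classes:
            self.hrefs.append(href)


def find_new_documents(feed_html, base_url, link_class, seen):
    """Return absolute document URLs from the feed that are not in `seen`."""
    collector = LinkCollector(link_class)
    collector.feed(feed_html)
    fresh = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if url not in seen and url not in fresh:
            fresh.append(url)
    return fresh


def process_feed(feed_html, base_url, link_class, seen, download, summarize):
    """Download and summarize every new document; returns {url: summary}.

    `download` and `summarize` are hypothetical callables supplied by the
    caller (e.g. an HTTP fetch and a call into a local LLM).
    """
    results = {}
    for url in find_new_documents(feed_html, base_url, link_class, seen):
        text = download(url)
        results[url] = summarize(text)
        seen.add(url)  # remember the URL so it is skipped on the next run
    return results
```

In a real setup `summarize` would wrap the local model call and `seen` would be persisted between runs (a file or small database), but those parts depend on your environment.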
To keep things efficient, I also support regex-based extraction before passing content to the LLM. That way, I can isolate relevant blocks (e.g. addresses, client names, conclusions) and reduce the noise in the prompt. Example from another config:
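The regex step could look roughly like this in Python. Again, this is only a sketch under assumptions: the field labels (`Address:`, `Client:`, `Conclusion:`), the pattern names, and the prompt wording are made up for illustration and would need to match whatever your source documents actually contain.

```python
import re

# Hypothetical patterns: each captures the rest of a labelled line.
PATTERNS = {
    "address": re.compile(r"^Address:\s*(.+)$", re.MULTILINE),
    "client": re.compile(r"^Client:\s*(.+)$", re.MULTILINE),
    "conclusion": re.compile(r"^Conclusion:\s*(.+)$", re.MULTILINE),
}


def extract_blocks(text, patterns=PATTERNS):
    """Return {field_name: first matched value} for the patterns that match."""
    blocks = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            blocks[name] = match.group(1).strip()
    return blocks


def build_prompt(blocks):
    """Build a compact prompt from the extracted fields only."""
    lines = [f"{name}: {value}" for name, value in blocks.items()]
    return "Summarize the following fields:\n" + "\n".join(lines)
```

The point of the design is that the model only ever sees the extracted fields, so the prompt stays short even when the source document is long.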
Let me know if you’re experimenting with similar flows — I’d be happy to share templates or compare how DeepSeek performs on your sources!