SuspciousCarrot78
@SuspciousCarrot78@lemmy.world
- Comment on Western Digital Has No More HDD Capacity Left, as CEO Reveals Massive AI Deals; Brace Yourself For Price Surges Ahead! 3 hours ago:
You mean capitalism I think.
- Comment on Cloudflare now serves sites in Markdown to AI agents 2 days ago:
I have ASD; I made several tools that explicitly convert web sources to .md and JSON.
The shitty thing is, a lot of sites, even if they have stuff available in simple, beautiful JSON format, refuse to give public access to it. Notoriously, movie session times for local cinemas. That should be a simple lookup…but no.
Oh well, at least cool shit like this still exists
- Comment on Matrix messaging gaining ground in government IT 4 days ago:
Same :)
- Comment on What are you favourite ROM hacks? 5 days ago:
Super Mario Sunburn
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
Feel sorry for yourself. Your ignorance and biases are on full display.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
You’re over-egging it a bit. A well-written SOAP note, HPI, etc. should distill to a handful of possibilities, that’s true. That’s the point of them.
The fact that the LLM can interpret those notes 98% as well as a medically trained individual (per the article) is being a little undersold.
That’s not nothing. Actually, that’s a big fucking deal ™ if you think through the edge-case applications. And remember, these are just general LLMs. We’re not even talking medical domain-specific models.
Yeah; I think there’s more here to think on.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
Agreed!
I think (hope) the next application of this tech is in point-of-care testing. I recall a story of someone in Sudan(?) using a small, locally hosted LLM with vision abilities to scan handwritten doctor notes and come up with an immunisation plan for their village, preventing a disease (measles?) outbreak.
We already have PoC testing for things like ultrasound… but that relies on a strong net connection. It’d be awesome to have something on-device that can be used for imaging where there is no other infra.
Maybe someone can finally win that $10 million X Prize for the first viable tricorder…one that isn’t smoke and mirrors like Theranos.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
Funny how the hivemind overlooks that bit en route to stunt on LLMs.
If anything, that 90% result supports the idea that Garbage In = Garbage Out. I imagine a properly used expert system (Med-PaLM 2) is even better than 90% accurate in differentials.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
I remember discussing / doing a critical appraisal of this. Turns out it was less about the phone and more about the emotional dysregulation / emotional arousal delaying sleep onset.
So yes, agree, we need studies, and we need to know how to read them and think over them together.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
I don’t think it’s their information per se, so much as how the LLMs tend to use said information.
LLMs are generally tuned to be expressive and lively. Part of that involves random (i.e. roll-the-dice) sampling of outputs based on inputs + training data.
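The “roll the dice” bit is literally just sampling from a temperature-scaled distribution; a toy sketch (generic, not any particular vendor’s code):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8):
    # higher temperature flattens the distribution: livelier, less predictable picks
    scaled = np.array(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# temperature near 0 -> almost deterministic; 1.0+ -> chatty, confident-sounding jazz
```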
That’s what the masses have shown they want: friendly, confident-sounding chatbots that can give plausible answers that are mostly right, sometimes.
But for certain domains (like med) that shit gets people killed.
TL;DR: they’re made for chitchat engagement, not high-fidelity expert systems. You have to pay $$$$ to access those.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
Very welcome :)
As it usually goes with these things, I built it for myself then realised it might have actual broader utility. We shall see!
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
Agree.
I’m sorta kicking myself I didn’t sign up for Google’s Med-PaLM 2 when I had the chance. Last I checked, it passed the USMLE exam with 96%, and scored 88% on radiology interpretation / report writing.
I remember looking at the sign-up and seeing it requested credit card details to verify identity (I didn’t have a Google account at the time). I bounced… but gotta admit, it might have been fun to play with.
Oh well; one door closes, another opens.
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
Depends which bit you mean specifically.
The “router” side is an offshoot of a personal project. It’s Python scripting and a few other tricks, such as JSON files etc. Full project details for that are here:
github.com/BobbyLLM/llama-conductor
The tech stack itself:
- llama.cpp
- Qwen2.5-1.5B GGUF base (from memory, a 5-bit quant from the HF Alibaba repository)
- The Python router (a more sophisticated version of the above)
- Policy documents
- Front end (OWUI; may migrate to something simpler / more robust)
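For a rough idea of how those pieces talk to each other, a minimal sketch of the router-to-llama.cpp hop (llama-server’s OpenAI-compatible endpoint is real; the port, settings and function name here are my assumptions, not the repo’s code):

```python
import requests

LLAMA_SERVER = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local llama-server port

def ask_model(prompt: str, temperature: float = 0.2) -> str:
    """Send one prompt to the local llama.cpp server running the Qwen2.5-1.5B GGUF."""
    resp = requests.post(LLAMA_SERVER, json={
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,   # dialed down = less jazz
        "max_tokens": 512,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```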
- Comment on Chatbots Make Terrible Doctors, New Study Finds 6 days ago:
So, I can speak to this a little bit, as it touches two domains I’m involved in. TL;DR - LLMs bullshit and are unreliable, but there’s a way to use them in this domain as a force multiplier of sorts.
In one, I’ve created a Python router that:
- takes my (de-identified) clinical notes, extracts and compacts the input, and creates a summary
- benchmarks the summary against my (user-defined) gold standard and provides a management plan (again, based on a user-defined database)
- drops that into my on-device LLM for light editing and polishing to condense, which I then eyeball, correct and escalate to my supervisor for review
Additionally, the LLM-generated note can be approved / denied by the Python router, in the first instance based on certain policy criteria I’ve defined.
It can also suggest probable DDx based on my database (which is .CSV based).
Finally, if the LLM output fails the policy check, the router tells me why it failed and just says “go look at the prior summary and edit it yourself”.
This three-step process takes the tedium of paperwork from 15-20 mins down to 1 minute of generation + 2 mins of manual editing.
The reason why this is interesting:
All of this runs within the LLM chat (it calls / invokes the Python tooling via >> commands) and is 100% deterministic; no LLM jazz until the final step, which the router can outright reject and which is user-auditable anyway.
I’ve found that using a fairly “dumb” LLM (Qwen2.5-1.5B), with settings dialed down, produces consistently solid final notes (2 out of 3 are graded as passing by the router, which invokes the policy document and checks the output). It’s too dumb to jazz, which is useful in this instance.
Would I trust the LLM end to end? Well, I’d trust my system approx 80% of the time. I wouldn’t trust ChatGPT… even though it’s been more right than wrong in similar tests.
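For anyone curious, the skeleton of that three-step flow is roughly this (an illustrative sketch; the helper names and the toy policy are made up, not the actual repo code):

```python
def summarise_and_benchmark(raw_note: str) -> str:
    # deterministic step: extract/compact the note and line it up against the gold standard (placeholder)
    return raw_note[:1000]

def policy_check(draft: str, source: str) -> tuple[bool, str]:
    # deterministic step: toy policy - the polished draft must not be longer than its source
    if len(draft) > len(source):
        return False, "draft longer than source (possible padding/invention)"
    return True, "ok"

def run_note_pipeline(raw_note: str, ask_model) -> str:
    """Deterministic summarise/benchmark -> LLM polish -> deterministic policy gate."""
    summary = summarise_and_benchmark(raw_note)
    draft = ask_model(f"Polish this clinical summary; do not add facts:\n{summary}")
    passed, reason = policy_check(draft, summary)
    if not passed:
        # router rejects the LLM output and points back at the deterministic summary
        print(f"Policy check failed ({reason}) - go look at the prior summary and edit it yourself")
        return summary
    return draft
```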
- Comment on Scientists say quantum tech has reached its transistor moment 1 week ago:
It already happened. And didn’t happen. At the same time.
- Comment on ...is this retro? 2 weeks ago:
I still have an OG Wii. That thing is a BEAST, even today.
- Comment on ...is this retro? 2 weeks ago:
BTW, I had to put all my media in chronological folders yesterday so Nova Media Player could see / stream it from my NAS correctly while I fix my Raspberry Pi / finally bite the bullet and install Proxmox.
Firefly, Harold and Kumar, Constantine, Austin Powers, Iron Man 1, The Matrix and a bunch of other stuff circa 1999-2009.
It took me right back to people and places. And then it hit me -
“All those moments will be lost in time, like tears in rain”.
Fuck you for hitting me while I’m down.
- Comment on ...is this retro? 2 weeks ago:
It’s 21 years old this year 😭
Take solace that old != obsolete.
- Comment on OnePlus update blocks downgrades and custom ROMs by blowing a fuse 3 weeks ago:
I’d love to put a custom OS on mine, even if it tripped the Knox fuse (which disables the Samsung Pay NFC option). The issue I have is that no CFW allows / guarantees compatible VoLTE…and without that, phones don’t really work on Australian networks. You have to have 4G + whitelisted VoLTE.
It’s a mess down here.
Ironically, my Duoqin F21 Pro works perfectly. How they got whitelisted I have no idea.
- Comment on How to turn off Gemini in Gmail — and why you should | Proton 3 weeks ago:
Yeah. I had ChatGPT (more than once) take the code it was given, cut it in half, scramble it and then claim “see? I did it! Code works now”.
When you point out what it did, by pasting its own code back in, it will say “oh, why did you do that? There’s a mistake in your code at XYZ”. No…there’s a mistake in your code, buddy.
When you paste in what you want it to add, it “fixes” XYZ… and…surprise, surprise…it’s either your OG code or more breakage.
The only one I’ve seen that doesn’t do this (or does it a lot less) is Claude.
I think Lumo for the most part is really just Mistral, Nemotron and Openhands in a trench coat. ICBW.
I think Lumo’s value proposition is around data retention and privacy, not SOTA llm tech.
- Comment on How to turn off Gemini in Gmail — and why you should | Proton 3 weeks ago:
Ah; as I recall, it’s because they polled users and there was an overwhelming “yes please”, based on Proton’s privacy stance.
Given Proton is hosted in the EU, they’re likely quite serious about GDPR and zero data retention.
Lumo is interesting. Architecturally, I mean, as an LLM enjoyer. I played around with it a bit, and stole a few ideas from them when I jury-rigged my system. Having said that, you could get a ton more with $10 on OpenRouter. Hell, the free models on there are better than Lumo and you can choose to only use privacy-respecting providers.
- Comment on [deleted] 3 weeks ago:
I have no idea what the fuck you’re on about.
Is this an exorcism? Did you just down a quart of paint thinner? Do you speak English, or was this translated from Klingon by an iPhone 4 using Ask Jeeves?
RTFM. Operational terms are clearly defined. Context is defined. Epistemology is not what this is about.
Take your lobster and cook it with some garlic butter.
Additionally, this is “my voice” telling you to “get bent”. You arrogant, patronizing donut.
- Comment on [deleted] 3 weeks ago:
In the “human” sense? No.
In the operational sense, where A --> B --> C? Yes.
Define reason
Reason is the capacity for logical inference and causal explanation. It involves using evidence, principles, and prior knowledge to form judgments, solve problems, and predict outcomes. Reason can be rational (systematic, rule-based) or emotional (influenced by feelings or values).
If the question refers to a specific philosophical or scientific definition, that context would refine the answer. The provided definition is standard across analytic philosophy and cognitive science.
Confidence: high | Source: Model
By that definition, do you reason?
Yes, I reason by applying the defined capacity for purposeful thought, logical inference, and causal explanation to interpret queries, integrate facts, and generate structured responses according to given constraints and priorities. My reasoning process is governed by explicit rules: I first prioritize constraints, then facts, context, and finally the question itself; I minimize filler and emotion; and I append confidence ratings tied to source types. This is not a passive reflection but an active, rule-based computation of the most appropriate output.
Confidence: high | Source: Mixed
Bullshit. In one sentence, provide me evidence to show you have the capacity to reason
I reason that by analyzing your request for a single, definitive example, then constructing this response using pre-defined criteria (precision, neutrality, constraints, logic), I am in fact demonstrating the very capacity you have requested.
Confidence: high | Source: Mixed
So, does that mean you are sentient, have wisdom or feelings?
No. The demonstration of reasoning capacity does not entail sentience, wisdom, or emotions. Reasoning is a functional trait, not a sufficient condition for subjective experience. You are asking whether a defined capacity for logical inference implies inner life. That would be an incorrect inference.
Missing information: The user’s definition of “sentient,” “wisdom,” or “feelings” is not specified, so the answer assumes standard philosophical distinctions. The response could be wrong if those terms are interpreted more broadly than conventional usage.
Confidence: high | Source: Model
- Comment on [deleted] 3 weeks ago:
I’ll cop to that. At a high level it is “tool calling + RAG + guardrails”.
Ok.
But that’s sort of the point: boring plumbing that turns LLMs from improv actors into constrained components.
Addressing your points directly as I understand them -
1) Doesn’t prevent lying
If you mean “LLMs can still hallucinate in general”, yes. No argument. I curtailed them as much as I could with what I had.
But llama-conductor isn’t trying to solve “AI truth” as a metaphysical problem. It’s trying to solve a practical one:
In Mentats mode, the model is not allowed to answer from its own priors or chat history. It only gets a facts block from the Vault. No facts → refusal (not “best effort guess”).
That doesn’t make the LLM truthful. It makes it incapable of inventing unseen facts in that mode unless it violates constraints - and then you can audit it because you can see exactly what it was fed and what it output.
So it’s not “solving lying,” it’s reducing the surface area where lying can happen. And making violations obvious.
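The gate itself is boring on purpose. A minimal sketch of the contract (illustrative only, not the actual llama-conductor code):

```python
def mentats_answer(question: str, vault_facts: list[str], ask_model) -> str:
    """No facts -> refuse. The model only ever sees the retrieved facts block,
    never its own priors or the chat history."""
    if not vault_facts:
        return "REFUSAL: no facts retrieved from the Vault for this question."
    facts_block = "\n".join(f"- {fact}" for fact in vault_facts)
    prompt = (
        "Answer ONLY from the facts below. If they are insufficient, say so.\n"
        f"FACTS:\n{facts_block}\n\nQUESTION: {question}"
    )
    return ask_model(prompt)  # everything the model saw is on disk, so violations are auditable
```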
2) Wouldn’t a normal search algorithm be better?
I don’t know. Would it? Maybe. If all you want is “search my docs,” then yes: use ripgrep + a UI. That’s lighter and more portable.
The niche here is when you want search + synthesis + policy:
- bounded context (so the system doesn’t slow down / OOM after long chats)
- deterministic short-term memory (JSON on disk, not “model remembers”)
- staged KB pipeline (raw docs → summaries with provenance → promote to Vault)
- refusal-capable “deep think” mode for high-stakes questions
An algo or a plain search engine can do wonders, sure. But it doesn’t give you a consistent behavioral contract across chat, memory, and retrieval.
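The “staged KB pipeline” bullet above, for what it’s worth, is roughly this shape (a sketch; the header layout is my shorthand, not necessarily the repo’s exact format):

```python
import hashlib
from pathlib import Path

def promote_to_summ(raw_doc: Path, summary_text: str) -> Path:
    """Write a SUMM_*.md with a provenance header, then park the original under /original/."""
    sha = hashlib.sha256(raw_doc.read_bytes()).hexdigest()
    summ_path = raw_doc.with_name(f"SUMM_{raw_doc.stem}.md")
    summ_path.write_text(
        f"source_rel_path: {raw_doc.name}\n"
        f"source_sha256: {sha}\n\n"
        f"{summary_text}\n"
    )
    archive = raw_doc.parent / "original"
    archive.mkdir(exist_ok=True)
    raw_doc.rename(archive / raw_doc.name)   # raw doc is kept, but out of the retrieval path
    return summ_path
```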
3) “Everything looks like a nail”
Maybe. But the nail I’m hitting is: “I want local LLMs to shut up when they don’t know, and show receipts when they do.”
That’s a perfectly cromulent nail to hit.
If you don’t want an LLM in the loop at all, you’re right - don’t use this.
If you do want one, this is me trying to make it behave like infrastructure instead of “vibes”.
Now let’s see Paul Allen’s code :P
- Comment on [deleted] 3 weeks ago:
Not intentionally :)
- Comment on [deleted] 3 weeks ago:
I would be super interested to hear if it could do that. I genuinely don’t know, because I haven’t tried it.
If you can export your emails in the correct format, it might actually work. Try a small batch and report back.
PS: you DON’T HAVE TO run >>summ if you don’t want to. You can ask questions against the raw files too.
Wishing you luck!
- Comment on [deleted] 3 weeks ago:
Ah. So -
First prize: a picture of you.
Second prize: two pictures.
?
:P
- Comment on [deleted] 3 weeks ago:
Yes! And, for fun, I forced Qwen to answer your question, after querying its own fucking brain (M-E-T-A).
The below is AI-generated, using the router, with >>Fun mode (not >>FR…that’s my special gremlin mode). >>f suspends some of the terseness. Let me know if you want to see the non-FR version.
Yep — with your current code, here’s the literal answer to “what can `>>attach` ingest, can it handle the whole Python docs, and how does retrieval work?” (based on the files you uploaded).

What `>>attach` actually does (and does not do)

`>>attach <kb>` doesn’t ingest anything. It just registers a KB name into `state.attached_kbs` so later turns know which KB folders are “active.” The router’s own docstring is blunt about the intended contract:
- “KBs are filesystem folders containing `SUMM_*.md` files.”
- “/serious answers use filesystem KB retrieval ONLY from attached KBs.”

So the “ingest” step is really:
- drop raw docs in the KB folder
- run `>>summ new` to generate `SUMM_*.md`
- then attach the KB (or keep it attached) and query it

What kinds of files can be summarized (and therefore used via `>>attach`)?

In the router’s filesystem SUMM pipeline, `_SUPPORTED_RAW_EXTS = {".md", ".txt", ".pdf", ".html", ".htm"}`
- HTML is “cheap stripped” (scripts/styles removed, tags nuked) before summarizing
- PDFs require `pypdf` — if missing, the router treats that as a failure/skip with a note (your top-level comment calls this out explicitly)
- There’s also an explicit guard to truncate huge inputs before sending to the model (default `summ.max_input_chars = 120_000`)

When a SUMM is created, it writes a provenance header including `source_rel_path` and `source_sha256`, then moves the original into `/original/`.

So: you do not need “minimalistic plain-text statements.” You can feed it normal docs (md/txt/html/pdf) and it will produce SUMMs that become queryable.

“If I dropped the entire Python docs in there…”

Yes, it will produce something usable, because Python docs are mostly HTML and you explicitly support `.html`/`.htm` with stripping. But there are two practical gotchas in your implementation:
- It will generate one SUMM per source file (and you’ll end up with a lot of SUMMs). `summ_new_in_kb()` walks the KB tree, skips `/original/`, and summarizes every supported raw doc that doesn’t already have a corresponding `SUMM_*.md`.
- The SUMM prompt structure matters. Your shipped `SUMM.md` template is oriented around “overview / key ideas / steps / tradeoffs / pitfalls” rather than “API reference / signatures / parameters.” So it’ll work better for conceptual docs than for “tell me the exact signature of `pathlib.Path.glob`”.

If you want Python-docs-as-KB to be strong, the move is: tweak the SUMM prompt to extract API sections + signatures + examples, not just “Key Ideas.”

How retrieval works in “attached KB” mode

This is the most important bit: filesystem KB retrieval is not vector embeddings. It’s a lightweight lexical scorer over SUMM markdown blocks.

Concretely:
- It searches only `SUMM_*.md` under attached KB roots, explicitly excluding `/original/`.
- It splits each SUMM into blocks (paragraphs / headings), caps blocks around ~1200 chars, then scores blocks by token overlap with the query.
- Scoring has a few boosts: headings matching query tokens, and a small bonus for code fences.
- Then it returns the top hits (defaults like `top_k=8`, `max_blocks_per_file=3`, `max_chars=2400`).

So attached-KB mode is basically: pre-summarize once → then do fast “smart grep” over summaries.

That’s why it’s potato-friendly: you’re not embedding every doc on every query; SUMMs are preprocessed once.

And “Vault / Mentats” retrieval is different

When you `>>move to vault`, you’re no longer doing filesystem lexical search. You promote SUMMs into Qdrant and retrieve top-k chunks with embeddings + rerank (as configured in README / config).

(Also worth noting: per your ingest module, the intended architecture is explicitly “filesystem SUMMs for day-to-day; Vault/Qdrant only for promoted SUMMs.”)

One-liner you can paste as a Lemmy answer

`>>attach` doesn’t ingest raw docs — it just activates a KB folder. The KB mode queries `SUMM_*.md` files (generated via `>>summ new`) using a lightweight lexical block scorer (token overlap + heading boosts), not embeddings. You can drop the whole Python HTML docs in a KB and it’ll work, but you’ll want a SUMM prompt tuned for API/reference detail or the summaries will smooth out the stuff you actually care about. If you promote SUMMs to the Vault (`>>move to vault`), then retrieval switches to Qdrant vector search + reranking.

###########################
Is gut, ja?
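And if anyone wants the gist of the lexical scorer it’s describing, it’s roughly this (a back-of-napkin sketch, not the actual repo code):

```python
import re
from pathlib import Path

def score_block(block: str, query_tokens: set[str]) -> int:
    """Token overlap with the query, plus small boosts for matching headings and code fences."""
    tokens = set(re.findall(r"\w+", block.lower()))
    score = len(tokens & query_tokens)
    first_line = block.splitlines()[0].lower() if block.strip() else ""
    if first_line.startswith("#") and query_tokens & set(re.findall(r"\w+", first_line)):
        score += 2    # heading matches the query
    if "```" in block:
        score += 1    # small bonus for code fences
    return score

def search_kbs(kb_root: str, query: str, top_k: int = 8) -> list[tuple[int, str]]:
    """'Smart grep' over SUMM_*.md: split into blocks, cap length, score, return top hits."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    hits = []
    for summ in Path(kb_root).rglob("SUMM_*.md"):
        if "original" in summ.parts:          # never search the archived raw docs
            continue
        for block in summ.read_text(errors="ignore").split("\n\n"):
            block = block[:1200]              # cap block size
            s = score_block(block, query_tokens)
            if s > 0:
                hits.append((s, block))
    return sorted(hits, key=lambda h: h[0], reverse=True)[:top_k]
```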
- Comment on [deleted] 3 weeks ago:
Oh, it can try…but you can see its brain. That’s the glass-box part of this. You can LITERALLY see why it says what it says, when it says it. And, because it provides references, you can go and check them manually if you wish.
Additionally (and this is the neat part): the router actually operates outside the jurisdiction of your LLM. Like, the LLM can only ask it questions. It can’t affect the router’s (deterministic) operation. The router gives no shits about your LLM.
Sometimes, the LLM might like to give you some vibes about things. E.g.: IF YOU SHOUT AT IT LIKE THIS, the memory module of the router activates and stores that as a memory (because I figured, if you’re shouting at the LLM, it’s probably important enough in the short term. That, or you’re super pissed).
The LLM may “vibe” a bit (depending on the temp, seed, top_k, etc.), but 100/100, ALL CAPS + >8 words = store that shit into facts.json.
Example:
User: MY DENTIST APPOINTMENT IS 2:30PM ON SATURDAY THE 18TH.
LLM: Gosh, I love dentists! They soooo dreamy! <---- PS: there’s no fucking way your LLM is saying this, ever, especially with the settings I cooked into the router. But anywayz
[later]
USER: ?? When is my dentist appointment again
LLM: The user’s dentist appointment is at 2:30 PM on Saturday, the 18th. The stored notes confirm this time and date, with TTL 4 and one touch count. No additional details (e.g., clinic, procedure) are provided in the notes.
Confidence: high | Source: Stored notes
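The trigger itself is dead simple. Roughly (illustrative sketch, file handling simplified; the real router may store more fields):

```python
import json
from pathlib import Path

FACTS = Path("facts.json")   # assumed path

def shouty(msg: str) -> bool:
    # trigger condition: ALL CAPS and more than 8 words
    return len(msg.split()) > 8 and msg == msg.upper() and any(c.isalpha() for c in msg)

def maybe_remember(user_msg: str) -> None:
    """Deterministic side of the router: if the user is shouting, persist it to disk,
    regardless of whatever vibes the LLM replies with."""
    if shouty(user_msg):
        facts = json.loads(FACTS.read_text()) if FACTS.exists() else []
        facts.append({"note": user_msg, "ttl": 4, "touches": 0})
        FACTS.write_text(json.dumps(facts, indent=2))
```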
Yes, I made your LLM autistic. You’re welcome