brucethemoose
@brucethemoose@lemmy.world
- Comment on Grok praises Hitler, gives credit to Musk for removing “woke filters” 5 hours ago:
DeepSeek, now that is a filtered LLM.
The web version has a strict filter that cuts it off. Not sure about API access, but raw DeepSeek 671B is actually pretty open, especially with the right prompting.
There are also finetunes that specifically remove China-specific refusals:
huggingface.co/microsoft/MAI-DS-R1
huggingface.co/perplexity-ai/r1-1776
Note that Microsoft actually added safety training to “improve its risk profile”
Grok losing the guardrails means it will be distilled internet speech deprived of decency and empathy.
Instruct LLMs aren’t trained on raw data.
It wouldn’t be talking like this if it were just trained on randomized, augmented conversations, or even mostly Twitter data. They cherry-picked “anti woke” data to do this real quick, and the result effectively drove the model crazy. It has all the signatures of a bad finetune: specific overused phrases.
- Comment on Grok praises Hitler, gives credit to Musk for removing “woke filters” 7 hours ago:
Nitpick: it was never ‘filtered’
LLMs can be trained to refuse excessively (which is kinda stupid and is objectively proven to make them dumber), but the correct term is ‘biased’. If it were filtered, it would literally give empty responses for anything deemed harmful, or at least noticeably take some time to retry.
They trained it to praise Hitler, intentionally. They didn’t remove any guardrails.
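To make the distinction concrete, here’s a toy sketch of what an actual filter would look like; `generate` and `looks_harmful` are made-up stand-ins for illustration, not any real API:

```python
# Toy illustration only: a "filter" is a separate post-hoc check bolted on
# after generation, while "bias" is baked into the model weights themselves.
BLOCKLIST = ("tiananmen",)  # hypothetical blocked topic, purely illustrative

def generate(prompt: str) -> str:
    # Stand-in for the actual model; its biases live here, in the weights.
    return f"Model answer about: {prompt}"

def looks_harmful(text: str) -> bool:
    # Stand-in for a moderation classifier or keyword pass.
    return any(term in text.lower() for term in BLOCKLIST)

def filtered_reply(prompt: str) -> str:
    draft = generate(prompt)
    if looks_harmful(draft):
        return ""  # the telltale sign of a filter: an empty response
    return draft

print(repr(filtered_reply("tiananmen square")))  # -> '' (filtered)
print(repr(filtered_reply("the weather")))       # -> a normal answer
```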
- Comment on Tesla loses $68 billion in value after Elon Musk says he is launching a political party 1 day ago:
It coincides with a bunch of other stuff, like hints at Tesla regulation, tariffs, and regular wild swings just from being Tesla.
CNBC is basically the Fox News of the finance world: big, sensationalist, and basically catering to day-trader hype (not longer-term buy-and-hold investing, you know, what the stock market is supposed to be).
- Comment on Microsoft has never been good at running game studios, which is a problem when it owns them all 2 days ago:
Also a crime. Not just a great game in their niche, but a long history of them.
- Comment on Microsoft has never been good at running game studios, which is a problem when it owns them all 2 days ago:
Never underestimate Phil Spencer.
- Comment on ICEBlock climbs to the top of the App Store charts after officials slam it 4 days ago:
…iOS forces users to use Apple services, including getting apps through Apple…
Can’t speak to the rest of the claims, but Android practically does too. If users have to sideload an app, you’ve lost 99% of them, if not more.
It makes me think they’re not talking about the stock systems OEMs ship.
Relevant XKCD: xkcd.com/2501/
- Comment on Mullvad's ads are good 5 days ago:
Nah I meant the opposite. Journalistic integrity was learned through long, hard history.
Now that traditional journalism is dying, it’s like the streamer generation has to learn it from scratch, heh.
- Comment on Mullvad's ads are good 6 days ago:
It’s kinda like influencers (and their younger viewers) are relearning the history of journalism from scratch, heh.
- Comment on Mullvad's ads are good 6 days ago:
Suppressing sponsors is a perverse incentive too; all the more reason not to disclose who’s paying the creator.
- Comment on [deleted] 1 week ago:
One thing about Anthropic/OpenAI models is they go off the rails with lots of conversation turns or long contexts. Like when they need to remember a lot of vending machine conversation I guess.
A more objective look: arxiv.org/abs/2505.06120v1
Gemini is much better. TBH the only models I’ve seen that are half decent at this are:
- “Alternate attention” models like Gemini, Jamba Large, or Falcon H1, depending on the iteration. Some recent versions of Gemini kinda lose this, then get it back.
- Models finetuned specifically for this, like roleplay models or the Samantha model trained on therapy-style chat.
But most models are overtuned for oneshots like “fix this table” or “write me a function,” and don’t invest much in long-context performance because it’s not very flashy.
- Comment on Recommendations for External GPU Docks for Home Lab Use - Lemmy 1 week ago:
What @mierdabird@lemmy.dbzer0.com said, but the adapters aren’t cheap. You’re going to end up spending more than the 1060 is worth.
A used desktop to slap it in, that you turn on as needed, might make sense? Doubly so if you can find one with an RTX 3060, which would open up 32B models with TabbyAPI instead of ollama.
- Comment on Men are opening up about mental health to AI instead of humans 1 week ago:
ChatGPT (last time I tried it) is extremely sycophantic though. Its high default sampling temperature also leads to totally unexpected/random turns.
Google Gemini is now too. They log and use your dark thoughts.
I find that less sycophantic LLMs are way more helpful, hence I bounce between Nemotron 49B and a few 24B-32B finetunes (or task vectors for Gemma).
…I guess what I’m saying is people should turn towards more specialized free tools, not something generic like ChatGPT.
- Comment on Men are opening up about mental health to AI instead of humans 1 week ago:
TBH this is a huge factor.
I don’t use ChatGPT, much less use it like it’s a person, but I’m socially isolated at the moment. So I bounce dark internal thoughts off of locally run LLMs.
It’s kinda like looking into a mirror. As long as I know I’m talking to a tool, it’s helpful, sometimes insightful. It’s private. And I sure as shit can’t afford to pay a therapist out the wazoo for that.
It was one of my previous problems with therapy: payment tied to someone toxic, at preset times (not when I need it). Many sessions feel like they end when I’m barely scratching the surface. Yes, therapy is great in general, but still.
- Comment on I've just created c/Ollama! 1 week ago:
You can still use the IGP, which might be faster in some cases.
- Comment on I've just created c/Ollama! 1 week ago:
Oh actually that’s a good card for LLM serving!
Build the llama.cpp server from source; it has better support for Pascal cards than anything else:
github.com/ggml-org/llama.cpp/…/multimodal.md
Gemma 3 is a hair too big (like 17-18GB), so I’d start with InternVL 14B Q5K XL: huggingface.co/…/InternVL3-14B-Instruct-GGUF
Or Mistral Small 3.2 24B IQ4_XS for more ‘text’ intelligence than vision: huggingface.co/…/Mistral-Small-3.2-24B-Instruct-2…
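Once llama-server is running (port 8080 by default), anything OpenAI-compatible can talk to it. A minimal sketch, assuming a local build serving one of the quants above:

```python
# Minimal sketch: query a local llama.cpp server (llama-server), which
# exposes an OpenAI-compatible endpoint on http://localhost:8080 by default.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Summarize what Pascal GPUs are good at."}
        ],
        "temperature": 0.2,  # low temperature for more deterministic answers
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```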
- Comment on I've just created c/Ollama! 1 week ago:
1650
You mean GPU? Yeah, it’s good, I was strictly talking about purchasing a laptop for LLM usage, as most are less than ideal for the money.
- Comment on I've just created c/Ollama! 1 week ago:
Yeah, just paying for LLM APIs is dirt cheap, and they (supposedly) don’t scrape data. Again I’d recommend OpenRouter and Cerebras! And you get your pick of models to try from them.
Even a Framework 16 is not great for LLMs TBH. The Framework Desktop is (as it uses a special AMD chip), but it’s very expensive. Honestly the whole hardware market is so screwed up, hence most ‘local LLM enthusiasts’ buy a used RTX 3090 and stick it in a desktop or server, heh.
- Comment on I've just created c/Ollama! 1 week ago:
I was a bit mistaken; these are the models you should consider:
huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ
huggingface.co/AnteriorAI/…/main
huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)
These are state-of-the-art, as far as I know.
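If you’d rather script it than use a GUI, a rough sketch of running the first one with the mlx-lm Python package (Apple Silicon only):

```python
# Rough sketch, Apple Silicon only: run the Qwen3 4B DWQ quant with mlx-lm.
# pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit-DWQ")

# Build a chat-formatted prompt from the model's own template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain KV cache in two sentences."}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```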
- Comment on I've just created c/Ollama! 1 week ago:
8GB?
You might be able to run Qwen3 4B: huggingface.co/mlx-community/…/main
But honestly you don’t have enough RAM to spare, and even a small model might bog things down. I’d run Open Web UI or LM Studio with a free LLM API, like Gemini Flash, or pay a few bucks for something off OpenRouter. Or maybe the Cerebras API.
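For the API route, OpenRouter speaks the standard OpenAI-compatible protocol, so the stock openai client works. A minimal sketch (the model id is just an example of what’s listed there):

```python
# Minimal sketch: use OpenRouter through the standard `openai` client.
# Assumes an API key in the OPENROUTER_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # example model id; pick any from their list
    messages=[{"role": "user", "content": "Draft a polite follow-up email."}],
)
print(resp.choices[0].message.content)
```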
- Comment on I've just created c/Ollama! 1 week ago:
Actually, to go ahead and answer, the “easiest” path would be LM Studio (which supports MLX quants natively and is not time intensive to install), and a DWQ quantization (which is a newer, higher quality variant of MLX quants).
Probably one of these models, depending on how much RAM you have:
huggingface.co/…/Magistral-Small-2506-4bit-DWQ
huggingface.co/…/Qwen3-30B-A3B-4bit-DWQ-0508
huggingface.co/…/GLM-4-32B-0414-4bit-DWQ
With a bit more time invested, you could try to set up Open Web UI as an alternative interface (which has its own built-in web search like Gemini): openwebui.com
And then use LM Studio (or some other MLX backend, or even free online API models) as the ‘engine’.
- Comment on I've just created c/Ollama! 1 week ago:
Honestly perplexity, the online service, is pretty good.
But the first question is: how much RAM does your Mac have? That’s basically the deciding factor for what model you can and should run.
- Comment on I've just created c/Ollama! 2 weeks ago:
I don’t understand.
Ollama is not actually Docker, right? It’s running the same llama.cpp engine; it’s just embedded inside the wrapper app, not containerized.
And basically every LLM project ships a Docker container. I know for a fact llama.cpp, TabbyAPI, Aphrodite, vllm and sglang do.
You are 100% right about security though, in fact there’s a huge concern with compromised Python packages. This one almost got me: pytorch.org/blog/compromised-nightly-dependency/
- Comment on I've just created c/Ollama! 2 weeks ago:
OK.
Then LM Studio, with Qwen3 30B IQ4_XS, low-temperature sampling, and an Open Web UI frontend if you wish.
That’s what I’m trying to say though, LLMs work a bajillion times better with just a little personal configuration. They are not “one click” magic boxes, they are specialized tools.
Random example: on a Mac? Grab an MLX DWQ quant; it’ll be way faster and better.
Nvidia gaming PC? TabbyAPI with an exl3. Raspberry Pi? That’s important to know!
What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Do you need stuff fast or accurate?
This is one reason why ollama is so suboptimal, with the other being just bad defaults (Q4_0 quants, 2048 context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I can go on…)
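To be fair, some of those defaults can at least be overridden per request through ollama’s REST API; a quick sketch (the model tag is a placeholder for whatever you’ve pulled):

```python
# Quick sketch: override ollama's defaults (e.g. the 2048-token context)
# per request via its REST API on http://localhost:11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:30b",  # assumed tag; substitute whatever you've pulled
        "prompt": "Explain speculative decoding in one paragraph.",
        "stream": False,
        "options": {
            "num_ctx": 8192,     # raise the tiny default context window
            "temperature": 0.6,  # saner sampling than the stock default
        },
    },
    timeout=300,
)
print(resp.json()["response"])
```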
- Comment on I've just created c/Ollama! 2 weeks ago:
Totally depends on your hardware, and what you tend to ask it. What are you running?
- Comment on I've just created c/Ollama! 2 weeks ago:
TBH you should fold this into localllama? Or open source AI?
I have very mixed (mostly bad) feelings on ollama. In a nutshell, they’re kinda Twitter attention grabbers that give zero credit/contribution to the underlying framework (llama.cpp). It’s also a highly suboptimal way for most people to run LLMs, especially if you’re willing to tweak.
They’re… slimy. I would always recommend Kobold.cpp, TabbyAPI, ik_llama.cpp, Aphrodite, any number of backends over them. Anything but ollama.
- Comment on Plutonium levels at nuclear test site in WA up to 4,500 times higher than rest of coast, study finds 2 weeks ago:
showed concentrations of plutonium at the islands were *four* to 4,500 times higher than those found in sediment samples taken at two distant coastal sites
Emphasis mine.
Not saying this isn’t a big problem, but the article seems like it’s fearmongering too, or at least not providing enough specifics. 4-4,500 times higher than almost zero is still extremely low, and it’s only dangerous if inhaled.
- Comment on Elon Musk wants to rewrite "the entire corpus of human knowledge" with Grok 2 weeks ago:
It’s not so simple; there are successful methods of zero-data ‘self play’ and other schemes for using other LLMs’ output. Though distillation is probably the only one you’d want for a pretrain, specifically.
- Comment on Elon Musk wants to rewrite "the entire corpus of human knowledge" with Grok 2 weeks ago:
I elaborated below, but basically Musk has no idea WTF he’s talking about.
If I had his “f you” money, I’d at least try a diffusion bitnet model (and open the weights), and probably 100 other papers I consider low-hanging fruit, before this absolutely dumb boomer take.
He’s such an idiot know-it-all. It’s so painful whenever he ventures into a field you sorta know.
But he might just be shouting nonsense on Twitter while X employees actually do something different. Because if they take his orders verbatim they’re going to get crap models, even with all the stupid brute force they have.
- Comment on Elon Musk wants to rewrite "the entire corpus of human knowledge" with Grok 2 weeks ago:
There’s some nuance.
Using LLMs to augment data, especially for finetuning (not training the base model), is a sound method. The DeepSeek paper is famous for it, using generated reasoning traces for instance.
Another is using LLMs to generate logprobs of text, and training not just on the text itself but on the probability a frontier LLM assigns to every ‘word.’ This is called distillation, though there’s some variation and complication.
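A minimal sketch of that distillation loss in PyTorch, assuming you already have aligned student/teacher logits (real pipelines layer a lot more on top):

```python
# Minimal sketch of logit distillation: push the student's token
# distribution toward the teacher's with a temperature-softened KL loss.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps gradient magnitude consistent across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * T * T

# Toy shapes: 4 token positions over a 32k vocabulary.
student = torch.randn(4, 32000, requires_grad=True)
teacher = torch.randn(4, 32000)
distill_loss(student, teacher).backward()
```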
But yes, the “dumb” way, aka putting data into a text box and asking an LLM to correct it, is dumb and dumber, because:
- You introduce some combination of sampling errors and repetition/overused-word issues, depending on the sampling settings.
- You possibly pollute your dataset with “filler.”
- In Musk’s specific proposition, it doesn’t even fill knowledge gaps the old Grok has.
In other words, Musk has no idea WTF he’s talking about.
- Comment on So um, america just started another war in the middle east. We're going to need a shit ton more memes to americans from the nightmare they are enduring. Thanks in advance... 2 weeks ago:
A huge chunk of Americans enthusiastically support warring with Iran (or will, soon). Don’t act like we aren’t culpable.
Is it because they’re glued to feeds and TV news? Yeah, but that’s also on us, and we basically elected Big Tech and News Corp to the presidency, so…