antihumanitarian
@antihumanitarian@lemmy.world
- Comment on Elon Musk wants to rewrite "the entire corpus of human knowledge" with Grok 2 weeks ago:
Most if not all leading models use synthetic data extensively to do exactly this. However, the synthetic data needs to be well defined and essentially programmed by the data scientists. If you don’t define the data very carefully, ideally as math or programs you can verify as correct automatically, it’s worse than useless. The scope is usually very narrow, so no Hitchhiker’s Guide to the Galaxy rewrite.
But in any case he’s probably just parroting whatever his engineers pitched him to look smart and in charge.
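To make the "verify as correct automatically" point concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration, not how any real lab builds its pipeline: `noisy_generator` is a hypothetical stand-in for a model that sometimes produces wrong answers, and `verify` is the automatic checker that decides what gets kept.

```python
import random

# Hypothetical operator table used by both the generator and the checker.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def noisy_generator(rng, error_rate=0.3):
    """Stand-in for a model: emits an arithmetic question and a sometimes-wrong answer."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    op = rng.choice(sorted(OPS))
    answer = OPS[op](a, b)
    if rng.random() < error_rate:
        answer += rng.randint(1, 5)  # simulate a wrong answer
    return {"question": f"{a} {op} {b}", "answer": answer}

def verify(example):
    """Recompute the answer from the question and compare; no human in the loop."""
    a, op, b = example["question"].split()
    return OPS[op](int(a), int(b)) == example["answer"]

def build_dataset(n, seed=0):
    """Generate n candidates and keep only the ones the verifier accepts."""
    rng = random.Random(seed)
    candidates = [noisy_generator(rng) for _ in range(n)]
    return [ex for ex in candidates if verify(ex)]

if __name__ == "__main__":
    kept = build_dataset(1000)
    print(f"kept {len(kept)} of 1000 candidates")
```

This only works because the scope is narrow enough to check mechanically. There is no equivalent `verify` for a rewritten encyclopedia entry.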
- Comment on Syncthing alternatives 5 weeks ago:
I had some similar and obscure corruption issues that wound up being a symptom of failing RAM in a main server node. After that, the only issues have been conflicts. So I’d suggest checking hardware health in addition to the ideas about backups vs sync.
- Comment on Black Mirror AI 1 month ago:
Some details. One of the major players doing the tar pit strategy is Cloudflare. They’re a giant in networking and infrastructure, and they use AI (more traditional, not LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.
Making nonsense is indeed detectable, but that misunderstands the purpose: economics. Scraping bots are used because they’re a cheap way to get training data. If you make a nonzero portion of training data poisonous, you’d have to spend increasingly many resources to filter it out. The better the nonsense, the harder it is to detect. Cloudflare is known to use small LLMs to generate the nonsense, hence requiring systems at least that complex to differentiate it.
So in short, the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do-not-scrape instructions.
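A rough back-of-the-envelope version of that economics argument, as a Python sketch. The model and every number in it are assumptions picked for illustration: each clean page is worth `good_value`, each poisoned page that gets ingested costs `poison_cost`, and a (hypothetically perfect) filter costs `filter_cost` per page scraped.

```python
def value_without_filter(poison_fraction, good_value, poison_cost):
    """Average value per scraped page when poisoned pages are ingested as-is."""
    return (1 - poison_fraction) * good_value - poison_fraction * poison_cost

def value_with_filter(poison_fraction, good_value, filter_cost):
    """Average value per scraped page when every page is run through a filter.

    Assumes the filter catches all poison, which is the best case for the scraper.
    """
    return (1 - poison_fraction) * good_value - filter_cost

if __name__ == "__main__":
    # Illustrative numbers only, in arbitrary units.
    for p in (0.0, 0.05, 0.2):
        raw = value_without_filter(p, good_value=1.0, poison_cost=3.0)
        filtered = value_with_filter(p, good_value=1.0, filter_cost=0.1)
        print(f"poison={p:.2f} unfiltered={raw:.3f} filtered={filtered:.3f}")
```

Either way the scraper loses: the unfiltered value falls as the poisoned fraction grows, and the filtered value carries a per-page cost that rises with how convincing the nonsense is.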
- Comment on Using AI generated code will make you a bad programmer. 8 months ago:
I recently removed in-editor AI because I noticed I was acquiring muscle memory for my brain: I wasn’t thinking past the start of a snippet that would get an LLM to autocomplete the rest. I’m still using LLMs, particularly for languages and libraries I’m not familiar with, but through the artifacts editors in ChatGPT and Claude.