sugar_in_your_tea
@sugar_in_your_tea@sh.itjust.works
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 19 hours ago:
What we are talking about is the act of reading and/or learning and then using that information in order to synthesize new material.
Sure, but that’s not what LLMs are doing. They’re breaking down works to reproduce portions of them in answers. Learning is about concepts, and LLMs don’t understand concepts; they just compare inputs with training data to provide synthesized answers.
The process a human goes through is distinctly different from the process current AI goes through. The process an AI goes through is closer to a journalist copy-pasting quotations into their article, which falls under fair use. The difference is that AI will synthesize quotations from multiple (many) sources, whereas a journalist will generally just do one at a time, but it’s still the same process.
- Comment on It's almost the week-end, what are you guys going to play? Also, could we have a monthly thread for July? 20 hours ago:
I just played Pony Island last night, and I might go through and get the tickets, we’ll see.
I also installed a few games, so I’ll probably play a couple of them:
- Hell Pie
- Yakuza 3
- Lair of the Clockwork God
- Euro Truck Simulator
And then I have a few that I’m partially done with that might get some attention. I’m taking next week off for a family trip, so I don’t have to be as responsible about getting to bed at a reasonable time this weekend. :)
- Comment on An ID verification service that works with TikTok and X left its credentials wide open for a year 20 hours ago:
And this is why I hate those laws that are intended to protect kids. Yeah, it would be nice if kids couldn’t see stuff they shouldn’t, but it’s even better if my PII isn’t stolen.
- Comment on Netflix mulls introducing free ad-supported tier. The circle is complete 20 hours ago:
By quality I meant resolution, I don’t need 4k, but I do need specific shows my wife and kids like.
I have a NAS set up with some movies and whatnot, so I’ve talked to my wife about setting up a budget to purchase content we want and then cancelling our streaming services. So we’d be limited to what’s available on DVD/Blu-Ray, but most of what my wife and kids watch are still available there.
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 20 hours ago:
I disagree that it needs to be explicit. The current law is the fair use doctrine, which generally has more to do with the intended use than specific amounts of the text. The point is that humans should know where that limit is and when they’ve crossed it, with motive being a huge part of it.
I think machines and algorithms should have to abide by a much narrower understanding of “fair use” because they don’t have motive or the ability to intuit when they’ve crossed the line. So scraping copyrighted works to produce an LLM should probably generally be illegal, imo.
That said, our current copyright system is busted and desperately needs reform. We should be limiting copyright to 14 years (as in the original Copyright Act of 1790), with an option to explicitly extend for another 14 years. That way LLMs can scrape content published >28 years ago with no concerns, and most content produced >14 years ago (esp. forums and social media, where copyright extension is incredibly unlikely). That would be reasonable IMO and sidestep most of the issues people have with LLMs.
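To make the 14 + 14 year idea concrete, here's a rough sketch of the check a scraper would have to run under that proposal. The function name and the `extended` flag are made up for illustration; this is my proposed rule, not current law.

```python
def freely_usable(published_year: int, current_year: int, extended: bool = False) -> bool:
    """Return True if a work would be out of copyright under the proposed rule.

    Assumption: a flat 14-year term, or 28 years if the author explicitly
    extended it. Real copyright terms are far longer and more complicated.
    """
    term_years = 28 if extended else 14
    return current_year - published_year > term_years


# A 2005 forum post with no extension would be fair game in 2024,
# but the same post with an explicit extension would not be.
print(freely_usable(2005, 2024))
print(freely_usable(2005, 2024, extended=True))
```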
- Comment on Netflix mulls introducing free ad-supported tier. The circle is complete 1 day ago:
Let me know if it works and I’ll follow. I don’t need quality, I just need something for my kids to watch occasionally.
- Comment on Mac users served info-stealer malware through Google ads | Full-service Poseidon info stealer pushed by "advertiser identity verified by Google." 1 day ago:
Consider PiHole as a whole home network first line of defense.
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 1 day ago:
And that’s exactly what that page discusses. It links three options you can try:
The first two are paid, the last is FOSS.
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 1 day ago:
That depends on how similar your resulting algorithm is to the sources you were “inspired” by. You’re probably fine if you’re not copying verbatim and your code just ends up looking similar because that’s how solutions are generally structured, but there absolutely are limits there.
If you’re trying to rewrite something into another license, you’ll need to be a lot more careful.
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 1 day ago:
I complain all the time. But that’s not the subject of this post…
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 1 day ago:
IP Man. Great movies.
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 1 day ago:
I’m not going to be monitoring Chinese code projects. They don’t seem to care much about copyright, so they’ll probably just yoink the code into proprietary projects and not care about the licenses.
What am I going to do, sue someone in China?
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 1 day ago:
Here are options to mount Backblaze B2 as a drive. It’s $6/TB/month, and I think they allow <1TB, so for 300GB you’d pay ~$2/month. So I think they’re pretty competitive, but I’m not familiar with Google Drive’s terms. They’re certainly in the same ballpark.
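If you want to plug in your own numbers, the math is just a pro-rated per-TB rate. This is a back-of-the-envelope sketch: it assumes 1 TB = 1000 GB, no minimum charge, and ignores any download or transaction fees, so check the actual pricing page before relying on it.

```python
def monthly_cost_usd(stored_gb: float, price_per_tb: float = 6.0) -> float:
    """Pro-rated monthly storage cost.

    Assumptions: 1 TB = 1000 GB, billing is proportional to usage,
    and there are no egress or API fees.
    """
    return stored_gb / 1000 * price_per_tb


cost = monthly_cost_usd(300)
print(f"${cost:.2f}/month")  # prints $1.80/month
```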
- Comment on Even Apple finally admits that 8GB RAM isn't enough 1 day ago:
I’m pretty sure I do understand the issue. Here are some facts (and an article to back it up):
- putting memory closer to the CPU improves performance due to less latency - from 96GB/s -> 200 (M1) or 400 (M1 Max) GB/s
- customers can’t easily solder on more RAM
- Apple’s RAM upgrades are way more expensive than socketed options on the market
And here’s my interpretation/guesses:
- marketing sees 1 & 2, and sees an opportunity to do more of 3
- marketing probably asked engineering what the bare minimum is, and they probably said 8GB (assuming web browsing and whatnot only), though 16GB is preferable (that’s what I’d answer)
- marketing sets the minimum @ 8GB, banking on most users who need more than the basics to buy more, or for users to buy another laptop sooner when they realize they ran out of RAM (getting after-sale RAM upgrades is expensive)
So:
- using soldered RAM is an engineering decision due to improved performance (double socketed RAM w/ Intel on M1, quadruple on M1 Max)
- limiting RAM to 8GB is a marketing decision
- if you don’t have enough RAM, that doesn’t mean the RAM isn’t performing well, it means you don’t have enough RAM
Using socketed RAM won’t fix performance issues related to running out of RAM, that issue is the same regardless. Only adding RAM will fix those performance issues, and Apple could just as easily make “special” RAM so you can’t buy socketed RAM on the regular market anyway (e.g. they’d need a different memory standard anyway due to Unified Memory).
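For the "double" and "quadruple" figures, here's the arithmetic using only the bandwidth numbers quoted above (96, 200, and 400 GB/s). I'm taking those numbers as given rather than vouching for them, and the labels are just for illustration.

```python
# Bandwidth figures (GB/s) as quoted in this comment, not independently verified.
SOCKETED_GBPS = 96
soldered_gbps = {"M1": 200, "M1 Max": 400}

for chip, gbps in soldered_gbps.items():
    ratio = gbps / SOCKETED_GBPS
    print(f"{chip}: {ratio:.1f}x the socketed figure")
```

That works out to roughly 2x and 4x, which is where the double/quadruple wording comes from.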
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 1 day ago:
And how will you know what original work(s) to compare it to?
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 2 days ago:
code generated by an AI is arguably not a “substantial portion” of the software
How do you verify that though?
And does the model need to include all of the licenses? Surely the “all copies or substantial portions” would apply to LLMs, since they literally include the source in the model as a derivative work. That’s fine if it’s for personal use (fair use laws apply), but if you’re going to distribute it (e.g. as a centralized LLM), then you need to be very careful about how licenses are used, applied, and distributed.
So I absolutely do believe that building a broadly used model is a violation of copyright, and that’s true whether it’s under an open source license or not.
- Comment on China is attempting to mirror the entire GitHub over to their own servers, users report 2 days ago:
It certainly can. Most licences require derivative works to be under the same or similar licence, and an AI based on FOSS would likely not respect those terms. It’s the same issue as AI training on music, images, and text, it’s a likely violation of copyright and thus a violation of open source licensing terms.
Training on it is probably fine, but generating code from the model is likely a whole host of licence violations.
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 2 days ago:
This source seems to indicate that’s not the case:
- Google Search & Other (56.93%)
  2023 Total Google Search & Other Revenue: $175.04 billion. This is revenue generated primarily from ads shown on Google’s search results pages and other search-related services.
- YouTube Ads (10.26%)
  2023 Total Youtube Ads Revenue: $31.51 billion. This is revenue from ads shown on YouTube videos, including display ads, overlay ads, skippable video ads, and non-skippable video ads.
- Google Network (10.20%)
  2023 Total Google Network Revenue: $31.316 billion. This is revenue from ads displayed on websites and apps that are part of Google’s ad network, beyond Google-owned properties.
- Google Other (11.26%)
  2023 Total Google Other Revenue: $34.68 billion. This is revenue from Google’s other ventures and products, such as hardware (like Pixel phones and Nest devices), Play Store purchases, and other non-advertising sources.
- Google Cloud (10.75%)
  2023 Total Google Cloud Revenue: $33.08 billion. This is revenue from Google’s cloud computing services, such as computing power, storage, and data analytics offered to businesses and developers.
So, 57% from search, and only 10% from ads on non-Google pages.
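If you want to sanity check those shares, here's a quick sketch using the dollar figures quoted above. Note the assumption: I'm dividing by the sum of only the five listed segments, so the results come out slightly different from the source's percentages, which presumably use a total that includes a few smaller items.

```python
# 2023 revenue by segment, in billions of USD, as quoted from the source above.
revenue = {
    "Google Search & Other": 175.04,
    "YouTube Ads": 31.51,
    "Google Network": 31.316,
    "Google Other": 34.68,
    "Google Cloud": 33.08,
}

# Assumption: total is just the listed segments, not the company-wide total.
total = sum(revenue.values())
shares = {name: amount / total * 100 for name, amount in revenue.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Either way you slice it, search is still roughly 57% and the ad network on non-Google pages is roughly 10%.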
- Comment on The new Chinese owner of the popular Polyfill JS project injects malware into more than 100 thousand sites 2 days ago:
Looks like I’ll need to set up pihole then. Thanks for the info!
- Comment on Microsoft employee accidentally publishes PlayReady code 2 days ago:
I’m guessing furries.
- Comment on Music industry giants allege mass copyright violation by AI firms 2 days ago:
Let’s hope for an extremely long and expensive legal process where the RIAA gets an initial injunction against OpenAI while the case plays out.
- Comment on Even Apple finally admits that 8GB RAM isn't enough 2 days ago:
And that’s the idea. Soldering memory is an engineering decision. How much to solder is a marketing decision. Since users can’t easily add more, marketing can upsell on more RAM.
It’s not “on paper,” the RAM itself is performing better vs socketed RAM. Whether the system runs better depends on the configuration, as in, did you order enough RAM.
- Comment on Backdoor slipped into multiple WordPress plugins in ongoing supply-chain attack 2 days ago:
That is far too basic for most websites
Well yes, but that’s my point. WordPress does everything, and I’m offering tools that do one thing well.
If all you need is a static site, use a static site generator, not WordPress. If all you need is ecommerce, use an ecommerce tool, not WordPress. And so on.
unless you’re exporting it to a file after using the UI to create it?
I’m saying that if all you need is a static site, but you want something simple and hosted, Squarespace would be a decent alternative. Whether it’s actually static is beside the point, it’s probably more secure than a self-hosted WordPress site since you can’t just throw on a dozen plugins serverside, only use one or two, and then get hacked.
A swiss army knife can do everything, but it doesn’t do everything well, and it’s easy to use it insecurely, which opens you up to these sorts of attacks. I’m not going to suggest a drop-in replacement for WordPress (they do exist) because the problem is fundamental to the “one tool for everything” approach.
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 2 days ago:
It’s not bad if you subscribe to a handful of channels and only watch those. I use Grayjay, so I just turn off Rumble search and only see like 1-2 channels I’ve subbed to.
The issue seems to be that people who are not welcome on YouTube go to Rumble, and that’s largely right wing commentators, crypto bros who crossed a line, etc.
The same is true for Odysee, but a little less extreme since Rumble is the bigger and more obvious platform.
I personally don’t care what alternative a creator uses, I just want to avoid YouTube.
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 2 days ago:
My company’s goal seems to be to get themselves more dependent on AWS. They’re always talking about which things we can replace with AWS offerings.
I’m the exact opposite. I’m always looking for how to make the things I use more replaceable. That way if a company goes bad, I only need to replace a small part of my stuff.
If AWS goes bad, I’d feel really bad for our DevOps team…
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 2 days ago:
Through search. They make a ton with ads on the search page, sponsored links, etc.
- Comment on [Gamers Nexus] "Google is Getting Worse," ft. Wendell of Level1 Techs 2 days ago:
If price is the main concern, you still have options, but you’ll need to be a lot more specific about what you need. For example:
- proprietary, hosted products like OneDrive, Amazon Drive, DropBox, and MEGA
- backups - NordLocker, Backblaze
- hosted and self-hosted cloud platforms - OwnCloud and NextCloud, use Backblaze B2 for storage
I’m doing the last one. I have NextCloud installed on my custom NAS (just openSUSE Leap with some drives) and am working on configuring B2 as a backup service.
Each of these is similar in price to Google Drive, but with a different feature set. Some are cheaper.
- Comment on Even Apple finally admits that 8GB RAM isn't enough 2 days ago:
Why not? There is a performance benefit to being closer to the CPU, and soldering gets you a lot closer to the CPU. That’s a fact.
- Comment on Backdoor slipped into multiple WordPress plugins in ongoing supply-chain attack 2 days ago:
Yes, Jekyll and Hugo are vastly more limited, that’s the point. There’s no dynamic content, you just write in Markdown (the same thing Lemmy uses), pick a theme, and you’re good to go. No need to code anything, just a couple config files and Markdown.
Shopify is fine if you want something hosted. But since we were talking about WordPress, I assumed self-hosting was a desired quality. All of the platforms I mentioned are self-hosted, open source, and at least one from each category is compatible with PHP-only hosting providers, just like WordPress.
If we’re optimizing for easy, Squarespace should be on the table for static websites as well. I assumed we were talking about direct replacements for WordPress, not hosted alternatives.
- Comment on Tesla is recalling its Cybertruck for the fourth time to fix problems with trim pieces that can come loose and front windshield wipers that can fail | The new recalls each affect over 11,000 trucks 3 days ago:
Same, but with 10 ish. I even had a Saturn with fewer problems, and those are notorious for issues.