ocassionallyaduck
@ocassionallyaduck@lemmy.world
- Comment on Judge dismisses authors' copyright lawsuit against Meta over AI training 2 weeks ago:
Ingesting all the artwork you ever created by obtaining it illegally and feeding it into my plagiarism remix machine is theft of your work, because I did not pay for it.
Separately, keeping a copy of this work so I can do this repeatedly is also stealing your work.
The judge ruled the first was okay but the second was not, because the first is “transformative”. Sadly, that tells me the judge, despite best efforts, does not understand how a weighted matrix of tokens works. They may have some prevention steps in place now, but early models showed the tech for what it was, regurgitating text with only minor differences in word choice here and there.
Current models have layers on top to try to prevent user input from triggering this, but escaping those safeguards is common, and the layers only mask the fact that the entire model is built off of the theft of others’ IP.
- Comment on Judge dismisses authors' copyright lawsuit against Meta over AI training 2 weeks ago:
There is nothing intelligent about “AI” as we call it. It parrots based on probability. If you remove the randomness value from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted.
Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also inherently part of the cause of hallucinations, as the model cannot preserve a single correct response as always the right way to respond to a certain query.
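To make the “randomness value” concrete, here is a minimal sketch of temperature sampling over a made-up set of token scores. This is not any vendor's actual code: the function name, the scores, and the three-token vocabulary are all assumptions for illustration. With the temperature at 0 it always picks the top-scoring token, so the same prompt gives the same continuation every time.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick a token index from raw scores (logits). Hypothetical helper."""
    if temperature == 0:
        # Greedy decoding: no randomness, always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide scores by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # Draw one index in proportion to its softmax weight.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.1]  # made-up scores for three candidate tokens
rng = random.Random(0)

greedy = [sample_next_token(logits, 0, rng) for _ in range(5)]
sampled = [sample_next_token(logits, 1.0, rng) for _ in range(5)]

print("temperature 0:", greedy)    # the same index on every draw
print("temperature 1:", sampled)   # varies with the random draws
```

Higher temperatures flatten the weights, so lower-scoring tokens get picked more often.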
LLMs are insanely “dumb”, they’re just lightspeed parrots. The fact that Meta and these other giant tech companies claim it’s not theft because they sprinkle in some randomness is just obscuring the reality and the fact that their models are derivative of the work of organizations like the BBC and Wikipedia, while also dependent on the works of tens of thousands of authors to develop their corpus of language.
In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.
I wish I knew when I was younger that stealing is wrong, unless you steal at scale. Then it’s just clever business.
- Comment on Judge dismisses authors' copyright lawsuit against Meta over AI training 2 weeks ago:
Terrible judgement.
Turn the K value down on the model and it reproduces text near verbatim.
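Assuming “K value” here means top-k sampling (it could also be read as temperature), a rough sketch of what turning it down does. The scores and the function name are made up for illustration. With k=1 only the single most likely token is ever eligible, so the output is deterministic.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from only the k highest-scoring candidates."""
    # Keep the indices of the k largest scores.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors only (shifted by the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.1, -1.0]  # made-up scores for four candidate tokens
rng = random.Random(0)

k1_draws = [top_k_sample(logits, 1, rng) for _ in range(5)]
k3_draws = [top_k_sample(logits, 3, rng) for _ in range(5)]

print("k=1:", k1_draws)  # only the top token can ever be chosen
print("k=3:", k3_draws)  # any of the three best tokens can appear
```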
- Comment on AOSP isn't dead, but Google just landed a huge blow to custom ROM developers 4 weeks ago:
I’m not worried about me. I can manage. But I had to intervene and make it a Project for my immediate family. Which is always unfun, because who wants to expose all their personal data that way, especially photos.
Crazy that Google just screwed over GrapheneOS like this.
- Comment on We have to solve the money problem! 4 weeks ago:
Great link, and I fully agree. If it’s possible anyways.
- Comment on AOSP isn't dead, but Google just landed a huge blow to custom ROM developers 4 weeks ago:
Yes, but that should be explicitly opt-in, and they shouldn’t marry that product to Gmail and Google Drive if they are going to push it to enable by default.
Again, it’s really insidious. They push it so aggressively I had to disable it on my personal device twice, and I can’t just not use the Google Photos app because it’s tied to the camera itself on Pixel phones.
- Comment on AOSP isn't dead, but Google just landed a huge blow to custom ROM developers 4 weeks ago:
The absolutely criminal dark patterns that they pull on people via Google Photos auto backup are insane.
Just in my own orbit, two of my friends’ wives, my parents, and my in-laws all wound up paying Google because they thought they had to or they would lose all their photos. We helped most of them disconnect the auto backup (that they didn’t even know was activated) and move their photos offline safely. But that was the most downright evil shit Google has ever done, and it lit a fire in me, manipulating the elderly and less tech savvy so blatantly.
- Comment on We have to solve the money problem! 4 weeks ago:
I responded above, but my point kind of was that it doesn’t work that way, but as we rethink content delivery we should also rethink hosting distribution. What I was saying is not a “well gee we should just do this…” type of suggestion, but more an extremely high level idea for server orchestration from a public-private swarm that may or may not ever be feasible, but definitely doesn’t really exist today.
Imagine if it were somewhat akin to BitTorrent, only the user could voluntarily give remote control to the instance for orchestration management. The orchestration server toggles the nodes’ contents so that, let’s say, 100% of them carry the most accessed data (hot content, >100gb), and the rest is sharded so they each carry 10% of the archived data, making each node require >1tb total. And the node client is given X number of pinned CPUs that can be used for additional server compute tasks to offload various queries.
See, I’m fully aware this doesn’t really exist in this form. But thinking of it like a Kubernetes cluster or a HA webclient, it seems like it should be possible somehow to build this in a way where the client really only needs to install, and say yes to contribute. If we could cut it down to that level, then you can start serving the site like a P2P BitTorrent swarm, and these power user clients can become nodes.
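Purely as a thought experiment, the hot/sharded split above could be planned something like this. Nothing here is a real Lemmy or fediverse API: the function, node names, shard names, and the 20-node/50-shard numbers are all hypothetical. Every node is flagged to carry the hot set, and each one gets a rotating 10% slice of the archive so the slices overlap into replicas.

```python
from collections import Counter


def plan_node_contents(node_ids, archive_shards, archive_fraction=0.10):
    """Give every node the hot set plus a rotating slice of archive shards."""
    total = len(archive_shards)
    per_node = max(1, round(total * archive_fraction))
    plan = {}
    for n, node in enumerate(node_ids):
        # Each node starts where the previous node's slice ended, wrapping
        # around so the whole archive is covered and then replicated.
        start = (n * per_node) % total
        picked = [archive_shards[(start + j) % total] for j in range(per_node)]
        plan[node] = {"hot": True, "archive": picked}
    return plan


nodes = [f"node-{i}" for i in range(20)]        # hypothetical volunteers
shards = [f"shard-{i:02d}" for i in range(50)]  # hypothetical archive shards
plan = plan_node_contents(nodes, shards)

# Count how many nodes hold a copy of each archive shard.
copies = Counter(s for contents in plan.values() for s in contents["archive"])
print(plan["node-0"]["archive"])
print("replicas per shard:", set(copies.values()))
```

A real orchestrator would also have to handle nodes dropping offline, which is where the replica count starts to matter.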
- Comment on We have to solve the money problem! 4 weeks ago:
I realize that is not how the fediverse works. I’m not speaking about the content delivery as much as the server orchestration.
That’s why I’m saying if somehow it could work that way, it would be one way to offset the compute and delivery burdens. But it is a very different paradigm from normal hosting. There would have to be some kind of swarm management layer that the main instance nodes controlled.
My point was only that, should such a proposal be feasible one day, if you lower the barriers you could have more resources.
I myself have no interest in hosting a full blown private instance of Lemmy or Mastodon, but I would happily contribute 1tb of storage and a ton of idle compute to serving the content for my instance if I could. That’s where this thinking stemmed from. Many users like me could donate their “free” idle power and space. But currently it is not feasible.
- Comment on We have to solve the money problem! 4 weeks ago:
Provided there is an “upper limit” on what scale we are talking, I’ve often wondered, couldn’t private users also host a sharded copy of a server instance to offset load and bandwidth? Like Folding@Home, but for site support.
I realize this isn’t exactly feasible today for most infra, but if we’re trying to “solve” the problem, imagine if you were able to voluntarily give up like 100gb HDD space and have your PC host 2-3% of an instance’s server load for a month or something. Or maybe just be a CDN node for the media and bandwidth heavy parts to ease server load, while the server code is on different machines.
This kind of distributed “load balancing” on private hardware may be a complete pipe dream today, but I think it might be the way federated services need to head. I can tell you if we could get it to be as simple as volunteers spinning up a Docker container, and dropping the generated WireGuard key and their IP in a “federate” form to give the mini-node over to an instance, it would be a lot easier to support sites in this way.
Speaking for myself, I have enough bandwidth and space I could lend some compute and offset a small amount of traffic. But the full load of a popular instance would be more than my simple home setup is equipped for. If contributing hosting was as easy as contributing compute, it could have a chance to catch on.
- Comment on Google is going ‘all in’ on AI. It’s part of a troubling trend in big tech 1 month ago:
True, in a broad sense. I am speaking more to enshittification and the degradation of both experience and control.
If this was just “now everything has Siri, it’s private and it works 100x better than before” it would be amazing. That would be like cars vs horses. A change, but a perceived value and advantage.
But it’s not. Not right now anyways. Right now it’s like replacing a car with a pod that runs on direct wind. If there is any wind over say, 3mph it works, and steers 95% as well as existing cars. But 5% of the time it’s uncontrollable and the steering or brakes won’t respond. And when there is no wind over 3mph it just doesn’t work.
In this hypothetical, the product is a clear innovation, offers potential benefits long term in terms of emissions and fuel, but it doesn’t do the core task well, and sometimes it just fucks it up.
The television, cars, social media, all fulfilled a very real niche. But nearly everyone using AI, even those using it as a tool for coding (arguably its best use case) often don’t want to use it in search or in many of these other “forced” applications because of how unreliable it is. Hence why companies have tried (and failed at great expense) to replace their customer service teams with LLMs.
This push is much more top down.
Now drink your New Coke and Crystal Pepsi.
- Comment on Google is going ‘all in’ on AI. It’s part of a troubling trend in big tech 1 month ago:
Tech companies don’t really give a damn what customers want anymore. They have decided this is the path of the future because it gives them the most control of your data, your purchasing habits and your online behavior. Since they control the back end, the software, the tech stack, the hardware, all of it, they just decided this is how it shall be. And frankly, there’s nothing you can do to resist it, aside from just eschewing using a phone at all and divorcing yourself from all modern technology, which isn’t really reasonable for most people. That or legislation, but LOL United States.
- Comment on Discord going public. Plz help a future refugee. 3 months ago:
Did you follow a guide, or know one you could link? I’m thinking this is the path for me and my friends too.
- Comment on What are the exact ramifications and consequences of the recent meeting with Zelenskyy and Trump/JD? 4 months ago:
Hello comrade.
- Comment on Ratatan - Official Gameplay Trailer | ID@Xbox 4 months ago:
Hilarious that this is being promoted on Xbox. Absolutely ridiculous Sony didn’t keep this team going.
- Comment on Obsidian is now free for work - Obsidian 4 months ago:
With Obsidian, you don’t have to use folders. I’m generally of the opinion that having a tool is better than not having access to it. Tags and Folders are just an option to use. Fundamentally Logseq and Obsidian otherwise can be very similar.
- Comment on Obsidian is now free for work - Obsidian 4 months ago:
Interesting. I’ll have to give that one a shot later. Though I’m probably fine with Obsidian.
- Comment on Obsidian is now free for work - Obsidian 4 months ago:
Yes, but the syntax and documentation on the queries is obtuse as hell in Logseq. Like it is ridiculous how granular you have to get if you want to return all links within a time period or something. If I need to write SQL to pull notes, I should just use a database, lol.
The nice thing about tags as a distinct entity is that they are an option you can use if you choose. It gives you two buckets you can sort into and connect between. And it does make creating “topic groups” easier than manually linking them all to a tag page in Logseq, imo.
Conversely, I would massively prefer if Logseq abolished support for hashtags entirely if they are functionally identical to wikilinks. Or combine them so the hashtags auto-convert to wikilinks or vice versa. But supporting hashtags in any manner when they are frankly not a “real” feature is more frustrating. Making topic links in Logseq is harder because of this.
Also, the existence of tag pages themselves is a confusing aberration given the above…
Logseq is a great tool, but very different in terms of what it is best suited to handle. I think I will revisit it if I do a lot of writing, but for disparate ideas or notation it is good but could be better.
- Comment on Obsidian is now free for work - Obsidian 4 months ago:
I’ve tried Logseq for the last 6 months (no commercial license) at work, but while it’s really good for outlining, its lack of a tag function is what feels like a critical weakness to me. I realize structurally it’s different in concept. But making everything into bullets doesn’t always suit the task.
I would love Logseq for journalling or writing though.
- Comment on Obsidian is now free for work - Obsidian 4 months ago:
Holy shit this is huge. I can finally use Obsidian at work! I was avoiding it due to the license and using Logseq. Which, to be fair, did admirably. But it’s much more an outliner or journaling system than a knowledge base, I feel.
- Comment on Kindle Is Making It Harder to Switch to Rival eReader Brands. 4 months ago:
Yea, I had like a 2nd or 3rd gen Paperwhite and rooted it for this reason, but my partner’s wasn’t hackable until this moment. So now she can have it too.
- Comment on Kindle Is Making It Harder to Switch to Rival eReader Brands. 4 months ago:
Better Calibre integration.
Custom shelves and book collections on Kindle.
- Comment on [deleted] 4 months ago:
So long as this is genuine, and not a stealth sabotage, that’s a genuinely good response and reaction.
I would respond to them that that isn’t a bad idea, so long as the therapist isn’t “primed” on the issue, and you’re able to actually go in with a blank slate.
Also, no idea where you live, but in the US make sure your therapist agrees to keep your therapy notes in some kind of shorthand. Musk and Trump don’t care about violating HIPAA to harm your rights, so be smart yea?
- Comment on Microsoft Flight Simulator 2024's launch has been marred by long load times, server issues and now it has overwhelmingly negative reviews 7 months ago:
For low res, no.
Hi res, sure. Make it optional, or let players download the region they like. Or just the airports with much lower res landscapes, etc etc.
Or just let them have it all and make these choices. Storage is CHEAP nowadays. If you’re a flight sim enthusiast, a few terabytes for the map data is the least expensive part of your setup by far.
- Comment on Microsoft Flight Simulator 2024's launch has been marred by long load times, server issues and now it has overwhelmingly negative reviews 7 months ago:
God I love having a future where my ability to play a fucking flight simulator depends on both internet access and server reliability.
Completely unnecessary to boot. Store a low res copy locally, offer the high res as regional packs. 0 reason to stream this data in.
- Comment on The Pentagon wants AI to enhance the capabilities of US nuclear weapons systems 8 months ago:
“AI” cannot make rational choices.
It is a giant word association machine.
For the love of god this should never be involved in military applications.
- Comment on McDonald’s posts biggest decline in global sales in four years 8 months ago:
We just decided to never go back after McDonald’s decided to weigh in on Israel/Gaza by (their regional branches) feeding the IDF for free, and McDonald’s corporate letting that be.
- Comment on Concerns Raised Over Bitwarden Moving Further Away From Open-Source 8 months ago:
An online database is still a file ultimately. A SQL or other DB file stored on a webserver, accessed through a web interface.
Vaultwarden, etc, are the same, only the database file is less directly visible IMO. KeePass IMO is simple. The DB is in a bespoke format, stored outside the application.
You could put the vault in System32 and name it “trustedinstaller.log”, and if someone saw you had KeePass they wouldn’t even know where your vault is.
Given the number of well documented breaches of online password vaults, I would much rather do a private device to device sync via Syncthing and keep it out of webservers.
- Comment on Concerns Raised Over Bitwarden Moving Further Away From Open-Source 8 months ago:
Syncthing transfers are encrypted.
The database is encrypted.
And you can set it to not use relays for data, only matchmaking between your own devices.
So it’s an encrypted file, encrypted again, and sent directly from an IP you own to an IP you own.
- Comment on Concerns Raised Over Bitwarden Moving Further Away From Open-Source 8 months ago:
F-Droid syncthing-fork is still actively developed and had a patch in the last few weeks.
So hopefully this isn’t the end.