Comment on The AI-focused COPIED Act would make removing digital watermarks illegal

Grimy@lemmy.world 2 months ago

This bill is being built with the interests of the big tech companies in mind imo; big copyright holders are just an afterthought. I figure that since big tech spent quite a bit of money building those datasets, and since they were built before the law, they will be able to keep using them as long as they don’t add anything new, but I can’t be certain.

The use cases are vast. This is a huge boon for the indie gaming and animation industries. I’m seriously excited to have NPCs running on LLMs and don’t want to be forced into a subscription just to play my games. It’s also going to bring smart homes to another level. Systems can be built that are much stronger than Alexa without having to send all that insanely private data to Amazon. There’s a huge privacy issue if all the available models only run on Google’s or OpenAI’s cloud, but I won’t get into that (not to mention that these corporate LLMs will eventually be trained for advertising and will essentially be poisoned to prefer whoever is paying their creators).

I’ll give a more concrete example from my work, but it will be a bit vague to preserve my anonymity.

I work in research (I originally studied software engineering and robotics) and we have about 20 years’ worth of projects. None of it is standardized and it’s honestly a mess. I built a system in the space of a few days that grabs every one of those docs, reads through it with an LLM and then classifies them doc by doc into an Excel sheet with a SharePoint link. I’ve got 20 columns in there: it summarizes each doc, chooses from a list of 30 document types I gave it, extracts related towns and people as well as companies and domain, extracts the column names if there are any tables inside, and generally establishes a bunch of different relationships. It doesn’t sound like much, but doing it by hand would have been weeks of tedious work. My computer did it in 20 minutes using a local LLM, so no sensitive client data leaves the building.
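For anyone wondering what a pipeline like that looks like in code, here is a minimal sketch. It is not the actual system: the column names, the short `DOC_TYPES` list, the prompt wording, and the `fake_llm` stand-in are all assumptions made up for illustration, and it writes CSV rather than a real Excel file to stay within the standard library. In practice `llm` would be whatever function sends a prompt to a locally hosted model and returns its text reply.

```python
import csv
import io
import json

# Hypothetical stand-in for the real list of 30 document types.
DOC_TYPES = ["report", "proposal", "meeting notes", "other"]

# Hypothetical subset of the 20 columns mentioned above.
COLUMNS = ["link", "summary", "doc_type", "towns", "people", "companies"]


def build_prompt(text):
    """Ask for a JSON reply so the answer can be parsed into columns."""
    return (
        "Summarize the document and reply with JSON containing the keys "
        "summary, doc_type, towns, people, companies. "
        f"doc_type must be one of {DOC_TYPES}.\n\n{text}"
    )


def classify_document(link, text, llm):
    """Turn one document into one sheet row using the supplied model function."""
    reply = json.loads(llm(build_prompt(text)))
    doc_type = reply.get("doc_type")
    if doc_type not in DOC_TYPES:
        # Models sometimes answer outside the allowed list; fall back safely.
        doc_type = "other"
    row = {"link": link, "summary": reply.get("summary", ""), "doc_type": doc_type}
    for key in ("towns", "people", "companies"):
        row[key] = "; ".join(reply.get(key, []))
    return row


def write_sheet(rows, out):
    """Write the rows as CSV, which Excel opens directly."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def fake_llm(prompt):
    # Placeholder reply; a real setup would call a local model here.
    return json.dumps({
        "summary": "Traffic study.",
        "doc_type": "report",
        "towns": ["Springfield"],
        "people": [],
        "companies": ["Acme"],
    })
```

Calling `classify_document` once per file and passing the collected rows to `write_sheet` gives one row per document. Validating `doc_type` against the fixed list is the part that keeps the sheet filterable when the model drifts off the options it was given.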

Right now I’m working on a GraphRAG system that will take all those documents and turn them into vectors, then an LLM adds relationships between those vectors. It will be incorporated into an internal chatbot so people can ask questions and not only get a natural language answer but also see the references where the information was found, with quick access to them. It’s vector search on steroids and will cost nothing to run. I’m planning on eventually training the chatbot itself on our data so it can have a better understanding of our research sector as well as direct access to all the documents.
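The retrieval idea (vector search, then following relationships, then returning the references) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the word-count `embed` function stands in for a real embedding model, `edges` stands in for the relationships an LLM would add, and the `docs` layout is hypothetical. It is not how any particular GraphRAG library works.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, edges, k=1):
    """Return the k best-matching docs plus their graph neighbours, with links.

    docs maps doc id -> {"text": ..., "link": ...}; edges maps doc id -> list
    of related doc ids. Both layouts are hypothetical.
    """
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(docs[d]["text"])), reverse=True)
    hits = ranked[:k]
    # Plain vector search stops here; the graph step pulls in related documents.
    for doc_id in list(hits):
        for neighbour in edges.get(doc_id, []):
            if neighbour not in hits:
                hits.append(neighbour)
    return [(doc_id, docs[doc_id]["link"]) for doc_id in hits]
```

The returned `(id, link)` pairs are what a chatbot would pass to the model as context and show to the user as references. The neighbour expansion is the difference from plain vector search: a document that shares no words with the question can still surface because it is linked to one that does.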

Next is building something that gets info automatically from the web. Sometimes we have to create long Excel sheets with a bunch of different data points. We usually stay at the state level, but that can sometimes mean 1000 businesses, and we have to google each one manually to find the info. It’s sometimes weeks of work and honestly sucks to do. LLMs are entirely capable of doing this kind of work and would take a few hours at most, again at no cost.

These things are seriously great whenever they’re dealing with data that isn’t just numbers and is hard to quantify. I hate Reddit and will never create an account there after what happened, but I still go daily to the LocalLLaMA subreddit. It’s a great source of information if you want to keep abreast of what’s happening.
