Comment on "The fact that users are encouraged to include text descriptions with media content makes it perfect training data for AI."

FriendOfDeSoto@startrek.website 1 week ago

It’s not perfect training data. Being encouraged to add alt text and actually doing it are two different things, and writing good alt text is another matter altogether. Besides, anything that’s on the internet is training data whether people want it to be or not. The only difference is an ethical one: whether the scraper accepts and respects some version of robots.txt, i.e. a "do not scrape" signal that communicates the wishes of whoever holds the data. And if they torrent books, you can guess how respectful they are.
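To illustrate the point about robots.txt being a request rather than a barrier, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `may_scrape` helper, and the example URL are hypothetical; "GPTBot" is used only as an example crawler user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to ask one AI crawler
# ("GPTBot", used here only as an example user-agent) to stay out entirely,
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_scrape(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url.

    Compliance is voluntary: a scraper that never performs a check like
    this one is not technically prevented from downloading the page.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/123"
    print(may_scrape("GPTBot", url))        # False
    print(may_scrape("SomeOtherBot", url))  # True
```

The check only happens if the scraper's own code chooses to run it, which is exactly why robots.txt expresses the data holder's intentions without enforcing them.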
