There are lists of known bots that instance admins can block, for a range of reasons.
Anything online can be scraped, but big firms might run into regulatory trouble if they are caught indiscriminately scraping sites without consent. At the moment, the big social media apps have a tonne of content to train on in tightly controlled conditions, so they don’t really need to go into the wild yet. However, we need to be vigilant, block them, and make a fuss if we catch them at it.
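For anyone wondering what blocking from a list looks like in practice, here is a rough Python sketch of the idea, not how any particular instance software does it. The token list is a small illustrative sample of crawler User-Agent strings, and `is_blocked` and `status_for` are made-up helper names; a real admin would pull tokens from a maintained list and usually enforce this at the reverse proxy.

```python
# Illustrative, incomplete sample of crawler User-Agent tokens (assumption:
# a real deployment would load these from a maintained block list).
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def is_blocked(user_agent: str, tokens=BLOCKED_AGENT_TOKENS) -> bool:
    """Return True if the User-Agent header contains any blocked token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


def status_for(user_agent) -> int:
    """Hypothetical helper: HTTP status to answer with for a request."""
    # A missing header is treated as an empty string and allowed through.
    return 403 if is_blocked(user_agent or "") else 200
```

This only stops bots that identify themselves honestly. A scraper that sends a browser-like User-Agent passes straight through, which is why the "make a fuss" part still matters.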
Womble@lemmy.world 4 weeks ago
Everything on the Fediverse is almost certainly scraped already, and will be again, repeatedly. You can’t “protect” content that is freely available on a public website.
ayyy@sh.itjust.works 4 weeks ago
But uh, I wrote an entire license in every one of my comments so it would be impossible for them to scrape! /s
kane@femboys.biz 4 weeks ago
I do not entirely agree.
While what you said might be true for the content we post, things like view history and tracking data are much more difficult to get at. That metadata does help with tagging content.
Womble@lemmy.world 4 weeks ago
Yeah, fair enough. I was referring to posts and comments, not other metadata, because that isn’t publicly available with just a GET request (as far as I’m aware).
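To illustrate the "just a GET request" point, here is a minimal Python sketch. It assumes the instance exposes Lemmy's v3 HTTP API at `/api/v3/post/list` and allows anonymous reads; the host `lemmy.example`, the query parameters chosen, and both function names are placeholders for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def post_list_url(instance: str, community: str, limit: int = 10) -> str:
    """Build a URL for a public post listing (endpoint path is an assumption)."""
    query = urlencode({"community_name": community, "limit": limit, "sort": "New"})
    return f"https://{instance}/api/v3/post/list?{query}"


def fetch_public_posts(instance: str, community: str, limit: int = 10) -> dict:
    """Fetch the listing with a plain GET: no login, token, or cookie is sent."""
    with urlopen(post_list_url(instance, community, limit), timeout=10) as resp:
        return json.load(resp)
```

Nothing here authenticates, which is the point: post and comment text comes back to anyone who asks, while things like view history are not served this way.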