I was wondering how far I’d have to scroll before getting to someone who doesn’t understand statistics complaining about the sample size…
Comment on Study of 8k Posts Suggests 40+% of Facebook Posts are AI-Generated
dan@upvote.au 6 days ago
In that case, how did they only choose 8000 posts over 6 years? Facebook probably gets more than 8000 new posts per minute.
prole@lemmy.blahaj.zone 6 days ago
dan@upvote.au 5 days ago
There have likely been trillions of posts on Facebook during that time frame. Is a sample size of 8000 really sufficient for a corpus that large?
prole@lemmy.blahaj.zone 5 days ago
Have you ever heard of “margin of error”?
Learn statistics, it’s actually super informative.
hildegarde@lemmy.blahaj.zone 6 days ago
Every study uses sampling. They don’t have the resources to check everything. I have to imagine it took a lot of work to verify conclusively whether something was or was not generated. It’s a much larger sample size than a lot of studies.
dan@upvote.au 6 days ago
The study is by a company that creates software to detect AI content, so it’s literally their whole job.
It’s a very small proportion of the total number of Facebook posts though.
tal@lemmy.today 6 days ago
The proportion of the total population that you sample is almost irrelevant when you use random sampling. Random sampling doesn’t rely on examining a large portion of the population; rather, as the sample grows, it becomes increasingly unlikely for the sample to deviate dramatically from the population as a whole. That is a function of the number of samples you take, largely decoupled from the population size.
en.wikipedia.org/wiki/Sampling_(statistics)
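To put rough numbers on that, here is a minimal sketch of the usual 95% margin of error for a sample proportion, with and without the finite population correction. It assumes a simple random sample and the normal approximation, which may not match how the study actually picked its posts, and it says nothing about how accurate the AI detector itself is. The function name `margin_of_error` and the population sizes are made up for illustration; only the 8000 posts and the roughly 40% figure come from the thread.

```python
import math

def margin_of_error(p_hat, n, population=None, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z=1.96 corresponds to ~95% confidence.
    If `population` is given, applies the finite population correction.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Finite population correction: shrinks the error only when the
        # sample is a noticeable fraction of the population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

n = 8000       # sample size from the thread
p_hat = 0.40   # roughly the reported share of AI-generated posts

# Hypothetical population sizes, from "small" to "trillions" to "infinite".
for population in (100_000, 10**9, 10**12, None):
    moe = margin_of_error(p_hat, n, population)
    label = "infinite" if population is None else f"{population:,}"
    print(f"population {label:>17}: +/- {moe * 100:.2f} percentage points")
```

Under those assumptions the margin comes out at about one percentage point whether the population is a billion, a trillion, or treated as infinite. The population size only starts to matter when the sample is a sizeable chunk of it, and then it makes the estimate slightly tighter, not looser.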