I'm pretty sure that the way they constantly fuck up hands is a solid demonstration that these AI tools do not have a perfect recollection
Comment on US rejects AI copyright for famous state fair-winning Midjourney art
drewdarko@kbin.social 1 year ago
Human brains don’t have perfect recollection. Every time we retell a story or remember a memory or picture an image in our head it is distorted with our own imperfections.
When I prompt an AI to create an image it samples the images it learned from with perfect recollection.
AI does not learn the same way humans do.
Skua@kbin.social 1 year ago
drewdarko@kbin.social 1 year ago
The reason they fuck up hands is because hands are usually moving during pictures and have many different configurations compared to any other body part.
So when these image AIs refer back to all the pictures of hands they’ve been fed and use them to create an ‘average approximation’ of what a hand looks like, they include the motion blur from some of their samples, a middle finger sticking up from another sample, or extra fingers from sample pictures of people holding hands, and mismatch them together even when it doesn’t fit the picture being created.
The AI doesn’t know what a hand is. It is just mixing together samples from its perfect recollection.
Honytawk@lemmy.zip 1 year ago
What? No
How many pictures do you see online where the hands are in motion, or even blurred?
Hands are usually behind objects when they hold something and can indeed have tons of variations and configurations. Even human artists fuck them up all the time or just don’t draw them at all.
AIs don’t combine samples. If they did, they wouldn’t be able to generate new pictures of whatever subject you want, in the specific style you want, and then produce multiple variations of that picture.
It isn’t a copy and paste, it is interpreting the drawing and modifying it based upon the prompt.
cole@lemdro.id 1 year ago
This is incorrect actually. The models these AIs run from by definition have imperfect recall, otherwise they would be ENORMOUS. No, that’s actually exactly the opposite of how these work.
They train a statistically weighted model to predict outputs based on inputs. It has no actual image data stored internally, it can’t.
drewdarko@kbin.social 1 year ago
This is incorrect actually. The models these AIs run from by definition have perfect recall and that is why they require ENORMOUS resources to run and why ChatGPT became less effective when the resources it was allocated were reduced.
-ChatGPT
cole@lemdro.id 1 year ago
No, they take exponentially increasing resources as a consequence of having imperfect recall. Smaller models have “worse” recall. They’ve been trained with smaller datasets (or pruned more).
As you increase the size of the model (number of “neurons” that can be weighted) you increase the ability of that model to retain and use information. But that information isn’t retained in the same form as it was input. A model trained on the English language (an LLM, like ChatGPT) does not know every possible word, nor does it actually know ANY words.
All ChatGPT knows is which tokens (short chunks of characters) are statistically likely to follow one another in a long sequence. With enough neurons and layers, combined with large amounts of processing power and time for training, this results in a weighted model which is many orders of magnitude smaller than the dataset it was trained on.
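To make "statistically likely to come next" concrete, here is a toy sketch in Python. It is not how ChatGPT is built: it is a simple count table over single characters (real LLMs use neural networks over tokens), and the `corpus` string and both function names are made up for illustration. The point is only that what gets kept after "training" is follow-up statistics, not the text itself.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each character, which characters follow it in the text."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, char):
    """Return the most frequent follower of char, or None if char was never seen."""
    followers = counts.get(char)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, chosen only for this example.
corpus = "the cat sat on the mat. the hat is on the cat."
model = train_bigram(corpus)

print(most_likely_next(model, "t"))  # prints "h" ("th" is the most common pair starting with "t")
print(most_likely_next(model, "z"))  # prints None ("z" never appears in the corpus)
```

The table only answers "what usually comes after this?". You can't read the original sentence back out of it, because the order of the pairs is gone.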
Since the model weighting itself is smaller than the input dataset, it is literally impossible for the model to have perfect recall of the input dataset. So by definition, these models have imperfect recall.
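The size argument is just division. A rough back-of-the-envelope sketch in Python, where all three numbers are illustrative assumptions (a model file of a few gigabytes, a couple of billion training images, about 100 KB per image), not measurements of any specific model:

```python
def bytes_per_training_image(model_bytes, num_images):
    """Average number of model bytes available per training image."""
    return model_bytes / num_images

# Illustrative assumptions only, not figures for any particular model.
MODEL_BYTES = 4 * 10**9        # assumed model file size: 4 GB
NUM_IMAGES = 2 * 10**9         # assumed training set: 2 billion images
AVG_IMAGE_BYTES = 100 * 10**3  # assumed average image size: 100 KB

budget = bytes_per_training_image(MODEL_BYTES, NUM_IMAGES)
dataset_bytes = NUM_IMAGES * AVG_IMAGE_BYTES

print(budget)                       # 2.0 bytes of model per training image
print(dataset_bytes / MODEL_BYTES)  # 50000.0, i.e. the dataset is 50,000x larger
```

Under those assumptions the model has about 2 bytes per image to work with, and even a tiny thumbnail needs thousands. Whatever the exact numbers are, as long as the model is far smaller than the dataset it can't be holding a copy of every image.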
drewdarko@kbin.social 1 year ago
In other words they require exponentially more input because the AI doesn’t know what it is looking at.
It uses its perfect recollection of that input to create a ‘model’ of what a face should look like and stores that model like a collage of all the samples and then uses that to reproduce a face.
It’s perfect recollection with an extra step.