It’s hard, even impossible, to generate an image of a person doing a handstand. All models assume a right-side-up person.
Comment on DeviantArt’s Downfall Is Devastating, Depressing, and Dumb
lurch@sh.itjust.works 5 months ago
there’s some stuff image generating AI just can’t do yet. it just can’t understand some things. a big problem seems to be referring to the picture itself, like position or its border. another problem is combining things that usually don’t belong together, like a skin of sky. those are things a human artist/designer does with ease.
anlumo@lemmy.world 5 months ago
Even_Adder@lemmy.dbzer0.com 5 months ago
This hasn’t been true for months at least. You really have to check week to week when dealing with things in this field.
100@fedia.io 5 months ago
think of an episode of any animated series with countless handmade backgrounds, good luck generating those with any sort of consistency or accuracy and you will be calling for an artist who can actually take instructions and iterate
yildolw@lemmy.world 5 months ago
We’ll soon be hearing that only Luddites care about continuity errors
tal@lemmy.today 5 months ago
There’s a lot.
Some of it doesn’t matter for certain things. And some of it you can work around. But try creating something like a graphic novel with Stable Diffusion, and you’re going to quickly run into difficulties. You probably want to display a consistent character from different angles – that’s pretty important. That’s not something that a fundamentally 2D-based generative AI can do well.
admin@lemmy.my-box.dev 5 months ago
I think creating a LoRA for your character would help in that case. Not really easy to do as of yet, but technically possible, so it’s mostly a UX problem.
tal@lemmy.today 5 months ago
A LoRA is good for replicating a style where there’s existing stuff; it helps add training data for a particular subject. There are problems that existing generative AIs smack into that it’s good at fixing. But it’s not a cure-all for all limitations of such systems. The problem there is kinda fundamental to how the system works today – it’s not a lack of training data, but simply how the system deals with the world.
The problem is that the diffusion-based image generators today think of the world as a series of largely-decoupled 2D images, linked only by keywords. A human artist thinks of the world as 3D, can visualize something – maybe using a model to help with perspective – and then render it.
So, okay. If you want to create a facial portrait of a kinda novel character, that’s something that you can do pretty well with AI-based generators.
But now try and render that character you just created from ten different angles, in unique scenes. That’s something that a human is pretty good at:
Image
Like, try reproducing that page in Stable Diffusion, with the same views. Even if you can eventually get something even remotely approximating that, a human, traditional comic artist is going to be a lot faster at it than someone sitting in front of a Stable Diffusion box.
Is it possible to make some form of art generator that can do that? Yeah, maybe. But it’s going to have to have a much more sophisticated “mental” model of the world, a 3D one, and have solid 3D computer vision to be able to reduce scenes to 3D. And while people are working on it, that has its own extensive set of problems. Look at your training set. The human artist slightly stylized stuff or made errors that human viewers can ignore pretty easily, but a computer vision model that doesn’t work exactly like human vision and the mind might go into conniptions over. For example, look at the fifth panel there. The artist screwed up – the ship slightly overlaps the dock, right above the “THWIP”. A human viewer probably wouldn’t notice or care. But if you have some kind of computer vision system that looks for line intersections to determine relative 3D positioning – something that we do ourselves – it can very easily look at that image and have no idea what the hell is going on there.
The proportions aren’t exactly consistent from frame to frame, don’t perfectly reflect reality, and might be more effective at conveying movement or whatever than an actual rendering of a 3D model would be. That works for human viewers. And existing 2D systems can kind of dodge the problem (as long as they’re willing to live with the limitations that intrinsically come with a 2D model) because they’re looking at a bunch of already-stylized images. But now imagine that they’re trying to take images, then reduce them into a coherent 3D world, then learn to re-apply stylization. That may involve creating not just a 3D model, but enough understanding of the objects in that world to understand what stylization is reasonable, and when. Is it technically possible? Probably. But is it a minor effort to get there from here? No, probably not. You’re going to have to make a system that works wildly differently from the way that the existing systems do. That’s even though what you’re trying to do might seem small from the standpoint of a human observer – just being able to get arbitrary camera angles of the image being rendered.
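As a rough illustration of the low-level check mentioned above (looking for line intersections), here is a minimal Python sketch of a 2D segment-intersection test using the standard orientation (cross-product) method. The function names and sample coordinates are made up for illustration. This is not any particular vision system's method, and a real pipeline would first have to extract clean line segments from the artwork, which is the hard part.

```python
from typing import Tuple

Point = Tuple[float, float]


def orientation(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a).

    Positive: c is to the left of a->b. Negative: to the right. Zero: collinear.
    """
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a: Point, b: Point, c: Point) -> bool:
    """Assuming a, b, c are collinear, check that c lies between a and b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Return True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1 = orientation(q1, q2, p1)
    d2 = orientation(q1, q2, p2)
    d3 = orientation(p1, p2, q1)
    d4 = orientation(p1, p2, q2)

    # Proper crossing: each segment's endpoints lie on opposite sides of the other.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True

    # Touching cases (for example a T-junction): an endpoint lies on the other segment.
    if d1 == 0 and on_segment(q1, q2, p1):
        return True
    if d2 == 0 and on_segment(q1, q2, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, q1):
        return True
    if d4 == 0 and on_segment(p1, p2, q2):
        return True
    return False


# Hypothetical outline edges: a "ship" edge crossing a "dock" edge.
ship_edge = ((0.0, 0.0), (2.0, 2.0))
dock_edge = ((0.0, 2.0), (2.0, 0.0))
overlaps = segments_intersect(*ship_edge, *dock_edge)
```

A check like this only says that two outlines cross. It says nothing about which object is in front, which is why a small drawing slip like the ship overlapping the dock can leave such a system with contradictory cues.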
tal@lemmy.today 5 months ago
On the other hand, there are things that a human artist is utterly awful at, that diffusion-based generative AIs are amazing at. I mentioned that these models are great at producing works in a given style, and can switch styles virtually effortlessly. I’m gonna do a couple Spiderman renditions in different styles, takes about ten seconds a pop on my system:
Spiderman as done by Neal Adams:
Image
Spiderman as done by Alex Toth:
Image
Spiderman in a noir style done by Darwyn Cooke:
Image
Spiderman as done by Roy Lichtenstein:
Image
That’s something that’s really hard for a human to do, given how a human works, because for a human, the style is a function of the workflow and a whole collection of techniques used to arrive at the final image. Stable Diffusion doesn’t care about techniques, how the image got the way it is – it only looks at the output. So for Stable Diffusion, creating an image in a variety of styles is easy as pie, whereas for a single human artist, it’d be very difficult.
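For what it's worth, switching styles like this mostly comes down to changing a few words in the prompt. A minimal Python sketch, assuming the prompts follow the wording of the captions above (the exact prompts and settings used for those images aren't given, and `style_prompt` is a hypothetical helper):

```python
from typing import List, Optional, Tuple


def style_prompt(subject: str, artist: str, modifier: Optional[str] = None) -> str:
    """Build a text prompt asking for `subject` in the style of `artist`."""
    if modifier:
        return f"{subject} in a {modifier} style done by {artist}"
    return f"{subject} as done by {artist}"


# (artist, optional style modifier) pairs, taken from the captions above.
STYLES: List[Tuple[str, Optional[str]]] = [
    ("Neal Adams", None),
    ("Alex Toth", None),
    ("Darwyn Cooke", "noir"),
    ("Roy Lichtenstein", None),
]

prompts = [style_prompt("Spiderman", artist, modifier) for artist, modifier in STYLES]

# Each prompt would then be passed to whatever text-to-image pipeline you run
# locally; that step is left out because it depends on the setup.
for prompt in prompts:
    print(prompt)
```

The point is that the only thing that changes between renditions is a string, whereas a human artist would have to change their whole workflow.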
Even_Adder@lemmy.dbzer0.com 5 months ago
I don’t think it’s supposed to solve every problem. Just like not every scene in the new Sand Land anime was 3D, the same goes for every other artistic tool. Some things are easy with some tools; others they’re not well suited for.
What you have to ask yourself is in what ways it can help you with what you’re trying to do.
hagelslager@feddit.nl 5 months ago
I think Corridor Digital made an AI-animated film by hiring an illustrator (after an earlier attempt with a general dataset) to “draw” still frames from video of the lead actors, with Stable Diffusion generating the inbetweens.