Comment on DeviantArt’s Downfall Is Devastating, Depressing, and Dumb

tal@lemmy.today 1 month ago

> I think creating a lora for your character would help in that case.

A LoRA is good for replicating a style where there's existing material, and it helps add training data for a particular subject. There are problems that existing generative AIs smack into that it's good at fixing. But it's not a cure-all for every limitation of such systems. The problem there is kinda fundamental to how the systems work today: it's not a lack of training data, but simply how the system deals with the world.
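
For reference, here is roughly what "a LoRA for your character" amounts to in practice: a minimal sketch assuming the Hugging Face diffusers library, an already-trained character LoRA file, and illustrative model names and paths (none of which come from the original comment).

```python
# Minimal sketch: load a base Stable Diffusion model, attach a character LoRA,
# and sample one image. Model name, LoRA path, and prompt are hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example base model
    torch_dtype=torch.float16,
).to("cuda")

# The LoRA adds low-rank weight deltas on top of the base model.
pipe.load_lora_weights("./my_character_lora.safetensors")

# This reliably pulls in the character's look for a single image, but it does
# not give the model any 3D understanding of that character.
image = pipe("portrait of mycharacter, head and shoulders, studio lighting").images[0]
image.save("portrait.png")
```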

The problem is that the LLM-based systems today think of the world as a series of largely-decoupled 2D images, linked only by keywords. A human artist thinks of the world as 3D, can visualize something – maybe using a model to help with perspective – and then render it.

So, okay. If you want to create a facial portrait of a kinda novel character, that’s something that you can do pretty well with AI-based generators.

But now try and render that character you just created from ten different angles, in unique scenes. That’s something that a human is pretty good at:

[Image: a comic page showing the same character from many different angles across several panels; in the fifth panel a ship sits at a dock, just above a "THWIP" sound effect]

Like, try reproducing that page in Stable Diffusion, with the same views. Even if you can eventually get something even remotely approximating it, a traditional human comic artist is going to be a lot faster at it than someone sitting in front of a Stable Diffusion box.
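
To make that concrete, here is a hedged sketch (hypothetical prompts and filenames, reusing the `pipe` and LoRA setup from the sketch above) of the closest a plain text-to-image pipeline gets to "the same character from ten angles": each view is an independent sample, tied to the others only by keywords.

```python
# Assumes `pipe` is the LoRA-loaded StableDiffusionPipeline from the earlier
# sketch; prompts and filenames here are illustrative.
views = [
    "front view", "three-quarter view", "profile view", "view from behind",
    "low angle", "high angle", "close-up of face", "full body shot",
    "mid-air action pose", "sitting on a rooftop",
]

for i, view in enumerate(views):
    prompt = f"mycharacter, {view}, comic book style"
    image = pipe(prompt).images[0]
    image.save(f"view_{i:02d}.png")

# Each image is a fresh sample conditioned only on text (plus the LoRA), so
# face, costume details, and proportions tend to drift between views; nothing
# in the model ties these ten outputs to a single underlying 3D character.
```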

Is it possible to make some form of art generator that can do that? Yeah, maybe. But it's going to have to have a much more sophisticated "mental" model of the world, a 3D one, and have solid 3D computer vision to be able to reduce scenes to 3D. And while people are working on it, that has its own extensive set of problems.

Look at your training set. The human artist slightly stylized things or made errors that human viewers can ignore pretty easily, but that a computer vision model, which doesn't work exactly like human vision and the human mind, might go into conniptions over. For example, look at the fifth panel there. The artist screwed up: the ship slightly overlaps the dock, right above the "THWIP". A human viewer probably wouldn't notice or care. But if you have some kind of computer vision system that looks for line intersections to determine relative 3D positioning (something that we do ourselves), it can very easily look at that image and have no idea what the hell is going on there.
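
As a toy illustration of that point (made-up geometry, not any real computer-vision system), here is how a naive occlusion check based on how outlines meet gets confused by exactly that kind of drafting slip: a proper X-crossing of two opaque objects' silhouettes gives no consistent "in front of" answer.

```python
# T-junctions (one outline stopping at another) suggest the stopped edge
# belongs to the farther object; an X-crossing of two opaque objects'
# outlines is geometrically contradictory. Coordinates are hypothetical.

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a1, a2, b1, b2):
    """True if segments a1-a2 and b1-b2 properly intersect (an X-crossing)."""
    d1 = orientation(b1, b2, a1)
    d2 = orientation(b1, b2, a2)
    d3 = orientation(a1, a2, b1)
    d4 = orientation(a1, a2, b2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

# Outline edge of the "ship" accidentally drawn through an edge of the "dock":
ship_edge = ((0.0, 0.0), (4.0, 2.0))
dock_edge = ((1.0, 2.0), (3.0, 0.0))

if segments_cross(*ship_edge, *dock_edge):
    # A depth-from-junctions heuristic has no consistent ordering here.
    print("contradictory occlusion cue: outlines cross instead of forming a T-junction")
```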

The proportions aren't exactly consistent from frame to frame, don't perfectly reflect reality, and might be more effective at conveying movement or whatever than an actual rendering of a 3D model would be. That works for human viewers. And existing 2D systems can kind of dodge the problem (as long as they're willing to live with the limitations that intrinsically come with a 2D model) because they're looking at a bunch of already-stylized images. But now imagine that they're trying to take images, reduce them into a coherent 3D world, and then learn to re-apply stylization. That may involve creating not just a 3D model, but enough understanding of the objects in that world to know what stylization is reasonable, and when. Is it technically possible? Probably. But is it a minor effort to get there from here? No, probably not. You're going to have to make a system that works wildly differently from the way the existing systems do, even though what you're trying to do might seem small from the standpoint of a human observer: just being able to get arbitrary camera angles of the scene being rendered.
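
By contrast, once a genuine 3D model exists, arbitrary camera angles really are the easy part. A minimal sketch (numpy, made-up toy points) shows that each new view is just a rotation, a translation, and a perspective divide over the same geometry.

```python
# Project the same 3D points from several camera yaws. Points are a
# hypothetical stand-in for a character model.
import numpy as np

character = np.array([
    [0.0, 1.7, 0.0],    # head
    [0.0, 1.4, 0.0],    # shoulders
    [-0.3, 1.0, 0.1],   # left hand
    [0.3, 1.0, -0.1],   # right hand
])

def project(points, yaw_degrees, distance=4.0, focal=1.0):
    """Project 3D points into 2D for a camera orbiting the origin at the given yaw."""
    yaw = np.radians(yaw_degrees)
    rot = np.array([
        [np.cos(yaw), 0.0, np.sin(yaw)],
        [0.0, 1.0, 0.0],
        [-np.sin(yaw), 0.0, np.cos(yaw)],
    ])
    cam = points @ rot.T                       # rotate world into camera orientation
    cam[:, 2] += distance                      # push the scene in front of the camera
    return focal * cam[:, :2] / cam[:, 2:3]    # perspective divide

for angle in (0, 45, 90, 135):                 # four "panels" of the same character
    print(angle, project(character, angle).round(2))
```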
