Diffusion models and language models are both autoregressive in a loose sense. That just means that the output is fed back in as input until the task is complete.
With diffusion models this means that the model is fed an image that is 100% noise and removes some small percentage of the noise, then the denoised image is fed back in and another small percentage is removed. This is repeated until a defined stopping point (usually a set number of passes).
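Here's a rough sketch of that loop in Python, just to illustrate the "feed the output back in" idea. It's a toy, not how any real diffusion model is implemented: `toy_denoise_step`, `generate`, `fraction` and the tiny 4-value "image" are all made-up names for illustration, and the stand-in denoiser cheats by being handed the target directly, whereas a real model estimates the noise using trained weights.

```python
import random


def toy_denoise_step(image, target, fraction):
    """Hypothetical stand-in for a trained denoiser.

    Moves every pixel a small fraction of the way toward the target.
    A real model does not know the target; it predicts the noise instead.
    """
    return [p + fraction * (t - p) for p, t in zip(image, target)]


def generate(target, steps=50, fraction=0.1, seed=0):
    rng = random.Random(seed)
    # Start from an "image" that is 100% noise.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    # Stop after a set number of passes.
    for _ in range(steps):
        # The output of one pass is the input to the next.
        image = toy_denoise_step(image, target, fraction)
    return image


target = [0.0, 0.5, 1.0, 0.5]  # pretend this is the clean image
result = generate(target)
print(result)
```

Each pass only removes about 10% of the remaining error, but because the result keeps getting fed back in, it ends up very close to the target after 50 passes.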
Combining images and using one image to control the generation of another has been available for quite a while. ControlNet and IP-Adapter let you do exactly that: ‘Put this coat on this person’ or ‘Take this picture and do it in this style’. Here’s an 11-month-old YouTube video explaining how to do this using open source models and software: www.youtube.com/watch?v=gmwZGC8UVHE
It’s nice for non-technical people that OpenAI will sell you a subscription to access an agent that can perform these kinds of image generation tasks, but it’s not doing anything new in terms of image generation.
theterrasque@infosec.pub 5 days ago
I know them, and have used them a bit. I even mentioned them in an earlier comment. The capabilities of OpenAI’s new model are on a different level in my experience.
www.reddit.com/r/StableDiffusion/…/4o_vs_flux/ - read the comments there. That’s a community dedicated to running local diffusion models. They’re familiar with all the tricks. They’re pretty damn impressed too.
I can’t help but feel that people here either haven’t tried the new OpenAI image model, or have never actually used any of the existing AI image generators before.
MITM0@lemmy.world 5 days ago
I cannot take you seriously with all those Reddit comments.
But then why am I even surprised, you shill for proprietary AI.
theterrasque@infosec.pub 5 days ago
Ah yes, I forgot we live in a post-truth society where reality doesn’t matter and only your feelings are important. And since your feelings say AI bad, proprietary bad, and Reddit bad, you don’t have to actually think or take reality into consideration.
MITM0@lemmy.world 5 days ago
Truth in this case simply means your ill-informed opinions.
& FYI, I like AIs that are fully open source.