Comment on OpenAI's move to allow generating "Ghibli style" images isn't just a cute PR stunt. It is an expression of dominance and the will to reject and refuse democratic values. It is a display of power.

YarHarSuperstar@lemmy.world 1 year ago

How so?

theterrasque@infosec.pub 1 year ago
Autoregressive model

Multimodal with the LLM

Can keep consistency between images

Much better at text rendering

Can combine images, like you have one image and you upload a picture of a jacket and say “put this on him” and it does it

Can upload a picture of yourself and say “put me on the beach”; then, if you don’t like it, you can tell it to do a different type of beach, and then say, for example, “and put me on a white horse and give me some nice beachwear”.

It understands what you’re telling it, and can generate images from vague descriptions, combine things from different images just by telling it, modify it and understand the context - like knowing that “me” is the person in the image, for example.
- mad_djinn@lemmy.world 1 year ago
  You know enough about the model for me to immediately distrust your opinion on the matter. Why don’t you head back to ycombinator or whatever hole you crawled out of?
  
- Ilixtze@lemm.ee 1 year ago
  It is really sad that the most advanced model can only aspire to make derivative shit for techbro losers.
  
- YarHarSuperstar@lemmy.world 1 year ago
  Okay, so how does that compare to whatever competition you’re referencing?
  
  - theterrasque@infosec.pub 1 year ago
    No other model on the market can do anything like that. The closest is diffusion-based, where you could train a LoRA on a person’s look or a specific piece of clothing, then generate multiple times and/or use ControlNet to sorta control the output. That’s hours or days of work, plus it’s quite technical to set up and use.
    
    OpenAI’s new model is a paradigm shift in both what the model can do and how you use it, and it can easily and effortlessly produce things that were extremely difficult or impossible without complicated procedures and post-processing in Photoshop.
    
    FauxLiving@lemmy.world 1 year ago
    All diffusion and language models are autoregressive. That just means that the output is fed back in as input until the task is complete.
    
    With diffusion models, this means that the model is fed an image that is 100% noise, removes some small percentage of the noise, and then the denoised image is fed back in and another small percentage is removed. This is repeated until a defined stopping point is reached (usually a set number of passes).
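To make the loop described above concrete, here is a toy sketch in Python. It only illustrates the "feed the output back in for a set number of passes" structure: `toy_denoise_step` and `iterative_denoise` are hypothetical names, the "image" is just a short list of floats, and the denoiser cheats by knowing the clean target. A real diffusion model predicts the noise with a trained neural network and uses a noise scheduler to choose step sizes, so treat this as an illustration under those assumptions, not how any particular model is implemented.

```python
import random


def toy_denoise_step(x, target, fraction):
    """Hypothetical stand-in for a denoising network (an assumption, not a real model).

    It "cheats" by knowing the clean target and moves every value a fixed
    fraction of the way toward it. A real diffusion model instead uses a
    trained network to predict the noise and a scheduler to decide how much
    of it to remove at each step.
    """
    return [xi + fraction * (ti - xi) for xi, ti in zip(x, target)]


def iterative_denoise(target, num_steps=50, fraction=0.1, seed=0):
    """Start from pure noise and run a fixed number of denoising passes."""
    rng = random.Random(seed)
    # Pass 0: an "image" that is 100% noise (here a flat list of floats).
    x = [rng.gauss(0.0, 1.0) for _ in target]
    # Stopping point: a set number of passes.
    for _ in range(num_steps):
        # The output of each pass becomes the input of the next one.
        x = toy_denoise_step(x, target, fraction)
    return x


if __name__ == "__main__":
    clean = [0.5, -0.25, 1.0]
    for steps in (1, 10, 50):
        out = iterative_denoise(clean, num_steps=steps)
        err = max(abs(o - c) for o, c in zip(out, clean))
        print(f"{steps:>2} passes: max error {err:.4f}")
```

Running it shows the error shrinking as more passes are applied. Because this toy removes a fixed fraction per pass, the leftover noise decays geometrically, by a factor of (1 - fraction) each pass.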
    
    Combining images and using one image to control the generation of another has been available for quite a while. ControlNet and IP-Adapter let you do exactly that: ‘Put this coat on this person’ or ‘Take this picture and do it in this style’. Here’s an 11-month-old YouTube video explaining how to do this using open source models and software: www.youtube.com/watch?v=gmwZGC8UVHE
    
    It’s nice for non-technical people that OpenAI will sell you a subscription to access an agent that can do these kinds of image generation tasks, but it’s not doing anything new in terms of image generation.
    