Stable Diffusion XL Turbo can generate AI images as fast as you can type
Submitted 11 months ago by thehatfox@lemmy.world to technology@lemmy.world
Comments
ecnkmaxo@futurology.today 11 months ago
Yay, more AI shit images to plague the internet
ubermeisters@lemmy.world 11 months ago
Because all the Boomer clipart it’s replacing is so endearing
Skipcast@lemmy.world 11 months ago
Now we get AI-generated boomer art instead, and at a faster pace.
Sheeple@lemmy.world 11 months ago
[deleted]
TimeSquirrel@kbin.social 11 months ago
You might be waiting a long time. This is one of those things that isn't going back in the box. The best course of action now is to prepare for it, learn to live with it, and make sure it isn't used to oppress us.
SCB@lemmy.world 11 months ago
Lmao I’m old enough to remember “the internet is just a fad”
lurch@sh.itjust.works 11 months ago
There’s a fair chance we’ll see (or rather, won’t see) a lot more offline use. AI apps are coming to desktop PCs and phones, which means in the long run people won’t have to get some of their entertainment from the web any more. If you want a cool pic of a dragon for a wallpaper, you can just ask the AI app on your PC and it will make a bunch to choose from.
atocci@kbin.social 11 months ago
What's out there that actually works offline? Stable Diffusion is the only one I've heard about; everyone else seems more interested in selling AI exclusively as a service.
DoucheBagMcSwag@lemmy.dbzer0.com 11 months ago
This isn’t free BTW folks
Sixner@lemmy.world 11 months ago
I haven’t messed with any AI imaging stuff yet. Any free recommendations to just have some fun?
lloram239@feddit.de 11 months ago
Bing Image Creator if you just want to create some images quickly (free, Microsoft account required). It uses DALL-E 3 behind the scenes, so it’s pretty much state of the art, but it’s rather limited in features otherwise and rather heavy on the censorship.
If you want to generate things locally on your PC with more flexibility, try Automatic1111 along with one of the models from CivitAI. It needs a reasonably modern graphics card with enough VRAM (8GB+) to be enjoyable, and installation can be a bit fiddly (check YouTube & Co. for tutorials). But once past that, you can create some pretty wild stuff.
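If you’d rather skip the UI entirely, the Turbo model itself is only a few lines through Hugging Face’s diffusers library. A minimal sketch, assuming the stabilityai/sdxl-turbo checkpoint and an Nvidia card (untested here):

```python
import torch
from diffusers import AutoPipelineForText2Image

# load the distilled SDXL Turbo checkpoint in half precision
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# Turbo is tuned for a single denoising step with guidance disabled
image = pipe(
    "a cool pic of a dragon for a wallpaper",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("dragon.png")
```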
LifeInOregon@lemmy.world 11 months ago
And the resulting faces still all have lazy eyes, asymmetric features, and a distinctly uncanny quality.
MostlyHarmless@sh.itjust.works 11 months ago
Humans have asymmetric features. No one is symmetrical
LifeInOregon@lemmy.world 11 months ago
These features are abnormally asymmetric, to the point of being off-putting. General symmetry of features is a significant part of what attracts people to one another, and it’s why facial droops from things like Bell’s palsy or strokes can often be psychologically difficult for the patients who experience them.
General symmetry, not exact symmetry.
Apothecary@lemmy.world 11 months ago
Anecdote: I think Denzel Washington is supposed to have one of the most symmetrical faces.
Deceptichum@kbin.social 11 months ago
You can easily get incredibly canny stuff.
mriormro@lemmy.world 11 months ago
Great, even more online noise that I can look forward to.
Zoboomafoo@lemmy.world 11 months ago
That’s impressive
Stalinwolf@lemmy.ca 11 months ago
I’ve tried to install this multiple times but always manage to fuck it up somehow. I think the guides I’m following are outdated or pointing me to one or more incompatible files.
barsoap@lemm.ee 11 months ago
Tough luck running any code published by people who put out models; it’s research-grade software in every sense of the word. “Works on my machine” and “the source is the configuration file” kind of thing.
Get yourself ComfyUI; they’re always very fast at supporting new stuff, and the thing is generally quicker and easier on VRAM than A1111. The prerequisite is a torch (the Python package) build enabled with CUDA (Nvidia), ROCm (AMD), or whatever Intel uses. Fair warning: getting ROCm to run on cards that aren’t officially supported is an adventure in itself. I’m still on torch-1.13.1+rocm5.2; newer builds just won’t work, because the GPU I’m telling ROCm I have (so that it runs in the first place) supports instructions that my actual GPU doesn’t, and they’ve started using them.
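A quick way to check the torch prerequisite before blaming the UI (a sketch; the exact HSA_OVERRIDE_GFX_VERSION value depends on your card, 10.3.0 is just the common RDNA2 example):

```python
import torch

# Sanity check before launching ComfyUI/A1111: does torch actually see a GPU?
# On officially unsupported AMD cards, the usual workaround is setting
# something like HSA_OVERRIDE_GFX_VERSION=10.3.0 before starting Python,
# i.e. the "telling ROCm I have a different GPU" trick mentioned above.
print("torch version:", torch.__version__)
print("GPU available:", torch.cuda.is_available())  # True for both CUDA and ROCm builds
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```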
L_Acacia@lemmy.one 11 months ago
Do you use ComfyUI?
autotldr@lemmings.world [bot] 11 months ago
This is the best summary I could come up with:
Stability detailed the model’s inner workings in a research paper released Tuesday that focuses on the ADD technique.
One of the claimed advantages of SDXL Turbo is its similarity to Generative Adversarial Networks (GANs), especially in producing single-step image outputs.
Stability AI says that on an Nvidia A100 (a powerful AI-tuned GPU), the model can generate a 512×512 image in 207 ms, including encoding, a single de-noising step, and decoding.
This move has already been met with some criticism in the Stable Diffusion community, but Stability AI has expressed openness to commercial applications and invites interested parties to get in touch for more information.
Meanwhile, Stability AI itself has faced internal management issues, with an investor recently urging CEO Emad Mostaque to resign.
Stability AI offers a beta demonstration of SDXL Turbo’s capabilities on its image-editing platform, Clipdrop.
The original article contains 553 words, the summary contains 138 words. Saved 75%. I’m a bot and I’m open source!
You999@sh.itjust.works 11 months ago
This is great news for people who make animations with Deforum, as the speed increase should make Rakile’s Deforumation GUI much more usable for live composition and framing.
Gabu@lemmy.world 11 months ago
Does it actually run any faster, though? For instance, if I manually spun up a model with the diffusers library and ran it locally on DML, would there be any difference?
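For context, I mean something roughly like this (a sketch, assuming the torch-directml package provides the device; not tested):

```python
import torch_directml  # pip install torch-directml
from diffusers import AutoPipelineForText2Image

dml = torch_directml.device()

# default fp32 weights; half precision on DML can be hit-or-miss
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
pipe.to(dml)

# single-step Turbo inference with guidance disabled
image = pipe(
    "a red fox jumping over a lazy dog",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```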
ubermeisters@lemmy.world 11 months ago
Seems like a bit of a stretch to call 4 seconds on a 3060 “realtime”.
domi@lemmy.secnd.me 11 months ago
I tried it on a 6900 XT recently and generation time was well under half a second.
Results are not as good as with SDXL, but for the time it takes it’s very impressive.
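Measured roughly like this, for anyone who wants to compare (a sketch; torch’s cuda API also covers ROCm builds, and the checkpoint name assumed is the Hugging Face one):

```python
import time

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "a cinematic photo of a dragon"

# warm-up: the first call pays one-off costs (model load, kernel autotuning)
pipe(prompt, num_inference_steps=1, guidance_scale=0.0)

torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt, num_inference_steps=1, guidance_scale=0.0)
torch.cuda.synchronize()
print(f"single-step generation: {time.perf_counter() - start:.3f} s")
```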
Paradox@lemdro.id 11 months ago
The author can’t type very quickly
ubermeisters@lemmy.world 11 months ago
A rapid dark-tan mammalian with a bushy red tail propels itself upward off the ground to an elevation above or greater than that of the below canine, whom has a disposition contrary to productivity.
barsoap@lemm.ee 11 months ago
I’d guess the “realtime” is a quote from Stability AI, and of course they’re running that stuff on an A100. A couple of seconds is still an interactive rate, as generally speaking you want time to think about the changes you’re making to your conditioning anyway.
Haven’t tried it yet, but if individual steps of XL Turbo take roughly as much time as LCM steps, then it’s four to eight times faster. As the quality generally isn’t production-ready, we’re mostly talking about rough prompt prototyping, testing out an animation pipeline, that sort of thing. That comes with the caveat that increasing the step count often leads to markedly different results (a complete change of composition, not just details), so the information you gain from those preview-quality images is limited.
Oh, on “production-ready quality”: image quality being roughly on par with 4-step LCM means it’s nowhere near production grade. For the final render you still want to give the model more steps. OTOH I’ve found that some LCM-based merges do in 30 steps what other models need 80 for, so improvements are always welcome. But I’m also worried about these distilled models being less flexible, pruning the lightly trodden paths that you might actually want the model to take.
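The composition-drift caveat is easy to see for yourself with a fixed seed (a sketch; model, prompt, and seed are arbitrary):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "a knight fighting a dragon, detailed oil painting"

# same seed at each step count; if the composition drifts between 1 and 4
# steps, the cheap previews aren't telling you much about the final render
for steps in (1, 2, 4):
    g = torch.Generator("cuda").manual_seed(42)
    image = pipe(
        prompt, num_inference_steps=steps, guidance_scale=0.0, generator=g
    ).images[0]
    image.save(f"preview_{steps}_steps.png")
```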
simple@lemm.ee 11 months ago
Well, it is technically as fast as you can type if you’re running a better GPU. The 3060 is pretty mid-tier at this point.
TheGrandNagus@lemmy.world 11 months ago
Low end card tbh.
I’ll get crucified for saying that because people will interpret that as an attack on their PC or something daft like that.
It’s Ampere, a GPU architecture from 3.5 years ago. And even then, here’s what the desktop stack was like:
3090 Ti (GA102)
3090 (GA102)
3080 Ti (GA102)
3080 12GB (GA102)
3080 (GA102)
3070 Ti (GA102/GA104)
3070 (GA104)
3060 Ti (GA104/GA103)
3060 (GA106/GA104)
3050 (GA106/GA107)
It was almost at the bottom of Nvidia’s stack 3 years ago. It was a low-end card then (because, you know, it was at the bottom end of what they were offering). It’s an even more low-end card now.
People are always fooled by Nvidia’s marketing into thinking they’re getting a mid-range card, when in reality Nvidia is giving people the scraps and pretending it’s a great deal. People need to demand more from these companies.
neurogenesis@lemmy.dbzer0.com 11 months ago
I’m on a 3060 and with 4x upscaling it takes about a second and a half.