Stable Diffusion


Discuss matters related to our favourite AI Art generation technology

MEGATHREAD (lemmy.dbzer0.com)
submitted 2 years ago by [email protected] to c/[email protected]

This is a copy of the /r/StableDiffusion wiki, to help people who need access to that information


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help you enjoy Stable Diffusion, whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated, but not necessary; being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, or provided by Stability AI.

# Stable Diffusion

Local Installation

Active community repos/forks to install on your PC and keep everything local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your skills with Stable Diffusion, whether you are a beginner or an expert.

DreamBooth

How to train a custom model, and resources for doing so.

Models

Specially trained on certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, Gimp, etc.

Other useful tools

# Community

Games

  • PictionAIry : (Video|2-6 Players) - The image guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion requires a GPU with 4 GB+ of VRAM to run locally. Much beefier graphics cards (10-, 20-, or 30-series Nvidia cards) are needed to generate high-resolution or high-step-count images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (If you're unsure what your GPU can handle, see the quick check after this list.)
  • Only Nvidia cards are officially supported.
  • AMD support is available here unofficially.
  • Apple M1 chip support is available here unofficially.
  • Intel-based Macs currently do not work with Stable Diffusion.
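
A quick, unofficial way to check which GPU you have and how much VRAM it offers from Python (assumes PyTorch with CUDA is installed; the 4 GB threshold is just the rule of thumb from the list above, not an official requirement test):

```python
import torch

# Rough local-capability check for Stable Diffusion (heuristic only).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    verdict = "should run locally" if vram_gb >= 4 else "consider low-VRAM modes or an online service"
    print(f"{props.name}: {vram_gb:.1f} GB VRAM -> {verdict}")
else:
    print("No CUDA GPU detected; consider DreamStudio or a cloud GPU instead.")
```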

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.

submitted 9 hours ago* (last edited 8 hours ago) by [email protected] to c/[email protected]

Abstract

Diffusion models (DMs) have become the leading choice for generative tasks across diverse domains. However, their reliance on multiple sequential forward passes significantly limits real-time performance. Previous acceleration methods have primarily focused on reducing the number of sampling steps or reusing intermediate results, failing to leverage variations across spatial regions within the image due to the constraints of convolutional U-Net structures. By harnessing the flexibility of Diffusion Transformers (DiTs) in handling variable number of tokens, we introduce RAS, a novel, training-free sampling strategy that dynamically assigns different sampling ratios to regions within an image based on the focus of the DiT model. Our key observation is that during each sampling step, the model concentrates on semantically meaningful regions, and these areas of focus exhibit strong continuity across consecutive steps. Leveraging this insight, RAS updates only the regions currently in focus, while other regions are updated using cached noise from the previous step. The model's focus is determined based on the output from the preceding step, capitalizing on the temporal consistency we observed. We evaluate RAS on Stable Diffusion 3 and Lumina-Next-T2I, achieving speedups up to 2.36x and 2.51x, respectively, with minimal degradation in generation quality. Additionally, a user study reveals that RAS delivers comparable qualities under human evaluation while achieving a 1.6x speedup. Our approach makes a significant step towards more efficient diffusion transformers, enhancing their potential for real-time applications.

Paper: https://arxiv.org/abs/2502.10389

Paper Summary: https://www.aimodels.fyi/papers/arxiv/region-adaptive-sampling-diffusion-transformers

Code: https://github.com/microsoft/RAS

Project Page: https://microsoft.github.io/RAS/
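
For intuition, here is a minimal, hypothetical sketch of one RAS-style step as described in the abstract: token "focus" is scored from the cached output of the previous step, only the top-scoring tokens are re-evaluated by the DiT (which tolerates a variable token count), and the remaining tokens reuse the cached noise. The `model` call signature, tensor shapes, and `keep_ratio` are illustrative assumptions, not the authors' implementation (see the linked code for that).

```python
import torch

def ras_step(model, latent_tokens, t, cached_noise, keep_ratio=0.5):
    """One illustrative region-adaptive sampling step (sketch, not the official RAS code).

    latent_tokens: (B, N, C) patchified latent tokens
    cached_noise:  (B, N, C) model output cached from the previous step
    keep_ratio:    assumed fraction of tokens re-evaluated this step
    """
    # Treat tokens with large previous updates as the model's current "focus" regions.
    scores = cached_noise.norm(dim=-1)                                # (B, N)
    k = max(1, int(keep_ratio * scores.shape[1]))
    focus_idx = scores.topk(k, dim=1).indices                         # (B, k)
    idx = focus_idx.unsqueeze(-1).expand(-1, -1, latent_tokens.shape[-1])

    # Run the DiT only on the focused tokens.
    focus_tokens = torch.gather(latent_tokens, 1, idx)                # (B, k, C)
    focus_noise = model(focus_tokens, t)                              # assumed signature

    # Everything else keeps the cached noise; focused regions get fresh predictions.
    noise_pred = cached_noise.clone()
    noise_pred.scatter_(1, idx, focus_noise)
    return noise_pred  # also serves as the cache for the next step
```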


"This feature is launching first for Gold-tier Civitai Subscribers as we refine and improve its functionality" - so they're most likely planning to expand it to all users at some point.

Pretty disappointed with the direction CivitAI is going.

  • Add img2img and Image Variations tabs to the Qt GUI by @monstruosoft

Release: https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.120

Flex.1 began as a finetune of FLUX.1-schnell, which allows the model to retain the Apache 2.0 license. It is designed to be fine-tunable, with Day 1 LoRA training support in AI-Toolkit.


Abstract

Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models. However, these models often lack effective control over character appearances and interactions, particularly in multi-character scenes. To address these limitations, we propose a new task: customized manga generation and introduce DiffSensei, an innovative framework specifically designed for generating manga with dynamic multi-character control. DiffSensei integrates a diffusion-based image generator with a multimodal large language model (MLLM) that acts as a text-compatible identity adapter. Our approach employs masked cross-attention to seamlessly incorporate character features, enabling precise layout control without direct pixel transfer. Additionally, the MLLM-based adapter adjusts character features to align with panel-specific text cues, allowing flexible adjustments in character expressions, poses, and actions. We also introduce MangaZero, a large-scale dataset tailored to this task, containing 43,264 manga pages and 427,147 annotated panels, supporting the visualization of varied character interactions and movements across sequential frames. Extensive experiments demonstrate that DiffSensei outperforms existing models, marking a significant advancement in manga generation by enabling text-adaptable character customization. The code, model, and dataset will be open-sourced to the community.

Paper: https://arxiv.org/abs/2412.07589

Code: https://github.com/jianzongwu/DiffSensei

Project Page: https://jianzongwu.github.io/projects/diffsensei/
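
As a rough illustration of the masked cross-attention idea (each image token may only attend to the character whose layout region it falls in), here is a simplified, hypothetical sketch. It omits the learned projections, multi-head split, and the MLLM identity adapter of the actual DiffSensei implementation; shapes and names are assumptions.

```python
import torch

def masked_cross_attention(image_tokens, char_features, char_masks):
    """Simplified masked cross-attention sketch (not the official DiffSensei code).

    image_tokens:  (B, N, C) panel image tokens (queries)
    char_features: (B, M, C) per-character identity features (keys/values)
    char_masks:    (B, N, M) 1 where image token n lies inside character m's layout box
    """
    scale = image_tokens.shape[-1] ** -0.5
    attn = torch.einsum("bnc,bmc->bnm", image_tokens, char_features) * scale
    # Block attention from image tokens to characters outside their layout region.
    attn = attn.masked_fill(char_masks == 0, float("-inf"))
    attn = torch.nan_to_num(attn.softmax(dim=-1))  # rows with no character become zeros
    return torch.einsum("bnm,bmc->bnc", attn, char_features)
```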


Abstract

While diffusion models show extraordinary talents in text-to-image generation, they may still fail to generate highly aesthetic images. More specifically, there is still a gap between the generated images and the real-world aesthetic images in finer-grained dimensions including color, lighting, composition, etc. In this paper, we propose Cross-Attention Value Mixing Control (VMix) Adapter, a plug-and-play aesthetics adapter, to upgrade the quality of generated images while maintaining generality across visual concepts by (1) disentangling the input text prompt into the content description and aesthetic description by the initialization of aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. Our key insight is to enhance the aesthetic presentation of existing diffusion models by designing a superior condition control method, all while preserving the image-text alignment. Through our meticulous design, VMix is flexible enough to be applied to community models for better visual performance without retraining. To validate the effectiveness of our method, we conducted extensive experiments, showing that VMix outperforms other state-of-the-art methods and is compatible with other community modules (e.g., LoRA, ControlNet, and IPAdapter) for image generation.

Paper: https://arxiv.org/abs/2412.20800

Code: https://github.com/fenfenfenfan/VMix (Coming soon)

Project Page: https://vmix-diffusion.github.io/VMix/
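
To make "value-mixed cross-attention ... connected by zero-initialized linear layers" concrete, here is a minimal, hypothetical sketch: the aesthetic embedding is projected through a zero-initialized linear layer and added only to the value pathway, so the adapter is a no-op at initialization and can be trained without disturbing image-text alignment. Class and argument names are assumptions, not the authors' API.

```python
import torch
import torch.nn as nn

class ValueMixedCrossAttention(nn.Module):
    """Illustrative value-mixing adapter in the spirit of VMix (not the official code)."""

    def __init__(self, dim, aes_dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Zero-initialized, so the aesthetic branch contributes nothing at the start.
        self.aes_to_v = nn.Linear(aes_dim, dim)
        nn.init.zeros_(self.aes_to_v.weight)
        nn.init.zeros_(self.aes_to_v.bias)

    def forward(self, hidden_states, text_emb, aes_emb):
        # hidden_states: (B, N, dim) image tokens; text_emb: (B, T, dim) content tokens
        # aes_emb: (B, 1, aes_dim) aesthetic embedding, broadcast over the text tokens
        q, k = self.to_q(hidden_states), self.to_k(text_emb)
        v = self.to_v(text_emb) + self.aes_to_v(aes_emb)   # mix aesthetics into values only
        attn = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
        return attn.softmax(dim=-1) @ v
```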

1.58-bit FLUX (i.imgur.com)
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

Abstract

We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I Compbench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.

Paper: https://arxiv.org/abs/2412.18653

Code: https://github.com/Chenglin-Yang/1.58bit.flux (coming soon)
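
To make the "{-1, 0, +1} weights" idea concrete, here is a generic absmean-style ternary quantization sketch in the spirit of BitNet b1.58; the data-free calibration and custom kernel described in the paper are not reproduced here, so treat this purely as an illustration.

```python
import torch

def quantize_ternary(w: torch.Tensor):
    """Generic 1.58-bit (ternary) weight quantization sketch, absmean-style."""
    scale = w.abs().mean().clamp(min=1e-8)      # per-tensor scale
    w_q = (w / scale).round().clamp_(-1, 1)     # values in {-1, 0, +1}
    return w_q, scale

def dequantize(w_q: torch.Tensor, scale: torch.Tensor):
    return w_q * scale

# Example: quantize a random weight matrix and measure the reconstruction error.
w = torch.randn(256, 256)
w_q, s = quantize_ternary(w)
print("mean abs error:", (w - dequantize(w_q, s)).abs().mean().item())
```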


Change Log for SD.Next

SD.Next Xmass edition: What's new?

While we have several new supported models, workflows and tools, this release is primarily about quality-of-life improvements:

  • New memory management engine
    The list of changes that went into this one is long: changes to GPU offloading, a brand-new LoRA loader, system memory management, on-the-fly quantization, an improved GGUF loader, etc.
    But the main goal is enabling modern large models to run on standard consumer GPUs,
    without the performance hits typically associated with aggressive memory swapping or the need for constant manual tweaks.
  • New documentation website
    with full search and tons of new documentation
  • New settings panel with simplified and streamlined configuration

We've also added support for several new models, such as the highly anticipated NVLabs Sana (see supported models for the full list)
And several new SOTA video models: Lightricks LTX-Video, Hunyuan Video and Genmo Mochi.1 Preview

And a lot of Control and IPAdapter goodies

  • for SDXL there are a new ProMax model plus improved Union and Tiling models
  • for FLUX.1 there are Flux Tools as well as official Canny and Depth models,
    a cool Redux model as well as XLabs IP-adapter
  • for SD3.5 there are official Canny, Blur and Depth models in addition to existing 3rd party models
    as well as InstantX IP-adapter

Plus a couple of new integrated workflows, such as FreeScale and Style Aligned Image Generation

And it wouldn't be a Xmass edition without a couple of custom themes: Snowflake and Elf-Green!
All in all, we're at around ~180 commits' worth of updates; check the changelog for the full list

ReadMe | ChangeLog | Docs | WiKi | Discord

InvokeAI v5.5 Released (www.youtube.com)
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]