1
42
MEGATHREAD (lemmy.dbzer0.com)
submitted 2 years ago by [email protected] to c/[email protected]

This is a copy of the /r/stablediffusion wiki to help people who need access to that information.


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help you enjoy Stable Diffusion, whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated but not necessary; being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, nor provided by Stability AI.

# Stable Diffusion

Local Installation

Active Community Repos/Forks to install on your PC and keep it local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your skills with Stable Diffusion, whether you are a beginner or an expert.

DreamBooth

How to train a custom model, and resources on doing so.

Models

Specially trained towards certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, Gimp, etc.

Other useful tools

# Community

Games

  • PictionAIry: (Video | 2-6 Players) - The image-guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion requires a GPU with at least 4 GB of VRAM to run locally, but much beefier graphics cards (10, 20, 30 series Nvidia cards) are necessary to generate high-resolution or high-step images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (A quick VRAM check sketch follows this list.)
  • Only Nvidia cards are officially supported.
  • AMD support is available here unofficially.
  • Apple M1 chip support is available here unofficially.
  • Intel-based Macs currently do not work with Stable Diffusion.
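
As a quick local check, this snippet reads the VRAM of the first CUDA device with PyTorch (a minimal sketch; it only covers Nvidia cards and assumes torch is installed):

```python
# Minimal VRAM check for the first CUDA device (Nvidia-only, assumes PyTorch).
import torch

if not torch.cuda.is_available():
    print("No CUDA-capable GPU detected; consider an online option like DreamStudio.")
else:
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    if vram_gb < 4:
        print("Below the ~4 GB floor; local generation will likely struggle.")
    elif vram_gb < 8:
        print("Enough for basic generation at modest resolutions and step counts.")
    else:
        print("Comfortable headroom for higher resolutions and step counts.")
```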

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.

2
11
submitted 6 days ago by [email protected] to c/[email protected]
3
5
submitted 1 week ago by [email protected] to c/[email protected]
4
10
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]

FLUX.1 Kontext [dev] is a 12 billion parameter rectified flow transformer capable of editing images based on text instructions.

Model weights: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev

Code: https://github.com/black-forest-labs/flux

Self-Serve Portal: http://bfl.ai/pricing/licensing

Helpdesk: https://help.bfl.ai/
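
For local experimentation, here is a minimal editing sketch assuming a recent Hugging Face diffusers release that ships FluxKontextPipeline; the input path and prompt are placeholders, and the 12B weights need either a large GPU or CPU offloading:

```python
# Minimal image-editing sketch, assuming a diffusers version that includes
# FluxKontextPipeline for FLUX.1 Kontext [dev].
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps fit the 12B model on smaller GPUs

source = load_image("input.png")  # placeholder: the image you want to edit
edited = pipe(
    image=source,
    prompt="Replace the background with a rainy city street at night",
    guidance_scale=2.5,
).images[0]
edited.save("edited.png")
```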

5
11
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]

Abstract

The emergence of Large Language Models (LLMs) has unified language generation tasks and revolutionized human-machine interaction. However, in the realm of image generation, a unified model capable of handling various tasks within a single framework remains largely unexplored. In this work, we introduce OmniGen, a new diffusion model for unified image generation. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports various downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional plugins. Moreover, compared to existing diffusion models, it is more user-friendly and can complete complex tasks end-to-end through instructions without the need for extra intermediate steps, greatly simplifying the image generation workflow. 3) Knowledge Transfer: Benefiting from learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and we will release our resources at this https URL to foster future advancements.

Paper: https://arxiv.org/abs/2409.11340

Code: https://github.com/VectorSpaceLab/OmniGen2

Demo: https://github.com/VectorSpaceLab/OmniGen2?tab=readme-ov-file#-gradio-demo

Model: https://huggingface.co/OmniGen2/OmniGen2

Project Page: https://vectorspacelab.github.io/OmniGen2
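
The links above point at OmniGen2; as a rough illustration of the instruction-driven, end-to-end workflow the abstract describes, the sketch below follows the original OmniGen v1 README-style interface (the OmniGen package, OmniGenPipeline class, model id, the <img><|image_1|></img> placeholder syntax, and the parameters are assumptions from that README and may differ for OmniGen2):

```python
# Sketch of OmniGen-style instruction-driven generation and editing.
# All names below follow the OmniGen v1 README from memory; treat them as
# assumptions, since OmniGen2 may expose a different interface.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Plain text-to-image
images = pipe(
    prompt="A futuristic city at dusk, ultra-detailed",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("t2i.png")

# Subject-driven editing: reference images are slotted into the prompt itself
images = pipe(
    prompt="The person in <img><|image_1|></img> waves happily in a crowd",
    input_images=["reference.png"],  # placeholder path
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("edit.png")
```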

6
23
submitted 1 week ago by [email protected] to c/[email protected]
7
9
submitted 2 weeks ago by [email protected] to c/[email protected]
8
7
submitted 2 weeks ago by [email protected] to c/[email protected]
9
11
submitted 2 weeks ago by [email protected] to c/[email protected]

Abstract

We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including both open-source and closed-source models, in geometry details, condition alignment, texture quality, etc. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: this https URL

Report: https://arxiv.org/abs/2501.12202

Code: https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1

Demo: https://huggingface.co/spaces/tencent/Hunyuan3D-2.1

Project Page: https://3d.hunyuan.tencent.com/
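
For a sense of how the two foundation components chain together (shape first, then texture), here is a rough inference sketch based on my reading of the earlier Hunyuan3D-2 README; the hy3dgen package, the class names, and the "tencent/Hunyuan3D-2" repo id are assumptions, and the 2.1 repository's layout may differ:

```python
# Rough shape -> texture sketch; import paths, class names, and the repo id
# are assumptions taken from the Hunyuan3D-2 README and may not match 2.1.
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline
from hy3dgen.texgen import Hunyuan3DPaintPipeline

# Hunyuan3D-DiT: generate geometry aligned with a condition image
shape_pipe = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained("tencent/Hunyuan3D-2")
mesh = shape_pipe(image="condition.png")[0]  # placeholder input image

# Hunyuan3D-Paint: synthesize a texture map for the generated (or any) mesh
paint_pipe = Hunyuan3DPaintPipeline.from_pretrained("tencent/Hunyuan3D-2")
textured_mesh = paint_pipe(mesh, image="condition.png")
textured_mesh.export("asset.glb")  # trimesh-style export
```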

10
5
submitted 3 weeks ago by [email protected] to c/[email protected]
11
15
submitted 3 weeks ago by [email protected] to c/[email protected]
12
5
submitted 4 weeks ago by [email protected] to c/[email protected]

Generate up to 15 seconds of high-quality speech- or song-driven video on 10 GB of VRAM.

13
6
submitted 4 weeks ago by [email protected] to c/[email protected]
14
7
submitted 1 month ago by [email protected] to c/[email protected]
15
4
Automagic optimizer? (sh.itjust.works)
submitted 1 month ago by [email protected] to c/[email protected]

I've been reading more into training (mostly for wan2.1) lately and noticed this optimizer as an option in ai-toolkit as well as in diffusion-pipe.

Aside from just trying to read through and understand the source code, does anyone know of any documentation on how this is supposed to work or recommended usage/parameters? I can't seem to find anything to learn more about it in my cursory searching.

16
13
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

17
11
ComfyUI Bounty Program | Notion (comfyorg.notion.site)
submitted 1 month ago by [email protected] to c/[email protected]
18
9
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

Abstract

Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that natively supports multimodal understanding and generation. BAGEL is a unified, decoder-only model pretrained on trillions of tokens curated from large-scale interleaved text, image, video, and web data. When scaled with such diverse multimodal interleaved data, BAGEL exhibits emerging capabilities in complex multimodal reasoning. As a result, it significantly outperforms open-source unified models in both multimodal generation and understanding across standard benchmarks, while exhibiting advanced multimodal reasoning abilities such as free-form image manipulation, future frame prediction, 3D manipulation, and world navigation. In the hope of facilitating further opportunities for multimodal research, we share the key findings, pretraining details, data creation protocol, and release our code and checkpoints to the community. The project page is at this https URL

Paper: https://arxiv.org/abs/2505.14683

Code: https://github.com/bytedance-seed/BAGEL

Demo: https://demo.bagel-ai.org/

Project Page: https://bagel-ai.org/

Model: https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT
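
To try the released checkpoint locally, a safe first step is to download it with the standard huggingface_hub API and then follow the repository's own inference instructions; the sketch below only covers the download (the local directory name is arbitrary):

```python
# Fetch the BAGEL-7B-MoT checkpoint so the repo's inference scripts can use a local copy.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",
    local_dir="models/BAGEL-7B-MoT",  # arbitrary local target directory
)
print(f"Checkpoint downloaded to: {local_path}")
```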

19
20
submitted 1 month ago by [email protected] to c/[email protected]
20
10
submitted 1 month ago by [email protected] to c/[email protected]

Abstract

While generative artificial intelligence has advanced significantly across text, image, audio, and video domains, 3D generation remains comparatively underdeveloped due to fundamental challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an open framework addressing these challenges through: (1) a rigorous data curation pipeline processing >5M assets to create a 2M high-quality dataset with standardized geometric and textural properties; (2) a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module; and (3) the full open-source release of models, training code, and adaptation modules. For geometry generation, the hybrid VAE-DiT component produces TSDF representations by employing perceiver-based latent encoding with sharp edge sampling for detail preservation. The diffusion-based texture synthesis module then ensures cross-view consistency through geometric conditioning and latent-space synchronization. Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods, while also achieving competitive quality with proprietary solutions. Notably, the framework uniquely bridges the 2D and 3D generation paradigms by supporting direct transfer of 2D control techniques (e.g., LoRA) to 3D synthesis. By simultaneously advancing data quality, algorithmic fidelity, and reproducibility, Step1X-3D aims to establish new standards for open research in controllable 3D asset generation.

Technical Report: https://arxiv.org/abs/2505.07747

Code: https://github.com/stepfun-ai/Step1X-3D

Demo: https://huggingface.co/spaces/stepfun-ai/Step1X-3D

Project Page: https://stepfun-ai.github.io/Step1X-3D/

Models: https://huggingface.co/stepfun-ai/Step1X-3D

21
6
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

Abstract

Animation has gained significant interest in the recent film and TV industry. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they lack the same effectiveness in handling animation videos. Evaluating animation video generation is also a great challenge due to its unique artistic styles, violations of the laws of physics, and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation benchmark. Supported by a data processing pipeline with over 10M high-quality samples, the generation model incorporates a spatiotemporal mask module to facilitate key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 diverse animation videos, with specifically developed metrics for animation video generation. Our entire project is publicly available on this https URL.

Paper: https://arxiv.org/abs/2412.10255

Code: https://github.com/bilibili/Index-anisora/tree/main

Hugging Face: https://huggingface.co/IndexTeam/Index-anisora

Modelscope: https://www.modelscope.cn/organization/bilibili-index

Project Page: https://komiko.app/video/AniSora

22
3
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]
23
9
submitted 1 month ago by [email protected] to c/[email protected]
24
10
submitted 1 month ago by [email protected] to c/[email protected]
25
12
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]
view more: next ›

Stable Diffusion

4866 readers
1 user here now

Discuss matters related to our favourite AI Art generation technology

Also see

Other communities

founded 2 years ago
MODERATORS