Awesome Image Generation

A curated list of AI image generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.

Maintained by Backblaze.

Contents

  • Text-to-Image APIs
  • Open Source Models
  • Open Source Frameworks and UIs
  • Image Editing and Enhancement
  • SDKs and Developer Tooling
  • GPU Cloud Providers
  • Image Storage and Delivery
  • Evaluation and Observability
  • Templates and Example Projects
  • Contributing
  • License

Text-to-Image APIs

Commercial image-generation APIs with hosted inference and developer SDKs.

  • Adobe Firefly API – Image generation, editing, Photoshop automation, and Lightroom operations. Part of Firefly Services platform. Docs | SDK: JS/TS (official)
  • Amazon Titan Image Generator – Text-to-image via AWS Bedrock. Image conditioning, color palette guidance, background removal, and variations. Docs | SDK: Python (boto3), Java, PHP
  • Black Forest Labs (FLUX Pro) – FLUX 1.1 Pro and FLUX.2 (32B params) via REST API. Founded by the original creators of Stable Diffusion. Also on Replicate, fal.ai, Together AI. Docs
  • fal.ai – Serverless inference hosting 1000+ image models. Markets itself as the fastest diffusion inference engine. Hosts FLUX, SD, and more. SOC 2 compliant. Docs | SDK: Python, JS
  • Google Gemini Image API – Native image generation via Gemini models (gemini-2.5-flash-image, gemini-3.1-flash-image-preview). Text-to-image, editing, multi-turn. Python/JS/Go/Java SDKs. Free tier via AI Studio. Docs | SDK: Python (google-genai), JS (google/generative-ai), Go, Java
  • Google Imagen (Vertex AI) – Imagen 4 via Vertex AI. Text-to-image, editing, outpainting, inpainting, customization. Docs | SDK: Python (google-cloud-aiplatform), Node
  • Ideogram – Known for high-quality text rendering in images. Ideogram 3.0 supports generation, remix, edit, and character reference. OpenAI-compatible interface. Docs
  • Leonardo AI – Text-to-image, image-to-image, and image-to-video. Webhooks, LoRA models, and "Get API Code" export from web UI. Docs | SDK: TypeScript, Python
  • Midjourney – Official API released late 2025. Enterprise/Pro plan holders only; no public self-service access. Docs
  • OpenAI GPT Image – gpt-image-1, gpt-image-1.5, gpt-image-1-mini. Natively multimodal generation, editing, and inpainting. DALL-E 2/3 deprecated May 2026. Docs | SDK: Python, Node
  • Recraft AI – Raster and vector image generation. V4 model (Feb 2026). Background removal, inpainting, outpainting, vectorization. OpenAI-compatible interface. Docs
  • Stability AI – Stable Diffusion 3.5 and Stable Image via REST API. Text-to-image, image-to-image, upscaling, inpainting. Docs
  • xAI Image Generation API – grok-imagine-image model via REST API. Text-to-image and image editing. Batch up to 10 images, 1k/2k resolution. OpenAI-compatible interface. Docs | SDK: Python (xai-sdk), JS (openai-compatible)
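Most of these hosted APIs follow a similar request shape. As a concrete sketch, here is text-to-image with the OpenAI Python SDK; the prompt, size, and output filename are example values, and gpt-image-1 returns the image base64-encoded rather than as a URL:

```python
import base64

def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64-encoded image payload and write it to disk; returns bytes written."""
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)

if __name__ == "__main__":
    from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    result = client.images.generate(
        model="gpt-image-1",  # check the provider's docs for current model ids
        prompt="a watercolor lighthouse at dusk",
        size="1024x1024",
    )
    # gpt-image-1 responses carry base64 image data in data[0].b64_json
    save_b64_image(result.data[0].b64_json, "lighthouse.png")
```

The OpenAI-compatible providers above (Ideogram, Recraft, xAI) generally accept the same client with a different base URL; consult each service's docs for exact parameters.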

Open Source Models

Open-weight image-generation models you can run locally or self-host.

  • DeepFloyd IF – Cascaded pixel-space diffusion (64px → 256px → 1024px). Strong text rendering. Zero-Shot FID 6.66 on COCO.
  • FLUX.1 [dev] – 12B param guidance-distilled model. High quality, competitive with closed-source. Non-commercial license.
  • FLUX.1 [schnell] – 12B param rectified flow transformer. 1-4 step generation. Fully open for commercial use. Docs
  • FLUX.1 Kontext [dev] – 12B param instruction-based image editing model. Edit existing images via text prompts; character/style reference without finetuning. Non-commercial license. Docs
  • FLUX.2 [dev] – 32B param model with generation, editing, and multi-reference combining.
  • GLM-Image – 16B hybrid autoregressive + diffusion model from Zhipu AI. Excels at text rendering inside images. Supports T2I and I2I. Runs via GlmImagePipeline in diffusers. Docs
  • HiDream-I1 – 17B sparse diffusion transformer for text-to-image. Three variants (Full, Dev, Fast). Top benchmark scores; diffusers-native via HiDreamImagePipeline. Docs
  • Kandinsky 3 – Open-source T2I from AI Forever. 2x larger U-Net and 10x larger text encoder vs v2.x. Docs
  • LCM / LCM-LoRA – Latent Consistency Models enabling 2-4 step generation. LCM-LoRA is a lightweight ~100MB adapter for any SDXL model. Docs
  • PixArt-Alpha / PixArt-Sigma – DiT-based T2I at 10.8% of SD1.5 training cost. Near-commercial quality. Docs
  • Playground v2.5 – Aesthetic-focused model fine-tuned on SDXL architecture.
  • Qwen-Image – Alibaba's open-weight T2I family. Qwen-Image-2512 (text-to-image) and Qwen-Image-Edit variants. Strong text rendering including Chinese. Diffusers-native, Apache 2.0. Docs
  • SDXL-Turbo – Adversarial distillation of SDXL enabling single-step generation.
  • Stable Diffusion 1.5 – 860M UNet, runs on consumer GPUs. Foundation for massive community ecosystem of LoRAs, fine-tunes, and extensions.
  • Stable Diffusion 3.5 Large – MMDiT architecture with three text encoders (including T5-XXL). Highest-quality Stability open model. Docs
  • Stable Diffusion XL (SDXL) – Native 1024x1024. Improved text-in-image and limb generation. Base + refiner pipeline.
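As a minimal sketch of running one of these open-weight models locally, here is SDXL-Turbo through HuggingFace Diffusers' AutoPipelineForText2Image. It assumes a CUDA GPU and the diffusers/torch packages; the prompt and filename are examples. Turbo models are distilled for 1-4 steps and run with classifier-free guidance disabled:

```python
def turbo_settings(steps: int = 1) -> dict:
    """SDXL-Turbo is distilled for 1-4 steps and is run without CFG (guidance_scale=0.0)."""
    if not 1 <= steps <= 4:
        raise ValueError("SDXL-Turbo expects 1-4 inference steps")
    return {"num_inference_steps": steps, "guidance_scale": 0.0}

if __name__ == "__main__":
    import torch
    from diffusers import AutoPipelineForText2Image  # pip install diffusers

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16
    ).to("cuda")  # use "mps" or "cpu" if no CUDA GPU is available
    image = pipe("a cinematic photo of a red fox in snow", **turbo_settings(1)).images[0]
    image.save("fox.png")
```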

Open Source Frameworks and UIs

Graphical and programmatic interfaces for running diffusion pipelines.

  • AI Toolkit (ostris) – All-in-one training suite for diffusion models. GUI and CLI. Trains FLUX.1/2, SDXL, SD 1.5, Qwen-Image, HiDream, and video models on consumer hardware.
  • AUTOMATIC1111 WebUI – Most widely used Gradio-based SD web UI. 161k+ stars. Extensive extension ecosystem. Docs
  • ComfyUI – Node-based graph UI and backend for diffusion models. Highly customizable, API-accessible. Supports SD, SDXL, Flux, and modern models. Docs
  • ComfyUI-Manager – Extension for ComfyUI that installs, updates, and manages 800+ custom nodes via a GUI or CLI. Auto-installed with ComfyUI Desktop. Docs
  • DiffSynth-Studio – Python diffusion engine by ModelScope. Inference and LoRA training for FLUX.1/2, Qwen-Image, Z-Image, and JoyAI-Image. Low-VRAM optimizations, ControlNet, IP-Adapter support.
  • Fooocus – Midjourney-inspired SDXL UI. Prompt-only workflow, no manual parameter tweaking.
  • Forge – Fork of AUTOMATIC1111 with improved GPU memory management and performance. Compatible with A1111 extensions.
  • InvokeAI – Creative engine for SD models targeting professionals. Industry-leading WebUI. Docs
  • kohya_ss – Gradio-based GUI for Kohya's SD training scripts. Supports LoRA, DreamBooth, and fine-tuning for SD 1.5, SDXL, SD3, and FLUX.1.
  • OneTrainer – GUI and CLI training suite for diffusion models. Supports FLUX.1/2, Chroma, SD 1.5/2/3, SDXL, PixArt, HiDream, and Hunyuan Video.
  • stable-diffusion.cpp – Diffusion model inference in pure C/C++ with no external dependencies. Runs SD 1.x/2.x/XL/3.5, FLUX.1/2, Chroma, Qwen-Image, and Z-Image. CPU/CUDA/Metal/Vulkan backends.
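ComfyUI's backend is scriptable over HTTP: a workflow exported from the UI with "Save (API Format)" can be queued against the /prompt endpoint. A minimal sketch, assuming a local instance on ComfyUI's default port 8188 (the client_id string and filename are illustrative):

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "readme-example") -> bytes:
    """Wrap an API-format workflow graph in the JSON body ComfyUI's /prompt endpoint expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST a workflow to a running ComfyUI instance and return the queue response."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Export a workflow via "Save (API Format)" in the ComfyUI menu, then queue it.
    with open("workflow_api.json") as f:
        print(queue_prompt(json.load(f)))
```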

Image Editing and Enhancement

Conditioning, adaptation, restoration, and upscaling tools.

  • ControlNet – Precise structural control for diffusion models via edge maps, depth, pose, normals. Available for SD1.5, SDXL, and Flux. Docs
  • GFPGAN – Face restoration from Tencent ARC. Restores facial details from degraded images. Often paired with Real-ESRGAN.
  • IP-Adapter – Lightweight adapter (~100MB) for image-based prompting. New cross-attention layers for image feature conditioning. Docs
  • Real-ESRGAN – Image and video upscaler, up to 8x. Handles real-world blind super-resolution with noise/artifact removal. Docs
  • Upscayl – Desktop GUI for AI image upscaling on Linux, macOS, and Windows. Uses Real-ESRGAN and other models; up to 16x upscale. Requires Vulkan GPU. Docs

SDKs and Developer Tooling

Libraries and client SDKs for integrating image generation into apps.

  • fal.ai SDK – Python and JS SDKs for serverless inference. Also a Vercel AI SDK provider. Docs | SDK: Python (pip install fal-client), Node (npm install @fal-ai/client)
  • Gradio – Python library for building interactive ML demos and web UIs. Foundation for AUTOMATIC1111, Fooocus, and HuggingFace Spaces. Includes gradio-client for programmatic access. Docs | SDK: Python (pip install gradio)
  • HuggingFace Diffusers – The canonical PyTorch library for diffusion models. SD 1.5, SDXL, SD3, Flux, ControlNet, IP-Adapter, and more. Docs | SDK: Python (pip install diffusers)
  • OpenAI SDK – Official SDK for GPT Image generation and editing. client.images.generate() and client.images.edit(). SDK: Python (pip install openai), Node (npm install openai)
  • Replicate SDK – Python/JS client for 50,000+ hosted ML models. Pay-per-second, no GPU management. Docs | SDK: Python (pip install replicate), Node (npm install replicate)

GPU Cloud Providers

Serverless and on-demand GPU platforms for running image models.

  • fal.ai (GPU) – Serverless GPU inference; markets itself as the fastest diffusion inference engine. 1000+ hosted models. Docs
  • Lambda Labs – On-demand A100 and H100 GPUs. Competitive pricing (~$1.10/hr A100 80GB). Docs
  • Modal – Serverless Python GPU cloud. Sub-second cold starts. Docs | SDK: Python (pip install modal)
  • Replicate – Serverless model hosting for open-source image models. Docs
  • RunPod – GPU pods and serverless endpoints. 48% of serverless cold starts under 200ms. Docs
  • Together AI – Inference API for 200+ open models. Docs
  • WaveSpeed AI – Serverless inference platform with 700+ image and video models. Sub-second cold starts for FLUX and other diffusion models. OpenAI-compatible REST API. Docs | SDK: Python, JS

Image Storage and Delivery

Object stores and CDNs suited to generated-image workloads.

  • Backblaze B2 – S3-compatible object storage at low cost. Free egress via Cloudflare. Docs | B2 integration
  • Cloudflare Images – Image CDN on Cloudflare's global network. Pre-defined variants for transformations.
  • Cloudinary – Enterprise image/video CDN with AI-powered transformations. Docs | SDK: Python, Node, Ruby, PHP, Java, .NET
  • Imgix – Real-time image processing CDN. URL-parameter-based transforms. Connects to existing S3/GCS storage. Docs
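A common production pattern is to push each generated image straight to object storage. A sketch using boto3 against B2's S3-compatible API; the endpoint URL, bucket name, credentials, and the image_key helper are illustrative placeholders, not fixed values:

```python
import hashlib

def image_key(prompt: str, seed: int, ext: str = "png") -> str:
    """Derive a stable object key from prompt + seed so re-uploading the same generation overwrites it."""
    digest = hashlib.sha256(f"{prompt}:{seed}".encode("utf-8")).hexdigest()[:16]
    return f"generations/{digest}.{ext}"

if __name__ == "__main__":
    import boto3  # pip install boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-004.backblazeb2.com",  # your bucket's S3 endpoint
        aws_access_key_id="<keyID>",
        aws_secret_access_key="<applicationKey>",
    )
    s3.upload_file("lighthouse.png", "my-generations-bucket", image_key("lighthouse", 42))
```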

Evaluation and Observability

Metrics, leaderboards, and quality tooling for generated images.

  • CLIP Score – Measures semantic alignment between text prompts and generated images using CLIP embeddings. Available via torchmetrics.multimodal.CLIPScore.
  • ImageReward – First general-purpose human preference reward model for T2I (NeurIPS 2023). Trained on 137k expert comparison pairs. Docs
  • IQA-PyTorch – Comprehensive image quality toolbox. PSNR, SSIM, LPIPS, FID, NIQE, MUSIQ, TOPIQ, NIMA, BRISQUE, and more.
  • pytorch-fid – PyTorch FID (Fréchet Inception Distance) implementation. Measures distribution similarity between real and generated images. SDK: Python (pip install pytorch-fid)
  • torch-fidelity – High-fidelity ISC, FID, KID, and PRC metrics. Supports InceptionV3, CLIP, DINOv2, VGG16 feature extractors. Docs | SDK: Python (pip install torch-fidelity)
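CLIP Score, as implemented in torchmetrics, is defined as 100 · max(cos(E_I, E_T), 0) over CLIP image and text embeddings. A dependency-free sketch of just the scoring formula applied to precomputed embeddings (producing the embeddings themselves requires a CLIP model):

```python
import math

def clip_score(image_emb, text_emb, w: float = 100.0) -> float:
    """CLIPScore(I, T) = w * max(cos(E_I, E_T), 0); higher means better prompt alignment."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    return w * max(dot / norm, 0.0)

# Negative cosine similarity is clamped to zero, so scores fall in [0, 100].
```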

Templates and Example Projects

Reference implementations, demos, and starter projects.


Contributing

Contributions are welcome. See CONTRIBUTING.md. One entry per PR — edit entries.yaml only and let the maintainers regenerate README.md.

License

Released under CC0 1.0 Universal. You may copy, modify, and redistribute without attribution.

About Backblaze B2

Backblaze B2 Cloud Storage is S3-compatible object storage designed for AI and media workloads. This list is maintained as part of our work making B2 a convenient storage layer for AI workflows.
