A curated list of AI image generation APIs, SDKs, and production-ready tools. Focused on services developers can integrate today.
Maintained by Backblaze.
- Awesome Audio Generation
- Awesome Video Generation
- Awesome ML Data Pipelines
- Awesome Multimodal Data
- Awesome Agent Infrastructure
- Awesome Physical AI
## Contents

- Text-to-Image APIs
- Open Source Models
- Open Source Frameworks and UIs
- Image Editing and Enhancement
- SDKs and Developer Tooling
- GPU Cloud Providers
- Image Storage and Delivery
- Evaluation and Observability
- Templates and Example Projects
## Text-to-Image APIs

Commercial image-generation APIs with hosted inference and developer SDKs.
- Adobe Firefly API – Image generation, editing, Photoshop automation, and Lightroom operations. Part of Firefly Services platform. Docs | SDK: JS/TS (official)
- Amazon Titan Image Generator – Text-to-image via Amazon Bedrock. Image conditioning, color palette guidance, background removal, and variations. Docs | SDK: Python (boto3), Java, PHP
- Black Forest Labs (FLUX Pro) – FLUX 1.1 Pro and FLUX.2 (32B params) via REST API. Founded by the original Stable Diffusion researchers. Also hosted on Replicate, fal.ai, and Together AI. Docs
- fal.ai – Serverless inference hosting 1000+ image models. Fastest diffusion inference engine. Hosts FLUX, SD, and more. SOC 2 compliant. Docs | SDK: Python, JS
- Google Gemini Image API – Native image generation via Gemini models (gemini-2.5-flash-image, gemini-3.1-flash-image-preview). Text-to-image, editing, multi-turn. Python/JS/Go/Java SDKs. Free tier via AI Studio. Docs | SDK: Python (google-genai), JS (@google/generative-ai), Go, Java
- Google Imagen (Vertex AI) – Imagen 4 via Vertex AI. Text-to-image, editing, outpainting, inpainting, customization. Docs | SDK: Python (google-cloud-aiplatform), Node
- Ideogram – Known for high-quality text rendering in images. Ideogram 3.0 supports generation, remix, edit, and character reference. OpenAI-compatible interface. Docs
- Leonardo AI – Text-to-image, image-to-image, and image-to-video. Webhooks, LoRA models, and "Get API Code" export from web UI. Docs | SDK: TypeScript, Python
- Midjourney – Official API released late 2025. Enterprise/Pro plan holders only; no public self-service access. Docs
- OpenAI GPT Image – gpt-image-1, gpt-image-1.5, gpt-image-1-mini. Natively multimodal generation, editing, and inpainting. DALL-E 2/3 deprecated May 2026. Docs | SDK: Python, Node
- Recraft AI – Raster and vector image generation. V4 model (Feb 2026). Background removal, inpainting, outpainting, vectorization. OpenAI-compatible interface. Docs
- Stability AI – Stable Diffusion 3.5 and Stable Image via REST API. Text-to-image, image-to-image, upscaling, inpainting. Docs
- xAI Image Generation API – grok-imagine-image model via REST API. Text-to-image and image editing. Batch up to 10 images, 1k/2k resolution. OpenAI-compatible interface. Docs | SDK: Python (xai-sdk), JS (openai-compatible)
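Most of the hosted APIs above expose a similar generate-and-download flow. As a minimal sketch of the pattern using the official `openai` Python SDK (assumes `OPENAI_API_KEY` is set in the environment; the prompt and output path are placeholders):

```python
import base64
import pathlib


def generate_image(prompt: str, out_path: str = "out.png") -> pathlib.Path:
    """Generate one image with GPT Image and save it as a PNG."""
    # Imported lazily so the sketch can be read without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1024x1024",
    )
    # gpt-image-1 returns base64-encoded image bytes.
    path = pathlib.Path(out_path)
    path.write_bytes(base64.b64decode(result.data[0].b64_json))
    return path
```

Several of the other providers (Stability, Recraft, xAI) advertise OpenAI-compatible interfaces, so a wrapper like this often ports with only the base URL and model name changed.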
## Open Source Models

Open-weight image-generation models you can run locally or self-host.
- FLUX.1 [schnell] – 12B param rectified flow transformer. 1-4 step generation. Fully open for commercial use. Docs
- FLUX.1 Kontext [dev] – 12B param instruction-based image editing model. Edit existing images via text prompts; character/style reference without finetuning. Non-commercial license. Docs
- DeepFloyd IF – Cascaded pixel-space diffusion (64px → 256px → 1024px). Strong text rendering. Zero-Shot FID 6.66 on COCO.
- LCM / LCM-LoRA – Latent Consistency Models enabling 2-4 step generation. LCM-LoRA is a lightweight ~100MB adapter for any SDXL model. Docs
- PixArt-Alpha / PixArt-Sigma – DiT-based T2I at 10.8% of SD1.5 training cost. Near-commercial quality. Docs
- Kandinsky 3 – Open-source T2I from AI Forever. 2x larger U-Net and 10x larger text encoder vs v2.x. Docs
- FLUX.1 [dev] – 12B param guidance-distilled model. High quality, competitive with closed-source. Non-commercial license.
- FLUX.2 [dev] – 32B param model with generation, editing, and multi-reference combining.
- GLM-Image – 16B hybrid autoregressive + diffusion model from Zhipu AI. Excels at text rendering inside images. Supports T2I and I2I. Runs via GlmImagePipeline in diffusers. Docs
- HiDream-I1 – 17B sparse diffusion transformer for text-to-image. Three variants (Full, Dev, Fast). Top benchmark scores; diffusers-native via HiDreamImagePipeline. Docs
- Playground v2.5 – Aesthetic-focused model fine-tuned on SDXL architecture.
- Qwen-Image – Alibaba's open-weight T2I family. Qwen-Image-2512 (text-to-image) and Qwen-Image-Edit variants. Strong text rendering including Chinese. Diffusers-native, Apache 2.0. Docs
- SDXL-Turbo – Adversarial distillation of SDXL enabling single-step generation.
- Stable Diffusion 1.5 – 860M UNet, runs on consumer GPUs. Foundation for massive community ecosystem of LoRAs, fine-tunes, and extensions.
- Stable Diffusion 3.5 Large – MMDiT architecture with three text encoders (including T5-XXL). Highest-quality Stability open model. Docs
- Stable Diffusion XL (SDXL) – Native 1024x1024. Improved text-in-image and limb generation. Base + refiner pipeline.
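The Diffusers-native models above all load through a common pipeline interface. A minimal sketch with SDXL-Turbo (assumes `diffusers`, `torch`, and a CUDA GPU; the distilled model needs only one denoising step and no classifier-free guidance):

```python
def generate(prompt: str, out_path: str = "turbo.png") -> None:
    """One-step text-to-image with SDXL-Turbo via Hugging Face Diffusers."""
    # Lazy imports: torch/diffusers are heavy optional dependencies.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # SDXL-Turbo is adversarially distilled: 1 step, guidance disabled.
    image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
    image.save(out_path)
```

Swapping the model ID (e.g. to a FLUX or Qwen-Image checkpoint) generally keeps the same shape, though step counts and guidance settings differ per model.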
## Open Source Frameworks and UIs

Graphical and programmatic interfaces for running diffusion pipelines.
- AUTOMATIC1111 WebUI – Most widely used Gradio-based SD web UI. 161k+ stars. Extensive extension ecosystem. Docs
- ComfyUI – Node-based graph UI and backend for diffusion models. Highly customizable, API-accessible. Supports SD, SDXL, Flux, and modern models. Docs
- Fooocus – Midjourney-inspired SDXL UI. Prompt-only workflow, no manual parameter tweaking.
- InvokeAI – Creative engine for SD models targeting professionals. Industry-leading WebUI. Docs
- Forge – Fork of AUTOMATIC1111 with improved GPU memory management and performance. Compatible with A1111 extensions.
- AI Toolkit (ostris) – All-in-one training suite for diffusion models. GUI and CLI. Trains FLUX.1/2, SDXL, SD 1.5, Qwen-Image, HiDream, and video models on consumer hardware.
- ComfyUI-Manager – Extension for ComfyUI that installs, updates, and manages 800+ custom nodes via a GUI or CLI. Auto-installed with ComfyUI Desktop. Docs
- DiffSynth-Studio – Python diffusion engine by ModelScope. Inference and LoRA training for FLUX.1/2, Qwen-Image, Z-Image, and JoyAI-Image. Low-VRAM optimizations, ControlNet, IP-Adapter support.
- kohya_ss – Gradio-based GUI for Kohya's SD training scripts. Supports LoRA, DreamBooth, and fine-tuning for SD 1.5, SDXL, SD3, and FLUX.1.
- OneTrainer – GUI and CLI training suite for diffusion models. Supports FLUX.1/2, Chroma, SD 1.5/2/3/XL, SDXL, PixArt, HiDream, and Hunyuan Video.
- stable-diffusion.cpp – Diffusion model inference in pure C/C++ with no external dependencies. Runs SD 1.x/2.x/XL/3.5, FLUX.1/2, Chroma, Qwen-Image, and Z-Image. CPU/CUDA/Metal/Vulkan backends.
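ComfyUI is also scriptable: a workflow exported with "Save (API Format)" is plain JSON that can be queued against the server's `/prompt` endpoint. A stdlib-only sketch (assumes a ComfyUI instance running on its default local port):

```python
import json
import urllib.request


def queue_workflow(workflow: dict, host: str = "http://127.0.0.1:8188") -> dict:
    """Submit an API-format workflow graph to a running ComfyUI instance."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response includes a prompt_id; poll /history/<prompt_id> for outputs.
        return json.load(resp)
```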
## Image Editing and Enhancement

Conditioning, adaptation, restoration, and upscaling tools.
- GFPGAN – Face restoration from Tencent ARC. Restores facial details from degraded images. Often paired with Real-ESRGAN.
- Real-ESRGAN – Image and video upscaler, up to 8x. Handles real-world blind super-resolution with noise/artifact removal. Docs
- IP-Adapter – Lightweight adapter (~100MB) for image-based prompting. New cross-attention layers for image feature conditioning. Docs
- ControlNet – Precise structural control for diffusion models via edge maps, depth, pose, normals. Available for SD1.5, SDXL, and Flux. Docs
- Upscayl – Desktop GUI for AI image upscaling on Linux, macOS, and Windows. Uses Real-ESRGAN and other models; up to 16x upscale. Requires Vulkan GPU. Docs
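ControlNet conditioning plugs directly into the Diffusers pipelines. A sketch pairing a community canny-edge ControlNet with SD 1.5 (the model IDs are illustrative checkpoints, not the only options; `canny_image` is a precomputed edge map as a PIL image):

```python
def edge_guided_generate(prompt: str, canny_image, out_path: str = "cn.png") -> None:
    """Generate an image whose composition follows a canny edge map."""
    # Lazy imports: torch/diffusers are heavy optional dependencies.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")
    # The edge map constrains structure; the text prompt controls content and style.
    image = pipe(prompt, image=canny_image, num_inference_steps=30).images[0]
    image.save(out_path)
```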
## SDKs and Developer Tooling

Libraries and client SDKs for integrating image generation into apps.
- Gradio – Python library for building interactive ML demos and web UIs. Foundation for AUTOMATIC1111, Fooocus, and HuggingFace Spaces. Includes gradio-client for programmatic access. Docs | SDK: Python (pip install gradio)
- HuggingFace Diffusers – The canonical PyTorch library for diffusion models. SD 1.5, SDXL, SD3, Flux, ControlNet, IP-Adapter, and more. Docs | SDK: Python (pip install diffusers)
- Replicate SDK – Python/JS client for 50,000+ hosted ML models. Pay-per-second, no GPU management. Docs | SDK: Python (pip install replicate), Node (npm install replicate)
- fal.ai SDK – Python and JS SDKs for serverless inference. Also a Vercel AI SDK provider. Docs | SDK: Python (pip install fal-client), Node (npm install @fal-ai/client)
- OpenAI SDK – Official SDK for GPT Image generation and editing. client.images.generate() and client.images.edit(). SDK: Python (pip install openai), Node (npm install openai)
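The hosted-SDK pattern is broadly similar across providers. A sketch with the Replicate client (assumes `REPLICATE_API_TOKEN` is set; the model slug is one of Replicate's hosted FLUX variants):

```python
def run_flux_schnell(prompt: str):
    """Run FLUX.1 [schnell] on Replicate and return its output files/URLs."""
    # Lazy import; requires REPLICATE_API_TOKEN in the environment.
    import replicate

    return replicate.run(
        "black-forest-labs/flux-schnell",
        input={"prompt": prompt},
    )
```

The fal.ai client follows the same call-a-hosted-model shape, so switching providers is mostly a matter of changing the client import and model identifier.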
## GPU Cloud Providers

Serverless and on-demand GPU platforms for running image models.
- fal.ai (GPU) – Fastest diffusion inference engine. 1000+ hosted models. Docs
- Lambda Labs – On-demand A100 and H100 GPUs. Competitive pricing (~$1.10/hr A100 80GB). Docs
- Modal – Serverless Python GPU cloud. Sub-second cold starts. Docs | SDK: Python (pip install modal)
- Replicate – Serverless model hosting for open-source image models. Docs
- RunPod – GPU pods and serverless endpoints. 48% of serverless cold starts under 200ms. Docs
- Together AI – Inference API for 200+ open models. Docs
- WaveSpeed AI – Serverless inference platform with 700+ image and video models. Sub-second cold starts for FLUX and other diffusion models. OpenAI-compatible REST API. Docs | SDK: Python, JS
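For self-hosted weights, the serverless platforms typically wrap a model in a decorated function. A sketch of the Modal pattern (GPU type and package list are illustrative; assumes a Modal account and the `modal` CLI):

```python
def build_app():
    """Define a Modal app that serves SDXL-Turbo on an on-demand A100."""
    import modal  # lazy import so the sketch loads without modal installed

    app = modal.App("image-gen-demo")
    gpu_image = modal.Image.debian_slim().pip_install(
        "torch", "diffusers", "transformers", "accelerate"
    )

    @app.function(gpu="A100", image=gpu_image)
    def generate(prompt: str) -> bytes:
        import io

        import torch
        from diffusers import AutoPipelineForText2Image

        pipe = AutoPipelineForText2Image.from_pretrained(
            "stabilityai/sdxl-turbo", torch_dtype=torch.float16
        ).to("cuda")
        image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return buf.getvalue()

    return app
```

Deploying with `modal deploy` and calling `generate.remote(...)` runs the function on a cloud GPU; RunPod and Replicate offer comparable function/endpoint abstractions.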
## Image Storage and Delivery

Object stores and CDNs suited to generated-image workloads.
- Backblaze B2 – S3-compatible object storage at low cost. Free egress via Cloudflare. Docs | B2 integration
- Cloudflare Images – Image CDN on Cloudflare's global network. Pre-defined variants for transformations.
- Cloudinary – Enterprise image/video CDN with AI-powered transformations. Docs | SDK: Python, Node, Ruby, PHP, Java, .NET
- Imgix – Real-time image processing CDN. URL-parameter-based transforms. Connects to existing S3/GCS storage. Docs
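Because B2 speaks the S3 API, a generated image can be pushed with plain `boto3`. A sketch (the endpoint URL is region-specific to your account and shown here as an illustrative value; credentials are B2 application keys supplied via the usual AWS environment variables):

```python
import mimetypes


def guess_content_type(path: str) -> str:
    """Best-effort Content-Type for an image file."""
    return mimetypes.guess_type(path)[0] or "application/octet-stream"


def upload_image(local_path: str, bucket: str, key: str) -> None:
    """Upload a generated image to Backblaze B2 via its S3-compatible API."""
    import boto3  # lazy import

    s3 = boto3.client(
        "s3",
        # Region-specific endpoint; check your bucket's details page.
        endpoint_url="https://s3.us-west-004.backblazeb2.com",
    )
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"ContentType": guess_content_type(local_path)},
    )
```

Setting the Content-Type at upload time matters for CDN delivery: Cloudflare and similar front ends serve the object with whatever type the store recorded.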
## Evaluation and Observability

Metrics, leaderboards, and quality tooling for generated images.
- pytorch-fid – PyTorch FID (Fréchet Inception Distance) implementation. Measures distribution similarity between real and generated images. SDK: Python (pip install pytorch-fid)
- IQA-PyTorch – Comprehensive image quality toolbox. PSNR, SSIM, LPIPS, FID, NIQE, MUSIQ, TOPIQ, NIMA, BRISQUE, and more.
- CLIP Score – Measures semantic alignment between text prompts and generated images using CLIP embeddings. Available via torchmetrics.multimodal.CLIPScore.
- ImageReward – First general-purpose human preference reward model for T2I (NeurIPS 2023). Trained on 137k expert comparison pairs. Docs
- torch-fidelity – High-fidelity ISC, FID, KID, and PRC metrics. Supports InceptionV3, CLIP, DINOv2, VGG16 feature extractors. Docs | SDK: Python (pip install torch-fidelity)
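As a concrete example of prompt-alignment scoring, CLIP Score is a few lines via torchmetrics (the checkpoint shown is a common choice; images are expected as uint8 tensors in NCHW layout):

```python
def clip_score(images, prompts) -> float:
    """Semantic image-text alignment via CLIP embeddings; higher is better."""
    # Lazy import: torchmetrics pulls in torch and transformers.
    from torchmetrics.multimodal.clip_score import CLIPScore

    metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
    # images: uint8 tensor(s) of shape (N, C, H, W); prompts: list of strings.
    return float(metric(images, prompts))
```

Distribution metrics like FID work differently: they compare feature statistics over whole image sets rather than scoring individual prompt-image pairs.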
## Templates and Example Projects

Reference implementations, demos, and starter projects.
- B2 Background Removal with Transformers.js – Browser-based background removal using Transformers.js with Backblaze B2 storage. B2 integration
- B2 Image Generation Prompt Flow – Image generation pipeline with prompt flow and Backblaze B2 cloud storage integration. B2 integration
- HuggingFace Diffusers Examples – Official scripts for DreamBooth, LoRA fine-tuning, ControlNet training, and more.
- HuggingFace Spaces – Free hosting for Gradio and Streamlit ML demos. Thousands of image generation demos. Docs
- OpenAI Cookbook (GPT Image) – Official notebooks for image generation and editing with gpt-image-1.
- Replicate Text-to-Image Collection – Curated runnable models with inline API code examples.
## Contributing

Contributions are welcome. See CONTRIBUTING.md. One entry per PR: edit entries.yaml only and let the maintainers regenerate README.md.
## License

Released under CC0 1.0 Universal. You may copy, modify, and redistribute without attribution.
Backblaze B2 Cloud Storage is S3-compatible object storage designed for AI and media workloads. This list is maintained as part of our work making B2 a convenient storage layer for AI workflows.