An interactive image manipulation tool that combines Meta's Segment Anything Model (SAM) with Stable Diffusion for precise, AI-powered image editing. Click on any object in an image to segment it, then use natural language prompts to transform it into something new.
- Interactive Segmentation: Click on any object in an image to automatically segment it using SAM
- AI-Powered Inpainting: Use text prompts to generate new content in the masked region
- ControlNet Integration: Leverage semantic segmentation for better control over generation
- Background Mode: Option to inpaint backgrounds instead of foreground objects
- Real-time Preview: See masks and segmentation results instantly
- Web Interface: User-friendly Gradio interface accessible via browser
User Input Image → SAM Segmentation → Mask Generation → ControlNet + Stable Diffusion → Output Image
- Image Upload & Selection: User uploads an image and clicks on objects to edit
- Segmentation (SAM):
  - Converts point clicks into precise segmentation masks using a Vision Transformer (ViT-H) backbone
  - Generates an automatic semantic segmentation of the entire scene
- Mask Processing: Creates boolean masks and colored segmentation maps
- Inpainting (Stable Diffusion + ControlNet):
  - ControlNet conditions generation on the semantic segmentation map
  - Stable Diffusion inpaints the masked region based on the text prompt
  - The UniPC scheduler keeps generation fast (20 inference steps by default)
- Output: A seamlessly edited image with natural blending
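The mask-processing step can be sketched in plain NumPy: each boolean mask is painted with its own color to build the colored segmentation map that later conditions ControlNet. The fixed palette and toy masks below are illustrative stand-ins, not the exact values `app.py` uses.

```python
import numpy as np

def build_seg_map(masks):
    """Paint each boolean mask a distinct color to form an RGB segmentation map.

    masks: list of (H, W) boolean arrays (e.g. SAM's per-object masks).
    Returns an (H, W, 3) uint8 image; later masks overwrite earlier ones
    wherever they overlap.
    """
    palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
               (255, 255, 0), (255, 0, 255), (0, 255, 255)]
    h, w = masks[0].shape
    seg = np.zeros((h, w, 3), dtype=np.uint8)
    for i, m in enumerate(masks):
        seg[m] = palette[i % len(palette)]
    return seg

# Toy example: two rectangular "objects" on a 64x64 canvas
a = np.zeros((64, 64), dtype=bool); a[10:30, 10:30] = True
b = np.zeros((64, 64), dtype=bool); b[40:60, 40:60] = True
seg_map = build_seg_map([a, b])
print(seg_map.shape)  # (64, 64, 3)
```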
Segment Anything Model (SAM)
- Model: `sam_vit_h_4b8939.pth` (ViT-Huge backbone)
- Function: Point-based segmentation and automatic mask generation
- Output: Precise pixel-level masks and semantic segmentation maps
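For each point click, SAM's `SamPredictor.predict` (with `multimask_output=True`) returns several candidate masks together with predicted quality scores, and one of them is kept. The selection logic can be illustrated without loading the checkpoint; the arrays below are synthetic stand-ins for SAM's real output.

```python
import numpy as np

def pick_best_mask(masks, scores):
    """Return the candidate mask with the highest predicted quality score.

    masks: (N, H, W) boolean array of candidate masks for one click
    scores: (N,) float array of predicted mask-quality (IoU) scores
    """
    best = int(np.argmax(scores))
    return masks[best], float(scores[best])

# Stand-ins for SAM's output for one point click (3 candidates by default)
masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, :2, :2] = True   # small, low-confidence candidate
masks[1, :4, :4] = True   # medium candidate
masks[2, :6, :6] = True   # large, high-confidence candidate
scores = np.array([0.55, 0.71, 0.93])

best_mask, best_score = pick_best_mask(masks, scores)
print(best_score)        # 0.93
print(best_mask.sum())   # 36 pixels selected
```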
Stable Diffusion Inpainting
- Model: `runwayml/stable-diffusion-inpainting`
- Function: Generate new content in masked regions
- Features: Text-guided, context-aware inpainting
ControlNet
- Model: `lllyasviel/sd-controlnet-seg`
- Function: Semantic control for stable generation
- Benefit: Maintains structure and coherence in edits
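To act as a ControlNet conditioning input, the segmentation map must be a PIL image at the generation resolution. A minimal preparation step might look like the sketch below; the 512x512 target mirrors the resize in `app.py`, and `prepare_control_image` is an illustrative helper, not a function from this repo.

```python
import numpy as np
from PIL import Image

def prepare_control_image(seg_array, size=(512, 512)):
    """Convert an (H, W, 3) uint8 segmentation map into a PIL control image.

    Nearest-neighbor resampling keeps region colors crisp instead of
    interpolating new colors along segment boundaries.
    """
    img = Image.fromarray(seg_array)
    return img.resize(size, resample=Image.NEAREST)

# Toy 64x64 two-region map upscaled to the generation resolution
seg = np.zeros((64, 64, 3), dtype=np.uint8)
seg[:, 32:] = (255, 0, 0)  # right half is one "segment"
control = prepare_control_image(seg)
print(control.size)  # (512, 512)
```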
```bash
git clone https://github.com/bhavyashah10/Image-Manipulation-SAM.git
cd Image-Manipulation-SAM
```

Option A: Using Conda (Recommended)

```bash
conda create -n sam-sd python=3.10
conda activate sam-sd
```

Option B: Using venv

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Use the requirements file:

```bash
pip install -r requirements.txt
```

or install manually:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install gradio numpy pillow diffusers transformers accelerate segment-anything
```

Download the SAM checkpoint (use a `resolve` URL rather than `blob` so wget fetches the file itself, not an HTML page):

```bash
wget https://huggingface.co/spaces/abhishek/StableSAM/resolve/main/sam_vit_h_4b8939.pth
```

Or manually download from here and place it in the project root.
Before running, fix the typo in app.py line 13:
```python
# Change this:
device = "cp `u"

# To this:
device = "cuda"  # or "cpu" if no GPU available
```

Gradio Web Interface (Recommended)

```bash
python app.py
```

The interface will launch at http://localhost:7860. For remote access, a public URL will be generated.
Jupyter Notebook
```bash
jupyter notebook Diffusion_with_sam.ipynb
```

Object Replacement
1. Upload: car.jpeg
2. Click: on the car
3. Prompt: "a yellow taxi cab"
4. Result: Car transformed into a taxi
Style Transfer
1. Upload: any image
2. Select: Background checkbox
3. Prompt: "sunset beach, golden hour"
4. Result: Background changed to beach scene
Object Modification
1. Upload: girl.png
2. Click: on clothing
3. Prompt: "wearing a red dress"
4. Result: Outfit changed to red dress
```
Image-Manipulation-SAM/
├── app.py                      # Main Gradio application
├── controlnet_inpaint.py       # Custom ControlNet inpainting pipeline
├── Diffusion_with_sam.ipynb    # Jupyter notebook demo
├── requirements.txt            # Python dependencies
├── README.md                   # Project documentation
└── Test Images/                # Sample test images
```
Modify inference steps for the quality vs. speed tradeoff:

```python
# In app.py, inpaint function
output = pipe(
    prompt,
    image,
    mask,
    seg_img,
    negative_prompt=negative_prompt,
    num_inference_steps=20,  # increase (20-50) for better quality
)
```

Adjust the processing resolution:

```python
# In app.py, inpaint function
image = image.resize((512, 512))  # change to (768, 768) for higher resolution
```

1. CUDA Out of Memory
```python
# Solution 1: Use CPU
device = "cpu"

# Solution 2: Reduce image size
image = image.resize((384, 384))  # smaller resolution

# Solution 3: Clear GPU cache
torch.cuda.empty_cache()
```

2. SAM Model Not Found

```bash
# Verify the file exists
ls -lh sam_vit_h_4b8939.pth

# Re-download if needed
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```

3. Slow Generation
- Ensure the GPU is being used: check that `device = "cuda"` is set in app.py
- Close other GPU-intensive applications
- Reduce `num_inference_steps` to 15-20
4. Poor Quality Results
- Increase `num_inference_steps` to 30-50
- Improve prompt specificity
- Use detailed negative prompts
- Try re-selecting the object with better clicks
5. Import Errors
```bash
# Reinstall dependencies
pip install --upgrade diffusers transformers accelerate
```

Recommended:
- Run it on Google Colab

Optimal (for local use):
- GPU: 12GB+ VRAM (for 768x768 images)
- RAM: 32GB system memory