
Image Manipulation with SAM and Stable Diffusion

An interactive image manipulation tool that combines Meta's Segment Anything Model (SAM) with Stable Diffusion for precise, AI-powered image editing. Click on any object in an image to segment it, then use natural language prompts to transform it into something new.

Features

  • Interactive Segmentation: Click on any object in an image to automatically segment it using SAM
  • AI-Powered Inpainting: Use text prompts to generate new content in the masked region
  • ControlNet Integration: Leverage semantic segmentation for better control over generation
  • Background Mode: Option to inpaint backgrounds instead of foreground objects
  • Real-time Preview: See masks and segmentation results instantly
  • Web Interface: User-friendly Gradio interface accessible via browser

How It Works

Architecture Overview

User Input Image → SAM Segmentation → Mask Generation → ControlNet + Stable Diffusion → Output Image

Technical Pipeline

  1. Image Upload & Selection: User uploads an image and clicks on objects to edit
  2. Segmentation (SAM):
    • Converts point clicks into precise segmentation masks using the ViT-Huge (ViT-H) backbone
    • Generates automatic semantic segmentation of the entire scene
  3. Mask Processing: Creates boolean masks and colored segmentation maps
  4. Inpainting (Stable Diffusion + ControlNet):
    • ControlNet conditions generation using semantic segmentation
    • Stable Diffusion inpaints masked regions based on text prompts
    • UniPC scheduler speeds up sampling (20 inference steps by default)
  5. Output: Seamlessly edited image with natural blending
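Step 3 of the pipeline can be sketched with NumPy and Pillow. The (H, W) boolean-mask convention matches SAM's output, but the helper names and the fixed palette are illustrative, not the repo's actual code:

```python
import numpy as np
from PIL import Image

def mask_to_pil(mask: np.ndarray) -> Image.Image:
    """Convert a boolean SAM mask (H, W) into an 8-bit PIL mask for inpainting."""
    return Image.fromarray(mask.astype(np.uint8) * 255)

def masks_to_seg_map(masks: list, palette: np.ndarray) -> Image.Image:
    """Paint each mask a distinct color to form a coarse segmentation map."""
    h, w = masks[0].shape
    seg = np.zeros((h, w, 3), dtype=np.uint8)
    for i, m in enumerate(masks):
        seg[m] = palette[i % len(palette)]
    return Image.fromarray(seg)

# Example with synthetic masks standing in for SAM output
palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)
m1 = np.zeros((64, 64), dtype=bool)
m1[:32, :] = True          # top half "object"
m2 = ~m1                   # bottom half "background"
seg = masks_to_seg_map([m1, m2], palette)
mask_img = mask_to_pil(m1)
```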

Key Components

Segment Anything Model (SAM)

  • Model: sam_vit_h_4b8939.pth (ViT-Huge backbone)
  • Function: Point-based segmentation and automatic mask generation
  • Output: Precise pixel-level masks and semantic segmentation maps
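With a loaded predictor, the point-click step returns several candidate masks plus confidence scores, and a common pattern is to keep the best-scoring one. The sketch below shows that selection with stand-in arrays in place of real model output; the commented-out calls follow the segment-anything API, but this is not necessarily the repo's exact code:

```python
import numpy as np

# Real usage (sketch only; requires the checkpoint file):
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image_rgb)        # H x W x 3 uint8 array
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array([[x, y]]),  # the pixel the user clicked
#       point_labels=np.array([1]),       # 1 = foreground point
#       multimask_output=True,            # return several candidate masks
#   )

# Stand-in candidates so the selection logic below is runnable:
h, w = 32, 32
masks = np.zeros((3, h, w), dtype=bool)
masks[0, :8], masks[1, :16], masks[2, :24] = True, True, True
scores = np.array([0.62, 0.91, 0.77])

best_mask = masks[np.argmax(scores)]  # keep the highest-confidence candidate
```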

Stable Diffusion Inpainting

  • Model: runwayml/stable-diffusion-inpainting
  • Function: Generate new content in masked regions
  • Features: Text-guided, context-aware inpainting

ControlNet

  • Model: lllyasviel/sd-controlnet-seg
  • Function: Semantic control for stable generation
  • Benefit: Maintains structure and coherence in edits
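The three components above wire together roughly as follows. This sketch uses diffusers' stock StableDiffusionControlNetInpaintPipeline; the repo ships its own pipeline in controlnet_inpaint.py, so treat the class choice as an assumption. The load is guarded behind a flag because it downloads several gigabytes of weights:

```python
LOAD_MODELS = False  # flip to True to actually download and load the weights

if LOAD_MODELS:
    import torch
    from diffusers import (
        ControlNetModel,
        StableDiffusionControlNetInpaintPipeline,
        UniPCMultistepScheduler,
    )

    # Semantic-segmentation ControlNet that conditions the generation
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    # UniPC scheduler reaches good quality in ~20 steps
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.to("cuda")
```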

Installation

1. Clone the Repository

git clone https://github.com/bhavyashah10/Image-Manipulation-SAM.git
cd Image-Manipulation-SAM

2. Set Up Environment

Option A: Using Conda (Recommended)

conda create -n sam-sd python=3.10
conda activate sam-sd

Option B: Using venv

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

Use the requirements file:

pip install -r requirements.txt

Or install the packages manually (the cu118 index URL selects CUDA 11.8 builds of PyTorch):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install gradio numpy pillow diffusers transformers accelerate segment-anything

4. Download SAM Model

Download the SAM checkpoint (roughly 2.4 GB):

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

Or download it manually from the official Segment Anything repository and place it in the project root.

5. Fix Device Configuration

Before running, fix the typo in app.py line 13:

# Change this:
device = "cp    `u"

# To this:
device = "cuda"  # or "cpu" if no GPU available

Usage

Running the Application

Gradio Web Interface (Recommended)

python app.py

The interface will launch at http://localhost:7860. For remote access, a public URL will be generated.

Jupyter Notebook

jupyter notebook Diffusion_with_sam.ipynb

Example Use Cases

Object Replacement

1. Upload: car.jpeg
2. Click: on the car
3. Prompt: "a yellow taxi cab"
4. Result: Car transformed into a taxi

Style Transfer

1. Upload: any image
2. Select: Background checkbox
3. Prompt: "sunset beach, golden hour"
4. Result: Background changed to beach scene

Object Modification

1. Upload: girl.png
2. Click: on clothing
3. Prompt: "wearing a red dress"
4. Result: Outfit changed to red dress

Project Structure

Image-Manipulation-SAM/
├── app.py                          # Main Gradio application
├── controlnet_inpaint.py           # Custom ControlNet inpainting pipeline
├── Diffusion_with_sam.ipynb        # Jupyter notebook demo
├── requirements.txt                # Python dependencies
├── README.md                       # Project documentation
└── Test Images/                    # Sample test images

Configuration

Generation Parameters

Modify the number of inference steps to trade quality against speed:

# In app.py, inpaint function
output = pipe(
    prompt,
    image,
    mask,
    seg_img,
    negative_prompt=negative_prompt,
    num_inference_steps=20,  # Increase for better quality (20-50)
)

Image Resolution

Adjust processing resolution:

# In app.py, inpaint function
image = image.resize((512, 512))  # Change to (768, 768) for higher resolution
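Stable Diffusion's VAE and UNet both downsample spatially, so widths and heights must be multiples of 8 pixels, and multiples of 64 are safest. A small helper (hypothetical names, not from app.py) that snaps a target size and resizes image and mask together:

```python
from PIL import Image

def snap(n: int, multiple: int = 64) -> int:
    """Round n down to the nearest multiple (at least one multiple)."""
    return max(multiple, (n // multiple) * multiple)

def resize_pair(image: Image.Image, mask: Image.Image, target: int = 512):
    """Resize image and mask to the same snapped square size; the mask
    uses nearest-neighbour resampling so its edges stay hard."""
    size = (snap(target), snap(target))
    return image.resize(size, Image.LANCZOS), mask.resize(size, Image.NEAREST)

img = Image.new("RGB", (640, 427))
msk = Image.new("L", (640, 427))
img2, msk2 = resize_pair(img, msk, target=768)
```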

Troubleshooting

Common Issues

1. CUDA Out of Memory

# Solution 1: Use CPU
device = "cpu"

# Solution 2: Reduce image size
image = image.resize((384, 384))  # Smaller resolution

# Solution 3: Clear GPU cache
torch.cuda.empty_cache()

2. SAM Model Not Found

# Verify file exists
ls -lh sam_vit_h_4b8939.pth

# Re-download if needed
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

3. Slow Generation

  • Ensure GPU is being used: Check device = "cuda" in app.py
  • Close other GPU-intensive applications
  • Reduce num_inference_steps to 15-20

4. Poor Quality Results

  • Increase num_inference_steps to 30-50
  • Improve prompt specificity
  • Use detailed negative prompts
  • Try re-selecting the object with better clicks

5. Import Errors

# Reinstall dependencies
pip install --upgrade diffusers transformers accelerate

Technical Details

Memory Requirements

Recommended:

  • Run it on Google Colab

Optimal (for local use):

  • GPU: 12GB+ VRAM (for 768x768 images)
  • RAM: 32GB system memory

