
depth_anything_v3_deepstream: Real-time Depth Anything V3 with NVIDIA DeepStream

Real-time monocular depth estimation using Depth Anything V3 as a production DeepStream 8.0 / TensorRT pipeline. Supports USB cameras, video files, images, and RTSP streams on desktop GPUs and NVIDIA Jetson.

Features

  • Real-time inference: Full TensorRT FP16 pipeline via DeepStream 8.0
  • Multiple models: DA3METRIC-LARGE (metric depth), DA3MONO-LARGE, DA3-SMALL/BASE/LARGE-1.1
  • GPU-accelerated visualization: Percentile-normalized inferno colormap via CUDA
  • Side-by-side view: --show-original tiles the original frame alongside the depth map
  • Flexible input: Camera (V4L2), video file, image (JPEG/PNG), RTSP stream, generic URI

Benchmarks

Measured on RTX 5070 Laptop GPU (8 GiB), DeepStream 8.0, TensorRT FP16, 504×504, batch size 1.

Model             Params   FPS   GPU Mem
DA3-SMALL         34M      30    522 MiB
DA3-BASE          135M     30    744 MiB
DA3-LARGE-1.1     411M     30    1312 MiB
DA3METRIC-LARGE   334M     30    1208 MiB
DA3MONO-LARGE     334M     30    1208 MiB

Installation

Dependencies

  1. Install CUDA Toolkit

    Follow the NVIDIA CUDA installation guide.

  2. Install DeepStream SDK

    Download and install from NVIDIA DeepStream.

  3. Install GStreamer development libraries

    Follow the GStreamer installation guide.

Model Export

Export a DA3 checkpoint to ONNX using the script in export/. See export/README.md for the conda environment setup, required source patches, and per-model export commands.

Exported ONNX files go to models/. On first run DeepStream builds the TensorRT FP16 engine and caches it alongside the ONNX.

Switching Models

Edit configs/config_infer_depth_anything_v3.txt to point at a different model:

onnx-file=/depth_anything_v3_deepstream/models/DA3-SMALL.onnx
model-engine-file=/depth_anything_v3_deepstream/models/DA3-SMALL.onnx_b1_gpu0_fp16.engine

Build

cd depth_anything_v3_deepstream
mkdir -p build && cd build
cmake ..
make -j$(nproc)

Docker

Docker support with NVIDIA Container Toolkit is available for simplified deployment.

Prerequisites

  1. Install the NVIDIA Container Toolkit on the host machine.

Build and Run

xhost +local:docker   # grant display access to the container
docker compose build
docker compose up

Access the container:

docker exec -it depth_anything_v3_deepstream bash

Usage

Build and run from inside the container (or natively after following the Installation steps):

cd /depth_anything_v3_deepstream/depth_anything_v3_deepstream/build

USB camera:

./depth_anything_v3_deepstream --source-type camera --source-uri /dev/video0

Video file:

./depth_anything_v3_deepstream --source-type file --source-uri /path/to/video.mp4

Image:

./depth_anything_v3_deepstream --source-type image --source-uri /path/to/image.jpg

RTSP stream:

./depth_anything_v3_deepstream --source-type rtsp \
    --source-uri rtsp://192.168.1.100:8554/stream

Side-by-side (original + depth):

./depth_anything_v3_deepstream --source-type file \
    --source-uri /path/to/video.mp4 --show-original true

Options

Flag              Description                               Default
--source-type     camera, file, image, rtsp, uri            camera
--source-uri      Device path, file path, or stream URL     /dev/video0
--framerate       Target frame rate (camera only)           30
--config          Path to nvinfer config file               see configs/
--show-original   Tile original frame alongside depth       false
--debug           Print pipeline string and emit DOT file   false
--dot-file        Path for pipeline DOT file                ./pipeline.dot

Architecture

Single-model pipeline (default):

Source → nvstreammux → nvinfer(DA3) → depth_probe → nveglglessink

Side-by-side pipeline (--show-original true):

Source → nvstreammux → tee
           ├─ original → nvvideoconvert(NVMM RGBA) ──────────────────┐
           └─ nvinfer(DA3) → depth_probe → nvvideoconvert(NVMM RGBA) ┤
                                           nvstreammux → nvmultistreamtiler → nveglglessink

DA3 is a standalone encoder-decoder, so the full model runs in a single nvinfer element. The pad probe extracts the depth tensor from NvDsInferTensorMeta and applies percentile normalization and inferno colormap rendering directly to the display surface via a CUDA kernel.
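The normalization step can be sketched on the CPU for reference. This is an illustrative version only: the function name `percentile_normalize` and the 2/98 percentile defaults are assumptions, not the repository's actual CUDA kernel, which performs the same clamp-and-scale on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Clamp-normalize depth values to [0, 1] between the given percentiles,
// so a few outlier pixels do not wash out the colormap range.
std::vector<float> percentile_normalize(std::vector<float> depth,
                                        float lo_pct = 2.0f,
                                        float hi_pct = 98.0f) {
    std::vector<float> sorted = depth;
    std::sort(sorted.begin(), sorted.end());
    // Nearest-rank percentile lookup on the sorted copy.
    auto pct = [&](float p) {
        std::size_t i =
            static_cast<std::size_t>(p / 100.0f * (sorted.size() - 1));
        return sorted[i];
    };
    float lo = pct(lo_pct);
    float hi = pct(hi_pct);
    float range = std::max(hi - lo, 1e-6f);  // avoid divide-by-zero
    for (float &d : depth)
        d = std::clamp((d - lo) / range, 0.0f, 1.0f);
    return depth;  // each value now indexes into the inferno LUT
}
```

Each normalized value is then used as an index into an inferno lookup table when writing RGBA pixels to the display surface.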

The raw model output can be converted to metric depth in meters (DA3METRIC-LARGE only):

metric_depth_m = focal_pixels × raw_output / 300.0

where focal_pixels = (fx + fy) / 2 from the camera intrinsic matrix.
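The conversion above can be written as a one-line helper; the function name `metric_depth_m` is illustrative, and fx/fy must come from your own camera calibration.

```cpp
#include <cassert>

// Metric depth in metres from the raw DA3METRIC-LARGE output, per the
// formula above. fx, fy: focal lengths in pixels from the intrinsic matrix.
float metric_depth_m(float raw_output, float fx, float fy) {
    float focal_pixels = (fx + fy) / 2.0f;
    return focal_pixels * raw_output / 300.0f;
}
```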

License

  • Code: Apache-2.0
  • Depth Anything V3: see upstream license
  • NVIDIA DeepStream SDK: NVIDIA proprietary
