
depth_anything_v3_deepstream: Real-time Depth Anything V3 with NVIDIA DeepStream

Real-time monocular depth estimation using Depth Anything V3 as a production DeepStream 8.0 / TensorRT pipeline. Supports USB cameras, video files, images, and RTSP streams on desktop GPUs and NVIDIA Jetson.

Features

  • Real-time inference: Full TensorRT FP16 pipeline via DeepStream 8.0
  • Multiple models: DA3METRIC-LARGE (metric depth), DA3MONO-LARGE, DA3-SMALL/BASE/LARGE-1.1
  • GPU-accelerated visualization: Percentile-normalized inferno colormap via CUDA
  • Side-by-side view: --show-original tiles the original frame alongside the depth map
  • Flexible input: Camera (V4L2), video file, image (JPEG/PNG), RTSP stream, generic URI

Benchmarks

Measured on RTX 5070 Laptop GPU (8 GiB), DeepStream 8.0, TensorRT FP16, 504×504, batch size 1.

Model             Params   FPS   GPU Mem
DA3-SMALL         34M      30    522 MiB
DA3-BASE          135M     30    744 MiB
DA3-LARGE-1.1     411M     30    1312 MiB
DA3METRIC-LARGE   334M     30    1208 MiB
DA3MONO-LARGE     334M     30    1208 MiB

Installation

Dependencies

  1. Install CUDA Toolkit

    Follow the NVIDIA CUDA installation guide.

  2. Install DeepStream SDK

    Download and install from NVIDIA DeepStream.

  3. Install GStreamer development libraries

    Follow the GStreamer installation guide.

Model Export

Export a DA3 checkpoint to ONNX using the script in export/. See export/README.md for the conda environment setup, required source patches, and per-model export commands.

Exported ONNX files go to models/. On first run DeepStream builds the TensorRT FP16 engine and caches it alongside the ONNX.

Switching Models

Edit configs/config_infer_depth_anything_v3.txt to point at a different model:

onnx-file=/depth_anything_v3_deepstream/models/DA3-SMALL.onnx
model-engine-file=/depth_anything_v3_deepstream/models/DA3-SMALL.onnx_b1_gpu0_fp16.engine

Build

cd depth_anything_v3_deepstream
mkdir -p build && cd build
cmake ..
make -j$(nproc)

Docker

Docker support with NVIDIA Container Toolkit is available for simplified deployment.

Prerequisites

  1. Install the NVIDIA Container Toolkit on the host machine.

Build and Run

xhost +local:docker   # grant display access to the container
docker compose build
docker compose up

Access the container:

docker exec -it depth_anything_v3_deepstream bash

Usage

Build and run from inside the container (or natively after following the Installation steps):

cd /depth_anything_v3_deepstream/depth_anything_v3_deepstream/build

USB camera:

./depth_anything_v3_deepstream --source-type camera --source-uri /dev/video0

Video file:

./depth_anything_v3_deepstream --source-type file --source-uri /path/to/video.mp4

Image:

./depth_anything_v3_deepstream --source-type image --source-uri /path/to/image.jpg

RTSP stream:

./depth_anything_v3_deepstream --source-type rtsp \
    --source-uri rtsp://192.168.1.100:8554/stream

Side-by-side (original + depth):

./depth_anything_v3_deepstream --source-type file \
    --source-uri /path/to/video.mp4 --show-original true

Options

Flag              Description                               Default
--source-type     camera, file, image, rtsp, uri            camera
--source-uri      Device path, file path, or stream URL     /dev/video0
--framerate       Target frame rate (camera only)           30
--config          Path to nvinfer config file               see configs/
--show-original   Tile original frame alongside depth       false
--debug           Print pipeline string and emit DOT file   false
--dot-file        Path for pipeline DOT file                ./pipeline.dot

Architecture

Single-model pipeline (default):

Source → nvstreammux → nvinfer(DA3) → depth_probe → nveglglessink

Side-by-side pipeline (--show-original true):

Source → nvstreammux → tee
           ├─ original → nvvideoconvert(NVMM RGBA) ──────────────────┐
           └─ nvinfer(DA3) → depth_probe → nvvideoconvert(NVMM RGBA) ┤
                                           nvstreammux → nvmultistreamtiler → nveglglessink

DA3 is a standalone encoder-decoder, so the full model runs in a single nvinfer element. The pad probe extracts the depth tensor from NvDsInferTensorMeta and applies percentile normalization and inferno colormap rendering directly to the display surface via a CUDA kernel.
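The normalization step can be sketched on the CPU for reference. This is an illustrative version only: the function name `percentile_normalize` and the 2/98 percentile defaults are assumptions, not the repository's actual CUDA kernel, which performs the same clamp-and-scale on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Clamp-normalize depth values to [0, 1] between the given percentiles,
// so a few outlier pixels do not wash out the colormap range.
std::vector<float> percentile_normalize(std::vector<float> depth,
                                        float lo_pct = 2.0f,
                                        float hi_pct = 98.0f) {
    std::vector<float> sorted = depth;
    std::sort(sorted.begin(), sorted.end());
    // Nearest-rank percentile lookup on the sorted copy.
    auto pct = [&](float p) {
        std::size_t i =
            static_cast<std::size_t>(p / 100.0f * (sorted.size() - 1));
        return sorted[i];
    };
    float lo = pct(lo_pct);
    float hi = pct(hi_pct);
    float range = std::max(hi - lo, 1e-6f);  // avoid divide-by-zero
    for (float &d : depth)
        d = std::clamp((d - lo) / range, 0.0f, 1.0f);
    return depth;  // each value now indexes into the inferno LUT
}
```

Each normalized value is then used as an index into an inferno lookup table when writing RGBA pixels to the display surface.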

The raw model output can be converted to metric depth in meters (DA3METRIC-LARGE only):

metric_depth_m = focal_pixels × raw_output / 300.0

where focal_pixels = (fx + fy) / 2 from the camera intrinsic matrix.
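The conversion above can be written as a one-line helper; the function name `metric_depth_m` is illustrative, and fx/fy must come from your own camera calibration.

```cpp
#include <cassert>

// Metric depth in metres from the raw DA3METRIC-LARGE output, per the
// formula above. fx, fy: focal lengths in pixels from the intrinsic matrix.
float metric_depth_m(float raw_output, float fx, float fy) {
    float focal_pixels = (fx + fy) / 2.0f;
    return focal_pixels * raw_output / 300.0f;
}
```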

License

  • Code: Apache-2.0
  • Depth Anything V3: see upstream license
  • NVIDIA DeepStream SDK: NVIDIA proprietary
