
stechdrive-3dgs-utils

v1.16.0

A Windows-first integrated GUI tool for turning 360° camera video into images, masks, and camera data that are practical for 3D Gaussian Splatting (3DGS) training.

setup_windows.bat detects Python 3.12 and FFmpeg/FFprobe, installs missing system dependencies through winget when needed, creates a virtual environment, and installs the required runtime packages. Day-to-day launch is handled by run_gui.bat, so users do not need to run Python commands manually for the normal GUI workflow.

JP: 日本語の説明 (Japanese documentation)

Forked from tetraface/tetraface-3dgs-utils.

[Screenshot: STechDrive 3DGS Utils GUI]

What You Can Do

1. 360° Video to Metashape SfM and 3DGS Training

Extract equirectangular still frames from Insta360 / Osmo 360 or similar 360° camera video, review which frames to keep, and generate masks for people, the camera operator, tripods, sky, stitch seams, and blown-out highlights before running SfM in Metashape.

After Metashape SfM, export cubemap images, masks, and transforms.json for Postshot, Brush, and LichtFeld Studio. For LichtFeld Studio 3DGUT workflows, the app can also create a direct dataset that keeps the equirectangular images and masks in place while writing transforms.json and pointcloud.ply. This is the main workflow for preparing 360° video as a 3DGS training dataset.
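
The transforms.json written by these exports follows the NeRF-style camera layout expected by downstream tools. As a quick sanity check after export, you can load it and confirm the frame count and camera positions. This is an illustrative Python sketch; it assumes the common "frames" / "file_path" / "transform_matrix" field names, and the exact schema this app writes may differ:

    import json
    import numpy as np

    # Hedged sketch: assumes the common NeRF-style transforms.json layout
    # ("frames" list with "file_path" and a 4x4 "transform_matrix").
    with open("output/transforms.json", encoding="utf-8") as f:
        data = json.load(f)

    frames = data["frames"]
    print(f"{len(frames)} registered frames")
    for frame in frames[:3]:
        c2w = np.array(frame["transform_matrix"])  # 4x4 camera-to-world pose
        print(frame["file_path"], "camera at", c2w[:3, 3])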

2. 360° Video to SphereSfM, LichtFeld 3DGUT, or Cubemap Data

You can skip Metashape and run spherical SfM directly on the extracted equirectangular images with SphereSfM's COLMAP build. From that result, the GUI can write either LichtFeld 3DGUT data or cubemap data under output/ for Postshot, Brush, or LichtFeld.

3. 360° Video to COLMAP Rig Dataset

You can also skip Metashape and export a COLMAP Rig cubemap dataset from extracted 360° frames. The GUI can optionally run COLMAP so the result is ready to pass to COLMAP-compatible 3DGS tools.

4. Mask Preprocessing for Normal Photos or Video Frames

For video or image sequences from DSLR, mirrorless, smartphone, or other normal cameras, Step 3 can generate fast YOLO/SAM2.1 masks for people, vehicles, and other selectable object types; higher-accuracy SAM3.1 prompt masks for people and sky; optional Mask2Former sky masks; and overexposure masks. This is useful as a mask-preparation stage before sending images to SfM software.

Highlights

  • Extract 360° video into still frames that are practical for SfM and 3DGS training. The GUI can thin footage for walking shots or aerial/distant scenes, and it marks frames that may need review because they are blurry, too similar, or contain a large viewpoint change.
  • Review extracted frames in a large single-image view or a thumbnail list, then record keep/drop decisions for unwanted frames. For 360° images, the 90° FOV perspective view lets you inspect details in a normal-camera-like view.
  • Generate masks for people, the camera operator, tripods, hands, vehicles, sky, blown-out highlights, and stitch seams. Use YOLO/SAM2.1 when you want fast person-focused masks, or SAM3.1 when you want higher-accuracy people and sky masks plus prompt-based cleanup after generation.
  • Preview mask results before saving and inspect them in the thumbnail list. When only a few frames have misses or false detections, regenerate just those frames instead of rerunning the whole image set.
  • With SAM3.1, add missed targets such as tripods or subtract false detections such as signs and logos from existing masks. This reduces the amount of manual mask painting needed after the first pass.
  • Mask2Former remains available as a helper option when you want to try sky masks without setting up SAM3.1.
  • Use the same mask-preparation workflow for normal-camera video after Step 1 extraction and for normal photo or image-sequence sets, not only 360° images. This is useful before sending images to SfM software.
  • Import Metashape SfM results and export cubemap images, masks, and transforms.json for Postshot, Brush, and LichtFeld Studio. For LichtFeld Studio, the GUI can also create a 3DGUT (LichtFeld) direct dataset without cubemap conversion.
  • If you print and place AprilTags before capture, the Step 4 Scale tab can estimate metric scale from an existing Cubemap output. After reviewing the estimate, you can apply the same scale to output/transforms.json and output/pointcloud.ply.
  • Select SphereSfM's colmap.exe to run spherical SfM without Metashape, then convert the result into either LichtFeld 3DGUT data or cubemap data.
  • Skip Metashape when needed by exporting COLMAP Rig cubemap images and masks from extracted 360° frames. The GUI can optionally continue into COLMAP SfM processing.
  • Prepare the Windows environment with setup scripts that handle Python, FFmpeg/FFprobe, and the main Python packages. Normal use starts from run_gui.bat.

Easy Setup

For a normal release ZIP, extract it and run:

setup_windows.bat
run_gui.bat

The first setup_windows.bat run can take a while. It checks Python 3.12, FFmpeg/FFprobe, GPU-oriented Python packages, and prepares missing pieces where it can.

Python packages are installed into a virtual environment dedicated to this app, so your everyday Python environment is less likely to be affected. After setup completes, normal use is just running run_gui.bat to launch the GUI.

What Setup Does Internally

setup_windows.bat looks for Python 3.12 and FFmpeg/FFprobe and can install missing system dependencies through winget when needed. It then creates this app's dedicated virtual environment under .venv/, installs packages such as PyTorch CUDA wheels, OpenCV, Pillow, Open3D, ultralytics, PySide6, and the SAM3.1 runtime, and verifies the environment.

Python packages are kept inside .venv/, so they are not normally installed into the system-wide Python environment or other projects. .venv/ is an internal working directory, and you usually do not need to edit it manually.
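
For reference, the batch file's work is roughly equivalent to the following manual commands. This is a hedged sketch; the real script adds version checks, winget installs, and environment verification:

    py -3.12 -m venv .venv
    .venv\Scripts\activate
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
    pip install numpy opencv-python Pillow open3d ultralytics transformers safetensors tqdm PySide6 sam3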

Updating or Rebuilding the Environment

This is usually unnecessary. To update an existing environment to the latest compatible package set, run:

update_venv.bat

To rebuild with the pinned verified package set from requirements/, run update_venv.bat --locked. To recreate the environment from scratch, run setup_windows.bat --force.

YOLO/SAM2, Mask2Former, and SAM3.1 model weights may be downloaded on first use. Local YOLO/SAM weights can be placed under models/ultralytics/; local Mask2Former weights can be placed under models/mask2former-swin-large-ade-semantic/; SAM3.1 prompt masking uses models/sam3.1/sam3.1_multiplex.pt. Release ZIP assets do not include model weights, generated scene data, user settings, or local setup logs. These third-party libraries and model weights are governed by separate license terms; see THIRD_PARTY_LICENSES.md.

Mask Generation Model Guide

  • Use YOLO/SAM2.1 when you want fast person-only masks.
  • Use SAM3.1 when you want the highest practical accuracy for people or sky. Because it is prompt-controlled, you can add missed targets after generation or subtract false detections.
  • Use Mask2Former when you want to try sky masks without setting up SAM3.1.
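
The GUI drives these models for you, but the underlying two-stage technique can be sketched with the ultralytics API that setup installs. This is an illustrative sketch, not the app's internal pipeline: detect person boxes with a YOLO model, then refine them into masks with SAM 2.1 box prompts. The weight file names are standard ultralytics models that download on first use; the frame path is hypothetical:

    from ultralytics import YOLO, SAM

    image = "images/frame_000123.jpg"  # hypothetical extracted frame

    # Stage 1: fast person detection (COCO class 0 = person).
    yolo = YOLO("yolo11n.pt")
    boxes = yolo(image, classes=[0])[0].boxes.xyxy.tolist()

    # Stage 2: refine each person box into a precise mask with SAM 2.1.
    if boxes:
        sam = SAM("sam2.1_b.pt")
        result = sam(image, bboxes=boxes)[0]
        result.save("mask_preview.jpg")  # composite preview of the masks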

SAM3.1 Prompt Masks

setup_windows.bat installs the SAM3.1 runtime package, but the checkpoint is not bundled because access requires your Hugging Face account and SAM License acceptance.

This app uses the official facebook/sam3.1 sam3.1_multiplex.pt checkpoint. SAM3.1 is a CUDA-GPU-oriented model, so running it on an NVIDIA GPU is recommended.

If GPU memory runs out during SAM3.1 batch processing, completed masks remain saved. Rerun with the same settings to resume from unfinished images.

When mask accuracy is the priority, SAM3.1 is recommended over YOLO/SAM2.1, especially for sky masks or prompt-controlled targeted cleanup. After generating masks once, you can select only the images that need correction and use SAM3.1 prompts to add missed regions such as tripod, hand, selfie stick, or cell phone, or to subtract false detections such as male icon, female icon, logo, or sign.

  1. Create or sign in to a Hugging Face account.
  2. Open Meta's facebook/sam3.1 Hugging Face repository and request access/accept the SAM License. Hugging Face gated model requests are tied to an individual user account and may require sharing your username/email with the model author.
    • Hugging Face gated models can use automatic or manual approval. If you can open the Files tab or download sam3.1_multiplex.pt from facebook/sam3.1 in the browser after accepting the terms, your account already has access and you do not need to wait for an email reply. If the page shows a pending/approval-waiting state, wait for approval from the model author.
  3. Create a Hugging Face access token from your account settings.
    • App downloads require a Read token created by the same Hugging Face account that has access. Browser login state is not used by this app.
    • Copy the token value immediately after creating it. Hugging Face may not show existing token values again from the token list. If you missed the value, create a new Read token or use Invalidate and refresh to issue a new value. Refreshing invalidates the old token.
    • Treat access tokens as secrets equivalent to passwords. Do not paste them into README files, issues, chats, screenshots, or logs. Read permission is enough for downloading the SAM3.1 checkpoint. Prefer creating a dedicated token for SAM3.1, and delete or refresh it from Hugging Face settings when you no longer need it.
  4. In Step 3, choose SAM3.1. If models/sam3.1/sam3.1_multiplex.pt is missing, the app asks for the token and downloads the checkpoint. The token is passed only to that download request. The app does not save the token for automatic reuse and does not write it to app settings, the scene folder, or execution logs. This reduces the risk of a token leaking from local files or being reused unintentionally. Enter a token again if you need to download the checkpoint again.

You can also place the checkpoint manually at models/sam3.1/sam3.1_multiplex.pt.
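
If you prefer to script the gated download yourself, the huggingface_hub library (install it separately if needed) can fetch the same file, assuming your account has already accepted the SAM License. A minimal sketch:

    from huggingface_hub import hf_hub_download

    # Requires a Read token from the same account that has access.
    # Never commit, paste, or log the token value.
    path = hf_hub_download(
        repo_id="facebook/sam3.1",
        filename="sam3.1_multiplex.pt",
        local_dir="models/sam3.1",
        token="hf_...",  # placeholder; use your own Read token
    )
    print("checkpoint saved to", path)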

GUI Workflow

If the scene folder path contains non-ASCII characters, an extremely long path, control characters, or ", the GUI stops before running. These paths are likely to fail in OpenCV or external 3DGS/SfM tools. Spaces and OneDrive paths are not blocked by themselves. Use a short ASCII working path, for example D:\work\scene01.
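
These checks amount to simple string rules. A minimal sketch of equivalent validation (illustrative only; the app's exact length limit and rule set may differ):

    def is_safe_scene_path(path: str, max_len: int = 200) -> bool:
        # ASCII only, no control characters, no double quotes,
        # and not excessively long (max_len is an assumed limit).
        return (
            path.isascii()
            and len(path) <= max_len
            and '"' not in path
            and not any(ord(c) < 32 for c in path)
        )

    print(is_safe_scene_path(r"D:\work\scene01"))  # True
    print(is_safe_scene_path(r"D:\作業\scene01"))   # False: non-ASCII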

360° video or images
  -> Step 1: frame extraction
  -> Step 2: frame review and keep/drop decisions
  -> Step 3: mask generation
  -> Step 4: convert
      -> build 3DGS-ready outputs from Metashape SfM results
      -> run SphereSfM on 360° images and convert to 3DGUT or cubemap data
      -> export COLMAP Rig cubemap images and optionally run COLMAP
  -> Step 5: training
      -> launch LichtFeld Studio / Postshot / custom CLI with an existing dataset
Step                | Purpose                                                                                    | Current Default
1. Frame Extraction | Extract equirectangular still frames from 360° video                                      | Fixed interval + motion adjustment
2. Frame Review     | Review extracted frames in single/thumbnail views and apply keep/drop decisions to CSV    | Review low-quality candidates and unwanted frames
3. Mask Generation  | Generate model-based masks plus optional stitch seam, overexposure, and custom masks      | YOLO/SAM2.1, High quality
4. Convert          | Create 3DGS datasets from SfM results, run SphereSfM, or export COLMAP Rig cubemap images | Metashape / SphereSfM / LichtFeld / 3DGUT / Cube6
5. Training         | Launch an external 3DGS application with an existing dataset                              | LichtFeld Studio / Postshot / Custom

Detailed GUI docs:

Step                    | Docs
Step 1 Frame Extraction | EN / JP
Step 2 Frame Review     | EN / JP
Step 3 Mask Generation  | EN / JP
Step 4 Convert          | EN / JP
Step 5 Training         | EN / JP
Scene Import            | EN / JP

Recommended Workflow: Metashape Route

  1. Prepare 360° video from an Insta360 / Osmo 360 or similar camera.
  2. Extract SfM-friendly frames in Step 1.
  3. Review low-quality or unnecessary frames in Step 2.
  4. Generate masks for people, camera operators, tripods, sky, or similar SfM-unfriendly regions in Step 3. Quality: High is the recommended starting point.
  5. If masks still leak through, switch only the affected images to Quality: Best or regenerate them with SAM3.1. Mask2Former is also available when you want to try sky masks without setting up SAM3.1.
  6. Enable stitch seam, overexposure, and custom masks when they match the source material.
  7. Import the generated masks/ folder into Metashape as per-image masks, then run SfM.
  8. Use Step 4 with the Metashape XML/PLY result to export cubemap training data or a direct 3DGUT (LichtFeld) dataset.
  9. To estimate scale with AprilTags, print and place the tags before capture. After creating Cubemap output, open the Scale tab, enter the printed tag size and IDs, run estimation, and use Apply to Scale only when the result looks reasonable. This updates output/transforms.json and output/pointcloud.ply (see the sketch after this list). Direct equirectangular output for 3DGUT cannot be estimated here.
  10. When needed, use Step 5 to launch LichtFeld Studio or Postshot CLI with the dataset you just created.
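
Step 9's Apply to Scale is, in effect, a uniform similarity scale applied to both the camera translations and the point cloud. The app performs this for you; the following Python sketch only illustrates the operation, assuming the common NeRF-style transforms.json layout and using open3d for the PLY:

    import json
    import numpy as np
    import open3d as o3d

    s = 1.37  # hypothetical metric scale estimated from AprilTags

    # Scale only the translation part of each camera-to-world matrix;
    # rotations are unaffected by a uniform scale.
    with open("output/transforms.json", encoding="utf-8") as f:
        data = json.load(f)
    for frame in data["frames"]:
        c2w = np.array(frame["transform_matrix"])
        c2w[:3, 3] *= s
        frame["transform_matrix"] = c2w.tolist()
    with open("output/transforms.json", "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

    # Scale the point cloud about the origin so it stays aligned with the cameras.
    pcd = o3d.io.read_point_cloud("output/pointcloud.ply")
    pcd.scale(s, center=np.zeros(3))
    o3d.io.write_point_cloud("output/pointcloud.ply", pcd)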

COLMAP Route

  1. Use Steps 1-3 in the same way as the Metashape route.
  2. In Step 4, choose COLMAP to write cubemap images and masks to output/colmap_rig/.
  3. Turn on the left SfM sub-stage when you want COLMAP to estimate camera positions and a sparse point cloud. COLMAP SfM needs cubemap images, so turning on SfM also turns on Cube.
  4. After completion, pass output/colmap_rig/ as the COLMAP project folder to COLMAP-compatible 3DGS tools.

SphereSfM Route

  1. Use Steps 1-3 in the same way as the Metashape route. Prepare images/ and, when used, masks/.
  2. In Step 4, choose SphereSfM and select SphereSfM's colmap.exe from a json87/SphereSfM release or local build. Standard COLMAP cannot be used because it lacks the spherical-image SfM features.
  3. On RTX 50-series GPUs, the GitHub-distributed binary can stop during CUDA SIFT. For RTX 50-series systems, build SphereSfM locally with CMAKE_CUDA_ARCHITECTURES=120 and select that colmap.exe.
  4. Start with both left sub-stages, SfM and Cube, turned on, plus Matcher: Sequential and SfM Quality: Standard.
  5. In Output Shape, choose whether to create LichtFeld 3DGUT data or cubemap data for Postshot, Brush, or LichtFeld.
  6. After completion, output/ is the dataset passed to downstream apps for both 3DGUT and cubemap output. SphereSfM working files and logs stay under output/spheresfm/.

Mask Preprocessing for Normal Images

For normal-camera video from DSLR, mirrorless, smartphone, or similar cameras, extract frames in Step 1. For existing image sequences, place them in images/ or use the + icon on the Step 3 Images Folder row to copy them into the scene. Step 3 detects the image type from Step 1 records, external image registration, or image headers. For normal images, model-based masking and overexposure masking remain available, while stitch seam masking and 360° pole projection assist are disabled.

Use this when you want to exclude people, vehicles, blown-out regions, or similar areas before importing images into SfM software.
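
Overexposure masking is conceptually simple: flag pixels at or near sensor saturation and grow the region slightly to cover blooming halos. A minimal OpenCV sketch of the idea (illustrative; the threshold, kernel size, file names, and the app's mask polarity may differ):

    import cv2
    import numpy as np

    img = cv2.imread("images/frame_000123.jpg")  # hypothetical frame
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Mark near-saturated pixels, then dilate to cover halos around them.
    blown = (gray >= 250).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    blown = cv2.dilate(blown, kernel)

    cv2.imwrite("overexposure_mask.png", blown)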

Mask Tuning Notes

  • Start with Quality: High.
  • Use Quality: Standard for faster test runs.
  • If people leak through, try Quality: Best or raise Expand slightly.
  • Quality: Best prioritizes accuracy and takes longer, so it is best used to regenerate only images where misses remain.
  • When you find a miss in preview, adjust settings and use Regenerate Mask to save only that image back to masks/ using the current model and enabled extra masks. In thumbnail mode, use Ctrl / Shift selection to regenerate multiple selected images together. SAM3.1 can also add or subtract prompt detections against existing saved masks.
  • Stitch seam masks are useful when the seam position is stable in the equirectangular image. If FlowState stabilization, direction lock, AI stitching, or similar processing moves the seam, verify it in the preview before using it.
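
When the seam is stable, a stitch seam mask is essentially a fixed vertical band in the equirectangular frame, wrapping at the image border. An illustrative sketch with hypothetical seam columns and band width (the app's seam model is more involved):

    import numpy as np
    import cv2

    width, height = 5760, 2880      # hypothetical equirectangular resolution
    seam_columns = [0, width // 2]  # hypothetical stable seam longitudes
    band = 40                       # masked half-width in pixels

    mask = np.zeros((height, width), dtype=np.uint8)
    for x in seam_columns:
        for dx in range(-band, band + 1):
            mask[:, (x + dx) % width] = 255  # band wraps across the 360° border

    cv2.imwrite("stitch_seam_mask.png", mask)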

Requirements

  • Windows 10/11
  • Python 3.12 (3.12.10 confirmed)
  • CUDA-capable GPU
  • CUDA Toolkit 12.8
  • FFmpeg / FFprobe (setup_windows.bat installs Gyan.FFmpeg through winget when missing)

Main Python packages resolved by setup_windows.bat:

torch / torchvision / torchaudio from the CUDA 12.8 wheel index
numpy, opencv-python, Pillow, open3d, ultralytics, transformers, safetensors, tqdm, PySide6, sam3

setup_windows.bat uses the pinned verified package set under requirements/ for reproducible first-time setup. update_venv.bat resolves the latest compatible packages by default; pass --locked when you want to rebuild from the pinned set instead.

CLI Tools

The GUI wraps these CLI engines, which can also be used directly. The root-level scripts are stable public entry points; shared implementation code lives under core/.

Script                     | Purpose                                                                    | Docs
extract_frames.py          | Extract frames from 360° video                                            | EN
apply_frame_decisions.py   | Apply keep/drop decisions from CSV                                        | EN
review_frames.py           | Frame review GUI                                                          | EN
yolo_mask.py               | YOLO+SAM2.1 mask generation                                               | EN
sky_mask.py                | Semantic mask generation with Mask2Former ADE20K labels or SAM3.1 prompts | EN
stitch_mask.py             | Stitch seam mask generation                                               | EN
overexposure_mask.py       | Overexposure mask generation                                              | EN
custom_mask.py             | AND-merge a user-provided PNG mask                                        | EN
cubemap_transforms_json.py | Convert equirectangular images to cubemap views                           | EN
transforms_to_colmap.py    | Export COLMAP files from transforms.json                                  | EN
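
The AND-merge performed by custom_mask.py means a pixel survives in the final mask only where both the generated mask and your hand-made PNG agree. A minimal sketch of the operation (illustrative; file names are hypothetical, both images must share the same resolution, and mask polarity follows your pipeline's convention):

    import cv2

    generated = cv2.imread("masks/frame_000123.png", cv2.IMREAD_GRAYSCALE)
    custom = cv2.imread("my_custom_mask.png", cv2.IMREAD_GRAYSCALE)

    merged = cv2.bitwise_and(generated, custom)  # pixel-wise AND of the masks
    cv2.imwrite("masks/frame_000123.png", merged)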

License

MIT License. See LICENSE.

Mask generation features use third-party libraries and model weights with separate license terms. See THIRD_PARTY_LICENSES.md.

Original code by tetraface Inc. Fork extensions by stechdrive.
