Neural Character Generation

We propose a unified pipeline for generating photorealistic, animatable 3D avatars from unstructured image collections without known camera intrinsics or dense viewpoint coverage. Our system is built around a tight integration of three core components:

  • NeRFtrinsic-inspired pose refinement, using Gaussian Fourier Features and a pose-focal MLP supervised by a NeRF rendering loss.
  • Transformer-based Multi-Token Context Model (MTCM) for optimal view selection, trained with TinyNeRF supervision.
  • TinyNeRF with attention-based refinement, consuming selected views to produce high-fidelity NeRF reconstructions.

The final outputs support mesh extraction, relighting, and NeRF-based texture synthesis for downstream animation and deployment.


🧠 Core Pipeline Overview

  1. Pose Refinement: Initial coarse poses from MediaPipe are refined using nerftrinsic_four, which learns camera extrinsics and intrinsics via NeRF supervision and Gaussian Fourier Features.
  2. Multimodal Token Creation: DINOv2 visual embeddings are fused with refined poses, focal lengths, segmentation mask areas, and resolution into 394D multimodal tokens.
  3. View Selection: The MTCM transformer selects the most informative views using a learned selection mechanism optimized via TinyNeRF rendering loss.
  4. NeRF Rendering: Selected views are fed into a modified NeRF with attention-based refinement for novel view synthesis and volumetric rendering.
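The Gaussian Fourier Feature encoding used in step 1 can be sketched as below. This is a generic Fourier-feature mapping, not the repository's implementation; the feature count and frequency scale `sigma` are illustrative assumptions.

```python
import numpy as np

def gaussian_fourier_features(x, num_features=128, sigma=10.0, rng=None):
    """Lift low-dimensional inputs into a higher-frequency feature space.

    x: (N, d) array of coordinates (e.g. 3D sample points along rays).
    Returns an (N, 2 * num_features) array [sin(2*pi*xB), cos(2*pi*xB)],
    where B is a fixed random Gaussian projection matrix.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    B = rng.normal(0.0, sigma, size=(x.shape[1], num_features))  # fixed per model
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

pts = np.random.rand(4, 3)          # four 3D sample points
feats = gaussian_fourier_features(pts)
print(feats.shape)                  # (4, 256)
```

The random projection matrix `B` is sampled once and held fixed; only the downstream MLP is trained.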
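Step 2's 394D token is consistent with concatenating a 384D DINOv2 (ViT-S) embedding with a 6D pose, a scalar focal length, a scalar mask area, and a 2D resolution (384 + 6 + 1 + 1 + 2 = 394). The exact layout below is an assumption for illustration, not taken from the repo.

```python
import numpy as np

def build_token(dino_emb, pose6, focal, mask_area, resolution):
    """Concatenate per-view signals into one multimodal token.

    Assumed layout (384 + 6 + 1 + 1 + 2 = 394 dims):
      dino_emb   : (384,) DINOv2 ViT-S image embedding
      pose6      : (6,)   refined camera pose (e.g. axis-angle + translation)
      focal      : float  refined focal length (normalized)
      mask_area  : float  segmentation mask area (normalized)
      resolution : (2,)   image height and width (normalized)
    """
    token = np.concatenate([
        np.asarray(dino_emb, dtype=np.float32),
        np.asarray(pose6, dtype=np.float32),
        np.float32([focal]),
        np.float32([mask_area]),
        np.asarray(resolution, dtype=np.float32),
    ])
    assert token.shape == (394,)
    return token

tok = build_token(np.zeros(384), np.zeros(6), 1.2, 0.5, [1.0, 1.0])
print(tok.shape)  # (394,)
```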

📁 Project Structure

.
├── nerftrinsic_four/          # Forked and modified from NeRFtrinsic-Four (pose + focal learning)
├── mtcm_mae/                  # MTCM transformer for view selection and pose prediction
├── nerf/                      # TinyNeRF and weighted variants with attention refinement
├── fdnerf/                    # FDNeRF for few-shot avatar reconstruction; deployment guide within sub-directory
├── scripts/                   # App runner (includes app.py web interface)
├── dataset_joint_mtcm_nerf.py
├── train_joint_mtcm_nerf.py
├── training_utils.py
├── environment.yml            # Conda environment with pinned versions
├── requirements.txt           # Additional pip packages
└── README.md

🔧 Setup Instructions

Step 1: Clone This Repo

git clone git@github.com:satrajitghosh183/NeuralCharacterGeneration.git
cd NeuralCharacterGeneration

Step 2: Set Up Conda Environment

conda env create -f environment.yml
conda activate neural-character-gen

(Optional) Also install pip packages:

pip install -r requirements.txt

📂 Dataset

We recommend the Celebrity Face Dataset.

Step-by-Step Processing:

  1. Run LLFF preprocessing:

    python nerftrinsic_four/scripts/generate_llff.py --input_dir /path/to/raw/images
  2. Train NeRFtrinsic module:

    python nerftrinsic_four/tasks/train_gf.py --data_dir /path/to/llff_output
  3. Extract poses and focal lengths:

    python nerftrinsic_four/scripts/extract_nerftrinsic_poses.py --output_dir nerftrinsic_outputs/
  4. Preprocess for MTCM-NeRF using the repository's preprocessing scripts (image embeddings, mask generation, pose alignment, etc.).


🏋️ Training

Run joint training (MTCM + NeRF):

python train_joint_mtcm_nerf.py \
  --data-dir path/to/preprocessed/data \
  --output-dir experiments/ \
  --batch-size 8 \
  --num-epochs 50 \
  --num-selected-views 5 \
  --debug

This script:

  • Loads multimodal tokens (DINOv2 + pose + focal + mask area + resolution)
  • Predicts optimal views and poses via transformer
  • Supervises with NeRF rendering loss for consistent reconstructions
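One common way a learned view selector can be supervised by a rendering loss is to weight each candidate view's TinyNeRF reconstruction loss by its softmax-normalized selection score, so gradients flow back into the selector. The sketch below is schematic and hypothetical, not the repository's actual training loop.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def selection_weighted_loss(selection_logits, per_view_render_loss):
    """Soft view selection: each candidate view's rendering loss is
    weighted by its selection probability. At inference, a hard top-k
    over the logits would replace the soft weighting.
    """
    w = softmax(selection_logits)            # (num_views,) selection probabilities
    return float(np.sum(w * per_view_render_loss))

logits = np.array([2.0, 0.5, -1.0, 0.0])     # transformer selection scores
losses = np.array([0.10, 0.30, 0.80, 0.25])  # per-view TinyNeRF MSE losses
total = selection_weighted_loss(logits, losses)
print(total)
```

Since the weights are a convex combination, the weighted loss always lies between the best and worst per-view loss, and minimizing it pushes probability mass toward views that render well.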

🌐 Web Interface

To launch the demo or visual debugging app:

python scripts/app.py

📝 Notes

  • Most of the code in nerftrinsic_four/ is adapted from the original NeRFtrinsic Four repository, with minor modifications to accept our data pipeline and produce pose+focal outputs compatible with token construction.
  • FDNeRF is used as an architectural reference for attention-based refinement in the NeRF stage, although it is not fully integrated as-is.
