We propose a unified pipeline for generating photorealistic, animatable 3D avatars from unstructured image collections without known camera intrinsics or dense viewpoint coverage. Our system is built around a tight integration of three core components:
- NeRFtrinsic-inspired pose refinement, using Gaussian Fourier Features and a pose-focal MLP supervised by a NeRF rendering loss.
- Transformer-based Multi-Token Context Model (MTCM) for optimal view selection, trained with TinyNeRF supervision.
- TinyNeRF with attention-based refinement, consuming selected views to produce high-fidelity NeRF reconstructions.
The final outputs support mesh extraction, relighting, and NeRF-based texture synthesis for downstream animation and deployment.
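The Gaussian Fourier Features used in the pose-refinement component map low-dimensional inputs (e.g. sample positions) into a high-frequency embedding before the MLP. A minimal sketch of the standard technique follows; the feature dimension (128) and bandwidth `sigma` are illustrative assumptions, not the repository's actual settings:

```python
import numpy as np

def gaussian_fourier_features(x, b_matrix):
    """Map inputs x of shape (N, d) to [cos(2*pi*x@B^T), sin(2*pi*x@B^T)],
    yielding an (N, 2m) embedding for a random projection B of shape (m, d)."""
    proj = 2.0 * np.pi * x @ b_matrix.T          # (N, m) random projections
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
sigma = 10.0                                     # assumed bandwidth; larger = higher frequencies
B = rng.normal(0.0, sigma, size=(128, 3))        # sampled once, then held fixed

points = rng.uniform(-1.0, 1.0, size=(4, 3))     # e.g. 3D sample positions
features = gaussian_fourier_features(points, B)
print(features.shape)  # (4, 256)
```

The matrix `B` is drawn once at initialization and kept fixed; only the downstream pose-focal MLP is trained.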
- Pose Refinement: Initial coarse poses from MediaPipe are refined using `nerftrinsic_four`, which learns camera extrinsics and intrinsics via NeRF supervision and Gaussian Fourier Features.
- Multimodal Token Creation: DINOv2 visual embeddings are fused with refined poses, focal lengths, segmentation mask areas, and resolution into 394-D multimodal tokens.
- View Selection: The MTCM transformer selects the most informative views using a learned selection mechanism optimized via TinyNeRF rendering loss.
- NeRF Rendering: Selected views are fed into a modified NeRF with attention-based refinement for novel view synthesis and volumetric rendering.
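The volumetric rendering step in the final stage follows the standard NeRF compositing rule. A minimal NumPy sketch of alpha compositing along a single ray (the densities, colors, and spacings below are made-up example values):

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Standard NeRF alpha compositing along one ray.
    sigmas: (S,) densities, colors: (S, 3) RGB, deltas: (S,) sample spacings."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance to each sample
    weights = trans * alphas                                         # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)                   # composited RGB

sigmas = np.array([0.0, 0.5, 2.0, 0.1])
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
deltas = np.full(4, 0.25)
rgb = composite_ray(sigmas, colors, deltas)
print(rgb.shape)  # (3,)
```

The same weights can also be reused to composite depth or feature values along the ray.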
```
.
├── nerftrinsic_four/          # Forked and modified from NeRFtrinsic Four (pose + focal learning)
├── mtcm_mae/                  # MTCM transformer for view selection and pose prediction
├── nerf/                      # TinyNeRF and weighted variants with attention refinement
├── fdnerf/                    # FDNeRF for few-shot avatar reconstruction (deployment guide inside)
├── scripts/                   # App runner (includes app.py web interface)
├── dataset_joint_mtcm_nerf.py # Dataset for joint MTCM + NeRF training
├── train_joint_mtcm_nerf.py   # Joint training entry point
├── training_utils.py          # Shared training helpers
├── environment.yml            # Conda environment with pinned versions
├── requirements.txt           # Additional pip packages
└── README.md
```
```bash
git clone git@github.com:satrajitghosh183/NeuralCharacterGeneration.git
cd NeuralCharacterGeneration
conda env create -f environment.yml
conda activate neural-character-gen
```

(Optional) Also install pip packages:

```bash
pip install -r requirements.txt
```

We recommend the Celebrity Face Dataset.
- Run LLFF preprocessing:

  ```bash
  python nerftrinsic_four/scripts/generate_llff.py --input_dir /path/to/raw/images
  ```

- Train the NeRFtrinsic module:

  ```bash
  python nerftrinsic_four/tasks/train_gf.py --data_dir /path/to/llff_output
  ```

- Extract poses and focal lengths:

  ```bash
  python nerftrinsic_four/scripts/extract_nerftrinsic_poses.py --output_dir nerftrinsic_outputs/
  ```

- Preprocess for MTCM-NeRF with the provided scripts (image embeddings, mask generation, pose alignment, etc.).
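As one example of the token-side preprocessing, the segmentation mask area feature can be computed as the fraction of foreground pixels in the mask; normalizing by total pixel count is our assumption about the convention used:

```python
import numpy as np

def mask_area_feature(mask):
    """Return the foreground fraction of a binary segmentation mask of shape (H, W)."""
    return float(np.asarray(mask).astype(bool).mean())

mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1            # a 32x32 foreground square
print(mask_area_feature(mask))    # 0.25
```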
Then launch joint MTCM-NeRF training:

```bash
python train_joint_mtcm_nerf.py \
  --data-dir path/to/preprocessed/data \
  --output-dir experiments/ \
  --batch-size 8 \
  --num-epochs 50 \
  --num-selected-views 5 \
  --debug
```

This script:
- Loads multimodal tokens (DINOv2 + pose + focal + mask area + resolution)
- Predicts optimal views and poses via transformer
- Supervises with NeRF rendering loss for consistent reconstructions
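A sketch of how a 394-D token could be assembled from these inputs. The exact split below (384-D DINOv2 ViT-S embedding, 6-D pose, 1 focal, 1 mask area, 2 resolution) is our assumption about the layout, not something confirmed by the code:

```python
import numpy as np

def build_token(dino_emb, pose6, focal, mask_area, resolution):
    """Concatenate per-view features into one multimodal token.
    Assumed layout: 384 (DINOv2) + 6 (pose) + 1 (focal) + 1 (mask area) + 2 (res) = 394."""
    return np.concatenate([
        np.asarray(dino_emb, dtype=float),   # (384,) DINOv2 ViT-S embedding (assumed size)
        np.asarray(pose6, dtype=float),      # (6,) compact camera pose (assumed parameterization)
        [float(focal), float(mask_area)],    # scalar focal length and mask-area fraction
        np.asarray(resolution, dtype=float), # (2,) image width/height (e.g. normalized)
    ])

rng = np.random.default_rng(0)
token = build_token(rng.normal(size=384), rng.normal(size=6),
                    1.2, 0.25, np.array([0.5, 0.5]))
print(token.shape)  # (394,)
```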
To launch the demo or visual debugging app:
```bash
python scripts/app.py
```

- Most of the code in `nerftrinsic_four/` is adapted from the original NeRFtrinsic Four repository, with minor modifications to accept our data pipeline and produce pose + focal outputs compatible with token construction.
- FDNeRF serves as an architectural reference for the attention-based refinement in the NeRF stage, although it is not fully integrated as-is.
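The attention-based refinement can be illustrated with plain scaled dot-product attention over features from the selected views. This NumPy sketch stands in for the actual FDNeRF-style module; the feature size and number of views are arbitrary:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q: (Tq, d) queries, k/v: (Tk, d) keys/values -> (Tq, d) refined output."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # similarity between query and each view
    scores -= scores.max(axis=-1, keepdims=True)     # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over views
    return weights @ v                               # weighted blend of view features

rng = np.random.default_rng(0)
query = rng.normal(size=(1, 32))        # e.g. a per-sample feature to refine
view_feats = rng.normal(size=(5, 32))   # features from 5 selected views
refined = scaled_dot_product_attention(query, view_feats, view_feats)
print(refined.shape)  # (1, 32)
```

In the real pipeline the queries, keys, and values would come from learned projections rather than raw features.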