- Please download the data and the caption JSON file from the Google Drive here.
This code was modified from the original Kubric with an additional transparent material.
- The Kubric data used to train our models can also be found here.
cd ./datasets
pip install -e .
python effect-removal-synthesis.py \
--seed 43 \
--output_dir ./train-casper/kubric \
--video_length 80 \
--mask_mode trimask
- You may use a smaller number for --video_length to quickly test whether the installation is successful.
- The outputs will be saved in ./train-casper/kubric:
  - rgb_full: the input of the model (before object-effect removal).
  - mask: the input mask. 1 for preservation, 0 for removal, and 0.5 for the background if mask_mode is trimask.
  - rgb_removed: the target output of the object-effect removal.
Please note that the current transparent material still cannot handle a few complex objects, which leaves some uncleaned pixels in the target removal outputs. Some manual filtering may be required to remove such cases.
You may use a simple iteration script to run the above generation command with different random seeds.
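As one possible form of such an iteration script, the sketch below rebuilds the generation command shown above once per seed. The function names `build_command` and `run_seeds` are hypothetical helpers, not part of this repository, and the flag values simply mirror the example command. Whether runs with different seeds write to distinct subfolders of the same output directory is not stated here, so it is worth checking the output of one run before launching many.

```python
import subprocess


def build_command(seed, output_dir="./train-casper/kubric",
                  video_length=80, mask_mode="trimask"):
    """Return the generation command from above as an argument list for one seed."""
    return [
        "python", "effect-removal-synthesis.py",
        "--seed", str(seed),
        "--output_dir", output_dir,
        "--video_length", str(video_length),
        "--mask_mode", mask_mode,
    ]


def run_seeds(seeds, dry_run=True):
    """Build one command per seed; print them if dry_run, otherwise execute them."""
    commands = [build_command(seed) for seed in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


# Preview the commands for seeds 0..4 without running anything.
commands = run_seeds(range(5))
```

Calling `run_seeds(range(5), dry_run=False)` from the ./datasets directory would execute the runs one after another.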
You may use caption_videos.py to get the input text prompts for the generated Kubric videos.
Due to the license, please download a video dataset on your own (e.g., YouTube Video Object Segmentation). Once you have your own set of videos, you will need to obtain masks (e.g., via segmenters such as SAM2) and captions.
You may use the following script to generate the captions. The script is modified from here (thanks to the authors!).
The source videos and masks should be organized as follows:
object-paste
├─ videos
│ ├─ XXXXX.mp4
│ └─ YYYYY.mp4
├─ masks
│ ├─ XXXXX.mp4
│ └─ YYYYY.mp4
└─ caption.json
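Before running the generation, it can help to confirm that the layout above is complete. The following is a minimal sketch, assuming only what the tree shows: each video in videos/ has a mask of the same file name in masks/, and caption.json sits at the root. The helper name `check_layout` is hypothetical, and the internal structure of caption.json is not inspected.

```python
from pathlib import Path


def check_layout(source_rootdir):
    """Report unpaired videos/masks and whether caption.json exists."""
    root = Path(source_rootdir)
    videos = {p.name for p in (root / "videos").glob("*.mp4")}
    masks = {p.name for p in (root / "masks").glob("*.mp4")}
    return {
        "videos_without_mask": sorted(videos - masks),
        "masks_without_video": sorted(masks - videos),
        "has_caption_json": (root / "caption.json").is_file(),
    }


if __name__ == "__main__":
    report = check_layout("./train-casper/object-paste")
    for key, value in report.items():
        print(f"{key}: {value}")
```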
python generate_batch \
--source_rootdir ./train-casper/object-paste \
--output_dir OUTPUT_DIR \
--num_tuples 1
- Set --num_tuples to a larger number to generate more examples.
- The outputs will be saved in ./train-casper/object-paste:
  - rgb_full: the input of the model (before object-effect removal).
  - mask: the input mask. 1 for preservation, 0 for removal, and 0.5 for the background if mask_mode is trimask.
  - rgb_removed: the target output of the object-effect removal.
Put all generated categories under the same directory (e.g., ./train-casper):
./train-casper/
├── kubric/
├── object-paste/
├── omnimatte+tripod/
├── kubric.json
├── object-paste.json
└── omnimatte+tripod.json
The JSON files contain the generated captions for each category.
cd datasets
python merge_training_data.py --input_rootdir ./train-casper --output_json casper.json
- Balancing the categories:
  - Please adjust the repeat times --repeat_omnimatte_tripod, --repeat_kubric, and --repeat_object_paste ($r_{ot}$, $r_{kubric}$, $r_{op}$, respectively) according to your needs.
  - Suppose you have $N_{kubric}$ Kubric videos and $N_{op}$ ObjectPaste videos, along with our released 46 omnimatte and tripod videos:
    - Total number of instances: $N_{total} = 46 \cdot r_{ot} + N_{kubric} \cdot r_{kubric} + N_{op} \cdot r_{op}$
    - The weight of each category is:
      - Omnimatte+tripod: $46 \cdot r_{ot} / N_{total}$
      - Kubric: $N_{kubric} \cdot r_{kubric} / N_{total}$
      - Object Paste: $N_{op} \cdot r_{op} / N_{total}$
  - In our paper, we make the training set contain 50% omnimatte+tripod, 48% kubric, and 2% object-paste.
- Update the training scripts scripts/xxxxx/train_casper.sh to point to the output JSON path. Please check Casper Training for the details.
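To choose repeat times, the category weights from the balancing step can be computed directly. The sketch below implements that arithmetic; the function name `category_weights` is hypothetical, and the video counts in the example (552 Kubric, 23 ObjectPaste) are made-up numbers chosen only to illustrate how a 50% / 48% / 2% split could arise, not the counts used in the paper.

```python
def category_weights(n_kubric, n_op, r_ot, r_kubric, r_op, n_ot=46):
    """Return each category's share of the merged training set.

    n_ot defaults to the 46 released omnimatte and tripod videos.
    """
    counts = {
        "omnimatte+tripod": n_ot * r_ot,
        "kubric": n_kubric * r_kubric,
        "object-paste": n_op * r_op,
    }
    n_total = sum(counts.values())
    if n_total == 0:
        raise ValueError("At least one category needs a non-zero repeat count.")
    return {name: count / n_total for name, count in counts.items()}


# Hypothetical dataset sizes and repeat times, for illustration only.
weights = category_weights(n_kubric=552, n_op=23, r_ot=25, r_kubric=2, r_op=2)
for name, weight in weights.items():
    print(f"{name}: {weight:.2%}")
# omnimatte+tripod: 50.00%
# kubric: 48.00%
# object-paste: 2.00%
```

With your own counts, adjust the three repeat values until the printed shares are close to the split you want, then pass those values to the corresponding --repeat_* flags.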