**examples/multiple_physics_pretraining/LICENSE** (new file, 21 additions)
MIT License

Copyright (c) 2023

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---

**examples/multiple_physics_pretraining/README.md** (new file, 136 additions)
# Multiple Physics Pretraining (MPP)

This example integrates the [MPP](https://openreview.net/forum?id=DKSI3bULiZ) (Multiple Physics Pretraining) model into PaddleCFD.

Multiple Physics Pretraining is a pretraining strategy in which multiple sets of dynamics are jointly normalized and embedded into a single space for prediction. It uses an **AViT** (Axial Vision Transformer) architecture that learns multiple physics simultaneously through pretraining, enabling strong finetuning performance even across different physics domains.
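The core idea, per-sample normalization of heterogeneous fields followed by projection into one shared embedding space, can be sketched with NumPy. The names and shapes below are illustrative only, not the PaddleCFD API:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_fields(u, eps=1e-6):
    """RevIN-style per-sample, per-channel normalization over space-time.

    u: (T, C, H, W) array holding one trajectory's state variables.
    Returns the normalized field plus the statistics needed to invert it.
    """
    mu = u.mean(axis=(0, 2, 3), keepdims=True)
    sigma = u.std(axis=(0, 2, 3), keepdims=True) + eps
    return (u - mu) / sigma, (mu, sigma)

# Two "physics" with different channel counts and very different scales.
swe = rng.normal(5.0, 3.0, size=(4, 1, 8, 8))       # e.g. shallow water: 1 field
comp_ns = rng.normal(0.0, 50.0, size=(4, 4, 8, 8))  # e.g. compressible NS: 4 fields

embed_dim = 16
# One embedding vector per *named field* in a shared vocabulary, so
# different datasets reuse the same projection for fields they share.
field_embed = rng.normal(size=(5, embed_dim))  # 5 distinct field names total

def embed(u_norm, field_ids):
    # Weighted sum of field embeddings -> (T, H, W, embed_dim) tokens.
    return np.einsum("tchw,cd->thwd", u_norm, field_embed[field_ids])

swe_tokens = embed(normalize_fields(swe)[0], [0])
ns_tokens = embed(normalize_fields(comp_ns)[0], [1, 2, 3, 4])
assert swe_tokens.shape == ns_tokens.shape == (4, 8, 8, embed_dim)
```

After normalization, both trajectories live on the same scale and map into tokens of identical shape, which is what lets a single AViT backbone process them jointly.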

Paper: "Multiple Physics Pretraining for Spatiotemporal Surrogate Models" (NeurIPS 2024)

Below are quick-start instructions for the Paddle port; for the full README, see https://github.com/PolymathicAI/multiple_physics_pretraining.

## Installation

```bash
pip install ppcfd
pip install wandb # optional
```

## Quick Start

### Import the model

```python
from ppcfd.models.multiple_physics_pretraining import AViT, build_avit
```

### Train (single device)

```bash
python train_basic.py --run_name my_experiment --config basic_config --yaml_config config/mpp_avit_ti_config.yaml
```

### Finetune from pretrained weights

Original PyTorch pretrained weights are available at:
https://drive.google.com/drive/folders/1Qaqa-RnzUDOO8-Gi4zlf4BE53SfWqDwx

To use them in PaddleCFD, first convert to PaddlePaddle format:

```bash
python convert_torch_weights.py \
--yaml_config config/mpp_avit_s_config.yaml \
--config basic_config \
--weights path/to/MPP_AViT_S.tar \
--output models_paddle/MPP_AViT_S.pdparams
```
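`convert_torch_weights.py` handles the full mapping; the essential step is transposing `nn.Linear` weights, since PyTorch stores them as `(out_features, in_features)` while Paddle uses `(in_features, out_features)`. A minimal NumPy illustration (the parameter names below are made up):

```python
import numpy as np

def torch_to_paddle_state(torch_state):
    """Transpose 2-D linear weights; leave biases and 1-D params untouched.

    A real converter must also *skip* embedding tables (already stored as
    (num_embeddings, dim) in both frameworks) and leave conv kernels alone,
    since those share the same layout.
    """
    paddle_state = {}
    for name, w in torch_state.items():
        if name.endswith(".weight") and w.ndim == 2:
            paddle_state[name] = w.T.copy()
        else:
            paddle_state[name] = w.copy()
    return paddle_state

# Hypothetical fragment of a checkpoint state dict.
state = {
    "blocks.0.mlp.fc1.weight": np.zeros((384, 96)),  # torch layout: (out, in)
    "blocks.0.mlp.fc1.bias": np.zeros(384),
}
converted = torch_to_paddle_state(state)
assert converted["blocks.0.mlp.fc1.weight"].shape == (96, 384)
```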

Then finetune:

```bash
python train_basic.py --run_name my_finetune --config finetune --yaml_config config/mpp_avit_s_config.yaml
```

### Inference

If needed, use the following command to generate a test input:

```bash
python multiple_physics_pretraining/generate_forward_case.py --labels 0,1,2 --bcs 0,0 --output ./forward_case.npz
```

Then run a test forward case:

```bash
python forward_pretrained.py \
--yaml_config config/mpp_avit_s_config.yaml \
--config basic_config \
--weights path/to/checkpoint.pdparams \
--case_npz path/to/input.npz \
--output path/to/output.npz
```
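The exact contents of the case file are defined by `generate_forward_case.py`. Assuming it stores a `(Time, Channel, H, W)` history plus the field labels and boundary-condition flags (the key names `data`, `labels`, and `bcs` here are guesses, not the confirmed format), a hand-rolled case could look like:

```python
import os
import tempfile

import numpy as np

t, c, h, w = 16, 3, 128, 128  # n_steps of history, 3 state channels
case = {
    "data": np.random.rand(t, c, h, w).astype("float32"),  # field history
    "labels": np.array([0, 1, 2]),  # field indices, as in --labels 0,1,2
    "bcs": np.array([0, 0]),        # boundary-condition flags, as in --bcs 0,0
}
path = os.path.join(tempfile.gettempdir(), "forward_case.npz")
np.savez(path, **case)

loaded = np.load(path)
assert loaded["data"].shape == (t, c, h, w)
```

Check the keys actually written by `generate_forward_case.py` before relying on this layout.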

## Model Variants

| Variant | embed_dim | num_heads | processor_blocks |
| --------- | --------- | --------- | ---------------- |
| Ti (Tiny) | 192 | 3 | 12 |
| S (Small) | 384 | 6 | 12 |
| B (Base) | 768 | 12 | 12 |
| L (Large) | 1024 | 16 | 24 |
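A rough parameter count follows from the table: each transformer block carries about 12·`embed_dim`² weights (4·d² for attention QKV plus output projection, 8·d² for a 4× MLP), so the backbone alone is roughly `processor_blocks`·12·d². This ignores the hMLP stems, field embeddings, biases, and the axial factorization, so treat it as an order-of-magnitude check only:

```python
def approx_backbone_params(embed_dim, blocks, mlp_ratio=4):
    # Attention: QKV (3*d*d) + output projection (d*d) = 4*d^2
    # MLP: up-projection d*(ratio*d) + down-projection (ratio*d)*d = 2*ratio*d^2
    per_block = 4 * embed_dim**2 + 2 * mlp_ratio * embed_dim**2
    return blocks * per_block

for name, d, blocks in [("Ti", 192, 12), ("S", 384, 12),
                        ("B", 768, 12), ("L", 1024, 24)]:
    print(f"{name}: ~{approx_backbone_params(d, blocks) / 1e6:.1f}M")
```

This yields roughly 5.3M / 21.2M / 84.9M / 302.0M backbone parameters for Ti/S/B/L, in line with standard ViT sizing.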

Config files are provided in `config/` for each variant. Use the `basic_config` namespace for pretraining and `finetune` for finetuning.

## Directory Structure

```
examples/multiple_physics_pretraining/
├── config/ # YAML configuration files (Ti/S/B/L)
├── train_basic.py # Training script
├── forward_pretrained.py # Inference script
├── convert_torch_weights.py # PyTorch -> PaddlePaddle weight conversion
├── requirements.txt # Additional dependencies
├── LICENSE # MIT License
└── README.md # This file

ppcfd/models/multiple_physics_pretraining/
├── avit.py # AViT model definition
├── shared_modules.py # MLP, Attention, PositionBias
├── spatial_modules.py # AxialAttention, hMLP stem/output
├── time_modules.py # Temporal attention block
├── mixed_modules.py # SpaceTimeBlock combiner
├── DropPath_util.py # Stochastic depth
├── paddle_utils.py # PaddlePaddle utilities
├── utils/ # Training utilities
│ ├── YParams.py # YAML config parser
│ ├── logging_utils.py # Logging
│ ├── schedulers.py # LR scheduler
│ ├── adan_paddle.py # Adan optimizer
│ ├── dadapt_adam_paddle.py # DAdaptAdam optimizer
│ ├── dadapt_adan_paddle.py # DAdaptAdan optimizer
│ └── custom_optimizer_base.py # Optimizer base class
└── data_utils/ # Data loading
├── datasets.py # MixedDataset, dataset registry
├── hdf5_datasets.py # HDF5 dataset classes (SWE, NS, etc.)
└── mixed_dset_sampler.py # Multi-dataset sampler
```

## Adding Datasets

Datasets must return data in `(Batch, Time, Channel, H, W)` format and extend `BaseHDF5DirectoryDataset`. See `data_utils/hdf5_datasets.py` for examples.

1. Define your dataset class in `ppcfd/models/multiple_physics_pretraining/data_utils/hdf5_datasets.py`
2. Register it in `DSET_NAME_TO_OBJECT` in `datasets.py`
3. Add data paths to the config YAML file
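A minimal sketch of the registration pattern in step 2 and the `(Batch, Time, Channel, H, W)` contract, using a stand-in class rather than the real `BaseHDF5DirectoryDataset` (in the code base, registration is a plain dict entry in `datasets.py`; the decorator here is just for illustration):

```python
import numpy as np

DSET_NAME_TO_OBJECT = {}  # stand-in for the registry in datasets.py

def register(name):
    """Register a dataset class under a short name (step 2 above)."""
    def wrap(cls):
        DSET_NAME_TO_OBJECT[name] = cls
        return cls
    return wrap

@register("mydset")
class MyDataset:
    """Stand-in; the real class extends BaseHDF5DirectoryDataset."""
    n_fields = 2

    def __getitem__(self, idx):
        # Contract: (Batch, Time, Channel, H, W).
        return np.zeros((1, 16, self.n_fields, 64, 64), dtype="float32")

sample = DSET_NAME_TO_OBJECT["mydset"]()[0]
assert sample.shape == (1, 16, 2, 64, 64)
```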

## Citing

```bibtex
@inproceedings{
mccabe2024multiple,
title={Multiple Physics Pretraining for Spatiotemporal Surrogate Models},
author={Michael McCabe and Bruno R{\'e}galdo-Saint Blancard and others},
booktitle={NeurIPS},
year={2024},
url={https://openreview.net/forum?id=DKSI3bULiZ}
}
```
---

**config YAML (new file, 99 additions; filename not shown in this diff)**
basic_config: &basic_config # Run settings
log_to_wandb: !!bool True # Use wandb integration
log_to_screen: !!bool True # Log progress to screen.
save_checkpoint: !!bool True # Save checkpoints
checkpoint_save_interval: 10 # Save every # epochs - also saves "best" according to val loss
debug_grad: !!bool True # Compute gradient/step_sizes/etc. for debugging
true_time: !!bool False # Debugging setting - sets num workers to zero and activates syncs
num_data_workers: 6 # Generally pulling 8 cpu per process, so using 6 for DL - not sure if best ratio
enable_amp: !!bool False # Use automatic mixed precision - blows up with low variance fields right now
compile: !!bool False # Compile model - Does not currently work
gradient_checkpointing: !!bool False # Whether to use gradient checkpointing - Slow, but lower memory
exp_dir: "./" # Output path, modify as needed
log_interval: 1 # How often to log - Don't think this is actually implemented
pretrained: !!bool False # Whether to load a pretrained model
# wandb settings
project: "project"
group: "debugging"
entity: "entity"
# Training settings
drop_path: 0.1
batch_size: 1
max_epochs: 500
scheduler_epochs: -1
epoch_size: 2000 # Artificial epoch size
rescale_gradients: !!bool False # Activate hook that scales block gradients to norm 1
optimizer: "adan" # adam, adan, whatever else i end up adding - adan did better on HP sweep
scheduler: "cosine" # Only cosine implemented
warmup_steps: 1000 # Warmup when not using DAdapt
learning_rate: -1 # -1 means use DAdapt
weight_decay: 1e-3
n_states: 12 # Number of state variables across the datasets - Can be larger than real number and things will just go unused
state_names: ["Pressure", "Vx", "Vy", "Density", "Vx", "Vy", "Density", "Pressure"] # Should be sorted
dt: 1 # Striding of data - Not currently implemented > 1
n_steps: 16 # Length of history to include in input
enforce_max_steps: !!bool False # If false and n_steps > dataset steps, use dataset steps. Otherwise, raise Exception.
accum_grad: 5 # Real batch size is accum * batch_size, real steps/"epoch" is epoch_size / accum
# Model settings
model_type: "avit" # Only option so far
block_type: "axial" # Which type of block to use - if axial, next two fields must be set to define axial ops
time_type: "attention" # Conditional on block type
space_type: "axial_attention" # Conditional on block type
tie_fields: !!bool False # Whether to use 1 embedding per field per data
embed_dim: 1024 # Dimension of internal representation - 192/384/768/1024 for Ti/S/B/L
num_heads: 16 # Number of heads for attention - 3/6/12/16 for Ti/S/B/L
processor_blocks: 24 # Number of transformer blocks in the backbone - 12/12/12/24 for Ti/S/B/L
patch_size: [16, 16] # Actually currently hardcoded at 16
bias_type: "rel" # Options rel, continuous, none
# Data settings
train_val_test: [.8, .1, .1]
augmentation: !!bool False # Augmentation not implemented
use_all_fields: !!bool True # Prepopulate the field metadata dictionary from dictionary in datasets
tie_batches: !!bool False # Force everything in batch to come from one dset
extended_names: !!bool False # Whether to use extended names - not currently implemented
embedding_offset: 0 # Use when adding extra finetuning fields
train_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
valid_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

finetune: &finetune
<<: *basic_config
max_epochs: 500
train_val_test: [.8, .1, .1]
accum_grad: 1
pretrained: !!bool True
group: "debugging"
pretrained_ckpt_path: "/B16-noNS/training_checkpoints/ckpt.tar"
train_data_paths: [["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
valid_data_paths: # These are the same for all configs - uses split according to train_val_test
[["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
embedding_offset: 0 # Number of fields in original model - FT fields start after this
freeze_middle: !!bool False # Whether to freeze the middle layers of the model
freeze_processor: !!bool False
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

frozen: &frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool False

less_frozen: &less_frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool True
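With the values in `basic_config` above (`batch_size: 1`, `accum_grad: 5`, `epoch_size: 2000`), the comment on `accum_grad` works out to an effective batch of 5 and 400 optimizer steps per artificial epoch; a quick check:

```python
batch_size, accum_grad, epoch_size = 1, 5, 2000  # values from basic_config

effective_batch = batch_size * accum_grad   # gradient accumulation
steps_per_epoch = epoch_size // accum_grad  # optimizer updates per "epoch"

assert effective_batch == 5
assert steps_per_epoch == 400
```

The `finetune` config sets `accum_grad: 1`, so there the effective batch equals `batch_size` and every epoch takes the full 2000 optimizer steps.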
---

**config YAML (new file, 99 additions; filename not shown in this diff)**
basic_config: &basic_config # Run settings
log_to_wandb: !!bool True # Use wandb integration
log_to_screen: !!bool True # Log progress to screen.
save_checkpoint: !!bool True # Save checkpoints
checkpoint_save_interval: 10 # Save every # epochs - also saves "best" according to val loss
debug_grad: !!bool True # Compute gradient/step_sizes/etc. for debugging
true_time: !!bool False # Debugging setting - sets num workers to zero and activates syncs
num_data_workers: 6 # Generally pulling 8 cpu per process, so using 6 for DL - not sure if best ratio
enable_amp: !!bool False # Use automatic mixed precision - blows up with low variance fields right now
compile: !!bool False # Compile model - Does not currently work
gradient_checkpointing: !!bool False # Whether to use gradient checkpointing - Slow, but lower memory
exp_dir: "./" # Output path, modify as needed
log_interval: 1 # How often to log - Don't think this is actually implemented
pretrained: !!bool False # Whether to load a pretrained model
# wandb settings
project: "project"
group: "debugging"
entity: "entity"
# Training settings
drop_path: 0.1
batch_size: 1
max_epochs: 500
scheduler_epochs: -1
epoch_size: 2000 # Artificial epoch size
rescale_gradients: !!bool False # Activate hook that scales block gradients to norm 1
optimizer: "adan" # adam, adan, whatever else i end up adding - adan did better on HP sweep
scheduler: "cosine" # Only cosine implemented
warmup_steps: 1000 # Warmup when not using DAdapt
learning_rate: -1 # -1 means use DAdapt
weight_decay: 1e-3
n_states: 12 # Number of state variables across the datasets - Can be larger than real number and things will just go unused
state_names: ["Pressure", "Vx", "Vy", "Density", "Vx", "Vy", "Density", "Pressure"] # Should be sorted
dt: 1 # Striding of data - Not currently implemented > 1
n_steps: 16 # Length of history to include in input
enforce_max_steps: !!bool False # If false and n_steps > dataset steps, use dataset steps. Otherwise, raise Exception.
accum_grad: 5 # Real batch size is accum * batch_size, real steps/"epoch" is epoch_size / accum
# Model settings
model_type: "avit" # Only option so far
block_type: "axial" # Which type of block to use - if axial, next two fields must be set to define axial ops
time_type: "attention" # Conditional on block type
space_type: "axial_attention" # Conditional on block type
tie_fields: !!bool False # Whether to use 1 embedding per field per data
embed_dim: 768 # Dimension of internal representation - 192/384/768/1024 for Ti/S/B/L
num_heads: 12 # Number of heads for attention - 3/6/12/16 for Ti/S/B/L
processor_blocks: 12 # Number of transformer blocks in the backbone - 12/12/12/24 for Ti/S/B/L
patch_size: [16, 16] # Actually currently hardcoded at 16
bias_type: "rel" # Options rel, continuous, none
# Data settings
train_val_test: [.8, .1, .1]
augmentation: !!bool False # Augmentation not implemented
use_all_fields: !!bool True # Prepopulate the field metadata dictionary from dictionary in datasets
tie_batches: !!bool False # Force everything in batch to come from one dset
extended_names: !!bool False # Whether to use extended names - not currently implemented
embedding_offset: 0 # Use when adding extra finetuning fields
train_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
valid_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

finetune: &finetune
<<: *basic_config
max_epochs: 500
train_val_test: [.8, .1, .1]
accum_grad: 1
pretrained: !!bool True
group: "debugging"
pretrained_ckpt_path: "/B16-noNS/training_checkpoints/ckpt.tar"
train_data_paths: [["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
valid_data_paths: # These are the same for all configs - uses split according to train_val_test
[["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
embedding_offset: 0 # Number of fields in original model - FT fields start after this
freeze_middle: !!bool False # Whether to freeze the middle layers of the model
freeze_processor: !!bool False
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

frozen: &frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool False

less_frozen: &less_frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool True
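The `scheduler: "cosine"` and `warmup_steps: 1000` settings in these configs correspond to the standard linear-warmup-then-cosine-decay rule. The actual implementation in `utils/schedulers.py` may differ in details (minimum LR, per-epoch vs. per-step updates), so treat this as a generic sketch:

```python
import math

def lr_at(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

total = 500 * 400  # max_epochs * optimizer steps per epoch (epoch_size / accum_grad)
assert lr_at(0, 1e-3, 1000, total) == 0.0          # start of warmup
assert lr_at(1000, 1e-3, 1000, total) == 1e-3      # peak at end of warmup
assert lr_at(total, 1e-3, 1000, total) < 1e-9      # decayed to ~0
```

Note that with `learning_rate: -1` the configs use D-Adaptation instead of a fixed peak LR, in which case the warmup setting is not used.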