**examples/multiple_physics_pretraining/LICENSE** (new file, 21 additions)
MIT License

Copyright (c) 2023

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---

**examples/multiple_physics_pretraining/README.md** (new file, 136 additions)
# Multiple Physics Pretraining (MPP)

This example integrates the [MPP](https://openreview.net/forum?id=DKSI3bULiZ) (Multiple Physics Pretraining) model into PaddleCFD.

Multiple Physics Pretraining is a pretraining strategy in which multiple sets of dynamics are jointly normalized and embedded into a single space for prediction. It uses an **AViT** (Axial Vision Transformer) architecture that learns multiple physics simultaneously through pretraining, enabling strong finetuning performance even across different physics domains.
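The core idea, per-sample normalization of heterogeneous fields followed by projection into one shared embedding space, can be sketched with NumPy. The names and shapes below are illustrative only, not the PaddleCFD API:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_fields(u, eps=1e-6):
    """RevIN-style per-sample, per-channel normalization over space-time.

    u: (T, C, H, W) array holding one trajectory's state variables.
    Returns the normalized field plus the statistics needed to invert it.
    """
    mu = u.mean(axis=(0, 2, 3), keepdims=True)
    sigma = u.std(axis=(0, 2, 3), keepdims=True) + eps
    return (u - mu) / sigma, (mu, sigma)

# Two "physics" with different channel counts and very different scales.
swe = rng.normal(5.0, 3.0, size=(4, 1, 8, 8))       # e.g. shallow water: 1 field
comp_ns = rng.normal(0.0, 50.0, size=(4, 4, 8, 8))  # e.g. compressible NS: 4 fields

embed_dim = 16
# One embedding vector per *named field* in a shared vocabulary, so
# different datasets reuse the same projection for fields they share.
field_embed = rng.normal(size=(5, embed_dim))  # 5 distinct field names total

def embed(u_norm, field_ids):
    # Weighted sum of field embeddings -> (T, H, W, embed_dim) tokens.
    return np.einsum("tchw,cd->thwd", u_norm, field_embed[field_ids])

swe_tokens = embed(normalize_fields(swe)[0], [0])
ns_tokens = embed(normalize_fields(comp_ns)[0], [1, 2, 3, 4])
assert swe_tokens.shape == ns_tokens.shape == (4, 8, 8, embed_dim)
```

After normalization, both trajectories live on the same scale and map into tokens of identical shape, which is what lets a single AViT backbone process them jointly.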

Paper: "Multiple Physics Pretraining for Spatiotemporal Surrogate Models" (NeurIPS 2024)

Below are quick-start instructions for the Paddle port; for the full README, see https://github.com/PolymathicAI/multiple_physics_pretraining.

## Installation

```bash
pip install ppcfd
pip install wandb # optional
```

## Quick Start

### Import the model

```python
from ppcfd.models.multiple_physics_pretraining import AViT, build_avit
```

### Train (single device)

```bash
python train_basic.py --run_name my_experiment --config basic_config --yaml_config config/mpp_avit_ti_config.yaml
```

### Finetune from pretrained weights

Original PyTorch pretrained weights are available at:
https://drive.google.com/drive/folders/1Qaqa-RnzUDOO8-Gi4zlf4BE53SfWqDwx

To use them in PaddleCFD, first convert to PaddlePaddle format:

```bash
python convert_torch_weights.py \
--yaml_config config/mpp_avit_s_config.yaml \
--config basic_config \
--weights path/to/MPP_AViT_S.tar \
--output models_paddle/MPP_AViT_S.pdparams
```
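`convert_torch_weights.py` handles the full mapping; the essential step is transposing `nn.Linear` weights, since PyTorch stores them as `(out_features, in_features)` while Paddle uses `(in_features, out_features)`. A minimal NumPy illustration (the parameter names below are made up):

```python
import numpy as np

def torch_to_paddle_state(torch_state):
    """Transpose 2-D linear weights; leave biases and 1-D params untouched.

    A real converter must also *skip* embedding tables (already stored as
    (num_embeddings, dim) in both frameworks) and leave conv kernels alone,
    since those share the same layout.
    """
    paddle_state = {}
    for name, w in torch_state.items():
        if name.endswith(".weight") and w.ndim == 2:
            paddle_state[name] = w.T.copy()
        else:
            paddle_state[name] = w.copy()
    return paddle_state

# Hypothetical fragment of a checkpoint state dict.
state = {
    "blocks.0.mlp.fc1.weight": np.zeros((384, 96)),  # torch layout: (out, in)
    "blocks.0.mlp.fc1.bias": np.zeros(384),
}
converted = torch_to_paddle_state(state)
assert converted["blocks.0.mlp.fc1.weight"].shape == (96, 384)
```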

Then finetune:

```bash
python train_basic.py --run_name my_finetune --config finetune --yaml_config config/mpp_avit_s_config.yaml
```

### Inference

If needed, use the following command to generate a test input:

```bash
python multiple_physics_pretraining/generate_forward_case.py --labels 0,1,2 --bcs 0,0 --output ./forward_case.npz
```

Then run a test forward case:

```bash
python forward_pretrained.py \
--yaml_config config/mpp_avit_s_config.yaml \
--config basic_config \
--weights path/to/checkpoint.pdparams \
--case_npz path/to/input.npz \
--output path/to/output.npz
```
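The exact contents of the case file are defined by `generate_forward_case.py`. Assuming it stores a `(Time, Channel, H, W)` history plus the field labels and boundary-condition flags (the key names `data`, `labels`, and `bcs` here are guesses, not the confirmed format), a hand-rolled case could look like:

```python
import os
import tempfile

import numpy as np

t, c, h, w = 16, 3, 128, 128  # n_steps of history, 3 state channels
case = {
    "data": np.random.rand(t, c, h, w).astype("float32"),  # field history
    "labels": np.array([0, 1, 2]),  # field indices, as in --labels 0,1,2
    "bcs": np.array([0, 0]),        # boundary-condition flags, as in --bcs 0,0
}
path = os.path.join(tempfile.gettempdir(), "forward_case.npz")
np.savez(path, **case)

loaded = np.load(path)
assert loaded["data"].shape == (t, c, h, w)
```

Check the keys actually written by `generate_forward_case.py` before relying on this layout.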

## Model Variants

| Variant | embed_dim | num_heads | processor_blocks |
| --------- | --------- | --------- | ---------------- |
| Ti (Tiny) | 192 | 3 | 12 |
| S (Small) | 384 | 6 | 12 |
| B (Base) | 768 | 12 | 12 |
| L (Large) | 1024 | 16 | 24 |
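A rough parameter count follows from the table: each transformer block carries about 12·`embed_dim`² weights (4·d² for attention QKV plus output projection, 8·d² for a 4× MLP), so the backbone alone is roughly `processor_blocks`·12·d². This ignores the hMLP stems, field embeddings, biases, and the axial factorization, so treat it as an order-of-magnitude check only:

```python
def approx_backbone_params(embed_dim, blocks, mlp_ratio=4):
    # Attention: QKV (3*d*d) + output projection (d*d) = 4*d^2
    # MLP: up-projection d*(ratio*d) + down-projection (ratio*d)*d = 2*ratio*d^2
    per_block = 4 * embed_dim**2 + 2 * mlp_ratio * embed_dim**2
    return blocks * per_block

for name, d, blocks in [("Ti", 192, 12), ("S", 384, 12),
                        ("B", 768, 12), ("L", 1024, 24)]:
    print(f"{name}: ~{approx_backbone_params(d, blocks) / 1e6:.1f}M")
```

This yields roughly 5.3M / 21.2M / 84.9M / 302.0M backbone parameters for Ti/S/B/L, in line with standard ViT sizing.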

Config files are provided in `config/` for each variant. Use the `basic_config` namespace for pretraining and `finetune` for finetuning.

## Directory Structure

```
examples/multiple_physics_pretraining/
├── config/ # YAML configuration files (Ti/S/B/L)
├── train_basic.py # Training script
├── forward_pretrained.py # Inference script
├── convert_torch_weights.py # PyTorch -> PaddlePaddle weight conversion
├── requirements.txt # Additional dependencies
├── LICENSE # MIT License
└── README.md # This file

ppcfd/models/multiple_physics_pretraining/
├── avit.py # AViT model definition
├── shared_modules.py # MLP, Attention, PositionBias
├── spatial_modules.py # AxialAttention, hMLP stem/output
├── time_modules.py # Temporal attention block
├── mixed_modules.py # SpaceTimeBlock combiner
├── DropPath_util.py # Stochastic depth
├── paddle_utils.py # PaddlePaddle utilities
├── utils/ # Training utilities
│ ├── YParams.py # YAML config parser
│ ├── logging_utils.py # Logging
│ ├── schedulers.py # LR scheduler
│ ├── adan_paddle.py # Adan optimizer
│ ├── dadapt_adam_paddle.py # DAdaptAdam optimizer
│ ├── dadapt_adan_paddle.py # DAdaptAdan optimizer
│ └── custom_optimizer_base.py # Optimizer base class
└── data_utils/ # Data loading
├── datasets.py # MixedDataset, dataset registry
├── hdf5_datasets.py # HDF5 dataset classes (SWE, NS, etc.)
└── mixed_dset_sampler.py # Multi-dataset sampler
```

## Adding Datasets

Datasets must return data in `(Batch, Time, Channel, H, W)` format and extend `BaseHDF5DirectoryDataset`. See `data_utils/hdf5_datasets.py` for examples.

1. Define your dataset class in `ppcfd/models/multiple_physics_pretraining/data_utils/hdf5_datasets.py`
2. Register it in `DSET_NAME_TO_OBJECT` in `datasets.py`
3. Add data paths to the config YAML file
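A minimal sketch of the registration pattern in step 2 and the `(Batch, Time, Channel, H, W)` contract, using a stand-in class rather than the real `BaseHDF5DirectoryDataset` (in the code base, registration is a plain dict entry in `datasets.py`; the decorator here is just for illustration):

```python
import numpy as np

DSET_NAME_TO_OBJECT = {}  # stand-in for the registry in datasets.py

def register(name):
    """Register a dataset class under a short name (step 2 above)."""
    def wrap(cls):
        DSET_NAME_TO_OBJECT[name] = cls
        return cls
    return wrap

@register("mydset")
class MyDataset:
    """Stand-in; the real class extends BaseHDF5DirectoryDataset."""
    n_fields = 2

    def __getitem__(self, idx):
        # Contract: (Batch, Time, Channel, H, W).
        return np.zeros((1, 16, self.n_fields, 64, 64), dtype="float32")

sample = DSET_NAME_TO_OBJECT["mydset"]()[0]
assert sample.shape == (1, 16, 2, 64, 64)
```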

## Citing

```bibtex
@inproceedings{
mccabe2024multiple,
title={Multiple Physics Pretraining for Spatiotemporal Surrogate Models},
author={Michael McCabe and Bruno R{\'e}galdo-Saint Blancard and others},
booktitle={NeurIPS},
year={2024},
url={https://openreview.net/forum?id=DKSI3bULiZ}
}
```
---

**config YAML (new file, 99 additions; filename not shown in this diff)**
basic_config: &basic_config # Run settings
log_to_wandb: !!bool True # Use wandb integration
log_to_screen: !!bool True # Log progress to screen.
save_checkpoint: !!bool True # Save checkpoints
checkpoint_save_interval: 10 # Save every # epochs - also saves "best" according to val loss
debug_grad: !!bool True # Compute gradient/step_sizes/etc. for debugging
true_time: !!bool False # Debugging setting - sets num workers to zero and activates syncs
num_data_workers: 6 # Generally pulling 8 cpu per process, so using 6 for DL - not sure if best ratio
enable_amp: !!bool False # Use automatic mixed precision - blows up with low variance fields right now
compile: !!bool False # Compile model - Does not currently work
gradient_checkpointing: !!bool False # Whether to use gradient checkpointing - Slow, but lower memory
exp_dir: "./" # Output path, modify as needed
log_interval: 1 # How often to log - Don't think this is actually implemented
pretrained: !!bool False # Whether to load a pretrained model
# wandb settings
project: "project"
group: "debugging"
entity: "entity"
# Training settings
drop_path: 0.1
batch_size: 1
max_epochs: 500
scheduler_epochs: -1
epoch_size: 2000 # Artificial epoch size
rescale_gradients: !!bool False # Activate hook that scales block gradients to norm 1
optimizer: "adan" # adam, adan, whatever else i end up adding - adan did better on HP sweep
scheduler: "cosine" # Only cosine implemented
warmup_steps: 1000 # Warmup when not using DAdapt
learning_rate: -1 # -1 means use DAdapt
weight_decay: 1e-3
n_states: 12 # Number of state variables across the datasets - Can be larger than real number and things will just go unused
state_names: ["Pressure", "Vx", "Vy", "Density", "Vx", "Vy", "Density", "Pressure"] # Should be sorted
dt: 1 # Striding of data - Not currently implemented > 1
n_steps: 16 # Length of history to include in input
enforce_max_steps: !!bool False # If false and n_steps > dataset steps, use dataset steps. Otherwise, raise Exception.
accum_grad: 5 # Real batch size is accum * batch_size, real steps/"epoch" is epoch_size / accum
# Model settings
model_type: "avit" # Only option so far
block_type: "axial" # Which type of block to use - if axial, next two fields must be set to define axial ops
time_type: "attention" # Conditional on block type
space_type: "axial_attention" # Conditional on block type
tie_fields: !!bool False # Whether to use 1 embedding per field per data
embed_dim: 1024 # Dimension of internal representation - 192/384/768/1024 for Ti/S/B/L
num_heads: 16 # Number of heads for attention - 3/6/12/16 for Ti/S/B/L
processor_blocks: 24 # Number of transformer blocks in the backbone - 12/12/12/24 for Ti/S/B/L
patch_size: [16, 16] # Actually currently hardcoded at 16
bias_type: "rel" # Options rel, continuous, none
# Data settings
train_val_test: [.8, .1, .1]
augmentation: !!bool False # Augmentation not implemented
use_all_fields: !!bool True # Prepopulate the field metadata dictionary from dictionary in datasets
tie_batches: !!bool False # Force everything in batch to come from one dset
extended_names: !!bool False # Whether to use extended names - not currently implemented
embedding_offset: 0 # Use when adding extra finetuning fields
train_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
valid_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

finetune: &finetune
<<: *basic_config
max_epochs: 500
train_val_test: [.8, .1, .1]
accum_grad: 1
pretrained: !!bool True
group: "debugging"
pretrained_ckpt_path: "/B16-noNS/training_checkpoints/ckpt.tar"
train_data_paths: [["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
valid_data_paths: # These are the same for all configs - uses split according to train_val_test
[["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
embedding_offset: 0 # Number of fields in original model - FT fields start after this
freeze_middle: !!bool False # Whether to freeze the middle layers of the model
freeze_processor: !!bool False
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

frozen: &frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool False

less_frozen: &less_frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool True
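With the values in `basic_config` above (`batch_size: 1`, `accum_grad: 5`, `epoch_size: 2000`), the comment on `accum_grad` works out to an effective batch of 5 and 400 optimizer steps per artificial epoch; a quick check:

```python
batch_size, accum_grad, epoch_size = 1, 5, 2000  # values from basic_config

effective_batch = batch_size * accum_grad   # gradient accumulation
steps_per_epoch = epoch_size // accum_grad  # optimizer updates per "epoch"

assert effective_batch == 5
assert steps_per_epoch == 400
```

The `finetune` config sets `accum_grad: 1`, so there the effective batch equals `batch_size` and every epoch takes the full 2000 optimizer steps.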
---

**config YAML (new file, 99 additions; filename not shown in this diff)**
basic_config: &basic_config # Run settings
log_to_wandb: !!bool True # Use wandb integration
log_to_screen: !!bool True # Log progress to screen.
save_checkpoint: !!bool True # Save checkpoints
checkpoint_save_interval: 10 # Save every # epochs - also saves "best" according to val loss
debug_grad: !!bool True # Compute gradient/step_sizes/etc. for debugging
true_time: !!bool False # Debugging setting - sets num workers to zero and activates syncs
num_data_workers: 6 # Generally pulling 8 cpu per process, so using 6 for DL - not sure if best ratio
enable_amp: !!bool False # Use automatic mixed precision - blows up with low variance fields right now
compile: !!bool False # Compile model - Does not currently work
gradient_checkpointing: !!bool False # Whether to use gradient checkpointing - Slow, but lower memory
exp_dir: "./" # Output path, modify as needed
log_interval: 1 # How often to log - Don't think this is actually implemented
pretrained: !!bool False # Whether to load a pretrained model
# wandb settings
project: "project"
group: "debugging"
entity: "entity"
# Training settings
drop_path: 0.1
batch_size: 1
max_epochs: 500
scheduler_epochs: -1
epoch_size: 2000 # Artificial epoch size
rescale_gradients: !!bool False # Activate hook that scales block gradients to norm 1
optimizer: "adan" # adam, adan, whatever else i end up adding - adan did better on HP sweep
scheduler: "cosine" # Only cosine implemented
warmup_steps: 1000 # Warmup when not using DAdapt
learning_rate: -1 # -1 means use DAdapt
weight_decay: 1e-3
n_states: 12 # Number of state variables across the datasets - Can be larger than real number and things will just go unused
state_names: ["Pressure", "Vx", "Vy", "Density", "Vx", "Vy", "Density", "Pressure"] # Should be sorted
dt: 1 # Striding of data - Not currently implemented > 1
n_steps: 16 # Length of history to include in input
enforce_max_steps: !!bool False # If false and n_steps > dataset steps, use dataset steps. Otherwise, raise Exception.
accum_grad: 5 # Real batch size is accum * batch_size, real steps/"epoch" is epoch_size / accum
# Model settings
model_type: "avit" # Only option so far
block_type: "axial" # Which type of block to use - if axial, next two fields must be set to define axial ops
time_type: "attention" # Conditional on block type
space_type: "axial_attention" # Conditional on block type
tie_fields: !!bool False # Whether to use 1 embedding per field per data
embed_dim: 768 # Dimension of internal representation - 192/384/768/1024 for Ti/S/B/L
num_heads: 12 # Number of heads for attention - 3/6/12/16 for Ti/S/B/L
processor_blocks: 12 # Number of transformer blocks in the backbone - 12/12/12/24 for Ti/S/B/L
patch_size: [16, 16] # Actually currently hardcoded at 16
bias_type: "rel" # Options rel, continuous, none
# Data settings
train_val_test: [.8, .1, .1]
augmentation: !!bool False # Augmentation not implemented
use_all_fields: !!bool True # Prepopulate the field metadata dictionary from dictionary in datasets
tie_batches: !!bool False # Force everything in batch to come from one dset
extended_names: !!bool False # Whether to use extended names - not currently implemented
embedding_offset: 0 # Use when adding extra finetuning fields
train_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
valid_data_paths:
[
["~/PDEBench/2D/shallow-water", "swe", ""],
["~/PDEBench/2D/NS_incom", "incompNS", ""],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "128"],
["~/PDEBench/2D/CFD/2D_Train_Rand", "compNS", "512"],
["~/PDEBench/2D/CFD/2D_Train_Turb", "compNS", ""],
["~/PDEBench/2D/diffusion-reaction", "diffre2d", ""],
]
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

finetune: &finetune
<<: *basic_config
max_epochs: 500
train_val_test: [.8, .1, .1]
accum_grad: 1
pretrained: !!bool True
group: "debugging"
pretrained_ckpt_path: "/B16-noNS/training_checkpoints/ckpt.tar"
train_data_paths: [["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
valid_data_paths: # These are the same for all configs - uses split according to train_val_test
[["/PDEBench/2D/CFD/2D_Train_Turb", "compNS", "M1.0"]]
embedding_offset: 0 # Number of fields in original model - FT fields start after this
freeze_middle: !!bool False # Whether to freeze the middle layers of the model
freeze_processor: !!bool False
append_datasets: [] # List of datasets to append to the input/output projections for finetuning

frozen: &frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool False

less_frozen: &less_frozen
<<: *finetune
freeze_middle: !!bool True # Whether to freeze the middle layers of the model
freeze_processor: !!bool True
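The `scheduler: "cosine"` and `warmup_steps: 1000` settings in these configs correspond to the standard linear-warmup-then-cosine-decay rule. The actual implementation in `utils/schedulers.py` may differ in details (minimum LR, per-epoch vs. per-step updates), so treat this as a generic sketch:

```python
import math

def lr_at(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

total = 500 * 400  # max_epochs * optimizer steps per epoch (epoch_size / accum_grad)
assert lr_at(0, 1e-3, 1000, total) == 0.0          # start of warmup
assert lr_at(1000, 1e-3, 1000, total) == 1e-3      # peak at end of warmup
assert lr_at(total, 1e-3, 1000, total) < 1e-9      # decayed to ~0
```

Note that with `learning_rate: -1` the configs use D-Adaptation instead of a fixed peak LR, in which case the warmup setting is not used.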