Skip to content

feat: add MPS module — multivariate Direct Sampling (supersedes #409)#414

Closed
n0228a wants to merge 35 commits into
GeoStat-Framework:mainfrom
n0228a:multivariate
Closed

feat: add MPS module — multivariate Direct Sampling (supersedes #409)#414
n0228a wants to merge 35 commits into
GeoStat-Framework:mainfrom
n0228a:multivariate

Conversation

@n0228a

@n0228a n0228a commented Jun 25, 2026

Copy link
Copy Markdown

Overview

This PR supersedes #409 and delivers the complete gstools.mps subpackage implementing Multiple-Point Statistics simulation via the Direct Sampling algorithm. It extends the work in #409 in several significant ways: multivariate co-simulation, a fully decomposed module architecture, NaN-masked training images, and groundwork for non-stationary simulation.

Please close #409 in favour of this one.


What was added

TrainingImage — univariate and multivariate

TrainingImage now accepts either a plain NumPy array (univariate) or a dict of named arrays (multivariate co-simulation). Key features:

  • Categorical (weighted mismatch, Mariethoz 2010 Eq. 3) and continuous (L1, L2, Lp, variation-distance with automatic mean-shift correction, Mariethoz 2010 Eq. 9) distance functions.
  • Spatial-decay weighting of neighbours via the distance-power exponent δ (Mariethoz 2010 Eq. 5).
  • Per-variable distance weights for multivariate TIs (uniform default).
  • NaN masking: undefined TI cells (NaN) are excluded from pattern distances and never pasted into the simulation grid.

Multivariate co-simulation (ds_simulate)

At each simulation node a single joint scan minimises the weighted aggregate distance Σ_k w_k d_k over all variables. The selected TI cell's full variable vector is copied to the node's uninformed slots, so cross-variable joint structure is reproduced by construction. Variables already known at the node (via conditioning data) act as collocated h = 0 constraints upweighted by cond_weight.

DirectSampling — public API class

Subclasses gstools.field.base.Field and follows the same call interface as SRF. All features from #409 are preserved:

  • n-dimensional structured grids.
  • Randomised simulation path (Mariethoz 2010 §3 ¶13).
  • scan_fraction caps the per-node TI scan (Juda 2022 §2).
  • threshold=0.0 activates DSBC mode (full scan, best-candidate selection).
  • boundary="strict" (default) or "partial" — partial mode drops lags that exceed the TI extent and searches with the reduced template.
  • set_condition() with nearest-node snapping and collision resolution.
  • DAG-based parallelism via num_threads (or gstools.config.NUM_THREADS): independent nodes in the simulation path are processed concurrently via ThreadPoolExecutor.

New in this PR:

  • set_nonstationary() — per-node rotation and anisotropy maps for geometric lag transform (infrastructure for non-stationary simulation).
  • Univariate simulations are internally routed through the multivariate engine, so all paths share one code base.

MPSModel — configuration object

Separates search-parameter validation (n_neighbors, threshold, scan_fraction, boundary, max_radius) from the DirectSampling public API.

Module architecture

The monolithic direct_sampling.py of #409 has been split into focused modules:

File Responsibility
model.py MPSModel — validated search-parameter config
training_image.py TrainingImage — data + distance functions
distance.py Stateless distance helpers
neighbors.py Neighbour selection, lag transforms, geometric transforms, search-window bounds
scan.py TI scanning loop
runner.py Simulation path and DAG thread executor
simulate.py _DirectSamplingEngine + ds_simulate entry point
direct_sampling.py DirectSampling — public API

Examples

Four standalone Python examples (examples/13_mps/) and an overview notebook:

  • 00_simple_unconditional.py — minimal unconditional simulation
  • 01_conditional.py — hard-data conditioning
  • 02_continuous.py — continuous variable simulation
  • 03_channel_strebelle.py — classic Strebelle (2002) channelised TI, conditional

Tests

tests/test_mps.py (~2 200 lines) covers univariate, multivariate, conditional, parallel, boundary-mode, NaN-masking, and non-stationarity scenarios.


Usage

import numpy as np
from gstools import mps

# Univariate categorical
ti = mps.TrainingImage(ti_array, categorical=True)
ds = mps.DirectSampling(ti, n_neighbors=30, scan_fraction=1.0, threshold=0.01, num_threads=4)
ds.set_condition(cond_pos, cond_val)
field = ds([np.arange(100, dtype=float), np.arange(100, dtype=float)], seed=42)

# Multivariate co-simulation
ti_mv = mps.TrainingImage(
    {"facies": facies_arr, "porosity": por_arr},
    categorical={"facies": True, "porosity": False},
    weights={"facies": 0.6, "porosity": 0.4},
)
ds_mv = mps.DirectSampling(ti_mv, n_neighbors=20, threshold=0.05)
fields = ds_mv([np.arange(50, dtype=float), np.arange(50, dtype=float)], seed=0)
# fields["facies"], fields["porosity"]

References

  • Mariethoz et al. (2010): Direct sampling method to perform multiple-point geostatistical simulations.
  • Juda et al. (2022): A parsimonious parametrisation of the Direct Sampling algorithm.
  • Strebelle (2002): Conditional simulation of complex geological structures using multiple-point statistics.

n0228a and others added 30 commits May 12, 2026 15:36
Adds a new gstools.mps submodule implementing Multiple-Point Statistics
via a DirectSampling algorithm. Includes TrainingImage and distance
utilities, plus an initial channel demo example.
## Add `boundary` and `max_radius` to `DirectSampling`

Two new parameters for `DirectSampling` and `ds_simulate`:

**`max_radius`** (float, optional)
Caps SG neighbour selection by Euclidean distance. Provides finer spatial control than the integer `max_offset`, which only bounds the precomputed offset table.

**`boundary`** (`"strict"` | `"partial"`)
Controls what happens when the data-event template extends beyond the training image edges.
- `"strict"` (default) — existing behaviour: if no valid window exists, fall back to a random TI value.
- `"partial"` — drops lags that can never be placed in the TI (|h| ≥ TI size in any dimension), then searches with the reduced template (Mariethoz 2010 §6.2). Avoids unnecessary random fallbacks when a large or stretched template only partially overlaps the TI.
n0228a added 5 commits June 20, 2026 20:57
Remove the gstools_core import guard, _use_core(), and all Rust dispatch
seams (_marshal_ffi, _scan_window dispatch, ffi_args plumbing, DAG and
joint-scan Rust branches) so this branch is pure-Python only. The
authoritative pure-Python paths (_select_neighbors_py, _scan_window_py,
the Python DAG builder) are unchanged. Drop the now-dead distance_spec
encoding helpers and the Rust-vs-Python equivalence/snapshot tests.

The Rust-capable version is preserved on branch multivariate-rust.
@n0228a

n0228a commented Jun 25, 2026

Copy link
Copy Markdown
Author

Closing in favour of a clean branch with only MPS-module files (no upstream noise).

@n0228a n0228a closed this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant