feat: add MPS module — multivariate Direct Sampling (supersedes #409)#414
Closed
n0228a wants to merge 35 commits into
Adds a new gstools.mps submodule implementing Multiple-Point Statistics via a DirectSampling algorithm. Includes TrainingImage and distance utilities, plus an initial channel demo example.
## Add `boundary` and `max_radius` to `DirectSampling`

Two new parameters for `DirectSampling` and `ds_simulate`:

**`max_radius`** (float, optional)
Caps SG neighbour selection by Euclidean distance. Provides finer spatial control than the integer `max_offset`, which only bounds the precomputed offset table.

**`boundary`** (`"strict"` | `"partial"`)
Controls what happens when the data-event template extends beyond the training image edges.

- `"strict"` (default): existing behaviour. If no valid window exists, fall back to a random TI value.
- `"partial"`: drops lags that can never be placed in the TI (|h| ≥ TI size in any dimension), then searches with the reduced template (Mariethoz 2010 §6.2). Avoids unnecessary random fallbacks when a large or stretched template only partially overlaps the TI.
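To make the `"partial"` rule concrete, here is a minimal NumPy sketch of the lag filter and of the resulting search window. It is not the code from this PR: the helper names `drop_unplaceable_lags` and `anchor_window` are hypothetical, and the sketch assumes integer lags on a regular grid.

```python
import numpy as np


def drop_unplaceable_lags(lags, ti_shape):
    """Keep only lags h with |h_d| < ti_shape[d] in every dimension d.

    Hypothetical helper: a lag failing this test can never be placed
    inside the training image (TI), whatever the anchor cell.
    """
    lags = np.asarray(lags, dtype=int)
    keep = np.all(np.abs(lags) < np.asarray(ti_shape), axis=1)
    return lags[keep]


def anchor_window(lags, ti_shape):
    """Inclusive (lo, hi) range of TI anchor cells for which every lag
    lands inside the TI, or None if no such cell exists."""
    n = np.asarray(ti_shape, dtype=int)
    lags = np.asarray(lags, dtype=int).reshape(-1, n.size)
    # initial=0 accounts for the anchor cell itself (lag 0).
    lo = -lags.min(axis=0, initial=0)
    hi = n - 1 - lags.max(axis=0, initial=0)
    if np.any(lo > hi):
        return None
    return lo, hi


ti_shape = (5, 8)
lags = np.array([[0, 1], [-2, 0], [6, 0], [1, -3]])

# Strict behaviour: the lag (6, 0) is longer than the TI, so no window exists.
strict_window = anchor_window(lags, ti_shape)

# Partial behaviour: drop the unplaceable lag, then search the reduced template.
kept = drop_unplaceable_lags(lags, ti_shape)
partial_window = anchor_window(kept, ti_shape)
```

In this example the full template has no valid window, which is the case where strict mode would fall back to a random TI value, while the reduced template can still be scanned over rows 2 to 3 and columns 3 to 6. Note that the filter only removes lags that are individually unplaceable; a set of individually placeable lags can still have an empty joint window.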
scan_fraction*search_window
Remove the gstools_core import guard, _use_core(), and all Rust dispatch seams (_marshal_ffi, _scan_window dispatch, ffi_args plumbing, DAG and joint-scan Rust branches) so this branch is pure-Python only. The authoritative pure-Python paths (_select_neighbors_py, _scan_window_py, the Python DAG builder) are unchanged. Drop the now-dead distance_spec encoding helpers and the Rust-vs-Python equivalence/snapshot tests. The Rust-capable version is preserved on branch multivariate-rust.
Author
Closing in favour of a clean branch with only MPS-module files (no upstream noise).
Overview
This PR supersedes #409 and delivers the complete `gstools.mps` subpackage implementing Multiple-Point Statistics simulation via the Direct Sampling algorithm. It extends the work in #409 in several significant ways: multivariate co-simulation, a fully decomposed module architecture, NaN-masked training images, and groundwork for non-stationary simulation.

Please close #409 in favour of this one.
What was added
`TrainingImage`: univariate and multivariate

`TrainingImage` now accepts either a plain NumPy array (univariate) or a `dict` of named arrays (multivariate co-simulation). Key features:

Multivariate co-simulation (`ds_simulate`)

At each simulation node, a single joint scan minimises the weighted aggregate distance Σ_k w_k d_k over all variables. The selected TI cell's full variable vector is copied to the node's uninformed slots, so cross-variable joint structure is reproduced by construction. Variables already known at the node (via conditioning data) act as collocated `h = 0` constraints upweighted by `cond_weight`.

`DirectSampling`: public API class

Subclasses `gstools.field.base.Field` and follows the same call interface as `SRF`. All features from #409 are preserved:

- `scan_fraction` caps the per-node TI scan (Juda 2022 §2).
- `threshold=0.0` activates DSBC mode (full scan, best-candidate selection).
- `boundary="strict"` (default) or `"partial"`: partial mode drops lags that exceed the TI extent and searches with the reduced template.
- `set_condition()` with nearest-node snapping and collision resolution.
- `num_threads` (or `gstools.config.NUM_THREADS`): independent nodes in the simulation path are processed concurrently via `ThreadPoolExecutor`.

New in this PR:

- `set_nonstationary()`: per-node rotation and anisotropy maps for geometric lag transform (infrastructure for non-stationary simulation).

`MPSModel`: configuration object

Separates search-parameter validation (`n_neighbors`, `threshold`, `scan_fraction`, `boundary`, `max_radius`) from the `DirectSampling` public API.

Module architecture

The monolithic `direct_sampling.py` of #409 has been split into focused modules:

| Module | Contents |
| --- | --- |
| `model.py` | `MPSModel`: validated search-parameter config |
| `training_image.py` | `TrainingImage`: data + distance functions |
| `distance.py` | |
| `neighbors.py` | |
| `scan.py` | |
| `runner.py` | |
| `simulate.py` | `_DirectSamplingEngine` + `ds_simulate` entry point |
| `direct_sampling.py` | `DirectSampling`: public API |

Examples

Four standalone Python examples (`examples/13_mps/`) and an overview notebook:

- `00_simple_unconditional.py`: minimal unconditional simulation
- `01_conditional.py`: hard-data conditioning
- `02_continuous.py`: continuous variable simulation
- `03_channel_strebelle.py`: classic Strebelle (2002) channelised TI, conditional

Tests

`tests/test_mps.py` (~2 200 lines) covers univariate, multivariate, conditional, parallel, boundary-mode, NaN-masking, and non-stationarity scenarios.

Usage
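The snippet below does not call the `gstools.mps` API. It is a minimal NumPy sketch of the joint scan described under multivariate co-simulation, written to show how the weighted aggregate distance Σ_k w_k d_k, `scan_fraction`, and `threshold` could interact. The function name `joint_scan`, its signature, the 2D-only indexing, and the use of mean absolute difference for each d_k are all assumptions made for illustration.

```python
import numpy as np


def joint_scan(ti, lags, event, weights, threshold=0.0,
               scan_fraction=1.0, seed=None):
    """Hypothetical joint scan over a multivariate 2D training image.

    ti      : dict name -> 2D array (all arrays share one shape)
    lags    : (n, 2) integer offsets of the data event, shared by all variables
    event   : dict name -> (n,) values observed at those offsets
    weights : dict name -> w_k
    Returns (best anchor cell, its aggregate distance sum_k w_k * d_k).
    """
    rng = np.random.default_rng(seed)
    lags = np.asarray(lags, dtype=int)
    shape = np.array(next(iter(ti.values())).shape)

    # Anchor cells for which the whole template stays inside the TI.
    lo = -lags.min(axis=0, initial=0)
    hi = shape - 1 - lags.max(axis=0, initial=0)
    rows, cols = np.mgrid[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1]
    candidates = np.column_stack([rows.ravel(), cols.ravel()])

    # Visit candidates in random order, capped by scan_fraction.
    rng.shuffle(candidates)
    n_scan = max(1, int(np.ceil(scan_fraction * len(candidates))))

    best_pos, best_dist = None, np.inf
    for pos in candidates[:n_scan]:
        idx = pos + lags
        dist = 0.0
        for name, values in event.items():
            pattern = ti[name][idx[:, 0], idx[:, 1]]
            # Assumed per-variable distance d_k: mean absolute difference.
            dist += weights[name] * np.mean(np.abs(pattern - values))
        if dist < best_dist:
            best_pos, best_dist = tuple(pos), dist
        # threshold > 0: accept the first good-enough cell.
        # threshold == 0: never stop early (best-candidate behaviour).
        if threshold > 0 and best_dist <= threshold:
            break
    return best_pos, best_dist


rng = np.random.default_rng(0)
ti = {"porosity": rng.random((20, 20)),
      "permeability": rng.random((20, 20))}
lags = np.array([[-1, 0], [0, -1], [1, 1]])
weights = {"porosity": 0.5, "permeability": 0.5}

# Build a data event that is known to occur at TI cell (7, 11).
true_idx = np.array([7, 11]) + lags
event = {k: v[true_idx[:, 0], true_idx[:, 1]] for k, v in ti.items()}

pos, dist = joint_scan(ti, lags, event, weights, threshold=0.0, seed=1)

# Copy the selected cell's full variable vector, as the joint scan prescribes.
sampled = {name: arr[pos] for name, arr in ti.items()}
```

Because one cell is chosen for all variables at once, the values in `sampled` come from the same TI location, which is what preserves the cross-variable structure. Conditioning via `cond_weight`, NaN masking, and non-stationary lag transforms are left out of this sketch.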
References