
Implementing nautilus-sampler #707

Open
renecotyfanboy wants to merge 2 commits into xpsi-group:main from
renecotyfanboy:main

Conversation

@renecotyfanboy

Hi there, I am @lmauviard's office mate, and I have recently been dealing a lot with nested sampling on slow-to-evaluate likelihoods. After discussing it, I went ahead and implemented a draft nautilus-sampler integration for xpsi.

For a bit of context, nautilus is a nested-sampling algorithm that uses machine learning to speed up convergence and reduce the required number of likelihood evaluations. The main idea is to learn an approximation of the likelihood-bounded volume at each iteration, using multiple MLPs trained on the already-evaluated points. See the article for more context.

As a result, the proposal is much more efficient than the ellipsoids used in Multinest or the local ellipsoids used in Ultranest. In our applications (X-ray spectroscopy), we were able to go from ~23M likelihood calls (with Ultranest + step sampler) down to ~400k, without any loss on the estimated evidence, and with posterior distributions that are perfectly comparable. I would be extremely curious to see how this kind of algorithm performs on the highly skewed and multimodal distributions you are dealing with, hence the draft PR.

I ran the introduction example with a bunch of parameters fixed (to speed up the sampling on my computer) with Multinest (1000 live points) and nautilus (2000 live points), and obtained logZ = -27582.12 and logZ = -27583.41 respectively, for 80k and 120k likelihood evaluations respectively. It might look like nautilus underperformed here, but it had more live points. It is also good to know that nautilus can re-run after the first nested sampling pass, reusing the learnt volumes. In that mode it performs a form of importance sampling to estimate the evidence, which should unbias the result from the peculiarities of the run, at the cost of extra likelihood evaluations (performed at an extremely high sampling efficiency, so this additional cost is well worth it).

I am currently running both Multinest and nautilus with 5k live points each, to see how they compare in a higher live-point regime. In general, nautilus scales much better with the number of live points than other algorithms, so you shouldn't be afraid to use this many.

Files to replicate the run: I didn't use MPI; however, it looks like nautilus has an integration for it, which I "sorta" implemented, but this should be tested under real conditions.

intro-nautilus.py
intro-multinest.py

@renecotyfanboy renecotyfanboy changed the title from "Implenting nautilus-sampler" to "Implementing nautilus-sampler" Apr 17, 2026
@renecotyfanboy
Author

renecotyfanboy commented Apr 17, 2026

Some updates on this: I ran two lower-resolution runs with 5k live points each, using nautilus and Multinest.

  • nautilus converged after ~271k calls; the pre-rerun evidence is logZ = -27591.40
  • the nautilus re-run cost ~14k extra calls; the post-rerun evidence is logZ = -27591.39
  • Multinest finished after 408k calls with an evidence of logZ = -27590.21

This shows a ~2x speed-up on a modest-dimension problem, which should scale even better as the dimensionality of the parameter space increases (see this paper).

[Screenshot, 2026-04-17 at 16:23]

@thjsal
Contributor

thjsal commented Apr 17, 2026

Is there a reason why nautilus reported slightly worse evidence than MultiNest? Should they be the same if both have converged?

@renecotyfanboy
Author

renecotyfanboy commented Apr 17, 2026

My take would be the following:

  • I cross-tested nautilus against ultranest and another ML-based sampler (nessai), and all of them gave me consistent evidences within 0.1 dlogZ for X-ray spectroscopy
  • Multinest shows poor behavior when increasing the number of dimensions (see also here), regardless of the number of live points
  • On the two runs I did, I got a ~1 dlogZ discrepancy, which is (IMO) too small to be concerning, as I would not use such a low dlogZ to perform model comparison in the first place

The default dlogZ convergence threshold for nautilus is roughly dlogZ ~ 0.01, so even with the dlogZ ~ 0.5 threshold of the Multinest run, the two values remain incompatible, but I trust the former more when it comes to high dimensions. To be 100% sure, I should run an ultranest sampling + step sampler, but my computer has already been busy for a while, and I think such a run could easily take a full day or two 😅
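To put the ~1 dlogZ discrepancy in perspective, here is the Bayes factor implied by the two 5k-live-point evidences quoted above (plain arithmetic, no sampler involved):

```python
import math

# Evidences reported for the 5k-live-point runs above
log_z_multinest = -27590.21
log_z_nautilus = -27591.39   # post-rerun value

delta = log_z_multinest - log_z_nautilus  # difference in log-evidence
bayes_factor = math.exp(delta)            # ratio of the two evidences

print(f"delta logZ = {delta:.2f}, Bayes factor = {bayes_factor:.2f}")
# delta logZ = 1.18, Bayes factor = 3.25
```

A factor of ~3 sits at the very low end of the Jeffreys scale ("barely worth mentioning"), which supports the point that a gap of this size would not drive model comparison either way.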

@renecotyfanboy
Copy link
Copy Markdown
Author

After discussing with @lmauviard and @ckazantsev, I checked the posterior distributions from both approaches to confirm that the difference in evidence is likely a sampler issue. The two posterior distributions are essentially identical:

[Corner plot: posterior comparison]
