Hi there, I am @lmauviard's office mate, and I have been dealing a lot with nested sampling with slow-to-evaluate likelihoods recently. After discussing with them, I went ahead and implemented a draft of a nautilus-sampler integration for xpsi.
For a bit of context, nautilus is a nested-sampling algorithm that uses machine learning to speed up convergence and reduce the required number of likelihood evaluations. The main idea is to learn, at each iteration, an approximation of the current likelihood-bounded volume using multiple MLPs trained on the already-evaluated points. See the article for more context.
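To make the "learn the bounded volume from evaluated points" idea concrete, here is a minimal, self-contained sketch of the principle (not nautilus's actual implementation: the toy `log_like`, the quadratic-feature logistic classifier standing in for nautilus's MLPs, and all variable names are illustrative assumptions). A classifier is trained on already-evaluated points to predict whether a point lies above the current iso-likelihood threshold, and new proposals are only kept where the classifier predicts "inside":

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Toy 2D "likelihood": a Gaussian centred at the origin (illustrative only).
def log_like(x):
    return -0.5 * np.sum(x**2, axis=-1)

# Points already evaluated by the sampler, uniform in [-3, 3]^2.
pts = rng.uniform(-3.0, 3.0, size=(2000, 2))
logl = log_like(pts)

# Current iso-likelihood threshold: the median of the evaluated points.
threshold = np.median(logl)
labels = (logl > threshold).astype(float)

# Quadratic features let a logistic model learn an ellipsoidal boundary,
# standing in for the MLPs nautilus actually uses.
def features(x):
    return np.column_stack([x[:, 0], x[:, 1],
                            x[:, 0]**2, x[:, 1]**2,
                            x[:, 0] * x[:, 1],
                            np.ones(len(x))])

X = features(pts)
w = np.zeros(X.shape[1])
for _ in range(3000):  # plain gradient ascent on the logistic log-likelihood
    p = expit(X @ w)
    w += 0.1 * X.T @ (labels - p) / len(X)

# Propose new points, but only keep those the classifier deems inside.
cand = rng.uniform(-3.0, 3.0, size=(5000, 2))
inside = expit(features(cand) @ w) > 0.5
accepted = cand[inside]

# Fraction of proposals that truly clear the threshold, without vs with
# the learnt boundary: the learnt region makes proposals far more efficient.
eff_naive = np.mean(log_like(cand) > threshold)
eff_learnt = np.mean(log_like(accepted) > threshold)
```

On this toy problem roughly half of the naive uniform proposals clear the threshold, while almost all classifier-filtered ones do, which is the mechanism behind the reduction in likelihood calls described below.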
This makes the proposal much more efficient than the global ellipsoids used in Multinest or the local ellipsoids used in Ultranest. In our applications (X-ray spectroscopy), we went from ~23M likelihood calls (with Ultranest + step sampler) to ~400k, with no loss on the estimated evidence and posterior distributions that are perfectly comparable. I would be extremely curious to see how this kind of algorithm performs on the highly skewed and multimodal distributions you are dealing with, hence the draft PR.
I ran the introduction example with a bunch of parameters fixed (to speed up the sampling on my computer) with Multinest (1000 live points) and nautilus (2000 live points), and obtained logZ = -27582.12 and logZ = -27583.41 respectively, for 80k and 120k likelihood evaluations. It might look like nautilus underperformed here, but it had more live points, and it is good to know that nautilus can re-run after the first nested-sampling pass, reusing the learnt volumes. In that mode it performs a form of importance sampling to estimate the evidence, which should remove the bias from the peculiarities of the first run, at the cost of extra likelihood evaluations (performed at an extremely high sampling efficiency, so this additional cost is really worth it).
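The importance-sampling re-run can be illustrated on a toy problem with a known evidence (this is only the underlying principle, not nautilus's code; the proposal `q`, the toy likelihood, and the sample sizes are all illustrative assumptions). Draws come from a proposal approximating the posterior, and each draw is weighted by likelihood × prior / proposal; the mean weight is an unbiased estimate of Z:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy problem with a known answer: uniform prior on [-5, 5] and a
# standard-normal likelihood, so Z = (1/10) * [Phi(5) - Phi(-5)] ~ 0.1.
def log_like(x):
    return norm.logpdf(x)

log_prior = -np.log(10.0)  # uniform prior density on [-5, 5]

# Stand-in for the learnt proposal: a Gaussian close to the true posterior
# (in nautilus this role is played by the volumes learnt in the first run).
q = norm(loc=0.05, scale=1.2)

# Draw from the proposal and weight by L * pi / q, zeroing draws that
# fall outside the prior support.
x = q.rvs(size=20000, random_state=rng)
log_w = log_like(x) + log_prior - q.logpdf(x)
w = np.where(np.abs(x) <= 5.0, np.exp(log_w), 0.0)
log_z = np.log(w.mean())  # importance-sampling evidence estimate

true_log_z = np.log(0.1 * (norm.cdf(5.0) - norm.cdf(-5.0)))
```

Because the proposal already hugs the posterior, the weights have low variance, which is why this extra pass is cheap relative to the accuracy it buys.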
I am currently running both Multinest and nautilus with 5k live points each to see how they compare in a higher live-point regime. In general, nautilus scales much better with the number of live points than other algorithms, so you shouldn't be afraid to use that many.
Files to replicate the run: I didn't use MPI; however, it looks like nautilus has an integration for it, which I "sorta" implemented, but this should be tested under real conditions.
I cross-tested nautilus against ultranest and another ML-based sampler (nessai), and all of them gave me consistent evidences within 0.1 dlogZ for X-ray spectroscopy.
On the two runs I did, I got a ~1 dlogZ discrepancy, which is (IMO) too small to be concerned about, as I would not use such a small dlogZ difference to perform model comparison in the first place.
The default dlogZ convergence threshold for nautilus is roughly dlogZ ~ 0.01, so even with the dlogZ ~ 0.5 threshold of the Multinest run, the two values remain formally incompatible, but I trust the first more when it comes to high dimensions. To be 100% sure, I should run an ultranest sampling + step sampler, but my computer was already busy for a while, and I think such a run could easily take a full day or two 😅
After discussing with @lmauviard and @ckazantsev, I checked the posterior distributions for both approaches to confirm that the difference in evidence is likely a sampler issue. Both posterior distributions are essentially identical.
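One simple way to quantify "essentially identical" is a two-sample Kolmogorov-Smirnov test per parameter on equal-weight posterior draws from the two runs (a generic check, not something either sampler provides; the synthetic draws below stand in for the real Multinest and nautilus output, and the parameter values are made up for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)

# Stand-ins for equal-weight posterior draws of one parameter from the two
# runs; in practice these would be loaded from the Multinest and nautilus
# output files (nautilus posteriors are weighted, so they would need to be
# resampled to equal weights first).
samples_multinest = rng.normal(1.4, 0.1, size=4000)
samples_nautilus = rng.normal(1.4, 0.1, size=4000)

# KS statistic: maximum distance between the two empirical CDFs.
stat, p_value = ks_2samp(samples_multinest, samples_nautilus)
consistent = p_value > 0.01  # no evidence the two posteriors differ
```

A large p-value for every parameter supports the reading that the ~1 dlogZ gap is an evidence-integration issue rather than a posterior-exploration one.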
intro-nautilus.py
intro-multinest.py