Release v0.2.0 · NVIDIA-NeMo/Gym

Release Summary

NeMo Gym v0.2.0 ships alongside the NVIDIA Nemotron 3 Super model release, open-sourcing the RL environments and corresponding datasets used during training. Highlights:

17 new training environments across coding, math, science, reasoning, agentic tasks, and safety.
Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers let you use environments from these libraries directly within NeMo Gym.
End-to-end rollout collection with a locally managed vLLM server.
Install directly from PyPI with pip install nemo-gym.

First-Time Contributors

We welcomed 15 new contributors to this release! Here are a few highlights:

@sidnarayanan added the Aviary integration to enable training on any Aviary environment, a library of interactive RL environments spanning math, science, biology, and more
@3mei added the text-to-SQL environment to generate SQL queries from natural language across multiple SQL dialects
@Kelvin0110 added the NewtonBench environment to discover scientific laws through interactive experimentation

Thank you to all the new contributors for helping make NeMo Gym better!

Major Features & Improvements

New Environments

Added 17 new resources servers spanning:
- Coding: Text to SQL (#648), SWE RL Gen (#561), SWE RL LLM Judge (#561)
- Math: Lean4 Mathematical Proofs (#563)
- Science: Aviary (#55), NewtonBench (#650)
- Reasoning: MultiChallenge (#654), ARC-AGI (#105), Reasoning Gym (#113)
- Agent tasks: xLAM Function Calling (#262), Tavily Search (#825), Single Step Tool Use with Argument Comparison (#825), Terminus Judge (#594), NeMo Skills Tools (#571)
- Safety: Jailbreak Detection (#825), Over Refusal Detection (#825)
- RLHF: Generative Reward Model Compare (#674)
Added 5 new agent servers: Aviary agent (#55), proof refinement agent (#563), SWE agents (#343), tool simulation agent (#826), and verifiers agent (#573)

Environment Library Integrations

Combine environments from other libraries with NeMo Gym environments:

Future House Aviary (#55, #590)
Open-Thought Reasoning Gym (#113)
Prime Intellect Verifiers (#573)

Model Serving

Local vLLM model server with end-to-end rollout collection without an external API (#558, #762)
vLLM 0.16+ support for the reasoning field in responses (#816)
VLLMModel chat template kwargs support (#538, #636)
Per-task chat template and extra body args, enabling per-task control of reasoning mode and thinking budget (#672)
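
As a rough illustration of what per-task control could look like, the sketch below deep-merges a task-level override into server-wide request defaults. The helper name merge_overrides, the dictionary layout, and the enable_thinking and max_output_tokens keys are assumptions made for this example, not NeMo Gym's actual schema; see #672 for the real field names.

```python
from copy import deepcopy
from typing import Any


def merge_overrides(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `overrides` into a copy of `defaults`.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the default outright. Neither input is mutated.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical server-wide defaults (key names are illustrative only).
default_params = {
    "temperature": 1.0,
    "chat_template_kwargs": {"enable_thinking": True},
}

# Hypothetical per-task override that turns reasoning off for one task.
task_override = {
    "chat_template_kwargs": {"enable_thinking": False},
    "max_output_tokens": 512,
}

task_params = merge_overrides(default_params, task_override)
print(task_params)
```

Merging recursively, rather than replacing the whole nested dictionary, keeps any default template arguments that the task does not mention.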

Rollout Collection & Profiling

New ng_reward_profile command to compute per-task pass rates and aggregate metrics (#83, #621)
CPU profiling for rollout performance analysis (#763)
Option for seeding on num_repeats for rollouts (#740)
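
The kind of summary described above (per-task pass rates plus aggregate metrics) can be pictured with a small sketch. This is not the ng_reward_profile implementation: the record layout, the profile_rewards function, and the pass threshold of 1.0 are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rollout records: one entry per rollout, several repeats per task.
rollouts = [
    {"task_id": "task-a", "reward": 1.0},
    {"task_id": "task-a", "reward": 0.0},
    {"task_id": "task-b", "reward": 1.0},
    {"task_id": "task-b", "reward": 1.0},
]


def profile_rewards(records, pass_threshold=1.0):
    """Return (per-task pass rates, aggregate metrics) for a list of rollouts."""
    rewards_by_task = defaultdict(list)
    for record in records:
        rewards_by_task[record["task_id"]].append(record["reward"])

    # A rollout counts as a pass when its reward reaches the threshold.
    pass_rates = {
        task_id: mean([1.0 if r >= pass_threshold else 0.0 for r in rewards])
        for task_id, rewards in rewards_by_task.items()
    }
    aggregate = {
        "mean_reward": mean([record["reward"] for record in records]),
        "mean_pass_rate": mean(list(pass_rates.values())),
        "num_tasks": len(pass_rates),
    }
    return pass_rates, aggregate


pass_rates, aggregate = profile_rewards(rollouts)
print(pass_rates)
print(aggregate)
```

Per-task pass rates of this kind are commonly used to spot tasks that are always solved or never solved, since those contribute little signal during RL training.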

Infrastructure & Developer Experience

PyPI compatibility: install via pip install nemo-gym (#649)
Dry run mode: ng_run +dryrun=true to validate configs and install environments without starting servers (#743)
ng_status command to list running servers and their health (#290)
Server stdout/stderr redirection with server name prefixes (#703)
FastAPI worker support for higher throughput across multiple workers (#566)
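
To make the idea of output redirection with server name prefixes concrete, here is a minimal sketch of the general technique: run a child process, fold stderr into stdout, and tag each line with a name. The run_with_prefix helper and the server name are hypothetical, and NeMo Gym's own implementation (#703) may differ.

```python
import subprocess
import sys


def run_with_prefix(name: str, command: list[str]) -> list[str]:
    """Run `command` and tag every output line with the server name."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    )
    assert process.stdout is not None
    lines = []
    for raw_line in process.stdout:
        line = f"[{name}] {raw_line.rstrip()}"
        print(line)
        lines.append(line)
    process.wait()
    return lines


# Stand-in for a real server process: a child Python that prints two lines.
child = [sys.executable, "-c", "print('starting'); print('ready')"]
captured = run_with_prefix("example_resources_server", child)
```

With several servers writing to one terminal, a prefix like this makes it possible to tell which server produced a given log line.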

Model Recipes

Nemotron 3 Nano training recipe (#699)
Nemotron 3 Super training recipe (#863)

Deprecation Notices

Deprecated ng_viewer due to a Gradio security vulnerability. We plan to revisit rollout viewing with a more robust solution in a future release.

Bug Fixes

Fixed 0.1.1 environments to work correctly with RL training pipelines (#768)
Fixed crash when server receives malformed JSON during rollout collection (#770)
Fixed dry run mode failing (#746)
Fixed nested responses_create_params overrides not merging correctly from CLI (#827)
Fixed ng_prepare_data failing when multiple environments define overlapping metrics (#738)
Fixed reward profiling failing when model response doesn't include usage stats (#824)
Fixed NeMo-Skills python tool to use HTTP calls instead of subprocess execution (#606)
Bumped Pillow and other packages to address security vulnerabilities (#667, #739)
ng_dump_config now redacts API key values from output (#567)
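
The redaction behavior mentioned in the last item can be sketched as a recursive walk over a config that masks values stored under credential-like keys. The key pattern, the mask string, and the sample config are assumptions for illustration only; the actual rules used by ng_dump_config (#567) are not described in these notes.

```python
import re
from typing import Any

# Assumption: keys whose names look like credentials should be masked.
SENSITIVE_KEY = re.compile(r"(api_?key|secret|password)", re.IGNORECASE)


def redact(config: Any, mask: str = "****") -> Any:
    """Return a copy of `config` with values under sensitive keys masked."""
    if isinstance(config, dict):
        return {
            key: mask if SENSITIVE_KEY.search(str(key)) else redact(value, mask)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [redact(item, mask) for item in config]
    return config


# Hypothetical config; names and values are made up for the example.
config = {
    "policy_model": {"base_url": "http://localhost:8000/v1", "api_key": "sk-example"},
    "servers": [{"name": "judge", "JUDGE_API_KEY": "abc123"}],
}
safe_config = redact(config)
print(safe_config)
```

Building a new structure instead of editing the config in place means the running process keeps its real credentials while only the printed copy is masked.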

Documentation

New training tutorials: Unsloth training with NeMo Gym, multi-environment training
New environment tutorials: creating a training environment, custom data preparation, integrating external environment libraries, environment best practices
Model recipes: reproduce the training for Nemotron 3 Nano and Nemotron 3 Super
Concepts & architecture overhaul: rewrote concepts docs, added architecture diagrams, added agent server and resources server docs
Training approaches: added training approaches docs page covering SFT, RL (GRPO), and RLVR
Ecosystem page: revamped ecosystem page with training framework integrations and environment library integrations
Infrastructure: added SWE RL infrastructure case study, deployment topology docs
Quality pass: redirect sweep, style guide sweep, consistent naming, FAQ additions, broken link fixes

Looking Ahead

VLM support: add support for vision-language models (VLMs) and environments with images, e.g. browser environments and computer use agent (CUA) environments
Benchmark environments: add popular OSS environments such as OSWorld, Tau Bench, BrowseComp
Integrate existing agents: integrate popular existing agents, e.g. coding harnesses, as well as agents developed via popular agent frameworks, e.g. LangGraph
Environment tutorials: incorporate more complex agentic loops during training such as multi-turn conversation and user modeling

Release Assets

GitHub Release: https://github.com/NVIDIA-NeMo/Gym/releases/tag/v0.2.0
Container: nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super

What's Changed

Bump to v0.2.0 by @bxyu-nvidia in #510
reasoning-gym resource server by @cmunley1 in #113
docs: redirect setup by @lbliii in #513
docs: Miscellaneous GRPO tutorial fixes by @bxyu-nvidia in #512
docs settings update by @lbliii in #525
Debug server package versions by @fsiino-nvidia in #406
List running server health and status by @fsiino-nvidia in #290
VLLMModel supports chat template kwargs by @pjin-nvidia in #538
Salesforce xlam-function-calling-60k resources server by @cmunley1 in #262
python flag for colab venv installation by @cmunley1 in #526
add unsloth and trl to docs by @cmunley1 in #536
docs: remove trl docs by @cmunley1 in #543
Remove PlainTextResponse response_class by @fsiino-nvidia in #544
Increase test_train_data_utils coverage by @fsiino-nvidia in #553
Generic Aviary integration by @sidnarayanan in #55
ng_dump_config sanity removes API key values by @bxyu-nvidia in #567
Feat: Add reward profiling and fractional reward by @abukharin-nv in #83
Single step environments for SWE tasks by @atefehsz in #561
NL2Bash using Equivalency Judge by @kbhardwaj-nvidia in #569
enh: use agent ref from data in rollouts by @gwarmstrong in #568
FastAPI worker support by @bxyu-nvidia in #566
Local vLLM model and other misc improvements by @bxyu-nvidia in #558
Update math_with_judge artifact paths by @roclark in #582
Add Hugging Face identifier for coding resource by @roclark in #583
updating swerl_gen config by @atefehsz in #588
NeMo Skills Tools Resource by @gwarmstrong in #571
Add math_formal_lean resource server for Lean4 proof verification by @stephencge in #563
Aviary rollouts can be configured to return transitions or not by @sidnarayanan in #590
openhands by @sdevare-nv in #343
Terminus (judge only) Slicing Environment by @kbhardwaj-nvidia in #594
0.2.0 new doc stubs by @lbliii in #581
Add tutorial for custom data preparation by @roclark in #596
Fix invalid ref in docs build by @bxyu-nvidia in #604
Fix Nemo-Skills python tool to use http by @gwarmstrong in #606
Expanding Terminus Slicing PR by @kbhardwaj-nvidia in #597
Updating swerl_gen to support custom parsers by @atefehsz in #624
docs: unsloth fix by @cmunley1 in #622
arc-agi resource server by @cmunley1 in #105
arc readme by @cmunley1 in #634
VLLMModel: Add chat template kwargs on tokenize request by @bxyu-nvidia in #636
[docs] Add architecture diagrams by @ananthsub in #574
feat: reward profiling by @cmunley1 in #621
docs: issue 626 by @lbliii in #638
ci: Enable the test job to build a wheel and publish to test.pypi by @chtruong814 in #651
v1 of text-to-sql by @3mei in #648
Yev/text to sql v1.1 by @3mei in #653
Upstream Super 3 dev 20260205 by @bxyu-nvidia in #654
ns tools stability by @gwarmstrong in #658
Bump package versions to fix security vulnerabilities by @srogawski-nvidia in #667
docs: tutorial section standardization by @lbliii in #656
docs: add responses api model config to ng prepare data in docs by @cmunley1 in #678
feat: always track aggregate tool call timing in ns_tools by @gwarmstrong in #668
Add config-based venv skip when .venv is present by @yashaswikarnati in #680
Add single GPU training instructions to tutorial by @srogawski-nvidia in #681
Text-to-SQL: harden for scale by @3mei in #677
feat: Global config dict and find_open_port respect port ranges by @bxyu-nvidia in #685
remove rfc section by @lbliii in #686
docs: Multi verifier rollouts by @bxyu-nvidia in #682
docs: revamp ecosystem page, restructure training tutorials by @cwing-nvidia in #683
docs: Add topology docs and link to training framework integration by @ananthsub in #693
docs: Integration protocol by @bxyu-nvidia in #671
docs: SWE RL Infra case study by @bxyu-nvidia in #695
docs: Clean SWE RL Case study by @bxyu-nvidia in #696
docs: move arch page by @cwing-nvidia in #701
docs: remove placeholder agent server pages by @cwing-nvidia in #702
docs: remove draft markers from data toctree entries by @cwing-nvidia in #704
docs: Fix product naming and broken link in infrastructure docs by @cwing-nvidia in #705
Simplify README ecosystem section to match docs page by @cwing-nvidia in #707
Cwing/hpundt/readme by @cwing-nvidia in #708
Redirect server stdout/stderr by default, and apply server prefix during server venv setup by @pjin-nvidia in #703
Update README ecosystem links to match docs tutorial pages by @cwing-nvidia in #709
docs: VLLMModel by @bxyu-nvidia in #697
Remove model-recipes section from docs by @cwing-nvidia in #710
docs: TRL integration by @cmunley1 in #602
docs: Environment properties by @bxyu-nvidia in #713
docs: Clean environment properties by @bxyu-nvidia in #714
Environments overview cleanup by @cwing-nvidia in #715
docs: Fix broken README links and clean up unsloth docs by @cwing-nvidia in #716
Style Guide Sweep by @lbliii in #717
docs: Remove first-training-run page and update next steps across Getting Started by @cwing-nvidia in #721
docs: FAQ: inference.nvidia.com has no response diversity by @bxyu-nvidia in #718
Remove Dataset viewer and Gradio dependency by @bxyu-nvidia in #726
Remove extraneous MLFlow deps by @bxyu-nvidia in #727
Add Nemotron 3 Nano 30B multi-node training tutorial by @srogawski-nvidia in #699
feat: Prime Intellect verifiers integration by @cmunley1 in #573
docs: Fix "Prepare and Validate Data" command by @bxyu-nvidia in #730
docs: Fix broken link by @bxyu-nvidia in #731
docs: Fix typo by @bxyu-nvidia in #732
Add Claude Code skill for adding benchmarks by @jfarris-nvidia in #734
Add CLAUDE.md for Claude Code onboarding by @jfarris-nvidia in #733
Bump Pillow >= 12.1.1 by @bxyu-nvidia in #739
feat: Add option for seeding on num repeats by @bxyu-nvidia in #740
feat: ng_run Dry run support by @bxyu-nvidia in #743
feat: Top level UV config by @bxyu-nvidia in #745
feat: Fix dry run by @bxyu-nvidia in #746
fix: ng prepare data metrics conflict by @cmunley1 in #738
add agent ref to reasoning gym dataset by @cmunley1 in #752
feat: install environments serially in dryrun (optimization) by @terrykong in #753
Add NewtonBench Resource Server by @Kelvin0110 in #650
feat: Rollout infra upgrades by @bxyu-nvidia in #761
feat: CPU Profiling by @bxyu-nvidia in #763
Fix offending file name by @bxyu-nvidia in #766
Remove requirement for cryptographic signature on commits by @Kipok in #767
fix: multi-environment-training agent_ref by @cmunley1 in #788
fix: remove newtonbench readme note on reasoning parser by @cmunley1 in #786
fix: add model server config to ng_prepare_data in docs by @cmunley1 in #787
Rollout collection tutorial fixes by @cwing-nvidia in #790
docs: align tutorial time by @cmunley1 in #791
docs: Move environment best practices from contributing to environment tutorials section by @bxyu-nvidia in #785
fix: typos in verifiers agent readme by @cmunley1 in #755
docs: clarify rollout vs trajectory definitions by @cwing-nvidia in #798
fix: include all environments in docs by @fsiino-nvidia in #757
Improve concepts docs and add docs for agent and resources server by @cwing-nvidia in #796
docs: Provide guidance on NeMo RL files in GRPO tutorial by @bxyu-nvidia in #802
Add tutorial links on docs home by @cwing-nvidia in #805
docs: Fixes to Create Training Environment tutorial by @bxyu-nvidia in #806
Explain what the test script in NeMo RL validates by @cwing-nvidia in #809
docs: Add determinism tip by @bxyu-nvidia in #807
feat: Rollout collection fixes by @bxyu-nvidia in #795
docs: remove server components summary from homepage by @cwing-nvidia in #810
docs: Improve vLLM Model Server docs by @bxyu-nvidia in #811
feat: E2E rollout collection with LocalVLLMModel by @bxyu-nvidia in #762
fix: pipecleaning 0.1.1 envs for rl by @fsiino-nvidia in #768
docs: move Multi-Environment Training to Training Tutorials by @cwing-nvidia in #817
feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. by @ffrujeri in #674
feat: Support vLLM>=0.16.0 reasoning field by @bxyu-nvidia in #816
feat: per task chat template and extra body args by @cmunley1 in #672
bugfix: Do not crash with json decoding error by @activatedgeek in #770
fix: reward profiling not require usage by @cmunley1 in #824
feat: Upstream Nemotron 3 Super envs by @bxyu-nvidia in #825
feat: Upstream Nemotron 3 Super Part 2 by @bxyu-nvidia in #826
fix: allow nested responses create params overrides by @cmunley1 in #827
pypi compatibility by @cmunley1 in #649
docs: faq monotonic trajectory by @cmunley1 in #613
docs: Slightly improve create training environment by @bxyu-nvidia in #831
docs: redirect sweep by @lbliii in #804
docs: undo trl until stable by @cmunley1 in #832
docs: consistent resources server naming by @cmunley1 in #836
docs: pin working unsloth version by @cmunley1 in #835
Swap readme table columns by @fsiino-nvidia in #853

New Contributors

@atefehsz made their first contribution in #561
@roclark made their first contribution in #582
@stephencge made their first contribution in #563
@ananthsub made their first contribution in #574
@3mei made their first contribution in #648
@srogawski-nvidia made their first contribution in #667
@yashaswikarnati made their first contribution in #680
@jfarris-nvidia made their first contribution in #734
@terrykong made their first contribution in #753
@Kelvin0110 made their first contribution in #650
@Kipok made their first contribution in #767
@activatedgeek made their first contribution in #770

Full Changelog: https://github.com/NVIDIA-NeMo/Gym/commits/v0.2.0
