v0.2.0
Release Summary
NeMo Gym v0.2.0 ships alongside the NVIDIA Nemotron 3 Super model release, open-sourcing the RL environments and corresponding datasets used during training. Highlights:
- 17 new training environments across coding, math, science, reasoning, agentic tasks, and safety
- Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers let you use environments from these libraries directly within NeMo Gym
- End-to-end rollout collection with a locally managed vLLM server
- Install directly from PyPI with pip install nemo-gym
First-Time Contributors
We welcomed 15 new contributors to this release! Here are a few highlights:
- @sidnarayanan added the Aviary integration, which enables training on any environment from Aviary, a library of interactive RL environments spanning math, science, biology, and more
- @3mei added the text-to-SQL environment to generate SQL queries from natural language across multiple SQL dialects
- @Kelvin0110 added the NewtonBench environment to discover scientific laws through interactive experimentation
Thank you to all the new contributors for helping make NeMo Gym better!
Major Features & Improvements
New Environments
- Added 17 new resources servers spanning:
  - Coding: Text to SQL (#648), SWE RL Gen (#561), SWE RL LLM Judge (#561)
  - Math: Lean4 Mathematical Proofs (#563)
  - Science: Aviary (#55), NewtonBench (#650)
  - Reasoning: MultiChallenge (#654), ARC-AGI (#105), Reasoning Gym (#113)
  - Agent tasks: xLAM Function Calling (#262), Tavily Search (#825), Single Step Tool Use with Argument Comparison (#825), Terminus Judge (#594), NeMo Skills Tools (#571)
  - Safety: Jailbreak Detection (#825), Over Refusal Detection (#825)
  - RLHF: Generative Reward Model Compare (#674)
- Added 5 new agent servers: Aviary agent (#55), proof refinement agent (#563), SWE agents (#343), tool simulation agent (#826), and verifiers agent (#573)
Environment Library Integrations
Combine environments from other libraries with NeMo Gym environments
Model Serving
- Local vLLM model server with end-to-end rollout collection without an external API (#558, #762)
- vLLM 0.16+ support for the reasoning field in responses (#816)
- VLLMModel chat template kwargs support (#538, #636)
- Per-task chat template and extra body args, enabling per-task control of reasoning mode and thinking budget (#672)
Rollout Collection & Profiling
- New ng_reward_profile command to compute per-task pass rates and aggregate metrics (#83, #621)
- CPU profiling for rollout performance analysis (#763)
- Add option for seeding on num_repeats for rollouts (#740)
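As a rough illustration of what "per-task pass rates and aggregate metrics" can mean, the sketch below groups rollouts by task and summarizes their rewards. It is not the ng_reward_profile implementation: the `task_id` and `reward` keys, the `pass_threshold` parameter, and the names of the returned metrics are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def reward_profile(rollouts, pass_threshold=1.0):
    """Summarize rollouts per task and across tasks.

    `rollouts` is an iterable of dicts with the hypothetical keys
    `task_id` and `reward`; a rollout "passes" when its reward is at
    least `pass_threshold` (also an assumption of this sketch).
    """
    # Collect every reward observed for each task (one entry per repeat).
    rewards_by_task = defaultdict(list)
    for rollout in rollouts:
        rewards_by_task[rollout["task_id"]].append(float(rollout["reward"]))

    per_task = {}
    for task_id, rewards in rewards_by_task.items():
        passes = sum(r >= pass_threshold for r in rewards)
        per_task[task_id] = {
            "num_rollouts": len(rewards),
            "pass_rate": passes / len(rewards),
            "mean_reward": mean(rewards),
        }

    # Aggregate over tasks so that each task counts equally,
    # regardless of how many repeats it received.
    pass_rates = [stats["pass_rate"] for stats in per_task.values()]
    aggregate = {
        "num_tasks": len(per_task),
        "mean_pass_rate": mean(pass_rates) if pass_rates else 0.0,
        "tasks_solved_at_least_once": sum(rate > 0 for rate in pass_rates),
    }
    return per_task, aggregate


# Two tasks, two repeats each (made-up data).
rollouts = [
    {"task_id": "a", "reward": 1.0},
    {"task_id": "a", "reward": 0.0},
    {"task_id": "b", "reward": 0.0},
    {"task_id": "b", "reward": 0.0},
]
per_task, aggregate = reward_profile(rollouts)
print(per_task["a"]["pass_rate"], aggregate["mean_pass_rate"])
```

Averaging pass rates per task first, rather than over all rollouts, is a design choice in this sketch; it keeps tasks with many repeats from dominating the aggregate.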
Infrastructure & Developer Experience
- PyPI compatibility: install via pip install nemo-gym (#649)
- Dry run mode: ng_run +dryrun=true to validate configs and install environments without starting servers (#743)
- ng_status command to list running servers and their health (#290)
- Server stdout/stderr redirection with server name prefixes (#703)
- FastAPI worker support for higher throughput across multiple workers (#566)
Model Recipes
Deprecation Notices
- Deprecated ng_viewer due to a Gradio security vulnerability. We plan to revisit rollout viewing with a more robust solution in a future release.
Bug Fixes
- Fixed 0.1.1 environments to work correctly with RL training pipelines (#768)
- Fixed crash when server receives malformed JSON during rollout collection (#770)
- Fixed dry run mode failing (#746)
- Fixed nested responses_create_params overrides not merging correctly from CLI (#827)
- Fixed ng_prepare_data failing when multiple environments define overlapping metrics (#738)
- Fixed reward profiling failing when model response doesn't include usage stats (#824)
- Fixed NeMo-Skills python tool to use HTTP calls instead of subprocess execution (#606)
- Bumped Pillow and other packages to address security vulnerabilities (#667, #739)
- ng_dump_config now redacts API key values from output (#567)
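To make the nested-override fix (#827) more concrete, here is a minimal sketch of the general technique involved: turning a dotted command-line override into a nested dict and merging it recursively into a base config, so that sibling keys survive. This is not NeMo Gym's code. The helper names `deep_merge` and `parse_dotted_override`, the example keys under responses_create_params, and the choice to keep override values as plain strings are assumptions for illustration.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where `override` is merged recursively into `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or mismatched types: the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


def parse_dotted_override(arg):
    """Turn '+a.b.c=value' into {'a': {'b': {'c': 'value'}}}.

    Simplification: the value is kept as a string, with no type parsing.
    """
    path, _, value = arg.lstrip("+").partition("=")
    nested = value
    for key in reversed(path.split(".")):
        nested = {key: nested}
    return nested


# Hypothetical base config; the inner keys are illustrative only.
base = {
    "responses_create_params": {
        "temperature": 1.0,
        "metadata": {"split": "train"},
    }
}
override = parse_dotted_override("+responses_create_params.metadata.run=debug")
merged = deep_merge(base, override)
print(merged)
```

A shallow `dict.update` would replace the whole `metadata` dict and silently drop `split`; the recursive merge keeps it while adding `run`.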
Documentation
- New training tutorials: Unsloth training with NeMo Gym, multi-environment training
- New environment tutorials: creating a training environment, custom data preparation, integrating external environment libraries, environment best practices
- Model recipes: reproduce the training for Nemotron 3 Nano and Nemotron 3 Super
- Concepts & architecture overhaul: rewrote concepts docs, added architecture diagrams, added agent server and resources server docs
- Training approaches: added training approaches docs page covering SFT, RL (GRPO), and RLVR
- Ecosystem page: revamped ecosystem page with training framework integrations and environment library integrations
- Infrastructure: added SWE RL infrastructure case study, deployment topology docs
- Quality pass: redirect sweep, style guide sweep, consistent naming, FAQ additions, broken link fixes
Looking Ahead
- VLM support: add support for vision-language models (VLMs) and environments with images, e.g. browser environments and computer use agent (CUA) environments
- Benchmark environments: add popular OSS environments such as OSWorld, Tau Bench, BrowseComp
- Integrate existing agents: integrate popular existing agents, e.g. coding harnesses, as well as agents developed via popular agent frameworks, e.g. LangGraph
- Environment tutorials: incorporate more complex agentic loops during training such as multi-turn conversation and user modeling
Release Assets
GitHub Release: https://github.com/NVIDIA-NeMo/Gym/releases/tag/v0.2.0
Container: nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super
What's Changed
- Bump to v0.2.0 by @bxyu-nvidia in #510
- reasoning-gym resource server by @cmunley1 in #113
- docs: redirect setup by @lbliii in #513
- docs: Miscellaneous GRPO tutorial fixes by @bxyu-nvidia in #512
- docs settings update by @lbliii in #525
- Debug server package versions by @fsiino-nvidia in #406
- List running server health and status by @fsiino-nvidia in #290
- VLLMModel supports chat template kwargs by @pjin-nvidia in #538
- Salesforce xlam-function-calling-60k resources server by @cmunley1 in #262
- python flag for colab venv installation by @cmunley1 in #526
- add unsloth and trl to docs by @cmunley1 in #536
- docs: remove trl docs by @cmunley1 in #543
- Remove PlainTextResponse response_class by @fsiino-nvidia in #544
- Increase test_train_data_utils coverage by @fsiino-nvidia in #553
- Generic Aviary integration by @sidnarayanan in #55
- ng_dump_config sanity removes API key values by @bxyu-nvidia in #567
- Feat: Add reward profiling and fractional reward by @abukharin-nv in #83
- Single step environments for SWE tasks by @atefehsz in #561
- NL2Bash using Equivalency Judge by @kbhardwaj-nvidia in #569
- enh: use agent ref from data in rollouts by @gwarmstrong in #568
- FastAPI worker support by @bxyu-nvidia in #566
- Local vLLM model and other misc improvements by @bxyu-nvidia in #558
- Update math_with_judge artifact paths by @roclark in #582
- Add Hugging Face identifier for coding resource by @roclark in #583
- updating swerl_gen config by @atefehsz in #588
- NeMo Skills Tools Resource by @gwarmstrong in #571
- Add math_formal_lean resource server for Lean4 proof verification by @stephencge in #563
- Aviary rollouts can be configured to return transitions or not by @sidnarayanan in #590
- openhands by @sdevare-nv in #343
- Terminus (judge only) Slicing Environment by @kbhardwaj-nvidia in #594
- 0.2.0 new doc stubs by @lbliii in #581
- Add tutorial for custom data preparation by @roclark in #596
- Fix invalid ref in docs build by @bxyu-nvidia in #604
- Fix Nemo-Skills python tool to use http by @gwarmstrong in #606
- Expanding Terminus Slicing PR by @kbhardwaj-nvidia in #597
- Updating swerl_gen to support custom parsers by @atefehsz in #624
- docs: unsloth fix by @cmunley1 in #622
- arc-agi resource server by @cmunley1 in #105
- arc readme by @cmunley1 in #634
- VLLMModel: Add chat template kwargs on tokenize request by @bxyu-nvidia in #636
- [docs] Add architecture diagrams by @ananthsub in #574
- feat: reward profiling by @cmunley1 in #621
- docs: issue 626 by @lbliii in #638
- ci: Enable the test job to build a wheel and publish to test.pypi by @chtruong814 in #651
- v1 of text-to-sql by @3mei in #648
- Yev/text to sql v1.1 by @3mei in #653
- Upstream Super 3 dev 20260205 by @bxyu-nvidia in #654
- ns tools stability by @gwarmstrong in #658
- Bump package versions to fix security vulnerabilities by @srogawski-nvidia in #667
- docs: tutorial section standardization by @lbliii in #656
- docs: add responses api model config to ng prepare data in docs by @cmunley1 in #678
- feat: always track aggregate tool call timing in ns_tools by @gwarmstrong in #668
- Add config-based venv skip when .venv is present by @yashaswikarnati in #680
- Add single GPU training instructions to tutorial by @srogawski-nvidia in #681
- Text-to-SQL: harden for scale by @3mei in #677
- feat: Global config dict and find_open_port respect port ranges by @bxyu-nvidia in #685
- remove rfc section by @lbliii in #686
- docs: Multi verifier rollouts by @bxyu-nvidia in #682
- docs: revamp ecosystem page, restructure training tutorials by @cwing-nvidia in #683
- docs: Add topology docs and link to training framework integration by @ananthsub in #693
- docs: Integration protocol by @bxyu-nvidia in #671
- docs: SWE RL Infra case study by @bxyu-nvidia in #695
- docs: Clean SWE RL Case study by @bxyu-nvidia in #696
- docs: move arch page by @cwing-nvidia in #701
- docs: remove placeholder agent server pages by @cwing-nvidia in #702
- docs: remove draft markers from data toctree entries by @cwing-nvidia in #704
- docs: Fix product naming and broken link in infrastructure docs by @cwing-nvidia in #705
- Simplify README ecosystem section to match docs page by @cwing-nvidia in #707
- Cwing/hpundt/readme by @cwing-nvidia in #708
- Redirect server stdout/stderr by default, and apply server prefix during server venv setup by @pjin-nvidia in #703
- Update README ecosystem links to match docs tutorial pages by @cwing-nvidia in #709
- docs: VLLMModel by @bxyu-nvidia in #697
- Remove model-recipes section from docs by @cwing-nvidia in #710
- docs: TRL integration by @cmunley1 in #602
- docs: Environment properties by @bxyu-nvidia in #713
- docs: Clean environment properties by @bxyu-nvidia in #714
- Environments overview cleanup by @cwing-nvidia in #715
- docs: Fix broken README links and clean up unsloth docs by @cwing-nvidia in #716
- Style Guide Sweep by @lbliii in #717
- docs: Remove first-training-run page and update next steps across Getting Started by @cwing-nvidia in #721
- docs: FAQ: inference.nvidia.com has no response diversity by @bxyu-nvidia in #718
- Remove Dataset viewer and Gradio dependency by @bxyu-nvidia in #726
- Remove extraneous MLFlow deps by @bxyu-nvidia in #727
- Add Nemotron 3 Nano 30B multi-node training tutorial by @srogawski-nvidia in #699
- feat: Prime Intellect verifiers integration by @cmunley1 in #573
- docs: Fix "Prepare and Validate Data" command by @bxyu-nvidia in #730
- docs: Fix broken link by @bxyu-nvidia in #731
- docs: Fix typo by @bxyu-nvidia in #732
- Add Claude Code skill for adding benchmarks by @jfarris-nvidia in #734
- Add CLAUDE.md for Claude Code onboarding by @jfarris-nvidia in #733
- Bump Pillow >= 12.1.1 by @bxyu-nvidia in #739
- feat: Add option for seeding on num repeats by @bxyu-nvidia in #740
- feat: ng_run Dry run support by @bxyu-nvidia in #743
- feat: Top level UV config by @bxyu-nvidia in #745
- feat: Fix dry run by @bxyu-nvidia in #746
- fix: ng prepare data metrics conflict by @cmunley1 in #738
- add agent ref to reasoning gym dataset by @cmunley1 in #752
- feat: install environments serially in dryrun (optimization) by @terrykong in #753
- Add NewtonBench Resource Server by @Kelvin0110 in #650
- feat: Rollout infra upgrades by @bxyu-nvidia in #761
- feat: CPU Profiling by @bxyu-nvidia in #763
- Fix offending file name by @bxyu-nvidia in #766
- Remove requirement for cryptographic signature on commits by @Kipok in #767
- fix: multi-environment-training agent_ref by @cmunley1 in #788
- fix: remove newtonbench readme note on reasoning parser by @cmunley1 in #786
- fix: add model server config to ng_prepare_data in docs by @cmunley1 in #787
- Rollout collection tutorial fixes by @cwing-nvidia in #790
- docs: align tutorial time by @cmunley1 in #791
- docs: Move environment best practices from contributing to environment tutorials section by @bxyu-nvidia in #785
- fix: typos in verifiers agent readme by @cmunley1 in #755
- docs: clarify rollout vs trajectory definitions by @cwing-nvidia in #798
- fix: include all environments in docs by @fsiino-nvidia in #757
- Improve concepts docs and add docs for agent and resources server by @cwing-nvidia in #796
- docs: Provide guidance on NeMo RL files in GRPO tutorial by @bxyu-nvidia in #802
- Add tutorial links on docs home by @cwing-nvidia in #805
- docs: Fixes to Create Training Environment tutorial by @bxyu-nvidia in #806
- Explain what the test script in NeMo RL validates by @cwing-nvidia in #809
- docs: Add determinism tip by @bxyu-nvidia in #807
- feat: Rollout collection fixes by @bxyu-nvidia in #795
- docs: remove server components summary from homepage by @cwing-nvidia in #810
- docs: Improve vLLM Model Server docs by @bxyu-nvidia in #811
- feat: E2E rollout collection with LocalVLLMModel by @bxyu-nvidia in #762
- fix: pipecleaning 0.1.1 envs for rl by @fsiino-nvidia in #768
- docs: move Multi-Environment Training to Training Tutorials by @cwing-nvidia in #817
- feat: Adds GenRM Response API Model with support for custom roles used in pairwise response comparison. by @ffrujeri in #674
- feat: Support vLLM>=0.16.0 reasoning field by @bxyu-nvidia in #816
- feat: per task chat template and extra body args by @cmunley1 in #672
- bugfix: Do not crash with json decoding error by @activatedgeek in #770
- fix: reward profiling not require usage by @cmunley1 in #824
- feat: Upstream Nemotron 3 Super envs by @bxyu-nvidia in #825
- feat: Upstream Nemotron 3 Super Part 2 by @bxyu-nvidia in #826
- fix: allow nested responses create params overrides by @cmunley1 in #827
- pypi compatibility by @cmunley1 in #649
- docs: faq monotonic trajectory by @cmunley1 in #613
- docs: Slightly improve create training environment by @bxyu-nvidia in #831
- docs: redirect sweep by @lbliii in #804
- docs: undo trl until stable by @cmunley1 in #832
- docs: consistent resources server naming by @cmunley1 in #836
- docs: pin working unsloth version by @cmunley1 in #835
- Swap readme table columns by @fsiino-nvidia in #853
New Contributors
- @atefehsz made their first contribution in #561
- @roclark made their first contribution in #582
- @stephencge made their first contribution in #563
- @ananthsub made their first contribution in #574
- @3mei made their first contribution in #648
- @srogawski-nvidia made their first contribution in #667
- @yashaswikarnati made their first contribution in #680
- @jfarris-nvidia made their first contribution in #734
- @terrykong made their first contribution in #753
- @Kelvin0110 made their first contribution in #650
- @Kipok made their first contribution in #767
- @activatedgeek made their first contribution in #770
Full Changelog: https://github.com/NVIDIA-NeMo/Gym/commits/v0.2.0