v0.2.0

Released by @bxyu-nvidia on 11 Mar 15:03 · commit 3e587db

Release Summary

NeMo Gym v0.2.0 ships alongside the NVIDIA Nemotron 3 Super model release, open-sourcing the RL environments and corresponding datasets used during training. Highlights:

  • 17 new training environments across coding, math, science, reasoning, agentic tasks, and safety
  • Integrations with Future House Aviary, Open-Thought Reasoning Gym, and Prime Intellect Verifiers let you use environments from these libraries directly within NeMo Gym
  • End-to-end rollout collection with a locally managed vLLM server
  • Install directly from PyPI with pip install nemo-gym
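
Taken together, the highlights above translate into a short first session. All three commands appear elsewhere in these notes; any further flags or configuration are setup-specific, so treat this as a sketch rather than a full walkthrough:

```shell
# Install NeMo Gym from PyPI (new in v0.2.0)
pip install nemo-gym

# Validate configs and install environments without starting servers
ng_run +dryrun=true

# List running servers and their health
ng_status
```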

First-Time Contributors

We welcomed 15 new contributors to this release! Here are a few highlights:

  • @sidnarayanan added the Aviary integration, enabling training on any environment from Aviary, a library of interactive RL environments spanning math, science, biology, and more
  • @3mei added the text-to-SQL environment to generate SQL queries from natural language across multiple SQL dialects
  • @Kelvin0110 added the NewtonBench environment to discover scientific laws through interactive experimentation

Thank you to all the new contributors for helping make NeMo Gym better!

Major Features & Improvements

New Environments

  • Added 17 new resources servers spanning:
    • Coding: Text to SQL (#648), SWE RL Gen (#561), SWE RL LLM Judge (#561)
    • Math: Lean4 Mathematical Proofs (#563)
    • Science: Aviary (#55), NewtonBench (#650)
    • Reasoning: MultiChallenge (#654), ARC-AGI (#105), Reasoning Gym (#113)
    • Agent tasks: xLAM Function Calling (#262), Tavily Search (#825), Single Step Tool Use with Argument Comparison (#825), Terminus Judge (#594), NeMo Skills Tools (#571)
    • Safety: Jailbreak Detection (#825), Over Refusal Detection (#825)
    • RLHF: Generative Reward Model Compare (#674)
  • Added 5 new agent servers: Aviary agent (#55), proof refinement agent (#563), SWE agents (#343), tool simulation agent (#826), and verifiers agent (#573)

Environment Library Integrations
Combine environments from other libraries with NeMo Gym environments

  • Future House Aviary (#55, #590)
  • Open-Thought Reasoning Gym (#113)
  • Prime Intellect Verifiers (#573)

Model Serving

  • Local vLLM model server, enabling end-to-end rollout collection without an external API (#558, #762)
  • vLLM 0.16+ support for the reasoning field in responses (#816)
  • VLLMModel chat template kwargs support (#538, #636)
  • Per-task chat template and extra body args, enabling per-task control of reasoning mode and thinking budget (#672)
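
Per-task control of reasoning mode and thinking budget (#672) can be pictured as task-level overrides merged into base request params before a call to the local vLLM server. This is only an illustrative sketch: the key names below (`chat_template_kwargs`, `extra_body`, `thinking_budget`, `enable_thinking`) and the model name are assumptions, not NeMo Gym's actual schema.

```python
# Sketch: merge per-task chat-template kwargs and extra-body args over
# base request params. All key names are illustrative assumptions, not
# NeMo Gym's actual configuration schema.

def merge_request_params(base: dict, task_overrides: dict) -> dict:
    """Shallow-merge per-task overrides over base params, one level deep."""
    merged = {k: dict(v) if isinstance(v, dict) else v for k, v in base.items()}
    for key, value in task_overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged

base_params = {
    "model": "my-local-vllm-model",  # hypothetical model name
    "chat_template_kwargs": {"enable_thinking": True},
    "extra_body": {"thinking_budget": 1024},
}

# A math task might want a larger thinking budget than a chat task.
math_task = {"extra_body": {"thinking_budget": 8192}}
params = merge_request_params(base_params, math_task)
```

The one-level dict copy keeps the base params untouched, so each task's rollout request can carry its own reasoning settings.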

Rollout Collection & Profiling

  • New ng_reward_profile command to compute per-task pass rates and aggregate metrics (#83, #621)
  • CPU profiling for rollout performance analysis (#763)
  • Option to seed rollouts based on num_repeats (#740)
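
The per-task pass rates that ng_reward_profile reports can be illustrated in plain Python. This is a sketch of the metric itself (pass rate = fraction of rollouts whose reward meets a threshold, grouped by task), not NeMo Gym's implementation; the record fields ("task", "reward") and the threshold are assumptions.

```python
from collections import defaultdict

# Sketch of a per-task pass-rate metric, in the spirit of what
# ng_reward_profile reports. Field names and the pass threshold are
# illustrative assumptions, not NeMo Gym's actual rollout schema.

def per_task_pass_rates(rollouts, threshold=1.0):
    """Return {task: fraction of rollouts with reward >= threshold}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in rollouts:
        totals[r["task"]] += 1
        if r["reward"] >= threshold:
            passes[r["task"]] += 1
    return {task: passes[task] / totals[task] for task in totals}

rollouts = [
    {"task": "text_to_sql", "reward": 1.0},
    {"task": "text_to_sql", "reward": 0.0},
    {"task": "lean4_proofs", "reward": 1.0},
]
rates = per_task_pass_rates(rollouts)
# rates == {"text_to_sql": 0.5, "lean4_proofs": 1.0}
```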

Infrastructure & Developer Experience

  • PyPI compatibility: install via pip install nemo-gym (#649)
  • Dry run mode: ng_run +dryrun=true to validate configs and install environments without starting servers (#743)
  • ng_status command to list running servers and their health (#290)
  • Server stdout/stderr redirection with server name prefixes (#703)
  • FastAPI worker support for higher throughput across multiple workers (#566)

Model Recipes

  • Nemotron 3 Nano training recipe (#699)
  • Nemotron 3 Super training recipe (#863)

Deprecation Notices

  • Deprecated ng_viewer due to a Gradio security vulnerability. We plan to revisit rollout viewing with a more robust solution in a future release.

Bug Fixes

  • Fixed 0.1.1 environments to work correctly with RL training pipelines (#768)
  • Fixed crash when server receives malformed JSON during rollout collection (#770)
  • Fixed dry run mode failing (#746)
  • Fixed nested responses_create_params overrides not merging correctly from CLI (#827)
  • Fixed ng_prepare_data failing when multiple environments define overlapping metrics (#738)
  • Fixed reward profiling failing when model response doesn't include usage stats (#824)
  • Fixed NeMo-Skills python tool to use HTTP calls instead of subprocess execution (#606)
  • Bumped Pillow and other packages to address security vulnerabilities (#667, #739)
  • ng_dump_config now redacts API key values from output (#567)

Documentation

  • New training tutorials: Unsloth training with NeMo Gym, multi-environment training
  • New environment tutorials: creating a training environment, custom data preparation, integrating external environment libraries, environment best practices
  • Model recipes: reproduce the training for Nemotron 3 Nano and Nemotron 3 Super
  • Concepts & architecture overhaul: rewrote concepts docs, added architecture diagrams, added agent server and resources server docs
  • Training approaches: added training approaches docs page covering SFT, RL (GRPO), and RLVR
  • Ecosystem page: revamped ecosystem page with training framework integrations and environment library integrations
  • Infrastructure: added SWE RL infrastructure case study, deployment topology docs
  • Quality pass: redirect sweep, style guide sweep, consistent naming, FAQ additions, broken link fixes

Looking Ahead

  • VLM support: add support for vision-language models (VLMs) and environments with images, e.g. browser environments and computer use agent (CUA) environments
  • Benchmark environments: add popular OSS environments such as OSWorld, Tau Bench, and BrowseComp
  • Integrate existing agents: support popular existing agents, e.g. coding harnesses, as well as agents developed with popular agent frameworks such as LangGraph
  • Environment tutorials: incorporate more complex agentic loops during training, such as multi-turn conversation and user modeling

Release Assets

GitHub Release: https://github.com/NVIDIA-NeMo/Gym/releases/tag/v0.2.0
Container: nvcr.io/nvidia/nemo-rl:v0.5.0.nemotron_3_super

What's Changed

Full Changelog: https://github.com/NVIDIA-NeMo/Gym/commits/v0.2.0