Use cases, pain points, and background
Persona A: General model improvement
Goal: make a model generally smarter
Product: model
Agent Use: the agent is disposable scaffolding, a way to generate diverse training trajectories across many environments. This persona is largely agnostic about which agent harness the model eventually runs in at deployment time. They will primarily use the generic multi-step and multi-turn agents during rollout collection.
Persona B: Agent-specific model improvement
Goal: make a model better in a specific harness
Product: model + agent
Agent Use: the user has a specific agent they care about; it's what their product runs on or what their users interact with. They want to train a model that performs well in that agent's specific patterns: its tools, its prompts, its interaction style, etc.
Description
Reference agents (Persona A)
Gym-native reference agents that work with many of our environments for rollout orchestration
- Multi-step (simple agent)
- Multi-turn
Agent-specific integrations (Persona B)
For users who need to train a model for one or more specific agents
Two flavors:
- Pre-built: e.g. OpenClaw, Hermes, OpenHands, KiloCode, Cline
- Bring your own: integration path for custom agents built with a framework (e.g. LangGraph, CrewAI) or raw Python
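As a rough illustration of the bring-your-own path for a raw-Python agent, the sketch below shows one way a custom agent could be wrapped for rollout collection. Every name here (`CustomAgent`, `Trajectory`, `collect_rollout`, `make_toy_env`) is hypothetical and not part of any existing Gym API; the actual integration path is what the child issues are meant to define.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol, Tuple


@dataclass
class Step:
    """One recorded interaction: what the agent saw and what it did."""
    observation: str
    action: str


@dataclass
class Trajectory:
    """Hypothetical container for the steps collected during one rollout."""
    steps: list = field(default_factory=list)
    reward: float = 0.0


class CustomAgent(Protocol):
    """Hypothetical minimal interface a bring-your-own agent would expose."""

    def act(self, observation: str) -> str: ...


class EchoAgent:
    """Stand-in for a raw-Python agent; a real one would call a model here."""

    def act(self, observation: str) -> str:
        return f"echo:{observation}"


# An environment step takes an action and returns (next_observation, reward, done).
EnvStep = Callable[[str], Tuple[str, float, bool]]


def collect_rollout(
    agent: CustomAgent,
    env_step: EnvStep,
    first_observation: str,
    max_steps: int = 8,
) -> Trajectory:
    """Run the agent against the environment and record every step."""
    trajectory = Trajectory()
    observation = first_observation
    for _ in range(max_steps):
        action = agent.act(observation)
        trajectory.steps.append(Step(observation, action))
        observation, reward, done = env_step(action)
        trajectory.reward += reward
        if done:
            break
    return trajectory


def make_toy_env(limit: int = 3) -> EnvStep:
    """Toy environment (an assumption for this sketch): ends after `limit` actions."""
    count = 0

    def env_step(action: str) -> Tuple[str, float, bool]:
        nonlocal count
        count += 1
        done = count >= limit
        return f"obs{count}", (1.0 if done else 0.0), done

    return env_step


trajectory = collect_rollout(EchoAgent(), make_toy_env(limit=3), "start")
```

Keeping the agent behind a single `act` method is only one possible design; it keeps the rollout loop unaware of whether the agent is raw Python or a framework-built graph wrapped in an adapter.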
Child issues
Reference agents
Pre-built agent integrations
Framework integration paths
Docs
- agent-server/index.md (should provide context and summarize supported integrations)
- agent-server/multi-step-agent.md (Add multi-turn agent server and tic-tac-toe environment #996)
- agent-server/multi-turn-agent.md (Add multi-turn agent server and tic-tac-toe environment #996)