Democratizing Reinforcement Learning for LLMs
Updated Apr 17, 2026 - Python
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
Official repository of DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
[arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
RL on the Qwen3-Base family of models on GSM8K using verl: is there an RL power law on downstream tasks?
PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory
Using automated curriculum learning to improve the RL training process of LLMs.
[ACL 2026 main] DGPO: Distillation-Guided Policy Optimization for Preserving Agentic RAG Capabilities
A list of uv environment templates for LLM development.
AVR: Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
Sample for Fine-Tuning LLMs & VLMs
Training script using VeRL for multi-turn GRPO with MCP tool calling
[ACL 2026 Findings] SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context