RL-Trained LLM for End-to-End Data Recipe Generation #1760

@arhamm1

What:
Add integration with DataChef (arXiv:2602.11089, Feb 2026), a 32B LLM trained via RL to generate complete end-to-end NeMo Curator pipeline specifications (synthesis strategy, filter chain, mixing ratios) given a target benchmark and a base model. The integration exposes a DataChefRecipeGenerator that outputs a valid NeMo Curator config YAML.
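
To make the three recipe components concrete, here is a minimal sketch of how a parsed specification could be held and sanity-checked in memory. The `RecipeSpec` class, its field names, and the rule that mixing ratios sum to 1 are assumptions for illustration; neither DataChef nor NeMo Curator is known to define this schema.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecipeSpec:
    """Hypothetical container for the three recipe components named above."""

    synthesis_strategy: str
    filter_chain: List[str] = field(default_factory=list)
    mixing_ratios: Dict[str, float] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means no issue was found."""
        problems = []
        if not self.synthesis_strategy:
            problems.append("synthesis_strategy is empty")
        if not self.mixing_ratios:
            problems.append("mixing_ratios is empty")
        elif any(weight < 0 for weight in self.mixing_ratios.values()):
            problems.append("mixing_ratios contains a negative weight")
        elif not math.isclose(sum(self.mixing_ratios.values()), 1.0, abs_tol=1e-6):
            # Assumed convention: per-source weights are normalized fractions.
            problems.append("mixing_ratios do not sum to 1")
        return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a generated recipe at once.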

Why:
DataChef reports 66.7 on AIME'25 for a math-adapted Qwen3-1.7B model, surpassing the official Qwen3 post-training checkpoint for the same base model, and it matches human expert curation across 6 held-out tasks. The RL-trained recipe generator eliminates the manual trial and error of pipeline design, which is the primary bottleneck in practice.

Definition of Done:

  • DataChefRecipeGenerator under nemo_curator/recipe/
  • Interface: accepts target_benchmark: str, base_model_id: str, available_data_sources: List[str], compute_budget_tokens: int
  • Calls the DataChef API (hosted or local) with a structured prompt encoding the above
  • Parses the DataChef output into a valid NeMo Curator pipeline config YAML
  • Config validation: dry-runs the generated pipeline on a 1M-token sample before full execution
  • Proxy reward integration: evaluates generated recipe quality on a fast proxy before committing to a full run
  • Fallback: if DataChef is unavailable, outputs a best-practice template config for the domain
  • Tutorial: generate and execute a math-specialization recipe using DataChef → NeMo Curator pipeline
  • Integration test: generated YAML is parseable and passes NeMo Curator config validation
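
The interface, prompt, and fallback items above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a proposed final design: `RecipeRequest`, `build_prompt`, `FALLBACK_TEMPLATE`, and the `client` callable are hypothetical names, and the sketch assumes DataChef replies with JSON, which the issue does not specify. The dry-run and proxy-reward steps are left out.

```python
import copy
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


@dataclass
class RecipeRequest:
    """The four inputs listed in the interface item."""

    target_benchmark: str
    base_model_id: str
    available_data_sources: List[str]
    compute_budget_tokens: int


def build_prompt(req: RecipeRequest) -> str:
    """Encode the request as a JSON payload inside a plain-text instruction."""
    payload = json.dumps(asdict(req), sort_keys=True)
    return "Generate a data recipe for this request:\n" + payload


# Placeholder only; a real fallback would be a vetted per-domain template.
FALLBACK_TEMPLATE = {"source": "fallback_template", "stages": []}


class DataChefRecipeGenerator:
    def __init__(self, client: Optional[Callable[[str], str]] = None):
        # Hypothetical transport: takes a prompt, returns DataChef's raw reply.
        self.client = client

    def generate(self, req: RecipeRequest) -> dict:
        if req.compute_budget_tokens <= 0:
            raise ValueError("compute_budget_tokens must be positive")
        if not req.available_data_sources:
            raise ValueError("available_data_sources must not be empty")
        if self.client is None:
            return copy.deepcopy(FALLBACK_TEMPLATE)
        try:
            recipe = json.loads(self.client(build_prompt(req)))
        except (ConnectionError, TimeoutError, json.JSONDecodeError):
            # Service unreachable or reply unparseable: use the template.
            return copy.deepcopy(FALLBACK_TEMPLATE)
        if not isinstance(recipe, dict):
            return copy.deepcopy(FALLBACK_TEMPLATE)
        recipe["source"] = "datachef"
        return recipe
```

Injecting the transport as a callable keeps the hosted and local cases behind one interface and makes the fallback path easy to exercise in tests. Serializing the returned dict to YAML, for example with PyYAML's `yaml.safe_dump`, is omitted to keep the sketch dependency-free.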

Metadata

Labels: enhancement (New feature or request)