RL-Trained LLM for End-to-End Data Recipe Generation #1760

What:
Add integration with DataChef (arXiv:2602.11089, Feb 2026), a 32B LLM trained via RL to generate complete end-to-end NeMo Curator pipeline specifications (synthesis strategy, filter chain, mixing ratios) given a target benchmark and a base model. The integration exposes a DataChefRecipeGenerator that outputs a valid NeMo Curator config YAML.

Why:
DataChef achieves 66.7 on AIME'25 with a Qwen3-1.7B math-adapted model, surpassing the official Qwen3 post-training checkpoint for the same base model, and it matches human expert curation across 6 held-out tasks. An RL-trained recipe generator removes the manual trial and error of pipeline design, which is the primary bottleneck in practice.

Definition of Done:
  - DataChefRecipeGenerator lives under nemo_curator/recipe/
  - Interface: accepts target_benchmark: str, base_model_id: str, available_data_sources: List[str], compute_budget_tokens: int
  - Calls the DataChef API (hosted or local) with a structured prompt encoding the inputs above
  - Parses DataChef output into a valid NeMo Curator pipeline config YAML
  - Config validation: performs a dry run of the generated pipeline on a 1M-token sample before full execution
  - Proxy reward integration: scores the generated recipe on a fast proxy evaluation before committing to a full run
  - Fallback: if DataChef is unavailable, outputs a best-practice template config for the domain
  - Tutorial: generate and execute a math-specialization recipe using DataChef → NeMo Curator pipeline
  - Integration test: generated YAML is parseable and passes NeMo Curator config validation
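
To make the interface and fallback items above concrete, here is a minimal Python sketch of how the generator could be shaped. Everything beyond the four input names is an assumption for illustration: `RecipeRequest`, `fallback_recipe`, the injected `client` callable, the recipe keys, and the idea that DataChef replies with JSON are all hypothetical, not a confirmed DataChef or NeMo Curator API. YAML serialization, the dry run, and the proxy evaluation are left out.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RecipeRequest:
    """Inputs listed in the Definition of Done (the class itself is hypothetical)."""
    target_benchmark: str
    base_model_id: str
    available_data_sources: List[str]
    compute_budget_tokens: int


def fallback_recipe(req: RecipeRequest) -> dict:
    """Template returned when DataChef cannot be reached.

    The keys are placeholders, not real NeMo Curator config keys.
    """
    return {
        "source": "fallback_template",
        "target_benchmark": req.target_benchmark,
        "base_model_id": req.base_model_id,
        "data_sources": list(req.available_data_sources),
        "compute_budget_tokens": req.compute_budget_tokens,
        "stages": [],
    }


class DataChefRecipeGenerator:
    """Sketch only: `client` stands in for a hosted or local DataChef endpoint."""

    def __init__(self, client: Optional[Callable[[str], str]] = None) -> None:
        self._client = client

    def build_prompt(self, req: RecipeRequest) -> str:
        # Structured prompt: the request fields encoded as deterministic JSON.
        return json.dumps(asdict(req), sort_keys=True)

    def generate(self, req: RecipeRequest) -> dict:
        if req.compute_budget_tokens <= 0:
            raise ValueError("compute_budget_tokens must be positive")
        if self._client is None:
            return fallback_recipe(req)
        try:
            # Assumption: the model replies with a JSON object describing the recipe.
            recipe = json.loads(self._client(self.build_prompt(req)))
        except (ConnectionError, TimeoutError, json.JSONDecodeError):
            return fallback_recipe(req)
        if not isinstance(recipe, dict):
            return fallback_recipe(req)
        recipe.setdefault("source", "datachef")
        return recipe
```

A possible usage, with a stub standing in for the model: `DataChefRecipeGenerator(lambda prompt: '{"stages": ["filter"]}').generate(req)` returns the parsed recipe tagged with `"source": "datachef"`, while a generator built without a client, or one whose client raises `ConnectionError`, returns the fallback template. Injecting the client as a callable keeps the hosted and local cases behind one seam and lets the integration test run without network access.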
