Make prompt dataloader `num_workers`/`persistent_workers` configurable by jamesbraza · Pull Request #1746 · NovaSky-AI/SkyRL

jamesbraza · 2026-06-03T22:36:05Z

Adds a data.dataloader config section exposing num_workers and persistent_workers(previously hardcoded) with backward-compatible defaults.

Closes #1744.

gemini-code-assist

Code Review

This pull request introduces a new DataLoaderConfig to configure num_workers and persistent_workers for training and evaluation dataloaders, replacing previous hardcoded values. It also adds validation rules and unit tests to verify the configuration resolution. The reviewer recommends disabling persistent_workers for evaluation dataloaders (is_train=False) to prevent unnecessary resource retention and potential delays in releasing worker processes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

The prompt DataLoader's num_workers and persistent_workers were hardcoded (num_workers auto-derived from whether the inference HTTP endpoint is on), so a run could not opt into in-process loading. Add a data.dataloader config section exposing both, documented via field metadata. num_workers defaults to None and is auto-derived in SkyRLTrainConfig.__post_init__ (0 with the HTTP endpoint, else 8), preserving prior behavior; persistent_workers defaults to False. __post_init__ validates the resolved values: persistent_workers requires num_workers > 0, and num_workers must be None or >= 0. Setting num_workers=0 loads the in-memory dataset in-process, avoiding the worker respawn (and venv re-import it triggers) at every epoch boundary that can hang silently when a shared filesystem such as NFS is overwhelmed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

build_dataloader now reads the resolved data.dataloader.num_workers and persistent_workers instead of hardcoding the worker count and deriving the multiprocessing context from whether the inference HTTP endpoint is on. The spawn start method is derived from the effective worker count (spawn only when num_workers > 0), which preserves the prior default and makes num_workers=0 valid. Resolves NovaSky-AI#1744 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread skyrl/train/utils/trainer_utils.py

jamesbraza and others added 2 commits June 3, 2026 15:52

jamesbraza force-pushed the dataloader-worker-config branch from d4ae38f to 164987d Compare June 3, 2026 23:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make prompt dataloader `num_workers`/`persistent_workers` configurable#1746

Make prompt dataloader `num_workers`/`persistent_workers` configurable#1746
jamesbraza wants to merge 2 commits into
NovaSky-AI:mainfrom
EdisonScientific:dataloader-worker-config

jamesbraza commented Jun 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jamesbraza commented Jun 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant