Skip to content

Instantiate metric recorders on rank 0 only#1072

Merged
uralik merged 1 commit intomainfrom
metrics
Feb 27, 2025
Merged

Instantiate metric recorders on rank 0 only#1072
uralik merged 1 commit intomainfrom
metrics

Conversation

@cbalioglu
Copy link
Copy Markdown
Contributor

This PR ensures that MetricRecorder instances are constructed only on rank 0. Some recorders such as WandbRecorder has the side effect of setting up their environment during construction which causes redundant/duplicate artifacts to be generated. The actual metric records are not affected though since we always write them on rank 0.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 27, 2025
Copy link
Copy Markdown
Contributor

@zyaoj zyaoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@uralik uralik merged commit 64af489 into main Feb 27, 2025
15 checks passed
@cbalioglu cbalioglu deleted the metrics branch February 27, 2025 20:17
uralik pushed a commit that referenced this pull request Apr 3, 2025
* Add Skywork Reward Model

* reorder

* working checkpoint with skywork

* unify VllmReward class

* cleanup wrap text

* actually not quite working. needs dp gather and scatter

* add gather scatter for vllm rm

* working checkpoint

* working checkpoint

* updates

* Instantiate metric recorders on rank 0 only (#1072)

* fix gather bug

* cleanup

* comment

* add grpo (runnable, but not increasing rewards yet

* log outputs

* cleannup

* rename

* merge

* fixes

* testing online dpo after merge

* bug fix

* fixing merge errors

* fix bugs

* merged with online_training

* update grpo

* cleanup

* fix grpo bug

* cleanup

* cleanup

* isort/black

* move vllm generate_rewards to its own function

* refactor how reward models use prompt_batch

* remove breakpoint

* working chkpt

* remove irrelevant skywork stuff

---------

Co-authored-by: jacklanchantin user <jacklanchantin@submit-2.fair-aws-h100-1.hpcaas>
Co-authored-by: Can Balioglu <cbalioglu@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants