policy: add HistoryBuffer for past-observation context #199
Open
lucas-maes wants to merge 1 commit into main
Introduce HistoryBuffer, a per-env ring buffer over batched info dicts, used by WorldModelPolicy when history_len > 1 to feed strided history into the planner. Actions are aggregated as macro-blocks of length action_block via the block_keys argument so the planner sees one block per stride point.

Auto-derived history buffer max_len uses history_len * action_block, which is the smallest size that yields history_len full action blocks (the strided formula was one short for block keys).

_prepare_info now runs before the buffer append so processed tensors are stored, which avoids re-applying the action scaler to block-aggregated shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
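For readers skimming the description, here is a minimal sketch of the idea: a ring buffer that returns strided values for ordinary keys and grouped blocks for block keys. It is not the `HistoryBuffer` from this PR. The class name `ToyHistoryBuffer`, its `get` method, and the `stride` parameter are hypothetical, and it tracks a single env with plain Python values instead of batched tensors.

```python
from collections import deque


class ToyHistoryBuffer:
    """Illustrative stand-in only; names and signatures are assumptions."""

    def __init__(self, max_len, block_keys=()):
        self.block_keys = set(block_keys)
        # deque(maxlen=...) acts as a ring buffer: the oldest step is evicted.
        self.steps = deque(maxlen=max_len)

    def append(self, info):
        self.steps.append(dict(info))

    def get(self, history_len, stride):
        steps = list(self.steps)
        if not steps:
            return {}
        out = {}
        for key in steps[-1]:
            values = [s[key] for s in steps]
            if key in self.block_keys:
                # Group the most recent steps into blocks of `stride`
                # consecutive values, oldest block first. Steps that do not
                # fill a complete block are dropped.
                n_blocks = min(history_len, len(values) // stride)
                tail = values[len(values) - n_blocks * stride:]
                out[key] = [tail[i:i + stride] for i in range(0, len(tail), stride)]
            else:
                # Take every `stride`-th value, ending at the most recent step.
                picked = values[::-1][::stride][:history_len]
                out[key] = picked[::-1]
        return out


buf = ToyHistoryBuffer(max_len=6, block_keys={"action"})
for t in range(8):
    buf.append({"obs": t, "action": 10 + t})

history = buf.get(history_len=3, stride=2)
# history["obs"]    == [3, 5, 7]
# history["action"] == [[12, 13], [14, 15], [16, 17]]
```

With `max_len=6` the two oldest steps have been evicted, so the observation history is sampled at every second step and the action history holds three complete blocks of two.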
Summary
- `HistoryBuffer`: per-env ring buffer over batched info dicts, with strided history retrieval and macro-block aggregation for action keys (`block_keys`).
- `WorldModelPolicy` now maintains one when `history_len > 1` and feeds strided history into the planner.
- Auto-derived `max_len = history_len * action_block`, the smallest size that yields `history_len` full action blocks (the strided formula was one short for block keys).
- `_prepare_info` runs before the buffer append, so the buffer stores already-processed tensors. Avoids re-applying the action scaler to block-aggregated shapes (which previously raised `X has 10 features, but StandardScaler is expecting 2 features as input`).
- `docs/api/buffer.md` page wired into nav; short "Observation history" subsection in `quick_start.md`.

Test plan

- `pytest tests/` (733 passed, 6 skipped)
- `mkdocs build` (no new warnings)
- `history_len > 1, action_block > 1` to confirm action history reaches `history_len` blocks (previously capped at `history_len - 1`).

🤖 Generated with Claude Code
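The sizing argument can be checked with a few lines of arithmetic. The PR does not quote the earlier strided formula, so the expression `(history_len - 1) * action_block + 1` below is an assumption, chosen because it is just enough to reach `history_len` stride points and reproduces the "one short" behaviour described above. The helper `full_action_blocks` is hypothetical.

```python
def full_action_blocks(max_len, action_block):
    """Number of complete action blocks that fit in a buffer of max_len steps."""
    return max_len // action_block


history_len, action_block = 3, 5

# Assumed form of the earlier strided sizing (a guess, not quoted from the PR).
strided_len = (history_len - 1) * action_block + 1
# Sizing described in this PR.
block_len = history_len * action_block

blocks_with_strided_len = full_action_blocks(strided_len, action_block)
blocks_with_block_len = full_action_blocks(block_len, action_block)
blocks_one_step_smaller = full_action_blocks(block_len - 1, action_block)
```

Under that assumption the strided size holds only `history_len - 1` full blocks whenever `action_block > 1`, while `history_len * action_block` holds exactly `history_len`, and any smaller buffer falls short again.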