policy: add HistoryBuffer for past-observation context #199

Open
lucas-maes wants to merge 1 commit into main from buffer

@lucas-maes
Collaborator

Summary

  • New HistoryBuffer: per-env ring buffer over batched info dicts, with strided history retrieval and macro-block aggregation for action keys (block_keys).
  • WorldModelPolicy now maintains a HistoryBuffer when history_len > 1 and feeds strided history into the planner. The auto-derived max_len is history_len * action_block, the smallest size that yields history_len full action blocks (the strided formula was one short for block keys).
  • _prepare_info runs before the buffer append, so the buffer stores already-processed tensors. This avoids re-applying the action scaler to block-aggregated shapes, which previously raised "X has 10 features, but StandardScaler is expecting 2 features as input".
  • Docs: new docs/api/buffer.md page wired into nav; short "Observation history" subsection in quick_start.md.
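To make the two retrieval modes in the summary concrete, here is a minimal single-env sketch in plain Python. It is not the HistoryBuffer added by this PR: the class name SketchHistoryBuffer, its method names, and the use of Python lists instead of batched tensors are all assumptions made for illustration. It only shows the idea of strided lookup for observation keys versus macro-block aggregation for keys listed in block_keys.

```python
from collections import deque


class SketchHistoryBuffer:
    """Toy single-env ring buffer (hypothetical; not the PR's HistoryBuffer)."""

    def __init__(self, max_len, block_keys=()):
        self.steps = deque(maxlen=max_len)  # oldest entries fall off the left
        self.block_keys = set(block_keys)

    def append(self, info):
        self.steps.append(dict(info))

    def strided(self, key, history_len, stride):
        # Walk back from the newest step in jumps of `stride`, return oldest first.
        items = list(self.steps)
        picked = items[::-1][::stride][:history_len]
        return [step[key] for step in reversed(picked)]

    def blocks(self, key, history_len, block):
        # Group the newest steps into full blocks of `block` consecutive values.
        values = [step[key] for step in self.steps]
        n_full = min(history_len, len(values) // block)
        tail = values[len(values) - n_full * block:]
        return [tail[i:i + block] for i in range(0, len(tail), block)]

    def get(self, key, history_len, stride):
        # Block keys get one macro-block per stride point; other keys are strided.
        if key in self.block_keys:
            return self.blocks(key, history_len, stride)
        return self.strided(key, history_len, stride)


buf = SketchHistoryBuffer(max_len=6, block_keys={"action"})
for t in range(8):
    buf.append({"pixels": t, "action": 10 + t})

obs = buf.get("pixels", history_len=3, stride=2)   # every 2nd observation
act = buf.get("action", history_len=3, stride=2)   # three blocks of 2 actions
```

With max_len = history_len * stride = 6, the toy buffer returns three strided observations and three full action blocks, which matches the sizing rule described in the second bullet.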

Test plan

  • pytest tests/ (733 passed, 6 skipped)
  • mkdocs build (no new warnings)
  • Eval with history_len > 1, action_block > 1 to confirm action history reaches history_len blocks (previously capped at history_len - 1).
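The last check can be reasoned through with a little arithmetic. The sketch below assumes the earlier "strided formula" was (history_len - 1) * action_block + 1, which the PR text does not spell out; the helper names strided_len and full_blocks are hypothetical, and the values 3 and 5 are example inputs only.

```python
def strided_len(history_len, action_block):
    # Assumed earlier sizing: just enough steps for `history_len` strided picks.
    return (history_len - 1) * action_block + 1


def full_blocks(max_len, action_block):
    # Number of complete action blocks that fit in a buffer of `max_len` steps.
    return max_len // action_block


history_len, action_block = 3, 5

old_len = strided_len(history_len, action_block)
new_len = history_len * action_block

old_blocks = full_blocks(old_len, action_block)
new_blocks = full_blocks(new_len, action_block)

# Smallest buffer length that holds `history_len` full blocks.
smallest = min(
    n for n in range(1, 100) if full_blocks(n, action_block) >= history_len
)
```

Under that assumption, the old size holds only history_len - 1 full blocks whenever action_block > 1, while history_len * action_block is exactly the smallest length that holds history_len of them.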

🤖 Generated with Claude Code

Introduce HistoryBuffer, a per-env ring buffer over batched info dicts,
used by WorldModelPolicy when history_len > 1 to feed strided history
into the planner. Actions are aggregated as macro-blocks of length
action_block via the block_keys argument so the planner sees one block
per stride point.

Auto-derived history buffer max_len uses history_len * action_block,
which is the smallest size that yields history_len full action blocks
(the strided formula was one short for block keys).

_prepare_info now runs before the buffer append so processed tensors
are stored — avoids re-applying the action scaler to block-aggregated
shapes.
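A small sketch of why the ordering matters, under stated assumptions: ToyScaler is a hypothetical stand-in for a fitted per-feature scaler (it is not scikit-learn's StandardScaler), and the 2-D actions grouped into blocks of 5 are inferred from the "10 features vs 2 features" error quoted in the summary, not confirmed by the PR.

```python
class ToyScaler:
    """Stand-in for a fitted per-feature scaler; not scikit-learn's class."""

    def __init__(self, mean, scale):
        self.mean, self.scale = list(mean), list(scale)

    def transform(self, row):
        # Reject rows whose width differs from the width seen at fit time.
        if len(row) != len(self.mean):
            raise ValueError(
                f"row has {len(row)} features, but scaler expects {len(self.mean)}"
            )
        return [(x - m) / s for x, m, s in zip(row, self.mean, self.scale)]


scaler = ToyScaler(mean=[0.0, 1.0], scale=[2.0, 4.0])
raw_actions = [[2.0, 5.0]] * 5  # five 2-D actions forming one block

# Scale each step first, then aggregate the block (process-then-store order).
scaled_block = [x for a in raw_actions for x in scaler.transform(a)]

# Reverse order: aggregate raw steps first, then try to scale the 10-wide block.
raw_block = [x for a in raw_actions for x in a]
try:
    scaler.transform(raw_block)
    failed = False
except ValueError:
    failed = True
```

Scaling per step and aggregating afterwards keeps the scaler's input at the width it was fitted on; scaling an already aggregated block hits the width check instead.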

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>