
Add benchmark budget calibration workflow #168

Merged
AbdelStark merged 1 commit into main from issue-146-budget-calibration on May 5, 2026

@AbdelStark (Owner)

Problem

Maintainers need a reviewable way to calibrate benchmark budgets from preserved baseline reports before they change the release budget files.

Closes #146.

Changes

  • Added worldforge.benchmark_calibration to read preserved benchmark JSON reports, record source report SHA-256 digests, capture baseline context, and generate loadable candidate budget files.
  • Added scripts/calibrate_benchmark_budgets.py to write budget-calibration.json, candidate-budgets.json, and a Markdown review report without modifying existing budget files.
  • Added regression coverage for candidate generation, review diffs, malformed inputs, script execution, and preservation of existing budget failure behavior.
  • Documented the threshold-loosening review rule in the benchmarking, operations, and playbooks docs, the claim evidence map, the changelog, and the WF-B6 roadmap tracker.
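The changes above describe a read-only calibration flow: hash each preserved report, derive candidate budgets, diff them against the current budgets, and flag any loosened threshold for review. The sketch below illustrates that flow under explicit assumptions: the flat metric-to-value report schema, upper-bound budget semantics, the 10% headroom factor, the Markdown file name, and every function name are hypothetical and do not reflect the actual worldforge.benchmark_calibration API.

```python
"""Hypothetical sketch of budget calibration from preserved baseline reports.

Assumptions (not WorldForge's real schema or API): each report is a JSON object
mapping metric names to observed values, and each budget is an upper bound.
"""
import hashlib
import json
import math
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a report file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def candidate_budgets(reports: list, headroom: float = 0.10) -> dict:
    """Propose one upper-bound budget per metric: worst observed value plus headroom."""
    worst: dict = {}
    for report in reports:
        if not isinstance(report, dict):
            raise ValueError(f"malformed report: {report!r}")
        for metric, value in report.items():
            # Reject non-numeric, boolean, NaN, or infinite observations.
            if not isinstance(value, (int, float)) or isinstance(value, bool) or not math.isfinite(value):
                raise ValueError(f"malformed value for {metric!r}: {value!r}")
            worst[metric] = max(worst.get(metric, value), value)
    return {metric: round(value * (1 + headroom), 3) for metric, value in sorted(worst.items())}


def review_diff(current: dict, candidate: dict) -> list:
    """List every metric whose budget would change; flag loosened thresholds."""
    rows = []
    for metric in sorted(current.keys() | candidate.keys()):
        old, new = current.get(metric), candidate.get(metric)
        if old == new:
            continue
        rows.append({
            "metric": metric,
            "current": old,
            "candidate": new,
            # Raising an upper bound, or dropping a budget entirely, loosens the gate.
            "loosens": new is None or (old is not None and new > old),
        })
    return rows


def calibrate(report_paths: list, budget_path: Path, out_dir: Path) -> list:
    """Write calibration artifacts to out_dir; budget_path is only ever read."""
    reports = [json.loads(p.read_text()) for p in report_paths]
    current = json.loads(budget_path.read_text())
    candidate = candidate_budgets(reports)
    diff = review_diff(current, candidate)
    out_dir.mkdir(parents=True, exist_ok=True)
    calibration = {
        # Record source digests so reviewers can tie candidates to exact reports.
        "sources": [{"path": str(p), "sha256": sha256_file(p)} for p in report_paths],
        "diff": diff,
    }
    (out_dir / "budget-calibration.json").write_text(json.dumps(calibration, indent=2))
    (out_dir / "candidate-budgets.json").write_text(json.dumps(candidate, indent=2))
    lines = ["| metric | current | candidate | loosens |", "| --- | --- | --- | --- |"]
    lines += [f"| {r['metric']} | {r['current']} | {r['candidate']} | {r['loosens']} |" for r in diff]
    (out_dir / "budget-calibration.md").write_text("\n".join(lines) + "\n")
    return diff
```

Because calibrate only reads the existing budget file and writes to a separate output directory, adopting a candidate stays a manual, reviewable step; rows with loosens set to True are the kind of change the threshold-loosening review rule is meant to surface.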

Validation

  • uv lock --check
  • uv run ruff check src tests examples scripts
  • uv run ruff format --check src tests examples scripts
  • uv run python scripts/generate_provider_docs.py --check
  • uv run pytest tests/test_benchmark.py tests/test_benchmark_budget_calibration.py -q (58 passed)
  • uv run pytest tests/test_docs_site.py tests/test_benchmark_budget_calibration.py tests/test_benchmark.py -q (73 passed)
  • uv run mkdocs build --strict
  • uv run pytest (758 passed)
  • uv run --extra harness pytest --cov=src/worldforge --cov-report=term-missing --cov-fail-under=90 (758 passed, 90.10% coverage)
  • bash scripts/test_package.sh (679 passed, 79 skipped)
  • uv build --out-dir dist --clear --no-build-logs

AbdelStark merged commit 23a1c4a into main on May 5, 2026 (8 checks passed).
AbdelStark deleted the issue-146-budget-calibration branch on May 5, 2026 at 14:33.

Linked issue: WF-B6: Calibrate benchmark budgets from preserved baselines