Add SSTB QBI split inputs to us-data #701

Open
PavelMakarchuk wants to merge 5 commits into main from
draft/sstb-qbi-inputs

@PavelMakarchuk
Collaborator

Summary

Add the SSTB split inputs needed by the parallel policyengine-us QBID changes.

This PR exposes the following variables in the PUF/calibration pipeline, using the existing business_is_sstb flag:

  • sstb_self_employment_income
  • sstb_w2_wages_from_qualified_business
  • sstb_unadjusted_basis_qualified_property

Implementation

  • Split self_employment_income into non-SSTB and SSTB pieces using the current all-or-nothing business_is_sstb indicator.
  • Expose SSTB allocable W-2 wages and UBIA fields alongside the existing aggregate QBI guard-rail variables.
  • Add the new fields to IMPUTED_VARIABLES so they flow through PUF-based calibration.
  • Add targeted calibration-list tests and document the new variables in the appendix.
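
The all-or-nothing split described in the first bullet can be sketched roughly as follows. This is a minimal illustration and not the code in puf.py: the helper name split_by_sstb_flag and the sample values are hypothetical, and plain Python lists stand in for whatever array type the pipeline actually uses.

```python
def split_by_sstb_flag(amounts, business_is_sstb):
    """Split per-record amounts into (non-SSTB, SSTB) pieces.

    Hypothetical helper: each record's full amount goes to exactly one
    side, depending on its all-or-nothing SSTB flag.
    """
    non_sstb = [0.0 if flag else amt for amt, flag in zip(amounts, business_is_sstb)]
    sstb = [amt if flag else 0.0 for amt, flag in zip(amounts, business_is_sstb)]
    return non_sstb, sstb


# Made-up sample records: Schedule C income and the SSTB indicator.
se_income = [50_000.0, -2_000.0, 0.0, 12_500.0]
is_sstb = [True, False, False, True]

non_sstb_income, sstb_income = split_by_sstb_flag(se_income, is_sstb)

# The two pieces always add back up to the original amount.
recombined = [a + b for a, b in zip(non_sstb_income, sstb_income)]
```

Under this sketch the same helper would apply unchanged to W-2 wages and UBIA, since all three splits rely on the same flag.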

Important limitation

This does not infer mixed SSTB and non-SSTB allocations within the same record. The current data pipeline only has an all-or-nothing SSTB flag, so mixed-category wage/UBIA allocation remains approximate until more granular source data or imputation is added.
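
To make the limitation concrete, here is a small worked example under stated assumptions. The record, the 25% "true" SSTB share, and the helper allocate_all_or_nothing are all hypothetical; the true share is exactly the quantity the current data cannot observe.

```python
def allocate_all_or_nothing(total, business_is_sstb):
    """Return (non_sstb, sstb) with the whole amount on one side (hypothetical helper)."""
    return (0.0, total) if business_is_sstb else (total, 0.0)


# Hypothetical mixed record: 25% of W-2 wages truly belong to SSTB activity.
w2_wages = 100_000.0
true_sstb_wages = 0.25 * w2_wages

# Whichever way the flag is set, the SSTB allocation misses the true amount.
_, sstb_if_flagged = allocate_all_or_nothing(w2_wages, True)
_, sstb_if_unflagged = allocate_all_or_nothing(w2_wages, False)

overstatement = sstb_if_flagged - true_sstb_wages     # flag says SSTB
understatement = true_sstb_wages - sstb_if_unflagged  # flag says non-SSTB
```

In this made-up case the SSTB wages are overstated by 75,000 when the record is flagged and understated by 25,000 when it is not, which is the kind of approximation error the note above refers to.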

Verification

  • PYTHONDONTWRITEBYTECODE=1 python -m py_compile policyengine_us_data/datasets/puf/puf.py policyengine_us_data/calibration/puf_impute.py policyengine_us_data/tests/test_calibration/test_puf_impute.py
  • pytest -q policyengine_us_data/tests/test_calibration/test_puf_impute.py (blocked here because this environment is missing torch, imported by the repo test setup)

@PavelMakarchuk PavelMakarchuk marked this pull request as ready for review April 10, 2026 14:21
Contributor

@MaxGhenis MaxGhenis left a comment

Requesting changes.

  • GitHub reports mergeable=CONFLICTING and mergeStateStatus=DIRTY; a local git merge-tree --write-tree --name-only HEAD FETCH_HEAD confirms a content conflict in policyengine_us_data/datasets/puf/puf.py. That explains the missing CI on this PR.
  • The PUF split moves SSTB Schedule C income out of self_employment_income and into sstb_self_employment_income, but SOI replication still builds business_net_profits and business_net_losses from only pe("self_employment_income") in policyengine_us_data/utils/soi.py. After this change, SSTB Schedule C profits/losses disappear from those total business-income comparisons. Please aggregate self_employment_income + sstb_self_employment_income anywhere the statistic is total Schedule C/self-employment income, or keep a separate total variable for validation.
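
The aggregation requested in the second bullet could look something like the sketch below. It is not the code in policyengine_us_data/utils/soi.py: the helper names, the sample values, and the convention of reporting losses as positive magnitudes are assumptions made for illustration.

```python
def total_schedule_c_income(self_employment_income, sstb_self_employment_income):
    """Recombine the non-SSTB and SSTB pieces into total Schedule C income (hypothetical helper)."""
    return [a + b for a, b in zip(self_employment_income, sstb_self_employment_income)]


def net_profits_and_losses(income):
    """Split income into per-record profits and losses (losses as positive magnitudes, an assumed convention)."""
    profits = [max(x, 0.0) for x in income]
    losses = [max(-x, 0.0) for x in income]
    return profits, losses


# Made-up records after the split: the first record is an SSTB business.
self_employment_income = [0.0, -2_000.0, 8_000.0]
sstb_self_employment_income = [50_000.0, 0.0, 0.0]

# Using only the non-SSTB piece drops the SSTB record's profit.
profits_non_sstb_only, _ = net_profits_and_losses(self_employment_income)

# Using the recombined total restores it.
total = total_schedule_c_income(self_employment_income, sstb_self_employment_income)
profits_total, losses_total = net_profits_and_losses(total)
```

With these sample values, total profits are 8,000 when computed from the non-SSTB piece alone and 58,000 when computed from the recombined total, which mirrors the gap described in the review comment.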

Verified locally: uv run pytest policyengine_us_data/tests/test_calibration/test_puf_impute.py -q passed: 14 tests.

@vercel

vercel bot commented Apr 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
pipeline-diagrams Error Error Apr 10, 2026 8:54pm

Contributor

@MaxGhenis MaxGhenis left a comment

Approved after pushing the main merge/conflict resolution and SOI total Schedule C fix. Local verification: tests/unit/calibration/test_calibration_puf_impute.py and tests/unit/test_soi_utils.py pass.
