Add entity-level HDFStore output format alongside h5py (#567)

## Motivation

The API v2 alpha and the `policyengine` package's `PolicyEngineUSDataset` require **entity-level Pandas HDFStore format**: one table per entity (person, household, tax_unit, spm_unit, family, marital_unit). Currently, `-us-data` publishes only the **variable-centric h5py format** (`variable/year → array`).

Converting between these formats via `create_datasets()` is extremely slow (over an hour per state) because it routes every variable through `sim.calculate()`, which invokes the full simulation engine's dependency resolution for each variable in each year.

The UK avoids this: `-uk-data` publishes entity-level HDFStore directly, and `policyengine-uk` has `extend_single_year_dataset()`, which uprates DataFrames via simple multiplication with no simulation engine needed.

## Changes

### 1. HDFStore serialization in `stacked_dataset_builder.py`

After the existing h5py serialization, `create_sparse_cd_stacked_dataset()` now also:

- **Splits `combined_df` into entity DataFrames** — classifies each variable by entity using `system.variables[var].entity.key`, deduplicates group entities by entity ID
- **Builds an uprating manifest** — records each variable's entity and uprating parameter path (from `system.variables[var].uprating`)
- **Saves as HDFStore** — `.hdfstore.h5` suffix alongside the existing `.h5` file
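
The first two steps can be sketched roughly as follows. This is a minimal illustration, not the code in `stacked_dataset_builder.py`: the dictionaries `variable_entities` and `upratings` are stand-ins for lookups against `system.variables[var].entity.key` and `system.variables[var].uprating`, and the column naming (plain variable names, one `<entity>_id` column per group entity) is an assumption about `combined_df`.

```python
import pandas as pd

# Group entities listed in this issue; each is assumed to have a "<entity>_id" column.
GROUP_ENTITIES = ["household", "tax_unit", "spm_unit", "family", "marital_unit"]


def split_by_entity(combined_df, variable_entities):
    """Split a flat person-level frame into one DataFrame per entity."""
    id_cols = [f"{e}_id" for e in GROUP_ENTITIES if f"{e}_id" in combined_df.columns]
    person_vars = [
        c for c in combined_df.columns
        if variable_entities.get(c) == "person" and c not in id_cols
    ]
    # The person table keeps the membership IDs so rows can be joined back to groups.
    tables = {"person": combined_df[person_vars + id_cols].reset_index(drop=True)}
    for id_col in id_cols:
        entity = id_col[: -len("_id")]
        entity_vars = [
            c for c in combined_df.columns
            if variable_entities.get(c) == entity and c != id_col
        ]
        # One row per group: drop the repeats that come from person-level rows.
        tables[entity] = (
            combined_df[[id_col] + entity_vars]
            .drop_duplicates(subset=id_col)
            .reset_index(drop=True)
        )
    return tables


def build_manifest(variable_entities, upratings):
    """Record each variable's entity and uprating parameter path ("" if none)."""
    return pd.DataFrame(
        {
            "variable": list(variable_entities),
            "entity": list(variable_entities.values()),
            "uprating": [upratings.get(v) or "" for v in variable_entities],
        }
    )


# Toy data (hypothetical): two households, three people.
combined_df = pd.DataFrame(
    {
        "household_id": [1, 1, 2],
        "tax_unit_id": [10, 11, 20],
        "age": [40, 12, 65],
        "employment_income": [50_000.0, 0.0, 0.0],
        "household_weight": [100.0, 100.0, 250.0],
    }
)
variable_entities = {
    "age": "person",
    "employment_income": "person",
    "household_weight": "household",
}
upratings = {"employment_income": "hypothetical.uprating.path"}

tables = split_by_entity(combined_df, variable_entities)
manifest = build_manifest(variable_entities, upratings)
```

In this toy case the household table collapses to one row per `household_id`, while the person table keeps all three rows along with both ID columns.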

### 2. Upload pipeline in `publish_local_area.py`

HDFStore files are uploaded to dedicated subdirectories:
- `states_hdfstore/`
- `districts_hdfstore/`
- `cities_hdfstore/`

Both GCS and HuggingFace uploads are handled.
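
As a rough illustration of the routing only (the actual upload calls are omitted), the mapping from area type to subdirectory might look like the sketch below. The area-type labels and the helper name `hdfstore_destination` are hypothetical and not taken from `publish_local_area.py`.

```python
from pathlib import PurePosixPath

# Subdirectory names from the list above, keyed by a hypothetical area-type label.
HDFSTORE_SUBDIRS = {
    "state": "states_hdfstore",
    "district": "districts_hdfstore",
    "city": "cities_hdfstore",
}


def hdfstore_destination(area_type, local_filename):
    """Return the remote path for an HDFStore file of the given area type."""
    if not local_filename.endswith(".hdfstore.h5"):
        raise ValueError(f"not an HDFStore output: {local_filename}")
    subdir = HDFSTORE_SUBDIRS[area_type]
    # Keep only the file name; the local directory is not mirrored remotely.
    return str(PurePosixPath(subdir) / PurePosixPath(local_filename).name)
```

The same relative path could then be used for both the GCS and HuggingFace destinations.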

### 3. Comparison test

`tests/test_format_comparison.py` validates that both formats contain identical data:

- Compares all ~183 variables between h5py and HDFStore
- Handles person-level (direct comparison) vs group-entity (unique value comparison)
- Tests manifest presence and entity table completeness
- Runnable as pytest or standalone CLI

```bash
pytest test_format_comparison.py --h5py-path NV.h5 --hdfstore-path NV.hdfstore.h5
# or
python -m policyengine_us_data.tests.test_format_comparison --h5py-path NV.h5 --hdfstore-path NV.hdfstore.h5
```
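
The two comparison modes described above can be sketched as below. This is an assumption about how such a check could work, not the contents of `test_format_comparison.py`; the helper name `values_match` is hypothetical, and exact equality is used here where a real test might prefer a tolerance for floats.

```python
import numpy as np
import pandas as pd


def values_match(h5py_values, hdf_column, is_person_level):
    """Compare one variable across the two formats."""
    h5py_values = np.asarray(h5py_values)
    hdf_values = hdf_column.to_numpy()
    if is_person_level:
        # Same length and order expected: compare element by element.
        return bool(np.array_equal(h5py_values, hdf_values))
    # Group-entity tables are deduplicated, so row count and order may differ;
    # compare the sorted sets of distinct values instead.
    return bool(np.array_equal(np.unique(h5py_values), np.unique(hdf_values)))


# Toy checks with made-up values.
person_ok = values_match([40, 12, 65], pd.Series([40, 12, 65]), is_person_level=True)
group_ok = values_match([100.0, 100.0, 250.0], pd.Series([250.0, 100.0]), is_person_level=False)
mismatch = values_match([1, 2], pd.Series([1, 3]), is_person_level=True)
```

The unique-value comparison is weaker than the direct one: it confirms that the same values appear in both files but not that each value is attached to the same entity ID.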

### HDFStore structure

```
/person          → DataFrame (all person-entity vars + entity membership IDs)
/household       → DataFrame (deduplicated by household_id)
/tax_unit        → DataFrame (deduplicated by tax_unit_id)
/spm_unit        → DataFrame (deduplicated by spm_unit_id)
/family          → DataFrame (deduplicated by family_id)
/marital_unit    → DataFrame (deduplicated by marital_unit_id)
/_variable_metadata → DataFrame (variable, entity, uprating columns)
/_time_period    → Series (base year)
```
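
A minimal sketch of writing and reading this layout with `pandas.HDFStore` is shown below. It assumes the optional PyTables (`tables`) package is installed; the helper names `write_store` and `read_store` are illustrative and not part of the package.

```python
import pandas as pd  # pd.HDFStore requires the optional PyTables ("tables") package


def write_store(path, tables, manifest, year):
    """Write entity tables plus the two metadata keys to one HDFStore file."""
    with pd.HDFStore(path, mode="w") as store:
        for name, df in tables.items():
            store.put(name, df)
        store.put("_variable_metadata", manifest)
        store.put("_time_period", pd.Series([year]))


def read_store(path):
    """Return (entity tables, manifest, base year) from an HDFStore file."""
    with pd.HDFStore(path, mode="r") as store:
        year = int(store["_time_period"].iloc[0])
        manifest = store["_variable_metadata"]
        # Keys come back as "/person", "/household", ...; underscore-prefixed
        # keys are metadata rather than entity tables.
        tables = {
            key.lstrip("/"): store[key]
            for key in store.keys()
            if not key.lstrip("/").startswith("_")
        }
    return tables, manifest, year
```

Calling `read_store("NV.hdfstore.h5")` on a published file would, under these assumptions, return one DataFrame per entity together with the manifest and the base year.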

## Future work

`policyengine-us` will add `extend_single_year_dataset()` to consume the HDFStore directly, enabling instant year projection without the simulation engine. The embedded uprating manifest makes each file self-describing and allows fallback when the package version doesn't exactly match the version used to build the dataset.
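
To make the idea concrete, here is a hedged sketch of manifest-driven uprating by multiplication. It is not the planned `extend_single_year_dataset()` implementation: the helper name `uprate_tables` is hypothetical, and `growth_factors` is an assumed mapping from uprating parameter path to the base-year-to-target-year ratio, which a real implementation would derive from parameter values.

```python
import pandas as pd


def uprate_tables(tables, manifest, growth_factors):
    """Return copies of the entity tables with uprated variables scaled."""
    uprated = {name: df.copy() for name, df in tables.items()}
    for row in manifest.itertuples(index=False):
        table = uprated.get(row.entity)
        factor = growth_factors.get(row.uprating)
        if table is None or factor is None or row.variable not in table.columns:
            continue  # no uprating path, unknown path, or variable not stored
        table[row.variable] = table[row.variable] * factor
    return uprated


# Toy data with a made-up uprating path and growth factor.
tables = {
    "person": pd.DataFrame({"age": [40, 12], "employment_income": [50_000.0, 0.0]})
}
manifest = pd.DataFrame(
    {
        "variable": ["age", "employment_income"],
        "entity": ["person", "person"],
        "uprating": ["", "hypothetical.uprating.path"],
    }
)
growth_factors = {"hypothetical.uprating.path": 1.25}

projected = uprate_tables(tables, manifest, growth_factors)
```

Variables without an uprating path, such as `age` here, pass through unchanged, and the input tables are left untouched because the function works on copies.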

## Branch

`add-hdfstore-output`
