fix: PositionalEncoding Shape Mismatch on Odd Dimensions#609

Open
agam263 wants to merge 1 commit into gc-os-ai:main from agam263:fix-pe-odd-dimension
Conversation

@agam263

@agam263 agam263 commented May 1, 2026


Fix: PositionalEncoding Shape Mismatch on Odd Dimensions

fix #607

📖 Summary

This PR resolves a runtime crash in the PositionalEncoding layer that occurred whenever the model dimension (d_model) was odd. The fix lets the layer handle arbitrary embedding dimensions, making the AptaTrans architecture more flexible and robust.

🔍 Technical Root Cause

The PositionalEncoding layer generates fixed sinusoids to inject positional information into embeddings. The implementation uses a frequency-divisor vector (div_term) that is shared between sine and cosine operations:

  1. Index Split: Sine waves are assigned to even indices (0, 2, 4...) and cosine waves to odd indices (1, 3, 5...).
  2. The Mismatch:
    • The size of div_term is calculated based on torch.arange(0, d_model, 2), which has a length of ceil(d_model / 2).
    • If d_model is 128 (even):
      • Even indices: 64 slots
      • Odd indices: 64 slots
      • div_term: 64 values (No error)
    • If d_model is 127 (odd):
      • Even indices: 64 slots
      • Odd indices: 63 slots
      • div_term: 64 values
  3. The Crash: The assignment pe[0, :, 1::2] = torch.cos(position * div_term) fails for odd d_model because PyTorch cannot fit 64 values into a tensor with 63 slots.
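The counting argument above can be checked with a short, dependency-free sketch (pe_slot_counts is a hypothetical helper, not part of the codebase):

```python
def pe_slot_counts(d_model):
    # Even-indexed columns receive sine, odd-indexed columns receive cosine.
    even_slots = len(range(0, d_model, 2))    # ceil(d_model / 2)
    odd_slots = len(range(1, d_model, 2))     # floor(d_model / 2)
    # div_term is built from torch.arange(0, d_model, 2), so it always has
    # ceil(d_model / 2) entries -- one more than odd_slots when d_model is odd.
    div_term_len = len(range(0, d_model, 2))
    return even_slots, odd_slots, div_term_len

print(pe_slot_counts(128))  # (64, 64, 64): shapes match
print(pe_slot_counts(127))  # (64, 63, 64): cosine assignment overflows by one
```

For any odd d_model, div_term_len exceeds odd_slots by exactly one, which is why the cosine assignment is the one that fails.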

🛠️ Proposed Changes

1. Corrected Frequency Mapping

Adjusted the cosine assignment to correctly slice the div_term to match the number of available odd-indexed slots.

  • Logic: Used div_term[: d_model // 2] to ensure the number of frequencies exactly matches the number of odd positions, regardless of whether d_model is even or odd.
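A minimal NumPy analogue of the corrected table construction (names follow the PR description, not the actual pyaptamer source; the real layer uses PyTorch tensors, but the slicing logic is identical):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Sketch of the standard sinusoidal table with the proposed fix applied.
    position = np.arange(max_len)[:, None]                # (max_len, 1)
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)             # ceil(d/2) columns
    # Fix: slice div_term down to floor(d/2) frequencies so it matches the
    # number of odd-indexed slots; a no-op when d_model is even.
    pe[:, 1::2] = np.cos(position * div_term[: d_model // 2])
    return pe

pe = positional_encoding(10, 127)  # no longer raises for odd d_model
```

Because `div_term[: d_model // 2]` returns the full vector when d_model is even, the fix cannot change outputs for existing even-dimension models.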

2. Dimensionality Regression Tests

Introduced pyaptamer/aptatrans/tests/test_pe_robustness.py to safeguard against future regressions:

  • Odd/Even Verification: Tests both standard (even) and non-standard (odd) dimensions.
  • Extreme Edge Cases: Verified stability for minimal dimensions (e.g., d_model=1).
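The shape checks described above can be sketched as a self-contained test (a hypothetical NumPy stand-in for the PyTorch tests in test_pe_robustness.py):

```python
import numpy as np

def test_pe_shapes():
    # Rebuild the corrected table for even, odd, and minimal dimensions;
    # before the fix, the odd cases raised a shape mismatch on construction.
    for d_model in (128, 127, 2, 1):
        max_len = 16
        position = np.arange(max_len)[:, None]
        div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(position * div_term)
        pe[:, 1::2] = np.cos(position * div_term[: d_model // 2])
        assert pe.shape == (max_len, d_model)

test_pe_shapes()
```

Note the d_model=1 case: the odd-indexed slice is empty, and `div_term[:0]` is likewise empty, so the assignment is a valid no-op.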

⚠️ Impact & Risks

  • Architectural Flexibility: Developers can now experiment with any embedding size without the model crashing during the initialization of the Transformer blocks.
  • Stability: This is a safe change that does not modify the output for existing models where d_model is even.

✅ Verification Results

  • Unit Tests: pytest pyaptamer/aptatrans/tests/test_pe_robustness.py -> Passed.
  • Inference Stability: Verified that the change preserves identical output values for even-dimension embeddings.

@direkkakkar319-ops
Contributor

Hi @agam263, your PR looks similar to #527.

It is probably correct, since you patched against the older file.


Development

Successfully merging this pull request may close these issues.

[BUG] Positional Encoding Shape Mismatch on Odd Dimensions
