Vendor-neutral reference: NVFP4 vs OCP MXFP4 block-structure parameters (request for confirmation) #3105

@gHashTag

Hi TransformerEngine maintainers,

I am compiling a vendor-neutral numeric format catalog (84 formats, 13
families) with bit-exact conformance vectors. The catalog is open and
lives at https://github.com/gHashTag/t27. NVFP4 is on the near-term
roadmap (Track 2) but I would like to ground its row entry in
parameters confirmed by the upstream implementer rather than guessed
from blog posts. This issue is an information request, not a bug
report.

What I have so far

Based on the public NVIDIA developer blog
(https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/)
and code references in TransformerEngine, I have populated the
following parameter table for NVFP4 alongside its closest OCP MX
counterpart (MXFP4):

| Parameter | OCP MX MXFP4 | NVIDIA NVFP4 |
|---|---|---|
| Element layout | S1E2M1 (4 bits) | S1E2M1 (4 bits) |
| Block size | 32 elements | 16 elements |
| Scale format | E8M0 (8-bit) | FP8 E4M3 (8-bit) |
| Scale exponent bits | 8 (pure exponent) | 4 |
| Scale mantissa bits | 0 | 3 |
| Scale dynamic range | 2^-127 to 2^127 | 2^-9 (smallest subnormal) to 448 |
| Scale steps per binade (factor of 2) | 1 (power-of-two) | 8 (3-bit mantissa) |
| Bits/element including scale | 4 + 8/32 = 4.25 | 4 + 8/16 = 4.50 |
The element layout S1E2M1 is bit-identical between the two formats;
they diverge at the block-and-scale level. Three structural
consequences follow:

  1. NVFP4 resolves the per-block scale 8x more finely than MXFP4
    within its representable range: the 3-bit mantissa of the FP8 E4M3
    scale gives 8 steps per factor of two, against 1 for E8M0.
  2. NVFP4 cannot represent per-block scales outside the FP8 E4M3 range
    (saturates at 448, underflows below 2^-9) without higher-level
    rescaling; MXFP4 spans a much wider scale range via E8M0.
  3. Effective bits per element differ: 4.25 (MXFP4) vs 4.50 (NVFP4).
    NVFP4 therefore stores about 5.9% more bits per element
    (4.50 / 4.25 = 1.059), which any compression-ratio comparison
    should account for.
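
The first two consequences can be checked by brute force over all 256 scale codes. The sketch below assumes the same encodings as the table (E4M3 without infinities, bias 7; E8M0 with bias 127 and 0xFF as NaN). The decoders and the `nearest` helper are my own illustration; in particular, picking the nearest scale by absolute difference is an assumption and not necessarily how TransformerEngine chooses a block scale.

```python
import math

def decode_e4m3fn(code):
    """Decode an 8-bit FP8 E4M3 code (bias 7, no infinities, one NaN mantissa)."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF
    man = code & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:                       # subnormal: no implicit leading 1
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def decode_e8m0(code):
    """Decode an 8-bit E8M0 scale (bias 127, 0xFF assumed to be NaN)."""
    return math.nan if code == 0xFF else 2.0 ** (code - 127)

def positive_values(decode):
    """All distinct positive finite values a decoder can produce, sorted."""
    return sorted({v for v in map(decode, range(256)) if v > 0.0})

def steps_in_binade(values, lo=1.0):
    """How many representable scales fall in [lo, 2*lo)."""
    return sum(1 for v in values if lo <= v < 2.0 * lo)

def nearest(values, target):
    """Representable scale closest to target (absolute difference)."""
    return min(values, key=lambda v: abs(v - target))

E4M3_SCALES = positive_values(decode_e4m3fn)
E8M0_SCALES = positive_values(decode_e8m0)
```

For an ideal scale of 1.4, `nearest(E4M3_SCALES, 1.4)` gives 1.375 while `nearest(E8M0_SCALES, 1.4)` gives 1.0, which is the granularity difference in consequence 1. `E4M3_SCALES[0]` and `E4M3_SCALES[-1]` give the range limits in consequence 2.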

Specific requests

If a maintainer could confirm or correct any of the following, that
would close out the row and let me publish a sister conformance pack
to the existing MXFP4 pack:

(a) Block size confirmation. Is 16 elements per block the only
supported block size, or is it a default with alternatives?

(b) Scale format confirmation. Is FP8 E4M3 (the E4M3FN variant:
finite values plus NaN, no infinities) the canonical scale
encoding? Are there variants that use FP8 E5M2 instead?

(c) Packing order (nibble endianness). When 16 four-bit elements
are packed into 64 bits, does the first element of each pair occupy
the most-significant or the least-significant nibble of its byte?
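
To make question (c) concrete, here is a small sketch of the two candidate layouts. `pack_nibbles` and its `first_in_low_nibble` flag are hypothetical names of mine; which of the two conventions TransformerEngine actually uses is exactly what I am asking.

```python
def pack_nibbles(codes, first_in_low_nibble=True):
    """Pack 4-bit codes two per byte under one of two candidate conventions."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 0xF for c in codes):
        raise ValueError("each code must fit in 4 bits")
    out = bytearray()
    for i in range(0, len(codes), 2):
        first, second = codes[i], codes[i + 1]
        if first_in_low_nibble:
            out.append(first | (second << 4))   # first element in bits 0..3
        else:
            out.append((first << 4) | second)   # first element in bits 4..7
    return bytes(out)

block = list(range(16))                          # one 16-element block
low_first = pack_nibbles(block, first_in_low_nibble=True)
high_first = pack_nibbles(block, first_in_low_nibble=False)
```

Both calls produce 8 bytes for a 16-element block, but the byte values differ (0x10 vs 0x01 for the first byte here), so a conformance vector is only meaningful once the convention is pinned down.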

(d) Reference vectors. Does TransformerEngine ship any unit
tests with documented input/output bit-patterns that I can use
as ground-truth boundary vectors (NaN, +/-Inf-equivalent
saturation, smallest normal, smallest subnormal, denormal-block
behavior)?
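
For reference, these are the element-level boundary values I currently derive for S1E2M1, assuming an exponent bias of 1 and no NaN or infinity encodings at the element level (as in the OCP MX definition of FP4 E2M1). The decoder is my own sketch and has not been checked against TransformerEngine.

```python
def decode_e2m1(code):
    """Decode a 4-bit S1E2M1 code, assuming exponent bias 1 and no NaN/Inf."""
    sign = -1.0 if code & 0x8 else 1.0
    exp = (code >> 1) & 0x3
    man = code & 0x1
    if exp == 0:                       # subnormal: 0 or 0.5
        return sign * man * 0.5
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

POSITIVE_CODES = [decode_e2m1(c) for c in range(8)]
# [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Under these assumptions the smallest subnormal is 0.5, the smallest normal is 1.0, and the saturation magnitude is 6.0; codes 8 to 15 mirror these with the sign bit set, including a negative zero at code 8.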

(e) Round-trip behavior on out-of-range scale. When a tensor's
natural per-block scale would land outside the FP8 E4M3
representable range, is the recommended behavior (i) clamp the
scale and saturate the elements, (ii) error out, or (iii)
something else?
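
To illustrate what I mean by option (i), here is a rough sketch. It assumes the ideal block scale is amax / 6.0 (6.0 being the largest S1E2M1 magnitude) and that the scale is clamped to the positive FP8 E4M3 range. It deliberately skips rounding the scale to the E4M3 grid and the elements to the E2M1 grid, and it is not a description of what TransformerEngine does.

```python
E2M1_MAX = 6.0          # largest S1E2M1 magnitude
E4M3_MIN = 2.0 ** -9    # smallest positive FP8 E4M3 value
E4M3_MAX = 448.0        # largest finite FP8 E4M3 value

def clamp_block_scale(block):
    """Return (ideal, clamped) per-block scale under option (i)."""
    amax = max(abs(x) for x in block)
    ideal = amax / E2M1_MAX
    clamped = min(max(ideal, E4M3_MIN), E4M3_MAX)
    return ideal, clamped

def scale_and_saturate(block):
    """Divide by the clamped scale, then saturate to the E2M1 magnitude limit."""
    _, scale = clamp_block_scale(block)
    return [max(-E2M1_MAX, min(E2M1_MAX, x / scale)) for x in block]

too_large = [1.0e6] + [0.0] * 15   # ideal scale far above 448
too_small = [1.0e-6] * 16          # ideal scale far below 2^-9
```

With `too_large`, the scale clamps to 448 and the first element saturates at 6.0; with `too_small`, the scale clamps to 2^-9 and every scaled element is about 5.1e-4, which would flush to zero once rounded to S1E2M1.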

What I will do with confirmed answers

Open a small PR (catalog row + conformance pack) on
gHashTag/t27, with full attribution to this issue and a
cross-link back to the relevant TransformerEngine references. The
pack will follow the same shared row schema as the existing six
packs (GF16, MXFP4 element, BF16, FP8 E4M3, FP8 E5M2, E8M0 block
scale), with honest abs_error reporting (no overflow-to-Inf
masked as a match).

Background and methodology are documented in a 16-page methodology
paper (Trinity S^3 AI, 2026-06-08, file
paper3-methodology-2026-06-08-v3-trinity.pdf, SHA-256
f31f5dd243afc7b2ba4a423859a1e1dc67036c3a93affab30acc8d02f0a15eef)
that I plan to upload to arXiv this week.

Async only -- no rush, no specific deadline. If the relevant
maintainer is on vacation or sprint-locked, a one-line "ping us back
in N weeks" is a fine answer.

Thank you for the open release of NVFP4 documentation and for the
maintained NVFP4 reference implementation in TransformerEngine.

-- Dmitrii Vasilev
Trinity S^3 AI
admin@t27.ai
GitHub: @gHashTag

Labels: question (Further information is requested)
