Hello team,
I recently came across the SemiAnalysis article “Vera Rubin: Extreme Co-Design as an Evolution” (https://newsletter.semianalysis.com/p/vera-rubin-extreme-co-design-an-evolution), which discusses Adaptive Compression for transformer workloads. The article mentions a significant speedup (50 PFLOPS vs. 35 PFLOPS), but I could not find detailed information on how this is implemented in Transformer Engine.
Now that GTC 2026 has concluded, I wanted to ask for clarification on the following:
- Could you provide more details on the implementation of Adaptive Compression in Transformer Engine?
- Specifically, how is sparsity identified and exploited dynamically?
- Are there any public code examples, demos, or documentation illustrating this feature?
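For context on the second question: the closest mechanism I am familiar with is magnitude-based 2:4 structured sparsity, which NVIDIA GPUs accelerate in hardware. The sketch below is only my own illustration of that pattern (plain NumPy, not the Transformer Engine API), to make clear what I mean by "identifying sparsity" in a weight tensor:

```python
import numpy as np

def prune_2_to_4(weights):
    """Apply 2:4 structured sparsity: in every group of 4 consecutive
    values, keep the 2 with the largest magnitude and zero the rest.
    (Illustration only; not the Adaptive Compression implementation.)"""
    w = weights.reshape(-1, 4).copy()
    # indices of the 2 smallest-magnitude entries in each group of 4
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

# Example: the two smallest-magnitude values per group are zeroed.
w = np.array([[1.0, -3.0, 0.5, 2.0]], dtype=np.float32)
print(prune_2_to_4(w))  # -> [[ 0. -3.  0.  2.]]
```

I would like to understand whether Adaptive Compression works along these lines (a fixed structured pattern selected by magnitude) or whether the compression pattern is chosen dynamically per layer or per step at runtime.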
Any guidance or pointers would be greatly appreciated, as I am interested in evaluating and experimenting with this feature for transformer model acceleration.
Thank you for your time and support.
Best regards,
Guanchen