Pull requests: NVIDIA/TransformerEngine

Pull requests list

docs(readme): update latest news
#3121 opened Jun 11, 2026 by sbhavani Collaborator
6 of 13 tasks
[PyTorch] Update cuBLASLt grouped gemm filter org-contribution
#3119 opened Jun 11, 2026 by yaox12 Member
1 of 13 tasks
TE EP integration to MoEBlock
#3116 opened Jun 10, 2026 by tdophung Collaborator Draft
13 tasks
[JAX] Collective Gemm test fixes
#3115 opened Jun 10, 2026 by jberchtold-nvidia Collaborator
13 tasks
Abstract CUDA hardcodes into configurable te_device_type / te_platform community-contribution PRs from external contributor outside the core maintainers, representing community-driven work.
#3113 opened Jun 10, 2026 by lxd-cumt
Add entrypoint for flagos multi-backend plugin system community-contribution
#3107 opened Jun 9, 2026 by lxd-cumt
[PyTorch][torch.compile] Remove process group from quantizers
#3104 opened Jun 8, 2026 by pggPL Collaborator
3 of 12 tasks
Quantization support for GroupedTensor: FP8 per-tensor community-contribution
#3102 opened Jun 7, 2026 by int-smart Contributor
11 of 13 tasks
Introduce Mega-C++ to reduce CPU overhead community-contribution
#3099 opened Jun 6, 2026 by zhongbozhu Collaborator Draft
1 of 15 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py community-contribution
#3095 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
NVFP4: cache GEMM-swizzled weight scale factors across micro-batches community-contribution
#3093 opened Jun 5, 2026 by cael-ling Contributor
3 of 13 tasks
Added thd cudnn guard community-contribution
#3092 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
Make NVTE tensor handle pool size configurable community-contribution
#3090 opened Jun 5, 2026 by lhb8125 Contributor Draft
fix(topk): fix UB and prevent vector load splitting in standalone_topk community-contribution
#3088 opened Jun 5, 2026 by solos
5 of 13 tasks
[JAX] Fix norm workspace on global shapes
#3085 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
[JAX] MoEBlock tutorial
#3084 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
13 tasks
[JAX] Hopper BF16 grouped GEMM v2 support
#3083 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
add attention docs
#3081 opened Jun 4, 2026 by sudhakarsingh27 Member Draft
13 tasks
[Common] Pack attention arguments as structs
#3079 opened Jun 3, 2026 by cyanguwa Collaborator Draft
13 tasks
[Pytorch] Add variable-K Cutlass GroupGEMM for fine-grained MoE wgrad community-contribution
#3069 opened Jun 1, 2026 by cassiewilliam Contributor
6 of 8 tasks
fix unfused padding causal sdpa community-contribution
#3063 opened May 31, 2026 by hungryGeek16
[JAX] Grouped quant+GEMM custom partitioning rules
#3058 opened May 28, 2026 by jberchtold-nvidia Collaborator
8 of 13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors. community-contribution
#3057 opened May 28, 2026 by plugyawn
7 of 13 tasks