
RFC 8899 DPLPMTUD design document #1619

Draft
jasikpark wants to merge 2 commits into slackhq:master from jasikpark:dplpmtud-design

Conversation

jasikpark (Collaborator) commented Mar 4, 2026

Summary

Design document analyzing options for implementing RFC 8899 DPLPMTUD (Datagram Packetization Layer Path MTU Discovery) in Nebula.

  • Analyzes 4 implementation approaches, recommending extending the existing Test packet mechanism with new TestProbe/TestProbeReply subtypes
  • Covers the RFC 8899 state machine adaptation (DISABLED → BASE → SEARCHING → SEARCH_COMPLETE, plus ERROR state)
  • Addresses overlay IPv6 minimum MTU constraints (1280 byte floor → BASE_PLPMTU of 1360)
  • Details DF bit handling across Linux/Darwin/FreeBSD/Windows
  • Proposes phased rollout: probe mechanism + MTU enforcement → optimizations → relay-aware probing
  • Backwards compatible with older Nebula versions (graceful degradation to BASE_PLPMTU)

This is a design doc only — no code changes. Looking for feedback on the approach before implementation.

Context

Nebula currently uses a static tun.mtu (default 1300), which is conservative for most paths. DPLPMTUD would discover the actual per-peer path MTU via active probing, enabling jumbo frames on LAN/datacenter paths and detecting constrained paths (carrier WiFi, VPN-in-VPN) without manual tuning.

Unlike #16, this would implement MTU detection by probing with progressively larger packets until an ACK fails to arrive, rather than relying on ICMP "must fragment" responses.

🤖 Generated with Claude Code

Analyzes options for implementing RFC 8899 Datagram PLPMTUD in Nebula,
recommending extension of the existing Test packet mechanism with new
TestProbe/TestProbeReply subtypes. Covers overlay IPv6 MTU floor
constraints, phased rollout plan, and backwards compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nbrownus (Collaborator) commented Mar 5, 2026

It would be awesome to support PMTU. My desire, and the primary focus of this plan, is increasing a path's MTU, which is the opposite of what ICMP does. This creates an awkward situation that the plan glosses over.

ICMP can reduce a path's MTU, and a path's MTU is reset every so often to reprobe in case a limit has been removed; the max PMTU is derived from the route table or the tun device.

This means we have 2 realistic options to enable mtu growth:

  1. Run the tun device at the max possible MTU, modify the TCP MSS in flight, and use PTB to draw the PMTU down when needed (on every new tunnel and whenever roaming occurs). Recovery from a lowered PMTU will take time, and that's the worst part, since we'd start from a small MTU and work up. The UDP overlay would also suffer a bit, since it would rely heavily on PTB packets.
  2. Create or update a route MTU for every vpn addr that has a larger-than-default PMTU. This should have an immediate effect on PMTU, but it means managing many /32 and/or /128 routes. I don't think this will work on Windows either; we'll probably need another approach there.

A 3rd option, which I am not stoked about at all, is to have Nebula do its own packet fragmentation and run a large MTU on the tun.

- Add relay overhead (extra 32B: second Nebula header + AEAD tag) and
  TCP underlay (+4B length prefix) to the overhead budget table
- Rewrite tun.mtu interaction section to address the core problem: PLPMTUD
  discovers larger path capacity, but the OS won't send larger packets unless
  the TUN MTU is raised. Analyzes three options: large TUN + per-peer PTB
  (recommended), per-peer route MTU, and Nebula-layer fragmentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
JackDoan (Collaborator) commented Mar 5, 2026

@nbrownus your third option presents some interesting opportunities wrt using sendmmsg though

JackDoan (Collaborator) commented Mar 5, 2026

it could be kinda spicy to have hosts report their MTU per-remote-addr (+ default route) to the lighthouse, but that opens quite a can of worms

jasikpark (Collaborator, Author) commented Mar 5, 2026

> @nbrownus your third option presents some interesting opportunities wrt using sendmmsg though

sendmmsg only exists on linux though, right?

nbrownus (Collaborator) commented Mar 5, 2026

sendmmsg or not, fragmentation handling is all sorts of fun, and I would look hard at offloading the entire underlay packet handling to QUIC to avoid having to handle packet reassembly and retransmission directly.

jasikpark (Collaborator, Author) commented:

Someone did spec out a version of QUIC using Noise instead of TLS 😁 https://github.com/quic-noise-wg/quic-noise-spec
