More doc stuff (#965)

kixelated · web-flow · commit 609a434df5a3 · 2026-02-17T00:07:19.000Z
diff --git a/doc/.vitepress/config.ts b/doc/.vitepress/config.ts
@@ -50,7 +50,11 @@ export default defineConfig({
 						{
 							text: "Standards",
 							link: "/concept/standard/",
-							items: [{ text: "MoqTransport", link: "/concept/standard/moq-transport" }],
+							items: [
+								{ text: "MoqTransport", link: "/concept/standard/moq-transport" },
+								{ text: "MSF", link: "/concept/standard/msf" },
+								{ text: "LOC", link: "/concept/standard/loc" },
+							],
 						},
 						{
 							text: "Use Cases",
@@ -59,7 +63,8 @@ export default defineConfig({
 								{ text: "Contribution", link: "/concept/use-case/contribution" },
 								{ text: "Distribution", link: "/concept/use-case/distribution" },
 								{ text: "Conferencing", link: "/concept/use-case/conferencing" },
-								{ text: "Exotic", link: "/concept/use-case/exotic" },
+								{ text: "AI", link: "/concept/use-case/ai" },
+								{ text: "Other", link: "/concept/use-case/other" },
 							],
 						},
 					],
diff --git a/doc/concept/index.md b/doc/concept/index.md
@@ -8,34 +8,8 @@ Welcome to my favorite section.
 MoQ has been a multi-year journey to solve some very real problems in the industry and now it's time to flex the design.
 
 ## Layers
-
-The design philosophy of MoQ is to make things simple, composable, and customizable.
+MoQ is carefully broken into layers to make it simple, composable, and customizable.
 We don't want you to hit a brick wall if you deviate from the standard path (*ahem* WebRTC).
-We also want to benefit from economies of scale (like HTTP), utilizing generic libraries and tools whenever possible.
-
-To accomplish this, MoQ is broken into layers:
-
-```text
-┌─────────────────┐
-│   Application   │   🏢 Your business logic
-│                 │    - authentication, non-media tracks, etc.
-├─────────────────┤
-│  Media Format   │   🎬 Media-specific encoding/streaming
-│     (hang)      │     - codecs, containers, catalog
-├─────────────────├
-│  MoQ Transport  │  🚌 Generic pub/sub transport
-│   (moq-lite)    │     - broadcasts, tracks, groups, frames
-├─────────────────┤
-│  WebTransport   │  🌐 Browser-compatible QUIC
-│                 │     - HTTP/3 handshake
-├─────────────────┤
-|      QUIC       |  🌐 Underlying transport protocol
-│                 │     - streams, datagrams, prioritization, etc.
-└─────────────────┘
-```
-
-You get to choose which layers you want to use and which layers you want to replace.
-It's like a cake but reusable.
 
 See [Layers](/concept/layer/) for more information.
 
diff --git a/doc/concept/layer/index.md b/doc/concept/layer/index.md
@@ -1,11 +1,36 @@
 ---
 title: Layering
-description: It's like a cake but reusable.
+description: It's like a cake; you choose if you want frosting.
 ---
 
 # Layers
-You need to have some understanding of the responsibility and purpose of each layer to best utilize MoQ.
-Let's dive in, starting at the bottom of the stack.
+The design philosophy of MoQ is to make things simple, composable, and customizable.
+We don't want you to hit a brick wall if you deviate from the standard path (*ahem* WebRTC).
+We also want to benefit from economies of scale (like HTTP), utilizing generic libraries and tools whenever possible.
+
+To accomplish this, MoQ is broken into layers:
+
+```text
+┌─────────────────┐
+│   Application   │  🏢 Your business logic
+│                 │     - authentication, non-media tracks, etc.
+├─────────────────┤
+│  Media Format   │  🎬 Media-specific encoding/streaming
+│     (hang)      │     - codecs, containers, catalog
+├─────────────────┤
+│  MoQ Transport  │  🚌 Generic pub/sub transport
+│   (moq-lite)    │     - broadcasts, tracks, groups, frames
+├─────────────────┤
+│  WebTransport   │  🌐 Browser-compatible QUIC
+│                 │     - HTTP/3 handshake
+├─────────────────┤
+│      QUIC       │  🌐 Underlying transport protocol
+│                 │     - streams, datagrams, prioritization, etc.
+└─────────────────┘
+```
+
+You get to choose which layers you want to use and which layers you want to replace.
+It's like a cake; you choose if you want frosting.
 
 ## QUIC
 QUIC is the core protocol that powers HTTP/3, designed to fix head-of-line blocking that plagues TCP and thus HTTP/2.
diff --git a/doc/concept/layer/moq-lite.md b/doc/concept/layer/moq-lite.md
@@ -4,10 +4,29 @@ description: A fraction of the calories with none of the fat.
 ---
 
 # moq-lite
-A subset of the [MoqTransport](/concept/standard/moq-transport) specification.
-The useless/optional cruft has been removed so more time can be spent on the core functionality.
+[moq-lite](https://www.ietf.org/archive/id/draft-lcurley-moq-lite-02.html) is a subset of the [MoqTransport](/concept/standard/moq-transport) specification.
+The goal is to keep the core transport layer simple and focused on practical use-cases.
 
-See the draft: [draft-lcurley-moq-lite](https://www.ietf.org/archive/id/draft-lcurley-moq-lite-02.html).
+There's too much fringe functionality in the MoqTransport draft that's not practical to implement.
+Most of it is specific to Cisco's implementation and bizarre requirements anyway.
+
+## Compatibility
+Keep in mind that moq-lite is forward compatible with the IETF draft.
+For every moq-lite API, there's a corresponding moq-transport API.
+So fortunately, it doesn't matter if I get hit by a bus and moq-lite ceases to exist.
+
+The moq.dev libraries negotiate the `moq-lite` or `moq-transport` version as part of the QUIC/WebTransport handshake (via ALPN).
+When the `moq-transport` wire format is negotiated, we implement a compatibility layer that enforces the moq-lite API.
+For example, if there's a gap in a group (valid in moq-transport), we drop the tail of the group instead of erroring.
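+
+Here's a minimal sketch of that gap rule, assuming each received frame is tagged with its index within the group.
+The `ReceivedFrame` type and `contiguousPrefix` function are hypothetical, not the moq.dev API.
+
+```typescript
+// Hypothetical frame shape: `index` is the frame's position within its group, starting at 0.
+interface ReceivedFrame {
+	index: number;
+	payload: Uint8Array;
+}
+
+// Keep only the contiguous run of frames starting at index 0.
+// The first missing index ends the group; everything after it (the tail) is dropped instead of erroring.
+function contiguousPrefix(frames: ReceivedFrame[]): ReceivedFrame[] {
+	const sorted = [...frames].sort((a, b) => a.index - b.index);
+	const kept: ReceivedFrame[] = [];
+	for (const frame of sorted) {
+		if (frame.index < kept.length) continue; // duplicate of a frame we already kept
+		if (frame.index > kept.length) break; // gap: drop this frame and the rest of the group
+		kept.push(frame);
+	}
+	return kept;
+}
+```
+
+So a group that arrives as frames `0, 1, 3, 4` is delivered as `0, 1`; from the application's point of view it's just a shorter group, not an error.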
+
+| client        | relay         | supported | notes                                                                |
+|---------------|---------------|:---------:|----------------------------------------------------------------------|
+| moq-lite      | moq-lite      | ✅        |                                                                      |
+| moq-lite      | moq-transport | ✅        |                                                                      |
+| moq-transport | moq-lite      | ⚠️        | Can't use moq-transport specific features.                           |
+| moq-transport | moq-transport | ⚠️        | Depends on the implementation; nobody has implemented every feature. |
 
 ## Definitions
 - **Broadcast** - A named and discoverable collection of **tracks** from a single publisher.
@@ -22,15 +41,14 @@ It's less ambiguous and closer to media terminology:
 
 ## Major Differences
 The main goal is to reduce complexity and make the protocol easier to implement.
-When a feature has limited use-cases, it's removed (for now).
 
-- **No Request IDs**: A bidirectional stream for each request to avoid HoLB.
+- **No Request IDs**: A bidirectional stream for each request to avoid head-of-line blocking (HoLB). (NOTE: likely to be upstreamed into moq-transport)
 - **No Push**: A subscriber must explicitly subscribe to each track.
-- **No FETCH**: The plan is to use HTTP for VOD instead of reinventing the wheel.
+- **No FETCH**: Use HTTP for VOD instead of reinventing the wheel.
 - **No Joining Fetch**: Subscriptions start at the latest group, not the latest frame.
 - **No sub-groups**: SVC layers should be separate tracks.
-- **No gaps**: Makes life easier for a relay.
+- **No gaps**: Makes life much easier for the relay and every application.
 - **No object properties**: Encode your metadata into the frame payload.
 - **No pausing**: Unsubscribe if you don't want a track.
 - **No binary names**: Uses UTF-8 strings instead of arrays of byte arrays.
-- **No datagrams**: Maybe in the future.
+- **No datagrams**: Maybe one day.
diff --git a/doc/concept/standard/loc.md b/doc/concept/standard/loc.md
@@ -0,0 +1,13 @@
+---
+title: LOC - Low Overhead Container
+description: A low-overhead container format for MoQ.
+---
+
+# LOC - Low Overhead Container
+We originally wanted to use [CMAF](/concept/standard/msf) but there's a lot of overhead.
+That's roughly 100 bytes of overhead per frame (`moof` + `mdat`), the kind of overhead that kills audio-only streams.
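+
+Some back-of-the-envelope math, assuming ~100 bytes of overhead per frame and Opus audio packetized into 20 ms frames (the 20 ms frame duration and the 30 fps video are assumptions, though common ones):
+
+```typescript
+// Extra bitrate caused by a fixed container overhead on every frame.
+function overheadBitsPerSecond(overheadBytesPerFrame: number, frameDurationMs: number): number {
+	const framesPerSecond = 1000 / frameDurationMs;
+	return overheadBytesPerFrame * 8 * framesPerSecond;
+}
+
+const audio = overheadBitsPerSecond(100, 20); // 50 frames/s -> 40,000 bits/s
+const video = overheadBitsPerSecond(100, 1000 / 30); // 30 frames/s -> ~24,000 bits/s
+```
+
+That's ~40 kbps of container overhead for audio, which can exceed the Opus payload itself at typical speech bitrates, while the ~24 kbps for 30 fps video disappears into a multi-megabit stream.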
+
+LOC is a super simple container format that's designed to be lightweight.
+It's similar to the [hang container](../layer/hang) and we'll probably merge them in the future.
+
+[See the draft](https://www.ietf.org/archive/id/draft-ietf-moq-loc-00.html) for the latest details.
diff --git a/doc/concept/standard/msf.md b/doc/concept/standard/msf.md
@@ -0,0 +1,14 @@
+---
+title: MSF - MoQ Streaming Format
+description: A catalog format for MoQ.
+---
+
+# MSF - MoQ Streaming Format
+HLS/DASH playlists suck.
+WebRTC SDP is even worse.
+MSF is a replacement for both, utilizing MoQ live streams.
+
+[MSF](https://www.ietf.org/archive/id/draft-ietf-moq-msf-00.html) is a catalog format for MoQ.
+It's similar to the [hang catalog](../layer/hang) and we'll probably merge them in the future.
+
+[See the draft](https://www.ietf.org/archive/id/draft-ietf-moq-msf-00.html) for the latest details.
diff --git a/doc/concept/use-case/ai.md b/doc/concept/use-case/ai.md
@@ -0,0 +1,70 @@
+---
+title: AI
+description: Welcome to the future, old man.
+---
+
+# AI
+Hopefully you had this square on your buzzword bingo card.
+
+WebRTC is a great protocol for conferencing, but it's not designed for AI.
+That said, I haven't personally worked in this space, so take my suggestions with a grain of salt.
+
+## Latency
+Inference is still quite slow and expensive, even for the big players.
+If you're going to spend >300ms and literal dollars on inference, you want at least *some* reliability guarantees.
+
+Unfortunately, WebRTC will never try to retransmit audio packets.
+A single lost packet will cause noticeable audio distortion.
+And if you have the audacity to generate audio/video separately, WebRTC won't synchronize them for you.
+Frames are rendered on receipt, so unless you introduce a delay, audio will be out of sync with video.
+
+One of the core tenets of MoQ is adjustable latency.
+The viewer (and thus your application) controls how long it's willing to wait for content before it gets skipped/desynced.
+The latency budget of the network protocol can match the latency budget of the application.
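+
+Here's a rough sketch of what that knob means for a player, assuming each group carries a start timestamp and the player knows the start of the newest group it has seen.
+`shouldSkip` is a hypothetical helper, not a moq.dev API; the point is that the *same* network conditions lead to different decisions depending on the budget.
+
+```typescript
+// Decide whether to keep waiting on a stalled group or skip ahead to the newest one.
+// All values are media timestamps in milliseconds.
+function shouldSkip(stalledGroupStart: number, newestGroupStart: number, latencyBudgetMs: number): boolean {
+	// How far behind the live edge we'd be if we kept waiting on the stalled group.
+	const behind = newestGroupStart - stalledGroupStart;
+	return behind > latencyBudgetMs;
+}
+
+// Same stall, 500 ms behind the newest group:
+shouldSkip(0, 500, 200); // true: a real-time conversation skips ahead
+shouldSkip(0, 500, 2000); // false: a pipeline that values completeness keeps waiting
+```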
+
+## On-Demand
+MoQ is pull-based, so nothing is transmitted over the network until there's at least one subscriber.
+You can take this further by not generating or encoding content until it's requested.
+
+Both of these were mentioned briefly on the [contribution](/concept/use-case/contribution) page if you want to read more.
+
+### Inference
+If you want to save compute resources, you can defer inference until it's actually needed.
+
+For example, let's say you're publishing a `captions` track populated by Whisper or something.
+If nobody has enabled captions, then nobody will subscribe to the `captions` track.
+You can stop generating the track (or use a smaller model) until it's actually requested.
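+
+A minimal sketch of that pattern, assuming your publisher is notified whenever a subscriber arrives or leaves.
+`OnDemandTrack`, `onSubscribe`, and `onUnsubscribe` are hypothetical names, not the moq.dev API; `start` and `stop` would spin your captioning model up or down.
+
+```typescript
+// Runs an expensive producer (e.g. speech-to-text) only while the track has subscribers.
+class OnDemandTrack {
+	private subscribers = 0;
+	private readonly start: () => void;
+	private readonly stop: () => void;
+
+	constructor(start: () => void, stop: () => void) {
+		this.start = start;
+		this.stop = stop;
+	}
+
+	onSubscribe(): void {
+		this.subscribers += 1;
+		if (this.subscribers === 1) this.start(); // first subscriber: start generating
+	}
+
+	onUnsubscribe(): void {
+		if (this.subscribers === 0) return; // ignore unbalanced unsubscribes
+		this.subscribers -= 1;
+		if (this.subscribers === 0) this.stop(); // last subscriber left: stop generating
+	}
+
+	get active(): boolean {
+		return this.subscribers > 0;
+	}
+}
+```
+
+Swapping `stop` for "fall back to a smaller model" works the same way.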
+
+### Simulcast
+If you want to save bandwidth, you can publish media in a format expected by the AI model.
+
+For example, let's say you're doing object detection on a bunch of security cameras.
+The model takes video at 360p and 10fps or something like that, so that's what you publish.
+But if a human (those still exist) wants to audit the full video, you can separately serve the full resolution video.
+Since this is on-demand, you will only encode/transmit the 1080p video when it's actually needed.
+
+## Browser Control
+One of the perks of using WebSockets/MoQ instead of WebRTC is that you get full control over the media pipeline.
+
+[WebCodecs](https://developer.mozilla.org/en-US/docs/Web/API/WebCodecs_API) is used to encode/decode media within the browser.
+- For video, you use [VideoFrame](https://developer.mozilla.org/en-US/docs/Web/API/VideoFrame), which can map directly to a texture on the GPU. You can use WebGPU to perform inference, encoding, rendering, etc. without ever touching the CPU.
+- For audio, you get [AudioData](https://developer.mozilla.org/en-US/docs/Web/API/AudioData) which is (usually) just a float32 array. You control exactly how these are processed, captured, emitted, etc.
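+
+As one example of that control, here's a sketch that downmixes multi-channel audio to mono before handing it to a speech model (Whisper, for example, expects mono input).
+It assumes you've already copied each channel out of an `AudioData` as float32, e.g. with `audioData.copyTo(buffer, { planeIndex, format: "f32-planar" })`; `downmixToMono` itself is a hypothetical helper.
+
+```typescript
+// Average N planar float32 channels into a single mono channel.
+// Each entry in `planes` holds one channel's samples; all channels must have the same length.
+function downmixToMono(planes: Float32Array[]): Float32Array {
+	if (planes.length === 0) return new Float32Array(0);
+	const frames = planes[0].length;
+	const mono = new Float32Array(frames);
+	for (const plane of planes) {
+		if (plane.length !== frames) throw new Error("channels must have equal length");
+		for (let i = 0; i < frames; i++) mono[i] += plane[i];
+	}
+	for (let i = 0; i < frames; i++) mono[i] /= planes.length;
+	return mono;
+}
+```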
+
+It's more work than using a `<video>` element, of course, but it opens the door to more possibilities.
+Run additional inference in the browser, render your media as textures on a 3D model, etc.
+
+And note that all of this is possible with WebRTC and [insertable streams](https://developer.mozilla.org/en-US/docs/Web/API/Insertable_Streams_for_MediaStreamTrack_API).
+However, you're really not gaining much by using WebRTC only for networking... just use MoQ instead.
+
+## Non-Media
+MoQ is not just for media.
+
+Send your prompts over the same WebTransport connection as the media.
+Or send non-media stuff like vertex data for 3D models, separate from the texture data.
+It's a versatile protocol with a wide range of use-cases.
+
+## Simplicity
+You're working with AI, so you're probably building something new.
+
+If you don't want to deal with SDP, or connections that take 10 RTTs, or unsupported media encodings, or STUN/TURN servers, then give MoQ a try.
+It's a lot closer to WebSockets than WebRTC, but with the ability to skip and scale.
diff --git a/doc/concept/use-case/contribution.md b/doc/concept/use-case/contribution.md
@@ -33,3 +33,77 @@ That's an over-generalization of course, but it's very interesting to see the di
 SRT is built into modern production equipment (hardware) while RTMP is used in consumer software.
 
 Why? IDK.
+
+## Pull vs Push
+Existing contribution protocols are push-based.
+Even YouTube's weird HLS ingest thing operates via POST requests.
+
+However, MoQ is fundamentally a pull-based protocol.
+Technically, MoqTransport supports push too (via PUBLISH), but hear me out for a second.
+
+### The Push Problem
+I would say there is one major problem with push: **There's no "optional" content.**
+
+When a publisher creates multiple tracks, like 360p and 1080p, it needs to simultaneously encode and transmit both tracks.
+There's no way of knowing if anything downstream *actually* wants the 1080p track; it might go straight to `/dev/null` on the media server.
+
+This doesn't matter for huge events like a concert or sports game.
+With enough viewers, we can assume that at least one viewer will want the content.
+But it can be a significant cost for long-tail content that nobody watches.
+
+For example, consider a facility with hundreds of security cameras.
+We might be able to afford uploading 360p for every camera (recording to disk), but anything more than that would over-saturate the network.
+Ideally, we could only stream 1080p from individual cameras when a human wants a closer look...
+
+### The Pull Solution
+The first thing a MoQ viewer does is subscribe to the `catalog.json` track for a broadcast.
+This lists all of the available tracks and their properties.
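+
+To make that concrete, here's a sketch of a viewer choosing a rendition from a parsed catalog.
+The `CatalogTrack` shape and the track names are hypothetical and far simpler than a real catalog (hang or MSF); the point is just that the viewer, not the publisher, decides what to request.
+
+```typescript
+// Hypothetical, heavily simplified catalog entry.
+interface CatalogTrack {
+	name: string; // track name to subscribe to
+	height: number; // vertical resolution in pixels
+}
+
+// Pick the smallest rendition that still fills the player, falling back to the largest available.
+function pickRendition(tracks: CatalogTrack[], playerHeight: number): CatalogTrack | undefined {
+	const sorted = [...tracks].sort((a, b) => a.height - b.height);
+	return sorted.find((t) => t.height >= playerHeight) ?? sorted[sorted.length - 1];
+}
+
+const tracks: CatalogTrack[] = [
+	{ name: "video/360p", height: 360 },
+	{ name: "video/1080p", height: 1080 },
+];
+pickRendition(tracks, 240); // video/360p: a thumbnail-sized player never subscribes to 1080p
+pickRendition(tracks, 720); // video/1080p
+```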
+
+If a viewer wants the 1080p track, it subscribes to it.
+The subscription makes its way upstream (combining with duplicates) until one subscription reaches the publisher.
+When no more viewers want the 1080p track, the subscription is cancelled.
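+
+Here's a rough sketch of the relay side of that, assuming a relay keeps at most one upstream subscription per track and fans it out to any number of downstream sessions.
+`Relay` and its callbacks are hypothetical, not the moq.dev relay's API.
+
+```typescript
+// Deduplicates downstream subscriptions so each track is requested upstream only once.
+class Relay {
+	private readonly viewers = new Map<string, Set<string>>(); // track name -> downstream session IDs
+	private readonly upstreamSubscribe: (track: string) => void;
+	private readonly upstreamUnsubscribe: (track: string) => void;
+
+	constructor(subscribe: (track: string) => void, unsubscribe: (track: string) => void) {
+		this.upstreamSubscribe = subscribe;
+		this.upstreamUnsubscribe = unsubscribe;
+	}
+
+	subscribe(track: string, session: string): void {
+		let sessions = this.viewers.get(track);
+		if (!sessions) {
+			sessions = new Set();
+			this.viewers.set(track, sessions);
+			this.upstreamSubscribe(track); // first viewer: forward a single subscription upstream
+		}
+		sessions.add(session);
+	}
+
+	unsubscribe(track: string, session: string): void {
+		const sessions = this.viewers.get(track);
+		if (!sessions || !sessions.delete(session)) return; // unknown subscription; nothing to do
+		if (sessions.size === 0) {
+			this.viewers.delete(track);
+			this.upstreamUnsubscribe(track); // last viewer gone: cancel upstream
+		}
+	}
+}
+```
+
+Every relay along the path does the same thing, which is how N viewers collapse into one subscription at the publisher.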
+
+The publisher won't transmit a track until there's an active subscription, saving bandwidth.
+The publisher can go the extra mile and not even encode the content without a subscription, saving compute.
+This is especially useful for expensive AI models, for example only running Whisper when captions are needed.
+
+Note that media services can also benefit from the same behavior.
+If nobody currently wants the 1080p track, then don't transcode it.
+The "publisher" in this case is any entity that understands the media format on top of MoQ.
+
+## Multiple Connections
+Another issue with push-based protocols is that each connection is expensive.
+If every connection needs its own copy of the content, we quickly run out of bandwidth.
+Redundant ingest is mostly limited to large events that have bandwidth to spare (active-active).
+
+Once again, MoQ solves this via the pull model.
+A publisher can establish multiple connections that *might* be used.
+A subscription will only be issued if the connection needs a specific track.
+
+For example, a service can implement primary/secondary ingest via two connections to separate endpoints.
+All subscriptions are issued over the primary connection but if it fails, the subscriptions are moved to the secondary connection.
+The endpoints don't even have to be part of the same CDN, and the MoQ publisher is completely oblivious; it just knows it was told to connect to two URLs.
+
+Another example is P2P streaming.
+A client can establish a connection to each peer, transmitting tracks as requested.
+If one peer has the video minimized, then it can unsubscribe from the video track and save bandwidth.
+Again, there's no special business logic for this built into MoQ; it's automatic.
+
+But what about clients that don't support P2P?
+Each client can also establish a connection to a MoQ CDN as a fallback.
+This works because the client discovers all broadcasts available on a connection via the built-in [announce mechanism](/feature/announce).
+If two connections can serve the same content, the subscription goes to the "best" connection (i.e., P2P > CDN).
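+
+A sketch of that selection, assuming each connection tracks which broadcasts it has announced and carries a preference rank you assign.
+The `CandidateConnection` type, its fields, and `pickConnection` are hypothetical, as are the example URLs.
+The same function also covers the primary/secondary failover from earlier: give the primary a better rank, and when it disconnects, the secondary is the best remaining candidate.
+
+```typescript
+interface CandidateConnection {
+	url: string;
+	rank: number; // lower is better, e.g. 0 = P2P, 1 = primary CDN, 2 = secondary CDN
+	connected: boolean;
+	announced: Set<string>; // broadcasts this connection has announced
+}
+
+// Pick the best live connection that can serve the broadcast, if any.
+function pickConnection(connections: CandidateConnection[], broadcast: string): CandidateConnection | undefined {
+	let best: CandidateConnection | undefined;
+	for (const conn of connections) {
+		if (!conn.connected || !conn.announced.has(broadcast)) continue; // can't serve it
+		if (!best || conn.rank < best.rank) best = conn;
+	}
+	return best;
+}
+```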
+
+## Economies of Scale
+A subtle problem with contribution protocols is that they're not used for distribution.
+
+This might sound silly: "of course distribution and contribution are different!"
+But when you really sit down and break down the requirements, they're not that different.
+One is client-to-server while the other is server-to-client; one is 1:1 while the other is 1:N.
+
+By designing a protocol that works for both contribution and distribution, we can share implementations and optimizations.
+There are other benefits of supporting 1:N too, as mentioned in the previous section, so it seems like a no-brainer.
+
+The other way we benefit from economies of scale is by using QUIC.
+We're not implementing our own UDP-based protocol and rediscovering the rough edges of the internet all over again.
+A QUIC library with BBR will outperform the system TCP stack and likely outperform any custom UDP thing (ex. SRT).
diff --git a/doc/concept/use-case/index.md b/doc/concept/use-case/index.md
@@ -7,4 +7,5 @@ description: How MoQ should be used in the wild
 - [Contribution](/concept/use-case/contribution): A publisher (ex. OBS) sends data to a service (ex. Twitch).
 - [Distribution](/concept/use-case/distribution): A service (ex. Twitch) distributes data to viewers.
 - [Conferencing](/concept/use-case/conferencing): A service (ex. Zoom) facilitates a conference between multiple participants.
-- [Exotic](/concept/use-case/exotic): Some ideas for other use cases that might be viable.
+- [AI](/concept/use-case/ai): Generative AI, overlays, voice agents, and more.
+- [Other](/concept/use-case/other): Some ideas for other use cases that might be viable.
diff --git a/doc/concept/use-case/other.md b/doc/concept/use-case/other.md
diff --git a/doc/feature/announce.md b/doc/feature/announce.md
diff --git a/doc/package.json b/doc/package.json
@@ -4,8 +4,7 @@
 	"private": true,
 	"type": "module",
 	"scripts": {
-		"dev": "vitepress dev",
-		"check": "vitepress build",
+		"dev": "vitepress dev --open",
 		"build": "vitepress build",
 		"preview": "vitepress preview",
 		"deploy": "vitepress build && wrangler deploy"