feat(ai-proxy): add max_stream_duration_ms and max_response_bytes safeguards #13250
Merged
Baoyuantop merged 3 commits into apache:master on Apr 20, 2026
Conversation
…eguards

Adds two opt-in configuration knobs to the ai-proxy and ai-proxy-multi plugins to protect the gateway from runaway upstream LLM services:

- `max_stream_duration_ms`: wall-clock cap on total streaming response duration. When exceeded, the upstream connection is force-closed.
- `max_response_bytes`: cap on total bytes read from upstream for a single response (streaming or non-streaming). For non-streaming responses, `Content-Length` is pre-checked; for streaming, the limit is enforced after each chunk.

The existing `timeout` field only bounds per-socket-operation timeouts (connect/send/read block), so it does not protect against an upstream that continuously emits valid SSE events forever. That failure mode can pin a worker at 100% CPU indefinitely and degrade availability for other traffic on the same worker.

Both fields are opt-in (no default); existing deployments are unaffected. When a limit is hit mid-stream after bytes have been flushed to the client, the gateway stops feeding chunks and closes the upstream connection; the client observes a truncated SSE stream (missing the protocol-specific terminator such as `[DONE]`, `message_stop`, or `response.completed`). When the limit is hit before any output has been produced (e.g. the converter has skipped all upstream events so far), a 504 is returned so `on_error` / fallback policies can kick in.

Adds an integration test with a mock upstream that either streams forever or returns an oversized `Content-Length`, plus a schema validation case.
Pull request overview
Adds opt-in safeguards to the ai-proxy / ai-proxy-multi plugins to protect the gateway from runaway upstream LLM responses by enforcing a maximum streaming duration and a cap on upstream response bytes.
Changes:
- Add `max_stream_duration_ms` and `max_response_bytes` to plugin schemas and documentation (EN/ZH).
- Enforce stream duration/byte limits in streaming response parsing, and add `Content-Length` / body-size checks for non-streaming responses.
- Add a new test suite covering streaming aborts and non-streaming oversized `Content-Length`.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `apisix/plugins/ai-proxy/base.lua` | Thread plugin config into provider parsing and propagate non-streaming parse status. |
| `apisix/plugins/ai-providers/base.lua` | Implement stream duration/byte abort logic and non-streaming size checks. |
| `apisix/plugins/ai-proxy/schema.lua` | Add new config knobs to ai-proxy and ai-proxy-multi schemas. |
| `t/plugin/ai-proxy-stream-limits.t` | Add regression tests for the new stream/size limits and schema validation. |
| `docs/en/latest/plugins/ai-proxy.md` | Document new knobs and clarify timeout semantics. |
| `docs/en/latest/plugins/ai-proxy-multi.md` | Document new knobs and clarify timeout semantics. |
| `docs/zh/latest/plugins/ai-proxy.md` | Chinese doc updates for new knobs and timeout clarification. |
| `docs/zh/latest/plugins/ai-proxy-multi.md` | Chinese doc updates for new knobs and timeout clarification. |
…on-streaming

Previously, `parse_response` called `res:read_body()` first and checked the size afterwards, which meant a runaway chunked upstream could force the worker to buffer arbitrarily many bytes before the cap tripped. Switch to reading via `res.body_reader` when `max_response_bytes` is set, so the cap is enforced as bytes arrive, matching the streaming path's behavior.
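A minimal sketch of the capped read described above, assuming a lua-resty-http style `res.body_reader` iterator; the function name `read_body_capped` and the 8 KiB chunk size are illustrative, not the plugin's actual code:

```lua
-- Read the upstream body chunk by chunk so max_response_bytes is enforced
-- as bytes arrive, rather than after a full res:read_body().
local function read_body_capped(res, max_response_bytes)
    local reader = res.body_reader
    local chunks, total = {}, 0
    while true do
        local chunk, err = reader(8192)  -- read at most 8 KiB per iteration
        if err then
            return nil, err
        end
        if not chunk then
            break  -- upstream finished the body
        end
        total = total + #chunk
        if total > max_response_bytes then
            -- Cap tripped mid-read: the caller force-closes the connection.
            return nil, "max_response_bytes exceeded"
        end
        chunks[#chunks + 1] = chunk
    end
    return table.concat(chunks)
end
```

The key property is that memory use is bounded by the cap plus at most one chunk, whereas the old read-then-check order buffered the entire body first.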
…nused require
Address Copilot review:
- parse_streaming_response returns (status, error_message) when the new
stream limits trip before any bytes are flushed; capture both at the
call site so on_error / fallback handlers can see the reason instead
of just the status code.
- Remove an unused require('apisix.core') from TEST 2.
moonming approved these changes Apr 18, 2026

membphis approved these changes Apr 20, 2026

Baoyuantop approved these changes Apr 20, 2026
What this does
Adds two opt-in configuration knobs to `ai-proxy` and `ai-proxy-multi` to protect the gateway from a runaway upstream LLM service:

- `max_stream_duration_ms`: wall-clock cap on total streaming response duration.
- `max_response_bytes`: cap on total bytes read from the upstream for a single response (streaming or non-streaming).

Both are opt-in (no default); existing deployments are unaffected.
Why
The existing
timeoutfield is fed tohttpc:set_timeout(), which is a per-socket-operation timeout (connect / send / read-one-block). It does not bound the total duration of a streaming response. If an upstream LLM has a bug that causes it to continuously emit valid SSE tokens without ever sending a terminator ([DONE],message_stop,response.completed),parse_streaming_responsesits in an uncappedwhile trueloop, pinning the worker at ~100% CPU indefinitely and degrading availability for all other traffic on that worker.Behavior on abort
- Mid-stream abort (bytes already flushed): the upstream connection is force-closed (`close()` plus `res._httpc = nil`, so we don't pool a half-drained connection). nginx closes the downstream connection at the end of the content phase. The client detects truncation via the missing protocol-specific terminator. We intentionally do not synthesize a per-protocol "graceful error" SSE frame: we support three client protocols (OpenAI chat, Anthropic messages, OpenAI responses) with different terminators, and a missing terminator is the standard SSE way any mid-stream network failure is communicated to clients.
- Abort before any output has been produced: return `504` (duration) or `502` (size) so `on_error` / fallback / retry hooks can kick in like any other upstream failure.
- Non-streaming, `Content-Length` exceeds the cap: pre-check the header, force-close the connection, and return `502` without ever reading the body.
- Non-streaming, no `Content-Length`: a post-read size check catches the oversized body and returns `502`.
- `ctx.var.llm_request_done = true` is set on abort so downstream filters (e.g. moderation plugins that defer work until completion) finalize their state.
- A `core.log.warn` line is emitted on every abort (`aborting AI stream: <limit> exceeded; bytes=X duration_ms=Y route_id=Z`) so log-based alerting can surface the event. No new Prometheus metric: the log line is sufficient and avoids expanding the plugin's metric surface.

Caveat (documented)
Both limits are best-effort: they are enforced after each chunk is read from the upstream, so the byte cap can overshoot by up to one upstream chunk (≈8 KiB in practice) and the duration cap can overshoot by up to one chunk's processing time. This is acceptable for the failure mode we are defending against (runaway streams produce tens of MB/s, so a one-chunk overshoot is negligible compared to "run forever").
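The per-chunk enforcement can be sketched as a pure predicate evaluated once per chunk; `check_limits`, the `state` table, and the caller-supplied `now_ms` clock are hypothetical names for illustration, not the plugin's actual internals:

```lua
-- Called once per chunk, after the chunk's bytes have been counted into
-- state.bytes, so either limit can overshoot by at most one chunk.
-- Returns the name of the tripped limit, or nil to keep streaming.
local function check_limits(state, now_ms, max_bytes, max_duration_ms)
    if max_bytes and state.bytes > max_bytes then
        return "max_response_bytes"
    end
    if max_duration_ms and (now_ms - state.start_ms) > max_duration_ms then
        return "max_stream_duration_ms"
    end
    return nil  -- both limits OK (or unset), continue streaming
end
```

Because the check runs only between chunk reads, a chunk that arrives just under the deadline is still fully processed, which is exactly the one-chunk overshoot the caveat describes.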
Testing
New `t/plugin/ai-proxy-stream-limits.t` with a mock upstream that either streams OpenAI chat SSE chunks forever (no `[DONE]`) or returns a 100 KB body with a matching `Content-Length`. Covers:

- `max_stream_duration_ms=500`: request aborted in <5 s with the expected log line.
- `max_response_bytes=2048`: request aborted in <5 s with the expected log line.
- `max_response_bytes=1024` vs a 100 KB upstream response: `502` plus the expected log line.
- A schema validation case for `max_stream_duration_ms: 0`.
- `luacheck` passes on all three modified Lua files.

Docs
Added rows to the config tables in `docs/en/latest/plugins/ai-proxy.md`, `ai-proxy-multi.md`, and their Chinese translations, with a clarifying note that `timeout` only bounds per-socket-operation timeouts and that the new fields are needed to bound total stream duration / total bytes read.