feat(ai-rate-limiting): add expression-based limit strategy #13191

nic-6443 merged 7 commits into apache:master
Conversation
Add a new 'expression' option for the limit_strategy field in the ai-rate-limiting plugin, allowing users to define custom Lua arithmetic expressions for dynamic token cost calculation. When limit_strategy is set to 'expression', the plugin evaluates the user-defined cost_expr against the raw LLM API usage response fields (e.g., input_tokens, cache_creation_input_tokens, output_tokens). Missing variables default to 0, and safe math functions (abs, ceil, floor, max, min) are available.

This enables use cases like:
- Cache-aware billing: input_tokens + cache_creation_input_tokens
- Weighted costs: input_tokens + cache_read_input_tokens * 0.1 + output_tokens
- Provider-specific fields: any numeric field from the raw usage response
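The evaluation semantics described above (missing variables default to 0, only a small set of safe math functions is exposed) can be sketched outside the plugin. This is an illustrative Python model only; the plugin itself compiles a sandboxed Lua expression, and `eval_cost` is a hypothetical name, not the plugin's API:

```python
import math
from collections import defaultdict

# Safe math functions exposed to cost expressions, per the description above.
SAFE_FUNCS = {"abs": abs, "ceil": math.ceil, "floor": math.floor,
              "max": max, "min": min}

def eval_cost(expr, raw_usage):
    """Evaluate a cost expression against raw LLM usage fields.

    Missing variables default to 0; only numeric usage fields are exposed.
    """
    # defaultdict(int) makes any unknown variable resolve to 0.
    env = defaultdict(int, SAFE_FUNCS)
    env.update({k: v for k, v in raw_usage.items()
                if isinstance(v, (int, float))})
    # An empty __builtins__ approximates the plugin's sandboxed environment.
    return eval(expr, {"__builtins__": {}}, env)
```

For example, evaluating the weighted-cost expression above against a usage payload that lacks cache_read_input_tokens simply treats that field as 0.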
…alculation

The open-source limit-count module includes the peek cost (1) in the remaining header during the dry_run access phase, unlike the enterprise limit-count-advanced module. Adjust all expected remaining values by -1 to match this behavior.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an expression-based token cost strategy to the ai-rate-limiting plugin so users can compute rate-limit cost from provider-specific usage fields via a Lua arithmetic expression.
Changes:
- Extends `limit_strategy` with `"expression"` and adds `cost_expr` to the plugin schema.
- Introduces sandboxed compilation/evaluation of expressions against `ctx.llm_raw_usage`.
- Adds a dedicated test suite covering schema validation and (non-)streaming Anthropic scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| apisix/plugins/ai-rate-limiting.lua | Adds expression strategy, schema field, and runtime expression evaluation for token-cost calculation. |
| t/plugin/ai-rate-limiting-expression.t | Adds integration tests validating expression config and Anthropic streaming/non-streaming behavior. |
- Prevent raw usage fields from shadowing safe math functions (e.g., a field named 'math' or 'abs' from the LLM response)
- Reject non-finite values (NaN/inf) from expression results
- Clamp negative expression results to 0 instead of crediting tokens
- Add a test for a negative expression result (cache_read > input)
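A minimal sketch of the result-validation rules listed above (illustrative Python; the hardening itself lives in the Lua plugin, and `sanitize_cost` is a hypothetical name):

```python
import math

def sanitize_cost(value):
    # Reject anything that is not a finite number (filters NaN and +/-inf).
    if not isinstance(value, (int, float)) or not math.isfinite(value):
        return None
    # Clamp negative results to 0 so a cache-heavy request never credits
    # tokens back to the consumer.
    return max(0, value)
```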
When expression evaluates to a negative value that gets clamped to 0, calling rate_limit() with cost=0 triggers an assertion failure in resty.limit.count's incoming function. Skip the call entirely when used_tokens is 0 since there's nothing to deduct.
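The fix described above amounts to a simple guard around the limiter call. A sketch in illustrative Python (names are hypothetical; the real code is Lua):

```python
def charge_tokens(used_tokens, rate_limit):
    # Skip the limiter entirely when there is nothing to deduct;
    # calling it with cost=0 would trip the assertion in
    # resty.limit.count's incoming().
    if used_tokens <= 0:
        return None
    return rate_limit(used_tokens)
```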
Fixed the CI failure in TEST 13: when the expression evaluates to a negative value that gets clamped to 0, calling rate_limit() with cost=0 triggers an assertion failure in resty.limit.count's incoming function, so the call is now skipped entirely when used_tokens is 0.
…tics

The lua-resty-limit-traffic library is being upgraded from v1.0.0 to v1.2.0 in the apisix-runtime build. Key library change: incoming_new() now counts UP (returns consumed) instead of DOWN (returns remaining).

Changes:
- limit-count-local.lua: Convert the consumed return value to remaining (remaining = limit - consumed), matching the enterprise limit-count-advanced module. When commit=false (dry_run), pass cost=0 to the library so it reads the current state without deducting, eliminating the off-by-1 in the remaining header.
- limit-count/init.lua: Add the dry_run rejection check inside the local-policy branch only (not redis, which always commits and has no dry_run support).
- ai-rate-limiting-expression.t: Revert remaining-header expectations to match the enterprise values now that dry_run shows an accurate remaining count.
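A toy model of the count-UP semantics and the dry-run peek described above (illustrative Python only; this is not the lua-resty-limit-traffic API):

```python
def incoming(counters, key, limit, cost):
    # v1.2.0-style semantics: the counter counts UP and returns the
    # consumed total, rejecting when the limit would be exceeded.
    consumed = counters.get(key, 0) + cost
    if consumed > limit:
        return None
    counters[key] = consumed
    return consumed

def remaining(counters, key, limit, cost, commit):
    # Convert consumed -> remaining. With commit=False (dry_run), pass
    # cost=0 so the current state is read without deducting, avoiding
    # the off-by-1 in the remaining header.
    consumed = incoming(counters, key, limit, cost if commit else 0)
    if consumed is None:
        return None
    return limit - consumed
```

With this model, a dry-run peek before any traffic reports the full limit, and subsequent peeks report exactly what a committed request left behind.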
….0 semantics" This reverts commit 98ce8f3.
Description
This PR adds the `expression` limit strategy to the `ai-rate-limiting` plugin.

Expression strategy

The `expression` limit strategy allows defining rate limit groups using lua-resty-expr expressions. Each group can have its own `count`, `time_window`, and matching `expression`. When a request matches multiple groups, the first matching group is used. If no group matches, the request is passed through without rate limiting.

This enables fine-grained AI token rate limiting based on request attributes (headers, query params, variables, etc.).
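The first-match group selection described above can be sketched as follows (illustrative Python; real matching uses lua-resty-expr, and only the `==`/`~=` operators are modeled here):

```python
def match_rule(request_vars, rule):
    # A lua-resty-expr rule is a [variable, operator, value] triple.
    var, op, value = rule
    actual = request_vars.get(var)
    if op == "==":
        return actual == value
    if op == "~=":
        return actual != value
    raise ValueError("operator not modeled: " + op)

def pick_group(limit_groups, request_vars):
    # First matching group wins; all rules in a group must match.
    # Returning None means the request passes through without limiting.
    for group in limit_groups:
        if all(match_rule(request_vars, r) for r in group["expression"]):
            return group
    return None
```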
Example config

```json
{
  "limit_strategy": "expression",
  "cost_expr": "input_tokens + completion_tokens",
  "limit_groups": [
    { "expression": [["http_x_model", "==", "gpt-4"]], "count": 500, "time_window": 60 },
    { "expression": [["http_x_model", "==", "gpt-3.5"]], "count": 1000, "time_window": 60 }
  ]
}
```

Checklist
Note on remaining header accuracy
The `X-AI-RateLimit-Remaining` header currently shows a value that is off by 1 (e.g., 499 instead of 500) because the `limit-count` module deducts cost during the access-phase dry-run peek. This will be fixed in a follow-up PR after apisix-build-tools#455 merges and a new `apisix-runtime` is released with lua-resty-limit-traffic v1.2.0, which supports cost=0 for non-deducting peeks.