feat(service-map): server-side filtering, latency percentiles, throughput & focus #2387
MikeShi42 wants to merge 5 commits into
…ce MultiSelect + neighbor expansion)
…de sizing

Builds on the server-side filtering to make the map a real RED-metrics view:
- Latency p50/p95/p99 computed server-side via a single GROUPING SETS query that emits both rolled-up node-level rows and per-edge rows (percentiles can't be combined client-side). Errors via countIf; drops the per-status map.
- Labeled request throughput (req/s) shown alongside percentiles in node and edge tooltips.
- 'Focus' action in the node tooltip drives the server-side service filter to that service and its immediate neighbors (toggles off when reselected); clicking a node selects it and reveals the tooltip rather than auto-focusing.
- Node size scales with throughput (sqrt of incoming volume vs the busiest node).
- Pure unit-tested helpers: rawDurationToMs, getRequestsPerSecond, formatRate, getNodeSize; aggregation tests rewritten for the new row model.
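The throughput and node-sizing rules in this commit message can be sketched as below. This is a minimal illustration, not the PR's code: the helper names come from the commit message, but the signatures, the pixel bounds (`minSize`, `maxSize`), and the clamping behavior are assumptions.

```typescript
// Hypothetical sketch: names mirror the helpers listed in the commit message,
// but signatures and default pixel bounds are assumptions.

/** Requests per second over a time window given in milliseconds. */
function getRequestsPerSecond(requestCount: number, windowMs: number): number {
  if (!Number.isFinite(requestCount) || !Number.isFinite(windowMs) || windowMs <= 0) {
    return 0;
  }
  return requestCount / (windowMs / 1000);
}

/**
 * Node size scaled by sqrt(incoming volume / busiest node's volume),
 * mapped onto [minSize, maxSize]. The square root keeps low-traffic
 * nodes visible next to a very busy one.
 */
function getNodeSize(
  incomingRequests: number,
  busiestNodeRequests: number,
  minSize: number = 24,
  maxSize: number = 64,
): number {
  if (busiestNodeRequests <= 0 || incomingRequests <= 0) return minSize;
  const ratio = Math.min(incomingRequests / busiestNodeRequests, 1);
  return minSize + (maxSize - minSize) * Math.sqrt(ratio);
}
```

With these assumed bounds, a node receiving a quarter of the busiest node's traffic lands halfway between the minimum and maximum size, because sqrt(0.25) = 0.5.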
🦋 Changeset detected
Latest commit: bd20dba
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
🔴 Tier 4: Critical
Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.
Why this tier:
Additional context: agent branch (
Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
Stats
E2E Test Results
✅ All tests passed • 192 passed • 3 skipped • 1321s
Tests ran across 4 shards in parallel.
Deep Review
✅ No critical issues found.
🟡 P2: recommended
🔵 P3 nitpicks (15)
Reviewers (12): correctness, testing, maintainability, project-standards, agent-native, learnings, security, performance, api-contract, reliability, adversarial, kieran-typescript.
Testing gaps:
- rawDurationToMs: drop the exponent clamp so precision < 3 (e.g. seconds) scales up correctly, matching getDurationSecondsExpression (P2 correctness).
- Align durationPrecision fallback to the schema default (3) in node/edge.
- Type isNodeLevel as boolean, converting at the parse boundary.
- formatRate: guard non-finite/negative input.
- Tests: precision<3 conversion, formatRate guards, partial-percentile hasLatency case.
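The two helper fixes in this commit could look roughly like the sketch below. It is an illustration only: the signatures, the `'n/a'` fallback string, and the rounding thresholds in `formatRate` are assumptions. Here `precision` is the number of decimal digits of the raw duration unit (0 = seconds, 3 = milliseconds, 9 = nanoseconds).

```typescript
// Hypothetical sketch: signatures, fallback text, and rounding rules are assumptions.

/**
 * Convert a raw duration to milliseconds.
 * No clamp on the exponent, so coarse units (precision < 3) scale up
 * and fine units (precision > 3) scale down.
 */
function rawDurationToMs(raw: number, precision: number = 3): number {
  const exponent = 3 - precision;
  return exponent >= 0 ? raw * 10 ** exponent : raw / 10 ** -exponent;
}

/** Format a request rate, guarding against non-finite or negative input. */
function formatRate(reqPerSec: number): string {
  if (!Number.isFinite(reqPerSec) || reqPerSec < 0) return 'n/a';
  const digits = reqPerSec >= 100 ? 0 : reqPerSec >= 10 ? 1 : 2;
  return `${reqPerSec.toFixed(digits)} req/s`;
}
```

For example, `rawDurationToMs(2, 0)` treats the input as 2 seconds and yields 2000, which is the case a clamped exponent would get wrong.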
Thanks — addressed the deep-review feedback in c9be6f8. Fixed:
Deferring (with reasons):
…tion

Second-pass deep-review follow-ups:
- Use quantiles(0.5,0.95,0.99) (single reservoir sketch) instead of three quantile() calls; parser reads the [p50,p95,p99] array. Validated against CH.
- rawDurationToMs: multiply by 10^(3-precision) to avoid a fractional divisor for precision < 3.
- Extract deriveDisplayMetrics() shared by node and edge tooltips (removes the duplicated precision/latency/throughput derivation).
- parseInt with explicit radix 10; drop a redundant whereLanguage cast.
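Reading the `[p50,p95,p99]` array at the parse boundary might look like the sketch below. The function name, the result shape, and the decision to return `null` for any malformed input are assumptions for illustration. Accepting numeric strings is also an assumption, made because some ClickHouse output settings serialize numbers as strings.

```typescript
// Hypothetical sketch: name, result shape, and null-on-malformed policy are assumptions.

interface LatencyPercentiles {
  p50: number;
  p95: number;
  p99: number;
}

/** Parse a quantiles(0.5, 0.95, 0.99) result column into named percentiles. */
function parseLatencyQuantiles(value: unknown): LatencyPercentiles | null {
  if (!Array.isArray(value) || value.length !== 3) return null;
  const nums = value.map((v: unknown): number => {
    if (typeof v === 'number') return v;
    if (typeof v === 'string') return Number.parseFloat(v);
    return NaN;
  });
  // Reject the whole row if any entry is missing or not a finite number.
  if (nums.some(n => !Number.isFinite(n))) return null;
  return {
    p50: nums[0] as number,
    p95: nums[1] as number,
    p99: nums[2] as number,
  };
}
```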
Second-pass review — addressed the bounded items in bd20dba: Fixed:
Deferring — these need a decision, not an auto-fix:
Batched into a focused follow-up (keeps this PR's diff reviewable): file splits (
Noted, not changing:
What & why
The Service Map showed topology plus only part of RED (request counts + error
rate). This brings it closer to a full APM-style service map: filter large maps
down to what you care about, see latency (the missing D in RED) and
throughput, and pivot/inspect a service.
Commit 1 — Server-side filtering
A `where` input and a service-name MultiSelect scope which spans/services the map is built from, with inbound/outbound neighbor expansion so a focused service still shows its immediate dependencies.
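The neighbor expansion described above can be illustrated in memory as follows. The PR performs this filtering server-side, so this sketch only shows the selection rule. The `Edge` shape, the function name, and the `inbound`/`outbound` options are assumptions.

```typescript
// Hypothetical sketch of one-hop neighbor expansion; the PR does this server-side.

interface Edge {
  source: string; // calling service
  target: string; // called service
}

interface ExpandOptions {
  inbound: boolean;  // include services that call a selected service
  outbound: boolean; // include services that a selected service calls
}

function expandWithNeighbors(
  selected: readonly string[],
  edges: readonly Edge[],
  options: ExpandOptions = { inbound: true, outbound: true },
): Set<string> {
  const chosen = new Set<string>(selected);
  const result = new Set<string>(selected);
  for (const edge of edges) {
    // Only edges touching the original selection count, so this is one hop only.
    if (options.outbound && chosen.has(edge.source)) result.add(edge.target);
    if (options.inbound && chosen.has(edge.target)) result.add(edge.source);
  }
  return result;
}
```

Checking membership against the original selection rather than the growing result set is what keeps the expansion to immediate neighbors instead of the whole connected component.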
Commit 2 — Latency percentiles, throughput, focus & node sizing
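- Latency p50/p95/p99 computed server-side via a single `GROUPING SETS` query that emits both rolled-up node-level rows and per-edge rows (percentiles can't be combined client-side). Errors now come from `countIf`, dropping the old per-status map.
- Labeled request throughput (req/s) shown alongside the percentiles in node and edge tooltips.
- A 'Focus' action in the node tooltip drives the server-side service filter to that service and its immediate neighbors (clicking again clears it). Clicking a node just selects it and opens the tooltip; it does not auto-focus.
- Node size scales with throughput (sqrt of incoming volume vs the busiest node).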
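A small worked example of why percentiles can't be combined client-side, and therefore why the node-level rows have to be computed by the server from the raw spans. The data and the nearest-rank `percentile` helper are made up for illustration; ClickHouse's own quantile functions are approximate and use a different method.

```typescript
// Illustrative only: made-up durations and a simple nearest-rank percentile.

function percentile(values: readonly number[], q: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1] as number;
}

// Two inbound edges of the same node, durations in ms.
const edgeA = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10];
const edgeB = [100, 200];

const p95A = percentile(edgeA, 0.95); // 10
const p95B = percentile(edgeB, 0.95); // 200

// Naive client-side roll-up: average the per-edge p95 values.
const averagedP95 = (p95A + p95B) / 2; // 105

// Correct node-level p95: computed over all spans together.
const nodeP95 = percentile([...edgeA, ...edgeB], 0.95); // 200

console.log({ averagedP95, nodeP95 });
```

Weighting the average by request count does not help either: (10 × 10 + 2 × 200) / 12 is about 41.7, still far from the true node-level value of 200.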
Testing
- Unit tests cover the pure helpers (`rawDurationToMs`, `getRequestsPerSecond`, `formatRate`, `getNodeSize`).
- `tsc`, eslint, and stylelint are clean.
- The `GROUPING SETS` query was executed against a live ClickHouse to confirm it parses/runs (`GROUPING()`, `countIf`, `quantile`).

Notes for reviewers
- Latency is measured on the callee side (server spans grouped by caller), not client-observed round-trip latency.
- Percentiles are approximate (ClickHouse quantiles) and are marked with a `~` in the UI.
- `DBServiceMapPage.tsx` is now ~390 lines; extracting the filter bar into its own component is a reasonable follow-up.
🤖 Generated with Claude Code