refactor(urls): centralize Runpod URL env vars #325

Open
deanq wants to merge 10 commits into main from deanq/ae-2946-centralize-url-env-vars

Conversation

@deanq
Member

@deanq deanq commented Apr 22, 2026

Summary

runpod-flash was conflating the two Runpod URL env vars exposed by runpod-python:

  • RUNPOD_API_BASE_URL: control plane, default https://api.runpod.io. GraphQL + REST mgmt (pods, endpoints, templates).
  • RUNPOD_ENDPOINT_BASE_URL: data plane, default https://api.runpod.ai/v2. Endpoint invocations (/runsync, /run, /status, /health, /metrics). Exposed as runpod.endpoint_url_base.

Different domains serving different roles. Flash was using the wrong env or hardcoding URLs in several places, breaking dev/staging/local-mock workflows. This PR centralizes URL handling in src/runpod_flash/core/urls.py, fixes three bugs, renames Python constants for consistency, migrates two flash-only env vars with a deprecation shim where external users may exist, and adds heuristic misconfiguration warnings.

Bugs fixed

  1. core/resources/request_logs.py — read RUNPOD_API_BASE_URL (control plane) with a wrong api.runpod.ai default and appended /v2/{id}/status and /v2/{id}/metrics — data-plane paths on the control-plane host. Now uses centralized RUNPOD_ENDPOINT_URL (sourced from runpod.endpoint_url_base).
  2. core/resources/request_logs.py::_resolve_hapi_base_url (removed) — previously branched HAPI host off RUNPOD_ENV=dev and a dev-api. substring check in RUNPOD_API_BASE_URL. That implicit coupling is gone. HAPI host is now driven exclusively by RUNPOD_HAPI_URL; dev users must set it explicitly. A RuntimeWarning fires at import if RUNPOD_ENV is set to a non-prod value but no URL envs are overridden, catching the common case of a dev shell that would otherwise silently route to prod HAPI.
  3. stubs/dependency_resolver.py — generated cross-endpoint dispatch code hardcoded https://api.runpod.ai/v2, blocking dev/staging redirects. Now emits an os.environ.get("RUNPOD_ENDPOINT_BASE_URL", DEFAULT_ENDPOINT_URL).rstrip("/") lookup in the generated snippet.
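
The first and third fixes both come down to building data-plane paths from the data-plane base. A minimal sketch, assuming only what the description states (the base is read from RUNPOD_ENDPOINT_BASE_URL and already ends in /v2). The helper names endpoint_base_url, status_url, and metrics_url are hypothetical and are not the functions in request_logs.py:

```python
import os

# Data-plane default as given in the PR description.
DEFAULT_ENDPOINT_URL = "https://api.runpod.ai/v2"


def endpoint_base_url(environ=None):
    """Hypothetical helper: resolve the data-plane base, honoring the env override."""
    env = os.environ if environ is None else environ
    return env.get("RUNPOD_ENDPOINT_BASE_URL", DEFAULT_ENDPOINT_URL).rstrip("/")


def status_url(endpoint_id, request_id, environ=None):
    # The base already carries /v2, so no version segment is added here.
    return f"{endpoint_base_url(environ)}/{endpoint_id}/status/{request_id}"


def metrics_url(endpoint_id, environ=None):
    return f"{endpoint_base_url(environ)}/{endpoint_id}/metrics"
```

Passing a plain dict as environ keeps the sketch testable without touching the process environment.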

New in core/urls.py

  • _env_url(new, old, default) helper — reads env; prefers new name, falls back to old with a DeprecationWarning; empty/whitespace values are treated as unset; validates http/https scheme + non-empty netloc + numeric port, raises ValueError on malformed input; strips trailing slashes.

  • Canonical Python constant naming — all primary env-sourced URLs follow the RUNPOD_*_URL convention:

    • RUNPOD_API_URL (reads unchanged env RUNPOD_API_BASE_URL)
    • RUNPOD_ENDPOINT_URL (sourced from runpod.endpoint_url_base)
    • RUNPOD_REST_API_URL (unchanged)
    • RUNPOD_HAPI_URL (env renamed from RUNPOD_HAPI_BASE_URL; no shim — unshipped)
    • RUNPOD_CONSOLE_URL (env renamed from CONSOLE_BASE_URL with deprecation shim)

    Derived constants (GRAPHQL_URL = RUNPOD_API_URL + /graphql; CONSOLE_URL = RUNPOD_CONSOLE_URL + %s endpoint-ID template) are intentionally unprefixed — they are not env vars, and a RUNPOD_* prefix would falsely imply one exists.

  • Partial-override RuntimeWarning — if any URL is overridden while others stay at prod defaults, emit one warning listing both groups. Heuristic, non-fatal.

  • RUNPOD_ENV-without-override RuntimeWarning — if RUNPOD_ENV is set to a non-prod value (dev, staging, etc.) but no URL envs are overridden, warn that requests will route to prod. Fills the gap left by removing _resolve_hapi_base_url.

  • RUNPOD_URL_MIXED_OK=1 opt-out — silences both warnings for legitimate mixed setups (e.g. dev control plane + prod HAPI).
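
The _env_url contract listed above can be sketched roughly as follows. This is not the code in core/urls.py: env_url and validate_url are hypothetical stand-ins, the environ parameter is added for testability, and the exact error messages are assumptions.

```python
import os
import warnings
from urllib.parse import urlsplit


def validate_url(name, value):
    """Reject values that are not absolute http(s) URLs with a usable host and port."""
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"{name}: expected http or https scheme, got {value!r}")
    if not parts.netloc or not parts.hostname:
        raise ValueError(f"{name}: missing host in {value!r}")
    try:
        parts.port  # attribute access raises ValueError for a non-numeric port
    except ValueError as exc:
        raise ValueError(f"{name}: invalid port in {value!r}") from exc
    return value


def env_url(new, old, default, environ=None):
    """Resolve a URL: new name first, then the deprecated old name, then the default."""
    env = os.environ if environ is None else environ
    for name in (new, old):
        if name is None:
            continue
        raw = (env.get(name) or "").strip()
        if not raw:
            continue  # empty or whitespace-only counts as unset
        if name == old:
            warnings.warn(
                f"{old} is deprecated; set {new} instead",
                DeprecationWarning,
                stacklevel=2,
            )
        return validate_url(name, raw).rstrip("/")
    return default.rstrip("/")
```

Raising on malformed input at resolution time means a typo in a URL env var fails loudly instead of producing requests to an unintended host.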

Env var surface (unchanged where runpod-python reads them)

| Env var | Status |
| --- | --- |
| RUNPOD_API_BASE_URL | Untouched (runpod-python reads) |
| RUNPOD_ENDPOINT_BASE_URL | Untouched (runpod-python reads) |
| RUNPOD_REST_API_URL | Untouched |
| RUNPOD_HAPI_BASE_URL | Renamed to RUNPOD_HAPI_URL in-place (no shim; newly introduced in this PR, zero external adoption) |
| CONSOLE_BASE_URL | Renamed to RUNPOD_CONSOLE_URL with deprecation shim (old name still works + warn) |
| RUNPOD_URL_MIXED_OK | New: opt-out flag for partial-override and RUNPOD_ENV warnings |
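
The two heuristic warnings and the RUNPOD_URL_MIXED_OK opt-out could look roughly like the sketch below. The function names, the shape of the defaults/resolved mappings, and the set of RUNPOD_ENV values treated as prod are all assumptions made for illustration; the PR's actual checks live in core/urls.py and are not reproduced here.

```python
import warnings

# Assumption: which RUNPOD_ENV values count as "prod" (unset is treated as prod).
PROD_ENVS = {"", "prod", "production"}


def opted_out(environ):
    """True when the user has silenced both heuristic warnings."""
    return environ.get("RUNPOD_URL_MIXED_OK", "").strip() == "1"


def check_url_profile(defaults, resolved, environ):
    """Warn on suspicious URL configurations; never raise."""
    if opted_out(environ):
        return
    overridden = sorted(k for k in defaults if resolved[k] != defaults[k])
    at_default = sorted(k for k in defaults if resolved[k] == defaults[k])
    # Partial override: some URLs redirected, others still pointing at prod.
    if overridden and at_default:
        warnings.warn(
            f"Overridden: {overridden}; still at prod defaults: {at_default}",
            RuntimeWarning,
            stacklevel=2,
        )
    # Non-prod RUNPOD_ENV with nothing overridden: requests would go to prod.
    env_name = environ.get("RUNPOD_ENV", "").strip().lower()
    if env_name not in PROD_ENVS and not overridden:
        warnings.warn(
            f"RUNPOD_ENV={env_name!r} but no URL env vars are overridden; "
            "requests will route to prod",
            RuntimeWarning,
            stacklevel=2,
        )
```

Because both checks only warn, a legitimate mixed setup keeps working and can silence the noise with the opt-out flag.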

Follow-up (Phase 2, separate PR)

  1. In runpod-python: promote RUNPOD_API_BASE_URL read from inline (runpod/api/graphql.py:33) to a module-level attribute (runpod.api_url_base), matching the existing runpod.endpoint_url_base pattern.
  2. In flash: switch RUNPOD_API_URL from _env_url("RUNPOD_API_BASE_URL", ...) to runpod.api_url_base.rstrip("/"). Flash stops reading RUNPOD_API_BASE_URL directly. Eliminates the remaining dual-read.

Test plan

  • make quality-check — ruff format + lint + 2658 tests + 53 serial tests passing, coverage 85.91%.
  • Codegen regression test — test_stub_reads_endpoint_base_url_env and test_stub_builds_runsync_from_env_base pin that the generated stub honors RUNPOD_ENDPOINT_BASE_URL at worker runtime.
  • Request-logs regression tests — test_status_and_metrics_use_data_plane_endpoint_url and test_pod_logs_use_hapi_url capture constructed URLs so the control/data-plane conflation can't silently return.
  • Unit tests for _env_url (12), _validate_url (6), _is_opted_out (parametrized), _endpoint_domain_from_base_url (4), TestConsoleDeprecationShim (3), TestPartialOverrideWarning (5), TestRunpodEnvWithoutOverrides (4), TestUrlProfile (3), TestDerivedUrls (2) — all green.
  • Manual validation: internal dev sets all 4 URL env vars to Runpod Dev values, confirms flash + runpod-python agree on resolved URLs.
  • Manual validation: CONSOLE_BASE_URL deprecation warning fires exactly once when only the old name is set.

Dev environment setup

See Flash Dev Environment Setup in Flash Docs.

Linear

AE-2946

Spec and plan

  • Spec: docs/superpowers/specs/2026-04-22-runpod-url-env-organization-design.md
  • Plan: docs/superpowers/plans/2026-04-22-runpod-url-env-organization.md

Fix three places that conflated control-plane (RUNPOD_API_BASE_URL,
api.runpod.io) with data-plane (RUNPOD_ENDPOINT_BASE_URL, api.runpod.ai/v2)
env vars, and consolidate URL + project-wide constants into core/constants.py.

Bugs fixed:
- request_logs.py: read control-plane env with wrong api.runpod.ai default,
  then hit /v2/{id}/status and /metrics — data-plane paths on wrong host.
- request_logs._resolve_hapi_base_url: pivoted dev/prod HAPI selection off
  a control-plane env substring. Now checks RUNPOD_ENV=dev then inspects
  ENDPOINT_BASE_URL for dev-api. prefix.
- stubs/dependency_resolver.py: generated cross-endpoint dispatch code
  hardcoded https://api.runpod.ai/v2, blocking dev/staging redirects. Now
  emits an os.environ.get("RUNPOD_ENDPOINT_BASE_URL", ...) lookup.

Refactor:
- Move core/resources/constants.py -> core/constants.py (file already held
  non-resource content: Python versions, images, tarball, console URL).
- Add RUNPOD_API_BASE_URL, RUNPOD_REST_API_URL, ENDPOINT_BASE_URL (via
  runpod.endpoint_url_base), GRAPHQL_URL, HAPI_BASE_URL/DEV_HAPI_BASE_URL.
- core/api/runpod.py now imports URL constants from core/constants.py.
- Update 21 import sites across src/ and tests/.
Contributor

Copilot AI left a comment

Pull request overview

Centralizes Runpod control-plane vs data-plane URL configuration into runpod_flash.core.constants, updates request-log fetching to use the data-plane base URL, and adjusts codegen/tests/import sites to follow the new constants location and env var semantics.

Changes:

  • Add centralized URL constants (RUNPOD_API_BASE_URL, RUNPOD_REST_API_URL, ENDPOINT_BASE_URL, GRAPHQL_URL, HAPI URLs) in core/constants.py and re-point call sites to them.
  • Fix request-log status/metrics URL construction to use the data-plane base (ENDPOINT_BASE_URL) and refine dev/prod HAPI selection logic.
  • Update stub code generation to honor RUNPOD_ENDPOINT_BASE_URL (rather than hardcoding https://api.runpod.ai/v2), and update affected imports across src/tests.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| tests/unit/test_security_gaps.py | Update REST URL import to the centralized constants module. |
| tests/unit/test_endpoint_client.py | Update ENDPOINT_DOMAIN import to centralized constants. |
| tests/unit/test_endpoint.py | Update constants import path after constants refactor. |
| tests/unit/test_dotenv_loading.py | Ensure core.constants is cleared on reload so env overrides affect URL constants. |
| tests/unit/resources/test_serverless.py | Update constants import path after constants refactor. |
| tests/unit/resources/test_live_serverless.py | Update constants import path after constants refactor. |
| tests/unit/resources/test_live_load_balancer.py | Reload core.constants (new location) to pick up env changes in tests. |
| tests/unit/core/test_constants.py | Update constants import path after constants refactor. |
| tests/unit/cli/commands/build_utils/test_manifest.py | Update constants import path after constants refactor. |
| src/runpod_flash/stubs/load_balancer_sls.py | Update constant import path after constants refactor. |
| src/runpod_flash/stubs/dependency_resolver.py | Generated stub code now honors RUNPOD_ENDPOINT_BASE_URL with a default. |
| src/runpod_flash/endpoint.py | Update worker defaults import; update LB domain import to centralized constants. |
| src/runpod_flash/core/resources/serverless.py | Update constants import path after constants refactor. |
| src/runpod_flash/core/resources/request_logs.py | Switch status/metrics calls to data-plane base URL; adjust HAPI dev/prod selection. |
| src/runpod_flash/core/resources/network_volume.py | Update constants import path after constants refactor. |
| src/runpod_flash/core/resources/load_balancer_sls_resource.py | Update constants import path after constants refactor. |
| src/runpod_flash/core/resources/live_serverless.py | Update constants import path after constants refactor. |
| src/runpod_flash/core/resources/app.py | Update constants import path after constants refactor. |
| src/runpod_flash/core/constants.py | Introduce centralized URL constants and keep other shared constants. |
| src/runpod_flash/core/api/runpod.py | Import GraphQL/REST URLs from centralized constants instead of reading env inline. |
| src/runpod_flash/cli/commands/preview.py | Update constants import path after constants refactor. |
| src/runpod_flash/cli/commands/login.py | Update constants import path after constants refactor. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Update constants import path after constants refactor. |
| src/runpod_flash/cli/commands/build.py | Update constants import path after constants refactor. |
| src/runpod_flash/__init__.py | Update lazy exports to import constants from new centralized module. |

Comment thread tests/unit/test_endpoint_client.py
deanq added 4 commits April 22, 2026 12:25
Reverts the file move from the previous commit per review feedback — too
much churn for a refactor. Keeps the URL centralization, bug fixes, and
new URL constants; they now live in the original
src/runpod_flash/core/resources/constants.py.

- Restore src/runpod_flash/core/resources/constants.py (from core/).
- Restore tests/unit/core/resources/test_constants.py (from tests/unit/core/).
- Revert 21 import sites back to runpod_flash.core.resources.constants
  (absolute) and .constants / from ..constants (relative).
- core/api/runpod.py reads RUNPOD_API_BASE_URL and RUNPOD_REST_API_URL
  inline (duplicated with constants.py defaults) to avoid a circular
  import: core.resources.__init__ eagerly imports modules that re-enter
  core/api/runpod.py, so that module can't import from core.resources.constants.

The dev/prod HAPI split belongs in the env, not in the code. Collapse
HAPI_BASE_URL to a single env-driven constant (RUNPOD_HAPI_BASE_URL,
default https://hapi.runpod.net) and remove _resolve_hapi_base_url —
callers now use HAPI_BASE_URL directly. Any environment can redirect
HAPI by setting the env var, same as every other Runpod URL in this
module.

Eliminates the duplicated RUNPOD_API_BASE_URL / RUNPOD_REST_API_URL env
reads between core/api/runpod.py and core/resources/constants.py. The
duplication existed to dodge a circular import through
core/resources/__init__.py — the real fix is to put URL constants in a
leaf module with no resources deps.

- New core/urls.py holds every Runpod URL constant (RUNPOD_API_BASE_URL,
  RUNPOD_REST_API_URL, ENDPOINT_BASE_URL, GRAPHQL_URL, HAPI_BASE_URL) and
  the ENDPOINT_DOMAIN helper. Single source of truth.
- core/api/runpod.py imports from core/urls (no more inline os.environ reads
  or duplicate defaults).
- core/resources/constants.py keeps console URL + non-URL constants
  (Python versions, images, tarball, etc.).
- core/resources/request_logs.py, core/resources/load_balancer_sls_resource.py,
  endpoint.py, and 3 tests switched to importing URL constants from
  runpod_flash.core.urls.

The class attribute was a wrapper over the module-level constant with no
per-instance variation. Use the module-level GRAPHQL_URL directly in
session.post; update the two tests that asserted against the class
attribute to assert against runpod_flash.core.urls.GRAPHQL_URL — the
actual source of truth.

Also drops the `GRAPHQL_URL as _GRAPHQL_URL` import alias that only existed
to avoid a name collision with the (now removed) class attribute.
@deanq deanq changed the title refactor(constants): centralize Runpod URL env vars [AE-2946] refactor(urls): centralize Runpod URL env vars [AE-2946] Apr 22, 2026
@deanq deanq changed the title refactor(urls): centralize Runpod URL env vars [AE-2946] refactor(urls): centralize Runpod URL env vars Apr 22, 2026
… [AE-2946]

Self-review cleanup:

- Move CONSOLE_BASE_URL / CONSOLE_URL from core/resources/constants.py
  into core/urls.py — they are Runpod service URLs like the others, with
  the same env-override pattern, and splitting them across two files had
  no principled basis.
- Drop the dead __all__ and cargo-cult 'backward compat' comment from
  core/api/runpod.py. __all__ only affects wildcard imports; it did
  nothing for the actual test imports.
- Remove the unused RUNPOD_API_BASE_URL import from core/api/runpod.py.
  It was only imported to populate that fake __all__.
- Point tests/unit/core/api/test_runpod_graphql_extended.py at
  core/urls.py for RUNPOD_REST_API_URL instead of re-exporting through
  core/api/runpod.py.
- Collapse the 3-line os.environ.get(...).rstrip() split in
  stubs/dependency_resolver.py codegen into a single readable line.
  rstrip dropped — the default is clean and the env is internal.
Contributor

@runpod-Henrik runpod-Henrik left a comment

QA Review — PR 325

Finding 1: PR description says HAPI auto-selects dev URL; the code doesn't

The PR description states:

_resolve_hapi_base_url — now checks RUNPOD_ENV=dev first, then inspects ENDPOINT_BASE_URL for dev-api. prefix.

urls.py does neither:

HAPI_BASE_URL: str = os.environ.get(
    "RUNPOD_HAPI_BASE_URL", "https://hapi.runpod.net"
).rstrip("/")

It reads one env var with a hard prod default. No RUNPOD_ENV check, no dev-api. inspection. The old _resolve_hapi_base_url() auto-detected dev from RUNPOD_API_BASE_URL; that logic is gone. A dev/staging environment that previously relied on auto-detection will silently use the prod HAPI endpoint unless RUNPOD_HAPI_BASE_URL is also set. Either the PR description is wrong, or the RUNPOD_ENV=dev check is missing from the implementation.


Finding 2: Import-time capture is documented in one test, not enforced

All URL constants are module-level assignments that capture env vars at import time. test_dotenv_loading.py correctly clears runpod_flash.core.urls before re-importing to test overrides. But any other test that patches a URL env var without clearing the module will read stale values with no obvious failure — the test passes because the env var is set, but the constant in urls.py is already baked from the earlier import. A module-level comment in urls.py noting "these constants are captured at import time; tests must clear this module before patching URL env vars" would help future test authors avoid the trap.
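
The trap described in Finding 2 can be reproduced without the real package. The sketch below writes a throwaway module named demo_urls (a hypothetical stand-in for core/urls.py, with an invented DEMO_API_BASE_URL env var) and shows that a constant captured at import time only changes after the module is dropped from sys.modules and imported again:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a module that captures an env var at import time.
MODULE_SOURCE = '''
import os
API_URL = os.environ.get("DEMO_API_BASE_URL", "https://api.example.test").rstrip("/")
'''


def load_demo_urls(fresh=False):
    """Import the demo module; with fresh=True, drop the cached copy first."""
    if fresh:
        sys.modules.pop("demo_urls", None)
    return importlib.import_module("demo_urls")


tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_urls.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

os.environ.pop("DEMO_API_BASE_URL", None)
first = load_demo_urls()                  # constant captured from the default
os.environ["DEMO_API_BASE_URL"] = "http://localhost:9000/"
stale = load_demo_urls()                  # cached module: still the default
reloaded = load_demo_urls(fresh=True)     # re-executed: picks up the override
```

Here stale is the same module object as first, which is why a test that only sets the env var would silently assert against the old value.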


Finding 3: No test for the dependency_resolver.py codegen change

The generated dispatcher now emits:

_base = _os.environ.get("RUNPOD_ENDPOINT_BASE_URL", "https://api.runpod.ai/v2")
_url = f"{_base}/{_endpoint_id}/runsync"

This is the right fix, but the test suite has no automated case verifying that the generated snippet compiles and uses the env var. The PR test plan marks "Codegen smoke test" as checked, but it isn't in the test suite. A regression here (e.g., the _os alias not being set up before _base is referenced) would only be caught at runtime in a worker.
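
One possible shape for such a regression test is sketched below. The two assignment lines are the ones quoted in this finding; wrapping them in a function called _runsync_url, and the helper names load_generated and resolve_runsync_url, are assumptions made so the snippet can be compiled and called in isolation. The real generated stub is not shown in this thread.

```python
import os
from unittest import mock

# Assumed wrapper around the two quoted lines; the real stub may differ.
GENERATED_SNIPPET = '''
import os as _os

def _runsync_url(_endpoint_id):
    _base = _os.environ.get("RUNPOD_ENDPOINT_BASE_URL", "https://api.runpod.ai/v2")
    _url = f"{_base}/{_endpoint_id}/runsync"
    return _url
'''


def load_generated(source):
    """Compile and execute generated source, returning its namespace."""
    namespace = {}
    exec(compile(source, "<generated-stub>", "exec"), namespace)
    return namespace


def resolve_runsync_url(endpoint_id, env_overrides):
    """Call the generated helper under a temporary environment."""
    generated = load_generated(GENERATED_SNIPPET)
    with mock.patch.dict(os.environ):  # os.environ is restored on exit
        os.environ.pop("RUNPOD_ENDPOINT_BASE_URL", None)
        os.environ.update(env_overrides)
        return generated["_runsync_url"](endpoint_id)
```

Executing the compiled snippet, rather than only string-matching it, is what would surface a missing _os alias as a NameError at test time instead of in a worker.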


The request_logs.py URL fix is correct — ENDPOINT_BASE_URL already includes /v2, so {ENDPOINT_BASE_URL}/{endpoint_id}/status/{request_id} produces the right path. The import migration across 21 sites is mechanical and consistent.

@deanq deanq removed the request for review from rutvik-runpod April 22, 2026 22:46
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.


Comment thread src/runpod_flash/stubs/dependency_resolver.py
Comment thread src/runpod_flash/endpoint.py
Comment thread src/runpod_flash/core/urls.py
@deanq
Member Author

deanq commented Apr 23, 2026

@runpod-Henrik — addressing the three findings:

Finding 1 (HAPI auto-detection removed silently): Addressed in code, description, and tests.

  • Removed the implicit coupling; HAPI host is now driven only by RUNPOD_HAPI_URL.
  • Added _check_runpod_env_without_overrides() at src/runpod_flash/core/urls.py:197-220 that fires a RuntimeWarning when RUNPOD_ENV is set to a non-prod value but no URL envs are overridden — catches exactly the silent-dev-to-prod case you described. RUNPOD_URL_MIXED_OK=1 opt-out for legitimate mixed setups.
  • PR description updated to remove the stale "auto-detects dev" claim and describe the new design.
  • Tests at tests/unit/core/test_urls.py::TestRunpodEnvWithoutOverrides (4 cases).

Finding 2 (import-time capture needs test-author comment): Addressed.

  • Added a test-authors note in the module docstring at src/runpod_flash/core/urls.py:37-43: "all URL constants are captured at module import time. A test that sets a URL env var via monkeypatch.setenv will not affect the already-imported constants. To test override behavior, delete runpod_flash.core.urls from sys.modules and re-import."
  • Points to tests/unit/core/test_urls.py::_reload_urls_module as the canonical pattern.

Finding 3 (no codegen test): Addressed.

  • tests/unit/test_dependency_resolver.py::test_stub_reads_endpoint_base_url_env asserts the generated stub contains RUNPOD_ENDPOINT_BASE_URL and the default URL string.
  • tests/unit/test_dependency_resolver.py::test_stub_builds_runsync_from_env_base compiles the generated stub and verifies the runtime env-read shape.

Additional hardening since your review:

  • _env_url treats empty/whitespace as unset; validates scheme/netloc/port; raises on malformed input.
  • _endpoint_domain_from_base_url raises on non-empty input with empty netloc instead of silently falling back to prod.
  • RUNPOD_URL_MIXED_OK=1 opt-out flag for both warnings.
  • _URL_PROFILE single-sources default vs resolved URL pairing (no drift between DEFAULT_* constants and the partial-override check).
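
The "raise instead of silently falling back" behavior for the endpoint domain can be illustrated with a short sketch. This is not the PR's _endpoint_domain_from_base_url: the function name, the fallback for empty input, and the default domain value (taken as the host of the documented data-plane default) are assumptions.

```python
from urllib.parse import urlsplit

# Assumption: host portion of the documented data-plane default URL.
DEFAULT_ENDPOINT_DOMAIN = "api.runpod.ai"


def endpoint_domain_from_base_url(base_url):
    """Return the host[:port] of base_url, or the default when no URL is given."""
    if not base_url or not base_url.strip():
        return DEFAULT_ENDPOINT_DOMAIN  # assumed behavior for empty input
    netloc = urlsplit(base_url.strip()).netloc
    if not netloc:
        # A non-empty value without a scheme parses as a path, not a host.
        raise ValueError(f"cannot extract a domain from {base_url!r}")
    return netloc
```

A value such as api.runpod.ai/v2 (scheme omitted) is the typical input that would previously have fallen through to prod and now fails fast.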

All 4 Copilot threads addressed and resolved. Quality gate green (2658 unit + 53 serial tests, 85.91% coverage).

Dev environment setup documented: Flash Dev Environment Setup.

@deanq deanq requested review from KAJdev and jhcipar April 23, 2026 06:19