Proposal: Centralized Configuration System for Xagent #259

Status: Draft
Author: @tanbro
Created: 2026-04-03
Related Issues: #243, #246, #252
Related PRs: #235, #247


TL;DR

Problem: Xagent's configuration is scattered across 20+ files with inconsistent environment variable naming (LANCEDB_DIR vs XAGENT_UPLOADS_DIR), hardcoded paths, and no type safety. PR #247 improves this but doesn't go far enough.

Solution: A centralized configuration system with:

  • Single entry point - from xagent.config import PathConfig, DatabaseConfig, SandboxConfig
  • Type-safe - IDE autocomplete, validation at startup
  • Unified naming - All env vars use XAGENT_* prefix with __ for nesting
  • Config file + Environment variables - TOML for complex deployments, env vars for containers/secrets
  • Internally modular - Independent modules per domain, still centrally managed

Breaking Changes: All env var names will change to follow XAGENT_* prefix convention. A transition period with deprecation warnings will be provided.

Key Decision Points Requiring Community Discussion:

  1. 🤔 Internal organization: Single monolithic class vs. modular xagent/config/ package?
  2. 🤔 Config file location: ./config.toml (project) vs /etc/xagent/config.toml (system)?
  3. 🤔 Config file format: TOML (recommended) vs YAML vs JSON5?

Recommended: Approach B (Pydantic-First) - Score 9.4/10. Best balance of type safety, unified naming, and modular organization support.

Migration Effort: Medium-High (more than incremental fixes, but solid foundation for future growth)


Executive Summary

Xagent currently suffers from a fragmented configuration architecture that causes path inconsistencies, environment variable handling problems, and deployment complexity. This proposal analyzes the current state, identifies pain points, and defines a centralized configuration system.

Proposal Objectives

  1. Single source of truth for all configuration

    • Eliminate scattered os.getenv() calls across the codebase
    • Resolve circular dependency issues (core vs web modules)
    • Make configuration changes maintainable
  2. IDE support and type safety

    • Enable autocomplete for configuration options
    • Catch configuration errors at type-check time, not runtime
    • Provide clear type information for each configuration value
  3. Configuration file + Environment variables

    • Support TOML configuration files for complex deployments
    • Environment variables for container/secrets (with precedence)
  4. Unified environment variable naming

    • All env vars follow XAGENT_* prefix convention
    • Nested config uses __ separator (e.g., XAGENT_PATHS__STORAGE_ROOT)
    • Deprecation warnings for old naming schemes
  5. Modular organization (proposed for discussion)

    • Configuration organized by domain (paths, database, sandbox, etc.)
    • Independent development - no merge conflicts on single config file
    • Clear separation of concerns with namespace access (e.g., config.paths.storage_root)

Part 1: Problem Analysis

1.1 Current State Assessment

The Xagent codebase manages configuration through multiple disconnected mechanisms:

Mechanism 1: Scattered Environment Variables

  • Configuration scattered across 20+ files using os.getenv() calls
  • No central documentation of available configuration options
  • No validation of configuration values
# Example from src/xagent/core/workspace.py:50
base_dir: str = "uploads"  # Hardcoded, ignores XAGENT_UPLOADS_DIR

# Example from src/xagent/web/api/files.py
UPLOADS_DIR = Path(os.getenv("XAGENT_UPLOADS_DIR", "uploads"))

Mechanism 2: Database-Backed Settings

  • system_settings table exists but is underutilized
  • Primarily used for internal flags rather than user-facing configuration

Mechanism 3: Module-Level Constants

  • Individual modules define their own configuration constants
  • Leads to duplication and inconsistency
# src/xagent/providers/vector_store/lancedb.py:66
def get_default_lancedb_dir():
    # Hardcoded path computation duplicated from config
    return Path(os.getenv("LANCEDB_DIR", "~/.xagent/data/lancedb")).expanduser()

1.2 Symptom Analysis

| Symptom | Evidence | Impact |
| --- | --- | --- |
| Path Inconsistency | Issue #243: 6+ files use hardcoded "uploads" | XAGENT_UPLOADS_DIR ignored in core modules |
| Naming Confusion | Issue #252: LANCEDB_DIR vs LANCEDB_PATH | Unclear which env var to use |
| Data Scattered | Issue #246: Data in ./data, ~/.xagent, src/xagent/web/ | Multiple bind-mounts needed for containers |
| Portability Issues | PR #235: Absolute paths in database | Breaks when changing XAGENT_UPLOADS_DIR |
| Configuration Drift | PR #247: 20+ files need updating for config changes | High maintenance cost |

1.3 Dependency Issues

Circular Dependency Problem:

src/xagent/web/config.py  →  Defines UPLOADS_DIR
src/xagent/core/workspace.py  →  Needs uploads directory but CANNOT import from web/

This forces core modules to hardcode paths, creating the inconsistency problem.
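
One way to break this cycle, sketched here with the standard library only, is a config module that imports nothing from xagent, so both core and web can depend on it. The names `Paths` and `load_paths` are hypothetical, and the `"uploads"` default is assumed for illustration rather than taken from the codebase.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Paths:
    """Path settings shared by core and web (hypothetical name)."""
    uploads_dir: Path


def load_paths(env: Optional[Mapping[str, str]] = None) -> Paths:
    """Read path settings from the environment; imports no xagent modules."""
    env = os.environ if env is None else env
    # Assumed default, for illustration only
    raw = env.get("XAGENT_UPLOADS_DIR", "uploads")
    return Paths(uploads_dir=Path(raw))


# Both xagent.core.workspace and xagent.web.api.files could call load_paths()
# and would then agree on the uploads directory.
paths = load_paths({"XAGENT_UPLOADS_DIR": "/data/uploads"})
```

Because the module depends only on the standard library, importing it from core does not pull in anything under web.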


1.4 Why PR #247 is Not Enough

Status: PR #247 is on the feat/unified-configuration-module branch and has not been merged to main.

What PR #247 Does:

  • Creates src/xagent/core/config.py with 15+ configuration functions
  • Provides centralized access to path and sandbox configuration
  • Updates 20+ files to use the new functions
  • Maintains complete backward compatibility - no behavior or default value changes, even when there are obvious problems

What PR #247 Does NOT Solve:

Problem 1: Breaking Changes Required for Unified Naming

Unified XAGENT_* env var naming (FR-6) is fundamentally incompatible with existing env var names:

  • LANCEDB_DIR → XAGENT_PATHS__LANCEDB_PATH
  • UPLOADS_DIR → XAGENT_PATHS__UPLOADS_DIR
  • SANDBOX_IMAGE → XAGENT_SANDBOX__IMAGE

Any solution that properly implements FR-6 will break existing deployments. The question is not whether to break compatibility, but how to manage the transition.

Problem 2: Hardcoded Variable Names Paired with One Reader Function per Setting

# PR #247 implementation
UPLOADS_DIR = "XAGENT_UPLOADS_DIR"           # Hardcoded variable name
WEB_STATIC_DIR = "XAGENT_WEB_STATIC_DIR"     # Hardcoded variable name
# ... 15+ hardcoded constants

def get_uploads_dir() -> Path:
    env_dir = os.getenv(UPLOADS_DIR)         # Repeated pattern
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "uploads"         # Hardcoded default value

def get_web_static_dir() -> Path:
    env_dir = os.getenv(WEB_STATIC_DIR)      # Same repeated pattern
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "static"

Problems:

  • Each configuration item requires a new function, repeating the same pattern
  • Adding new config = adding new function = high maintenance cost
  • Environment variable names, default values, and validation logic are scattered
  • Cannot see all configuration items and their defaults at a glance
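
For contrast, a declarative table keeps every item, its env var name, and its default in one place and reuses a single read-and-parse loop. This is a minimal standard-library sketch, not the design proposed later; `Setting`, `SETTINGS`, `load`, the `XAGENT_SANDBOX_CPUS` name, and all defaults are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Mapping, Optional


@dataclass(frozen=True)
class Setting:
    """One configuration item: env var name, default, and parser."""
    env: str
    default: str
    parse: Callable[[str], object]


# Every item and its default is visible at a glance (values are illustrative).
SETTINGS: Dict[str, Setting] = {
    "uploads_dir": Setting("XAGENT_UPLOADS_DIR", "uploads", Path),
    "web_static_dir": Setting("XAGENT_WEB_STATIC_DIR", "static", Path),
    "sandbox_cpus": Setting("XAGENT_SANDBOX_CPUS", "2", int),
}


def load(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Resolve all settings with one shared read-and-parse loop."""
    env = os.environ if env is None else env
    return {name: s.parse(env.get(s.env, s.default)) for name, s in SETTINGS.items()}


values = load({"XAGENT_SANDBOX_CPUS": "4"})
```

Adding a setting then means adding one table row instead of one more function.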

Problem 3: Hardcoded Default Values and Validation Rules

# Default values hardcoded in function body, inconsistent rules
def get_uploads_dir() -> Path:
    return get_web_dir() / "uploads"        # Relative to web directory

def get_lancedb_path() -> Path:
    return Path("data/lancedb")              # Relative to CWD!
    # Comment admits: "Default to ./data/lancedb, which is **relative** to cwd"

def get_database_url() -> str:
    db_path = get_default_sqlite_db_path()   # Relative to home
    return f"sqlite:///{db_path}"

Problems:

  • Default values are scattered, cannot be viewed centrally
  • Path rules are inconsistent (source-relative vs cwd-relative vs home-relative)
  • Validation logic if env_var: return env_var is repeated 15+ times
  • Cannot express dependencies between configuration items
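
The last point, dependencies between items, can be illustrated with defaults that derive from another value. The sketch below uses a plain dataclass; `PathSettings` is a hypothetical name, and the subdirectory names follow the examples elsewhere in this proposal.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PathSettings:
    """Paths whose defaults derive from storage_root (hypothetical class)."""
    storage_root: Path = Path("~/.xagent")
    uploads_dir: Optional[Path] = None
    lancedb_path: Optional[Path] = None

    def __post_init__(self) -> None:
        self.storage_root = Path(self.storage_root).expanduser()
        # Derive unset paths from storage_root so one override moves everything.
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        if self.lancedb_path is None:
            self.lancedb_path = self.storage_root / "lancedb"


moved = PathSettings(storage_root=Path("/data"))
```

Overriding only `storage_root` relocates the derived paths, while an explicit `uploads_dir` still wins.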

Problem 4: No Support for Nested Configuration

# PR #247 only supports flat configuration
get_storage_root()      # → Path
get_uploads_dir()       # → Path
get_sandbox_cpus()      # → Optional[int]  # Problem: caller must handle None
get_sandbox_memory()    # → Optional[int]  # Problem: caller must handle None
get_sandbox_env()       # → dict
get_sandbox_volumes()   # → list[tuple]

Cannot Express:

# Desired structured configuration
[paths]
storage_root = "~/.xagent"

[sandbox]
image = "xprobe/xagent-sandbox:latest"
[sandbox.resources]
cpus = 2
memory = "4g"

Missing Capabilities:

  • ❌ Dependencies/references between configuration items
  • ❌ Namespace grouping (sandbox.*)
  • ❌ Structured configuration for lists/dicts

Problem 5: Weak Type Checking

# Type errors only discovered at runtime
def get_sandbox_cpus() -> Optional[int]:
    env_str = os.getenv(SANDBOX_CPUS)
    if env_str:
        try:
            return int(env_str)               # Runtime conversion
        except ValueError:
            logger.warning(f"Invalid {SANDBOX_CPUS} value: {env_str}")
    return None  # Caller must handle None!

Problems:

  • Type errors only discovered at runtime
  • Caller must handle None even though there's a meaningful default
  • Cannot enforce ranges (cpus must be > 0)
  • IDE has no knowledge of default values and valid value ranges
  • Cannot validate at config load time, only at runtime
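
A load-time check addresses most of these points: convert once, enforce the range, and always return a concrete value. The sketch below assumes a default of 2 CPUs and the env var name `XAGENT_SANDBOX__CPUS` (following the naming proposed in this document); `ConfigError` and `parse_sandbox_resources` are hypothetical names.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Raised when a configuration value is invalid (hypothetical name)."""


@dataclass(frozen=True)
class SandboxResources:
    cpus: int = 2  # assumed default, for illustration


def parse_sandbox_resources(env: dict) -> SandboxResources:
    """Convert and range-check once at load time; never return None."""
    raw = env.get("XAGENT_SANDBOX__CPUS")
    if raw is None:
        return SandboxResources()
    try:
        cpus = int(raw)
    except ValueError:
        raise ConfigError(f"XAGENT_SANDBOX__CPUS must be an integer, got {raw!r}") from None
    if cpus <= 0:
        raise ConfigError(f"XAGENT_SANDBOX__CPUS must be > 0, got {cpus}")
    return SandboxResources(cpus=cpus)
```

Callers receive an `int` in every case, and a bad value stops startup instead of surfacing later as `None`.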

Problem 6: No Configuration File Support

# Only supports environment variables, no configuration file support
def get_uploads_dir() -> Path:
    env_dir = os.getenv(UPLOADS_DIR)  # Only source
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "uploads"

Missing:

  • ❌ Configuration file support (TOML/YAML/JSON/...)
  • ❌ Multi-level precedence (env vars > config file > defaults)
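
For flat settings, the missing precedence chain can be sketched with `collections.ChainMap` from the standard library. The three dictionaries below are illustrative stand-ins for values already collected from each source.

```python
from collections import ChainMap

defaults = {"uploads_dir": "uploads", "log_level": "INFO"}
from_file = {"log_level": "WARNING"}          # e.g. parsed from a config file
from_env = {"uploads_dir": "/data/uploads"}   # e.g. collected from XAGENT_* variables

# ChainMap searches its maps left to right, so earlier maps win.
settings = ChainMap(from_env, from_file, defaults)
```

Here `uploads_dir` comes from the environment layer and `log_level` from the file layer, with code defaults as the fallback.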

1.5 Root Cause Analysis

| Symptom | Root Cause | PR #247 Fixes It? |
| --- | --- | --- |
| Path inconsistency | Wrong configuration definition approach, not lack of centralization | ❌ Improvement only, not fundamental fix |
| No type checking | Lack of type-safe configuration definition | ❌ No type checking |
| No config file | Architecture doesn't support file loading | ❌ No file loading |
| Cannot nest | Function-based API is fundamentally flat | ❌ No nesting support |

Conclusion: PR #247 is an "improvement" but doesn't fundamentally solve the configuration management problem.

What's really needed: Transition from "function-based API" to "declarative configuration definition".



Part 2: Pain Points

2.1 Development Pain Points

  1. No Single Source of Truth

    • Finding where a configuration value is set requires grepping the entire codebase
    • Changing default behavior requires updating multiple files
  2. No Type Safety

    • os.getenv() returns Optional[str] with no validation
    • Type errors only discovered at runtime
  3. Poor Developer Experience

    • No autocomplete for configuration options
    • No inline documentation when accessing config values

2.2 Deployment Pain Points

  1. Environment Variable Overload

    • 15+ environment variables to manage
    • No configuration file support for complex deployments
  2. Container Deployment Complexity

    • Data scattered across multiple directories requires multiple bind-mounts
    • Path resolution depends on working directory
  3. Configuration Validation

    • Invalid configuration only discovered at runtime
    • No startup validation to catch misconfiguration early

2.3 Operational Pain Points

  1. Configuration Documentation

    • example.env is comprehensive but separated from code
    • No programmatic way to list all available configuration
  2. Migration Path

    • No clear path for evolving configuration over time
    • Risk of breaking existing deployments

Part 3: Requirements

3.1 Functional Requirements (Aligned with Project Goals)

FR-1: Centralized Configuration

Goal: Single source of truth

  • Req: Single module provides all configuration access
  • Why: Eliminates configuration scatter and resolves circular dependencies
  • Acceptance:
    • All configuration access goes through xagent.config module
    • No direct os.getenv() calls in business logic
    • Single import point: from xagent.config import get_uploads_dir, get_storage_root, ...
    • Configuration module has no internal xagent dependencies (NFR-1)

Related Issues: #243 (path inconsistency), #252 (naming confusion)


FR-2: Type Hints & Type Checking

Goal: IDE support and type safety

  • Req: Configuration values have type hints and are validated
  • Why: Current os.getenv() returns Optional[str] with no validation
  • Acceptance:
    • All configuration functions have proper type hints
    • Type errors caught at type-check time (mypy) and config load time
    • IDE autocomplete shows available configuration options
    • Return types are specific (e.g., Path, not str)
    • Generic type support for complex configuration

Example:

# Before (no type hints, no autocomplete)
uploads_dir = os.getenv("XAGENT_UPLOADS_DIR", "uploads")  # type: str (a raw string, not a validated Path)

# After (type hints, autocomplete)
def get_uploads_dir() -> Path:
    """Get the uploads directory path."""
    ...
# IDE shows: get_uploads_dir() -> Path

FR-3: Configuration File + Environment Variables

Goal: Multi-source configuration loading

  • Req: Support both configuration files and environment variables with precedence
  • Why:
    • Configuration files for complex deployments and documentation
    • Environment variables for containers/secrets/overrides
  • Acceptance:
    • Configuration file support: TOML (primary format)
    • Precedence: Environment variables > Config file > Code defaults
    • Multiple config file locations: project overrides user overrides system

Configuration File Locations (loaded in order, later overrides earlier):

  1. /etc/xagent/config.toml (system-wide, lowest precedence)
  2. ~/.config/xagent/config.toml (user-specific, medium precedence)
  3. ./config.toml (project-specific, highest precedence)

Example Config File:

# config.toml
[paths]
storage_root = "~/.xagent"
uploads_dir = "~/.xagent/uploads"
lancedb_path = "~/.xagent/lancedb"

[logging]
level = "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL

Environment Variable Override:

# Override any config value via environment variable
export XAGENT_PATHS__STORAGE_ROOT=/data
export XAGENT_LOGGING__LEVEL=DEBUG  # maps to [logging] level in config.toml

FR-4: Path Configuration Centralization

Goal: Resolve path inconsistency issues


FR-5: Sensitive Configuration Handling

Goal: Secure handling of keys and passwords

  • Req: Special handling for sensitive configuration values (API keys, passwords, secrets)
  • Why: Secrets should never appear in logs or error messages
  • Acceptance:
    • Automatic redaction of sensitive values in logs and error messages
    • Sensitive values only from environment variables (never from config files)
    • Clear documentation marks which values are sensitive
    • Optional: Integration with secret managers (future)
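
Automatic redaction can be approximated with a small wrapper type whose string forms are masked. This is a standard-library sketch with a hypothetical `Secret` class; pydantic ships a comparable `SecretStr` type, which may be the more natural choice if Approach B is adopted.

```python
class Secret:
    """Wraps a sensitive string so it is masked in repr() and str()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """Return the real value; call only where the secret is actually used."""
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


api_key = Secret("sk-example-not-a-real-key")
message = f"LLM provider configured with key {api_key}"
```

Logging `message` exposes only the mask, and the real value is reachable only through an explicit `reveal()` call.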

FR-6: Unified Environment Variable Naming

Goal: Consistent, predictable environment variable naming

  • Req: All environment variables follow XAGENT_* prefix convention with nested support via __
  • Why: Current env vars are inconsistent (LANCEDB_DIR vs XAGENT_UPLOADS_DIR), no unified prefix
  • Acceptance:
    • All new env vars use XAGENT_* prefix
    • Nested configuration uses __ separator (e.g., XAGENT_PATHS__STORAGE_ROOT)
    • This is a breaking change - old env var names will not work
    • Migration guide documents old → new name mappings

Industry Convention Verification:

| System | Nested Delimiter | Example |
| --- | --- | --- |
| pydantic-settings | __ (set via env_nested_delimiter) | export MY_APP__DATABASE__HOST=localhost |
| dynaconf | __ | export DYNACONF_NESTED__LEVEL__KEY=1 |
| Flask config | No native support | Uses flat class attributes |
| Django settings | No native support | Uses flat module-level variables |

Key Implementation Notes:

  • pydantic-settings: Nesting is enabled by setting env_nested_delimiter (unset by default; __ is the usual choice)
  • Environment variable names are conventionally SCREAMING_SNAKE_CASE
  • On Windows, environment variable names are case-insensitive, and Python's os.environ stores them upper-cased

Example Migration:

# Old (inconsistent, deprecated)
LANCEDB_DIR=~/.xagent/lancedb
UPLOADS_DIR=./uploads
XAGENT_UPLOADS_DIR=./uploads  # inconsistent with LANCEDB_*

# New (unified, recommended)
XAGENT_PATHS__STORAGE_ROOT=~/.xagent
XAGENT_PATHS__UPLOADS_DIR=~/.xagent/uploads
XAGENT_PATHS__LANCEDB_PATH=~/.xagent/lancedb
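
The prefix-plus-`__` rule in FR-6 can be sketched as a small parser that turns flat env var names into a nested mapping. `parse_env` is a hypothetical helper written for illustration; pydantic-settings performs a similar mapping when `env_prefix` and `env_nested_delimiter` are configured.

```python
from typing import Any, Dict, Mapping


def parse_env(env: Mapping[str, str], prefix: str = "XAGENT_", delimiter: str = "__") -> Dict[str, Any]:
    """Turn PREFIX_A__B=value entries into {"a": {"b": value}}."""
    result: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore variables that do not belong to xagent
        keys = name[len(prefix):].lower().split(delimiter)
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result


nested = parse_env({
    "XAGENT_PATHS__STORAGE_ROOT": "/data",
    "XAGENT_SANDBOX__IMAGE": "xprobe/xagent-sandbox:latest",
    "HOME": "/root",
})
```

The sketch does not handle a name that is both a leaf and a group (for example `XAGENT_PATHS` alongside `XAGENT_PATHS__STORAGE_ROOT`); a real implementation would need to reject or resolve that case.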

FR-7: Migration Path

Goal: Smooth transition for existing deployments

  • Req: Clear migration path from old env var names to new unified naming
  • Why: FR-6 introduces breaking changes (renaming all env vars)
  • Acceptance:
    • Transition period: old env var names supported with deprecation warnings
    • Migration guide with old → new mappings (e.g., LANCEDB_DIR → XAGENT_PATHS__LANCEDB_PATH)
    • Tooling to help identify deprecated configuration usage
    • Clear documentation of breaking changes and deprecation timeline
    • Note: This is not backward compatibility - it's a structured transition to new naming
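
The transition period could be implemented as a lookup that prefers the new name and accepts the old one with a `DeprecationWarning`. In this sketch the `RENAMED` table reuses the mappings listed in this proposal, and `read_with_fallback` is a hypothetical helper.

```python
import warnings
from typing import Mapping, Optional

# Old name -> new name, taken from the mappings listed in this proposal.
RENAMED = {
    "LANCEDB_DIR": "XAGENT_PATHS__LANCEDB_PATH",
    "UPLOADS_DIR": "XAGENT_PATHS__UPLOADS_DIR",
    "SANDBOX_IMAGE": "XAGENT_SANDBOX__IMAGE",
}


def read_with_fallback(new_name: str, env: Mapping[str, str]) -> Optional[str]:
    """Prefer the new name; accept a deprecated alias with a warning."""
    if new_name in env:
        return env[new_name]
    for old, new in RENAMED.items():
        if new == new_name and old in env:
            warnings.warn(
                f"{old} is deprecated; use {new_name} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return env[old]
    return None
```

Removing the fallback at the end of the deprecation timeline is then a matter of deleting the `RENAMED` table and the loop.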

FR-8: Modular Organization (Proposed - Needs Discussion)

Goal: Independent development and maintenance of configuration domains

  • Req: Configuration organized by domain/feature into independent modules
  • Why:
    • Multiple developers can work on different config areas without merge conflicts
    • Clear separation of concerns (paths, database, sandbox, LLM, etc.)
    • Easier to locate and modify specific configuration
    • Scalable for growing projects
  • Acceptance:
    • Configuration organized under xagent.config package
    • Each domain has its own config class (e.g., PathConfig, DatabaseConfig)
    • Access via namespace: from xagent.config import PathConfig, DatabaseConfig
    • Usage: paths = PathConfig(); print(paths.storage_root)
    • No circular dependencies between config modules

Example Structure:

xagent/config/
  __init__.py          # Aggregates all config modules
  paths.py             # PathConfig for storage, uploads, etc.
  database.py          # DatabaseConfig for DB connections
  sandbox.py           # SandboxConfig for container config
  llm.py               # LLMConfig for model providers

Usage:

# Option 1: Direct import
from xagent.config import PathConfig, DatabaseConfig

# Usage
paths = PathConfig()
database = DatabaseConfig()
storage_root = paths.storage_root

# Option 2: Namespace access
from xagent import config
storage_root = config.paths.storage_root

Not All Approaches Support This:

| Approach | Modular Support | Notes |
| --- | --- | --- |
| pydantic-settings | ✅✅✅ | Each module independent BaseSettings |
| Hydra | ✅✅ | Designed for config composition |
| dynaconf | | Supports multiple validators/sections |
| Flask style | ⚠️ | Only via inheritance, not true modular |
| Django style | ⚠️ | django-split-settings splits files, not namespaces |
| PR #247 | ⚠️ | Can have separate functions, no class organization |

3.2 Requirements Summary (Goal Alignment)

| Goal | Requirement(s) | Priority |
| --- | --- | --- |
| Centralized Configuration | FR-1 (Centralized Interface), FR-4 (Path Centralization) | P0 |
| Type Hints & Type Checking | FR-2 (Type Hints & Type Checking) | P0 |
| Multi-Source Loading | FR-3 (Multi-Source Loading) | P0 |
| Path Centralization | FR-4 (Path Configuration) | P0 |
| Unified Naming | FR-6 (Unified Environment Variable Naming) | P0 |
| Modular Organization | FR-8 (Modular Organization) | P0 |
| Security Handling | FR-5 (Sensitive Configuration) | P1 |
| Migration Path | FR-7 (Migration Path) | P0 |

Priority Legend:

  • P0 (Critical): Must have — blocks deployment without it
  • P1 (High): Should have — significant impact if missing
  • P2 (Future): Optional — deferred to future release

3.3 Non-Functional Requirements

NFR-1: No Circular Dependencies (P0)

  • Configuration module must not import from other xagent modules
  • Core modules need configuration without circular imports

NFR-2: Performance (P1)

  • Configuration loaded once at startup, cached in memory
  • Fast access (< 1μs per access)

NFR-3: Testability (P1)

  • Clear mechanism for overriding config in tests
  • Test fixtures for common scenarios

NFR-4: Error Handling (P1)

  • Clear error messages for invalid configuration
  • Early validation at startup (fail-fast)

NFR-5: Maintainability (P2)

  • Easy to add new configuration options
  • Clear patterns for defining new values

3.4 Non-Functional Requirements (Detailed)

NFR-1: No Circular Dependencies

  • Req: Configuration module must not import from other xagent modules
  • Why: Core modules need configuration without creating circular imports
  • Acceptance:
    • xagent.config has no internal xagent dependencies
    • Can be imported by any module without import cycles
    • Uses only standard library and external dependencies (pydantic, etc.)

NFR-2: Performance

  • Req: Configuration access must be fast with minimal overhead
  • Why: Configuration accessed frequently during execution
  • Acceptance:
    • Configuration loaded once at startup
    • In-memory caching (no repeated file I/O)
    • Fast attribute access (< 1μs per access)
    • Lazy loading for expensive resources (e.g., network connections)
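
Load-once semantics can be sketched with `functools.lru_cache`: the first call builds the config and every later call returns the same object. `Config` and `get_config` are hypothetical names, and the single `log_level` field is only an example.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Config:
    log_level: str


@lru_cache(maxsize=1)
def get_config() -> Config:
    """Build the config on first call; later calls return the cached object."""
    return Config(log_level=os.environ.get("XAGENT_LOG_LEVEL", "INFO"))


first = get_config()
second = get_config()
```

Calling `get_config.cache_clear()` forces a reload, which is also useful for resetting state between tests.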

NFR-3: Testability

  • Req: Configuration must be easily mockable/overrideable for testing
  • Why: Tests need to run with different configurations without side effects
  • Acceptance:
    • Clear mechanism for overriding config in tests
    • Test fixtures for common configuration scenarios
    • Isolated test configuration (no file system pollution)
    • Easy to reset configuration between tests
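
One possible override mechanism is `unittest.mock.patch.dict`, which restores `os.environ` when the block exits. In this sketch `current_log_level` is a hypothetical stand-in for a real config accessor.

```python
import os
from unittest import mock


def current_log_level() -> str:
    """Stand-in for a config accessor (hypothetical); reads the env on each call."""
    return os.environ.get("XAGENT_LOG_LEVEL", "INFO")


def test_log_level_override() -> None:
    # patch.dict restores os.environ on exit, so the override cannot leak.
    with mock.patch.dict(os.environ, {"XAGENT_LOG_LEVEL": "DEBUG"}):
        assert current_log_level() == "DEBUG"


test_log_level_override()
```

If the config object is cached at startup, the cache would also need to be cleared inside the patched block for the override to take effect.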

Part 4: Configuration Scope

Based on the analysis and goals, the centralized configuration system should cover:

4.1 Path Configuration (P0 - Critical)

  • XAGENT_STORAGE_ROOT - Root for all data (~/.xagent)
  • XAGENT_UPLOADS_DIR - User upload files
  • XAGENT_WEB_STATIC_DIR - Static web assets
  • XAGENT_LANCEDB_PATH - Vector database path
  • XAGENT_DATABASE_URL - Primary database connection
  • XAGENT_EXTERNAL_UPLOAD_DIRS - Additional upload directories
  • XAGENT_EXTERNAL_SKILLS_LIBRARY_DIRS - Additional skills directories

4.2 Service Configuration (P0 - Critical)

  • XAGENT_LOG_LEVEL - Logging verbosity (DEBUG/INFO/WARNING/ERROR/CRITICAL)
  • Database connection settings
  • Memory store configuration (LanceDB vs in-memory)

4.3 Sandbox Configuration (P1 - High)

  • SANDBOX_IMAGE - Container image
  • SANDBOX_CPUS, SANDBOX_MEMORY - Resource limits
  • SANDBOX_ENV, SANDBOX_VOLUMES - Mounts and environment

4.4 External Service Configuration (P1 - High)

  • LLM provider API keys and endpoints
  • Embedding provider settings
  • Search provider credentials

Note: These are typically stored as environment variables (secrets), not in config files


Part 5: Configuration System Options (Neutral Analysis)

This section provides a neutral comparison of common Python configuration management approaches, without assuming which is best for xagent.

5.1 Overview of Configuration Approaches

| Approach | Type | Dependencies | Complexity | Industry Adoption |
| --- | --- | --- | --- | --- |
| 1. pydantic-settings | Declarative, class-based | pydantic, pydantic-settings | Medium | High (FastAPI, modern apps) |
| 2. Flask Style | Class-based inheritance | None | Low | High (Flask ecosystem) |
| 3. Django Style | Module-based | None | Low-Medium | High (Django ecosystem) |
| 4. dynaconf | Multi-format loader | dynaconf | Medium | Medium |
| 5. Hydra | Config composition | hydra-core, omegaconf | High | Medium (ML/Research) |
| 6. Current PR #247 | Function-based | None | Low | N/A (xagent-specific) |

5.2 Detailed Analysis

Option 1: pydantic-settings

Description: Type-safe configuration using Pydantic models with validation.

Example:

from pathlib import Path
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import Field

class XagentConfig(BaseSettings):
    """Complete xagent configuration."""
    model_config = SettingsConfigDict(
        env_file=".env",
        env_nested_delimiter="__",  # XAGENT_PATHS__STORAGE_ROOT
        env_prefix="XAGENT_",  # Prepended to field names: storage_root -> XAGENT_STORAGE_ROOT
    )

    storage_root: Path = Path.home() / ".xagent"
    # Evaluated once at class definition; does not follow a storage_root override
    uploads_dir: Path = storage_root / "uploads"
    # Parentheses are required so ^ and $ anchor every alternative
    log_level: str = Field(default="INFO", pattern="^(DEBUG|INFO|WARNING|ERROR|CRITICAL)$")

config = XagentConfig()
# Access: config.storage_root, config.uploads_dir

Environment Variable Mapping:

# With env_prefix="XAGENT_" and env_nested_delimiter="__"
export XAGENT_STORAGE_ROOT=/data
export XAGENT_PATHS__STORAGE_ROOT=/data  # If using nested models
export XAGENT_LOG_LEVEL=DEBUG

Characteristics:

  • Type safety with automatic validation
  • Fail-fast at startup
  • IDE autocomplete support
  • Nested model support with __ delimiter
  • Multi-source loading (env, .env, JSON, CLI)

Trade-offs:

  • (+) Strong typing and validation
  • (+) Self-documenting
  • (+) Industry standard for modern Python
  • (+) Native __ nested delimiter support
  • (-) Additional dependency
  • (-) Learning curve
  • (-) Validation adds startup time

References: Pydantic Settings Documentation


Option 2: Flask Style Config

Description: Class-based configuration with inheritance for environment separation.

Example:

import os

class Config:
    DEBUG = False
    SECRET_KEY = 'default-key'
    DATABASE_URL = 'sqlite:///app.db'

class DevelopmentConfig(Config):
    DEBUG = True
    DATABASE_URL = 'sqlite:///dev.db'

class ProductionConfig(Config):
    DEBUG = False
    SECRET_KEY = os.getenv('SECRET_KEY')

# Usage
config = DevelopmentConfig

Characteristics:

  • Simple class attributes
  • Environment inheritance
  • No external dependencies
  • Environment-based separation

Trade-offs:

  • (+) Simple and intuitive
  • (+) Zero dependencies
  • (+) Easy environment separation
  • (+) Type hints supported (mypy compatible)
  • (-) No runtime validation
  • (-) Manual env var handling
  • (-) Config is Python code that can execute arbitrary logic

References: Flask Configuration


Option 3: Django Style Settings

Description: Single module with Python-based configuration, optionally split across files.

Example:

# settings.py
DEBUG = False
SECRET_KEY = 'your-secret-key'
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}

# Or using django-split-settings
from split_settings.tools import include, optional

include(
    'components/base.py',
    'components/database.py',
    optional('local_config.py'),
)

Characteristics:

  • Python module-based
  • Supports complex nested structures
  • File splitting for organization
  • Programmable configuration

Trade-offs:

  • (+) Flexible and powerful
  • (+) No dependencies (base)
  • (+) Supports complex structures
  • (+) Type hints supported (mypy compatible)
  • (-) No runtime validation
  • (-) Config is Python code that can execute arbitrary logic

References: django-split-settings


Option 4: dynaconf

Description: Multi-format configuration loader with remote config support.

Example:

from dynaconf import Dynaconf

settings = Dynaconf(
    settings_files=['settings.yaml', '.secrets.yaml'],
    environments=True,
    env='development'
)

# Access
settings.database_url
settings.get('database_url', 'default')

Characteristics:

  • Multi-format (YAML, TOML, JSON, INI, .env)
  • Environment separation (dev, staging, prod)
  • Remote config support (Redis, Vault)
  • Variable expansion

Trade-offs:

  • (+) Multiple format support
  • (+) Remote configuration
  • (+) Variable expansion
  • (-) Additional dependency
  • (-) No type-hint-based validation (validators must be declared separately)
  • (-) Learning curve

Option 5: Hydra

Description: Configuration composition framework for complex applications.

Example:

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(cfg.database.url)
    print(cfg.model.learning_rate)

Characteristics:

  • Powerful config composition
  • Experiment tracking
  • Command-line override
  • Multi-config file combination

Trade-offs:

  • (+) Excellent for complex configs
  • (+) Experiment tracking
  • (+) CLI overrides
  • (-) Steep learning curve
  • (-) Overkill for simple apps
  • (-) Heavy dependency

Option 6: Current PR #247 (Function-Based)

Description: Function-based API with hardcoded defaults and environment variable reading.

Example:

def get_uploads_dir() -> Path:
    env_dir = os.getenv("XAGENT_UPLOADS_DIR")
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "uploads"

def get_sandbox_cpus() -> Optional[int]:  # Problematic: caller must handle None
    env_str = os.getenv("SANDBOX_CPUS")
    if env_str:
        try:
            return int(env_str)
        except ValueError:
            logger.warning(f"Invalid SANDBOX_CPUS: {env_str}")
    return None  # Should have a default, not None

Characteristics:

  • Function-based access
  • Hardcoded defaults per function
  • Runtime validation (try/except)
  • No nested support
  • Problem: Optional[int] forces callers to handle None even when there's a meaningful default

Trade-offs:

  • (+) Simple, zero dependencies
  • (+) Type hints on return values
  • (+) Backward compatible
  • (-) Code repetition
  • (-) No config file support
  • (-) No nested configuration
  • (-) Defaults scattered

5.3 Comparison Matrix (Detailed)

Dimension pydantic-settings Flask Style Django Style dynaconf Hydra PR #247
Type Safety ✅✅✅ ⚠️ ⚠️
Validation ✅✅✅ ⚠️ ⚠️
Config Files ✅✅✅ ✅✅
Env Separation ✅✅✅ ✅✅✅
Nested Config ✅✅ ⚠️ ✅✅✅
Nested Env Var Delimiter ✅✅✅ ✅✅✅
Modular Organization ✅✅✅ ⚠️ ⚠️ ✅✅ ⚠️
Zero Deps
Learning Curve Medium Low Low Medium High Low
Industry Adoption High (growing) High (Flask) High (Django) Medium Medium (ML) N/A

Legend: ✅✅✅ Excellent | ✅ Good | ⚠️ Partial | ❌ No Support

Notes:

  • Nested Env Var Delimiter: Native support for mapping env vars like XAGENT_PATHS__STORAGE_ROOT to nested config config.paths.storage_root
    • __ (double underscore) is the de facto standard (pydantic-settings, dynaconf)
    • Hydra uses . via CLI, not env vars
    • Flask/Django require manual parsing

5.4 Recommendation Framework

When choosing an approach, consider:

  1. Team familiarity: What patterns does the team know?
  2. Existing code: PR #247 uses a function-based API
  3. Requirements priority: Which goals are most important?
  4. Migration cost: How much code needs to change?
  5. Dependency tolerance: Can we add new dependencies?

No approach is universally "best" — the choice depends on xagent's specific constraints and priorities.


Part 6: Proposed Solution Approaches

Based on the analysis above, three implementation approaches are considered for xagent:

6.1 Approach A: Enhanced PR #247 (Function-Based)

Technical Design

# xagent/config.py
import os
from pathlib import Path

# Simple function-based API with a shared validation helper
def validate_path(value: str) -> Path:
    """Validate and normalize a path configuration value."""
    path = Path(value).expanduser().resolve()
    # Validation logic here
    return path

def get_storage_root() -> Path:
    """Get the storage root directory."""
    env_value = os.getenv("XAGENT_STORAGE_ROOT")
    # The env value and the default go through the same validator
    return validate_path(env_value or "~/.xagent")

Pros

Cons

  • Limited Validation: Validation must be manually implemented
  • No Type Safety: Type errors only discovered at runtime
  • Scattered Logic: Each function has its own validation logic
  • Poor IDE Support: No autocomplete for configuration options
  • Limited Extensibility: Adding nested configuration is awkward

Risk Assessment

| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| Validation inconsistencies | Medium | Medium | Create validation utility functions |
| Type errors in production | High | Low | Comprehensive testing |
| Poor developer experience | Low | Low | Documentation |

Overall Risk Level: LOW


6.2 Approach B: Pydantic-Based Settings (Transformative)

Based on: Option 1 (pydantic-settings) from the analysis above

Technical Design

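# xagent/config.py
from pathlib import Path
from typing import Optional

from pydantic import Field, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict

class PathConfig(BaseSettings):
    """Path configuration with validation."""
    # Same prefix the parent produces, so PathConfig() also works standalone
    model_config = SettingsConfigDict(env_prefix="XAGENT_PATHS__")

    storage_root: Path = Path.home() / ".xagent"
    # Derived paths default to None and are resolved against storage_root below,
    # so overriding storage_root also moves them
    uploads_dir: Optional[Path] = None
    lancedb_path: Optional[Path] = None

    @model_validator(mode="after")
    def _derive_defaults(self) -> "PathConfig":
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        if self.lancedb_path is None:
            self.lancedb_path = self.storage_root / "lancedb"
        return self

class XagentConfig(BaseSettings):
    """Complete xagent configuration."""
    model_config = SettingsConfigDict(
        env_prefix="XAGENT_",
        env_file=".env",
        env_file_encoding="utf-8",
        env_nested_delimiter="__",  # XAGENT_PATHS__STORAGE_ROOT
    )

    paths: PathConfig = Field(default_factory=PathConfig)
    # The group is required: without it, ^ and $ bind only to the first and last alternative
    log_level: str = Field(default="INFO", pattern="^(DEBUG|INFO|WARNING|ERROR|CRITICAL)$")

# Usage
config = XagentConfig()
storage_root = config.paths.storage_root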

Pros

  • Runtime Validation: Automatic type coercion, range checks, pattern matching
  • Excellent Error Messages: Clear, actionable validation errors at startup
  • IDE Support: Autocomplete for all configuration options
  • Nested Configuration: Natural support for structured config
  • Industry Standard: Battle-tested in production systems
  • Extensibility: Easy to add new configuration options
  • Documentation: Self-documenting via Field descriptions
  • Type Safety: Type hints compatible with mypy (like Flask/Django styles)

Cons

  • High Risk: Requires significant code changes
  • Breaking Changes: Existing code must be updated
  • New Dependency: Adds pydantic-settings dependency
  • Learning Curve: Developers must learn Pydantic patterns
  • Migration Effort: All configuration access points need updating

Risk Assessment

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Breaking existing deployments | High | High | Careful migration plan, extensive testing |
| Performance regression | Low | Low | Pydantic is fast, caching possible |
| Developer learning curve | Medium | Low | Documentation, examples |
| Dependency issues | Low | Medium | Pin versions, test thoroughly |

Overall Risk Level: HIGH


6.3 Approach C: Hybrid (Balanced)

Technical Design

# xagent/config.py (internal - pydantic-based)
from pathlib import Path
from typing import Optional

from pydantic import model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict

class _InternalSettings(BaseSettings):
    """Internal pydantic-based config for validation."""
    # Reads XAGENT_STORAGE_ROOT and XAGENT_UPLOADS_DIR
    model_config = SettingsConfigDict(env_prefix="XAGENT_")

    storage_root: Path = Path.home() / ".xagent"
    # Resolved against storage_root below when not set explicitly
    uploads_dir: Optional[Path] = None

    @model_validator(mode="after")
    def _derive_uploads_dir(self) -> "_InternalSettings":
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        return self

# Singleton instance
_config = _InternalSettings()

# xagent/config.py (external - function-based for backward compatibility)
def get_storage_root() -> Path:
    """Get the storage root directory.

    Priority:
        1. XAGENT_STORAGE_ROOT environment variable
        2. Configuration file value
        3. Default: ~/.xagent
    """
    return _config.storage_root

def get_uploads_dir() -> Path:
    """Get the uploads directory.

    Priority:
        1. XAGENT_UPLOADS_DIR environment variable
        2. Configuration file value
        3. Default: {storage_root}/uploads
    """
    return _config.uploads_dir

# New object-based API (optional for gradual migration)
class _Paths:
    """Read-only view over the internal settings."""
    # Properties only resolve on instances, so `config.paths` must be an object
    storage_root = property(lambda self: _config.storage_root)
    uploads_dir = property(lambda self: _config.uploads_dir)

class _Config:
    """Object-based configuration access.

    Example:
        from xagent.config import config
        storage_root = config.paths.storage_root
    """
    paths = _Paths()

config = _Config()
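
The priority order described in the docstrings above (environment variable, then configuration file, then default) can be illustrated with the standard library alone. This is a minimal sketch, not how pydantic-settings resolves sources internally; the `resolve` helper and the `DEFAULTS` mapping are hypothetical names introduced only for this example.

```python
from collections import ChainMap
from typing import Mapping

# Hypothetical defaults, mirroring the default named in the docstrings above.
DEFAULTS = {"storage_root": "~/.xagent"}

def resolve(
    env_values: Mapping[str, str],
    file_values: Mapping[str, str],
    defaults: Mapping[str, str] = DEFAULTS,
) -> ChainMap:
    """Layer the sources so that earlier mappings win: env > file > defaults."""
    return ChainMap(dict(env_values), dict(file_values), dict(defaults))

settings = resolve(
    env_values={"storage_root": "/env/root"},
    file_values={"storage_root": "/file/root", "uploads_dir": "/file/uploads"},
)
# storage_root comes from the environment layer, uploads_dir from the file layer
```

A lookup walks the layers in order, so a value set in the environment shadows the same key in the file without modifying either source.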

Pros

  • Backward Compatible: Existing code continues to work
  • Type Safety: Internal pydantic validation
  • Gradual Migration: Can adopt object API over time
  • Low Risk: Function API maintains compatibility
  • Future-Proof: Foundation for full migration

Cons

  • Implementation Complexity: Maintains two APIs temporarily
  • Documentation Burden: Must document both styles
  • Potential Confusion: Two ways to access same data
  • Technical Debt: Temporary dual-API needs eventual cleanup

Risk Assessment

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| API confusion | Medium | Low | Clear documentation, examples |
| Maintenance burden | Low | Low | Plan for deprecation timeline |
| Performance overhead | Low | Low | Minimal overhead from indirection |

Overall Risk Level: MEDIUM


6.4 Configuration File Format Selection

| Format | Pros | Cons | Recommendation |
|---|---|---|---|
| TOML | ✅ Clean syntax for config; ✅ Better than INI for nesting; ✅ Growing Python support; ✅ Type preservation | ❌ Less common than YAML; ❌ Limited expression support | RECOMMENDED |
| YAML | ✅ Very popular in Python; ✅ Powerful expressions; ✅ Wide tool support | ❌ Complex syntax; ❌ Security concerns (unsafe load); ❌ Significant whitespace | SECONDARY |
| JSON5/JSONC | ✅ JSON with comments/trailing commas; ✅ Familiar to JS developers; ✅ Wide tool support | ❌ Less common in Python; ❌ No built-in support | OPTIONAL |
| INI | ✅ Simple, familiar; ✅ Built-in Python support | ❌ No nested structures; ❌ No type preservation | FALLBACK |
| JSON | ✅ Standard format; ✅ Easy to parse | ❌ No comments; ❌ Verbose syntax | NOT RECOMMENDED |

Recommendation: TOML (primary), with YAML as secondary, JSON5 as optional

Rationale:

  • TOML: Designed specifically for configuration files, with clean syntax, built-in read support in Python 3.11+ (tomllib), and growing ecosystem adoption (pyproject.toml, etc.)
  • YAML: Widely used and powerful, but has security concerns with yaml.unsafe_load(); untrusted input should go through yaml.safe_load()
  • JSON5/JSONC: Good for projects with JS/TS frontend, allows comments and trailing commas

JSON5/JSONC Note:

  • JSON5 = JSON extended with comments, trailing commas, unquoted keys, etc.
  • JSONC = JSON with Comments (used by VS Code, TypeScript Compiler, etc.)
  • Both require external libraries (e.g., json5, jsoncomment) in Python

6.5 Migration Path Analysis

Approach A (Enhanced PR #247): Migration Effort

  • New Code: ~500 lines (validation, config file loading)
  • Modified Files: ~10 files (add validation decorators)
  • Testing Effort: Medium (test validation logic)
  • Relative Effort: ⭐ Fastest - Minimal changes to existing code

Approach B (Pydantic-First): Migration Effort

  • New Code: ~1000 lines (config classes, migration)
  • Modified Files: ~50 files (all config access points)
  • Testing Effort: High (test all changed code paths)
  • Relative Effort: ⭐⭐⭐ Highest - Most comprehensive refactoring

Approach C (Hybrid): Migration Effort

  • New Code: ~800 lines (internal config, function wrappers)
  • Modified Files: ~20 files (initial migration)
  • Testing Effort: Medium-High (test both APIs)
  • Relative Effort: ⭐⭐ Medium - Balance between A and B

6.6 Decision Framework

Updated Requirements Context:

  • FR-6: Unified XAGENT_* naming with __ for nested config (breaking change required)
  • FR-8: Modular organization required for independent development
  • FR-7: Migration path with deprecation warnings (not backward compatibility)

Based on the analysis, the recommended approach should maximize:

  1. Unified Naming & Nested Env Var Support (P0 requirements)
  2. Modular Organization (FR-8, P0 requirement)
  3. Future Extensibility
  4. Reasonable Implementation Risk

Updated Decision Matrix:

| Criterion | Weight | A: Function | B: Pydantic | C: Hybrid |
|---|---|---|---|---|
| Unified Naming (__ separator) | 25% | 2 | 10 | 10 |
| Modular Organization | 25% | 3 | 10 | 10 |
| Type Safety & Validation | 25% | 4 | 10 | 8 |
| Future Extensibility | 15% | 3 | 10 | 7 |
| Implementation Risk | 10% | 10 | 4 | 7 |
| Weighted Score | 100% | 3.7 | 9.4 | 8.8 |

Winner: Approach B (Pydantic-First) with 9.4/10
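
The weighted totals can be recomputed from the weights and per-criterion scores in the matrix. The sketch below does that arithmetic; the dictionary keys are shorthand labels chosen for this example, and weights are kept as integer percentages to avoid floating-point drift.

```python
# Weights (in percent) and per-criterion scores copied from the decision matrix.
WEIGHTS = {
    "unified_naming": 25,
    "modular_organization": 25,
    "type_safety": 25,
    "extensibility": 15,
    "implementation_risk": 10,
}

SCORES = {
    "A": {"unified_naming": 2, "modular_organization": 3, "type_safety": 4,
          "extensibility": 3, "implementation_risk": 10},
    "B": {"unified_naming": 10, "modular_organization": 10, "type_safety": 10,
          "extensibility": 10, "implementation_risk": 4},
    "C": {"unified_naming": 10, "modular_organization": 10, "type_safety": 8,
          "extensibility": 7, "implementation_risk": 7},
}

def weighted_score(scores: dict[str, int]) -> float:
    """Sum weight * score over all criteria and scale back to a 0-10 range."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

totals = {approach: weighted_score(s) for approach, s in SCORES.items()}
winner = max(totals, key=totals.get)
# totals: A = 3.7, B = 9.4, C = 8.75
```

Because three of the five criteria carry 25% each and Approach B scores 10 on all of them, its lower implementation-risk score costs it only 0.6 points.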

Rationale:
With FR-6 requiring unified XAGENT_* naming with __ separator AND FR-8 requiring modular organization:

  • Approach A (Function-based): Cannot easily support __ separator or true modular organization
  • Approach B (Pydantic): Native support for env_nested_delimiter="__" AND independent BaseSettings classes per module
  • Approach C (Hybrid): Supports all requirements but adds unnecessary complexity since breaking changes are acceptable

Key Differentiator: Modular Organization
Only pydantic-settings and Hydra truly support independent config modules. Given xagent's requirements, pydantic-settings is the clear winner.


Part 7: Decision Framework

This section provides a framework for choosing between the three approaches, without prescribing which is "best" for xagent.

7.1 Decision Criteria

Criterion Approach A Approach B Approach C
If type safety is the highest priority ⚠️
If minimal dependencies is required
If fast delivery is important ⚠️
If long-term maintainability matters ⚠️
If team knows pydantic ⚠️
If risk tolerance is low ⚠️

Note: All approaches require breaking changes to implement FR-6 (unified XAGENT_* naming). Backward compatibility is not a differentiator.

7.2 Questions to Guide Decision

  1. Is xagent willing to add pydantic as a dependency?

    • Yes → Consider Approach B or C
    • No → Approach A only
  2. What is the team's familiarity with pydantic?

    • High → Approach B or C
    • Low → Approach A (or C with training)
  3. What is the team's capacity and urgency?

    • Need quick wins with minimal changes → Approach A
    • Have capacity for proper refactoring → Approach B or C
  4. What is the long-term vision for xagent?

    • Stay simple, minimal dependencies → Approach A
    • Modern Python best practices → Approach B or C

7.3 Hybrid Approach (C) Consideration

Approach C is often chosen when:

  • Backward compatibility is non-negotiable
  • Team wants to adopt pydantic incrementally
  • Risk tolerance is medium
  • Long-term vision includes full pydantic migration

Trade-off: Higher initial complexity (two APIs) for smoother migration path.

7.4 Example Decision Outcomes

| Scenario | Recommended Approach |
|---|---|
| Startup phase, moving fast, minimal deps | A |
| Established product, breaking changes acceptable | B |
| Large team, gradual migration required | C |
| New project, no legacy code | B |

For xagent: Approach B is recommended because:

  • Breaking changes are required for unified XAGENT_* naming (FR-6)
  • Modular organization is required (FR-8)
  • pydantic-settings provides native support for __ nested delimiter and independent config modules

Part 8: Implementation Considerations

8.1 Configuration File Format

See the detailed analysis in Section 6.4 for configuration file format options, including TOML (recommended), YAML (secondary), and JSON5/JSONC (optional).

8.2 Configuration File Locations

For a multi-user web service like xagent, configuration file locations differ from desktop applications:

Options:

  • ./config.toml — Project-specific configuration (for development)
  • /etc/xagent/config.toml — System-wide configuration (for production deployments)
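
If both locations are supported, the loader needs a lookup order. The sketch below assumes the project-local file is checked first and the system-wide file second; that order is an assumption made for illustration, not a decision of this proposal, and `find_config_file` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations from the options above. The search order is an assumption.
DEFAULT_CANDIDATES = (Path("config.toml"), Path("/etc/xagent/config.toml"))

def find_config_file(
    candidates: Iterable[Path] = DEFAULT_CANDIDATES,
) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` rather than raising lets the caller fall back to environment variables and defaults when no file is present, which is the common case in container deployments.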

8.3 Environment Variable Naming Migration

Current State (Inconsistent):

| Old Name | Notes |
|---|---|
| LANCEDB_DIR | No prefix |
| XAGENT_UPLOADS_DIR | Has prefix, inconsistent with LANCEDB_* |
| UPLOADS_DIR | No prefix, conflicts with above |
| SANDBOX_IMAGE | No prefix |
| OPENAI_API_KEY | Third-party, no prefix |

Target State (Unified):

| New Name | Old Name(s) | Deprecation |
|---|---|---|
| XAGENT_PATHS__STORAGE_ROOT | N/A (new) | - |
| XAGENT_PATHS__UPLOADS_DIR | XAGENT_UPLOADS_DIR, UPLOADS_DIR | ⚠️ Warning |
| XAGENT_PATHS__LANCEDB_PATH | LANCEDB_DIR, LANCEDB_PATH | ⚠️ Warning |
| XAGENT_PATHS__DATABASE_URL | DATABASE_URL | ⚠️ Warning |
| XAGENT_SANDBOX__IMAGE | SANDBOX_IMAGE | ⚠️ Warning |
| XAGENT_SANDBOX__CPUS | SANDBOX_CPUS | ⚠️ Warning |
| OPENAI_API_KEY | OPENAI_API_KEY | ✅ Keep (external) |

Implementation Notes:

  • The __ delimiter is an industry standard used by both pydantic-settings and dynaconf
  • Old env var names should be supported with deprecation warnings for at least 2 minor releases

Implementation Strategy:

  1. Define new XAGENT_* env vars with pydantic-settings
  2. Support old env var names with deprecation warnings (using aliases or custom validators)
  3. Document migration path clearly
  4. Provide migration tooling (optional)
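
Step 2 could be handled by a small shim that runs before settings are loaded. The following is a minimal standard-library sketch under stated assumptions: the old and new names come from the Target State table, while `apply_deprecated_env_vars` is a hypothetical helper and the rule "the new name wins when both are set" is an assumption, not an agreed policy.

```python
import os
import warnings
from typing import Mapping, Optional

# Old name -> new name, taken from the Target State table.
DEPRECATED_ENV_VARS = {
    "XAGENT_UPLOADS_DIR": "XAGENT_PATHS__UPLOADS_DIR",
    "UPLOADS_DIR": "XAGENT_PATHS__UPLOADS_DIR",
    "LANCEDB_DIR": "XAGENT_PATHS__LANCEDB_PATH",
    "LANCEDB_PATH": "XAGENT_PATHS__LANCEDB_PATH",
    "DATABASE_URL": "XAGENT_PATHS__DATABASE_URL",
    "SANDBOX_IMAGE": "XAGENT_SANDBOX__IMAGE",
    "SANDBOX_CPUS": "XAGENT_SANDBOX__CPUS",
}

def apply_deprecated_env_vars(
    environ: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with old names mapped to new names."""
    env = dict(os.environ if environ is None else environ)
    for old, new in DEPRECATED_ENV_VARS.items():
        if old not in env:
            continue
        warnings.warn(
            f"{old} is deprecated; use {new} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicitly set new name takes precedence over the old one
        env.setdefault(new, env[old])
    return env
```

Working on a copy keeps the shim free of side effects on `os.environ`, which makes it straightforward to unit test and to remove once the deprecation period ends.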

8.4 Module Organization

Proposed Structure:

src/xagent/config/
  __init__.py          # Aggregates all config modules, exports `paths`, `database`, etc.
  paths.py             # PathConfig class
  database.py          # DatabaseConfig class
  sandbox.py           # SandboxConfig class
  llm.py               # LLMConfig class
  ...

Import Style:

# Import classes
from xagent.config import PathConfig, DatabaseConfig
paths = PathConfig()
database = DatabaseConfig()

Migration Path:

  1. Create new src/xagent/config/ package (replacing src/xagent/core/config.py)
  2. Update all imports incrementally
  3. Remove old core/config.py after migration complete

8.5 Nested Delimiter Convention Research

Research Summary:

The __ (double underscore) convention for nested environment variables is verified as an industry standard:

| System | Delimiter | Verification |
|---|---|---|
| pydantic-settings | __ | Supports env_nested_delimiter (no delimiter is set by default; __ is the value used in the official docs) |
| dynaconf | __ | Official docs show export DYNACONF_NESTED__LEVEL__KEY=1 |
| Flask config | N/A | No native nested delimiter (flat structure) |
| Django settings | N/A | No native nested delimiter (flat structure) |

Key Finding: Both pydantic-settings and dynaconf use __ as the nested delimiter, making it a de facto standard for Python configuration management with environment variables.
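
To make the convention concrete, the sketch below shows how a name such as XAGENT_PATHS__STORAGE_ROOT maps onto a nested structure. It is a standard-library illustration of the idea only, not how pydantic-settings or dynaconf implement it, and `nest_env_vars` is a hypothetical helper name.

```python
from typing import Any, Mapping

def nest_env_vars(
    environ: Mapping[str, str],
    prefix: str = "XAGENT_",
    delimiter: str = "__",
) -> dict[str, Any]:
    """Turn PREFIX_A__B=value entries into {"a": {"b": value}}."""
    result: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # variables without the prefix are ignored
        keys = name[len(prefix):].lower().split(delimiter)
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result

nested = nest_env_vars({
    "XAGENT_PATHS__STORAGE_ROOT": "/data",
    "XAGENT_SANDBOX__IMAGE": "xagent/sandbox",
    "XAGENT_LOG_LEVEL": "INFO",
    "HOME": "/root",
})
```

A single underscore stays part of the key (`storage_root`, `log_level`), which is why a double underscore is needed to mark a nesting boundary. The sketch does not handle a name that is used both as a leaf and as a group.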


8.6 Implementation Phases

Phase 1: Foundation

Goal: Create core configuration infrastructure with backward compatibility

Estimated Effort: ~30% of total effort

Deliverables:

  1. Create xagent/config module with:
    • Internal pydantic config classes
    • Function-based wrapper API for backward compatibility
    • Configuration file loading (TOML primary, YAML/JSON5 optional)
  2. Add pydantic-settings to dependencies
  3. Create comprehensive test suite for config module
  4. Update example.env with new configuration options

Acceptance Criteria:

  • All existing tests pass with new config module
  • New config module tests have >90% coverage
  • Function API works identically to current implementation
  • Configuration file loading functional

Phase 3: Path Migration

Goal: Migrate all path configuration to use new system

Estimated Effort: ~40% of total effort

Deliverables:

  1. Update core modules to use new config:
    • src/xagent/core/workspace.py
    • src/xagent/core/agent/service.py
    • src/xagent/core/storage/manager.py
  2. Update web modules to use new config:
    • src/xagent/web/api/files.py
    • src/xagent/web/api/websocket.py
    • src/xagent/web/sandbox_manager.py
  3. Update skills utilities to use new config
  4. Update migration scripts

Acceptance Criteria:

  • All 20+ files with hardcoded paths updated
  • No hardcoded path strings remain (except tests)
  • All path tests pass
  • Manual testing of file upload, workspace, sandbox features

Phase 4: Documentation & Tooling

Goal: Improve developer experience and documentation

Deliverables:

  1. Add CLI command: xagent config --list (to view all configuration)
  2. Add configuration documentation generation
  3. Update developer documentation
  4. Add migration guide for existing deployments
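
One possible shape for the `xagent config --list` command in deliverable 1 is sketched below with `argparse`. It is only an assumption about how the command might be wired: `flatten`, `build_parser`, and `run` are hypothetical helpers, and the settings are passed in as a plain nested dict rather than read from the real configuration objects.

```python
import argparse
from typing import Any

def flatten(settings: dict[str, Any], parent: str = "") -> dict[str, Any]:
    """Flatten nested settings into dotted keys, e.g. paths.storage_root."""
    flat: dict[str, Any] = {}
    for key, value in settings.items():
        dotted = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="xagent")
    subcommands = parser.add_subparsers(dest="command", required=True)
    config_cmd = subcommands.add_parser("config", help="inspect configuration")
    config_cmd.add_argument("--list", action="store_true", help="print all settings")
    return parser

def run(argv: list[str], settings: dict[str, Any]) -> list[str]:
    """Return the lines the command would print for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.command == "config" and args.list:
        return [f"{key} = {value}" for key, value in sorted(flatten(settings).items())]
    return []

lines = run(
    ["config", "--list"],
    {"paths": {"storage_root": "/data"}, "log_level": "INFO"},
)
```

A real command would likely also need to mask secret values before printing them; that is left out of this sketch.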

Acceptance Criteria:

  • CLI command functional
  • Auto-generated configuration docs available
  • Migration guide published
  • Developer documentation complete

Phase 5: Cleanup & Deprecation (Ongoing)

Goal: Complete migration and remove legacy code

Deliverables:

  1. Add deprecation warnings to old env var names
  2. Document old → new env var mappings
  3. Establish deprecation policy for removing old env var support
  4. Monitor usage of deprecated env vars

Acceptance Criteria:

  • Deprecation warnings in place
  • Migration guide published
  • Deprecation policy documented
  • Communication plan for breaking changes

8.7 Critical Success Factors

  1. Breaking Changes Well-Communicated: Clear migration guide and deprecation policy
  2. Comprehensive Testing: Test coverage >90% for all new code
  3. Clear Documentation: All changes well-documented
  4. Gradual Migration: Support old env vars with warnings during transition

Part 9: Open Questions

This section lists open questions that need discussion/decision:

  1. Configuration File Location (DECISION NEEDED):

    Note: xagent is a multi-user web service (similar to Django/Rails), not a desktop application.

    Proposed options:

    • ./config.toml — Project-specific configuration (for development)
    • /etc/xagent/config.toml — System-wide configuration (for production deployments)

    User data remains at ~/.xagent/ (per-user storage, already isolated by user_id in uploads/).

  2. Secret Management: Should the configuration system handle secrets separately?

    • Consider integration with secret managers (HashiCorp Vault, etc.)
    • Or keep secrets as environment variables only (current approach)?
  3. Configuration Profiles: Should we support named configuration profiles?

    • xagent --profile development
    • Useful for different deployment environments (dev/staging/prod)
  4. TOML vs YAML vs JSON5: Which configuration file format to prioritize?

    • TOML is simpler but less expressive
    • YAML is more powerful but more complex
    • JSON5/JSONC offers JS-friendly syntax with comments
  5. Breaking Changes Communication: How to communicate breaking changes to existing users?

    • Migration guide in documentation
    • Release notes with clear upgrade path
    • Deprecation period length (e.g., 2 releases)
  6. Modular Organization (FR-8): Should configuration be organized as independent modules?

    • Proposal: Structure as xagent/config/ with separate files per domain (paths, database, sandbox, etc.)
    • Benefit: Multiple developers can work on different config areas without merge conflicts
    • Trade-off: More files/boilerplate vs. monolithic single-class approach
    • Note: Not all configuration systems support this well (pydantic-settings ✅, Flask/Django ⚠️)

Part 10: Success Criteria

Regardless of which approach is chosen, success means:

  • All environment variables follow XAGENT_* prefix convention with __ for nesting
  • Old env var names supported with deprecation warnings during transition period
  • Configuration is centralized (single source of truth)
  • Type hints are available for all configuration
  • Configuration file support is functional (TOML primary format)
  • All 20+ files with hardcoded paths are updated
  • Test coverage > 90% for new config code
  • Migration guide documents old → new env var name mappings
  • Breaking changes are clearly documented

Appendix: References

A.1 Configuration Management Resources

A.2 Existing xagent Configuration

  • src/xagent/config.py - Path configuration (PR #247, "feat: unified configuration module for all path-related settings", in the feat/unified-configuration-module branch)
  • src/xagent/web/config.py - Web-specific configuration
  • src/xagent/web/auth_config.py - Authentication configuration
  • src/xagent/core/observability/langfuse_config.py - Langfuse config (Pydantic example)

A.3 Related Issues

