# Proposal: Centralized Configuration System for Xagent

**Status:** Draft
**Author:** @tanbro
**Created:** 2026-04-03
**Related Issues:** #243, #246, #252
**Related PRs:** #235, #247

---

## TL;DR

**Problem:** Xagent's configuration is **scattered across 20+ files** with inconsistent environment variable naming (`LANCEDB_DIR` vs `XAGENT_UPLOADS_DIR`), hardcoded paths, and no type safety. PR #247 improves this but doesn't go far enough.

**Solution:** A **centralized** configuration system with:
- ✅ **Single entry point** - `from xagent.config import PathConfig, DatabaseConfig, SandboxConfig`
- ✅ **Type-safe** - IDE autocomplete, validation at startup
- ✅ **Unified naming** - All env vars use `XAGENT_*` prefix with `__` for nesting
- ✅ **Config file + Environment variables** - TOML for complex deployments, env vars for containers/secrets
- ✅ **Internally modular** - Independent modules per domain, still centrally managed

**Breaking Changes:** All env var names will change to follow `XAGENT_*` prefix convention. A transition period with deprecation warnings will be provided.

**Key Decision Points Requiring Community Discussion:**
1. 🤔 **Internal organization**: Single monolithic class vs. modular `xagent/config/` package?
2. 🤔 **Config file location**: `./config.toml` (project) vs `/etc/xagent/config.toml` (system)?
3. 🤔 **Config file format**: TOML (recommended) vs YAML vs JSON5?

**Recommended:** **Approach B (Pydantic-First)** - Score 9.4/10. Best balance of type safety, unified naming, and modular organization support.

**Migration Effort:** Medium-High (more than incremental fixes, but solid foundation for future growth)

---

## Executive Summary

Xagent currently suffers from a **fragmented configuration architecture** that causes path inconsistencies, environment variable handling problems, and deployment complexity. This proposal analyzes the current state, identifies pain points, and defines a centralized configuration system.

### Proposal Objectives

1. **Single source of truth for all configuration**
 - Eliminate scattered `os.getenv()` calls across the codebase
 - Resolve circular dependency issues (core vs web modules)
 - Make configuration changes maintainable

2. **IDE support and type safety**
 - Enable autocomplete for configuration options
 - Catch misuse of configuration values at type-check time and invalid values at startup, not deep into runtime
 - Provide clear type information for each configuration value

3. **Configuration file + Environment variables**
 - Support TOML configuration files for complex deployments
 - Environment variables for container/secrets (with precedence)

4. **Unified environment variable naming**
 - All env vars follow `XAGENT_*` prefix convention
 - Nested config uses `__` separator (e.g., `XAGENT_PATHS__STORAGE_ROOT`)
 - Deprecation warnings for old naming schemes

5. **Modular organization** (proposed for discussion)
 - Configuration organized by domain (paths, database, sandbox, etc.)
 - Independent development - no merge conflicts on single config file
 - Clear separation of concerns with namespace access (e.g., `config.paths.storage_root`)

---

## Part 1: Problem Analysis

### 1.1 Current State Assessment

The Xagent codebase manages configuration through **multiple disconnected mechanisms**:

#### Mechanism 1: Scattered Environment Variables
- Configuration scattered across 20+ files using `os.getenv()` calls
- No central documentation of available configuration options
- No validation of configuration values

```python
# Example from src/xagent/core/workspace.py:50
base_dir: str = "uploads" # Hardcoded, ignores XAGENT_UPLOADS_DIR

# Example from src/xagent/web/api/files.py
UPLOADS_DIR = Path(os.getenv("XAGENT_UPLOADS_DIR", "uploads"))
```

#### Mechanism 2: Database-Backed Settings
- `system_settings` table exists but is underutilized
- Primarily used for internal flags rather than user-facing configuration

#### Mechanism 3: Module-Level Constants
- Individual modules define their own configuration constants
- Leads to duplication and inconsistency

```python
# src/xagent/providers/vector_store/lancedb.py:66
def get_default_lancedb_dir():
    # Hardcoded path computation duplicated from config
    return Path(os.getenv("LANCEDB_DIR", "~/.xagent/data/lancedb")).expanduser()
```

### 1.2 Symptom Analysis

| Symptom | Evidence | Impact |
|---------|----------|--------|
| **Path Inconsistency** | Issue #243: 6+ files use hardcoded `"uploads"` | `XAGENT_UPLOADS_DIR` ignored in core modules |
| **Naming Confusion** | Issue #252: `LANCEDB_DIR` vs `LANCEDB_PATH` | Unclear which env var to use |
| **Data Scattered** | Issue #246: Data in `./data`, `~/.xagent`, `src/xagent/web/` | Multiple bind-mounts needed for containers |
| **Portability Issues** | PR #235: Absolute paths in database | Breaks when changing `XAGENT_UPLOADS_DIR` |
| **Configuration Drift** | PR #247: 20+ files need updating for config changes | High maintenance cost |

### 1.3 Dependency Issues

**Circular Dependency Problem:**
```
src/xagent/web/config.py → Defines UPLOADS_DIR
src/xagent/core/workspace.py → Needs uploads directory but CANNOT import from web/
```

This forces core modules to hardcode paths, creating the inconsistency problem.

---

### 1.4 Why PR #247 is Not Enough

**Status**: PR #247 lives on the `feat/unified-configuration-module` branch and is not yet merged to main.

**What PR #247 Does**:
- Creates `src/xagent/core/config.py` with 15+ configuration functions
- Provides centralized access to path and sandbox configuration
- Updates 20+ files to use the new functions
- **Maintains complete backward compatibility** - no behavior or default value changes, even when there are obvious problems

**What PR #247 Does NOT Solve**:

#### Problem 1: Breaking Changes Required for Unified Naming

Unified `XAGENT_*` env var naming (FR-6) is fundamentally incompatible with existing env var names:
- `LANCEDB_DIR` → `XAGENT_PATHS__LANCEDB_PATH`
- `UPLOADS_DIR` → `XAGENT_PATHS__UPLOADS_DIR`
- `SANDBOX_IMAGE` → `XAGENT_SANDBOX__IMAGE`

Any solution that properly implements FR-6 will break existing deployments. The question is not whether to break compatibility, but how to manage the transition.

#### Problem 2: Hardcoded Variable Names and Repeated Reader Functions

```python
# PR #247 implementation
UPLOADS_DIR = "XAGENT_UPLOADS_DIR"  # Hardcoded variable name
WEB_STATIC_DIR = "XAGENT_WEB_STATIC_DIR"  # Hardcoded variable name
# ... 15+ hardcoded constants

def get_uploads_dir() -> Path:
    env_dir = os.getenv(UPLOADS_DIR)  # Repeated pattern
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "uploads"  # Hardcoded default value

def get_web_static_dir() -> Path:
    env_dir = os.getenv(WEB_STATIC_DIR)  # Same repeated pattern
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "static"
```

**Problems**:
- Each configuration item requires a new function, repeating the same pattern
- Adding new config = adding new function = high maintenance cost
- Environment variable names, default values, and validation logic are scattered
- **Cannot see all configuration items and their defaults at a glance**

#### Problem 2b: Hardcoded Default Values and Validation Rules

```python
# Default values hardcoded in function body, inconsistent rules
def get_uploads_dir() -> Path:
    return web_dir / "uploads"  # Relative to web directory

def get_lancedb_path() -> Path:
    # Comment admits: "Default to ./data/lancedb, which is **relative** to cwd"
    return Path("data/lancedb")  # Relative to CWD!

def get_database_url() -> str:
    db_path = get_default_sqlite_db_path()  # Relative to home
    return f"sqlite:///{db_path}"
```

**Problems**:
- Default values are scattered, cannot be viewed centrally
- Path rules are inconsistent (source-relative vs cwd-relative vs home-relative)
- Validation logic `if env_var: return env_var` is repeated 15+ times
- **Cannot express dependencies between configuration items**
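
To make "dependencies between configuration items" concrete, here is a minimal standard-library sketch in which child paths follow an overridden root unless they are set explicitly. The class name `PathSettings` and its fields are illustrative assumptions, not code from PR #247 or from xagent.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PathSettings:
    """Hypothetical path settings whose children derive from storage_root."""

    storage_root: Path = Path("~/.xagent")
    uploads_dir: Optional[Path] = None   # None means "derive from storage_root"
    lancedb_path: Optional[Path] = None  # None means "derive from storage_root"

    def __post_init__(self) -> None:
        self.storage_root = Path(self.storage_root).expanduser()
        # Fill in unset children from the (possibly overridden) root.
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        if self.lancedb_path is None:
            self.lancedb_path = self.storage_root / "lancedb"


paths = PathSettings(storage_root=Path("/data"))
print(paths.uploads_dir)  # follows the overridden root
```

With per-function defaults, each getter would have to re-implement this relationship; declaring it once keeps the rule in a single place.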

#### Problem 3: No Support for Nested Configuration

```python
# PR #247 only supports flat configuration
get_storage_root() # → Path
get_uploads_dir() # → Path
get_sandbox_cpus() # → Optional[int] # Problem: caller must handle None
get_sandbox_memory() # → Optional[int] # Problem: caller must handle None
get_sandbox_env() # → dict
get_sandbox_volumes() # → list[tuple]
```

**Cannot Express**:
```toml
# Desired structured configuration
[paths]
storage_root = "~/.xagent"

[sandbox]
image = "xprobe/xagent-sandbox:latest"
[sandbox.resources]
cpus = 2
memory = "4g"
```

**Missing Capabilities**:
- ❌ Dependencies/references between configuration items
- ❌ Namespace grouping (`sandbox.*`)
- ❌ Structured configuration for lists/dicts

#### Problem 4: Weak Type Checking

```python
# Type errors only discovered at runtime
def get_sandbox_cpus() -> Optional[int]:
    env_str = os.getenv(SANDBOX_CPUS)
    if env_str:
        try:
            return int(env_str)  # Runtime conversion
        except ValueError:
            logger.warning(f"Invalid {SANDBOX_CPUS} value: {env_str}")
    return None  # Caller must handle None!
```

**Problems**:
- Type errors only discovered at runtime
- Caller must handle `None` even though there's a meaningful default
- Cannot enforce ranges (cpus must be > 0)
- IDE has no knowledge of default values and valid value ranges
- **Cannot validate at config load time, only at runtime**
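
For contrast, the sketch below shows load-time validation with a real default and a range check, using only the standard library. The env var name `XAGENT_SANDBOX__CPUS`, the default of 2, and the names `ConfigError`, `read_positive_int`, and `load_sandbox_limits` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


class ConfigError(ValueError):
    """Raised at load time when a configuration value is invalid."""


def read_positive_int(environ: Mapping[str, str], name: str, default: int) -> int:
    """Return environ[name] as an int > 0, or `default` when unset."""
    raw = environ.get(name)
    if raw is None or raw == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ConfigError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ConfigError(f"{name} must be > 0, got {value}")
    return value


@dataclass(frozen=True)
class SandboxLimits:
    cpus: int  # never None: callers get a usable value or a load-time error


def load_sandbox_limits(environ: Mapping[str, str]) -> SandboxLimits:
    # The env var name and the default of 2 are illustrative assumptions.
    return SandboxLimits(
        cpus=read_positive_int(environ, "XAGENT_SANDBOX__CPUS", default=2)
    )
```

Calling `load_sandbox_limits(os.environ)` once at startup would surface a bad value immediately instead of logging a warning and returning `None` later.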

#### Problem 5: No Configuration File Support

```python
# Only supports environment variables, no configuration file support
def get_uploads_dir() -> Path:
    env_dir = os.getenv(UPLOADS_DIR)  # Only source
    if env_dir:
        return Path(env_dir)
    return web_dir / "uploads"
```

**Missing**:
- ❌ Configuration file support (TOML/YAML/JSON/...)
- ❌ Multi-level precedence (env vars > config file > defaults)

---

### 1.5 Root Cause Analysis

| Symptom | Root Cause | PR #247 Fixes It? |
|---------|-----------|-------------------|
| Path inconsistency | Wrong configuration definition approach, not lack of centralization | ❌ Improvement only, not fundamental fix |
| No type checking | Lack of type-safe configuration definition | ❌ No type checking |
| No config file | Architecture doesn't support file loading | ❌ No file loading |
| Cannot nest | Function-based API is fundamentally flat | ❌ No nesting support |

**Conclusion**: PR #247 is an "improvement" but doesn't fundamentally solve the configuration management problem.

**What's really needed**: Transition from "function-based API" to "declarative configuration definition".
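
As a rough illustration of what "declarative" means here, the sketch below declares each item once in a table and resolves all of them through one shared code path. It uses only the standard library; `Setting`, `SETTINGS`, `load`, the `XAGENT_SANDBOX__CPUS` name, and the default of 2 are hypothetical. With pydantic-settings, class attributes would play the role of this table.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, Mapping


@dataclass(frozen=True)
class Setting:
    """One declared configuration item: name, env var, default, parser."""

    name: str
    env: str
    default: Any
    parse: Callable[[str], Any]


# Every item, its env var, and its default are visible in one table.
# Names and defaults are illustrative, not taken from the codebase.
SETTINGS = (
    Setting("storage_root", "XAGENT_PATHS__STORAGE_ROOT", Path("~/.xagent"), Path),
    Setting("sandbox_image", "XAGENT_SANDBOX__IMAGE", "xprobe/xagent-sandbox:latest", str),
    Setting("sandbox_cpus", "XAGENT_SANDBOX__CPUS", 2, int),
)


def load(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Resolve every declared setting with one shared code path."""
    resolved: Dict[str, Any] = {}
    for setting in SETTINGS:
        raw = environ.get(setting.env)
        resolved[setting.name] = setting.parse(raw) if raw is not None else setting.default
    return resolved
```

Adding a configuration item becomes one new row rather than one new function.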

---


## Part 2: Pain Points

### 2.1 Development Pain Points

1. **No Single Source of Truth**
 - Finding where a configuration value is set requires grepping the entire codebase
 - Changing default behavior requires updating multiple files

2. **No Type Safety**
 - `os.getenv()` returns `Optional[str]` with no validation
 - Type errors only discovered at runtime

3. **Poor Developer Experience**
 - No autocomplete for configuration options
 - No inline documentation when accessing config values

### 2.2 Deployment Pain Points

1. **Environment Variable Overload**
 - 15+ environment variables to manage
 - No configuration file support for complex deployments

2. **Container Deployment Complexity**
 - Data scattered across multiple directories requires multiple bind-mounts
 - Path resolution depends on working directory

3. **Configuration Validation**
 - Invalid configuration only discovered at runtime
 - No startup validation to catch misconfiguration early

### 2.3 Operational Pain Points

1. **Configuration Documentation**
 - `example.env` is comprehensive but separated from code
 - No programmatic way to list all available configuration

2. **Migration Path**
 - No clear path for evolving configuration over time
 - Risk of breaking existing deployments

---

## Part 3: Requirements

### 3.1 Functional Requirements (Aligned with Project Goals)

#### FR-1: Centralized Configuration

**Goal**: Single source of truth

- **Req**: Single module provides all configuration access
- **Why**: Eliminates configuration scatter and resolves circular dependencies
- **Acceptance**:
 - All configuration access goes through `xagent.config` module
 - No direct `os.getenv()` calls in business logic
 - Single import point: `from xagent.config import get_uploads_dir, get_storage_root, ...`
 - Configuration module has no internal xagent dependencies (NFR-1)

**Related Issues**: #243 (path inconsistency), #252 (naming confusion)

---

#### FR-2: Type Hints & Type Checking

**Goal**: IDE support and type safety

- **Req**: Configuration values have type hints and are validated
- **Why**: Current `os.getenv()` returns `Optional[str]` with no validation
- **Acceptance**:
 - All configuration functions have proper type hints
 - Type errors caught at type-check time (mypy) and config load time
 - IDE autocomplete shows available configuration options
 - Return types are specific (e.g., `Path`, not `str`)
 - Generic type support for complex configuration

**Example**:
```python
# Before (plain string, no validation, no autocomplete)
uploads_dir = os.getenv("XAGENT_UPLOADS_DIR", "uploads")  # type: str

# After (type hints, autocomplete)
def get_uploads_dir() -> Path:
    """Get the uploads directory path."""
    ...
# IDE shows: get_uploads_dir() -> Path
```

---

#### FR-3: Configuration File + Environment Variables

**Goal**: Multi-source configuration loading

- **Req**: Support both configuration files and environment variables with precedence
- **Why**:
 - Configuration files for complex deployments and documentation
 - Environment variables for containers/secrets/overrides
- **Acceptance**:
 - Configuration file support: **TOML** (primary format)
 - Precedence: Environment variables > Config file > Code defaults
 - Multiple config file locations: project overrides user overrides system

**Configuration File Locations** (loaded in order, later overrides earlier):
1. `/etc/xagent/config.toml` (system-wide, lowest precedence)
2. `~/.config/xagent/config.toml` (user-specific, medium precedence)
3. `./config.toml` (project-specific, highest precedence)

**Example Config File**:
```toml
# config.toml
[paths]
storage_root = "~/.xagent"
uploads_dir = "~/.xagent/uploads"
lancedb_path = "~/.xagent/lancedb"

[logging]
level = "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
```

**Environment Variable Override**:
```bash
# Override any config value via environment variable
export XAGENT_PATHS__STORAGE_ROOT=/data
export XAGENT_LOGGING__LEVEL=DEBUG  # overrides [logging] level
```
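
The precedence rule (environment variables > config file > code defaults) amounts to a deep merge of layers from lowest to highest precedence. The sketch below assumes each layer has already been parsed into a nested dict, for example with `tomllib` on Python 3.11+ for the TOML files. The function names `deep_merge` and `resolve` are illustrative.

```python
from typing import Any, Dict, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins, merging nested tables key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(*layers: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge layers given from lowest to highest precedence."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Stand-ins for: code defaults, a parsed config.toml, and parsed env vars.
defaults = {"paths": {"storage_root": "~/.xagent"}, "logging": {"level": "INFO"}}
from_file = {"paths": {"storage_root": "/srv/xagent"}}
from_env = {"logging": {"level": "DEBUG"}}

config = resolve(defaults, from_file, from_env)
```

The same `resolve` call extends to the three file locations by passing the system, user, and project layers in that order before the environment layer.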

---

#### FR-4: Path Configuration Centralization

**Goal**: Resolve path inconsistency issues

- **Req**: All path configuration managed centrally with consistent semantics
- **Why**: Addresses issues #243, #246, #252
- **Acceptance**:
 - A single storage root (`XAGENT_PATHS__STORAGE_ROOT`) as the default parent for all data paths
 - Consistent relative vs absolute path handling
 - Cross-platform path handling (POSIX/Windows)
 - Path normalization at load time
 - No CWD-relative paths
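
One possible reading of "normalization at load time" and "no CWD-relative paths" is sketched below: `~` is expanded, and relative values are anchored at the storage root rather than the working directory. The helper name `normalize_path` is an assumption, and the policy for relative values is one option among several.

```python
from pathlib import Path


def normalize_path(value: str, storage_root: Path) -> Path:
    """Expand `~` and anchor relative values at storage_root, never at the CWD."""
    path = Path(value).expanduser()
    if not path.is_absolute():
        path = storage_root / path
    return path


root = normalize_path("~/.xagent", storage_root=Path.home())
uploads = normalize_path("uploads", storage_root=root)
```

Because `pathlib` handles separators per platform, the same helper covers POSIX and Windows without branching.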

---

#### FR-5: Sensitive Configuration Handling

**Goal**: Secure handling of keys and passwords

- **Req**: Special handling for sensitive configuration values (API keys, passwords, secrets)
- **Why**: Secrets should never appear in logs or error messages
- **Acceptance**:
 - Automatic redaction of sensitive values in logs and error messages
 - Sensitive values only from environment variables (never from config files)
 - Clear documentation marks which values are sensitive
 - Optional: Integration with secret managers (future)
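
Automatic redaction can be approximated with a small wrapper type whose string forms are masked. The sketch below is a standard-library stand-in; the class name `Secret` and the method `reveal` are hypothetical. Pydantic ships a comparable `SecretStr` type.

```python
class Secret:
    """Wrap a sensitive string so that printing or logging it shows a mask."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """Return the real value; call only where the secret is actually used."""
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"

    __str__ = __repr__


api_key = Secret("sk-example-not-a-real-key")
message = f"loaded settings: api_key={api_key}"
```

The real value is reachable only through an explicit `reveal()` call, which makes accidental leaks through f-strings, `repr`, or log formatting unlikely.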

---

#### FR-6: Unified Environment Variable Naming

**Goal**: Consistent, predictable environment variable naming

- **Req**: All environment variables follow `XAGENT_*` prefix convention with nested support via `__`
- **Why**: Current env vars are inconsistent (`LANCEDB_DIR` vs `XAGENT_UPLOADS_DIR`) and share no unified prefix
- **Acceptance**:
 - All new env vars use `XAGENT_*` prefix
 - Nested configuration uses `__` separator (e.g., `XAGENT_PATHS__STORAGE_ROOT`)
 - **This is a breaking change** - old env var names stop working once the transition period (FR-7) ends
 - Migration guide documents old → new name mappings

**Industry Convention Verification:**

| System | Nested Delimiter | Example |
|--------|------------------|---------|
| **pydantic-settings** | `__` (conventional; set via `env_nested_delimiter`) | `export MY_APP__DATABASE__HOST=localhost` |
| **dynaconf** | `__` | `export DYNACONF_NESTED__LEVEL__KEY=1` |
| **Flask config** | No native support | Uses flat class attributes |
| **Django settings** | No native support | Uses flat module-level variables |

**Key Implementation Notes:**
- **pydantic-settings**: Nesting is enabled by setting `env_nested_delimiter` (unset by default; `__` is the conventional value)
- Environment variables are conventionally SCREAMING_SNAKE_CASE
- On Windows, env var names are case-insensitive (an OS behavior; Python's `os.environ` upper-cases keys there)
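
To show what the `__` delimiter does mechanically, the sketch below converts prefixed environment variables into a nested dict. It illustrates the naming convention only and is not how pydantic-settings or dynaconf implement it; `env_to_nested` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping


def env_to_nested(
    environ: Mapping[str, str], prefix: str = "XAGENT_", delimiter: str = "__"
) -> Dict[str, Any]:
    """Turn PREFIX_A__B=value entries into {"a": {"b": "value"}}."""
    nested: Dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # not one of ours
        keys = name[len(prefix):].lower().split(delimiter)
        node = nested
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return nested
```

Values stay strings at this stage; type conversion and conflict handling (for example, `XAGENT_PATHS` set alongside `XAGENT_PATHS__STORAGE_ROOT`) are left out of the sketch.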

**Example Migration**:
```bash
# Old (inconsistent, deprecated)
LANCEDB_DIR=~/.xagent/lancedb
UPLOADS_DIR=./uploads
XAGENT_UPLOADS_DIR=./uploads # inconsistent with LANCEDB_*

# New (unified, recommended)
XAGENT_PATHS__STORAGE_ROOT=~/.xagent
XAGENT_PATHS__UPLOADS_DIR=~/.xagent/uploads
XAGENT_PATHS__LANCEDB_PATH=~/.xagent/lancedb
```

---

#### FR-7: Migration Path

**Goal**: Smooth transition for existing deployments

- **Req**: Clear migration path from old env var names to new unified naming
- **Why**: FR-6 introduces breaking changes (renaming all env vars)
- **Acceptance**:
 - Transition period: old env var names supported with deprecation warnings
 - Migration guide with old → new mappings (e.g., `LANCEDB_DIR` → `XAGENT_PATHS__LANCEDB_PATH`)
 - Tooling to help identify deprecated configuration usage
 - Clear documentation of breaking changes and deprecation timeline
 - **Note**: This is not backward compatibility - it's a structured transition to new naming
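
A transition period of this kind could be implemented as a rename table applied before configuration is loaded. In the sketch below, the mappings are the ones used as examples in this proposal; the names `RENAMED` and `apply_renames` are illustrative assumptions.

```python
import warnings
from typing import Dict, Mapping

# Old name -> new name, taken from the examples in this proposal.
RENAMED = {
    "LANCEDB_DIR": "XAGENT_PATHS__LANCEDB_PATH",
    "XAGENT_UPLOADS_DIR": "XAGENT_PATHS__UPLOADS_DIR",
    "SANDBOX_IMAGE": "XAGENT_SANDBOX__IMAGE",
}


def apply_renames(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of environ with deprecated names translated to new ones."""
    result = dict(environ)
    for old, new in RENAMED.items():
        if old not in result:
            continue
        warnings.warn(f"{old} is deprecated; use {new}", DeprecationWarning, stacklevel=2)
        # An explicitly set new name always wins over the deprecated one.
        result.setdefault(new, result[old])
        del result[old]
    return result
```

The same table can drive the migration guide and any tooling that scans deployments for deprecated names, so the mapping is maintained in one place.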

---

#### FR-8: Modular Organization (Proposed - Needs Discussion)

**Goal**: Independent development and maintenance of configuration domains

- **Req**: Configuration organized by domain/feature into independent modules
- **Why**: 
 - Multiple developers can work on different config areas without merge conflicts
 - Clear separation of concerns (paths, database, sandbox, LLM, etc.)
 - Easier to locate and modify specific configuration
 - Scalable for growing projects
- **Acceptance**:
 - Configuration organized under `xagent.config` package
 - Each domain has its own config class (e.g., `PathConfig`, `DatabaseConfig`)
 - Access via namespace: `from xagent.config import PathConfig, DatabaseConfig`
 - Usage: `paths = PathConfig(); print(paths.storage_root)`
 - No circular dependencies between config modules

**Example Structure:**
```
xagent/config/
 __init__.py # Aggregates all config modules
 paths.py # PathConfig for storage, uploads, etc.
 database.py # DatabaseConfig for DB connections
 sandbox.py # SandboxConfig for container config
 llm.py # LLMConfig for model providers
```

**Usage:**
```python
# Option 1: Direct import
from xagent.config import PathConfig, DatabaseConfig

# Usage
paths = PathConfig()
database = DatabaseConfig()
storage_root = paths.storage_root

# Option 2: Namespace access
from xagent import config
storage_root = config.paths.storage_root
```

**Not All Approaches Support This:**
| Approach | Modular Support | Notes |
|----------|-----------------|-------|
| pydantic-settings | ✅✅✅ | Each module is an independent `BaseSettings` class |
| Hydra | ✅✅ | Designed for config composition |
| dynaconf | ✅ | Supports multiple validators/sections |
| Flask style | ⚠️ | Only via inheritance, not true modular |
| Django style | ⚠️ | django-split-settings splits files, not namespaces |
| PR #247 | ⚠️ | Can have separate functions, no class organization |
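
The aggregation pattern behind the modular layout can be sketched with plain dataclasses standing in for `BaseSettings` classes. All three classes would live in separate files, as the comments note; the aggregate class name `Config` and the field defaults are assumptions for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class PathConfig:  # would live in xagent/config/paths.py
    storage_root: Path = Path("~/.xagent")


@dataclass(frozen=True)
class SandboxConfig:  # would live in xagent/config/sandbox.py
    image: str = "xprobe/xagent-sandbox:latest"


@dataclass(frozen=True)
class Config:  # would live in xagent/config/__init__.py
    """Aggregates the domain sections; sections never import each other."""

    paths: PathConfig = field(default_factory=PathConfig)
    sandbox: SandboxConfig = field(default_factory=SandboxConfig)


config = Config()
storage_root = config.paths.storage_root
```

Only the aggregate imports the sections, so adding a new domain touches one new file plus one line in `__init__.py`.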

---

### 3.2 Requirements Summary (Goal Alignment)

| Goal | Requirement(s) | Priority |
|------|----------------|----------|
| **Centralized Configuration** | FR-1 (Centralized Configuration), FR-4 (Path Configuration Centralization) | **P0** |
| **Type Hints & Type Checking** | FR-2 (Type Hints & Type Checking) | **P0** |
| **Multi-Source Loading** | FR-3 (Configuration File + Environment Variables) | **P0** |
| **Path Centralization** | FR-4 (Path Configuration Centralization) | **P0** |
| **Security Handling** | FR-5 (Sensitive Configuration Handling) | **P1** |
| **Unified Naming** | FR-6 (Unified Environment Variable Naming) | **P0** |
| **Migration Path** | FR-7 (Migration Path) | **P0** |
| **Modular Organization** | FR-8 (Modular Organization, pending discussion) | **P0** |

**Priority Legend:**
- **P0 (Critical)**: Must have — blocks deployment without it
- **P1 (High)**: Should have — significant impact if missing
- **P2 (Future)**: Optional — deferred to future release

---

### 3.3 Non-Functional Requirements

#### NFR-1: No Circular Dependencies (P0)
- **Req**: Configuration module must not import from other xagent modules
- **Why**: Core modules need configuration without creating circular imports
- **Acceptance**:
 - `xagent.config` has no internal xagent dependencies
 - Can be imported by any module without import cycles
 - Uses only standard library and external dependencies (pydantic, etc.)

#### NFR-2: Performance (P1)
- **Req**: Configuration access must be fast with minimal overhead
- **Why**: Configuration accessed frequently during execution
- **Acceptance**:
 - Configuration loaded once at startup
 - In-memory caching (no repeated file I/O)
 - Fast attribute access (< 1μs per access)
 - Lazy loading for expensive resources (e.g., network connections)

#### NFR-3: Testability (P1)
- **Req**: Configuration must be easily mockable/overrideable for testing
- **Why**: Tests need to run with different configurations without side effects
- **Acceptance**:
 - Clear mechanism for overriding config in tests
 - Test fixtures for common configuration scenarios
 - Isolated test configuration (no file system pollution)
 - Easy to reset configuration between tests

#### NFR-4: Error Handling (P1)
- Clear error messages for invalid configuration
- Early validation at startup (fail-fast)

#### NFR-5: Maintainability (P2)
- Easy to add new configuration options
- Clear patterns for defining new values
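
The caching requirement (NFR-2) and the test-override requirement (NFR-3) can be met by the same small mechanism. The sketch below is one possible shape using only the standard library; `get_config`, `override_config`, and the single-field `Config` class are hypothetical names.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator, Optional


@dataclass(frozen=True)
class Config:
    log_level: str = "INFO"


_config: Optional[Config] = None


def get_config() -> Config:
    """Build the config on first use, then reuse the cached instance."""
    global _config
    if _config is None:
        _config = Config()  # a real loader would read files and env vars here
    return _config


@contextmanager
def override_config(**changes: object) -> Iterator[Config]:
    """Temporarily swap in a modified config; always restore the previous one."""
    global _config
    previous = _config
    _config = replace(get_config(), **changes)
    try:
        yield _config
    finally:
        _config = previous
```

In a test, `with override_config(log_level="DEBUG"):` changes what `get_config()` returns inside the block and leaves no state behind afterwards, which also works as the body of a pytest fixture.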

---

## Part 4: Configuration Scope

Based on the analysis and goals, the centralized configuration system should cover:

### 4.1 Path Configuration (P0 - Critical)
- `XAGENT_STORAGE_ROOT` - Root for all data (`~/.xagent`)
- `XAGENT_UPLOADS_DIR` - User upload files
- `XAGENT_WEB_STATIC_DIR` - Static web assets
- `XAGENT_LANCEDB_PATH` - Vector database path
- `XAGENT_DATABASE_URL` - Primary database connection
- `XAGENT_EXTERNAL_UPLOAD_DIRS` - Additional upload directories
- `XAGENT_EXTERNAL_SKILLS_LIBRARY_DIRS` - Additional skills directories

### 4.2 Service Configuration (P0 - Critical)
- `XAGENT_LOG_LEVEL` - Logging verbosity (DEBUG/INFO/WARNING/ERROR/CRITICAL)
- Database connection settings
- Memory store configuration (LanceDB vs in-memory)

### 4.3 Sandbox Configuration (P1 - High)
- `SANDBOX_IMAGE` - Container image
- `SANDBOX_CPUS`, `SANDBOX_MEMORY` - Resource limits
- `SANDBOX_ENV`, `SANDBOX_VOLUMES` - Mounts and environment

### 4.4 External Service Configuration (P1 - High)
- LLM provider API keys and endpoints
- Embedding provider settings
- Search provider credentials

**Note**: These are typically stored as environment variables (secrets), not in config files

---

## Part 5: Configuration System Options (Neutral Analysis)

This section provides a neutral comparison of common Python configuration management approaches, without assuming which is best for xagent.

### 5.1 Overview of Configuration Approaches

| Approach | Type | Dependencies | Complexity | Industry Adoption |
|----------|------|--------------|------------|-------------------|
| **1. pydantic-settings** | Declarative, class-based | pydantic, pydantic-settings | Medium | High (FastAPI, modern apps) |
| **2. Flask Style** | Class-based inheritance | None | Low | High (Flask ecosystem) |
| **3. Django Style** | Module-based | None | Low-Medium | High (Django ecosystem) |
| **4. dynaconf** | Multi-format loader | dynaconf | Medium | Medium |
| **5. Hydra** | Config composition | hydra-core, omegaconf | High | Medium (ML/Research) |
| **6. Current PR #247** | Function-based | None | Low | N/A (xagent-specific) |

### 5.2 Detailed Analysis

#### Option 1: pydantic-settings

**Description**: Type-safe configuration using Pydantic models with validation.

**Example**:
```python
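from pathlib import Path
from typing import Optional

from pydantic import Field, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict

class XagentConfig(BaseSettings):
    """Complete xagent configuration."""
    model_config = SettingsConfigDict(
        env_file=".env",
        env_nested_delimiter="__",  # XAGENT_PATHS__STORAGE_ROOT
        env_prefix="XAGENT_",  # Every env var name starts with this prefix
    )

    storage_root: Path = Path.home() / ".xagent"
    uploads_dir: Optional[Path] = None  # Derived from storage_root when unset
    # Parentheses make the ^ and $ anchors apply to every alternative
    log_level: str = Field(default="INFO", pattern="^(DEBUG|INFO|WARNING|ERROR|CRITICAL)$")

    @model_validator(mode="after")
    def derive_uploads_dir(self) -> "XagentConfig":
        # A class-level `storage_root / "uploads"` default would ignore overrides
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        return self

config = XagentConfig()
# Access: config.storage_root, config.uploads_dir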
```

**Environment Variable Mapping:**
```bash
# With env_prefix="XAGENT_" and env_nested_delimiter="__"
export XAGENT_STORAGE_ROOT=/data
export XAGENT_PATHS__STORAGE_ROOT=/data # If using nested models
export XAGENT_LOG_LEVEL=DEBUG
```

**Characteristics**:
- Type safety with automatic validation
- Fail-fast at startup
- IDE autocomplete support
- Nested model support with `__` delimiter
- Multi-source loading (env, .env, secrets directory, CLI, and JSON/TOML/YAML files via dedicated settings sources)

**Trade-offs**:
- (+) Strong typing and validation
- (+) Self-documenting
- (+) Widely used in modern Python projects (e.g., with FastAPI)
- (+) Native `__` nested delimiter support
- (-) Additional dependency
- (-) Learning curve
- (-) Validation adds startup time

**References**: [Pydantic Settings Documentation](https://docs.pydantic.dev/latest/concepts/pydantic_settings/)

---

#### Option 2: Flask Style Config

**Description**: Class-based configuration with inheritance for environment separation.

**Example**:
```python
import os

class Config:
    DEBUG = False
    SECRET_KEY = 'default-key'
    DATABASE_URL = 'sqlite:///app.db'

class DevelopmentConfig(Config):
    DEBUG = True
    DATABASE_URL = 'sqlite:///dev.db'

class ProductionConfig(Config):
    DEBUG = False
    SECRET_KEY = os.getenv('SECRET_KEY')  # None if the variable is unset

# Usage
config = DevelopmentConfig
```

**Characteristics**:
- Simple class attributes
- Environment inheritance
- No external dependencies
- Environment-based separation

**Trade-offs**:
- (+) Simple and intuitive
- (+) Zero dependencies
- (+) Easy environment separation
- (+) Type hints supported (mypy compatible)
- (-) No runtime validation
- (-) Manual env var handling
- (-) Config is Python code that can execute arbitrary logic

**References**: [Flask Configuration](https://flask.palletsprojects.com/en/stable/config/)

---

#### Option 3: Django Style Settings

**Description**: Single module with Python-based configuration, optionally split across files.

**Example**:
```python
# settings.py
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

DEBUG = False
SECRET_KEY = 'your-secret-key'
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}

# Or using django-split-settings
from split_settings.tools import include, optional

include(
    'components/base.py',
    'components/database.py',
    optional('local_config.py'),
)
```

**Characteristics**:
- Python module-based
- Supports complex nested structures
- File splitting for organization
- Programmable configuration

**Trade-offs**:
- (+) Flexible and powerful
- (+) No dependencies (base)
- (+) Supports complex structures
- (+) Type hints supported (mypy compatible)
- (-) No runtime validation
- (-) Config is Python code that can execute arbitrary logic

**References**: [django-split-settings](https://django-split-settings.readthedocs.io/)

---

#### Option 4: dynaconf

**Description**: Multi-format configuration loader with remote config support.

**Example**:
```python
from dynaconf import Dynaconf

settings = Dynaconf(
    settings_files=['settings.yaml', '.secrets.yaml'],
    environments=True,
    env='development',
)

# Access
settings.database_url
settings.get('database_url', 'default')
```

**Characteristics**:
- Multi-format (YAML, TOML, JSON, INI, .env)
- Environment separation (dev, staging, prod)
- Remote config support (Redis, Vault)
- Variable expansion

**Trade-offs**:
- (+) Multiple format support
- (+) Remote configuration
- (+) Variable expansion
- (-) Additional dependency
- (-) Type validation is opt-in through `Validator` rules; settings are not statically typed
- (-) Learning curve

---

#### Option 5: Hydra

**Description**: Configuration composition framework for complex applications.

**Example**:
```python
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(cfg.database.url)
    print(cfg.model.learning_rate)

if __name__ == "__main__":
    my_app()
```

**Characteristics**:
- Powerful config composition
- Per-run output directories and multirun sweeps
- Command-line override
- Multi-config file combination

**Trade-offs**:
- (+) Excellent for complex configs
- (+) Per-run outputs and parameter sweeps suit experiments
- (+) CLI overrides
- (-) Steep learning curve
- (-) Overkill for simple apps
- (-) Heavy dependency

---

#### Option 6: Current PR #247 (Function-Based)

**Description**: Function-based API with hardcoded defaults and environment variable reading.

**Example**:
```python
import logging
import os
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

def get_uploads_dir() -> Path:
    env_dir = os.getenv("XAGENT_UPLOADS_DIR")
    if env_dir:
        return Path(env_dir)
    return get_web_dir() / "uploads"  # get_web_dir() is defined elsewhere in PR #247

def get_sandbox_cpus() -> Optional[int]:  # Problematic: caller must handle None
    env_str = os.getenv("SANDBOX_CPUS")
    if env_str:
        try:
            return int(env_str)
        except ValueError:
            logger.warning(f"Invalid SANDBOX_CPUS: {env_str}")
    return None  # Should have a default, not None
```

**Characteristics**:
- Function-based access
- Hardcoded defaults per function
- Runtime validation (try/except)
- No nested support
- **Problem**: `Optional[int]` forces callers to handle `None` even when there's a meaningful default

**Trade-offs**:
- (+) Simple, zero dependencies
- (+) Type hints on return values
- (+) Backward compatible
- (-) Code repetition
- (-) No config file support
- (-) No nested configuration
- (-) Defaults scattered

---

### 5.3 Comparison Matrix (Detailed)

| Dimension | pydantic-settings | Flask Style | Django Style | dynaconf | Hydra | PR #247 |
|-----------|------------------|-------------|--------------|----------|-------|---------|
| **Type Safety** | ✅✅✅ | ✅ | ✅ | ⚠️ | ✅ | ⚠️ |
| **Validation** | ✅✅✅ | ❌ | ❌ | ⚠️ | ✅ | ⚠️ |
| **Config Files** | ✅ | ❌ | ❌ | ✅✅✅ | ✅✅ | ❌ |
| **Env Separation** | ✅ | ✅ | ✅ | ✅✅✅ | ✅✅✅ | ❌ |
| **Nested Config** | ✅✅ | ⚠️ | ✅ | ✅ | ✅✅✅ | ❌ |
| **Nested Env Var Delimiter** | ✅✅✅ | ❌ | ❌ | ✅✅✅ | ✅ | ❌ |
| **Modular Organization** | ✅✅✅ | ⚠️ | ⚠️ | ✅ | ✅✅ | ⚠️ |
| **Zero Deps** | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
| **Learning Curve** | Medium | Low | Low | Medium | High | Low |
| **Industry Adoption** | High (growing) | High (Flask) | High (Django) | Medium | Medium (ML) | N/A |

**Legend**: ✅✅✅ Excellent | ✅✅ Very good | ✅ Good | ⚠️ Partial | ❌ No Support

**Notes:**
- **Nested Env Var Delimiter**: Native support for mapping env vars like `XAGENT_PATHS__STORAGE_ROOT` to nested config `config.paths.storage_root`
 - `__` (double underscore) is the de facto standard (pydantic-settings, dynaconf)
 - Hydra uses `.` via CLI, not env vars
 - Flask/Django require manual parsing

### 5.4 Recommendation Framework

When choosing an approach, consider:

1. **Team familiarity**: What patterns does the team know?
2. **Existing code**: PR #247 uses function-based API
3. **Requirements priority**: Which goals are most important?
4. **Migration cost**: How much code needs to change?
5. **Dependency tolerance**: Can we add new dependencies?

**No approach is universally "best"** — the choice depends on xagent's specific constraints and priorities.

---

## Part 6: Proposed Solution Approaches

Based on the analysis above, three implementation approaches are considered for xagent:

### 6.1 Approach A: Enhanced PR #247 (Function-Based)

#### Technical Design

```python
# xagent/config.py
import os
from pathlib import Path
from typing import Callable, Optional


def validate_path(value: str) -> Path:
    """Validate and normalize a path configuration value."""
    path = Path(value).expanduser().resolve()
    # Validation logic here
    return path


# Simple function-based API with validation decorators
def config(default: str, validator: Callable[[str], Path]):
    """Wrap a raw getter: fall back to `default`, then run `validator`."""
    def decorator(getter: Callable[[], Optional[str]]) -> Callable[[], Path]:
        def wrapper() -> Path:
            return validator(getter() or default)
        wrapper.__doc__ = getter.__doc__
        return wrapper
    return decorator


@config(default="~/.xagent", validator=validate_path)
def get_storage_root() -> Optional[str]:
    """Get the storage root directory."""
    return os.getenv("XAGENT_STORAGE_ROOT")
```

#### Pros
- **Low Risk**: Builds on existing PR #247 work
- **Fast Delivery**: Minimal code changes required
- **Zero Breaking Changes**: Maintains current API surface
- **Low Learning Curve**: Familiar pattern for Python developers
- **No New Dependencies**: Uses standard library only

#### Cons
- **Limited Validation**: Validation must be manually implemented
- **No Type Safety**: Type errors only discovered at runtime
- **Scattered Logic**: Each function has its own validation logic
- **Poor IDE Support**: No autocomplete for configuration options
- **Limited Extensibility**: Adding nested configuration is awkward

#### Risk Assessment

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Validation inconsistencies | Medium | Medium | Create validation utility functions |
| Type errors in production | High | Low | Comprehensive testing |
| Poor developer experience | Low | Low | Documentation |

**Overall Risk Level: LOW**

---

### 6.2 Approach B: Pydantic-Based Settings (Transformative)

**Based on**: Option 1 (pydantic-settings) from the analysis above

#### Technical Design

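```python
# xagent/config.py
from pathlib import Path
from typing import Optional

from pydantic import Field, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class PathConfig(BaseSettings):
    """Path configuration with validation."""
    # Standalone use reads XAGENT_PATHS__STORAGE_ROOT, matching the nested form below
    model_config = SettingsConfigDict(env_prefix="XAGENT_PATHS__")

    storage_root: Path = Path.home() / ".xagent"
    # None means "derive from storage_root"; resolved in the validator below
    uploads_dir: Optional[Path] = None
    lancedb_path: Optional[Path] = None

    @model_validator(mode="after")
    def _derive_paths(self) -> "PathConfig":
        # A class-level default such as `storage_root / "uploads"` is evaluated
        # once at import time and would ignore an overridden storage_root.
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        if self.lancedb_path is None:
            self.lancedb_path = self.storage_root / "lancedb"
        return self


class XagentConfig(BaseSettings):
    """Complete xagent configuration."""
    model_config = SettingsConfigDict(
        env_prefix="XAGENT_",
        env_file=".env",
        env_file_encoding="utf-8",
        env_nested_delimiter="__",  # XAGENT_PATHS__STORAGE_ROOT
    )

    paths: PathConfig = Field(default_factory=PathConfig)
    # The alternation is grouped so that ^ and $ apply to every level name
    log_level: str = Field(default="INFO", pattern="^(DEBUG|INFO|WARNING|ERROR|CRITICAL)$")


# Usage
config = XagentConfig()
storage_root = config.paths.storage_root
```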

#### Pros
- **Runtime Validation**: Automatic type coercion, range checks, pattern matching
- **Excellent Error Messages**: Clear, actionable validation errors at startup
- **IDE Support**: Autocomplete for all configuration options
- **Nested Configuration**: Natural support for structured config
- **Industry Standard**: Battle-tested in production systems
- **Extensibility**: Easy to add new configuration options
- **Documentation**: Self-documenting via Field descriptions
- **Type Safety**: Type hints compatible with mypy (like Flask/Django styles)

#### Cons
- **High Risk**: Requires significant code changes
- **Breaking Changes**: Existing code must be updated
- **New Dependency**: Adds pydantic-settings dependency
- **Learning Curve**: Developers must learn Pydantic patterns
- **Migration Effort**: All configuration access points need updating

#### Risk Assessment

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Breaking existing deployments | High | High | Careful migration plan, extensive testing |
| Performance regression | Low | Low | Pydantic is fast, caching possible |
| Developer learning curve | Medium | Low | Documentation, examples |
| Dependency issues | Low | Medium | Pin versions, test thoroughly |

**Overall Risk Level: HIGH**

---

### 6.3 Approach C: Hybrid (Balanced)

#### Technical Design

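```python
# xagent/config.py (internal - pydantic-based)
from pathlib import Path
from typing import Optional

from pydantic import model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict


class _InternalSettings(BaseSettings):
    """Internal pydantic-based config for validation."""
    # Maps XAGENT_STORAGE_ROOT / XAGENT_UPLOADS_DIR onto the fields below
    model_config = SettingsConfigDict(env_prefix="XAGENT_")

    storage_root: Path = Path.home() / ".xagent"
    uploads_dir: Optional[Path] = None  # derived from storage_root when unset

    @model_validator(mode="after")
    def _derive_uploads_dir(self) -> "_InternalSettings":
        if self.uploads_dir is None:
            self.uploads_dir = self.storage_root / "uploads"
        return self


# Singleton instance
_config = _InternalSettings()


# xagent/config.py (external - function-based for backward compatibility)
def get_storage_root() -> Path:
    """Get the storage root directory.

    Priority:
    1. XAGENT_STORAGE_ROOT environment variable
    2. Configuration file value (once a file source is added)
    3. Default: ~/.xagent
    """
    return _config.storage_root


def get_uploads_dir() -> Path:
    """Get the uploads directory.

    Priority:
    1. XAGENT_UPLOADS_DIR environment variable
    2. Configuration file value (once a file source is added)
    3. Default: {storage_root}/uploads
    """
    return _config.uploads_dir


# New object-based API (optional for gradual migration)
class _Paths:
    """Read-only view over the path settings."""

    @property
    def storage_root(self) -> Path:
        return _config.storage_root

    @property
    def uploads_dir(self) -> Path:
        return _config.uploads_dir


class _Config:
    """Object-based configuration access.

    Example:
        from xagent.config import config
        storage_root = config.paths.storage_root
    """
    # Properties only resolve on instances, so `paths` holds an instance
    paths = _Paths()


config = _Config()
```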

#### Pros
- **Backward Compatible**: Existing code continues to work
- **Type Safety**: Internal pydantic validation
- **Gradual Migration**: Can adopt object API over time
- **Low Risk**: Function API maintains compatibility
- **Future-Proof**: Foundation for full migration

#### Cons
- **Implementation Complexity**: Maintains two APIs temporarily
- **Documentation Burden**: Must document both styles
- **Potential Confusion**: Two ways to access same data
- **Technical Debt**: Temporary dual-API needs eventual cleanup

#### Risk Assessment

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| API confusion | Medium | Low | Clear documentation, examples |
| Maintenance burden | Low | Low | Plan for deprecation timeline |
| Performance overhead | Low | Low | Minimal overhead from indirection |

**Overall Risk Level: MEDIUM**

---

### 6.4 Configuration File Format Selection

| Format | Pros | Cons | Recommendation |
|--------|------|------|----------------|
| **TOML** | ✅ Clean syntax for config ✅ Better than INI for nesting ✅ Growing Python support ✅ Type preservation | ❌ Less common than YAML ❌ Limited expression support | **RECOMMENDED** |
| **YAML** | ✅ Very popular in Python ✅ Powerful expressions ✅ Wide tool support | ❌ Complex syntax ❌ Security concerns (unsafe load) ❌ Significant whitespace | **SECONDARY** |
| **JSON5/JSONC** | ✅ JSON with comments/trailing commas ✅ Familiar to JS developers ✅ Wide tool support | ❌ Less common in Python ❌ No built-in support | **OPTIONAL** |
| **INI** | ✅ Simple, familiar ✅ Built-in Python support | ❌ No nested structures ❌ No type preservation | **FALLBACK** |
| **JSON** | ✅ Standard format ✅ Easy to parse | ❌ No comments ❌ Verbose syntax | NOT RECOMMENDED |

**Recommendation: TOML (primary), with YAML as secondary, JSON5 as optional**

Rationale:
- **TOML**: Designed specifically for configuration files, with clean syntax, built-in read-only parsing in Python 3.11+ (`tomllib`), and growing ecosystem adoption (`pyproject.toml`, etc.)
- **YAML**: Widely used and powerful, but unsafe when loaded with `yaml.unsafe_load()`; untrusted input requires `yaml.safe_load()`
- **JSON5/JSONC**: Good for projects with a JS/TS frontend; allows comments and trailing commas

**JSON5/JSONC Note**:
- JSON5 = JSON extended with comments, trailing commas, unquoted keys, etc.
- JSONC = JSON with Comments (used by VS Code, TypeScript Compiler, etc.)
- Both require external libraries (e.g., `json5`, `jsoncomment`) in Python

---

### 6.5 Migration Path Analysis

#### Approach A (Enhanced PR #247): Migration Effort
- **New Code**: ~500 lines (validation, config file loading)
- **Modified Files**: ~10 files (add validation decorators)
- **Testing Effort**: Medium (test validation logic)
- **Relative Effort**: ⭐ Fastest - Minimal changes to existing code

#### Approach B (Pydantic-First): Migration Effort
- **New Code**: ~1000 lines (config classes, migration)
- **Modified Files**: ~50 files (all config access points)
- **Testing Effort**: High (test all changed code paths)
- **Relative Effort**: ⭐⭐⭐ Highest - Most comprehensive refactoring

#### Approach C (Hybrid): Migration Effort
- **New Code**: ~800 lines (internal config, function wrappers)
- **Modified Files**: ~20 files (initial migration)
- **Testing Effort**: Medium-High (test both APIs)
- **Relative Effort**: ⭐⭐ Medium - Balance between A and B

---

### 6.6 Decision Framework

**Updated Requirements Context:**
- **FR-6**: Unified `XAGENT_*` naming with `__` for nested config (breaking change required)
- **FR-8**: Modular organization required for independent development
- **FR-7**: Migration path with deprecation warnings (not backward compatibility)

Based on the analysis, the recommended approach should maximize:

1. **Unified Naming & Nested Env Var Support** (P0 requirements)
2. **Modular Organization** (FR-8, P0 requirement)
3. **Future Extensibility**
4. **Reasonable Implementation Risk**

**Updated Decision Matrix:**

| Criterion | Weight | A: Function | B: Pydantic | C: Hybrid |
|-----------|--------|-------------|-------------|-----------|
| Unified Naming (`__` separator) | 25% | 2 | 10 | 10 |
| Modular Organization | 25% | 3 | 10 | 10 |
| Type Safety & Validation | 25% | 4 | 10 | 8 |
| Future Extensibility | 15% | 3 | 10 | 7 |
| Implementation Risk | 10% | 10 | 4 | 7 |
| **Weighted Score** | 100% | **3.7** | **9.4** | **8.8** |

**Winner: Approach B (Pydantic-First) with 9.4/10**
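
The weighted row can be recomputed directly from the weights and per-criterion scores in the matrix. A short check follows; the dictionary keys are shorthand labels introduced here for the five criteria:

```python
# Weights in percent, copied from the decision matrix
WEIGHTS = {"naming": 25, "modular": 25, "type_safety": 25, "extensibility": 15, "risk": 10}

# Per-approach scores (0-10), copied from the decision matrix
SCORES = {
    "A": {"naming": 2, "modular": 3, "type_safety": 4, "extensibility": 3, "risk": 10},
    "B": {"naming": 10, "modular": 10, "type_safety": 10, "extensibility": 10, "risk": 4},
    "C": {"naming": 10, "modular": 10, "type_safety": 8, "extensibility": 7, "risk": 7},
}


def weighted_score(scores: dict) -> float:
    """Sum of weight * score over all criteria, with weights given in percent."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


totals = {approach: weighted_score(scores) for approach, scores in SCORES.items()}
print(totals)  # {'A': 3.7, 'B': 9.4, 'C': 8.75}
```

Approach C's 8.75 rounds to 8.8, and Approach B's 9.4 matches the score quoted in the TL;DR.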

**Rationale:**
With FR-6 requiring unified `XAGENT_*` naming with `__` separator AND FR-8 requiring modular organization:
- **Approach A (Function-based)**: Cannot easily support `__` separator or true modular organization
- **Approach B (Pydantic)**: Native support for `env_nested_delimiter="__"` AND independent `BaseSettings` classes per module
- **Approach C (Hybrid)**: Supports all requirements but adds unnecessary complexity since breaking changes are acceptable

**Key Differentiator: Modular Organization**
Only pydantic-settings and Hydra truly support independent config modules. Given xagent's requirements, pydantic-settings is the clear winner.

---

## Part 7: Decision Framework

This section provides a framework for choosing between the three approaches and closes with the resulting recommendation for xagent.

### 7.1 Decision Criteria

| Criterion | Approach A | Approach B | Approach C |
|-----------|-----------|-----------|-----------|
| **If type safety is the highest priority** | ⚠️ | ✅ | ✅ |
| **If minimal dependencies is required** | ✅ | ❌ | ❌ |
| **If fast delivery is important** | ✅ | ❌ | ⚠️ |
| **If long-term maintainability matters** | ⚠️ | ✅ | ✅ |
| **If team knows pydantic** | ⚠️ | ✅ | ✅ |
| **If risk tolerance is low** | ✅ | ❌ | ⚠️ |

**Note:** All approaches require breaking changes to implement FR-6 (unified `XAGENT_*` naming). Backward compatibility is not a differentiator.

### 7.2 Questions to Guide Decision

1. **Is xagent willing to add pydantic as a dependency?**
   - Yes → Consider Approach B or C
   - No → Approach A only

2. **What is the team's familiarity with pydantic?**
   - High → Approach B or C
   - Low → Approach A (or C with training)

3. **What is the team's capacity and urgency?**
   - Need quick wins with minimal changes → Approach A
   - Have capacity for proper refactoring → Approach B or C

4. **What is the long-term vision for xagent?**
   - Stay simple, minimal dependencies → Approach A
   - Modern Python best practices → Approach B or C

### 7.3 Hybrid Approach (C) Consideration

Approach C is often chosen when:
- Backward compatibility is non-negotiable
- Team wants to adopt pydantic incrementally
- Risk tolerance is medium
- Long-term vision includes full pydantic migration

**Trade-off**: Higher initial complexity (two APIs) for smoother migration path.

### 7.4 Example Decision Outcomes

| Scenario | Recommended Approach |
|----------|---------------------|
| Startup phase, moving fast, minimal deps | A |
| **Established product, breaking changes acceptable** | **B** |
| Large team, gradual migration required | C |
| New project, no legacy code | B |

**For xagent**: Approach B is recommended because:
- Breaking changes are required for unified `XAGENT_*` naming (FR-6)
- Modular organization is required (FR-8)
- pydantic-settings provides native support for `__` nested delimiter and independent config modules

---

## Part 8: Implementation Considerations

### 8.1 Configuration File Format

See the detailed analysis in Part 6, Section 6.4 for configuration file format options, including TOML (recommended), YAML (secondary), and JSON5/JSONC (optional).

### 8.2 Configuration File Locations

For a multi-user web service like xagent, configuration file locations differ from desktop applications:

**Options**:
- `./config.toml` — Project-specific configuration (for development)
- `/etc/xagent/config.toml` — System-wide configuration (for production deployments)
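
If both locations are supported, the loader needs a search order. The sketch below assumes the project-local file wins over the system-wide one. That precedence is an illustrative assumption, since the proposal leaves it as an open question, and `find_config_file` is a hypothetical helper name:

```python
from pathlib import Path
from typing import Iterable, Optional

# Assumed search order: project-local file first, then system-wide.
DEFAULT_CANDIDATES = (Path("config.toml"), Path("/etc/xagent/config.toml"))


def find_config_file(candidates: Iterable[Path] = DEFAULT_CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` when no file exists lets the system fall back to environment variables and built-in defaults.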

### 8.3 Environment Variable Naming Migration

**Current State (Inconsistent):**
| Old Name | Notes |
|----------|-------|
| `LANCEDB_DIR` | No prefix |
| `XAGENT_UPLOADS_DIR` | Has prefix, inconsistent with LANCEDB_* |
| `UPLOADS_DIR` | No prefix, conflicts with above |
| `SANDBOX_IMAGE` | No prefix |
| `OPENAI_API_KEY` | Third-party, no prefix |

**Target State (Unified):**
| New Name | Old Name(s) | Deprecation |
|----------|-------------|-------------|
| `XAGENT_PATHS__STORAGE_ROOT` | N/A (new) | - |
| `XAGENT_PATHS__UPLOADS_DIR` | `XAGENT_UPLOADS_DIR`, `UPLOADS_DIR` | ⚠️ Warning |
| `XAGENT_PATHS__LANCEDB_PATH` | `LANCEDB_DIR`, `LANCEDB_PATH` | ⚠️ Warning |
| `XAGENT_PATHS__DATABASE_URL` | `DATABASE_URL` | ⚠️ Warning |
| `XAGENT_SANDBOX__IMAGE` | `SANDBOX_IMAGE` | ⚠️ Warning |
| `XAGENT_SANDBOX__CPUS` | `SANDBOX_CPUS` | ⚠️ Warning |
| `OPENAI_API_KEY` | `OPENAI_API_KEY` | ✅ Keep (external) |

**Implementation Notes:**
- The `__` delimiter is an industry standard used by both pydantic-settings and dynaconf
- Old env var names should be supported with deprecation warnings for at least 2 minor releases

**Implementation Strategy:**
1. Define new `XAGENT_*` env vars with pydantic-settings
2. Support old env var names with deprecation warnings (using aliases or custom validators)
3. Document migration path clearly
4. Provide migration tooling (optional)
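
Step 2 of the strategy can be sketched with the standard library alone. This is an illustrative helper, not an existing xagent or pydantic-settings API. The name `resolve_env` and its signature are assumptions:

```python
import os
import warnings
from typing import Mapping, Optional, Sequence


def resolve_env(
    new_name: str,
    old_names: Sequence[str] = (),
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the value of new_name, falling back to deprecated names with a warning."""
    env = os.environ if environ is None else environ
    if new_name in env:
        return env[new_name]
    for old_name in old_names:
        if old_name in env:
            warnings.warn(
                f"{old_name} is deprecated; use {new_name} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return env[old_name]
    return None


value = resolve_env(
    "XAGENT_PATHS__LANCEDB_PATH",
    old_names=("LANCEDB_DIR", "LANCEDB_PATH"),
    environ={"LANCEDB_DIR": "/data/lancedb"},
)
print(value)  # /data/lancedb
```

The new name always wins when both are set, so a deployment can migrate one variable at a time while the warnings identify what is left.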

---

### 8.4 Module Organization

**Proposed Structure:**
```
src/xagent/config/
    __init__.py   # Aggregates all config modules, exports `paths`, `database`, etc.
    paths.py      # PathConfig class
    database.py   # DatabaseConfig class
    sandbox.py    # SandboxConfig class
    llm.py        # LLMConfig class
    ...
```

**Import Style:**
```python
# Import classes
from xagent.config import PathConfig, DatabaseConfig
paths = PathConfig()
database = DatabaseConfig()
```

**Migration Path:**
1. Create new `src/xagent/config/` package (replacing `src/xagent/core/config.py`)
2. Update all imports incrementally
3. Remove old `core/config.py` after migration complete

---

### 8.5 Nested Delimiter Convention Research

**Research Summary:**

The `__` (double underscore) convention for nested environment variables is verified as an industry standard:

| System | Delimiter | Verification |
|--------|-----------|--------------|
| pydantic-settings | `__` | Supports `env_nested_delimiter` (unset by default; `__` is the value used in the official examples) |
| dynaconf | `__` | Official docs show `export DYNACONF_NESTED__LEVEL__KEY=1` |
| Flask config | N/A | No native nested delimiter (flat structure) |
| Django settings | N/A | No native nested delimiter (flat structure) |

**Sources:**
- [Pydantic Settings Documentation](https://docs.pydantic.dev/latest/concepts/pydantic_settings/)
- [Dynaconf Documentation](https://dynaconf.com/)

**Key Finding:** Both pydantic-settings and dynaconf use `__` as the nested delimiter, making it a de facto standard for Python configuration management with environment variables.

---

### 8.6 Implementation Phases

#### Phase 1: Foundation

**Goal:** Create core configuration infrastructure with backward compatibility

**Estimated Effort:** ~30% of total effort

**Deliverables:**
1. Create `xagent/config` module with:
   - Internal pydantic config classes
   - Function-based wrapper API for backward compatibility
   - Configuration file loading (TOML primary, YAML/JSON5 optional)
2. Add `pydantic-settings` to dependencies
3. Create comprehensive test suite for config module
4. Update `example.env` with new configuration options

**Acceptance Criteria:**
- [ ] All existing tests pass with new config module
- [ ] New config module tests have >90% coverage
- [ ] Function API works identically to current implementation
- [ ] Configuration file loading functional

#### Phase 3: Path Migration

**Goal:** Migrate all path configuration to use new system

**Estimated Effort:** ~40% of total effort

**Deliverables:**
1. Update core modules to use new config:
   - `src/xagent/core/workspace.py`
   - `src/xagent/core/agent/service.py`
   - `src/xagent/core/storage/manager.py`
2. Update web modules to use new config:
   - `src/xagent/web/api/files.py`
   - `src/xagent/web/api/websocket.py`
   - `src/xagent/web/sandbox_manager.py`
3. Update skills utilities to use new config
4. Update migration scripts

**Acceptance Criteria:**
- [ ] All 20+ files with hardcoded paths updated
- [ ] No hardcoded path strings remain (except tests)
- [ ] All path tests pass
- [ ] Manual testing of file upload, workspace, sandbox features

#### Phase 4: Documentation & Tooling

**Goal:** Improve developer experience and documentation

**Deliverables:**
1. Add CLI command: `xagent config --list` (to view all configuration)
2. Add configuration documentation generation
3. Update developer documentation
4. Add migration guide for existing deployments

**Acceptance Criteria:**
- [ ] CLI command functional
- [ ] Auto-generated configuration docs available
- [ ] Migration guide published
- [ ] Developer documentation complete

#### Phase 5: Cleanup & Deprecation (Ongoing)

**Goal:** Complete migration and remove legacy code

**Deliverables:**
1. Add deprecation warnings to old env var names
2. Document old → new env var mappings
3. Establish deprecation policy for removing old env var support
4. Monitor usage of deprecated env vars

**Acceptance Criteria:**
- [ ] Deprecation warnings in place
- [ ] Migration guide published
- [ ] Deprecation policy documented
- [ ] Communication plan for breaking changes

### 8.7 Critical Success Factors

1. **Breaking Changes Well-Communicated**: Clear migration guide and deprecation policy
2. **Comprehensive Testing**: Test coverage >90% for all new code
3. **Clear Documentation**: All changes well-documented
4. **Gradual Migration**: Support old env vars with warnings during transition

---

## Part 9: Open Questions

This section lists open questions that need discussion/decision:

1. **Configuration File Location** (DECISION NEEDED):

   **Note**: xagent is a multi-user web service (similar to Django/Rails), not a desktop application.

   **Proposed options**:
   - `./config.toml`: project-specific configuration (for development)
   - `/etc/xagent/config.toml`: system-wide configuration (for production deployments)

   **User data** remains at `~/.xagent/` (per-user storage, already isolated by user_id in uploads/).

2. **Secret Management**: Should the configuration system handle secrets separately?
   - Consider integration with secret managers (HashiCorp Vault, etc.)
   - Or keep secrets as environment variables only (current approach)?

3. **Configuration Profiles**: Should we support named configuration profiles?
   - `xagent --profile development`
   - Useful for different deployment environments (dev/staging/prod)

4. **TOML vs YAML vs JSON5**: Which configuration file format to prioritize?
   - TOML is simpler but less expressive
   - YAML is more powerful but more complex
   - JSON5/JSONC offers JS-friendly syntax with comments

5. **Breaking Changes Communication**: How to communicate breaking changes to existing users?
   - Migration guide in documentation
   - Release notes with clear upgrade path
   - Deprecation period length (e.g., 2 releases)

6. **Modular Organization (FR-8)**: Should configuration be organized as independent modules?
   - **Proposal**: Structure as `xagent/config/` with separate files per domain (paths, database, sandbox, etc.)
   - **Benefit**: Multiple developers can work on different config areas without merge conflicts
   - **Trade-off**: More files/boilerplate vs. monolithic single-class approach
   - **Note**: Not all configuration systems support this well (pydantic-settings ✅, Flask/Django ⚠️)

---

## Part 10: Success Criteria

Regardless of which approach is chosen, success means:

- [ ] All environment variables follow `XAGENT_*` prefix convention with `__` for nesting
- [ ] Old env var names supported with deprecation warnings during transition period
- [ ] Configuration is centralized (single source of truth)
- [ ] Type hints are available for all configuration
- [ ] Configuration file support is functional (TOML primary format)
- [ ] All 20+ files with hardcoded paths are updated
- [ ] Test coverage > 90% for new config code
- [ ] Migration guide documents old → new env var name mappings
- [ ] Breaking changes are clearly documented

---

## Appendix: References

### A.1 Configuration Management Resources

- [pydantic-settings documentation](https://docs.pydantic.dev/latest/concepts/pydantic_settings/)
- [FastAPI: Settings and Environment Variables](https://fastapi.tiangolo.com/advanced/settings/)
- [Flask: Configuration Handling](https://flask.palletsprojects.com/en/stable/config/)
- [django-split-settings documentation](https://django-split-settings.readthedocs.io/)

### A.2 Existing xagent Configuration

- `src/xagent/config.py` - Path configuration (PR #247, in `feat/unified-configuration-module` branch)
- `src/xagent/web/config.py` - Web-specific configuration
- `src/xagent/web/auth_config.py` - Authentication configuration
- `src/xagent/core/observability/langfuse_config.py` - Langfuse config (Pydantic example)

### A.3 Related Issues

- Issue #243: uploads directory path inconsistency
- Issue #246: Consolidate data paths under storage_root
- Issue #252: LANCEDB_DIR vs LANCEDB_PATH naming confusion
- PR #235: Store relative paths in database
- PR #247: Unified configuration module (in `feat/unified-configuration-module` branch)

---
