docs(optimizers): DSPy prompt optimization — user docs, example notebook, and navigation#5853
MoShiha wants to merge 13 commits into
Conversation
Confirm role, goal, backstory, system_template, and prompt_template as stable public API (safe to read and write after construction) by updating their Field descriptions in BaseAgent. Add Agent.get_effective_system_prompt() -> str, which returns the fully rendered system prompt as it would be sent to the LLM. Respects system_template and prompt_template overrides, and immediately reflects in-place writes to role, goal, and backstory. This is the read/write seam that DSPy optimizers and other instrumentation tools need to inspect and update agent instructions without patching internals. See: crewAIInc#5818 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
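The read/write seam described in this commit can be pictured with a small stand-in. The sketch below is a hypothetical, self-contained model and not crewAI's implementation: `SketchAgent`, `DEFAULT_TEMPLATE`, and the `{role}`/`{goal}`/`{backstory}` placeholders are assumptions made for illustration. Only the field names and the method name `get_effective_system_prompt()` come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default template; the real crewAI template is not shown in this PR.
DEFAULT_TEMPLATE = "You are {role}. {backstory}\nYour personal goal is: {goal}"


@dataclass
class SketchAgent:
    """Stand-in for an agent whose instruction fields are public and mutable."""

    role: str
    goal: str
    backstory: str
    system_template: Optional[str] = None  # optional override of the default template

    def get_effective_system_prompt(self) -> str:
        # Render at call time so in-place writes to the fields show up immediately.
        template = self.system_template or DEFAULT_TEMPLATE
        return template.format(role=self.role, goal=self.goal, backstory=self.backstory)


agent = SketchAgent(role="Email Writer", goal="Draft clear emails", backstory="A terse editor.")
before = agent.get_effective_system_prompt()

# An optimizer would write improved text back through the same public field.
agent.backstory = "A meticulous editor who favors short sentences."
after = agent.get_effective_system_prompt()
```

Rendering on every call, rather than caching the prompt at construction, is what lets an external tool read the prompt, write a field, and read the updated prompt without touching internals.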
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Implements DSPy-based prompt optimization as an optional extra (pip install 'crewai[dspy]'). DSPyOptimizer wraps a Crew as a dspy.Module and runs MIPROv2, BootstrapFewShot, or GEPA to algorithmically improve agent backstory/goal instructions, then writes the optimized instructions back to the crew in place.
- Add crewai.optimizers package with DSPyOptimizer, OptimizationResult, and AgentInstructions types; lazy import guards against missing dspy
- Add 4 optimizer lifecycle events (started/trial/completed/failed) integrated with the crewai_event_bus
- Before-LLM-call hook injects few-shot demos from compiled module
- Hooks always cleaned up in finally block; supports MIPROv2, BootstrapFewShot, GEPA; compatible with dspy>=2.5 (tested on 3.2.1)
- 28 unit tests; all quality gates pass (ruff, mypy --strict)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pyproject.toml: tighten dspy constraint to >=2.5,<4 (untested future major versions should not be auto-adopted)
- optimizer_events.py: add crew_name/algorithm to OptimizationTrialCompletedEvent for event correlation across concurrent runs
- dspy_optimizer.py: encode only agent.backstory in the DSPy signature (not backstory+goal) so the optimized text writes back to backstory without duplicating goal text in the rendered system prompt
- dspy_optimizer.py: reset self._compiled_module = None at the start of compile() so a second call never reads stale demos from the previous run
- base_agent.py: add optimizer-update note to goal/backstory field descriptions to match the existing note on role (consistency)
Not implemented (with reasoning):
- Empty-prompt defensive check in get_effective_system_prompt: Pydantic validators already enforce non-empty role/goal/backstory
- Set _compiled_module = crew_module before teleprompter.compile(): would inject uncompiled demos into optimization trials, corrupting scores
- Per-trial OptimizationTrialCompletedEvent emission: DSPy has no per-trial callback; event is reserved for future implementation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… contract DSPy teleprompters do not expose a per-trial callback, so the event can never be emitted accurately. The class is kept in optimizer_events.py for forward compatibility and will be re-exported when DSPy adds callback support. Tests updated to import the class directly from its module. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… assertions Async event delivery doesn't guarantee that the first captured event belongs to the current compile run. Use next() filtered by event content so the assertion finds the specific event rather than assuming its position. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds one-liner docstrings to all source functions/classes and test functions that were missing them. Brings combined coverage across PR files from ~22% to 100%, satisfying CodeRabbit's pre-merge check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…elds Replace bare isinstance any() check with next() filtered by algorithm and num_trials, matching the pattern used for Started and Failed event tests. Consistent with CR comment on async assertion brittleness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
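The assertion pattern described in the two commits above, searching by event content rather than by position, might look like the following. `StartedEvent` is a hypothetical stand-in for OptimizationStartedEvent, and the field values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StartedEvent:
    """Hypothetical stand-in for OptimizationStartedEvent."""

    algorithm: str
    num_trials: int


# Events captured from a shared bus; the first one belongs to an earlier run.
received: List[StartedEvent] = [
    StartedEvent(algorithm="bootstrap", num_trials=5),
    StartedEvent(algorithm="miprov2", num_trials=20),
]

# Brittle: received[0] assumes delivery order.
# More robust: search for the event carrying the values this test passed in.
match = next(
    (e for e in received if e.algorithm == "miprov2" and e.num_trials == 20),
    None,
)
assert match is not None, "expected a started event for this compile run"
```

Passing `None` as the default to `next()` turns a missing event into a readable assertion failure instead of a bare `StopIteration`.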
Replace length-only check with uuid.UUID() parsing and .version == 4 assertion to reject any 36-char non-UUID string. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
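A stricter check of this kind can be sketched with the standard library alone. The helper name `is_uuid4` is an assumption for illustration; the test in the PR may be written differently.

```python
import uuid


def is_uuid4(value: str) -> bool:
    """Return True only if value parses as a UUID and reports version 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        # Malformed input, including arbitrary 36-character strings.
        return False
    return parsed.version == 4


good = str(uuid.uuid4())
wrong_version = str(uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))
not_a_uuid = "x" * 36
```

Note that `uuid.UUID()` also accepts forms such as hex without hyphens, so a test that cares about the exact string layout would need an additional check.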
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID:
📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 Walkthrough
Adds DSPy-based offline prompt optimization (DSPyOptimizer) with teleprompter algorithms, optimizer lifecycle events, Agent.get_effective_system_prompt(), docs, example notebook, tests, and an optional
Changes
DSPy Optimizer Implementation
Agent Instruction API Tests
DSPyOptimizer Tests
Documentation & Example
Sequence Diagram (high-level compilation flow):
sequenceDiagram
participant User
participant DSPyOptimizer
participant DSPyTeleprompter
participant Crew
participant Agent
participant EventBus
User->>DSPyOptimizer: compile(trainset, num_trials, algorithm)
DSPyOptimizer->>Crew: run baseline kickoff
DSPyOptimizer->>EventBus: emit OptimizationStartedEvent
loop per trial
DSPyOptimizer->>DSPyTeleprompter: compile/run trial
DSPyTeleprompter->>Crew: run with injected demos (before-LLM hook)
Crew->>Agent: generate outputs using backstory
end
DSPyOptimizer->>Agent: write optimized instructions into agents
DSPyOptimizer->>Crew: run kickoff to measure optimized score
DSPyOptimizer->>EventBus: emit OptimizationCompletedEvent
DSPyOptimizer->>User: return OptimizationResult
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
🧹 Nitpick comments (2)
examples/dspy_optimization.ipynb (1)
101-102: ⚡ Quick win
Use a holdout topic for baseline vs optimized output comparison.
The current comparison uses a topic already present in trainset, which can overstate gains in the demo output.
Proposed fix
-baseline_output = crew.kickoff(inputs={"topic": "Q1 earnings call"})
+eval_topic = "customer webinar follow-up"
+baseline_output = crew.kickoff(inputs={"topic": eval_topic})
...
-optimized_output = result.crew.kickoff(inputs={"topic": "Q1 earnings call"})
+optimized_output = result.crew.kickoff(inputs={"topic": eval_topic})
Also applies to: 329-329
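The holdout idea in this comment can be made explicit with a small leakage guard. The sketch below is illustrative only: the `trainset` contents are hypothetical (the notebook's actual examples are not shown here), and the guard itself is an addition, not part of the proposed fix.

```python
from typing import Dict, List

# Hypothetical training inputs; only "Q1 earnings call" is named in the review.
trainset: List[Dict[str, str]] = [
    {"topic": "Q1 earnings call"},
    {"topic": "product launch announcement"},
]

HOLDOUT_TOPIC = "customer webinar follow-up"

# Guard against accidental leakage: the evaluation topic must be unseen in training.
train_topics = {example["topic"] for example in trainset}
if HOLDOUT_TOPIC in train_topics:
    raise ValueError(f"holdout topic {HOLDOUT_TOPIC!r} also appears in trainset")

# Pass the same inputs to both the baseline and the optimized kickoff call.
eval_inputs = {"topic": HOLDOUT_TOPIC}
```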
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/dspy_optimization.ipynb` around lines 101 - 102, The baseline comparison uses a topic present in trainset which can bias results; change the input to crew.kickoff (the baseline_input used to produce baseline_output) to a true holdout topic not present in trainset (e.g., define holdout_topic and use that string for both baseline and optimized runs), update the other occurrence at the second kickoff call around line 329 to use the same holdout_topic variable, and ensure any references to trainset or training examples remain unchanged so the holdout stays unseen during evaluation.
docs/en/concepts/dspy-optimization.mdx (1)
176-177: ⚡ Quick win
Consider using an env-configurable judge model for documentation examples.
While claude-haiku-4-5-20251001 is currently supported, using an environment variable with a current default would improve maintainability and reduce future updates when models are deprecated or rotated. Alternatively, you could use the stable alias claude-haiku-4-5 instead of the dated version.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/en/concepts/dspy-optimization.mdx` around lines 176 - 177, Replace the hard-coded dated model string "claude-haiku-4-5-20251001" with an environment-configurable judge model so docs use a current default; update the example to reference an env var (e.g., JUDGE_MODEL) fallback to a stable alias like "claude-haiku-4-5" so the example shows using process.env.JUDGE_MODEL || "claude-haiku-4-5" (or the equivalent in the repo's docs examples) instead of the fixed "claude-haiku-4-5-20251001" value.
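The prompt above phrases the fallback in JavaScript terms (`process.env.JUDGE_MODEL || ...`); in a Python notebook the equivalent is `os.getenv` with a default. A minimal sketch, where the helper name `resolve_judge_model` and the constant name are assumptions for illustration:

```python
import os

# Stable alias as the default; export JUDGE_MODEL to pin a dated version.
DEFAULT_JUDGE_MODEL = "claude-haiku-4-5"


def resolve_judge_model() -> str:
    """Read the judge model from the environment at call time."""
    return os.getenv("JUDGE_MODEL", DEFAULT_JUDGE_MODEL)
```

One difference from the JavaScript form: `os.getenv` returns an empty string if the variable is set but empty, whereas `||` would fall through to the default.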
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/dspy_optimization.ipynb`:
- Around line 41-42: The notebook uses assert statements to check environment
variables OPENAI_API_KEY and ANTHROPIC_API_KEY which can be skipped when Python
is run with -O; replace these asserts with explicit runtime checks: call
os.getenv for "OPENAI_API_KEY" and "ANTHROPIC_API_KEY" and if a value is
missing, raise a clear exception (e.g., RuntimeError or SystemExit) or call
sys.exit with a descriptive message so the notebook fails immediately and with a
helpful error; update the two lines that reference os.getenv("OPENAI_API_KEY")
and os.getenv("ANTHROPIC_API_KEY") accordingly.
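An explicit check of the kind requested here could be written as below. The helper name `require_env` and the error wording are assumptions, not the notebook's actual code. Unlike `assert`, the `raise` statement is not stripped when Python runs with `-O`.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value or fail loudly, even under `python -O`."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value
```

In the notebook this would be called once per key, for example `require_env("OPENAI_API_KEY")` and `require_env("ANTHROPIC_API_KEY")`, before any crew is constructed.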
In `@lib/crewai/src/crewai/optimizers/dspy_optimizer.py`:
- Around line 79-82: Detect and prevent duplicate agent.role values before
building self.agent_predictors by adding a validation in compile(): iterate
crew.agents and collect agent.role into a set, raise a clear error if a
duplicate is found (or alternatively use a guaranteed-unique identifier such as
agent.id when constructing the dict), then build self.agent_predictors =
{agent.role: _dspy.ChainOfThought(_build_signature_for_agent(agent)) for agent
in crew.agents} only after the uniqueness check so demo injection/writeback that
uses self.agent_predictors, agent.role, and methods referenced in compile()
won't misroute.
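The uniqueness check described above can be sketched independently of DSPy. The function names below are hypothetical; the guard added in compile() may be structured differently.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_roles(roles: Iterable[str]) -> List[str]:
    """Return the roles that occur more than once, sorted for stable messages."""
    counts = Counter(roles)
    return sorted(role for role, n in counts.items() if n > 1)


def ensure_unique_roles(roles: Iterable[str]) -> None:
    """Raise before any role-keyed dict is built, so no entry is silently overwritten."""
    duplicates = find_duplicate_roles(roles)
    if duplicates:
        raise ValueError(f"Duplicate agent roles: {duplicates}")
```

Without such a guard, a dict comprehension keyed by role keeps only the last agent for each repeated role, which is the silent collision the comment warns about.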
In `@lib/crewai/tests/optimizers/test_dspy_optimizer.py`:
- Around line 460-544: Flush the global event bus before subscribing to clear
stale events (call crewai_event_bus.flush() before crewai_event_bus.on(...)) and
scope assertions to the current optimizer instance by capturing the event source
in your handlers (the first arg `src` in `_on_started`, `_on_completed`,
`_on_failed`) and only accept events where `src is optimizer` (or filter
received events by `src == optimizer`) when searching for the expected
OptimizationStartedEvent/OptimizationCompletedEvent/OptimizationFailedEvent;
keep the existing checks on algorithm/num_trials/error in addition to the source
check.
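Why filtering by source matters can be shown with a toy bus. `SketchBus` below is a hypothetical minimal stand-in and does not mirror the crewai_event_bus API; the event dictionaries are invented for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[Any, Dict[str, Any]], None]


class SketchBus:
    """Hypothetical minimal event bus that passes the emitting source to handlers."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def on(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def emit(self, source: Any, event: Dict[str, Any]) -> None:
        for handler in self._handlers:
            handler(source, event)


bus = SketchBus()
optimizer = object()        # the instance under test
other_optimizer = object()  # e.g. an optimizer from a parallel test

captured: List[Tuple[Any, Dict[str, Any]]] = []
bus.on(lambda src, event: captured.append((src, event)))

# Both events carry identical content, so content filtering alone cannot tell them apart.
bus.emit(other_optimizer, {"type": "started", "algorithm": "miprov2"})
bus.emit(optimizer, {"type": "started", "algorithm": "miprov2"})

# Identity comparison keeps only the events emitted by this test's optimizer.
mine = [event for src, event in captured if src is optimizer]
```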
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ac9439ae-2f3d-4e79-99c5-1df1f6e93a22
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
docs/docs.json
docs/en/concepts/dspy-optimization.mdx
examples/dspy_optimization.ipynb
lib/crewai/pyproject.toml
lib/crewai/src/crewai/agent/core.py
lib/crewai/src/crewai/agents/agent_builder/base_agent.py
lib/crewai/src/crewai/events/types/__init__.py
lib/crewai/src/crewai/events/types/optimizer_events.py
lib/crewai/src/crewai/optimizers/__init__.py
lib/crewai/src/crewai/optimizers/dspy_optimizer.py
lib/crewai/src/crewai/optimizers/types.py
lib/crewai/tests/agents/test_agent_instruction_api.py
lib/crewai/tests/optimizers/__init__.py
lib/crewai/tests/optimizers/test_dspy_optimizer.py
Three targeted fixes responding to coderabbitai review:
1. dspy_optimizer.py: add duplicate-role guard in compile(). Two agents sharing the same role collide in agent_predictors (keyed by role), causing silent misrouting of few-shot demo injection and backstory writeback. Raises ValueError with the duplicate role names before the optimization loop starts.
2. test_dspy_optimizer.py: scope event bus assertions to the current optimizer instance. Subscribe after creating the optimizer, call flush() before subscribing to drain stale events, capture (src, event) tuples, and filter with s is optimizer so tests cannot pass on events from parallel tests.
3. dspy_optimizer.py: drop redundant type: ignore[import-not-found] comment that triggers a mypy unused-ignore error when dspy is installed.
29 optimizer tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR 3 of the DSPy prompt optimization feature: user-facing documentation and a runnable end-to-end example demonstrating the optimizer API.
Changes:
- docs/en/concepts/dspy-optimization.mdx: Mintlify page covering quickstart, sequence diagram, algorithm comparison, metric writing guide, API reference, observability, limitations, and troubleshooting
- examples/dspy_optimization.ipynb: 9-cell notebook with a two-agent email-drafting crew, pairwise LLM-judge metric, and 10 training examples
- docs/docs.json: add dspy-optimization to Core Concepts navigation (between training and memory)
- lib/crewai/src/crewai/optimizers/dspy_optimizer.py: drop redundant type: ignore[import-not-found] comment that caused mypy unused-ignore error
All 28 optimizer unit tests pass. ruff clean. mypy clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four targeted fixes responding to coderabbitai review on the docs PR:
1. examples/dspy_optimization.ipynb: replace assert statements for env var checks with explicit RuntimeError so the notebook fails immediately with a clear message even when Python is run with -O optimizations enabled.
2. examples/dspy_optimization.ipynb: use a held-out topic (HOLDOUT_TOPIC = customer webinar follow-up) for baseline and optimized crew comparison so the demo comparison is fair and does not overstate gains by using a topic that appears in the training set.
3. examples/dspy_optimization.ipynb: use os.getenv(JUDGE_MODEL, claude-haiku-4-5) stable alias instead of pinned dated version so the notebook keeps working across minor model updates; users can still pin via env var.
4. docs/en/concepts/dspy-optimization.mdx: same JUDGE_MODEL env var pattern with claude-haiku-4-5 stable alias in the LLM-judge metric code example.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 3 actionable items and 2 nitpicks from the CodeRabbit review are addressed:
Actionable (implemented):
Nitpicks (implemented):
Summary
This is PR 3 of 3 in the DSPy prompt optimization feature. It adds all user-facing documentation for the DSPyOptimizer API introduced in PR 2 (#5842).
Changes
- docs/en/concepts/dspy-optimization.mdx: Mintlify concept page with:
  - compile() flow
  - DSPyOptimizer, OptimizationResult, AgentInstructions
- examples/dspy_optimization.ipynb: 9-cell end-to-end notebook:
  - claude-haiku-4-5-20251001 with position-bias mitigation
  - num_trials=20) → result inspection → comparison
- docs/docs.json: adds dspy-optimization to Core Concepts navigation between training and memory
- lib/crewai/src/crewai/optimizers/dspy_optimizer.py: removes import-not-found from # type: ignore comment; fixes mypy unused-ignore error when dspy is installed
Proof of quality
What is explicitly deferred
- /prompt-injection-audit pass runs after merge
Test plan
- docs/en/concepts/dspy-optimization.mdx in Mintlify dev server: all components render
- dspy-optimization appears in Core Concepts sidebar between Training and Memory
- examples/dspy_optimization.ipynb: cells run top-to-bottom with API keys set
- pip install 'crewai[dspy]' installs cleanly in a fresh venv
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests