[Python] Add agent-framework-azure-ai-contentunderstanding package #4829
Merged
TaoChenOSU merged 95 commits into microsoft:main on Apr 28, 2026
Pull request overview
Adds a new optional Python connector package, agent-framework-azure-ai-contentunderstanding, integrating Azure Content Understanding (CU) into the Agent Framework as a BaseContextProvider for automatic attachment analysis and optional vector-store (file_search) indexing.
Changes:
- Introduces `ContentUnderstandingContextProvider` plus supporting models and vector-store upload abstraction (`FileSearchBackend`/`FileSearchConfig`).
- Adds extensive unit + integration tests and CU result fixtures, along with script + DevUI samples.
- Wires the new workspace package into `python/pyproject.toml` and `python/uv.lock`.
Reviewed changes
Copilot reviewed 36 out of 38 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| python/uv.lock | Adds the new workspace member and locks new deps (azure-ai-contentunderstanding, filetype). |
| python/pyproject.toml | Registers the package in workspace deps and adds pyright test env config. |
| python/packages/azure-ai-contentunderstanding/pyproject.toml | New package metadata, deps, and tooling config (pytest/ruff/mypy/pyright). |
| python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/__init__.py | Public exports for provider/models/backends. |
| python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_models.py | Defines DocumentStatus, AnalysisSection, DocumentEntry, FileSearchConfig. |
| python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_file_search.py | Adds backend abstraction for vector store upload/delete across OpenAI/Foundry clients. |
| python/packages/azure-ai-contentunderstanding/agent_framework_azure_ai_contentunderstanding/_context_provider.py | Implements CU analysis, session tracking, background analysis, MIME sniffing, and optional vector-store upload. |
| python/packages/azure-ai-contentunderstanding/tests/cu/conftest.py | Adds fixtures and mock CU client factory. |
| python/packages/azure-ai-contentunderstanding/tests/cu/test_models.py | Unit tests for enums/typed models and FileSearchConfig factories. |
| python/packages/azure-ai-contentunderstanding/tests/cu/test_context_provider.py | Comprehensive unit tests for provider flows (analysis, background, sniffing, file_search). |
| python/packages/azure-ai-contentunderstanding/tests/cu/test_integration.py | Live CU integration tests (skipped unless env var is set). |
| python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_pdf_result.json | CU PDF fixture for unit tests. |
| python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_invoice_result.json | CU invoice fixture for unit tests. |
| python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_image_result.json | CU image fixture for unit tests. |
| python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_audio_result.json | CU audio fixture for unit tests. |
| python/packages/azure-ai-contentunderstanding/tests/cu/fixtures/analyze_video_result.json | CU video fixture for unit tests. |
| python/packages/azure-ai-contentunderstanding/README.md | Package README with setup guidance and usage examples. |
| python/packages/azure-ai-contentunderstanding/LICENSE | Adds MIT license file for the new package. |
| python/packages/azure-ai-contentunderstanding/AGENTS.md | Package-specific agent/dev notes and architecture description. |
| python/packages/azure-ai-contentunderstanding/.gitignore | Ignores local-only artifacts under the package. |
| python/packages/azure-ai-contentunderstanding/samples/README.md | Top-level samples index for scripts and DevUI examples. |
| python/packages/azure-ai-contentunderstanding/samples/01-get-started/01_document_qa.py | Script sample: single PDF upload + Q&A. |
| python/packages/azure-ai-contentunderstanding/samples/01-get-started/02_multi_turn_session.py | Script sample: session persistence across turns. |
| python/packages/azure-ai-contentunderstanding/samples/01-get-started/03_multimodal_chat.py | Script sample: PDF+audio+video parallel CU analysis. |
| python/packages/azure-ai-contentunderstanding/samples/01-get-started/04_invoice_processing.py | Script sample: per-file analyzer override for invoice extraction. |
| python/packages/azure-ai-contentunderstanding/samples/01-get-started/05_background_analysis.py | Script sample: short max_wait triggers background analysis + status. |
| python/packages/azure-ai-contentunderstanding/samples/01-get-started/06_large_doc_file_search.py | Script sample: CU extraction + vector-store indexing for file_search. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/agent.py | DevUI agent: CU-powered upload + chat. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/__init__.py | DevUI agent module export. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/01-multimodal_agent/README.md | DevUI setup/usage doc for multimodal agent. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/agent.py | DevUI agent: CU + file_search (Azure OpenAI backend). |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/__init__.py | DevUI agent module export. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/azure_openai_backend/README.md | DevUI setup/usage doc for Azure OpenAI file_search agent. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/agent.py | DevUI agent: CU + file_search (Foundry backend). |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/__init__.py | Foundry backend sample package init. |
| python/packages/azure-ai-contentunderstanding/samples/02-devui/02-file_search_agent/foundry_backend/README.md | DevUI setup/usage doc for Foundry file_search agent. |
| python/AGENTS.md | Adds the new package to the Python “Azure Integrations” index. |
yungshinlintw
commented
Mar 27, 2026
5677c0b to
9f31124
Compare
Member
eavanvalkenburg
left a comment
Left a couple of comments. The most important one is in 05_background_analysis, because it raises a core question about how this should be used. Let's discuss options for that; we will most likely need some updates afterwards.
Add Azure Content Understanding integration as a context provider for the Agent Framework. The package automatically analyzes file attachments (documents, images, audio, video) using Azure CU and injects structured results (markdown, fields) into the LLM context.
Key features:
- Multi-document session state with status tracking (pending/ready/failed)
- Configurable timeout with async background fallback for large files
- Output filtering via AnalysisSection enum
- Auto-registered list_documents() and get_analyzed_document() tools
- Supports all CU modalities: documents, images, audio, video
- Content limits enforcement (pages, file size, duration)
- Binary stripping of supported files from input messages
Public API:
- ContentUnderstandingContextProvider (main class)
- AnalysisSection (output section selector enum)
- ContentLimits (configurable limits dataclass)
Tests: 46 unit tests, 91% coverage, all linting and type checks pass.
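The "configurable timeout with async background fallback" feature can be illustrated with a small stand-alone sketch. This is not the package's implementation: `analyze_with_fallback`, `slow_analysis`, and the `pending` dict are hypothetical names, and the status strings simply mirror the pending/ready/failed wording of the commit message. It assumes the analysis is an awaitable that may outlive the wait budget.

```python
import asyncio


async def analyze_with_fallback(doc_key, analyze, max_wait, pending):
    """Return (status, result); on timeout, park the still-running task in `pending`."""
    task = asyncio.create_task(analyze())
    try:
        # shield() protects the task: when wait_for() times out, only the
        # shield wrapper is cancelled and the analysis itself keeps running.
        result = await asyncio.wait_for(asyncio.shield(task), timeout=max_wait)
    except asyncio.TimeoutError:
        pending[doc_key] = task
        return "pending", None
    except Exception as exc:  # the analysis itself failed within the wait
        return "failed", str(exc)
    return "ready", result


async def slow_analysis():
    # Hypothetical stand-in for a long-running analysis call.
    await asyncio.sleep(0.05)
    return "# Extracted markdown"


async def demo():
    pending = {}
    first = await analyze_with_fallback("report.pdf", slow_analysis, 0.01, pending)
    # A later turn picks up the parked task and awaits its result.
    resolved = await pending.pop("report.pdf")
    return first, resolved


first_turn, later_result = asyncio.run(demo())
print(first_turn, later_result)
```

The key design point in this sketch is that the timeout bounds only how long the caller waits, not how long the analysis may run, so a later turn can still collect the result.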
- Replace synthetic fixtures with real CU API responses (sanitized)
- Update test assertions to match real data (Contoso vs CONTOSO, TotalAmount vs InvoiceTotal, field values from real analysis)
- Add --pre install note in README (preview package)
- Document unenforced ContentLimits fields (max_pages, duration)
…remove wrappers
- All __init__ args now keyword-only (matches FoundryChatClient pattern)
- New 'client' param accepts pre-built ContentUnderstandingClient
- core dep bound: >=1.0.0rc5 → >=1.0.0,<2
- Self import moved after local imports
- Removed 9 static method wrappers; callsites use module functions directly
- Tests updated to import derive_doc_key and format_result directly
The client was being created twice — once inside the if/else block and again unconditionally after it. The second instantiation overwrote the pre-built client path and failed type checking when credential was None.
moonbox3
reviewed
Apr 22, 2026
Package: agent-framework-azure-ai-contentunderstanding → agent-framework-azure-contentunderstanding
Module: agent_framework_azure_ai_contentunderstanding → agent_framework_azure_contentunderstanding
Directory: packages/azure-ai-contentunderstanding → packages/azure-contentunderstanding
Per agreement with PM and MAF team to drop 'AI' from the package name.
…amespace
Enables: from agent_framework.foundry import ContentUnderstandingContextProvider
Exports: ContentUnderstandingContextProvider, FileSearchConfig, FileSearchBackend, AnalysisSection, DocumentStatus
Updates all samples and README to use the foundry namespace import.
Member
eavanvalkenburg
left a comment
just some cleanup to do
eavanvalkenburg
approved these changes
Apr 23, 2026
TaoChenOSU
reviewed
Apr 23, 2026
…_search sample
Address review feedback from TaoChenOSU:
- 05_large_doc_file_search.py: use client.client instead of manually constructing AsyncAzureOpenAI; remove openai dependency
- azure_openai_backend/agent.py: import reorder only (AIProjectClient kept; required for sync vector store creation in DevUI)
When a ContentUnderstandingClient is passed via client=, the caller owns its lifecycle. Added _owns_client flag so close() only closes the client when we created it internally.
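A minimal sketch of this ownership rule, using stand-in classes rather than the real SDK: `FakeClient` and `SketchProvider` are hypothetical, and only the `client=` keyword and the `_owns_client` flag come from the commit messages above. It also builds the client exactly once, which is the shape of the earlier double-instantiation fix.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async service client."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class SketchProvider:
    def __init__(self, *, client=None):
        # Build the client exactly once: reuse the caller's, or create our own.
        if client is not None:
            self._client = client
            self._owns_client = False
        else:
            self._client = FakeClient()
            self._owns_client = True

    async def close(self):
        # Only close what we created; a caller-supplied client stays open.
        if self._owns_client:
            await self._client.close()


async def demo():
    shared = FakeClient()
    borrowing = SketchProvider(client=shared)
    owning = SketchProvider()
    await borrowing.close()
    await owning.close()
    return shared.closed, owning._client.closed


shared_closed, owned_closed = asyncio.run(demo())
print(shared_closed, owned_closed)
```

With this arrangement a caller can share one client across several providers and close it once, on its own schedule.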
TaoChenOSU
approved these changes
Apr 28, 2026
moonbox3
added a commit
that referenced
this pull request
Apr 29, 2026
* Python: bump package versions for 1.2.2 release
PATCH bump (1.2.1 -> 1.2.2) for the released cohort. Five PRs land in this window:
- agent-framework-openai: fix file_search citations breaking the assistant-message history roundtrip (#5557); drives the released-tier PATCH
- agent-framework-orchestrations: [BREAKING] standardize orchestration terminal outputs as AgentResponse (#5301)
- agent-framework-core, agent-framework-declarative: preserve Workflow.run() shared state across calls, accept list[Message] in declarative start executor, and coerce Enum values when serializing PowerFx symbols (#5531)
- agent-framework-foundry-hosting: add hosted Durable Workflow support (#5531)
- agent-framework-azure-contentunderstanding: new alpha package, Azure AI Content Understanding context provider (#4829)
- dependencies: workspace package dependency refresh (#5555)
Per lockstep convention, all 21 beta packages stamp 1.0.0b260429 and all 4 alpha packages (now including the new contentunderstanding) stamp 1.0.0a260429. Date stamp reflects 2026-04-29 Pacific. Every non-core package floor on agent-framework-core is raised to >=1.2.2; the new contentunderstanding package's stale >=1.0.0 floor is brought into line.
Two follow-on fixes bundled to keep validate-dependency-bounds-test green at lowest-direct resolution:
- Bump agent-framework-azure-contentunderstanding's azure-ai-contentunderstanding lower bound from >=1.0.0 to >=1.0.1 (1.0.0 ships without proper typing; pyright reports 65 unknown-type errors)
- Add pyright ignore comments to core/foundry/__init__.pyi for the new alpha package's type-stub imports, since alpha packages are not in core's [all] extra and therefore aren't installed at lowest-direct
* Python: add #5552 to 1.2.2 CHANGELOG
Add the streaming-span observability fix to the Fixed section. PR is on upstream/main but not yet pulled into origin/main; the code itself will land via the PR merge.
* Python: address PR #5561 review feedback on dependency bounds
Two packaging fixes flagged in review:
1. agent-framework-azure-contentunderstanding: add agent-framework-foundry as a runtime dependency. The package's README directs users to `pip install agent-framework-azure-contentunderstanding --pre` and the basic example imports `FoundryChatClient` from `agent_framework.foundry`, so the documented install path was failing with ImportError. Pulling agent-framework-foundry into deps makes the advertised entry path self-contained.
2. agent-framework-foundry: bump agent-framework-openai lower bound from >=1.1.0 to >=1.2.2,<2. Foundry imports private modules from agent_framework_openai (`_chat_client.py:22`, `_agent.py:34`), so resolvers were free to pair foundry==1.2.2 with older OpenAI versions that lack this release's coordinated Responses/history fix. Lockstep the floor with the released cohort to prevent mismatched installs.
Both changes pass `validate-dependency-bounds-test` lower + upper at their respective packages.
Reviewer's Guide
Closes #4942
This package adds a `BaseContextProvider` implementation that bridges Azure Content Understanding (CU) with the Agent Framework. When a user sends file attachments (PDF, images, audio, video), the provider intercepts them in `before_run()`, sends them to CU for analysis, and injects the structured results (markdown + extracted fields) back into the LLM context, so the agent can answer questions about the files without the developer writing any extraction code.
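To make the interception flow concrete, here is a stand-alone sketch of the pattern. It deliberately does not import the package or the Agent Framework: `FakeAttachment`, `FakeContext`, `fake_analyze`, and `SketchProvider` are hypothetical stand-ins, and the real `before_run()` signature and context API may well differ. The sketch only shows the three steps named above: detect supported attachments, analyze them, and replace the binary with extracted text in the context.

```python
from dataclasses import dataclass, field

# Assumed set of media types treated as analyzable in this sketch.
SUPPORTED_PREFIXES = ("application/pdf", "image/", "audio/", "video/")


@dataclass
class FakeAttachment:
    name: str
    media_type: str
    data: bytes


@dataclass
class FakeContext:
    messages: list  # mix of plain strings and FakeAttachment objects
    instructions: list = field(default_factory=list)


def fake_analyze(attachment):
    # Stand-in for the remote analysis call; returns markdown-like text.
    return f"# {attachment.name}\n\n({len(attachment.data)} bytes analyzed)"


class SketchProvider:
    def __init__(self):
        self.documents = {}  # per-session results keyed by file name

    def before_run(self, context):
        kept = []
        for item in context.messages:
            supported = isinstance(item, FakeAttachment) and item.media_type.startswith(
                SUPPORTED_PREFIXES
            )
            if supported:
                # Analyze, remember the result, and inject it as context text.
                self.documents[item.name] = fake_analyze(item)
                context.instructions.append(self.documents[item.name])
            else:
                kept.append(item)
        # Supported binaries are stripped; everything else passes through.
        context.messages = kept


provider = SketchProvider()
ctx = FakeContext(
    messages=[
        "What is the invoice total?",
        FakeAttachment("invoice.pdf", "application/pdf", b"%PDF-1.7"),
    ]
)
provider.before_run(ctx)
print(ctx.messages, ctx.instructions)
```

Because results are stored on the provider in this sketch, a follow-up question in a later turn could reuse `provider.documents` without re-uploading the file.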
Suggested review order
1. Start with samples; they show the feature set and usage patterns end-to-end:
- `01_document_qa.py` -- `Content.from_uri()`, `context_providers=[cu]`, and how CU results appear in the agent's response.
- `02_multi_turn_session.py` -- `AgentSession` persistence: upload a file on turn 1, ask follow-up questions on turns 2–3 without re-uploading. Shows how `state["documents"]` carries across turns.
- `03_multimodal_chat.py`
- `04_invoice_processing.py` -- `additional_properties={"analyzer_id": "prebuilt-invoice"}` to extract structured invoice fields (vendor, total, line items) instead of generic markdown.
- `05_background_analysis.py` -- `max_wait=0.5`: file starts analyzing in the background while the agent responds immediately. Next turn resolves the pending result. Shows the `analyzing` → `ready` status flow.
- `06_large_doc_file_search.py` -- `file_search` tool instead of injecting full content into context.
2. Then review the core implementation:
- `_context_provider.py` (1087 lines) -- `before_run()` hook, file detection/stripping, CU analysis with timeout + background fallback, output formatting, tool registration. Most important file to review.
- `_models.py` -- `DocumentEntry`, `DocumentStatus`, `AnalysisSection`, `FileSearchConfig` TypedDicts and enums exposed to users
- `_file_search.py` -- `FileSearchBackend` protocol + OpenAI/Foundry factory methods for vector store integration
- `__init__.py`
- `pyproject.toml`
- `tests/`
MAF API usage (needs team alignment)
This package uses the following internal/private MAF APIs — if any of these are changing or not intended for external use, this package may need updates:
- `BaseContextProvider` and its `before_run()` hook
- `SessionContext.extend_instructions()`, `extend_messages()`, `extend_tools()`
- `Content.from_data()`, `Content.from_uri()`, `Content.type`, `Content.media_type`, `Content.additional_properties`
- `FunctionTool` for registering `list_documents()`
- `agent_framework._sessions.AgentSession`
- `agent_framework._settings.load_settings()`
This PR adds `agent-framework-azure-ai-contentunderstanding`, an optional connector package that integrates Azure Content Understanding (CU) into the Agent Framework as a context provider.
What's Included
Core (`_context_provider.py`, `_models.py`, `_file_search.py`)
- `ContentUnderstandingContextProvider` -- auto-analyzes file attachments (PDF, images, audio, video) via Azure CU and injects structured results (markdown, fields) into LLM context
- … (`prebuilt-documentSearch`, `prebuilt-audioSearch`, `prebuilt-videoSearch`)
- Multi-document session state with status tracking (`analyzing`/`uploading`/`ready`/`failed`)
- Configurable timeout (`max_wait`) with async background fallback
- Output filtering via `AnalysisSection` enum
- Auto-registered `list_documents()` tool for status queries
- … (`application/octet-stream`)
- `Content.additional_properties["analyzer_id"]` -- mix different analyzers in the same turn (e.g., `prebuilt-invoice` for invoices alongside `prebuilt-documentSearch` for general docs)
- `FileSearchConfig` for vector store integration (OpenAI/Foundry backends)
Samples (6 scripts + 3 DevUI)
- `01_document_qa.py` -- Single PDF upload + Q&A
- `02_multi_turn_session.py` -- AgentSession persistence across turns
- `03_multimodal_chat.py` -- PDF + audio + video parallel analysis (5 turns)
- `04_invoice_processing.py` -- Structured field extraction with prebuilt-invoice
- `05_background_analysis.py` -- Non-blocking analysis with max_wait + status tracking
- `06_large_doc_file_search.py` -- CU extraction + vector store RAG
- `02-devui/01-multimodal_agent` -- Interactive web UI for uploading and chatting with documents/audio/video
- `02-devui/02-file_search_agent/azure_openai_backend` -- DevUI with CU + Azure OpenAI file_search RAG
- `02-devui/02-file_search_agent/foundry_backend` -- DevUI with CU + Foundry file_search RAG
Tests