docs(ollama): add streaming-with-tools example to OllamaChatGenerator reference #11268
Conversation
Closes deepset-ai/haystack-core-integrations#3263 (follow-up). The component reference page already covers Tool Support and Streaming in separate sections, but no example shows them combined. This PR adds a Streaming with Tools section between the two, with an executable example verified empirically against llama3.1:8b on Ollama. Notable behavior captured in the doc: when the model invokes a tool, the streamed chunks carry `tool_calls` deltas and `chunk.content` is empty; the final `ChatMessage` has `text=None` and `tool_calls` populated.
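For context, here is a minimal sketch of the kind of tool the example revolves around. The `get_weather` name and the `city` argument come from the test notes further down; the toy implementation and the JSON schema are illustrative assumptions, not the exact code in the docs.

```python
# Illustrative sketch only: the get_weather name and city argument come from
# the spike notes below; the body and schema here are toy placeholders.
from haystack.tools import Tool

def get_weather(city: str) -> str:
    # Toy implementation standing in for a real weather lookup.
    return f"The weather in {city} is sunny and 22 degrees."

weather_tool = Tool(
    name="get_weather",
    description="Get the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
    function=get_weather,
)
```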
Per CONTRIBUTING.md, every PR requires a release note under releasenotes/notes/. Categorized as `enhancements` to match the shape of prior docs-only release notes (e.g., docs-cleaner-markdown-ocr-examples-...yaml).
anakin87
left a comment
Thank you!
I left some comments.
In addition, please also copy this change to the 2.28 versioned docs (latest stable) in `docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx`.
@@ -0,0 +1,4 @@
---
Since this change only affects docs, we don't need a release note. Please remove this file.
:::tip[What to expect when tools fire]
When the model emits a tool call rather than free-form text, streamed chunks carry `tool_calls` deltas and `chunk.content` is empty. The final `replies[0].text` will be `None`, and `replies[0].tool_calls` holds the reconstructed call list. Plain text streaming and tool calling are mutually exclusive within a single generation step.
:::
this is already kinda clear, so I'd remove this tip section
- Remove `releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml`: docs-only change does not need a release note.
- Remove the `:::tip[What to expect when tools fire]` admonition from `docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx`: the inline comments in the streaming-with-tools example already convey the same information.
- Add the Streaming with Tools section to `docs-website/versioned_docs/version-2.28/pipeline-components/generators/ollamachatgenerator.mdx` (latest stable), byte-identical to the v3 docs section.
Thanks for the review @anakin87! Addressed all three comments.
Ready for another look when you have a moment.
Related Issues
- Closes deepset-ai/haystack-core-integrations#3263 (follow-up)
Proposed Changes:
Adds a `### Streaming with Tools` section to the `OllamaChatGenerator` reference page (`docs-website/docs/pipeline-components/generators/ollamachatgenerator.mdx`), between the existing `### Streaming` section and the `## Usage` heading. The section includes:
- An executable example passing both `streaming_callback` and `tools` on `OllamaChatGenerator` (a minimal sketch follows this list).
- A note that `chunk.content` is empty in the streamed chunks, and that the final `replies[0].text` is `None` while `replies[0].tool_calls` carries the reconstructed call list.
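As a reference for reviewers, a minimal sketch of what such a combined example can look like, assuming a local Ollama server with `llama3.1:8b` pulled and the `weather_tool` defined in the sketch near the top of this PR; the exact wording in the published .mdx may differ.

```python
from haystack.dataclasses import ChatMessage, StreamingChunk
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

def on_chunk(chunk: StreamingChunk) -> None:
    # For plain-text generations, chunk.content carries the text delta.
    # When the model fires a tool call, chunk.content stays empty and the
    # tool-call deltas are surfaced on the chunk instead.
    print(chunk.content, end="", flush=True)

generator = OllamaChatGenerator(
    model="llama3.1:8b",
    tools=[weather_tool],          # Tool sketched earlier in this PR description
    streaming_callback=on_chunk,
)

result = generator.run(messages=[ChatMessage.from_user("What is the weather in Berlin?")])
reply = result["replies"][0]

print(reply.text)        # None when the model answered with a tool call
print(reply.tool_calls)  # e.g. [ToolCall(tool_name='get_weather', arguments={'city': 'Berlin'})]
```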
How did you test it?

Manually verified against `OllamaChatGenerator` + `llama3.1:8b` on Ollama, with two spike scripts:

- Primary spike (directive prompt invoking `get_weather`): 2 chunks fired (1 carrying the tool-call delta, 1 closing). `replies[0].tool_calls = [ToolCall(tool_name='get_weather', arguments={'city': 'Berlin'}, ...)]`. `replies[0].text` is `None`. `meta.finish_reason: stop`.
- Backfill spike (mutual-exclusivity check): six prompts spanning directive / ambiguous / arithmetic / unrelated-topic / literary. Across all six, never observed text content and tool-call deltas in the same chunk, nor text and `tool_calls` together in the final `ChatMessage`. The arithmetic prompt ("What is 2+2?") produced 99 chunks of pure text (98 text chunks, 0 tool chunks), confirming that text streaming works as expected when the model elects not to use a tool.

The change is documentation-only and does not introduce code changes; no unit-test additions apply.
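For reviewers curious what the backfill spike asserted, a rough reconstruction follows; the prompt list and the chunk bookkeeping are illustrative rather than the actual spike script, and `tool_calls` on `StreamingChunk` is read defensively since the attribute's availability can vary across Haystack versions.

```python
# Rough reconstruction of the mutual-exclusivity check, not the actual spike script.
from haystack.dataclasses import ChatMessage, StreamingChunk
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

PROMPTS = [
    "What is the weather in Berlin?",    # directive
    "What is 2+2?",                      # arithmetic
    "Tell me a short story about rain.", # literary
]

chunks: list[StreamingChunk] = []

generator = OllamaChatGenerator(
    model="llama3.1:8b",
    tools=[weather_tool],             # Tool sketched earlier in this PR description
    streaming_callback=chunks.append,
)

for prompt in PROMPTS:
    chunks.clear()
    reply = generator.run(messages=[ChatMessage.from_user(prompt)])["replies"][0]

    # No streamed chunk should carry both text content and tool-call deltas ...
    assert not any(c.content and getattr(c, "tool_calls", None) for c in chunks)
    # ... and the final message should not mix text and tool calls either.
    assert not (reply.text and reply.tool_calls)
```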
Notes for the reviewer
- PR title: `docs(ollama): ...`.
- `releasenotes/notes/streaming-with-tools-ollamachatgenerator-docs-8e339d62f38ebd06.yaml` (single-entry `enhancements`, RST inline code, matches the shape of prior docs-only release notes).

Checklist
- PR title uses a conventional commit type (`docs:`).
- Touched files: `docs-website/docs/.../*.mdx` and `releasenotes/notes/*.yaml`. Happy to run pre-commit if a maintainer flags anything.