Summary
For MLXLanguageModel, tool calling works in respond() but not in streamResponse(). The streaming path hardcodes tools: nil and ignores tool-call stream items, so callers using local MLX models must choose between streamed-token UX and tool calling; they cannot have both. Foundation models get both.
Evidence (v0.8.0, Sources/AnyLanguageModel/Models/MLXLanguageModel.swift)
- streamResponse builds the input with no tools (L1095-L1100):
let userInput = makeUserInput(chat: chat, tools: nil, processing: userInputProcessing, additionalContext: additionalContext)
- The stream loop discards tool-call items (L1127-L1128):
case .info, .toolCall:
break
For reference, respond() already implements the full tool cycle: it passes toolSpecs (L921) and loops over collect → resolve → re-generate (L917-L1002).
Use case
On-device assistant that streams tokens and calls tools (file search, web fetch, image generation). Today, enabling tools forces the non-streaming path, so the whole reply lands at once — a noticeable UX regression for local models.
Proposed approach (reuses existing helpers)
In streamResponse:
- Pass mlxToolSpecs(for: session) (already defined at L843) into makeUserInput instead of nil.
- In the stream loop, collect .toolCall items instead of break-ing. When the model stops with pending calls, resolve them via the existing resolveToolCalls(_:session:) (L1454) and makeTranscriptToolCalls (L1435), append the results to the transcript, and continue generating, mirroring the respond() while-loop and reusing its repeated-call guard (L733). Continue yielding text snapshots between tool rounds.
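The control flow proposed above can be sketched in isolation. This is a rough, self-contained illustration only: StreamItem, ToolCall, streamWithTools, and the generate/resolve/emit closures are hypothetical stand-ins, not the actual MLXLanguageModel or MLX types, and the repeated-call guard shown here is a simplified assumption about what the guard in respond() does.

```swift
// Hypothetical stand-ins for the real stream items; not the library's types.
enum StreamItem {
    case chunk(String)
    case toolCall(name: String, argument: String)
    case info
}

struct ToolCall: Equatable {
    let name: String
    let argument: String
}

/// Runs generation rounds until the model stops without pending tool calls.
/// - generate: produces stream items for the current transcript (assumed helper).
/// - resolve: executes one tool call and returns its output (assumed helper).
/// - emit: receives a cumulative text snapshot after every text chunk.
/// Returns the tool results appended to the transcript.
func streamWithTools(
    maxRounds: Int = 4,
    generate: ([String]) -> [StreamItem],
    resolve: (ToolCall) -> String,
    emit: (String) -> Void
) -> [String] {
    var transcript: [String] = []
    var text = ""
    var previousCalls: [ToolCall] = []

    for _ in 0..<maxRounds {
        var pending: [ToolCall] = []
        for item in generate(transcript) {
            switch item {
            case .chunk(let piece):
                text += piece
                emit(text)  // keep yielding snapshots between tool rounds
            case .toolCall(let name, let argument):
                // Collect instead of discarding.
                pending.append(ToolCall(name: name, argument: argument))
            case .info:
                break
            }
        }
        // Stop when there is nothing to resolve, or when the model repeats
        // the exact same calls (simplified repeated-call guard).
        if pending.isEmpty || pending == previousCalls { break }
        for call in pending {
            transcript.append("tool:\(call.name)(\(call.argument)) -> \(resolve(call))")
        }
        previousCalls = pending
    }
    return transcript
}

// Usage with a fake generator: the first round requests a tool, the second finishes.
var snapshots: [String] = []
let transcript = streamWithTools(
    generate: { history in
        if history.isEmpty {
            return [.chunk("Looking up. "), .toolCall(name: "search", argument: "swift")]
        } else {
            return [.chunk("Done.")]
        }
    },
    resolve: { _ in "3 results" },
    emit: { snapshots.append($0) }
)
```

In the real implementation the generate step would presumably be the MLX token stream and emit would yield into the response stream's continuation; the sketch only shows that text snapshots and tool rounds can interleave in one loop.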
Acceptance
streamResponse with a non-empty session.tools executes tools and streams text, at behavioral parity with respond().