Affects: OpenAI, Azure OpenAI, Google/Gemini, MistralAI Connectors (.NET); All Connectors (Python)
Description
When using GetChatMessageContentAsync (non-streaming) with FunctionChoiceBehavior.Auto(), any text the LLM generates before a tool call is lost to the consumer. The method only returns the final LLM response (after all tool calls are complete), even though intermediate text is added to ChatHistory.
In contrast, streaming mode (GetStreamingChatMessageContentsAsync) correctly yields all text, including content generated before tool calls.
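Here is a minimal repro sketch of the difference. The TimePlugin, model id, and API-key handling are placeholders chosen for illustration; they are not part of the affected code.

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Any chat connector with auto function calling reproduces this; OpenAI is used here as a placeholder.
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build();
kernel.Plugins.AddFromType<TimePlugin>();

var settings = new OpenAIPromptExecutionSettings { FunctionChoiceBehavior = FunctionChoiceBehavior.Auto() };
var chat = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddUserMessage("What time is it?");

// Non-streaming: only the message from the last API round trip is returned.
var result = await chat.GetChatMessageContentAsync(history, settings, kernel);
Console.WriteLine(result.Content);

// Streaming: every chunk is yielded, including the text generated before the tool call.
var streamingHistory = new ChatHistory();
streamingHistory.AddUserMessage("What time is it?");
await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(streamingHistory, settings, kernel))
{
    Console.Write(chunk.Content);
}

public sealed class TimePlugin
{
    [KernelFunction, Description("Returns the current time.")]
    public string GetTime() => DateTime.Now.ToString("HH:mm");
}
```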
Expected Behavior
When the LLM generates:
- Text: "I'll check the current time..."
- Tool call: get_time()
- Text: "The time is 14:36."
The consumer should receive both text segments, not just the final one.
Actual Behavior
The consumer only receives: "The time is 14:36."
The text "I'll check the current time..." is added to ChatHistory but never returned to the caller.
Empirical Evidence
We tested the same LLM prompt with identical tools in both modes:
| Mode | Response Length | Content |
|---|---|---|
| Streaming | 2502 chars | Full response including intermediate text + tool planning + final summary |
| Non-Streaming | 1652 chars | Only the final summary (missing ~850 chars of intermediate text) |
The missing text was the LLM's explanation of what it was about to do before invoking tools.
Root Cause
The auto-invoke loop in multiple connectors returns only the last response:
```csharp
// ClientCore.ChatCompletion.cs, lines 159-224
for (int requestIndex = 0; ; requestIndex++)
{
    // ... make API request ...

    if (!functionCallingConfig.AutoInvoke || chatCompletion.ToolCalls.Count == 0)
    {
        return [chatMessageContent]; // ← Returns only ONE message!
    }

    // Process function calls - adds chatMessageContent to ChatHistory
    // but does NOT return it to the consumer
    await this.FunctionCallsProcessor.ProcessFunctionCallsAsync(...);

    // Loop continues; only the last iteration's message is returned
}
```
Related Issue: Token Usage Metadata is Incomplete
ChatMessageContent.Metadata["Usage"] contains the usage from only the last API request:
- Request 1: LLM generates text + tool call → 500 output tokens
- Request 2: LLM generates final response → 200 output tokens
- Actual cost: 700 output tokens
- Metadata reports: 200 output tokens ❌
Note: Internal telemetry counters (s_promptTokensCounter, etc.) correctly sum all tokens.
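For illustration, this is how the gap surfaces to a consumer doing cost accounting from the returned metadata. The concrete object stored under "Usage" varies by connector and SDK version (for OpenAI it is the underlying SDK's token-usage type), so the snippet below only shows where the under-reporting appears; `result` is the ChatMessageContent returned by GetChatMessageContentAsync in the sketches above.

```csharp
// Sketch only: the concrete usage type behind Metadata["Usage"] differs per connector.
if (result.Metadata is { } metadata && metadata.TryGetValue("Usage", out var usage))
{
    // With the two underlying requests above (500 + 200 output tokens), this object
    // reflects only the final request, so cost accounting based on it is short by 500 tokens.
    Console.WriteLine(usage);
}
```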
Impact
- Applications using non-streaming mode with auto function calling miss user-facing content
- The LLM's status updates or reasoning before tool calls are invisible to consumers
- Inconsistent experience between streaming and non-streaming modes
- Token usage metadata understates actual consumption
Affected Connectors
.NET
| Connector | Implementation File |
|---|---|
| OpenAI | Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs |
| Azure OpenAI | Same file (extends ClientCore) |
| Google/Gemini | Connectors.Google/Core/Gemini/Clients/GeminiChatCompletionClient.cs |
| MistralAI | Connectors.MistralAI/Client/MistralClient.cs |
Python
| Connector | Implementation File |
|---|---|
| All Connectors | connectors/ai/chat_completion_client_base.py (shared base class) |
The Python SDK has the auto-invoke loop in the base class, so all connectors (OpenAI, Azure, Google, Mistral, Anthropic, Azure AI Inference, Ollama, etc.) are affected. Python does not have an equivalent to Microsoft.Extensions.AI.
Note (.NET only): The newer Microsoft.Extensions.AI-based .NET connectors (Azure AI Inference, Ollama) use FunctionInvokingChatClient which properly aggregates all response messages and usage. The long-term solution for .NET may be migrating all connectors to this abstraction.
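For contrast, here is a sketch of that Microsoft.Extensions.AI path. The method and property names below follow recent releases of that package and have shifted across previews, so treat them as approximate rather than the definitive API; innerClient and options are placeholders.

```csharp
using Microsoft.Extensions.AI;

// Sketch only: FunctionInvokingChatClient is added to the pipeline by UseFunctionInvocation().
IChatClient client = new ChatClientBuilder(innerClient) // innerClient: any IChatClient implementation
    .UseFunctionInvocation()
    .Build();

ChatResponse response = await client.GetResponseAsync("What time is it?", options);

// Unlike the SK auto-invoke loop, the response carries every message produced across
// the tool-calling round trips, and Usage is aggregated across them.
foreach (ChatMessage message in response.Messages)
{
    Console.WriteLine($"{message.Role}: {message.Text}");
}
Console.WriteLine(response.Usage?.OutputTokenCount);
```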
Environment
- SK .NET Version: 1.x (current main branch)