
.Net: Bug: Non-streaming GetChatMessageContentAsync discards LLM text generated before tool calls during auto function calling loop #13420

@Cozmopolit


Affects: OpenAI, Azure OpenAI, Google/Gemini, MistralAI Connectors (.NET); All Connectors (Python)


Description

When using GetChatMessageContentAsync (non-streaming) with FunctionChoiceBehavior.Auto(), any text the LLM generates before a tool call is lost to the consumer. The method only returns the final LLM response (after all tool calls are complete), even though intermediate text is added to ChatHistory.

In contrast, streaming mode (GetStreamingChatMessageContentsAsync) correctly yields all text, including content generated before tool calls.


Expected Behavior

When the LLM generates:

  1. Text: "I'll check the current time..."
  2. Tool call: get_time()
  3. Text: "The time is 14:36."

The consumer should receive both text segments, not just the final one.


Actual Behavior

The consumer only receives: "The time is 14:36."
The text "I'll check the current time..." is added to ChatHistory but never returned to the caller.


Empirical Evidence

We tested the same LLM prompt with identical tools in both modes:

| Mode | Response Length | Content |
|---|---|---|
| Streaming | 2502 chars | Full response including intermediate text + tool planning + final summary |
| Non-Streaming | 1652 chars | Only the final summary (missing ~850 chars of intermediate text) |

The missing text was the LLM's explanation of what it was about to do before invoking tools.


Root Cause

The auto-invoke loop in multiple connectors returns only the last response:

```csharp
// ClientCore.ChatCompletion.cs, lines 159-224
for (int requestIndex = 0; ; requestIndex++)
{
    // ... make API request ...

    if (!functionCallingConfig.AutoInvoke || chatCompletion.ToolCalls.Count == 0)
    {
        return [chatMessageContent];  // ← Returns only ONE message!
    }

    // Process function calls - adds chatMessageContent to ChatHistory,
    // but does NOT return it to the consumer
    await this.FunctionCallsProcessor.ProcessFunctionCallsAsync(...);

    // Loop continues; only the last iteration's message is returned
}
```
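One possible direction, sketched here in Python because the Python SDK's shared base class has the same loop: accumulate every assistant message produced during the loop and return the full list instead of only the last one. All names below (`ChatMessage`, `auto_invoke_loop`, etc.) are hypothetical; this is a minimal simulation of the fix, not the actual Semantic Kernel implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ChatMessage:
    text: str
    tool_calls: list = field(default_factory=list)

def auto_invoke_loop(send_request, invoke_tool, history):
    """Sketch of an aggregating auto-invoke loop (hypothetical API).

    Instead of returning only the final message, every assistant
    message generated along the way is collected and returned.
    """
    produced = []                      # all assistant messages, in order
    while True:
        message = send_request(history)
        history.append(message)        # intermediate text stays in history...
        produced.append(message)       # ...AND is surfaced to the caller
        if not message.tool_calls:
            return produced            # final response ends the loop
        for call in message.tool_calls:
            history.append(invoke_tool(call))  # tool result feeds next turn

# Simulated model: first turn emits text + a tool call, second the answer.
def fake_send(history):
    if not any(m == "tool:get_time" for m in history):
        return ChatMessage("I'll check the current time...", ["get_time"])
    return ChatMessage("The time is 14:36.")

messages = auto_invoke_loop(fake_send, lambda call: f"tool:{call}", [])
print([m.text for m in messages])
# Both text segments reach the caller, not just the final one.
```

Returning a list preserves backward-compatible access to the final answer (`messages[-1]`) while exposing the intermediate text that today is only visible in `ChatHistory`.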

Related Issue: Token Usage Metadata is Incomplete

The ChatMessageContent.Metadata["Usage"] only contains usage from the last API request:

  • Request 1: LLM generates text + tool call → 500 output tokens
  • Request 2: LLM generates final response → 200 output tokens
  • Actual cost: 700 output tokens
  • Metadata reports: 200 output tokens ❌

Note: Internal telemetry counters (s_promptTokensCounter, etc.) correctly sum all tokens.
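The same aggregation idea applies to the usage metadata: sum the per-request token counts across loop iterations rather than keeping only the last request's numbers. The sketch below uses hypothetical dictionary keys (`prompt_tokens`, `completion_tokens`) to illustrate the arithmetic from the example above.

```python
def aggregate_usage(per_request_usage):
    """Sum token usage across all auto-invoke requests (sketch).

    per_request_usage: one dict per API request made inside the loop,
    e.g. {"prompt_tokens": ..., "completion_tokens": ...} (hypothetical keys).
    """
    total = {"prompt_tokens": 0, "completion_tokens": 0}
    for usage in per_request_usage:
        for key in total:
            total[key] += usage.get(key, 0)
    return total

# Request 1 (text + tool call, 500 output tokens) and
# request 2 (final response, 200 output tokens):
usage = aggregate_usage([
    {"prompt_tokens": 1200, "completion_tokens": 500},
    {"prompt_tokens": 1900, "completion_tokens": 200},
])
print(usage["completion_tokens"])  # 700, matching the actual cost
```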


Impact

  • Applications using non-streaming mode with auto function calling miss user-facing content
  • The LLM's status updates or reasoning before tool calls are invisible to consumers
  • Inconsistent experience between streaming and non-streaming modes
  • Token usage metadata understates actual consumption

Affected Connectors

.NET

| Connector | Implementation File |
|---|---|
| OpenAI | Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs |
| Azure OpenAI | Same file (extends ClientCore) |
| Google/Gemini | Connectors.Google/Core/Gemini/Clients/GeminiChatCompletionClient.cs |
| MistralAI | Connectors.MistralAI/Client/MistralClient.cs |

Python

| Connector | Implementation File |
|---|---|
| All Connectors | connectors/ai/chat_completion_client_base.py (shared base class) |

The Python SDK has the auto-invoke loop in the base class, so all connectors (OpenAI, Azure, Google, Mistral, Anthropic, Azure AI Inference, Ollama, etc.) are affected. Python does not have an equivalent to Microsoft.Extensions.AI.

Note (.NET only): The newer Microsoft.Extensions.AI-based .NET connectors (Azure AI Inference, Ollama) use FunctionInvokingChatClient which properly aggregates all response messages and usage. The long-term solution for .NET may be migrating all connectors to this abstraction.


Environment

  • SK .NET Version: 1.x (current main branch)


Labels: .NET, bug, python, triage
