Affects: OpenAI, Azure OpenAI, Google/Gemini, MistralAI Connectors (.NET); All Connectors (Python)
Description
When using GetChatMessageContentAsync (non-streaming) with FunctionChoiceBehavior.Auto(), any text the LLM generates before a tool call is lost to the consumer. The method only returns the final LLM response (after all tool calls are complete), even though intermediate text is added to ChatHistory.
In contrast, streaming mode (GetStreamingChatMessageContentsAsync) correctly yields all text, including content generated before tool calls.
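Here is a minimal repro sketch of the difference. The TimePlugin, model id, and API-key handling are placeholders chosen for illustration; they are not part of the affected code.

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Any chat connector with auto function calling reproduces this; OpenAI is used here as a placeholder.
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build();
kernel.Plugins.AddFromType<TimePlugin>();

var settings = new OpenAIPromptExecutionSettings { FunctionChoiceBehavior = FunctionChoiceBehavior.Auto() };
var chat = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddUserMessage("What time is it?");

// Non-streaming: only the message from the last API round trip is returned.
var result = await chat.GetChatMessageContentAsync(history, settings, kernel);
Console.WriteLine(result.Content);

// Streaming: every chunk is yielded, including the text generated before the tool call.
var streamingHistory = new ChatHistory();
streamingHistory.AddUserMessage("What time is it?");
await foreach (var chunk in chat.GetStreamingChatMessageContentsAsync(streamingHistory, settings, kernel))
{
    Console.Write(chunk.Content);
}

public sealed class TimePlugin
{
    [KernelFunction, Description("Returns the current time.")]
    public string GetTime() => DateTime.Now.ToString("HH:mm");
}
```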
Expected Behavior
When the LLM generates:
- Text: "I'll check the current time..."
- Tool call: get_time()
- Text: "The time is 14:36."
The consumer should receive both text segments, not just the final one.
Actual Behavior
The consumer only receives: "The time is 14:36."
The text "I'll check the current time..." is added to ChatHistory but never returned to the caller.
Empirical Evidence
We tested the same LLM prompt with identical tools in both modes:
| Mode | Response Length | Content |
|---|---|---|
| Streaming | 2502 chars | Full response including intermediate text + tool planning + final summary |
| Non-Streaming | 1652 chars | Only the final summary (missing ~850 chars of intermediate text) |
The missing text was the LLM's explanation of what it was about to do before invoking tools.
Root Cause
The auto-invoke loop in multiple connectors returns only the last response:
```csharp
// ClientCore.ChatCompletion.cs, lines 159-224
for (int requestIndex = 0; ; requestIndex++)
{
    // ... make API request ...

    if (!functionCallingConfig.AutoInvoke || chatCompletion.ToolCalls.Count == 0)
    {
        return [chatMessageContent]; // ← Returns only ONE message!
    }

    // Process function calls - adds chatMessageContent to ChatHistory
    // but does NOT return it to the consumer
    await this.FunctionCallsProcessor.ProcessFunctionCallsAsync(...);

    // Loop continues; only the last iteration's message is returned
}
```
Related Issue: Token Usage Metadata is Incomplete
ChatMessageContent.Metadata["Usage"] contains the usage from only the last API request:
- Request 1: LLM generates text + tool call → 500 output tokens
- Request 2: LLM generates final response → 200 output tokens
- Actual cost: 700 output tokens
- Metadata reports: 200 output tokens ❌
Note: Internal telemetry counters (s_promptTokensCounter, etc.) correctly sum all tokens.
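For illustration, this is how the gap surfaces to a consumer doing cost accounting from the returned metadata. The concrete object stored under "Usage" varies by connector and SDK version (for OpenAI it is the underlying SDK's token-usage type), so the snippet below only shows where the under-reporting appears; `result` is the ChatMessageContent returned by GetChatMessageContentAsync in the sketches above.

```csharp
// Sketch only: the concrete usage type behind Metadata["Usage"] differs per connector.
if (result.Metadata is { } metadata && metadata.TryGetValue("Usage", out var usage))
{
    // With the two underlying requests above (500 + 200 output tokens), this object
    // reflects only the final request, so cost accounting based on it is short by 500 tokens.
    Console.WriteLine(usage);
}
```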
Impact
- Applications using non-streaming mode with auto function calling miss user-facing content
- The LLM's status updates or reasoning before tool calls are invisible to consumers
- Inconsistent experience between streaming and non-streaming modes
- Token usage metadata understates actual consumption
Affected Connectors
.NET
| Connector | Implementation File |
|---|---|
| OpenAI | Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs |
| Azure OpenAI | Same file (extends ClientCore) |
| Google/Gemini | Connectors.Google/Core/Gemini/Clients/GeminiChatCompletionClient.cs |
| MistralAI | Connectors.MistralAI/Client/MistralClient.cs |
Python
| Connector | Implementation File |
|---|---|
| All Connectors | connectors/ai/chat_completion_client_base.py (shared base class) |
The Python SDK has the auto-invoke loop in the base class, so all connectors (OpenAI, Azure, Google, Mistral, Anthropic, Azure AI Inference, Ollama, etc.) are affected. Python does not have an equivalent to Microsoft.Extensions.AI.
Note (.NET only): The newer Microsoft.Extensions.AI-based .NET connectors (Azure AI Inference, Ollama) use FunctionInvokingChatClient which properly aggregates all response messages and usage. The long-term solution for .NET may be migrating all connectors to this abstraction.
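For contrast, here is a sketch of that Microsoft.Extensions.AI path. The method and property names below follow recent releases of that package and have shifted across previews, so treat them as approximate rather than the definitive API; innerClient and options are placeholders.

```csharp
using Microsoft.Extensions.AI;

// Sketch only: FunctionInvokingChatClient is added to the pipeline by UseFunctionInvocation().
IChatClient client = new ChatClientBuilder(innerClient) // innerClient: any IChatClient implementation
    .UseFunctionInvocation()
    .Build();

ChatResponse response = await client.GetResponseAsync("What time is it?", options);

// Unlike the SK auto-invoke loop, the response carries every message produced across
// the tool-calling round trips, and Usage is aggregated across them.
foreach (ChatMessage message in response.Messages)
{
    Console.WriteLine($"{message.Role}: {message.Text}");
}
Console.WriteLine(response.Usage?.OutputTokenCount);
```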
Environment
- SK .NET Version: 1.x (current main branch)