C# and Vertex AI Gemini streaming API bug and workaround


A user recently reported an intermittent error when using C# with the Gemini 1.5 model on Vertex AI’s streaming API. In this blog post, I want to outline what the error is, what causes it, and how to avoid it, in the hope of saving someone out there some frustration.

Error

The user reported using version 2.27.0 of the Google.Cloud.AIPlatform.V1 library to call Gemini 1.5 via Vertex AI’s streaming API and running into an intermittent System.IO.IOException.

As a test, I took our GeminiQuickstart.cs, changed the model from gemini-1.0-pro-vision to gemini-1.5-pro-preview-0409, and ran into the problem after running the sample a few times:

[xUnit.net 00:00:08.11]     GeminiQuickstartTest.TestGenerateContentAsync [FAIL]
  Failed GeminiQuickstartTest.TestGenerateContentAsync [7 s]
  Error Message:
   Grpc.Core.RpcException : Status(StatusCode="Unavailable", Detail="Error reading next message. IOException: The request was aborted. IOException: The response ended prematurely while waiting for the next frame from the server.", DebugException="System.IO.IOException: The request was aborted.")
---- System.IO.IOException : The request was aborted.
-------- System.IO.IOException : The response ended prematurely while waiting for the next frame from the server.
  Stack Trace:
     at Grpc.Net.Client.Internal.HttpContentClientStreamReader`2.MoveNextCore(CancellationToken cancellationToken)
   at Google.Api.Gax.Grpc.AsyncResponseStream`1.MoveNextAsync(CancellationToken cancellationToken)
   at GeminiQuickstart.GenerateContent(String projectId, String location, String publisher, String model) in /Users/atamel/dev/github/meteatamel/dotnet-docs-samples/aiplatform/api/AIPlatform.Samples/GeminiQuickstart.cs:line 82
   at GeminiQuickstart.GenerateContent(String projectId, String location, String publisher, String model) in /Users/atamel/dev/github/meteatamel/dotnet-docs-samples/aiplatform/api/AIPlatform.Samples/GeminiQuickstart.cs:line 82
   at GeminiQuickstartTest.TestGenerateContentAsync() in /Users/atamel/dev/github/meteatamel/dotnet-docs-samples/aiplatform/api/AIPlatform.Samples.Tests/GeminiQuickstartTest.cs:line 35
--- End of stack trace from previous location ---

Root cause

I wasn’t sure what was causing the issue but thankfully, we have the awesome Jon Skeet on our team, and after some debugging he pointed to issues 2358 and 2361 in the grpc-dotnet project. In short, a bug in the interaction between the .NET gRPC client and the Google L7 load balancer causes the failure.

To summarize:

  1. The issue happens only when the streaming API is used.
  2. The issue manifests itself intermittently with Gemini 1.5, but it could technically happen with other Gemini versions too.

Fix and workarounds

The permanent fix is on the way on the .NET side (dotnet/runtime#9788), and it looks like it’ll be available in .NET 9 and .NET 8, and backported to the previous versions .NET 7 and .NET 6.

That’s great but what do you do in the meantime? There are a couple of options.

First, if you don’t require streaming, you can use the non-streaming API. In the GeminiQuickstart.cs sample, instead of streaming responses like this:

using PredictionServiceClient.StreamGenerateContentStream response = predictionServiceClient.StreamGenerateContent(generateContentRequest);

StringBuilder fullText = new();

AsyncResponseStream<GenerateContentResponse> responseStream = response.GetResponseStream();
await foreach (GenerateContentResponse responseItem in responseStream)
{
    fullText.Append(responseItem.Candidates[0].Content.Parts[0].Text);
}

return fullText.ToString();

You can do a non-streaming call like this:

GenerateContentResponse response = await predictionServiceClient.GenerateContentAsync(generateContentRequest);
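For context, here’s a minimal sketch of what a full non-streaming call might look like, modeled on the quickstart. The endpoint, model name, and prompt below are placeholder assumptions on my part, not the exact values from the sample:

```csharp
using Google.Cloud.AIPlatform.V1;

// Build the client against the regional Vertex AI endpoint.
var predictionServiceClient = new PredictionServiceClientBuilder
{
    Endpoint = $"{location}-aiplatform.googleapis.com"
}.Build();

var generateContentRequest = new GenerateContentRequest
{
    Model = $"projects/{projectId}/locations/{location}/publishers/google/models/gemini-1.5-pro-preview-0409",
    Contents =
    {
        new Content
        {
            Role = "USER",
            Parts = { new Part { Text = "Why is the sky blue?" } }
        }
    }
};

// One request, one response: no waiting on further HTTP/2 frames
// from the server, so the streaming bug doesn't come into play.
GenerateContentResponse response =
    await predictionServiceClient.GenerateContentAsync(generateContentRequest);
return response.Candidates[0].Content.Parts[0].Text;
```

The trade-off is that you get the whole response at once rather than token by token, which may matter for perceived latency in interactive apps.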

Of course, this might not be feasible. If you require streaming, thankfully, there are a couple more workarounds.

  1. You can specify an app switch to disable dynamic window sizing:

    AppContext.SetSwitch("System.Net.SocketsHttpHandler.Http2FlowControl.DisableDynamicWindowSizing", true);
    
  2. You can use Grpc.Core instead of Grpc.Net.Client:

    • Add a dependency to Grpc.Core version 2.46.6
    • Add a using directive for Google.Api.Gax.Grpc
    • In the PredictionServiceClientBuilder object initializer, add GrpcAdapter = GrpcCoreAdapter.Instance
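Putting the second workaround together, the client construction would look roughly like this (assuming the Grpc.Core 2.46.6 package reference is already in place; the endpoint value is a placeholder):

```csharp
using Google.Api.Gax.Grpc;
using Google.Cloud.AIPlatform.V1;

// Force the client onto the Grpc.Core transport instead of
// Grpc.Net.Client, sidestepping the HTTP/2 flow-control bug.
var predictionServiceClient = new PredictionServiceClientBuilder
{
    Endpoint = $"{location}-aiplatform.googleapis.com",
    GrpcAdapter = GrpcCoreAdapter.Instance
}.Build();
```

Note that Grpc.Core is in maintenance mode, so this is best treated as a stopgap until the runtime fix ships.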

Since the first option is much easier, I tried that and it worked great.
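For reference, I’d set the switch early at startup, before any gRPC channels are created. Something along these lines — the placement in Main is my choice, not a requirement stated in the sample:

```csharp
public static class Program
{
    public static async Task Main(string[] args)
    {
        // Set the switch before any gRPC calls are made, so the
        // underlying SocketsHttpHandler picks it up when the
        // HTTP/2 connection is first established.
        AppContext.SetSwitch(
            "System.Net.SocketsHttpHandler.Http2FlowControl.DisableDynamicWindowSizing",
            true);

        // ... create PredictionServiceClient and call the streaming API as usual ...
    }
}
```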


Hopefully this blog post saved someone out there some frustration. In the worst case, it’ll serve as a reminder for me to remove the AppContext workaround once the permanent fix makes it into the .NET runtime 😀

As always, for any questions or feedback, feel free to reach out to me on Twitter @meteatamel.


See also