A user recently reported an intermittent error when using C# with the Gemini 1.5 model on Vertex AI’s streaming API. In this blog post, I want to outline what the error is, what causes it, and how to avoid it, in the hope of saving someone out there some frustration.
Error
The user reported using the Google.Cloud.AIPlatform.V1 library, version 2.27.0, to call Gemini 1.5 via Vertex AI’s streaming API and running into an intermittent System.IO.IOException.
As a test, I took our GeminiQuickstart.cs, changed the model from gemini-1.0-pro-vision to gemini-1.5-pro-preview-0409, and ran into the problem after running the sample a few times:
[xUnit.net 00:00:08.11] GeminiQuickstartTest.TestGenerateContentAsync [FAIL]
Failed GeminiQuickstartTest.TestGenerateContentAsync [7 s]
Error Message:
Grpc.Core.RpcException : Status(StatusCode="Unavailable", Detail="Error reading next message. IOException: The request was aborted. IOException: The response ended prematurely while waiting for the next frame from the server.", DebugException="System.IO.IOException: The request was aborted.")
---- System.IO.IOException : The request was aborted.
-------- System.IO.IOException : The response ended prematurely while waiting for the next frame from the server.
Stack Trace:
at Grpc.Net.Client.Internal.HttpContentClientStreamReader`2.MoveNextCore(CancellationToken cancellationToken)
at Google.Api.Gax.Grpc.AsyncResponseStream`1.MoveNextAsync(CancellationToken cancellationToken)
at GeminiQuickstart.GenerateContent(String projectId, String location, String publisher, String model) in /Users/atamel/dev/github/meteatamel/dotnet-docs-samples/aiplatform/api/AIPlatform.Samples/GeminiQuickstart.cs:line 82
at GeminiQuickstart.GenerateContent(String projectId, String location, String publisher, String model) in /Users/atamel/dev/github/meteatamel/dotnet-docs-samples/aiplatform/api/AIPlatform.Samples/GeminiQuickstart.cs:line 82
at GeminiQuickstartTest.TestGenerateContentAsync() in /Users/atamel/dev/github/meteatamel/dotnet-docs-samples/aiplatform/api/AIPlatform.Samples.Tests/GeminiQuickstartTest.cs:line 35
--- End of stack trace from previous location ---
Root cause
I wasn’t sure what was causing the issue, but thankfully we have the awesome Jon Skeet on our team. After some debugging, he pointed out issues 2358 and 2361 from the grpc-dotnet project. Basically, there’s a bug in the interaction between the .NET gRPC client and the Google L7 load balancer that causes the failure.
To summarize:
- The issue happens only when the streaming API is used.
- The issue manifests itself intermittently in Gemini 1.5 but it could technically happen in other Gemini versions too.
Fix and workarounds
A permanent fix is on the way on the .NET side (dotnet/runtime#9788). It looks like it’ll be available in .NET 9 and .NET 8, and backported to .NET 7 and .NET 6.
That’s great but what do you do in the meantime? There are a couple of options.
First, if you don’t require streaming, you can use the non-streaming API. In the GeminiQuickstart.cs sample, instead of streaming responses like this:
using PredictionServiceClient.StreamGenerateContentStream response = predictionServiceClient.StreamGenerateContent(generateContentRequest);
StringBuilder fullText = new();
AsyncResponseStream<GenerateContentResponse> responseStream = response.GetResponseStream();
await foreach (GenerateContentResponse responseItem in responseStream)
{
    fullText.Append(responseItem.Candidates[0].Content.Parts[0].Text);
}
return fullText.ToString();
You can do a non-streaming call like this:
GenerateContentResponse response = await predictionServiceClient.GenerateContentAsync(generateContentRequest);
Of course, this might not be feasible. If you require streaming, thankfully, there are a couple more workarounds.
- You can specify an app switch to disable dynamic window sizing:
  AppContext.SetSwitch("System.Net.SocketsHttpHandler.Http2FlowControl.DisableDynamicWindowSizing", true);
- You can use Grpc.Core instead of Grpc.Net.Client:
  - Add a dependency to Grpc.Core version 2.46.6
  - Add a using directive for Google.Api.Gax.Grpc
  - In the PredictionServiceClientBuilder object initializer, add GrpcAdapter = GrpcCoreAdapter.Instance
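To make the Grpc.Core option more concrete, here is a minimal sketch of what the builder change could look like. The PredictionClientFactory class, the Create method, and the regional endpoint format are my own assumptions for illustration and are not taken from the sample. Only the GrpcAdapter = GrpcCoreAdapter.Instance line is the actual workaround.

```csharp
// Assumes the Grpc.Core package has been added to the project, for example:
//   dotnet add package Grpc.Core --version 2.46.6
using Google.Api.Gax.Grpc;          // GrpcCoreAdapter
using Google.Cloud.AIPlatform.V1;   // PredictionServiceClient, PredictionServiceClientBuilder

// Hypothetical helper class, named here only for illustration.
public static class PredictionClientFactory
{
    public static PredictionServiceClient Create(string location)
    {
        return new PredictionServiceClientBuilder
        {
            // Assumed regional endpoint format; adjust it to match your setup.
            Endpoint = $"{location}-aiplatform.googleapis.com",
            // The workaround: use the Grpc.Core transport instead of Grpc.Net.Client.
            GrpcAdapter = GrpcCoreAdapter.Instance
        }.Build();
    }
}
```

The rest of the streaming code should not need to change, since only the underlying gRPC transport is swapped.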
Since the first option is much easier, I tried that and it works great.
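For reference, here is a minimal sketch of where the app switch could go. The Program class and the sanity check are illustrative assumptions. I am also assuming it is safest to set the switch at startup, before any gRPC or HTTP client is created, because such settings are typically read once.

```csharp
using System;

public static class Program
{
    // The switch name quoted in this post.
    private const string DisableDynamicWindowSizing =
        "System.Net.SocketsHttpHandler.Http2FlowControl.DisableDynamicWindowSizing";

    public static void Main()
    {
        // Set the switch first, before any client is constructed (assumption: early is safest).
        AppContext.SetSwitch(DisableDynamicWindowSizing, true);

        // Optional sanity check that the switch is registered and enabled.
        bool found = AppContext.TryGetSwitch(DisableDynamicWindowSizing, out bool isEnabled);
        Console.WriteLine($"Switch set: {found && isEnabled}"); // prints "Switch set: True"

        // ... create the PredictionServiceClient and call StreamGenerateContent here ...
    }
}
```

Keeping the switch name in a single constant also makes it easier to find and delete later.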
Hopefully this blog post saved some frustration for someone out there and, in the worst case, it’ll serve me as a reminder to remove the AppContext workaround once the permanent fix makes it to the .NET runtime 😀
As always, for any questions or feedback, feel free to reach out to me on Twitter @meteatamel.