Using Vertex AI Gemini REST API (C# and Rust)

Introduction

Back in December, Google announced Gemini, its most capable and general model so far, available from Google AI Studio and Google Cloud Vertex AI.

The “Try the Vertex AI Gemini API” documentation page shows how to use the Gemini API from Python, Node.js, Java, and Go.

That’s great, but what about other languages?

Even though there are no official SDKs/libraries for other languages yet, you can use the Gemini REST API to access the same functionality with a little bit more work on your part.

In this blog post, I want to walk through an example of how to use the Gemini REST API from languages that don’t have SDK support yet: C# and Rust in this case.

Gemini REST API

There are currently two models available in the Gemini API:

Gemini Pro model (gemini-pro): Fine-tuned model to handle natural language tasks such as classification, summarization, extraction, and writing.
Gemini Pro Vision model (gemini-pro-vision): Multimodal model that supports adding image and video prompts for a text response.

The Gemini API page is a great resource for learning how to make the right HTTP requests with the right parameters.

For example, to send a multimodal (text + image) request to Gemini, you’d make an HTTP POST to the gemini-pro-vision model with the right parameters in the request body:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro-vision:streamGenerateContent

{
  "contents": {
    "role": "user",
    "parts": [
      {
        "fileData": {
          "mimeType": "image/jpeg",
          "fileUri": "gs://cloud-samples-data/ai-platform/flowers/daisy/10559679065_50d2b16f6d.jpg"
        }
      },
      {
        "text": "Describe this picture."
      }
    ]
  },
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.4,
    "topP": 1.0,
    "topK": 32,
    "maxOutputTokens": 2048
  }
}
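The endpoint URL above has three variable pieces: the project ID, the location (which appears in both the host name and the path), and the model name. As a minimal sketch, assuming you want to build that URL in code rather than hard-code it, a small Rust helper could look like this. The function name `endpoint_url` and its parameter names are my own for illustration; they are not part of any SDK.

```rust
/// Builds the `streamGenerateContent` endpoint URL for a Gemini model on Vertex AI.
/// Hypothetical helper: the name and parameters are illustrative only.
fn endpoint_url(project_id: &str, location: &str, model: &str) -> String {
    // The location is used twice: once in the host name, once in the path.
    format!(
        "https://{loc}-aiplatform.googleapis.com/v1/projects/{proj}\
         /locations/{loc}/publishers/google/models/{model}:streamGenerateContent",
        loc = location,
        proj = project_id,
        model = model
    )
}
```

Calling `endpoint_url("PROJECT_ID", "us-central1", "gemini-pro-vision")` reproduces the URL in the request above.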

A couple of things to watch out for:

You need to get and set an authentication token with your request.
The responses come in batches (see sample responses), so you need to extract text from each batch and combine them to get the full text.
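To make the second point concrete, here is a rough sketch of the "extract and combine" step in Rust. It assumes the JSON has already been parsed into simple structs. The `Chunk` and `Candidate` types are simplified stand-ins I made up for illustration; the real response has more fields and a deeper nesting, and you would normally deserialize it with a JSON library.

```rust
/// One streamed batch of the response.
/// Simplified, hypothetical type: the real response carries more fields.
struct Chunk {
    candidates: Vec<Candidate>,
}

/// One candidate answer within a batch (also simplified).
struct Candidate {
    /// The text of each part in the candidate's content.
    parts: Vec<String>,
}

/// Concatenates the text parts of every candidate in every batch, in order.
fn combine_text(chunks: &[Chunk]) -> String {
    chunks
        .iter()
        .flat_map(|chunk| chunk.candidates.iter())
        .flat_map(|candidate| candidate.parts.iter())
        .map(String::as_str)
        .collect()
}
```

Two batches containing "A cat " and "in the snow." would combine into a single "A cat in the snow." string.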

Now, let’s take a look at how to make these requests from actual code.

Gemini REST API from C#

Let’s say we want Gemini to describe an image stored in Cloud Storage, and we want to do it from my default language: C#.

Define prompt and image

Let’s define the prompt and the image stored on Cloud Storage:

string text = "Describe this image";
string imageUrl = "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg";
Console.WriteLine($"Text: {text}");
Console.WriteLine($"ImageUrl: {imageUrl}");

It’s an image of a cat 🙂

Construct the request payload

Construct the request payload with the prompt and the image URL:

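// Requires the Newtonsoft.Json package (using Newtonsoft.Json;).
private static string GeneratePayload(string text, string imageUrl)
{
    // Anonymous types mirror the JSON shape of the request body.
    var payload = new
    {
        contents = new
        {
            role = "USER",
            parts = new object[] {
                // The text prompt.
                new {text = text},
                // The image, referenced by its Cloud Storage URI.
                new {file_data = new {
                        mime_type = "image/jpeg", // the sample image is a JPEG
                        file_uri = imageUrl
                    }
                }
            }
        },
        generation_config = new
        {
            temperature = 0.4,
            top_p = 1,
            top_k = 32,
            max_output_tokens = 2048
        }
    };
    return JsonConvert.SerializeObject(payload);
}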

Send the request with auth token

Get an authentication token and send the HTTP request:

private static async Task<string> SendRequest(string payload)
{
    // Use Application Default Credentials; the handler adds the access token to each request.
    GoogleCredential credential = GoogleCredential.GetApplicationDefault();
    var handler = credential.ToDelegatingHandler(new HttpClientHandler());
    using HttpClient httpClient = new(handler);

    httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

    // EndpointUrl is the streamGenerateContent URL for the model, as shown earlier.
    HttpResponseMessage response = await httpClient.PostAsync(EndpointUrl,
        new StringContent(payload, Encoding.UTF8, "application/json"));

    // Throw if the API returned a non-success status code.
    response.EnsureSuccessStatusCode();

    return await response.Content.ReadAsStringAsync();
}

Parse the response

Receive the HTTP response from Gemini and deserialize the JSON response body. The body is a list of batches, each with a list of candidates to parse through (see response body). It makes sense to create a GeminiResponse.cs class that captures this structure to make JSON deserialization easier:

string payload = GeneratePayload(text, imageUrl);
string response = await SendRequest(payload);
var geminiResponses = JsonConvert.DeserializeObject<List<GeminiResponse>>(response);

Finally, use some LINQ magic to combine the text in each batch into a final text:

string fullText = string.Join("", geminiResponses
    .SelectMany(batch => batch.Candidates)
    .SelectMany(candidate => candidate.Content.Parts)
    .Select(part => part.Text));

Console.WriteLine($"Response: {fullText}");

You can see the full sample in my GitHub repo in GenerateTextFromImageGcs.cs.

Run the sample

Run the sample:

dotnet run

Text: Describe this image
ImageUrl: gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg
Response:  A cat is walking in the snow. The cat is gray and white, and it has a long tail. The cat is looking at the camera. The snow is white and it is covering the ground.

That’s a pretty good description of the image from Gemini!

Gemini REST API from Rust

Of course, you can use the REST API from any language. My colleague from the Chrome DevRel team, André Bandarra, rewrote my sample in Rust. It follows the same pattern of getting an auth token, generating the request with the right parameters and combining the text from the response:

let authentication_manager = AuthenticationManager::new().await?;
let scopes = &["https://www.googleapis.com/auth/cloud-platform"];
let token = authentication_manager.get_token(scopes).await?;

let prompt = "Describe this image";
let image_url = "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg";

let payload = GenerateContentRequest {
    contents: vec![Content {
        role: "user".to_string(),
        parts: vec![
            Part::Text(prompt.to_string()),
            Part::FileData {
                mime_type: "image/jpeg".to_string(),
                file_uri: image_url.to_string(),
            },
        ],
    }],
    generation_config: Some(GenerationConfig {
        max_output_tokens: Some(2048),
        temperature: Some(0.4),
        top_p: Some(1.0),
        top_k: Some(32),
        ..Default::default()
    }),
    tools: None,
};

let resp = reqwest::Client::new()
    .post(&endpoint_url)
    .bearer_auth(token.as_str())
    .json(&payload)
    .send()
    .await?;

let response = resp.json::<GenerateContentResponse>().await?;
response.0.iter().for_each(|chunk| {
    chunk.candidates.iter().for_each(|candidate| {
        candidate.content.parts.iter().for_each(|part| {
            if let Part::Text(text) = part {
                print!("{}", text);
            }
        });
    });
});

You can check out the full sample in his repo in generate-text-from-image-gcs.rs.
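Both samples hard-code the MIME type next to the image URI, so it’s easy for the two to drift apart (for example, a .jpg file labeled as image/png). As a minimal sketch, assuming the file extension is a reliable hint, you could derive the MIME type from the URI instead. The helper name `mime_type_for_uri` is hypothetical, and the extension table is deliberately small; extend it for the formats you actually send.

```rust
/// Guesses the MIME type of an image from the extension of its URI.
/// Hypothetical helper: returns None for extensions it does not know.
fn mime_type_for_uri(uri: &str) -> Option<&'static str> {
    // Take the text after the last '.' and compare it case-insensitively.
    let (_, extension) = uri.rsplit_once('.')?;
    match extension.to_ascii_lowercase().as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        _ => None,
    }
}
```

For the cat image used in this post, the helper returns `Some("image/jpeg")`, which matches the value in the Rust payload above.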

Summary

Admittedly, using the Gemini REST API is not as easy as using a language with Gemini SDK support. However, with a little bit of work to make the right request and parse the response, it’s straightforward to talk to Gemini through the REST API from any other language. To see more Gemini samples, you can check out our repos on GitHub.

As always, if you have any questions or feedback, feel free to reach out to me on Twitter @meteatamel.