Introduction
Back in December, Google announced Gemini, its most capable and general model so far, available from Google AI Studio and Google Cloud Vertex AI.
The Try the Vertex AI Gemini API documentation page shows instructions on how to use the Gemini API from Python, Node.js, Java, and Go.
That’s great, but what about other languages?
Even though there are no official SDKs/libraries for other languages yet, you can use the Gemini REST API to access the same functionality with a little bit more work on your part.
In this blog post, I want to take a look at an example of how to use the Gemini REST API from languages that don’t have SDK support yet: C# and Rust in this case.
Gemini REST API
There are currently two models available in the Gemini API:
- Gemini Pro model (gemini-pro): Fine-tuned model to handle natural language tasks such as classification, summarization, extraction, and writing.
- Gemini Pro Vision model (gemini-pro-vision): Multimodal model that supports adding image and video prompts for a text response.
The Gemini API page is a great resource for learning how to make the right HTTP requests with the right parameters.
For example, to send a multimodal (text + image) request to Gemini, you’d make an HTTP POST to the gemini-pro-vision model with the right parameters in the request body:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro-vision:streamGenerateContent
{
  "contents": {
    "role": "user",
    "parts": [
      {
        "fileData": {
          "mimeType": "image/jpeg",
          "fileUri": "gs://cloud-samples-data/ai-platform/flowers/daisy/10559679065_50d2b16f6d.jpg"
        }
      },
      {
        "text": "Describe this picture."
      }
    ]
  },
  "safety_settings": {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_LOW_AND_ABOVE"
  },
  "generation_config": {
    "temperature": 0.4,
    "topP": 1.0,
    "topK": 32,
    "maxOutputTokens": 2048
  }
}
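The endpoint in that request is built from a project ID, a location, and a model name. Here is a minimal sketch in Rust of how you might assemble it, assuming other locations and models follow the same pattern as the URL above. The function name and its parameters are mine for illustration and not part of any SDK:

```rust
/// Builds the Vertex AI endpoint URL for a streaming Gemini request.
/// Illustrative helper: it assumes the host name and the path use the
/// same location, as in the us-central1 URL shown above.
fn endpoint_url(project_id: &str, location: &str, model: &str) -> String {
    format!(
        "https://{loc}-aiplatform.googleapis.com/v1/projects/{proj}/locations/{loc}/publishers/google/models/{model}:streamGenerateContent",
        loc = location,
        proj = project_id,
        model = model
    )
}
```

Calling endpoint_url("PROJECT_ID", "us-central1", "gemini-pro-vision") gives back the same URL as in the POST line above.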
A couple of things to watch out for:
- You need to get and set an authentication token with your request.
- The responses come in batches (see sample responses), so you need to extract text from each batch and combine them to get the full text.
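To make the second point concrete, here is a rough sketch in Rust of combining the text from each batch. The structs are simplified, hypothetical stand-ins for the response shape (batches containing candidates, whose content has parts with optional text); a real client would deserialize them from the JSON response instead:

```rust
// Simplified, hypothetical stand-ins for the shapes in a streamed response.
// A real client would deserialize these from the JSON response body.
struct Part {
    text: Option<String>,
}

struct Content {
    parts: Vec<Part>,
}

struct Candidate {
    content: Content,
}

struct Chunk {
    candidates: Vec<Candidate>,
}

/// Concatenates the text of every part, in order, across all chunks.
/// Parts without text are skipped.
fn combine_text(chunks: &[Chunk]) -> String {
    chunks
        .iter()
        .flat_map(|chunk| chunk.candidates.iter())
        .flat_map(|candidate| candidate.content.parts.iter())
        .filter_map(|part| part.text.as_deref())
        .collect()
}
```

The order of the batches matters here: the pieces are joined exactly in the order they arrive, with no separator added between them.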
Now, let’s take a look at how to make these requests from actual code.
Gemini REST API from C#
Let’s say we want Gemini to describe an image in Cloud Storage for us from my default language: C#.
Define prompt and image
Let’s define the prompt and the image stored on Cloud Storage:
string text = "Describe this image";
string imageUrl = "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg";
Console.WriteLine($"Text: {text}");
Console.WriteLine($"ImageUrl: {imageUrl}");
It’s an image of a cat 🙂
Construct the request payload
Construct the request payload with the prompt and the image URL:
private static string GeneratePayload(string text, string imageUrl)
{
    var payload = new
    {
        contents = new
        {
            role = "USER",
            parts = new object[] {
                new { text = text },
                new {
                    file_data = new {
                        // The sample image is a .jpg file, so use the JPEG MIME type
                        mime_type = "image/jpeg",
                        file_uri = imageUrl
                    }
                }
            }
        },
        generation_config = new
        {
            temperature = 0.4,
            top_p = 1,
            top_k = 32,
            max_output_tokens = 2048
        }
    };
    return JsonConvert.SerializeObject(payload);
}
Send the request with auth token
Get an authentication token and send the HTTP request:
private async static Task<string> SendRequest(string payload)
{
    GoogleCredential credential = GoogleCredential.GetApplicationDefault();
    var handler = credential.ToDelegatingHandler(new HttpClientHandler());
    using HttpClient httpClient = new(handler);
    httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    HttpResponseMessage response = await httpClient.PostAsync(EndpointUrl,
        new StringContent(payload, Encoding.UTF8, "application/json"));
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}
Parse the response
Receive the HTTP response from Gemini and deserialize the JSON response body.
The JSON response body will have a list of candidates to parse through (see response body).
It makes sense to create a GeminiResponse.cs class to capture this and make our lives easier in JSON deserialization:
string payload = GeneratePayload(text, imageUrl);
string response = await SendRequest(payload);
var geminiResponses = JsonConvert.DeserializeObject<List<GeminiResponse>>(response);
Finally, use some LINQ magic to combine the text in each batch into a final text:
string fullText = string.Join("", geminiResponses
    .SelectMany(response => response.Candidates)
    .SelectMany(candidate => candidate.Content.Parts)
    .Select(part => part.Text));
Console.WriteLine($"Response: {fullText}");
You can see the full sample in my GitHub repo in GenerateTextFromImageGcs.cs.
Run the sample
Run the sample:
dotnet run
Text: Describe this image
ImageUrl: gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg
Response: A cat is walking in the snow. The cat is gray and white, and it has a long tail. The cat is looking at the camera. The snow is white and it is covering the ground.
That’s a pretty good description of the image from Gemini!
Gemini REST API from Rust
Of course, you can use the REST API from any language. My colleague from the Chrome DevRel team, André Bandarra, rewrote my sample in Rust. It follows the same pattern of getting an auth token, generating the request with the right parameters and combining the text from the response:
let authentication_manager = AuthenticationManager::new().await?;
let scopes = &["https://www.googleapis.com/auth/cloud-platform"];
let token = authentication_manager.get_token(scopes).await?;
let prompt = "Describe this image";
let image_url = "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg";
let payload = GenerateContentRequest {
    contents: vec![Content {
        role: "user".to_string(),
        parts: vec![
            Part::Text(prompt.to_string()),
            Part::FileData {
                mime_type: "image/jpeg".to_string(),
                file_uri: image_url.to_string(),
            },
        ],
    }],
    generation_config: Some(GenerationConfig {
        max_output_tokens: Some(2048),
        temperature: Some(0.4),
        top_p: Some(1.0),
        top_k: Some(32),
        ..Default::default()
    }),
    tools: None,
};
let resp = reqwest::Client::new()
    .post(&endpoint_url)
    .bearer_auth(token.as_str())
    .json(&payload)
    .send()
    .await?;
let response = resp.json::<GenerateContentResponse>().await?;
response.0.iter().for_each(|chunk| {
    chunk.candidates.iter().for_each(|candidate| {
        candidate.content.parts.iter().for_each(|part| {
            if let Part::Text(text) = part {
                print!("{}", text);
            }
        });
    });
});
You can check out the full sample in his repo in generate-text-from-image-gcs.rs.
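One detail that is easy to get wrong in the payload is the MIME type, which should match the image you point to (image/jpeg for the .jpg file used here). As a small sketch, a helper like the one below could derive it from the file extension. The function is hypothetical, is not part of either sample, and only covers JPEG and PNG:

```rust
/// Guesses an image MIME type from the extension of a URI.
/// Hypothetical helper: it only knows about JPEG and PNG and returns
/// None for anything else, including URIs without an extension.
fn guess_image_mime_type(uri: &str) -> Option<&'static str> {
    // Take whatever follows the last '.' and compare it case-insensitively.
    let extension = uri.rsplit_once('.')?.1.to_ascii_lowercase();
    match extension.as_str() {
        "jpg" | "jpeg" => Some("image/jpeg"),
        "png" => Some("image/png"),
        _ => None,
    }
}
```

Returning an Option rather than a default value means the caller has to decide what to do with an unknown extension instead of silently sending a wrong MIME type.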
Summary
Admittedly, using the Gemini REST API is not as easy as using a language that has Gemini SDK support. However, with a little bit of work in making the right request and parsing the response, it’s straightforward to talk to Gemini with the REST API from any other language. To see more Gemini samples, you can check out our repos on GitHub.
As always, if you have any questions or feedback, feel free to reach out to me on Twitter @meteatamel.