Gemini 1.5 Flash — fast inference
Google today announced the release of the new Gemini 1.5 Flash model with impressively low latency. This tutorial shows, step by step, how to get started.
Introduction
Google released the new Gemini 1.5 Flash model today at the I/O conference. It is meant for high-volume “agentic” usage, which requires:
- Vision perception
- Low latency (1–2 seconds)
- Affordable token pricing
- Large context windows
I think Gemini 1.5 Flash offers a working solution for a lot of developers. A Gemini API key can be generated under a free or paid plan in Google AI Studio¹.
Let’s get started!
Gemini-1.5 Flash
Let’s start by importing the required Python packages.
I will then load the Gemini API key, saved as an environment variable in Windows. If you have never set an environment variable on Windows or Mac, a quick search will show you how. It takes a minute to set up, plus a restart of the terminal (or laptop) so the change takes effect.
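For reference, setting the variable might look like this (a sketch: the variable name `gemini_key` matches the Python code below, and `YOUR_API_KEY` is a placeholder for your actual key):

```shell
# macOS/Linux: add this line to ~/.zshrc or ~/.bashrc, then open a new terminal
export gemini_key="YOUR_API_KEY"

# Windows (Command Prompt), persists for new sessions:
#   setx gemini_key "YOUR_API_KEY"
```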
#!pip install google-generativeai  # run once

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["gemini_key"])  # read the API key from the environment
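If the variable is missing, `os.environ[...]` raises a bare `KeyError`. A small sketch of a friendlier loader (the `load_api_key` helper is my own, not part of the SDK):

```python
import os

def load_api_key(env_var: str = "gemini_key") -> str:
    """Return the API key from the environment, with a clear error if unset."""
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(
            f"Set the {env_var!r} environment variable before running this script."
        )
    return key

# Then configure the SDK with it:
#   genai.configure(api_key=load_api_key())
```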
Let’s define the prompts we will use:
prompt = "Tell me a joke"
system_prompt = "Tell funny jokes"
We will define additional API call parameters, the most important of which is max_output_tokens. These parameters are explained in the model reference².
generation_config = {
    "max_output_tokens": 400,
    "response_mime_type": "text/plain",
}
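For illustration, here is a fuller configuration sketch. `temperature`, `top_p`, and `top_k` are standard sampling parameters documented in the model reference²; the values below are just example settings, not recommendations:

```python
# Example generation config with common optional sampling parameters.
generation_config = {
    "max_output_tokens": 400,       # hard cap on the response length
    "temperature": 1.0,             # sampling randomness (lower = more deterministic)
    "top_p": 0.95,                  # nucleus-sampling probability mass
    "top_k": 40,                    # sample from the 40 most likely tokens
    "response_mime_type": "text/plain",
}
```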
We can now make the API call using these parameters, the previously configured API key, and our prompts:
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash-latest",
    generation_config=generation_config,
    system_instruction=system_prompt,
)

chat_session = model.start_chat(history=[])

response = chat_session.send_message(prompt)
The API has processed our call and returned a response, so we can simply print the output.
print(response.text)
The output will look similar to the one below.
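Since low latency is the main selling point here, it is easy to measure it yourself. A small sketch (the `timed_send` helper is my own; `chat_session` and `prompt` come from the code above):

```python
import time

def timed_send(send_fn, prompt):
    """Call send_fn(prompt) and return (response, elapsed_seconds)."""
    start = time.perf_counter()
    response = send_fn(prompt)
    return response, time.perf_counter() - start

# Usage with the chat session above:
#   response, seconds = timed_send(chat_session.send_message, prompt)
#   print(f"{seconds:.2f}s  {response.text}")
```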
Let’s wrap up this tutorial!
Conclusions
We have now used Gemini 1.5 Flash via Google AI Studio.
In my tests, I found Gemini 1.5 Flash to be one of the fastest APIs on the market. For example, I received responses to multiple API calls within 1–2 seconds, a latency that previously only Groq has been able to offer.
Therefore, Gemini 1.5 Flash is an excellent vision-language model (VLM) for agentic workflows.
References
[1] Google AI Studio. Google. https://aistudio.google.com/app/
[2] Gemini model reference. Google. https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini