Gemini 1.5 Flash — fast inference
Google today announced the release of the new Gemini 1.5 Flash model with impressively low latency. This tutorial shows, step by step, how to get started.
Introduction
Google released the new Gemini 1.5 Flash model today at the I/O conference. It is meant for high-volume “agentic” usage, which requires:
- Vision perception
- Low latency (1–2 seconds)
- Affordable token pricing
- Large context windows
I think Gemini 1.5 Flash offers a working solution for a lot of developers. A Gemini API key can be generated under a free or paid plan in Google AI Studio¹.
Let’s get started!
Gemini-1.5 Flash
Let’s start by importing the required Python packages.
I will then load the Gemini API key, saved as an environment variable in Windows. If you have never set an environment variable on Windows or Mac, a quick search will show you how. It takes a minute to set up, plus a restart of the terminal (or laptop) so the change takes effect.
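For reference, setting the variable might look like this (a sketch: the variable name `gemini_key` matches the Python code below, and `YOUR_API_KEY` is a placeholder for your actual key):

```shell
# macOS/Linux: add this line to ~/.zshrc or ~/.bashrc, then open a new terminal
export gemini_key="YOUR_API_KEY"

# Windows (Command Prompt), persists for new sessions:
#   setx gemini_key "YOUR_API_KEY"
```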
#!pip install google-generativeai  # run once

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["gemini_key"])  # read the API key from the environment
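If the variable is missing, `os.environ[...]` raises a bare `KeyError`. A small sketch of a friendlier loader (the `load_api_key` helper is my own, not part of the SDK):

```python
import os

def load_api_key(env_var: str = "gemini_key") -> str:
    """Return the API key from the environment, with a clear error if unset."""
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(
            f"Set the {env_var!r} environment variable before running this script."
        )
    return key

# Then configure the SDK with it:
#   genai.configure(api_key=load_api_key())
```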
Let’s define the prompts we will use:
prompt = "Tell me a joke"
system_prompt = "Tell funny jokes"
We will define additional API call parameters, the most important of which is max_output_tokens. These parameters are explained in the model reference².
generation_config = {
    "max_output_tokens": 400,
    "response_mime_type": "text/plain",
}
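For illustration, here is a fuller configuration sketch. `temperature`, `top_p`, and `top_k` are standard sampling parameters documented in the model reference²; the values below are just example settings, not recommendations:

```python
# Example generation config with common optional sampling parameters.
generation_config = {
    "max_output_tokens": 400,       # hard cap on the response length
    "temperature": 1.0,             # sampling randomness (lower = more deterministic)
    "top_p": 0.95,                  # nucleus-sampling probability mass
    "top_k": 40,                    # sample from the 40 most likely tokens
    "response_mime_type": "text/plain",
}
```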
We can now make the API call using these parameters, the previously configured API key, and our prompts:
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash-latest",
    generation_config=generation_config,
    system_instruction=system_prompt,
)

chat_session = model.start_chat(history=[])

response = chat_session.send_message(prompt)
The API has processed our call and returned a response, so we can simply print the output.
print(response.text)
The output will look similar to the one below.
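Since low latency is the main selling point here, it is easy to measure it yourself. A small sketch (the `timed_send` helper is my own; `chat_session` and `prompt` come from the code above):

```python
import time

def timed_send(send_fn, prompt):
    """Call send_fn(prompt) and return (response, elapsed_seconds)."""
    start = time.perf_counter()
    response = send_fn(prompt)
    return response, time.perf_counter() - start

# Usage with the chat session above:
#   response, seconds = timed_send(chat_session.send_message, prompt)
#   print(f"{seconds:.2f}s  {response.text}")
```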
Let’s wrap up this tutorial!
Conclusions
We have now used Gemini 1.5 Flash via Google AI Studio.
In my tests, I found Gemini 1.5 Flash to be one of the fastest APIs on the market. For example, I received responses to multiple API calls within 1–2 seconds, a latency that previously only Groq has been able to offer.
Therefore, Gemini 1.5 Flash is an excellent vision-language model (VLM) for agentic workflows.
References
[1] Google AI Studio. Google. https://aistudio.google.com/app/
[2] Gemini model reference. Google. https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini