Autonomous agent perception with Gemini 1.5 Flash

Teemu Maatta
2 min read · May 15, 2024

Let’s build faster perception for Autonomous Agents.

Photo by Matthew Wiebe on Unsplash

Introduction

Yesterday, the entire EU got access to Gemini 1.5 Flash via Google AI Studio.

In this tutorial, I demonstrate one of the state-of-the-art capabilities for autonomous agents: visual perception.

Let’s get started!

Visual perception with Gemini 1.5 Flash

I start by importing the libraries and configuring the API key.

#!pip install google-generativeai
import os
import time

import google.generativeai as genai

# The API key is read from the environment variable "gemini_key".
genai.configure(api_key=os.environ["gemini_key"])
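
If the environment variable is missing, the lookup above fails with a bare `KeyError`. As a minimal sketch, the lookup could be wrapped in a small helper that gives a clearer message. The name `read_api_key` is my own and is not part of the Google library.

```python
import os


def read_api_key(env_var="gemini_key"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper, not part of google-generativeai.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the environment variable '{env_var}' to your Gemini API key."
        )
    return key


# Usage with the tutorial's setup:
# genai.configure(api_key=read_api_key())
```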

I continue by defining the variables:

# Local path to the image, which is on Wikimedia Commons:
# https://upload.wikimedia.org/wikipedia/commons/1/12/Haiku_de_L._M._Panero.jpg
# License ("Free content license"): https://commons.wikimedia.org/wiki/Commons:Licensing
url = "Haiku_de_L._M._Panero.jpg"
prompt = "Give opinion"
system_prompt = "Make an educated opinion"

I define a helper function recommended by Google, which uploads the image to the API correctly.

def upload_file(path, mime_type=None):
    # Upload the local file to the Gemini API and return the file handle.
    file = genai.upload_file(path, mime_type=mime_type)
    print(f"Uploaded file '{file.display_name}' as: {file.uri}")
    return file
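
The MIME type does not have to be typed by hand. As a sketch, Python's standard `mimetypes` module can guess it from the file extension. The helper name `guess_image_mime_type` is hypothetical, and it only looks at the extension, not at the file contents.

```python
import mimetypes


def guess_image_mime_type(path):
    """Guess an image MIME type from the file extension.

    Hypothetical helper; raises ValueError when the extension is unknown
    or does not belong to an image format.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for: {path}")
    return mime_type


# Usage with the tutorial's helper:
# image_file = upload_file(url, mime_type=guess_image_mime_type(url))
```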

I continue by defining the generation settings and the model for the API call.

generation_config = {
    "temperature": 1,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash-latest",
    generation_config=generation_config,
    system_instruction=system_prompt,
)
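
For perception tasks where repeatable answers matter, a lower temperature may be preferable to the value of 1 used here. The sketch below builds the same dictionary with a basic range check. The helper name `make_generation_config` is hypothetical, and the 0.0 to 2.0 temperature range is my assumption about what the model accepts, so check it against the current documentation.

```python
def make_generation_config(temperature=1.0, max_output_tokens=8192):
    """Build a generation config dict like the one used in the tutorial.

    Hypothetical helper. The 0.0-2.0 temperature range is an assumption.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is assumed to lie between 0.0 and 2.0")
    if max_output_tokens < 1:
        raise ValueError("max_output_tokens must be at least 1")
    return {
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
        "response_mime_type": "text/plain",
    }


# Usage: a more deterministic configuration for the same model.
# generation_config = make_generation_config(temperature=0.2)
```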

Next, I upload the image to the Google Cloud, where it is stored for free for 48 hours and then deleted automatically. It can also be deleted manually before then.

image_file = upload_file(url, mime_type="image/jpeg")

With the model defined and the image uploaded to Google Cloud, I can now send the API call:

chat_session = model.start_chat(
    history=[
        {
            "role": "user",
            "parts": [
                image_file,
            ],
        },
    ]
)

response = chat_session.send_message(prompt)
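
When no conversation history is needed, the image and the prompt can also be sent together in a single request with `generate_content`. This is a sketch of that alternative, not the approach used in this tutorial. The function name `describe_image` is hypothetical.

```python
def describe_image(model, image_file, prompt):
    """Send the image and the prompt in one request and return the text.

    Hypothetical helper. `model` is expected to be a genai.GenerativeModel,
    or any object with a compatible generate_content method.
    """
    response = model.generate_content([image_file, prompt])
    return response.text


# Usage with the objects defined in the tutorial:
# print(describe_image(model, image_file, prompt))
```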

Finally, I print the text of the response object.

print(response.text)

The following message appears:

Gemini 1.5 Flash: Vision perception.
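
To see how fast the model responds in your own setup, the call can be timed. The sketch below uses `time.perf_counter` from the standard library. The helper name `timed_call` is hypothetical, and the measured time includes network latency, not only model inference.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage with the tutorial's chat session:
# response, seconds = timed_call(chat_session.send_message, prompt)
# print(f"Response received in {seconds:.2f} s")
```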

File data

The Gemini API stores image files for free for 48 hours, after which they are automatically deleted from the cloud.

Can I make an API call without uploading the same file again? The answer is: Yes!

To make the API call, I only need the name that the file was given in the Google Cloud.

file = genai.get_file(name=image_file.name)
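
Because files expire after 48 hours, the lookup can fail for an older file. A minimal sketch of a fallback is shown below: try to fetch the stored file and upload it again if that fails. The helper name `get_or_upload` is hypothetical. I am not certain which exception the API raises for an expired file, so the sketch catches any exception, which is a simplification that would also hide other errors such as a wrong API key.

```python
def get_or_upload(name, path, get_fn, upload_fn):
    """Return the stored file if it can be fetched, otherwise upload it again.

    Hypothetical helper. In the tutorial's context, get_fn would be
    genai.get_file and upload_fn would be the upload_file helper.
    """
    try:
        return get_fn(name=name)
    except Exception:
        # Assumption: any failure is treated as "the file is gone".
        return upload_fn(path)


# Usage with the tutorial's objects:
# file = get_or_upload(image_file.name, url, genai.get_file, upload_file)
```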

I can now repeat the API call without uploading the image file again.

chat_session = model.start_chat(
    history=[
        {
            "role": "user",
            "parts": [
                file,
            ],
        },
    ]
)

response = chat_session.send_message(prompt)

print(response.text)
print(chat_session.history)
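
Uploaded files can also be removed before the 48 hours are up. As a minimal sketch, the library's `genai.delete_file` can be called with the file name. The wrapper `cleanup_files` below is a hypothetical helper that deletes several files and reports the ones that could not be deleted.

```python
def cleanup_files(file_names, delete_fn):
    """Delete each named file and return the names that could not be deleted.

    Hypothetical helper. In the tutorial's context, delete_fn would be
    genai.delete_file.
    """
    failed = []
    for name in file_names:
        try:
            delete_fn(name)
        except Exception:
            # Keep going so that one failure does not block the other files.
            failed.append(name)
    return failed


# Usage with the tutorial's uploaded image:
# cleanup_files([image_file.name], genai.delete_file)
```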

This wraps up our tutorial.

Conclusions

Gemini 1.5 Flash offers incredibly fast responses.

We have now used the Gemini API for visual understanding, together with a system prompt and a user prompt.

References

[1] AI Studio. Google. 2024. https://aistudio.google.com/app/

Teemu Maatta

Author (+200k views) in Artificial General Intelligence. Autonomous Agents. Robotics. Madrid.