Autonomous agent perception with Gemini 1.5 Flash
Let’s build faster perception for autonomous agents.
Introduction
Yesterday, the entire EU got access to Gemini 1.5 Flash via Google AI Studio.
In this tutorial, I demonstrate one capability that matters for autonomous agents: visual perception, that is, sending an image to the model and getting a text response back.
Let’s get started!
Visual perception with Gemini 1.5 Flash
I start by importing the libraries and configuring the API key, which is read from the gemini_key environment variable.
#!pip install google-generativeai
import os
import time
import google.generativeai as genai
genai.configure(api_key=os.environ["gemini_key"])
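If the gemini_key environment variable is not set, the line above fails with a bare KeyError. Here is a minimal sketch of a friendlier lookup. The helper name load_api_key and its env parameter are my own illustration and not part of the SDK.

```python
import os


def load_api_key(var_name="gemini_key", env=None):
    """Return the API key stored in `var_name`, or raise a readable error.

    `env` defaults to os.environ; passing a dict makes the helper easy to try out.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name!r} environment variable to your Gemini API key."
        )
    return key
```

It can then be used as genai.configure(api_key=load_api_key()).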
I continue by defining the variables:
# Local path to the image file. The image comes from Wikimedia Commons:
# https://upload.wikimedia.org/wikipedia/commons/1/12/Haiku_de_L._M._Panero.jpg
# It is available under a free content license: https://commons.wikimedia.org/wiki/Commons:Licensing
url = "Haiku_de_L._M._Panero.jpg"
prompt = "Give opinion"
system_prompt = "Make an educated opinion"
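Note that url is a local file path here, so the image has to exist on disk before it can be uploaded. A minimal sketch for fetching it once with the standard library follows. The function names local_name and fetch_image and the User-Agent string are assumptions of mine. As far as I know, Wikimedia asks clients to identify themselves with a descriptive User-Agent, so the sketch sends one.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(source_url):
    """Derive the local filename from the last path segment of the URL."""
    return os.path.basename(urlparse(source_url).path)


def fetch_image(source_url, user_agent="gemini-tutorial-example/0.1"):
    """Download the image if it is not on disk yet and return its local path."""
    path = local_name(source_url)
    if not os.path.exists(path):
        # The User-Agent value is a placeholder; replace it with your own.
        request = urllib.request.Request(
            source_url, headers={"User-Agent": user_agent}
        )
        with urllib.request.urlopen(request) as response, open(path, "wb") as out:
            out.write(response.read())
    return path
```

Calling fetch_image with the Wikimedia Commons address of the image returns "Haiku_de_L._M._Panero.jpg", which can be assigned to url.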
I define a helper function recommended by Google, which uploads the image to the API correctly.
def upload_file(path, mime_type=None):
    file = genai.upload_file(path, mime_type=mime_type)
    print(f"Uploaded file '{file.display_name}' as: {file.uri}")
    return file
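The helper takes an optional mime_type. Instead of hard-coding it per file, it can be derived from the file extension. Here is a small sketch using Python's mimetypes module. The function name guess_image_mime and the fallback value are my own choices for illustration.

```python
import mimetypes


def guess_image_mime(path, default="application/octet-stream"):
    """Guess the MIME type from the file extension, falling back to `default`."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or default
```

For the image in this tutorial, guess_image_mime("Haiku_de_L._M._Panero.jpg") gives "image/jpeg", which can be passed on as upload_file(path, mime_type=...).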
I continue by defining the model and its generation settings.
generation_config = {
    "temperature": 1,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash-latest",
    generation_config=generation_config,
    system_instruction=system_prompt,
)
Next, I upload the image to Google Cloud, where it is stored for free for 48 hours and then deleted automatically. The file can also be deleted manually before that.
image_file = upload_file(url, mime_type="image/jpeg")
With the model defined and the image uploaded to Google Cloud, I can now send the API call:
chat_session = model.start_chat(
    history=[
        {
            "role": "user",
            "parts": [
                image_file,
            ],
        },
    ]
)
response = chat_session.send_message(prompt)
Finally, I print the text of the response object.
print(response.text)
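Since speed is the point of using Flash, it helps to measure how long a call takes. The time module is already imported at the top. A minimal sketch of a timing wrapper follows. The name timed is hypothetical, and the wrapper works with any callable.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, response, seconds = timed(chat_session.send_message, prompt) returns the same response plus the wall-clock duration. That duration includes network latency, so it varies from run to run.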
This prints the model's opinion of the image.
File data
The Gemini API stores uploaded image files for free for 48 hours, after which they are automatically deleted from the cloud.
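To decide whether a stored file can still be reused, the 48-hour window can be tracked locally. The sketch below assumes that you record the upload time yourself. The names RETENTION, expires_at and is_expired are illustrative and do not come from the SDK.

```python
from datetime import datetime, timedelta, timezone

# Retention window as stated in this tutorial.
RETENTION = timedelta(hours=48)


def expires_at(uploaded_at):
    """Return the moment the uploaded file is expected to be deleted."""
    return uploaded_at + RETENTION


def is_expired(uploaded_at, now=None):
    """True once the retention window has passed (now defaults to current UTC time)."""
    now = datetime.now(timezone.utc) if now is None else now
    return now >= expires_at(uploaded_at)
```

If is_expired returns True, the image has to be uploaded again before the next API call.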
Can I make an API call without uploading the same file again? The answer is: Yes!
To make the API call, I only need to know the name of the file in Google Cloud.
file = genai.get_file(name=image_file.name)
I can now make the API call again without uploading the same image file.
chat_session = model.start_chat(
    history=[
        {
            "role": "user",
            "parts": [
                file,
            ],
        },
    ]
)
response = chat_session.send_message(prompt)
print(response.text)
print(chat_session.history)
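Once the image is no longer needed, it can be removed without waiting for the automatic deletion. The google-generativeai SDK provides genai.delete_file for this. The sketch below wraps that call in a small function that takes the client module as a parameter, so it can be tried out without network access. The wrapper name delete_uploaded_file is my own.

```python
def delete_uploaded_file(file_name, client):
    """Delete an uploaded file by name and return the name that was deleted.

    `client` is expected to expose delete_file(name), as the
    google.generativeai module does.
    """
    client.delete_file(file_name)
    return file_name
```

In this tutorial the call would be delete_uploaded_file(image_file.name, genai). After that, genai.get_file can no longer find the file, and it has to be uploaded again.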
This wraps up our tutorial.
Conclusions
Gemini 1.5 Flash responds quickly, which makes it a good fit for agent perception.
We have now used the Gemini API for visual understanding, together with a system prompt and a user prompt.
References
[1] AI Studio. Google. 2024. https://aistudio.google.com/app/