Tutorial — Autonomous Agents
Llama 3 via Groq API
Groq API serves the new Llama 3 model at an incredibly fast tokens-per-second rate.
Introduction
Llama 3 is a new State-of-the-Art model from Meta AI, which rivals Claude 3 and GPT-4 Turbo in performance.
In this tutorial, we will build an “internal memory” module for Autonomous Agents with Llama 3.
I will use the Groq API for inference in this tutorial, because it offers:
- The fastest inference speed
- A free tier
- Competing models, such as Mixtral from Mistral AI, within the same API
We could alternatively run Llama 3 in other ways (a minimal local-inference sketch follows after this list):
- Run the smaller 8B model locally on an Nvidia RTX 3060 12GB (around $300),
- Rent cloud GPUs, starting from around $0.2 per hour for the cheapest GPUs,
- Use alternative cloud service providers such as AWS/GCP/Azure.
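For local inference, a minimal sketch with the Ollama Python client could look like the following. This is only an optional illustration and assumes Ollama is installed, the llama3 model has been pulled with "ollama pull llama3", and the ollama package is available (pip install ollama); it is not required for the rest of this tutorial.
# Optional: run Llama 3 locally with Ollama instead of the Groq API.
# Assumes `ollama pull llama3` has been run beforehand.
import ollama
local_response = ollama.chat(
    model="llama3",  # the 8B instruct model by default
    messages=[{"role": "user", "content": "Describe today's weather in one sentence."}],
)
print(local_response["message"]["content"])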
Llama 3
I will start by installing and importing the libraries.
!pip install groq  # first time only
import os
import json
from datetime import datetime
from groq import Groq # Official Groq API Python package
Let’s import the API key from the environment variable.
client = Groq(
    api_key=os.environ.get("groq_key"),
)
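If the "groq_key" environment variable is not already set, one option is to define it in the notebook before creating the client; the value below is only a placeholder.
os.environ["groq_key"] = "<your-groq-api-key>"  # placeholder, set before creating the client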
I will next define the variables that I want to associate with the “internal memory” of the autonomous agent:
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
persona = "Teemu"
home_location = "Madrid"
I have also defined here the memory structure that I want the Llama 3 model to follow:
schema = {
    "type": "object",
    "properties": {
        "date": {
            "type": "string",
            "description": "The current date (YYYY-MM-DD HH:MM:SS format)"
        },
        "me": {
            "type": "string",
            "description": "My name"
        },
        "people": {
            "type": "array",
            "description": "List of people involved in the event (optional)"
        },
        "feeling": {
            "type": "string",
            "description": "The main character's feeling during the event"
        },
        "short_description": {
            "type": "string",
            "description": "A brief description of the event"
        },
        "weather": {
            "type": "string",
            "description": "Current weather conditions (e.g., sunny, rainy, cloudy)"
        },
        "location": {
            "type": "string",
            "description": "Location name (e.g., city, town)"
        },
        "insight": {
            "type": "string",
            "description": "Additional details or insights about the event"
        },
        "memorable_because": {
            "type": "string",
            "description": "The reason why the event is memorable"
        }
    }
}
I can now save this JSON schema:
with open("my_schema.json", "w") as f:
    json.dump(schema, f)
I can also load this JSON schema back:
with open("my_schema.json", "r") as f:
    my_schema = json.load(f)
I will now define the user prompt:
prompt_by_user = "Today was sunny day and then rained, I went to city to have a dinner with friends and I ate the best Sushi I have ever tested in restaurant called Sushita Cafe, where my friend Paco is a chef."
I will next make the API call, where I include the current time, location, and persona in the system prompt and force Llama 3 to respond with a JSON object.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": f"You are a helpful memory recorder.\nWrite outputs in JSON in schema: {my_schema}.\nCurrent time is {now}.\nI am {persona} living in {home_location} and events may take place in more specific places inside the home location or outside it, so record precisely.\n",
#"content": "You are helpful memory recorder. Write outputs in JSON schema.\n",
#f" The JSON object must use the schema: {json.dumps(my_schema.model_json_schema(), indent=1)}",
        },
        {
            "role": "user",
            "content": prompt_by_user,
        }
    ],
    model="llama3-70b-8192",
    response_format={"type": "json_object"},
)
I can finally print the output:
print(chat_completion.choices[0].message.content)
The result will look like the following. The response integrates the current time and the persona directly into the Llama 3 model output, and it fills in the attributes as if the persona had experienced the event.
The location attribute is particularly interesting. The persona's home location is Madrid, but since the event takes place in a restaurant called “Sushita Cafe”, the model merges these pieces of information into a unified location attribute without any third-party libraries in between.
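To persist this output as the agent's internal memory, we can parse the response, optionally validate it against the schema, and append it to a memory file. The sketch below is one possible approach: it assumes the jsonschema package is installed (pip install jsonschema) and uses a hypothetical memories.jsonl file as the store.
from jsonschema import validate  # assumption: pip install jsonschema
# Parse the JSON object returned by Llama 3.
memory = json.loads(chat_completion.choices[0].message.content)
# Optional sanity check: the output should follow the schema we provided.
validate(instance=memory, schema=my_schema)
# Append the memory to a JSON Lines file acting as the agent's long-term store.
with open("memories.jsonl", "a") as f:
    f.write(json.dumps(memory) + "\n")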
Let’s wrap up the tutorial.
Conclusions
We have now built a memory module for Autonomous Agents using the Llama 3 70B model.
- State-of-the-Art (SOTA) LLM
- Supports JSON mode
- Integrates retrieved information directly into the JSON schema
- Fast inference speed with the Groq API
The best part is that Meta AI will later this year release an even larger Llama 3 version with over 400B parameters, which initial evaluations indicate will be SOTA-level, potentially even beyond the upcoming GPT-5 model.
References
[1] LLMs. Teemu Maatta. https://github.com/tmgthb/LLMs.
[2] Jim Fan. https://twitter.com/DrJimFan/status/1781006672452038756