Tutorial — Autonomous Agents

Llama 3 via Groq API

Groq API serves the new Llama 3 model at incredibly fast tokens-per-second speeds.

Teemu Maatta
4 min read · Apr 21, 2024

Introduction

Llama 3 is a new State-of-the-Art model from Meta AI, which matches the performance of Claude 3 and GPT-4 Turbo.

Nvidia’s team reported a comparison between Claude 3 Opus, GPT-4 Turbo and Llama 3 70B. Source: [2]

In this tutorial, we will build an “internal memory” module for Autonomous Agents with Llama 3.

I will use Groq API in this tutorial for inference, because:

  • Fastest inference speed
  • Free tier
  • Offers competing models, such as Mixtral, within the same API documentation.

We could alternatively run Llama 3:

  • Run the smaller 8B model locally on an Nvidia RTX 3060 12 GB (around $300); see the sketch after this list.
  • Rent cloud GPUs, with the cheapest starting from around $0.2 per hour.
  • Use alternative cloud service providers such as AWS/GCP/Azure.
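
For example, here is a minimal local-inference sketch using the Hugging Face transformers library. This is my own illustration, not part of the Groq workflow: it assumes a recent transformers version, a GPU with enough VRAM, and access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository.

# Local-inference sketch (assumes access to the gated Llama 3 8B Instruct model).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
# Apply the Llama 3 chat template, then generate only the new tokens.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(out[0]["generated_text"])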

Llama 3

I will start by importing libraries.

!pip install groq # first time only
import os
import json
from datetime import datetime
from groq import Groq # Official Groq API Python package

Let’s read the API key from an environment variable.

client = Groq(
    api_key=os.environ.get("groq_key"),
)
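
If the “groq_key” variable is not set yet, one way to provide it (the variable name is simply the one used in this tutorial, and the key value below is a placeholder) is to export it in the shell before starting Python, or to set it in the notebook session:

# In a shell, before launching Python / Jupyter:
# export groq_key="gsk_..."

# Or, for quick experiments inside the notebook (avoid hard-coding real keys):
import os
os.environ["groq_key"] = "gsk_..."  # placeholder value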

I will next define the variables that I want to associate with the “internal memory” of the autonomous agent:

now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
persona = "Teemu"
home_location = "Madrid"

I have also defined the memory structure, which I want the Llama 3 model to follow:

schema = {
    "type": "object",
    "properties": {
        "date": {
            "type": "string",
            "description": "The current date (YYYY-MM-DD HH-MM-SS format)"
        },
        "me": {
            "type": "array",
            "description": "My name"
        },
        "people": {
            "type": "array",
            "description": "List of people involved in the event (optional)"
        },
        "feeling": {
            "type": "string",
            "description": "The main character's feeling during the event"
        },
        "short_description": {
            "type": "string",
            "description": "A brief description of the event"
        },
        "weather": {
            "type": "string",
            "description": "Current weather conditions (e.g., sunny, rainy, cloudy)"
        },
        "location": {
            "type": "string",
            "description": "Location name (e.g., city, town)"
        },
        "insight": {
            "type": "string",
            "description": "Additional details or insights about the event"
        },
        "memorable_because": {
            "type": "string",
            "description": "The reason why the event is memorable"
        }
    }
}

I can now save this JSON-schema:

with open("my_schema.json", "w") as f:
json.dump(schema, f)

I can also load this JSON schema back:

with open("my_schema.json", "r") as f:
my_schema = json.load(f)

I will now define the user prompt:

prompt_by_user = "Today was sunny day and then rained, I went to city to have a dinner with friends and I ate the best Sushi I have ever tested in restaurant called Sushita Cafe, where my friend Paco is a chef."

I will next make the API call. The system prompt includes the current time, location and persona, and I force Llama 3 to respond with a JSON object.

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": f"You are a helpful memory recorder.\nWrite outputs in JSON in schema: {my_schema}.\nCurrent time is {now}.\nI am {persona} living in {home_location} and events may take place in more specific places inside the home location or outside it, so record precisely.\n",
        },
        {
            "role": "user",
            "content": prompt_by_user,
        },
    ],
    model="llama3-70b-8192",
    response_format={"type": "json_object"},
)

I can finally print the output:

print(chat_completion.choices[0].message.content)
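
As a quick sanity check (my own addition, not part of the original flow), the response can be parsed and validated against the saved schema with the jsonschema package:

# pip install jsonschema  (first time only)
from jsonschema import validate, ValidationError

memory = json.loads(chat_completion.choices[0].message.content)

try:
    validate(instance=memory, schema=my_schema)
    print("Memory entry matches the schema.")
except ValidationError as err:
    print(f"Schema mismatch: {err.message}")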

The result will look like the following. The response integrates the current time and the persona directly into the Llama 3 model output, and it fills in the attributes as if the persona had experienced the event themselves.

Llama 3 via Groq API: JSON-schema response provided by the LLM.

The location attribute is particularly interesting. The persona's home location is Madrid, but since the event takes place in a restaurant called “Sushita Cafe”, the model merges both pieces of information into a single location attribute without any third-party libraries in between.
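
To grow this into an accumulating “internal memory”, one simple option (my own sketch, not part of the original code; the file name memories.jsonl is an assumption) is to append each parsed memory entry to a JSON Lines file and reload it when the agent starts:

# Append the parsed memory entry (the `memory` dict from the validation step above).
with open("memories.jsonl", "a") as f:
    f.write(json.dumps(memory) + "\n")

# Later, the agent can reload all recorded memories:
with open("memories.jsonl", "r") as f:
    memories = [json.loads(line) for line in f]
print(f"{len(memories)} memories stored.")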

Let’s wrap up the tutorial.

Conclusions

We have now built a memory module for Autonomous Agents using the Llama 3 70B model.

  • State-of-the-Art (SOTA) LLM
  • Supports JSON mode
  • Integrates retrieved information directly into the JSON schema
  • Fast inference speed with the Groq API

The best part is that Meta AI plans to release an even larger Llama 3 version with 405B parameters later this year, which initial evaluations indicate will be SOTA-level, potentially even beyond the upcoming GPT-5 model.

References

[1] Teemu Maatta. LLMs. https://github.com/tmgthb/LLMs

[2] Jim Fan. https://twitter.com/DrJimFan/status/1781006672452038756

Written by Teemu Maatta

Author (+200k views) in Artificial General Intelligence. Autonomous Agents. Robotics. Madrid.