Tutorial — Autonomous Agents

Llama 3 via Groq API

The Groq API serves the new Llama 3 model at incredibly fast tokens-per-second speeds.

Teemu Maatta
4 min read · Apr 21, 2024
Photo by 2 Bull Photography on Unsplash

Introduction

Llama 3 is a new State-of-the-Art model from Meta AI, which matches the performance of Claude 3 and GPT-4 Turbo.

Nvidia’s team reported a comparison between Claude 3 Opus, GPT-4 Turbo and Llama 3 70B. Source: [2]

In this tutorial, we will build an “internal memory”-module for Autonomous Agents with Llama 3.

I will use the Groq API in this tutorial for inference, because it offers:

  • Fastest inference speed
  • Free tier
  • Competing Mistral models within the same API documentation.

We could alternatively run Llama 3:

  • Run the smaller 8B model locally on an Nvidia RTX 3060 12GB (around $300),
  • Rent cloud GPUs, starting from roughly $0.2 per hour for the cheapest options,
  • Use alternative cloud service providers such as AWS, GCP, or Azure.

Llama 3

I will start by importing libraries.

!pip install groq # first time only
import os
import json
from datetime import datetime
from groq import Groq # Official Groq API Python package

Let’s read the API key from an environment variable and create the client.

client = Groq(
    api_key=os.environ.get("groq_key"),
)
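
If the "groq_key" variable is not yet set in your environment, you can set it for the current session first. The variable name groq_key simply matches the call above, and the key shown here is a placeholder, so replace it with your own:

import os

# Set the key only for this Python process (e.g. in a notebook).
# Alternatively, export it in your shell before launching Python:
#   export groq_key="gsk_..."
os.environ["groq_key"] = "gsk_your_api_key_here"  # placeholder, use your own key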

I will next define the variables that I want to associate with the “internal memory” of the autonomous agent:

now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
persona = "Teemu"
home_location = "Madrid"

I have also defined here the memory structure that I want the Llama 3 model to follow:

schema = {
    "type": "object",
    "properties": {
        "date": {
            "type": "string",
            "description": "The current date (YYYY-MM-DD HH:MM:SS format)"
        },
        "me": {
            "type": "array",
            "description": "My name"
        },
        "people": {
            "type": "array",
            "description": "List of people involved in the event (optional)"
        },
        "feeling": {
            "type": "string",
            "description": "The main character's feeling during the event"
        },
        "short_description": {
            "type": "string",
            "description": "A brief description of the event"
        },
        "weather": {
            "type": "string",
            "description": "Current weather conditions (e.g., sunny, rainy, cloudy)"
        },
        "location": {
            "type": "string",
            "description": "Location name (e.g., city, town)"
        },
        "insight": {
            "type": "string",
            "description": "Additional details or insights about the event"
        },
        "memorable_because": {
            "type": "string",
            "description": "The reason why the event is memorable"
        }
    }
}

I can now save this JSON schema:

with open("my_schema.json", "w") as f:
json.dump(schema, f)

I can also load this JSON schema back:

with open("my_schema.json", "r") as f:
my_schema = json.load(f)
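
Optionally, the model's output can later be checked against this schema. Here is a minimal sketch using the third-party jsonschema package, an extra dependency that is not used elsewhere in this tutorial:

# pip install jsonschema  (optional extra dependency)
import json
from jsonschema import validate, ValidationError

def is_valid_memory(memory_json, schema):
    # Returns True if the LLM output parses as JSON and conforms to the schema.
    try:
        validate(instance=json.loads(memory_json), schema=schema)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False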

I will now define the user prompt:

prompt_by_user = "Today was sunny day and then rained, I went to city to have a dinner with friends and I ate the best Sushi I have ever tested in restaurant called Sushita Cafe, where my friend Paco is a chef."

I will next make the API call. I have included the current time, location, and persona in the system prompt, and I force Llama 3 to respond with a JSON object.

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": f"You are a helpful memory recorder.\nWrite outputs in JSON in schema: {my_schema}.\nCurrent time is {now}.\nI am {persona} living in {home_location} and events may take place in more specific places inside the home location or outside it, so record precisely.\n",
        },
        {
            "role": "user",
            "content": prompt_by_user,
        }
    ],
    model="llama3-70b-8192",
    response_format={"type": "json_object"},
)

I can finally print the output:

print(chat_completion.choices[0].message.content)

The result will look like the following. The response integrates the current time and the persona directly into the Llama 3 model output, and it fills the attributes as if the persona had experienced the event.

Llama 3 via Groq API: JSON-schema response provided by the LLM.

The location attribute is particularly interesting. The persona's home location is Madrid, but since the event takes place in a restaurant called “Sushita Cafe”, the model merges both pieces of information into a unified location attribute without any third-party libraries in between.
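
To turn this response into an actual “internal memory” module, we can parse the JSON output and append it to a simple on-disk memory store. The sketch below is one possible approach; the file name memories.jsonl and the helper save_memory are my own naming, not part of the original code:

import json

def save_memory(response_content, path="memories.jsonl"):
    # Parse the model's JSON output and append it as one line to a JSON Lines file.
    memory = json.loads(response_content)
    with open(path, "a") as f:
        f.write(json.dumps(memory) + "\n")
    return memory

memory = save_memory(chat_completion.choices[0].message.content)
print(memory["location"])  # e.g. "Sushita Cafe, Madrid"

Each call appends one memory entry, so the agent can later reload the file and reason over its accumulated experiences.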

Let’s wrap up the tutorial.

Conclusions

We have now built a memory module for Autonomous Agents using the Llama 3 70B model.

  • State-of-the-Art (SOTA) LLM
  • Supports JSON mode
  • Integrates retrieved information directly into the JSON schema
  • Inference is fast with the Groq API

The best part is that Meta AI will release even larger Llama models with 405B parameters later this year, which initial evaluations indicate will be SOTA-level, potentially even beyond the upcoming GPT-5 model.

References

[1] Teemu Maatta. LLMs. https://github.com/tmgthb/LLMs

[2] Jim Fan. https://twitter.com/DrJimFan/status/1781006672452038756
