Claude 3 offers cost-efficient tool use at SOTA level.

3 min read · Apr 4, 2024
Claude 3 tool use is priced as follows. A tool-use API call is priced exactly the same as a normal API call, but additional “system prompt tokens” are added on top:

  • Claude 3 Opus: 395 tokens
  • Claude 3 Sonnet: 159 tokens
  • Claude 3 Haiku: 264 tokens

All APIs consume additional tokens when tools are used. These extra tokens include tool parameters and tool content blocks.
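To make the pricing rule concrete, here is a minimal sketch of the arithmetic. The overhead figures are the ones listed above; the function name, the example prompt size and the example tool-schema size are hypothetical, and the real count depends on the tool definitions and content blocks you actually send.

```python
# Tool-use system prompt overhead per model, in tokens (figures quoted above).
TOOL_USE_OVERHEAD = {
    "opus": 395,
    "sonnet": 159,
    "haiku": 264,
}


def estimate_input_tokens(model: str, prompt_tokens: int, tool_definition_tokens: int = 0) -> int:
    """Rough estimate: prompt + tool definitions + fixed tool-use overhead."""
    return prompt_tokens + tool_definition_tokens + TOOL_USE_OVERHEAD[model]


# Hypothetical example: a 20-token question plus a 100-token tool schema on Haiku.
print(estimate_input_tokens("haiku", 20, 100))  # 384
```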

Compared with OpenAI, Claude 3 offers equal and arguably better response quality, speed and price in non-tool use cases for LLMs and VLMs.

With today’s general release, Claude 3 matches GPT-4 in tool use performance as well.

So, let’s get started.

Tool use

Let’s import the libraries

#!pip install anthropic #first time only
import anthropic
import os
import base64
import httpx

Read the API key from an environment variable (here named “anthropic_key”) and create the client object.

key = "anthropic_key"
client = anthropic.Anthropic(api_key=os.getenv(key))

We are now ready to make API calls.

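# The original call line was lost; this reconstruction assumes the Messages API.
# The model ID follows the Haiku usage mentioned below; max_tokens is an assumed value.
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The unit of temperature, either \"celsius\" or \"fahrenheit\""
                }
            },
            "required": ["location"]
        }
    }],
    messages=[{"role": "user", "content": "What is the weather like in San Francisco?"}]
)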

I can now read the initial API response.

The initial response includes “stop_reason”: “tool_use”, which means the API is waiting for the user side to send back the tool result.

Claude 3 API response: initial response
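As a rough illustration of how the tool request could be read out of that response, the sketch below looks for the first content block of type "tool_use". It runs offline: `fake_content` is a hand-made stand-in for `response.content` with placeholder values, and `find_tool_use` is a hypothetical helper name, not part of the SDK.

```python
from types import SimpleNamespace


def find_tool_use(content_blocks):
    """Return the first block whose type is "tool_use", or None if there is none."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use":
            return block
    return None


# Stand-in for response.content; the values are placeholders, not real API output.
fake_content = [
    SimpleNamespace(type="text", text="Let me check the weather."),
    SimpleNamespace(
        type="tool_use",
        id="toolu_xxxxxxxxxxxxxxxx",
        name="get_weather",
        input={"location": "San Francisco, CA"},
    ),
]

tool_call = find_tool_use(fake_content)
print(tool_call.name, tool_call.input)
```

With a real response, the same helper would be called as `find_tool_use(response.content)`.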

In essence, we have so far received a user request, and the Claude API has converted it into a response that signals the need to use a tool. So, let’s define a tool response of the kind we could receive back from a weather-tool API:

{
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_xxxxxxxxxxxxxxxx",
            "content": "65 degrees"
        }
    ]
}
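In practice the tool result would come from running your own function with the input Claude requested. The following is only a sketch under that assumption: `get_weather` is a hypothetical stand-in that returns a fixed string instead of calling a weather service, and `make_tool_result_message` is an illustrative helper name.

```python
def get_weather(location: str, unit: str = "fahrenheit") -> str:
    """Hypothetical stand-in for a real weather API; always returns a fixed value."""
    return "65 degrees"


def make_tool_result_message(tool_use_id: str, result: str) -> dict:
    """Wrap a tool's output in the "user" message shape shown above."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": result}
        ],
    }


# Placeholder ID and input; in a real flow both come from the tool_use block.
tool_input = {"location": "San Francisco, CA"}
message = make_tool_result_message("toolu_xxxxxxxxxxxxxxxx", get_weather(**tool_input))
print(message)
```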

I can now send the weather-tool API result back to the Claude API, so it can generate a response for the end user. The API call is the same, except we add two additional entries to “messages”:

  • The previous Claude API response, with the “assistant” role
  • The weather-tool API response, with the “user” role and the “tool_use_id” from the previous Claude API response.

So, the final API call looks like the following:

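# The tool_result must reference the ID of the tool_use block from the first response.
tool_use_id = next(block.id for block in response.content if block.type == "tool_use")

# As above, the call line is reconstructed; model and max_tokens are assumed values.
response_final = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The unit of temperature, either \"celsius\" or \"fahrenheit\""
                }
            },
            "required": ["location"]
        }
    }],
    messages=[
        {"role": "user", "content": "What is the weather like in San Francisco?"},
        {"role": "assistant", "content": response.content},
        {
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": [{"type": "text", "text": "65 degrees"}]
            }]
        }
    ]
)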

The result is the final response:

Claude 3 API: Final response

We have now responded to the end user.
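The two-call flow above can be folded into one small function. This is a sketch, not the SDK's own mechanism: `run_tool_flow` and `fake_create` are hypothetical names, and `fake_create` imitates the API with canned placeholder responses so the example runs without a key.

```python
from types import SimpleNamespace


def run_tool_flow(create, tools, tool_functions, user_text):
    """Run one request -> tool_use -> tool_result -> final answer round trip."""
    messages = [{"role": "user", "content": user_text}]
    response = create(tools=tools, messages=messages)
    if response.stop_reason != "tool_use":
        return response  # the model answered directly, no tool needed
    tool_call = next(b for b in response.content if b.type == "tool_use")
    result = tool_functions[tool_call.name](**tool_call.input)
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_call.id, "content": result}
        ],
    })
    return create(tools=tools, messages=messages)


def fake_create(tools, messages):
    """Offline stand-in for the API call; returns canned placeholder responses."""
    if len(messages) == 1:
        block = SimpleNamespace(
            type="tool_use",
            id="toolu_xxxxxxxxxxxxxxxx",
            name="get_weather",
            input={"location": "San Francisco, CA"},
        )
        return SimpleNamespace(stop_reason="tool_use", content=[block])
    text = SimpleNamespace(type="text", text="It is 65 degrees in San Francisco.")
    return SimpleNamespace(stop_reason="end_turn", content=[text])


final = run_tool_flow(
    fake_create,
    tools=[],
    tool_functions={"get_weather": lambda location, unit="fahrenheit": "65 degrees"},
    user_text="What is the weather like in San Francisco?",
)
print(final.content[0].text)
```

To use it against the real API, one option would be to pass `functools.partial(client.messages.create, model=..., max_tokens=...)` as `create`, with the tool schema shown earlier as `tools`.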


In total, the entire flow with the Claude 3 Haiku model consumed the following, which I think is a very efficient use of tokens:

  • 891 input tokens
  • 106 output tokens
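To put those counts in money terms, here is a back-of-the-envelope sketch. The prices are an assumption: they are the Claude 3 Haiku list prices I am aware of at the time of writing ($0.25 per million input tokens, $1.25 per million output tokens), so check the current pricing page before relying on them. For each call, the counts themselves can be read from `response.usage.input_tokens` and `response.usage.output_tokens`.

```python
# Assumed Claude 3 Haiku list prices in USD per million tokens.
INPUT_PRICE_PER_MTOK = 0.25
OUTPUT_PRICE_PER_MTOK = 1.25


def flow_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a flow given its total input and output token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


cost = flow_cost_usd(891, 106)
print(f"${cost:.6f}")  # $0.000355
```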

I think this is great news, because so far only GPT-4 has offered a sufficient level of performance in tool use.

Claude 3 now offers high-quality tool use at an affordable price.


