Tutorial

Whisper API — Speech to text

Teemu Maatta
4 min read · Mar 1, 2023

OpenAI has released a new Whisper API for speech to text. It addresses the main drawback of the previously released open-source model: it runs fast and needs no GPU.

Photo by Malte Helmhold on Unsplash

Introduction

Today is a busy day at OpenAI¹. The company released not only the ChatGPT API but also the Whisper API². OpenAI had released the Whisper model earlier as open source.

OpenAI now offers access to the model directly through an API call. The API has two endpoints:

  • Transcribe audio into text
  • Translate audio into English text
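As a rough illustration of the two endpoints, here is a minimal sketch that builds the HTTP request with only the Python standard library. The helper name build_whisper_request, the task labels "transcribe" and "translate", and the file name in the usage note are my own assumptions for illustration; the endpoint paths, the "whisper-1" model name, and the multipart "file" and "model" fields follow OpenAI's public API reference. The function only builds the request and does not send it.

```python
import urllib.request
import uuid

API_BASE = "https://api.openai.com/v1/audio"
# Hypothetical task labels mapped to the two endpoint paths.
ENDPOINTS = {"transcribe": "transcriptions", "translate": "translations"}


def build_whisper_request(task, filename, audio_bytes, api_key, model="whisper-1"):
    """Build (but do not send) a multipart/form-data POST for one endpoint."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio_bytes + b"\r\n" + closing
    return urllib.request.Request(
        f"{API_BASE}/{ENDPOINTS[task]}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To actually call the API, one would read an audio file (say, a hypothetical speech.mp3) as bytes, pass it in with a real API key, send the request with urllib.request.urlopen, and read the "text" field from the JSON response. The official openai Python package wraps the same endpoints and is likely the more convenient choice in practice.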

The API is very convenient, and arguably necessary, because the model takes time to run on a laptop. Speed is crucial in practical applications. For example, a longer audio file takes a few minutes to process with the Whisper model on a GPU, but the same audio can take an hour on a CPU. The model becomes truly useful for most real-life applications when it can respond in a matter of milliseconds.

So, how quick is Whisper API?

  • 5 seconds of voice takes 1.07-1.59 seconds to run.
  • 15 minutes of voice takes 41 seconds to run.
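To put these numbers in perspective, the sketch below converts them into a real-time factor, meaning seconds of audio processed per second of waiting. The helper names realtime_factor and timed are my own, and the figures are the rough measurements quoted above, not a benchmark.

```python
import time


def realtime_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds); handy for timing an API call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Rough figures from the measurements above.
print(round(realtime_factor(5, 1.59), 1))        # 3.1 (slowest 5-second run)
print(round(realtime_factor(5, 1.07), 1))        # 4.7 (fastest 5-second run)
print(round(realtime_factor(15 * 60, 41), 1))    # 22.0 (15-minute file)
```

By this measure the long file is processed at a much higher rate than the short clip, which would be consistent with a fixed per-request overhead dominating short requests, although two data points are too few to say for certain.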

The speed is impressive. These results are rough estimates, but I find it remarkable to use a voice API that responds…
