OpenAI has released a new Whisper API for speech-to-text. It addresses the main issue with the previously released open source model: the API runs fast and needs no GPU.
Today is a busy day at OpenAI¹. The company released not only the ChatGPT API but also the Whisper API². OpenAI had earlier released the Whisper model as open source.
OpenAI now offers access to the model directly via an API call. The API has two endpoints:
- Transcribe audio into text
- Translate audio into English text
The API is convenient and necessary, because the model is slow to run on a laptop, and speed is crucial in practical applications. For example, a longer audio file takes a few minutes to process with the Whisper model on a GPU, yet the same audio can take an hour on a CPU. The model becomes truly useful for most real-life applications only when it responds in a matter of milliseconds.
So, how quick is Whisper API?
- 5 seconds of voice takes 1.07-1.59 seconds to run.
- 15 minutes of voice takes 41 seconds to run.
The speed is impressive. These results are rough estimates, but I find it remarkable that a voice API responds at the speeds we are used to seeing from web applications.
How much does it cost?
- $0.006 /minute.
I believe the Whisper API will be used especially for voice search. In that use case, most queries would be under a minute long, so each would cost less than $0.006.
Whisper API supports the following audio formats:
- m4a, mp3, mp4, mpeg, mpga, wav and webm.
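Given that list, a small pre-flight check in plain Python (no API call involved) can reject unsupported files before uploading them; the helper below is my own, not part of the API:

```python
from pathlib import Path

# File extensions accepted by the Whisper API, per the list above.
SUPPORTED = {".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".wav", ".webm"}

def is_supported(path: str) -> bool:
    """Return True if the file extension is one the API accepts."""
    return Path(path).suffix.lower() in SUPPORTED

print(is_supported("meeting.mp3"))  # True
print(is_supported("notes.ogg"))    # False
```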
So, let’s get started with Whisper API!
I start by importing the libraries. Note that you may need to update your local openai library.
#!pip install --upgrade openai  # make sure to use the latest version
import os
import openai
import time  # remove if you do not want to measure the run time
openai.api_key = os.getenv("OPENAI_API_KEY")  # make sure the environment variable is set