OpenAI has released a new Whisper API for speech-to-text. It addresses the main issue with the previously released open source model: the API runs fast and needs no GPU.
Today is a busy day at OpenAI¹. The company released not only the ChatGPT API but also the Whisper API². OpenAI had earlier released the Whisper model as open source.
OpenAI now offers access to the model directly via an API call. The API has two endpoints:
- Transcribe audio into text
- Translate audio into English text
The API is convenient and necessary, because the model is slow to run on a laptop, and speed is crucial in practical applications. For example, a longer audio file takes a few minutes to process with the Whisper model on a GPU, yet the same audio can take an hour on a CPU. The model becomes truly useful for most real-life applications only when it responds in a matter of milliseconds.
So, how quick is Whisper API?
- 5 seconds of voice takes 1.07-1.59 seconds to run.
- 15 minutes of voice takes 41 seconds to run.
The speed is impressive. These results are rough estimates, but I find it remarkable that a voice API responds at the speeds we are used to seeing from web applications.
How much does it cost?
- $0.006 /minute.
I believe the Whisper API will be used especially for voice search. In that use case, most queries would be under a minute long, so each would cost less than $0.006.
Whisper API supports the following audio formats:
- m4a, mp3, mp4, mpeg, mpga, wav and webm.
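Given that list, a small pre-flight check in plain Python (no API call involved) can reject unsupported files before uploading them; the helper below is my own, not part of the API:

```python
from pathlib import Path

# File extensions accepted by the Whisper API, per the list above.
SUPPORTED = {".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".wav", ".webm"}

def is_supported(path: str) -> bool:
    """Return True if the file extension is one the API accepts."""
    return Path(path).suffix.lower() in SUPPORTED

print(is_supported("meeting.mp3"))  # True
print(is_supported("notes.ogg"))    # False
```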
So, let’s get started with Whisper API!
I start by importing the libraries. Note that you may need to update your local openai library.
#!pip install --upgrade openai  # make sure to use the latest version
import os
import openai
import time  # remove if you do not want to measure the run time
openai.api_key = os.getenv("OPENAI_API_KEY")  # make sure the environment variable is set