whisper
Version: 001
The Whisper models are trained for speech recognition and translation tasks. They can transcribe speech audio into text in the language in which it is spoken (automatic speech recognition) and translate it into English (speech translation). Researchers at OpenAI developed the models to study the robustness of speech processing systems trained under large-scale weak supervision. Model version 001 corresponds to Whisper large-v2.
Max request data size: up to 25 MB of audio can be converted from speech to text per API request.
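A client can check a file against this limit before sending it. The following is a minimal sketch, not part of any official SDK: the names `MAX_REQUEST_BYTES`, `size_within_limit`, and `fits_in_one_request` are hypothetical, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption, since the listing does not say whether the unit is decimal or binary.

```python
import os

# Assumption: "25 MB" is interpreted as binary megabytes (25 * 1024 * 1024 bytes).
# The listing does not specify this; use 25_000_000 for a stricter decimal reading.
MAX_REQUEST_BYTES = 25 * 1024 * 1024


def size_within_limit(size_bytes: int, limit: int = MAX_REQUEST_BYTES) -> bool:
    """Return True if a payload of `size_bytes` is at or under the per-request limit."""
    return 0 <= size_bytes <= limit


def fits_in_one_request(path: str, limit: int = MAX_REQUEST_BYTES) -> bool:
    """Return True if the audio file at `path` is small enough for a single request."""
    return size_within_limit(os.path.getsize(path), limit)
```

Files that fail the check would need to be compressed or split into shorter audio segments before upload; how to split depends on the audio format and is outside this sketch.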
Model Specifications
Last Updated: March 2025
Publisher: OpenAI