Word Error Rate (WER)
The standard metric used to measure the accuracy of speech recognition systems.
Word Error Rate (WER) is calculated by adding the number of substitutions, deletions, and insertions in the system's transcript and dividing by the total number of words actually spoken: WER = (S + D + I) / N, where N is the word count of the reference transcript. A WER of 5% corresponds to roughly 95% word accuracy. Because insertions count as errors, WER can exceed 100% on very poor output. Modern AI engines have pushed WER below human parity (roughly 4-5%) for clear audio. However, background noise, strong accents, and medical or legal jargon can inflate WER substantially. CoScript uses state-of-the-art acoustic modeling to maintain sub-5% WER even in challenging acoustic environments.
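To make the calculation concrete, here is a minimal Python sketch of how WER can be computed from a reference transcript and a system transcript. It is an illustration only, not CoScript's implementation: the function name word_error_rate is hypothetical, and the normalization (lowercasing and splitting on whitespace, with punctuation left in place) is an assumption. The combined count of substitutions, deletions, and insertions is found as the word-level edit (Levenshtein) distance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are compared after lowercasing and whitespace splitting.
    Real evaluations usually also normalize punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("the quick brown fox jumps",
                          "the quick brown dog jumps"))  # 0.2
```

In this example one of five reference words is wrong, so the WER is 20%. Calling the function with the reference "hello world" and the output "hello big wide world there" gives 1.5, which shows how insertions can push WER above 100%.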
Experience Word Error Rate with CoScript
CoScript processes all transcription natively on your desktop — no cloud audio storage, no meeting bots, no browser tabs. Try free today.
Try CoScript Free →
Related Terms
Speech-to-Text (STT)
The process of converting spoken language into written text using AI-powered recognition algorithms.
Voice Recognition
AI technology that identifies and processes human speech patterns to understand spoken words.
Acoustic Model
The component of a speech recognition system that represents the relationship between an audio signal and the phonetic units of speech.