Every uploaded video runs through automatic speech-to-text once transcoding finishes. The result is a caption track attached to the video. These captions appear in the player and feed the transcript view.
What gets a caption
Spoken dialogue in any of the supported speech languages produces captions. Music, ambient noise, and silence do not. Non-speech audio cues (such as [door slams]) are not added automatically; you can add them by editing the captions.
Quality
Auto captions are a good starting point, but they are not error-free. Accuracy varies with audio quality, accents, technical vocabulary, and background noise. If accuracy matters (for accessibility, for legal compliance, or because you do not want a typo on screen), plan to edit the captions before publishing. See Edit captions.
Where captions appear
| Surface | Captions available | Control |
|---|---|---|
| Share page | Yes, when toggled on | Player CC button |
| Embed | Yes, when toggled on | Player CC button |
| Transcript view | Yes, full text | Always available |
| Download | Yes, as a WebVTT file | Dashboard caption panel |
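Because the download is a standard WebVTT file, it can be read with a few lines of code. The following is a minimal sketch, not a full WebVTT parser: it handles the header, optional cue identifiers, and timing lines, and ignores styling and cue settings. The names `Cue`, `parse_webvtt`, and the sample text are hypothetical and used only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
# WebVTT allows the hours field to be omitted ("00:01.000").
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(hours, minutes, seconds, millis) -> float:
    """Convert the captured timestamp fields to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_webvtt(content: str) -> List[Cue]:
    """Return the cues found in a WebVTT document, in file order."""
    cues = []
    # Blocks (header, notes, cues) are separated by blank lines.
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                fields = match.groups()
                # Everything after the timing line is the cue text.
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*fields[:4]), _seconds(*fields[4:]), text))
                break  # blocks without a timing line (header, notes) are skipped
    return cues


# Hypothetical sample, including a manually added non-speech cue.
SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome to the demo.

2
00:00:04.500 --> 00:00:06.250
[door slams]
"""

for cue in parse_webvtt(SAMPLE):
    print(f"{cue.start:.2f}-{cue.end:.2f} {cue.text}")
```

Parsing cues into start, end, and text makes it straightforward to search a transcript or check timing after editing.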
Languages
Automatic transcription detects the dominant spoken language. Mixed-language videos are captioned in whichever language dominates, so passages in any other language may need manual cleanup.
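To make "dominant" concrete, here is a purely illustrative sketch. It assumes dominance means the largest share of total speech time, which is an assumption and not a description of how the product's detection works. The function `dominant_language` and the `(language_code, seconds)` segment format are hypothetical.

```python
from collections import defaultdict


def dominant_language(segments):
    """Return the language code with the most total speech time.

    `segments` is a hypothetical list of (language_code, seconds) pairs.
    Returns None when there are no segments.
    """
    totals = defaultdict(float)
    for language, seconds in segments:
        totals[language] += seconds
    if not totals:
        return None
    # Pick the language whose summed duration is largest.
    return max(totals, key=totals.get)


# Example: mostly English with a short Spanish passage.
segments = [("en", 95.0), ("es", 20.5), ("en", 40.0)]
print(dominant_language(segments))
```

Under this assumption, the short Spanish passage in the example would be transcribed as if it were English, which is the kind of passage that needs manual cleanup.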
Why captions matter
Captions are not only an accessibility feature. They make videos searchable, watchable with the sound off (on mobile, in public spaces, in offices), and easier to follow for non-native speakers. See Captions as accessibility for the accessibility framing.