
Auto captions

How automatic captions are produced and where they appear.

Last updated: 2026-05-02

Every uploaded video is run through automatic speech-to-text once transcoding finishes. The result is a caption track attached to the video, which is shown in the player and populates the transcript view.

What gets a caption

Spoken dialogue in any of the supported speech languages produces captions. Music, ambient noise, and silence do not. Non-speech audio cues (such as [door slams]) are not added automatically; you can add them by editing the captions.

Quality

Auto captions are a useful starting point, but they will contain errors. Accuracy varies with audio quality, accents, technical vocabulary, and background noise. If accuracy matters (for accessibility, for legal compliance, or simply to keep typos off screen), plan to edit the captions before publishing. See Edit captions.
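If you want to put a number on caption accuracy, a common measure in speech recognition is word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between the auto captions and a hand-corrected reference, divided by the number of words in the reference. The product does not report this figure. The sketch below is a hypothetical helper for comparing exported auto-caption text against your own corrected transcript; it lowercases the text and drops punctuation before comparing, which is a simplifying assumption.

```python
import re


def normalize(text: str) -> list[str]:
    # Simplifying assumption: compare lowercase words only, ignoring punctuation.
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference.

    `reference` is your corrected transcript; `hypothesis` is the auto-caption text.
    """
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Levenshtein distance over words, keeping one row of the DP table at a time.
    # prev[j] holds the distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: word missing from the captions
                curr[j - 1] + 1,     # insertion: extra word in the captions
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (quick -> quack) and one insertion (jumps) over 4 reference words.
    print(word_error_rate("the quick brown fox", "the quack brown fox jumps"))  # 0.5
```

A WER of 0 means the captions match your reference word for word. Note that WER weighs every word equally, so a single wrong product name counts the same as a dropped "the".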

Where captions appear

Surface | Captions visible | How to turn on or access
Share page | Yes, when toggled on | Player CC button
Embed | Yes, when toggled on | Player CC button
Transcript view | Yes, full text | Always available
Download | Yes, as a WebVTT file | Dashboard caption panel
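The downloaded WebVTT file is plain text: a WEBVTT header followed by cues, each made of an optional identifier, a timing line such as 00:00:01.000 --> 00:00:03.500, and one or more lines of caption text. If you want to reuse the captions elsewhere (for example, to build your own search index), a minimal Python sketch like the one below can read the cues. It follows the general WebVTT layout rather than anything specific to this product; the names parse_vtt and to_transcript are hypothetical, and the parser ignores cue settings and keeps inline markup (such as <v> voice tags) as-is.

```python
import re
from dataclasses import dataclass

# A WebVTT timestamp: optional hours, then mm:ss.ttt (e.g. "01:02.500" or "00:01:02.500").
TIMESTAMP = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
# A cue timing line; anything after the end timestamp (cue settings) is ignored.
TIMING_LINE = re.compile(rf"^{TIMESTAMP}\s+-->\s+{TIMESTAMP}")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # caption text; multi-line cues joined with "\n"


def _to_seconds(h, m, s, ms) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(content: str) -> list[Cue]:
    """Minimal cue reader: skips identifiers, NOTE/STYLE blocks, and cue settings."""
    lines = content.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        raise ValueError("not a WebVTT file")
    cues = []
    i = 0
    while i < len(lines):
        match = TIMING_LINE.match(lines[i].strip())
        if not match:
            i += 1  # header, blank line, cue identifier, or comment block
            continue
        g = match.groups()
        start, end = _to_seconds(*g[0:4]), _to_seconds(*g[4:8])
        i += 1
        text_lines = []
        while i < len(lines) and lines[i].strip():  # cue text runs until a blank line
            text_lines.append(lines[i].strip())
            i += 1
        cues.append(Cue(start, end, "\n".join(text_lines)))
    return cues


def to_transcript(cues: list[Cue]) -> str:
    # Flatten all cue text into one plain-text string.
    return " ".join(cue.text.replace("\n", " ") for cue in cues)


if __name__ == "__main__":
    sample = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:03.500\nWelcome to the demo.\n"
    for cue in parse_vtt(sample):
        print(cue.start, cue.end, cue.text)
```

The parser scans for timing lines instead of tracking block types, which works because the WebVTT format does not allow the --> sequence in identifiers or comments. For production use, a dedicated WebVTT library will handle edge cases this sketch skips.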

Languages

Automatic transcription detects the dominant spoken language and captions the whole video in that language. Mixed-language videos therefore receive captions only in the dominant language, and passages in other languages may need manual cleanup.

Why captions matter

Captions are not only an accessibility feature. They make videos searchable, watchable in sound-off contexts (mobile, public spaces, offices), and easier to consume for non-native speakers. See Captions as accessibility for the accessibility framing.

Related