Auto captions

How automatic captions are produced and where they appear.

Last updated: 2026-05-02

Every uploaded video runs through automatic speech-to-text once transcoding finishes. The result is a caption track attached to the video. The player displays that track, and the transcript view is built from it.

What gets a caption

Spoken dialogue in any of the supported speech languages produces captions. Music, ambient noise, and silence do not. Non-speech audio cues (such as [door slams]) are not added automatically; you can add them by editing the captions.

Quality

Auto captions are generally accurate but not error-free. Accuracy varies with audio quality, accents, technical vocabulary, and background noise. Plan to edit captions before publishing if accuracy matters (for accessibility, for legal compliance, or because you do not want a typo on screen). See Edit captions.

Where captions appear

Surface	Captions available	How to access
Share page	Yes, when toggled on	Player CC button
Embed	Yes, when toggled on	Player CC button
Transcript view	Yes, full text	Always available
Download	Yes, as a WebVTT file	Dashboard caption panel
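
If you want to work with a downloaded caption file outside the dashboard, the following is a minimal sketch of reading a WebVTT file into timed cues and a flat transcript. It assumes the download follows the standard WebVTT layout (a WEBVTT header, then cue blocks separated by blank lines). The names Cue, parse_webvtt, and transcript are illustrative and not part of any product API.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours component is optional in WebVTT timestamps.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def to_seconds(stamp: str) -> float:
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds."""
    parts = stamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_webvtt(content: str) -> List[Cue]:
    """Return the cues in a WebVTT document, in file order."""
    normalized = content.replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    # Blocks (header, notes, cues) are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", normalized.strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(
                    Cue(to_seconds(match.group(1)), to_seconds(match.group(2)), text)
                )
                break  # blocks without a timing line (header, NOTE) are skipped
    return cues


def transcript(cues: List[Cue]) -> str:
    """Join cue text into a single plain-text transcript."""
    return " ".join(cue.text for cue in cues if cue.text)
```

The sketch leaves inline cue markup (for example voice tags) untouched in the text, so strip it separately if your files contain any.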

Languages

Automatic transcription detects the dominant spoken language in each video. Mixed-language videos receive captions in whichever language dominates, so passages spoken in any other language may need manual cleanup.

Why captions matter

Captions are not only an accessibility feature. They make videos searchable, watchable in sound-off contexts (mobile, public spaces, offices), and easier to consume for non-native speakers. See Captions as accessibility for the accessibility framing.