Add captions and subtitles to YouTube Shorts
Drop a clip into the tool above and it transcribes your speech, times every word, and lays animated captions over your vertical video — ready to burn into a 9:16 MP4 or export as a subtitle file. Everything runs in your browser, so your footage never leaves your device. Free, no watermark, no account.
Captions built for the way Shorts are watched
YouTube Shorts play in a full-screen 9:16 feed, typically at 1080 × 1920, and viewers move through them one thumb-swipe at a time. No description shows while a Short plays, and a separate subtitle track appears only for viewers who have captions switched on. If you want words on screen for everyone, they have to be part of the frame. The tool above does exactly that: it reads the audio from your clip, generates the captions, and positions them inside the vertical safe area so nothing collides with the title, the like and comment rail, or the channel handle along the bottom edge.
A Short can pack up to three minutes of tightly edited talking, so the captions stay locked to your cut. Preview the whole thing in the player, scrub to any moment, and check that the words land on the right beat before you export. What you see in the preview is what burns into the file.
Why captions decide whether a Short gets watched
Shorts move fast, and many start playing with the sound off — someone scrolling in a quiet room, on a commute, or late at night won't unmute a clip they haven't committed to yet. The first two or three seconds are where most viewers decide to keep watching or swipe away, and silent video gives them nothing to hold onto. On-screen captions carry your hook through that muted opening, so the point lands before sound is ever a factor.
Captions keep working once the audio is on. Fast talking, background music, accents, and small phone speakers all eat clarity, and a word viewers can see is a word they don't have to strain to catch. Readable captions are one of the simplest ways to hold attention through to the end.
Word-perfect timing from on-device AI
Transcription runs on an AI speech model that works right in the browser. It auto-detects the spoken language across roughly 99 languages and returns a transcript with word-level timing — each word carries its own start and end, not just a rough caption block. That precision is what lets the karaoke and pop styles light up on the exact syllable being spoken, the look the Shorts feed has trained people to expect.
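To make word-level timing concrete, here is a minimal sketch of how a karaoke-style highlight could pick the active word for a given playback time. The `TimedWord` shape and the `activeWordIndex` function are illustrative assumptions, not the tool's actual data format or code; the sketch also assumes the words are sorted by start time and do not overlap.

```typescript
// Hypothetical shape for one transcribed word (an assumption for illustration).
interface TimedWord {
  text: string;
  start: number; // seconds from the start of the clip
  end: number;   // seconds from the start of the clip
}

// Binary-search for the word being spoken at time t.
// Returns its index, or -1 when t falls in a pause between words.
function activeWordIndex(words: TimedWord[], t: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (t < w.start) {
      hi = mid - 1;      // t is before this word
    } else if (t >= w.end) {
      lo = mid + 1;      // t is at or after this word's end
    } else {
      return mid;        // start <= t < end
    }
  }
  return -1;
}

const hookWords: TimedWord[] = [
  { text: "Stop", start: 0.0, end: 0.3 },
  { text: "scrolling", start: 0.35, end: 0.9 },
  { text: "now", start: 1.0, end: 1.25 },
];

console.log(activeWordIndex(hookWords, 0.5)); // index of "scrolling"
```

A renderer could call a lookup like this on every frame and restyle only the word at the returned index, which is why per-word start and end times matter more than block-level timing.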
Nothing about the transcript is locked. Every word is editable, so you can fix a name, a piece of slang, or a term the model misheard by typing over it. If language detection picks the wrong one, override it — and you can re-generate the whole caption track in another language to ship a second-language version of the same Short. The model downloads to your browser once, then runs locally on every clip after that.
Your footage stays on your device
Privacy here is literal. The entire process — reading the audio, transcribing, styling, and rendering the final video — happens inside your browser on your own machine. Your clip is never uploaded, no server receives it, and nothing about it is used to train any AI. An unreleased Short, a client edit, or anything you'd rather not hand to a cloud service is safe to caption here, because there is no cloud in the loop.
That also means no sign-up wall and no queue. You don't create an account or wait for a server to process your file — you open the page, drop the clip, and work. Every language, every caption style, and every export option is free for everyone, with no paywall at the export step.
Caption styles that fit the vertical format
There are four animated styles. Karaoke highlights each word as it's spoken; Highlighted drops the active word into a colored box; Minimal shows one clean word at a time; and Dynamic shows a single word with a small pop on entry. The one-word styles read especially well on a phone-height 9:16 frame, where a single large word fills the column and never wraps off the side of the screen.
From there, tune the look to match your channel. Pick a typeface — Inter, Montserrat, Oswald, Lora, or JetBrains Mono — set its weight and size, and place the caption at the top, center, or bottom with a nudge to clear the Shorts interface. Set your text and highlight colors, add an outline and shadow so the words stay legible over busy or bright video, and cap the words per line so a caption never crowds the vertical frame.
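The words-per-line cap can be pictured as a simple grouping step over the timed words. The sketch below is one plausible way to do it, not the tool's implementation; `CaptionWord`, `CaptionLine`, and `groupIntoLines` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes (assumptions for illustration only).
interface CaptionWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

interface CaptionLine {
  text: string;
  start: number; // start of the first word in the line
  end: number;   // end of the last word in the line
}

// Split words into consecutive lines of at most maxWords words each.
function groupIntoLines(words: CaptionWord[], maxWords: number): CaptionLine[] {
  if (maxWords < 1 || Math.floor(maxWords) !== maxWords) {
    throw new RangeError("maxWords must be a positive integer");
  }
  const lines: CaptionLine[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    lines.push({
      text: chunk.map((w) => w.text).join(" "),
      start: chunk[0].start,
      end: chunk[chunk.length - 1].end,
    });
  }
  return lines;
}

const demoWords: CaptionWord[] = [
  { text: "Stop", start: 0.0, end: 0.3 },
  { text: "scrolling", start: 0.35, end: 0.9 },
  { text: "now", start: 1.0, end: 1.25 },
  { text: "and", start: 1.3, end: 1.45 },
  { text: "listen", start: 1.5, end: 2.0 },
];

console.log(groupIntoLines(demoWords, 2).map((l) => l.text));
```

A fixed count is the simplest rule. A real layout might also break lines at pauses or punctuation, which this sketch deliberately leaves out.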
Export at the right 9:16 size, or as a subtitle file
When the timing and styling look right, you export. Burning the captions in writes them permanently into a 1080 × 1920 MP4 that keeps your Short's native 9:16 shape — the safest route for the Shorts feed, since the words travel with the frame wherever the clip ends up. Rendering is hardware-accelerated, and you choose between a file optimized for sharing and one that preserves source quality. There is no watermark on the result.
If you'd rather upload a separate subtitle track to YouTube, export the captions as .srt or .vtt instead and attach them to the video. Either way, you can fine-tune any individual line's timing before you export, so a caption that's a beat early or late gets nudged into place rather than re-recorded. Burn it in, download the subtitle file, or do both.
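For reference, the two subtitle formats differ mainly in their header and in the millisecond separator: SRT numbers each cue and writes `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` line and writes `HH:MM:SS.mmm`. The sketch below shows how caption lines could be serialized to each format and how a single line's timing could be nudged. The `CaptionCue` shape and all function names are assumptions for illustration, not the tool's own code.

```typescript
// Hypothetical cue shape (an assumption for illustration).
interface CaptionCue {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Left-pad a non-negative integer with zeros to at least the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm; sep is "," for SRT and "." for WebVTT.
function formatTimestamp(seconds: number, sep: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return zeroPad(h, 2) + ":" + zeroPad(m, 2) + ":" + zeroPad(s, 2) + sep + zeroPad(ms, 3);
}

// SRT: numbered cues separated by a blank line.
function toSrt(cues: CaptionCue[]): string {
  return cues
    .map((c, i) =>
      `${i + 1}\n${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then cues separated by blank lines.
function toVtt(cues: CaptionCue[]): string {
  const body = cues
    .map((c) =>
      `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return "WEBVTT\n\n" + body;
}

// Shift one cue earlier (negative delta) or later, never before time zero.
function nudgeCue(cue: CaptionCue, deltaSeconds: number): CaptionCue {
  const start = Math.max(0, cue.start + deltaSeconds);
  const end = Math.max(start, cue.end + deltaSeconds);
  return { text: cue.text, start: start, end: end };
}

const demoCues: CaptionCue[] = [
  { text: "Stop scrolling", start: 0.0, end: 0.9 },
  { text: "now", start: 1.0, end: 1.25 },
];

console.log(toSrt(demoCues));
console.log(toVtt(demoCues));
```

Nudging a cue before serializing, for example `nudgeCue(demoCues[1], -0.25)`, moves that line a quarter of a second earlier without touching the rest of the track.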
Questions
Do I have to burn captions into a Short?
Not strictly, but it's the safer choice. Burned-in captions are part of the 9:16 frame, so they show whether or not a viewer turns on the YouTube caption track, which matters in a fast, often-muted feed. You can burn them into the MP4 here, or export an .srt/.vtt file to upload as a separate track. Many creators do both.
How should I size and place captions on a Short?
Keep the text large enough to read on a phone: a heavier weight at a generous size works best on the tall 9:16 frame. Place captions in the center or upper-middle and nudge them clear of the bottom interface, where the title, channel handle, and the like and comment rail sit. The tool lets you set size, position, and an outline so the words stay legible over any background.
Can it caption Shorts in other languages?
Yes. The AI auto-detects the spoken language across roughly 99 languages and times every word. You can override the detected language, edit any word in the transcript, or re-generate the captions in a different language to publish a second-language version of the same Short.
Is it free?
It's free, by choice: every caption style, every language, and every export option is available to everyone, with no account and no paywall at export. There is no watermark on the file you download.
Is my video uploaded anywhere?
No. Everything runs in your browser on your own device: transcription, styling, and the final render. Your clip is never uploaded, no server receives it, and it's never used to train any AI. The AI model downloads once, then works locally on every clip.