Add Captions to Instagram Videos
Caption any Instagram video right in your browser. The tool above transcribes your audio with an on-device AI model, times every word as it's spoken, and lets you style the result for a 4:5 feed post before you export. No sign-up, no watermark, and the video never leaves your device.
Captions built for the Instagram feed
Instagram's standard in-feed video runs in a 4:5 frame at 1080 x 1350, and a clip can stretch to a full sixty minutes. That shape is taller than square but shorter than a full 9:16 vertical, and it changes where captions belong. There's real vertical room, but the bottom of the frame is where Instagram stacks the post caption, the like and comment row, and the account handle. Captions parked too low get clipped or crowded. The tool lets you set position to top, center, or bottom and nudge it pixel by pixel, so your words sit in the safe middle band where nothing covers them.
Because the same file might also go to a Story or a cross-post, you can reposition and re-export without re-transcribing. The transcript is done once; placement is just a setting you change.
Why a caption in the first frame decides the scroll
Feed videos start playing the moment they slide into view, and they start silent. Most people are thumbing past with the phone on mute, so for that first second your video is a moving image with no sound. If the opening frame already shows a line of text that says what this is about, you give someone a reason to stop. If it's a muted talking head with no words on screen, the thumb keeps moving.
That's the job of an Instagram caption: carry the message before anyone decides to tap for audio. Captioning here isn't an accessibility checkbox you add at the end. It's the hook that earns the watch. Open with a caption that frames the payoff, and you hold the viewer long enough for the rest to land.
Word-level timing from an AI model that knows ~99 languages
The model listens to your audio and writes the transcript with word-level timestamps, so each word appears exactly as it's said rather than a block of text dropped on screen. It auto-detects the spoken language across roughly ninety-nine of them. If it guesses wrong, or you want the captions in a different language entirely, you override the language and regenerate.
Nothing is locked. Every word in the transcript is editable. Fix a name, a piece of slang, or a brand spelling by typing over it, and the timing stays intact. Before you export, you can fine-tune any individual line's in and out points so fast speech or a deliberate pause reads cleanly.
Your footage stays on your device, start to finish
This runs entirely in the browser. The AI model downloads to your machine once, then transcribes locally. Your video is never uploaded, never sent to a server, and never used to train anything. There's no account to create and no file leaving your laptop or phone.
That matters for Instagram work specifically. Unreleased product shots, client edits, or a personal clip you haven't posted yet stay private because there's nowhere for them to go. The privacy isn't a line in a policy page; it's how the tool is built.
Four caption styles that read clean at feed scale
You get four animated styles: Karaoke, where each word highlights as it's spoken; Highlighted, which boxes the active word in a color; Minimal, one clean word at a time; and Dynamic, one word with a subtle pop. On a 4:5 post viewed mid-scroll, the one-word styles keep text large and instantly legible, while Karaoke suits longer talking pieces where viewers follow along.
Everything is tunable to match your grid. Pick a typeface — Inter, Montserrat, Oswald, Lora, or JetBrains Mono — set the weight and size, choose text and highlight colors, and add an outline or shadow so captions stay readable over bright or busy footage. Set words-per-line so nothing runs wider than the 4:5 frame.
Export burned-in MP4 at 4:5, or download SRT and VTT
When the captions look right, export a hardware-accelerated MP4 with the text burned in. Pick optimized-for-sharing to keep the upload light, or source quality to preserve the original. Burned-in captions are the reliable path for Instagram, because they show no matter how the video is viewed and survive every re-share. Keep the frame at 4:5 (1080 x 1350) and your captions land exactly where you placed them.
Prefer a separate file? Download a clean .srt or .vtt to upload alongside your video or to reuse on another platform. Whichever route you take, there's no watermark and no paywall. Every style, language, and export is free for everyone.
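To show how word-level timestamps become a subtitle file, here is a minimal sketch in Python. It is not the tool's own code. It assumes the transcript is a list of words with start and end times in seconds, and that words are grouped into cues of a fixed words-per-line. `Word`, `group_words`, `format_timestamp`, `to_srt`, and `to_vtt` are hypothetical names chosen for illustration. The two formats differ mainly in the header (`WEBVTT`), cue numbering (SRT numbers cues, VTT doesn't need to), and the millisecond separator (comma in SRT, period in VTT).

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one transcribed word with its timing in seconds.
    text: str
    start: float
    end: float


def group_words(words, words_per_line):
    """Chunk the word list into lines of at most words_per_line words."""
    if words_per_line < 1:
        raise ValueError("words_per_line must be at least 1")
    return [words[i:i + words_per_line] for i in range(0, len(words), words_per_line)]


def format_timestamp(seconds, decimal_sep):
    """Format seconds as HH:MM:SS plus milliseconds, using the given separator."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def to_srt(words, words_per_line=3):
    """Build SRT text: numbered cues, comma before milliseconds."""
    blocks = []
    for index, line in enumerate(group_words(words, words_per_line), start=1):
        # A cue runs from its first word's start to its last word's end.
        start = format_timestamp(line[0].start, ",")
        end = format_timestamp(line[-1].end, ",")
        text = " ".join(w.text for w in line)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(words, words_per_line=3):
    """Build WebVTT text: WEBVTT header, period before milliseconds."""
    cues = []
    for line in group_words(words, words_per_line):
        start = format_timestamp(line[0].start, ".")
        end = format_timestamp(line[-1].end, ".")
        text = " ".join(w.text for w in line)
        cues.append(f"{start} --> {end}\n{text}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    sample = [
        Word("Stop", 0.0, 0.4),
        Word("scrolling", 0.4, 0.9),
        Word("now", 0.9, 1.2),
        Word("watch", 1.5, 1.8),
    ]
    print(to_srt(sample))
    print(to_vtt(sample))
```

With three words per line, the sample yields two cues: "Stop scrolling now" from 00:00:00,000 to 00:00:01,200, then "watch" from 00:00:01,500 to 00:00:01,800. Fixed-size grouping is the simplest choice. A real splitter might also break on pauses or punctuation.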
Questions
For in-feed video, burning captions in is the most reliable choice. Burned-in captions appear no matter how someone watches and survive re-shares to Stories or other accounts. Export a 4:5 MP4 with the text baked in, or download an SRT or VTT if you'd rather keep a separate subtitle file.
Make captions big enough to read on a phone mid-scroll. On a 4:5 (1080 x 1350) post, scale the text so a one-word style fills a comfortable portion of the width, and keep words-per-line low so nothing wraps off-frame. You can adjust size, weight, and position live before exporting.
Place captions in the center band or just above the lower third. Instagram's own post caption, the handle, and the action buttons sit along the bottom, so words placed too low get covered. Set position to top, center, or bottom and nudge it until your captions clear the interface.
Yes, you can caption videos in languages other than English. The AI auto-detects the spoken language across roughly ninety-nine of them, and you can override it or regenerate the captions in a different language. Every word stays editable, so you can correct names or spellings before you export.
There's no charge. Every style, language, and export is free for everyone, with no sign-up and no watermark on your video. Free is a deliberate choice, not a trial. Because the tool runs in your browser, your footage stays on your device the entire time.