Closed Captioning REST API: How to Use a Speech Recognition API for Captions

Rev AI is not just for audio data. You can use its speech-to-text APIs to generate captions for live and pre-recorded video too! The process for doing so is much the same as for generating audio transcriptions. In this post, we’ll walk you through the process of getting captions, step by step.

Getting Started

Using the Rev Postman Workspace

For this tutorial, we’ll be using the Rev Postman workspace to make requests to the captioning APIs. You can also follow along using your local Postman installation, or by making requests with curl or any popular programming language. However, note that the Rev Postman workspace has a number of preconfigured headers and variables that make getting up and running much easier, so we encourage you to use it as you get a feel for how things work.

When you open up the workspace you should see a page that looks more or less like this.

Because this workspace is public, we need to fork it to our own personal collection before we can start using it to make requests. To do so, click the button that says “Fork” near the upper right. In order to perform the fork, you’ll need to either create a Postman account or sign in to your existing one. Once you’ve forked the workspace, you should be good to go on the Postman side!

Asynchronous API

First, we’ll take a look at how to use the Async API. This API is used to caption static videos that have already been recorded. In order to use the API, you’ll need a Rev AI account and an access token.

Generating an Access Token

Head on over to the Rev AI website and either create an account or log in. From your account homepage, you’ll need to click on the link in the sidebar that says “Access Tokens”. Next, you’ll want to generate a new token and copy it to your clipboard. If you do this successfully, you should see a message that looks like this.

Creating an Authorization Header

Now that we have our access token, we can create the authorization header that will allow us to authenticate with the APIs.

First, we’ll need to navigate to the prebuilt request for submitting an asynchronous caption job. On the left side of your Postman workspace, click through the dropdown menu until you get to Speech to Text -> Transcription Jobs -> Submit Job with Media URL. Go ahead and click on that option. This will populate the Postman request fields with the information you need to make the request.

As in the image below, you should see that the URL has been set to the endpoint for submitting a captioning job. Some of the headers, such as the content type, have also been preconfigured.

Next, we’ll want to add our authorization header. Go ahead and navigate to “Authorization” in the request panel. Then under the Type dropdown select “Bearer Token” and paste your access token into the text box on the right as shown below.
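If you later move from Postman to a script, the same header is easy to build by hand. Here’s a minimal Python sketch of what the “Bearer Token” option amounts to. The function name and the REV_ACCESS_TOKEN environment variable are our own placeholders for illustration, not part of the API.

```python
import os


def auth_headers(access_token: str) -> dict:
    """Build the header that Postman's "Bearer Token" option sends."""
    token = access_token.strip()
    if not token:
        raise ValueError("access token is empty")
    return {"Authorization": f"Bearer {token}"}


# REV_ACCESS_TOKEN is a hypothetical variable name; "example-token" is a
# stand-in so the sketch runs without a real account.
headers = auth_headers(os.environ.get("REV_ACCESS_TOKEN", "example-token"))
print(list(headers))
```

Reading the token from the environment rather than pasting it into the script keeps it out of version control.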

Creating the Request

Now that we’ve got the authorization header set up, we can go ahead and create our request! First things first, we need a video to test the async API. We’re going to use this commercial hosted on Google Drive.

This leads us to an important point about the API. The link to your video gets submitted to the API via a request parameter called “media_url” which, as the name suggests, points to the URL of your media file. However, it’s important to note that you cannot use YouTube URLs. You need to provide a direct download link to the file, such as the Google Drive one used here. Direct download links most commonly point to files on your own web server or on a storage service such as AWS S3, Google Cloud, or Azure. You can learn how to format Dropbox and Google Drive links as direct download links here. You can also use Google’s own Google Drive Direct Download Link Generator here for any Google Drive links. We’ll be using a formatted Google Drive link here.
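To make the direct-link idea concrete, here’s a rough Python sketch that rewrites a Google Drive share link and flags YouTube URLs before they reach the API. It assumes the commonly seen share format (/file/d/FILE_ID/...) and the widely used uc?export=download form; neither comes from the Rev documentation, Google may change them, and the helper names are ours.

```python
import re
from urllib.parse import urlparse


def drive_direct_link(share_url: str) -> str:
    """Rewrite a Drive share link into the (assumed) direct-download form."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError("not a recognised Google Drive share link")
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"


def is_youtube(url: str) -> bool:
    """True for youtube.com and youtu.be hosts, which cannot be used as media_url."""
    host = (urlparse(url).hostname or "").lower()
    return host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com")


# FILE_ID_EXAMPLE is a made-up id, used only to show the rewrite.
share = "https://drive.google.com/file/d/FILE_ID_EXAMPLE/view?usp=sharing"
print(drive_direct_link(share))
```

Checking the link in a browser first is still worthwhile: if it opens a preview page instead of starting a download, it isn’t a direct link.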

Go ahead and fill out the body of your captioning request as shown below, then click “Send”. If the request gets submitted successfully, you should see the output shown for “Body” in the “Response” area at the bottom of the page.
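For readers following along in code rather than Postman, the same submission can be sketched with Python’s standard library. The endpoint URL below is our assumption about where the workspace’s “Submit Job with Media URL” request points, and the token and video link are placeholders. The sketch only builds the request; the lines that would send it are left commented out.

```python
import json
import urllib.request

# Assumed endpoint; check the URL shown in your forked Postman request.
JOBS_URL = "https://api.rev.ai/speechtotext/v1/jobs"


def build_submit_request(access_token: str, media_url: str) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the media_url parameter."""
    body = json.dumps({"media_url": media_url}).encode("utf-8")
    return urllib.request.Request(
        JOBS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder token and link, for illustration only.
request = build_submit_request("example-token", "https://example.com/video.mp4")
print(request.get_method(), request.full_url)

# To send it for real:
# with urllib.request.urlopen(request) as response:
#     job = json.load(response)
#     print(job["id"])  # the job id used in the following steps
```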

Checking the Job Status

Go ahead and give your job about 30 seconds to run. Once you’ve done that, let’s check the status. Copy the “id” returned in the body of your “submit job” request response. Then, in the Postman collection, navigate to Speech to Text -> Transcription Jobs -> Get Job. Add your access token as an authorization header as before, paste the id onto the end of the URL in place of the “:jobId” tag, and click “Send”. The response you get from the API should look something like this.

As you can see, the status of our job has now changed to “transcribed”. This means that everything is ready and that we can now download our captions!
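Rather than waiting a fixed 30 seconds, a script would usually check the status in a loop. Here’s a hedged sketch of that loop. The fetch_job callable stands in for whatever performs the “Get Job” request, and the “in_progress” and “failed” status values are assumptions on our part; only “transcribed” appears in this walkthrough.

```python
import time


def wait_for_job(fetch_job, job_id, interval=5.0, max_checks=60, sleep=time.sleep):
    """Call fetch_job(job_id) until the job reports the "transcribed" status."""
    for _ in range(max_checks):
        job = fetch_job(job_id)
        if job["status"] == "transcribed":
            return job
        if job["status"] == "failed":  # assumed name of the failure status
            raise RuntimeError(f"job {job_id} failed")
        sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_checks} checks")


# Canned responses stand in for the API so the sketch runs offline.
fake_responses = iter([
    {"id": "example-id", "status": "in_progress"},
    {"id": "example-id", "status": "transcribed"},
])
job = wait_for_job(lambda job_id: next(fake_responses), "example-id",
                   sleep=lambda seconds: None)
print(job["status"])
```

Passing the fetch function in as a parameter keeps the waiting logic separate from the HTTP call, so it can be tried out without a network connection.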

Getting Caption Output

Finally, we’re ready to download our captions. Under the Postman collection, navigate to Speech to Text -> Outputs -> Get Captions. From there, enter your authorization header (access token) as before and then paste your job id into the text box for the :jobId path variable as shown in the below screenshot. Rev supports two caption output formats: SubRip (SRT) and Web Video Text Tracks (VTT). For this example we’ll use SRT, but you can change this by clicking on the “Headers” tab and then changing the “Accept” header from “application/x-subrip” to “text/vtt”. Once you have your request fully configured, click “Send”. You should get a response as shown below.
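In code, switching between the two formats comes down to the Accept header. The sketch below builds the request without sending it. The base URL and the /jobs/{id}/captions path are our assumptions about what the “Get Captions” request points at, while the two media types are the ones named above.

```python
import urllib.request

# Assumed base URL; compare with the URL in your forked Postman request.
API_BASE = "https://api.rev.ai/speechtotext/v1"

# Accept header values for the two supported caption formats.
CAPTION_TYPES = {"srt": "application/x-subrip", "vtt": "text/vtt"}


def build_captions_request(access_token: str, job_id: str, fmt: str = "srt"):
    """Build a GET request for a job's captions in the chosen format."""
    if fmt not in CAPTION_TYPES:
        raise ValueError(f"unsupported caption format: {fmt!r}")
    return urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}/captions",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": CAPTION_TYPES[fmt],
        },
    )


# Placeholder token and job id, for illustration only.
request = build_captions_request("example-token", "example-id", fmt="vtt")
print(request.get_header("Accept"))

# To download for real:
# with urllib.request.urlopen(request) as response:
#     captions = response.read().decode("utf-8")
```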

You can see from the response body below that our video has been properly captioned. The individual caption outputs are printed line-by-line along with the associated timestamps for each. You can always download and save this response, then work with it however you see fit. And there you have it — automated caption generation with the API!
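If you do save the SRT response and want to work with it in code, a small parser is often enough. The following is a minimal sketch that assumes well-formed SubRip text (a counter line, a time range, then one or more text lines per cue). The sample captions are made up for illustration and are not output from the API.

```python
import re
from datetime import timedelta

TIME_RANGE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text: str) -> list:
    """Return (start, end, caption) tuples, one per cue in the SRT text."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIME_RANGE.match(line.strip())
            if match is None:
                continue  # counter line, or not a timing line
            h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
            start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
            end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
            # Everything after the timing line is the caption text.
            cues.append((start, end, " ".join(lines[index + 1:])))
            break
    return cues


# Made-up sample, not real API output.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:03,600 --> 00:00:06,000
Welcome to
the demo.
"""

for start, end, caption in parse_srt(SAMPLE_SRT):
    print(start, end, caption)
```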

Captions for Streaming Video

Rev AI can also work for streaming online video hosted on a website. This is much more difficult to demonstrate in a guide, but you can find our full developer documentation for the Rev AI streaming API here.

Rev AI: The World’s Most Accurate Speech-to-Text API

Rev has the largest network of professional transcriptionists and captioners in the world, which gives Rev the most accurate training model in the world for our speech-to-text API. This means Rev AI beats Google, Amazon, Microsoft, and other tech giants in speech-to-text accuracy. Try our API free for your first 5 hours of audio and video data.
