
Are Captions on TV Written by Humans or Speech Recognition Technology?


Just about every show you see on your television set is broadcast with a hidden layer of captions built in. Closed captions are an accessibility feature that makes watching TV more inclusive. But captioning is not just a kind service that video producers offer; it is a legal requirement.

What Are Closed Captions?

Closed captions are like subtitles, but they are written in the same language that is spoken on screen. Beyond the dialogue, captions carry extra information: different colored text can indicate different speakers, and bracketed cues describe sound effects, nonverbal sounds, and music. These elements enhance the experience for viewers who are deaf or hard of hearing.
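For illustration, here is roughly what a few caption cues might look like in WebVTT, one common format for web video (broadcast TV embeds captions differently, and the dialogue here is invented):

WEBVTT

00:00:01.000 --> 00:00:03.500
<v Anna>Did you hear that?

00:00:04.000 --> 00:00:05.500
[door creaks open]

00:00:06.000 --> 00:00:09.000
♪ tense orchestral music ♪

The <v Anna> voice tag labels the speaker, while the bracketed and ♪-wrapped lines caption nonverbal sound and music.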

For TV shows, closed captions are embedded in the broadcast signal and decoded by a component in the television set; viewers can usually turn them on at the press of a button. Online video needs its own captioning to be usable by every audience. Thankfully, today’s digital tools and third-party caption services like Rev make it easier for businesses to comply with the laws described below.
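On the web, a common approach is to publish the captions as a separate file and attach it to the player. As a minimal sketch (the file names are hypothetical), an HTML5 video player can load a WebVTT caption track like this:

<video controls src="episode.mp4">
  <track kind="captions" src="episode-captions.vtt" srclang="en" label="English" default>
</video>

Viewers can then toggle the captions from the player’s own controls, much like the button on a TV remote.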

Closed Captions Are a Legal Requirement

The United States and other countries have legislation in place to ensure video content is accessible to all.

The Americans with Disabilities Act (ADA) passed in 1990, well before the modern web, so its application to online content is hazy. In principle, the ADA requires any business in the category of “place of public accommodation” to provide closed captions or video transcriptions for its videos. A place of public accommodation is any facility used by the public, and the law spells out 12 categories of them, from shops and bars to movie theaters, schools, and zoos.

However, not all digital-only businesses fall neatly into these 12 categories, and the courts decide case by case. In a landmark 2012 case, for example, a court ruled that Netflix is a place of public accommodation. It is fair to expect that any digital business fitting one of the 12 categories must meet the ADA’s requirements, but other businesses should not assume their content is exempt.

The ADA passed in the same year as the Television Decoder Circuitry Act, which requires television sets to include a built-in component that decodes closed caption signals. Before then, viewers who relied on captions had to buy a separate decoder box. That detail shows how old the ADA is, and many believe it needs updating and clarifying. In the meantime, the safest course for businesses is to provide captioning as standard. It is also the decent thing to do for viewers with different accessibility requirements.

The Federal Communications Commission (FCC) has since issued rules that extend captioning requirements to the internet. For example, closed-captioned content shown on TV must also be captioned when it is posted online. (Rev’s closed captions are FCC and ADA compliant, so you can use them on any video platform.)

Why Most TV Captions Use Human Transcription Services

Voice recognition software is getting better and better, yet media companies continue to use human transcription services for both live programs and pre-recorded shows. Speech recognition technology helps captioners keep pace with live broadcasts such as news programs, but top professionals still consider the human element essential for accurate captions.

The BBC alone has a team of 200 professional subtitlers who caption 200 million words per year across its channels. Their technique is called “respeaking”: the subtitler listens to a recorded or live TV show and repeats what is said, clearly, into a studio microphone in real time. This strips out background noise and mispronunciations, and with them most errors. The computer hears only the subtitler’s clear voice and generates the caption on the screen.

The Human Touch

One of the main tasks of subtitlers is to repeat the words so clearly that the automatic speech recognition (ASR) software cannot mishear them. But the human touch goes much further.

The subtitler also looks out for anomalies like homophones: words that sound the same but mean different things. Even if the subtitler says “they’re” with perfect clarity, the computer may transcribe it as “their” or “there.”

Other anomalies need equal attention. Unusual names and foreign words may confuse the computer, so a trained subtitler researches the subject and gets the details right. When transcribing a live show, they may even create shortcuts or add words to the ASR’s dictionary so they can work faster and with greater accuracy.

A human captioner also listens for sonic events and verbal nuances. They must caption sounds like laughter, and even distinguish between kinds of laughter: ASR technology doesn’t yet know the difference between [uncontrolled laughter] and [sarcastic laughter]. A human subtitler notes the nonverbal elements of a show that matter to the viewing experience, captioning sound effects and describing the feel of the music.

The other key duty of a closed captioning professional is to keep the captions out of the way of the action. A human knows better than a computer which parts of the picture are important to see. Even while respeaking and adding effects, the subtitler is ready to move the captions away from the bottom of the screen when essential action is happening there: food prep in a cooking show, the ticker on a news broadcast, or a key play in live sports.
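Web caption formats expose the same control. In WebVTT, for instance, a cue setting can lift a caption toward the top of the frame (a minimal sketch; the timing and dialogue are invented):

WEBVTT

00:01:12.000 --> 00:01:15.000 line:5%
[whisk clattering] Now fold the egg whites in gently.

Here the line:5% setting positions the cue near the top of the video instead of the default placement at the bottom.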

Human professionals simply produce more accurate captions. If you’re looking to caption your own video content, try Rev for quick, affordable captions made by real people.
