Speech-To-Text Simplifies Podcast Editing

Learn how a podcast creator uses a speech-to-text API to cut his editing time to between half and two-thirds of what it was and better engage his listeners.

Written by:

Allison Koo

February 3, 2019


Podcasts have become more popular with the rise of mobile devices and improved recording software. Editing a podcast, however, can be a time-consuming endeavor.

Richard Kalling, co-host of the ALH podcast, needed to cut down on the amount of time he spent editing the podcast.

Originally, his editing process involved making notes of what sections to remove or leave in. This required him to review the entire podcast at least twice.

First, he would make a label track in Audacity, annotating the audio with the topics and key phrases that came up in the podcast. Then, in a second pass through the audio file, he would use these annotations to make cuts and move sections. He also used the annotations to find edit points when making changes suggested by his co-host. After these edits, the podcast would be uploaded publicly.

“This was taking an obscenely long amount of time,” Richard says.

Automating the annotation part of the process could help resolve this issue. Richard decided to seek out a speech-recognition API that could generate a podcast transcript for him to use while editing.

A developer himself, Richard had several qualities in mind for the ideal automatic speech recognition (ASR) service:

Easy to implement
Low price
Quick turnaround
Automated, no human transcription
Able to import into Audacity, the open-source audio software used to record and edit ALH

Testing Automatic Speech Recognition Options

Richard tested several options for transcribing their audio files. One contender was the Google Speech API, but versioning issues made setting up the environment and building the sample code time-consuming and complex. He also looked at IBM’s Watson and tested a sample with Amazon Transcribe, but found that they also did not suit his needs.

Richard had also tried Rev’s Temi software, which helped with transcribing. However, it still involved more manual steps than he was looking for and didn’t offer enough flexibility in label formatting. On a recommendation, he turned to Rev AI to help create a more automated way to transcribe the ALH podcast.

Implementing Rev AI to Transcribe and Edit Podcast Files

Creating a solution using the Rev AI API only took about two hours, Richard says.

He looked at the GitHub page for Rev AI’s Python API project to browse the code and decide which classes to import. After downloading the Rev AI Python SDK, he wrote a simple wrapper script to submit jobs and retrieve results.

Setup was far simpler than with the other ASR services he had tried: it required only instantiating the RevAiAPIClient class and passing in an API key generated on his Rev AI account page.
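A wrapper along these lines might look like the sketch below. It is not Richard's actual script. It assumes the `rev_ai` Python package and its `RevAiAPIClient` methods `submit_job_local_file`, `get_job_details` and `get_transcript_json`; the function names `make_client` and `transcribe_file`, the polling interval, and the way the job status is read are illustrative assumptions.

```python
import time


def make_client(api_key):
    """Create a Rev AI client from an API key (requires `pip install rev_ai`)."""
    from rev_ai import apiclient  # imported lazily so the helper below stays testable
    return apiclient.RevAiAPIClient(api_key)


def transcribe_file(client, audio_path, poll_seconds=5.0, max_polls=720):
    """Submit a local audio file and wait for its JSON transcript.

    `client` is expected to be a RevAiAPIClient, or any object exposing
    the same three methods used here.
    """
    job = client.submit_job_local_file(audio_path)
    for _ in range(max_polls):
        details = client.get_job_details(job.id)
        # Assumption: status is an enum named IN_PROGRESS, TRANSCRIBED or
        # FAILED; if it is a plain string, use the string itself.
        status = getattr(details.status, "name", str(details.status)).upper()
        if status == "TRANSCRIBED":
            return client.get_transcript_json(job.id)
        if status == "FAILED":
            raise RuntimeError(f"Rev AI job {job.id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Rev AI job {job.id} did not finish in time")
```

With the SDK installed, `transcribe_file(make_client(api_key), "episode.mp3")` would return the transcript as a dictionary, where `episode.mp3` stands in for whatever file is being edited.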

Richard chose to use Rev AI’s JSON transcript so he could control the Audacity labels.

“The JSON format is flexible enough that I can write a script to put it in exactly the format that I need it, and that the transcription accuracy is more than good enough for what I need to use it for,” Richard says. “I’m sure, with some massaging of the input audio file, I can get even better accuracy.”
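To illustrate the kind of conversion Richard describes, here is a minimal sketch that turns a JSON transcript into lines for an Audacity label track (start time, end time and label text, separated by tabs). It assumes the transcript has a top-level `monologues` list whose `elements` carry `type`, `value`, `ts` and `end_ts` fields. The function name `transcript_to_labels` and the rule of grouping a fixed number of words per label are illustrative choices, not the format Richard settled on.

```python
def transcript_to_labels(transcript, max_words=8):
    """Convert a transcript dict into Audacity label-track lines.

    Each line is "start<TAB>end<TAB>text", with times in seconds.
    """
    labels = []

    def flush(parts, start, end):
        # Emit one label for the words collected so far, if any.
        text = "".join(parts).strip()
        if text and start is not None:
            labels.append(f"{start:.6f}\t{end:.6f}\t{text}")

    for monologue in transcript.get("monologues", []):
        parts, start, end, count = [], None, None, 0
        for element in monologue.get("elements", []):
            if element.get("type") == "text":
                # Start a new label once the current one is full, so that
                # trailing punctuation stays attached to the previous label.
                if count >= max_words:
                    flush(parts, start, end)
                    parts, start, end, count = [], None, None, 0
                if start is None:
                    start = element["ts"]
                end = element["end_ts"]
                count += 1
            parts.append(element.get("value", ""))
        flush(parts, start, end)
    return labels
```

Writing the result to a text file with `"\n".join(labels)` gives something Audacity's label import should accept. Shorter labels make edit points easier to spot, while longer ones keep the track less cluttered.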

Improved Podcast Editing With Rev AI

Rather than working with his own, abbreviated notes, Richard now has the full transcript of the podcast to work with. This helped him eliminate the first pass of his process. “Now it takes about half to two-thirds the time it did before,” he says.

Having a transcript of the podcast also helps Mark El-Wakil, Richard’s co-host and the co-creator of ALH. Mark writes the podcast’s accompanying show notes, which list the topics discussed and resources mentioned in an episode. Once Richard has finished his editing, he sends the transcript to Mark, who can quickly scan it to determine what should be included as supplemental information for the podcast.

Experience Rev AI for Yourself

Rev AI is an advanced speech recognition API from the makers of Temi and Rev.com. Power up your application with our best-in-class proprietary speech models.

Ready to tackle your program’s transcription needs more effectively? Sign up to try Rev AI now, with 5 free hours of credit and no credit card needed!

Topics:

Media & Entertainment
