Transcript File Formats: Everything You Need to Know About Different Transcription Formats


You can transcribe video and audio content on your own or use a professional transcription service, like Rev. Whichever route you take, it is important to be aware of the different transcription file formats available.

The choice of the transcription file format will depend on the requirements of each task. Some of the issues to consider include:

  • Which platform will host the video or audio?
  • Which file formats does that platform support?
  • Will your website host or embed videos?
  • Are downloadable transcripts needed?
  • Do you want to make the transcripts interactive or offer playlist search?

Let’s examine some of the layout choices available.

Standard Layouts

There are several standard layouts as explained below.

Plain Text Transcription Formats

Plain text formats are the most common and uncomplicated type. They are:

  • Plain Text Files (.txt): These have a .txt file extension and are stripped of any kind of formatting. Plain text files are opened using a plain text editor such as Notepad.
  • Microsoft Word Document (MS Word .docx): These are Microsoft Word transcripts that include text formatting. They are easy to read and edit.
  • PDF (.pdf): Portable Document Format files keep the same layout on every device and are not meant to be edited without dedicated software. They are also easy to read.

Time-Stamped Transcription File Formats

As the name suggests, these files place a timestamp next to what was said. The timestamps can be expressed in seconds, minutes, or hours. The documents are typically in MS Word format.

A document with an SMPTE (Society of Motion Picture and Television Engineers) timecode also includes a frame number in each timestamp. Timestamps help to synchronize the audio with the video content.
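To make the frame labeling concrete, here is a minimal Python sketch that converts a timecode written as HH:MM:SS:FF into seconds. It assumes a non-drop-frame timecode and a whole-number frame rate; the function name smpte_to_seconds and the default of 30 frames per second are illustrative choices, not part of any standard tool.

```python
def smpte_to_seconds(timecode: str, fps: int = 30) -> float:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to seconds.

    Assumption: fps is a whole number (for example 24, 25, or 30).
    Drop-frame timecode (29.97 fps) needs extra handling not shown here.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame {frames} is out of range for {fps} fps")
    # The last field counts frames within the current second.
    return hours * 3600 + minutes * 60 + seconds + frames / fps


# Example: 8 seconds plus frame 20 at 25 frames per second.
print(smpte_to_seconds("00:00:08:20", fps=25))
```

The last field is a frame count, not a fraction of a second, so the same timecode maps to a slightly different moment depending on the frame rate of the video.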

HTML Transcription File Formats

HTML file formats are useful when hosting and embedding videos on your website. The audio or video transcript is formatted with HTML to make it accessible online via a browser. It is often also optimized for screen readers.
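As a rough illustration, the Python sketch below turns a list of transcript segments into a small HTML fragment that could be placed next to an embedded video. The function name transcript_to_html, the "timestamp" class, and the choice of tags are assumptions made for this example; your site or transcription service may use different markup.

```python
from html import escape


def transcript_to_html(segments):
    """Render (timestamp, speaker, text) tuples as a simple HTML fragment."""
    paragraphs = []
    for timestamp, speaker, text in segments:
        # escape() keeps characters such as < and & from breaking the page.
        paragraphs.append(
            "<p>"
            f'<span class="timestamp">{escape(timestamp)}</span> '
            f"<strong>{escape(speaker)}:</strong> "
            f"{escape(text)}"
            "</p>"
        )
    body = "\n".join(paragraphs)
    # The aria-label gives screen readers a name for the region.
    return f'<section aria-label="Transcript">\n{body}\n</section>'


print(transcript_to_html([
    ("00:00:04", "Mark", "In this video, we will learn the basics of content marketing."),
]))
```

Keeping each speaker turn in its own paragraph, with the speaker name marked up separately, tends to make the transcript easier to follow in a browser and with assistive technology.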

JS and JSON File Formats

JS and JSON are less common. They are the main output used by machine learning transcription software. The main advantage of these formats is the unique time synchronization where every word has a timestamp right down to the millisecond. This is useful for interactive transcripts where each word becomes a link to the exact video or audio section where it was spoken.

JS and JSON formats also make it possible to look up a transcript from a database of video or audio content.

The drawback is that they are not user-friendly. They are difficult to read and not an ideal choice for downloads.
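To show what word-level timing looks like in practice, here is a short Python sketch that loads a JSON transcript and looks up where a word was spoken. The field names (words, text, start_ms, end_ms) and the timing values are hypothetical; every transcription tool defines its own JSON schema, so check the documentation of the one you use.

```python
import json
import string

# Hypothetical word-level transcript; real tools use their own field names.
SAMPLE_JSON = """
{
  "words": [
    {"text": "In", "start_ms": 4190, "end_ms": 4350},
    {"text": "this", "start_ms": 4350, "end_ms": 4600},
    {"text": "video,", "start_ms": 4600, "end_ms": 5100},
    {"text": "we", "start_ms": 5100, "end_ms": 5250},
    {"text": "will", "start_ms": 5250, "end_ms": 5500},
    {"text": "learn", "start_ms": 5500, "end_ms": 5900}
  ]
}
"""


def find_word(transcript: dict, query: str) -> list[int]:
    """Return the start time in milliseconds of every occurrence of a word."""
    wanted = query.lower()
    return [
        word["start_ms"]
        for word in transcript["words"]
        # Ignore case and surrounding punctuation when comparing.
        if word["text"].strip(string.punctuation).lower() == wanted
    ]


transcript = json.loads(SAMPLE_JSON)
print(find_word(transcript, "video"))  # [4600]
```

An interactive transcript can use the returned start time to jump the player to that moment, and the same lookup run across many files gives a basic search over a video or audio library.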

Text in Tables Vs Text in Paragraphs

Transcribing text into a table is not as straightforward as text in paragraphs. Most transcription software inputs text as plain text and doesn’t support tables. The text has to be converted into table format by exporting it as a Tab-delimited text or Comma Separated Values (CSV) text file.

In a Tab-delimited text file, columns are separated by tabs, with one record per line.

In a Comma Separated Values (CSV) text file, columns are separated by commas, with one record per line. A field that itself contains a comma must be enclosed in double quotes so that the comma is not read as a column break.

Below is an example of how data can be created in table format:

Tab-delimited text

Timestamp	Speaker	Transcript

00:00:04:19	Mark	In this video, we will learn the basics of content marketing.

00:00:08:20	John	In this next section, we will learn about buyer personas.

CSV text

Timestamp,Speaker,Transcript

00:00:04:19,Mark,"In this video, we will learn the basics of content marketing."

00:00:08:20,John,"In this next section, we will learn about buyer personas."

The resulting table:

Timestamp     Speaker   Transcript
00:00:04:19   Mark      In this video, we will learn the basics of content marketing.
00:00:08:20   John      In this next section, we will learn about buyer personas.
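If you prefer to produce these files with a script instead of by hand, the Python sketch below writes the same kind of rows as both tab-delimited and CSV text using the standard csv module. The sample rows are modeled on the example in this section, and the helper name to_delimited is made up for illustration.

```python
import csv
import io

rows = [
    ("Timestamp", "Speaker", "Transcript"),
    ("00:00:04:19", "Mark", "In this video, we will learn the basics of content marketing."),
    ("00:00:08:20", "John", "In this next section, we will learn about buyer personas."),
]


def to_delimited(rows, delimiter):
    """Return the rows as delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = to_delimited(rows, "\t")
csv_text = to_delimited(rows, ",")
print(csv_text)

# Reading the CSV back shows that each record still has three columns,
# even though the transcript text contains commas.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(len(parsed[1]))  # 3
```

The csv module quotes any field that contains the delimiter, which is why the commas inside the transcript text do not split it into extra columns when the file is opened in a spreadsheet.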

Industry Transcript Layouts

In addition to the standard layouts, there are layouts specific to certain industries as follows:

Edit Decision List (.edl)

This format is popular in the film industry and is used in the post-production process for script-based editing. It is an ordered and timestamped file format used to identify key moments in a transcript and relate them to the exact moment in the video.

Selects are highlighted in the transcript editor and exported to an EDL file that lists source video data and timecodes. This file can then be imported into video editing software such as Adobe Premiere Pro, Final Cut Pro, Avid Media Composer, or DaVinci Resolve to instantly create an assembly sequence. This format is invaluable for large video editing projects.

Avid ScriptSync (.txt)

ScriptSync is a feature of Avid Media Composer, the video editing software from Avid Technology. It helps filmmakers and video editors quickly sync video and audio clips directly to the lines of a script. Either someone writes the script ahead of time or, as with documentaries and reality shows, someone transcribes what was said during the show and then builds the script in post-production.

Avid ScriptSync files are plain text files. The software creates video/audio sync marks for each line of text.

Legal Transcript

Legal transcripts are taken during legal proceedings such as court sessions, depositions, and congressional/senate hearings. Sessions are often recorded and transcribed later. The file is usually a Microsoft Word document.

However, the text formatting depends on the style guidelines provided by the lawyer, court reporters association, or court system. For example, the California Court Reporters Association, a professional body of court stenographers, has published minimum transcript format standards on its website.

At Rev, we provide human transcription services with an accuracy rate of 99% and several transcript formatting options.