Google Speech Recognition API vs. Rev AI API

How the Rev AI and Google speech recognition APIs differ in accuracy, price, ease of use, and more.

Written by:

Rev Press

May 20, 2021


Rev’s network of 50,000 human transcriptionists completes thousands of projects daily. That volume of work provided an extensive training set for the Rev AI transcription API.

But is that enough to stack up against automatic speech recognition (ASR) from a tech giant like Google? Let’s compare the accuracy, speed, features, and cost of two popular ASR solutions: Google Speech Recognition API vs. Rev AI API.

See Comparisons of Rev AI API vs. Google Cloud Speech API

Which Is More Accurate: Google Speech Recognition API or Rev AI API?

In our podcast transcription benchmarks, we compared the word error rates (WER) of Rev AI and Google’s video model for 30 podcasts. Rev AI was more accurate than Google for 24 of 30 media files. Rev AI also had a lower average WER at 14.22% compared to 15.82% for Google Speech Recognition.
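For readers who have not worked with WER before, here is a minimal Python sketch of the usual definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The function name and the simple lowercase-and-split normalization are assumptions made for illustration; the normalization used in the benchmark above is not described here, so this sketch will not necessarily reproduce those numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Normalization (lowercasing, whitespace split) is an assumption for
    this sketch, not the benchmark's documented method.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.2%}")
```

An average WER like the ones quoted above would then be the mean of this value across all test files.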

Try Rev’s free Word Error Rate Calculator tools

Feature Comparison: Google Speech Recognition API vs. Rev AI API

Speaker Identification and Diarization

Speaker identification and diarization are crucial for audio files with multiple speakers. These features segment the audio by speaker.

That way, the transcript can indicate who said which words. Rev AI includes full speaker identification and diarization in our service. Google Speech Recognition has these services as well, but they are still in beta.

Try Rev AI API free for your first 5 hours

Language Support

Google supports a total of 125 languages and variants. It currently offers beta support for automatic language detection, which works when you specify up to four candidate languages in advance.

Rev AI supports a global English model as well as 30 other world languages. This includes:

Arabic	Finnish	Korean	Romanian
Bulgarian	French	Latvian	Russian
Catalan	German	Lithuanian	Slovak
Croatian	Greek	Malay	Slovenian
Czech	Hindi	Mandarin	Spanish
Danish	Hungarian	Norwegian	Swedish
Dutch	Italian	Polish	Turkish
English	Japanese	Portuguese

The global English model recognizes and transcribes speech from several English variants. It even includes English as spoken by a German or French speaker.

Turnaround Speed of Google Speech Recognition API and Rev AI API

For short files (tens of seconds), Google offers impressively rapid transcription turnaround times. For longer files, Google’s transcription takes about half the runtime of the media file. Rev AI’s transcriptions are somewhat slower for short files. However, Rev AI transcribes long files at a remarkable rate: a 1-2 hour media file takes 5-10 minutes.


Ease-of-Use: Google Speech Recognition API vs. Rev AI API

Google has extensive integrations with other Google services, and it is great for complex applications. The recognizable user interface is a big advantage of their ASR. If you’ve used Gmail or other Google products, you’ll feel right at home in their speech API. Being in the Google ecosystem is also a decided advantage if you plan to integrate your application with other Google services.

Rev AI is built for ease of use as a standalone service. The simpler your application or the less you rely on Google integrations, the more advantageous Rev AI becomes. Rev AI’s output files are easy to read and use because they are available as either .txt or .json transcripts. The .json files contain the text, speaker IDs, timestamps, and confidence scores for each word.
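As a rough illustration of working with a word-level .json transcript, the Python sketch below walks a small sample and flags words whose confidence falls under a threshold, along with the speaker ID and start time. The field names (`monologues`, `elements`, `type`, `value`, `ts`, `end_ts`, `confidence`), the sample data, and the function name are assumptions made for this example; check the Rev AI documentation for the actual schema before relying on them.

```python
import json

# Hypothetical sample transcript; the structure and field names are
# assumptions for illustration, not a guaranteed schema.
SAMPLE = """
{"monologues": [
  {"speaker": 0, "elements": [
    {"type": "text", "value": "Hello", "ts": 0.5, "end_ts": 0.9, "confidence": 0.97},
    {"type": "punct", "value": " "},
    {"type": "text", "value": "there", "ts": 1.0, "end_ts": 1.3, "confidence": 0.62}
  ]},
  {"speaker": 1, "elements": [
    {"type": "text", "value": "Hi", "ts": 1.8, "end_ts": 2.0, "confidence": 0.91}
  ]}
]}
"""


def low_confidence_words(transcript: dict, threshold: float = 0.8) -> list:
    """Return (speaker, word, start_time) for words below the threshold."""
    flagged = []
    for monologue in transcript.get("monologues", []):
        speaker = monologue.get("speaker")
        for element in monologue.get("elements", []):
            # Skip punctuation and any other non-word elements.
            if element.get("type") != "text":
                continue
            if element.get("confidence", 1.0) < threshold:
                flagged.append((speaker, element["value"], element.get("ts")))
    return flagged


if __name__ == "__main__":
    for speaker, word, start in low_confidence_words(json.loads(SAMPLE)):
        print(f"speaker {speaker} at {start}s: {word!r} may need review")
```

A filter like this is one way to route only the uncertain passages of a transcript to a human reviewer.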

Price Comparison: Google Speech Recognition API vs. Rev AI API

Rev AI charges $0.035 per minute (rounded up to the nearest 15-second increment) for our ASR service base plan. For high-volume users, Rev AI additionally has an enterprise plan that starts at $1.20 per hour ($0.02 per minute) and goes lower in price as volume goes up.

Contact the Rev AI Sales Team for Custom Pricing

Google has two tiers of ASR service: a standard model and an enhanced video model that is more accurate. The Google video model, which was used in the above accuracy comparisons, costs $0.036 per minute (rounded up to the nearest 15-second increment). The standard model costs $0.024 per minute, also charged in 15-second increments. Google offers a discount if you opt in to data logging.

Note: Prices for these services change frequently; the figures above were accurate when this article was written. For up-to-date prices, check the Rev AI pricing page and the Google Speech Recognition pricing page.
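To see how the 15-second rounding affects a bill, here is a small Python sketch that estimates the cost of a single file at the per-minute rates quoted in this article. The function and dictionary names are made up for this example, and it assumes the rounding is applied once per file; volume discounts, the data logging discount, and any free tier are ignored.

```python
import math

# Per-minute rates as quoted in this article at the time of writing.
# They may be out of date; treat them as example inputs only.
RATES_PER_MINUTE = {
    "rev_ai_base": 0.035,
    "google_video": 0.036,
    "google_standard": 0.024,
}


def estimate_cost(duration_seconds: float, rate_per_minute: float,
                  increment_seconds: int = 15) -> float:
    """Estimate the charge for one file, rounding its duration up to the
    next billing increment (assumed to be applied per file)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    increments = math.ceil(duration_seconds / increment_seconds)
    billable_minutes = increments * increment_seconds / 60
    return round(billable_minutes * rate_per_minute, 6)


if __name__ == "__main__":
    # A 61-second clip is billed as 75 seconds (five 15-second increments).
    for name, rate in RATES_PER_MINUTE.items():
        print(f"{name}: ${estimate_cost(61, rate):.5f}")
```

At these rates, a one-hour file works out to $2.10 on the Rev AI base plan, $2.16 on the Google video model, and $1.44 on the Google standard model.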

How Do You Choose a Transcription API?

Google Speech Recognition API and Rev AI API both offer excellent ASR solutions. Google’s API offers impressively robust language coverage. Additionally, it integrates well with other Google offerings. That’s great for applications that are already immersed in the Google ecosystem.

Rev AI’s solution offers better accuracy.

This is particularly true for media files that need speaker identification and diarization. Rev AI is also easier to set up and use for standalone applications and has faster turnaround speeds for long media files. But don’t take our word for it. Try the Rev AI API for free today, and use Rev’s free word error rate calculator and speech recognition benchmarking tools to run these tests yourself.


