Speech Recognition Trends to Watch in 2021 and Beyond: 3 Use Cases
Advances in Automatic Speech Recognition (ASR) technology are fueling a new wave of innovation, especially in the ways that we interact with digital technology. Voice is natural. It’s intuitive, frictionless, and more accessible to a wider audience.
At the forefront of our cutting-edge research sit two game-changers. First, streaming ASR applies this tech in real time, either to provide captions for a live event as it’s happening or to enable conversational user interfaces alongside other machine learning (ML) technologies like natural language processing (NLP).
Second, end-to-end (E2E) ML models are having a profound effect on ASR’s effectiveness. Rather than building and training multiple models (acoustic, linguistic, pronunciation) for a single ASR solution, an E2E model brings all of this functionality into a single algorithm. These models are smaller, faster, and easier to train, making them simpler to deploy at the edge. They enable us to train for foreign languages more rapidly and to support advanced features like speaker diarization, which lets an ASR model determine who said what and when.
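To make "who said what and when" concrete, here is a minimal Python sketch of how diarized output might be turned into readable speaker turns. The `Word` and `Turn` types and the `group_into_turns` helper are hypothetical names invented for illustration; they are not the Rev.ai response format, and a real ASR system may label speakers and timestamps differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One recognized word with a speaker label (hypothetical format)."""
    speaker: int   # speaker index assigned by diarization
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio
    text: str


@dataclass
class Turn:
    """A run of consecutive words attributed to the same speaker."""
    speaker: int
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words that share a speaker label into turns."""
    turns: List[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1].end = word.end
            turns[-1].text += " " + word.text
        else:
            # Speaker changed (or this is the first word): start a new turn.
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        Word(0, 0.0, 0.4, "Can"),
        Word(0, 0.4, 0.7, "you"),
        Word(0, 0.7, 1.1, "hear"),
        Word(0, 1.1, 1.4, "me?"),
        Word(1, 1.9, 2.3, "Loud"),
        Word(1, 2.3, 2.5, "and"),
        Word(1, 2.5, 3.0, "clear."),
    ]
    for turn in group_into_turns(sample):
        print(f"[{turn.start:.1f}s to {turn.end:.1f}s] Speaker {turn.speaker}: {turn.text}")
```

The grouping step is deliberately simple: it assumes the words arrive in time order and that the speaker label on each word is already trustworthy, which is the hard part the diarization model itself has to solve.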
Now that we’ve released our Rev.ai speech-to-text application programming interface (API), we’re excited to see how developers are going to integrate our tech into their products. Want a sneak peek into what tomorrow may bring? These are the top use cases that we can’t wait to explore.
1. Voice User Interfaces
Just as the touchscreen revolutionized the way we interact with our devices, the voice user interface (UI) holds a similar promise. This is a huge paradigm shift.
“76 percent of people who regularly use voice technology say that they find it very natural,” writes Finances Online. “Other top reasons for its use are because it is simpler and faster than typing. Meanwhile, a more recent report also revealed that 68 percent of voice-operated assistant users say that these virtual assistants make life easier overall.”
Of course, smart speakers, virtual assistants, and the Internet of Things (IoT) more generally are already leading the way with voice UIs. However, as ASR tech progresses, we will see a few key shifts. These devices will be able to more effectively deal with the complexities and nuances of our speech, enabling them to do much more than we ask of them now: check the weather, search the internet, set a timer, etc.
The traditional mobile app market will also see the effects of voice UI. While nearly any application can employ ASR to create a better user experience, this tech will prove crucial for apps that cater to specific demographics, such as the elderly or those with vision impairment. Instead of navigating menus and screens with swipes and clicks, we can simply talk.
Lastly, home automation devices, ranging from robotic vacuums to interactive toys, will also get a bump from improved ASR. Not only will a better voice UI give us greater control over these devices, but it also opens the door to more personal relationships with our domestic technology.
2. Searchable Audio
When Dan Kokotov, Rev’s VP of Engineering, went on Lex Fridman’s AI podcast, they discussed how Rev’s ASR is a boon for media asset management. “Rev put a smile on my face,” said Fridman as he explained how Rev has proven instrumental for podcasters like him.
Previously, if you wanted to revisit part of a podcast, audiobook, video, or song, the only option was searching for the title and manually going through the file until you found the right part. ASR changes that equation by creating an indexable, searchable text file that’s easy to reference.
In a business setting, searchable audio also shines for sharing, reviewing, and referencing video conference calls. 2021 may not be the year of Zoom in the same way as 2020, but since remote workforces are here to stay, we’re going to continue to need ways to know who said what on the call. Rewatching the entire recording simply isn’t feasible. Transcription means you’re only one Ctrl+F away from the information you need.
Along similar lines, this is a big game-changer for students who need to review lectures for their studies. If they miss something or they need to revisit a particularly tricky subject, they can search the transcript for the information they need.
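The searchable-transcript idea described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` type, the `search_transcript` and `format_timestamp` helpers, and the sample data are all hypothetical, and a production system would more likely use a proper full-text index than a linear scan.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript segment (hypothetical format)."""
    speaker: str
    start: float  # seconds from the start of the recording
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def search_transcript(segments: List[Segment], query: str) -> List[Segment]:
    """Return every segment whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [seg for seg in segments if needle in seg.text.casefold()]


if __name__ == "__main__":
    # Made-up meeting transcript, purely for illustration.
    meeting = [
        Segment("Alice", 12.0, "Let's review the quarterly budget."),
        Segment("Bob", 95.5, "The budget looks fine to me."),
        Segment("Alice", 140.0, "Great, moving on to hiring."),
    ]
    for hit in search_transcript(meeting, "budget"):
        print(f"{format_timestamp(hit.start)} {hit.speaker}: {hit.text}")
```

Because each match carries a speaker and a timestamp, a result can point the listener straight to the right moment in the recording instead of leaving them to scrub through the whole file.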
3. Immersive Entertainment
Video games continue to achieve greater levels of realism, and ASR is poised to blur the lines between player and avatar even further. Natural dialogue with non-player characters (NPCs) will give players increased agency, expand the realm of what’s possible in a game, and open the door to more personalized experiences.
Many games also employ in-game assistants, and interacting with them using voice will allow us to interface with the game in much the same way that we would with a smart speaker in our home. Whether that means saying “Hey, listen!” to Navi in The Legend of Zelda or asking for help from Cortana in Halo, being able to interact with these companions in a more genuine way will make these games more immersive, easier to learn, and just plain fun.
Things get really interesting when we consider the synergies between voice and augmented/virtual reality. Both ASR and VR enable us to interact with games in more natural ways, and when we combine them we might just find ourselves in an environment that’s closer to Ready Player One than Super Mario Bros.
At the end of the day, all these trends come down to bridging the divide between the digital and physical. Whether that means having a real conversation in a video game or telling your robotic vacuum cleaner to sweep up after the dog, we’re witnessing the gap between the internet and IRL steadily close.
And we’re just getting started. While these three use cases are some of our top picks for where ASR is headed, this is far from an exhaustive list. That’s one of the main reasons why we’re so thrilled to partner with developers around the world via the Rev.ai API. We’re giving you access to our tech so that you can dream big and invent the future.
Do you have the next great idea for an ASR application? Sign up below for a free trial!