Using Google Cloud Speech-to-Text to transcribe your Twilio calls in real-time

macOS Incident Response | Part 2: User Data, Activity and Behavior

August 28, 2019

Developers have asked us how they can use Google Cloud’s Speech-to-Text to transcribe speech (especially phone audio) coming from Twilio, a leading cloud communications PaaS. We’re pleased to announce that it’s now easier than ever to integrate live call data with Google Cloud’s Speech-to-Text using Twilio’s Media Streams.

The new TwiML <stream> command streams call audio to a websocket server. This makes it simple to move your call audio from your business phone system into an AI platform that can transcribe that data in real time and use it for use cases like helping contact center agents and admins, as well as store it for later analysis.

When you combine this new functionality with Google Cloud’s Speech-to-Text abilities and other infrastructure and analytics tools like BigQuery, you can create an extremely scalable, reliable and accurate way of getting more value from your audio.

The overall architecture for creating this flow looks something like what you see below. Twilio creates and manages the inbound phone number. Their new Stream command takes the audio from an incoming phone call and sends it to a configured websocket which runs on a simple App Engine flexible environment. From there, sending the audio along as it comes to Cloud Speech-to-Text is not very challenging. Once a transcript is created, it’s stored in BigQuery where real-time analysis can be performed.

Using Google Cloud Speech-to-Text to transcribe your Twilio calls in real-time

Top 20 articles for vSphere, July 2019

macOS Incident Response | Part 2: User Data, Activity and Behavior