So, Just How Good Are Wingman’s Call Transcripts?

Kesava Mandiga
December 24, 2021
5 min read

There's an elephant in the conversation intelligence space. If it’s so great, why do you still see transcription errors? Let's unpack it.

Disclaimer: I’m simplifying all the tech as much as I can, but we’re talking about using machine learning to solve a uniquely human problem. So this is going to be a little jargon-y anyway. ;)

How Wingman works: behind the scenes

It all starts with recording your sales calls. Then, each call recording is run through an Automatic Speech Recognition (ASR) system to identify speakers and turn speech into text — your call transcript. Now, each transcript passes through Natural Language Processing (NLP) algorithms for contextual analysis.

Or that’s how most conversation intelligence tools work: after your call ends. Wingman does it in real time, so sales meetings are recorded, transcribed and analyzed as they happen. This is what powers the on-call monologue alerts and contextual battle cards that our customers love.

Note: The process outlined above is for video meetings. We’d process dialer calls in real-time too, but most dialer apps record by default so we import and process them a few minutes later.
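
To make the monologue alert idea a bit more concrete, here is a minimal sketch of how such an alert could be raised from a live stream of diarized utterances. This is not Wingman's implementation: the `Utterance` type, the `monologue_alerts` function and the 90-second limit are all hypothetical names and values picked for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Utterance:
    """One diarized chunk of speech (hypothetical shape, for illustration)."""
    speaker: str
    start: float  # seconds from the start of the call
    end: float


def monologue_alerts(
    utterances: Iterable[Utterance], limit_seconds: float = 90.0
) -> Iterator[Tuple[str, float]]:
    """Yield (speaker, time) once per uninterrupted run that exceeds the limit."""
    current = None
    run_start = 0.0
    alerted = False
    for u in utterances:
        if u.speaker != current:
            # A new speaker took over, so start timing a fresh run.
            current, run_start, alerted = u.speaker, u.start, False
        if not alerted and u.end - run_start >= limit_seconds:
            alerted = True
            yield (u.speaker, u.end)


call = [
    Utterance("rep", 0, 40),
    Utterance("rep", 40, 95),
    Utterance("prospect", 95, 100),
    Utterance("rep", 100, 130),
]
alerts = list(monologue_alerts(call))
```

Because the function is a generator, it can consume utterances as they arrive instead of waiting for the call to end, which is the point of doing this in real time.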


Wingman uses a combination of multiple third-party ASR systems and our own purpose-built NLP algorithms. The ASR layer detects speakers (diarization) and creates a transcript. Then, this output is processed by our NLP layer to analyze and refine contextual categorization. Think automatic highlights on every sales conversation — questions, blockers, next steps and so on. All in real-time.
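
If you like seeing things in code, the two-layer idea can be sketched roughly like this. It is a toy, rule-based stand-in for the NLP layer, assuming the ASR layer has already produced speaker-labelled segments. Wingman's real algorithms are not described in this post, so the `Segment` type, the `RULES` keyword table and `tag_segment` are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Output of the ASR layer: who spoke and what was said (hypothetical shape)."""
    speaker: str
    text: str


# Toy keyword rules standing in for a real NLP model (assumed, not Wingman's).
RULES = {
    "pricing": ("price", "pricing", "cost", "discount"),
    "next_steps": ("next step", "follow up", "send over"),
    "blocker": ("concern", "blocker", "not sure"),
}


def tag_segment(segment: Segment) -> List[str]:
    """Return the highlight categories that match a single transcript segment."""
    text = segment.text.lower()
    tags = [
        label
        for label, keywords in RULES.items()
        if any(keyword in text for keyword in keywords)
    ]
    if text.rstrip().endswith("?"):
        tags.append("question")
    return tags


question_tags = tag_segment(Segment("Prospect", "What does pricing look like?"))
follow_up_tags = tag_segment(Segment("Rep", "I'll follow up with a proposal tomorrow."))
```

A real system would swap the keyword table for trained models, but the shape is the same: the ASR layer hands over segments, and the NLP layer attaches categories to them.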

Every day, Wingman processes thousands of sales calls as they happen. We’d love to claim it’s 99.99% accurate, but that’d be a lie. There’s some research data here, but we also measure transcription accuracy ourselves via independent testing.

There’s a set of truth data — calls transcribed by 2 or more human transcriptionists (the most accurate, but even humans have a 4-5% error rate). We run the same calls through various ASR systems to calculate an accuracy score and better identify areas of improvement. 
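
For the curious, a common way to score a machine transcript against truth data is word error rate (WER): count the word-level substitutions, insertions and deletions needed to turn the machine transcript into the human one, then divide by the number of words in the human transcript. The post doesn't spell out the exact metric behind our numbers, so treat the sketch below as an assumption, with "accuracy" taken to mean 1 minus WER. The function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Classic dynamic-programming edit distance, kept to one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion: reference word is missing
                    current[j - 1] + 1,  # insertion: extra word in hypothesis
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero (an assumed definition)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


score = accuracy("play max cooper please", "play madhapur please")
```

In this example the machine got two of the four reference words wrong (one substitution, one deletion), so the score comes out to 0.5. Real evaluations usually also normalize punctuation and numbers before comparing, which this sketch skips.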

We recently tested transcription results from Amazon, Google and Wingman.

  • Amazon came in third at 77.95% accuracy
  • Google was just a little bit better at 78.51% accuracy
  • Wingman came out on top with 85.25% accuracy

Wingman gets there by combining multiple third-party ASR systems with our own purpose-built NLP algorithms.

What!? Why isn’t Wingman 100% accurate?

The short answer is that the machines are just not that smart yet.

We’ve come a long way from Bell Labs’ Audrey (1952), an ASR system that could recognize numbers spoken to it, and IBM’s Shoebox (1962), which understood 16 English words. With the move from Hidden Markov Models (HMMs) to deep neural networks, today’s ASR systems are dramatically better. They are so good, we’re now seeing applications of speech-to-text everywhere, from virtual assistants to live video captions.

Think Siri, Alexa or even YouTube’s live captions. They’re good, really good, but accurate? Not always. I asked Siri for Max Cooper (a musician) yesterday and got Madhapur, Hyderabad (a location). Sigh.

Live Captions on YouTube via The Atlantic


Yet, speech-to-text for dictation is relatively easy. Long-form transcription with 100% accuracy isn’t.

Yes, today's ASR systems boast 90%+ accuracy, but that’s for single-speaker snippets like you’d see with virtual assistants or dictation apps. Transcribing video meetings and phone calls (long-form, multi-speaker conversations) accurately is still a challenge, even for the best ASR systems out there.

You probably know a few reasons why already, but here are five variables that significantly impact the long-form transcription accuracy of modern ASR software:

  1. Accents and variations in rate of speech
  2. Homophones, homographs and homonyms
  3. Crosstalk aka overlapping dialogue
  4. Audio quality and background noise
  5. Acronyms and industry-specific jargon

I’m covering these challenges in more detail here, but we’re already taking action to improve transcription accuracy (more on this in a bit).

How is Wingman’s 85% accuracy useful for sales?

It’s not perfect, like I just described, but even imperfect transcription has many uses. When it comes to transcribing and analyzing sales conversations though, the gist is — time, scale and directionality. 

Time

Wingman’s conversation intelligence cuts time spent on call reviews by half or more. With access to audio + transcript, sales reps, managers and leaders get full context faster than ever.

Scale

Analyzing thousands of conversations at scale is certainly not a job for humans. Wingman does it at a fraction of the cost — with ever-increasing accuracy — unlocking insights from all your sales calls instead of random samples.

Directionality

Wingman gives sales teams actionable signals on where sales conversations are headed — as they happen. Pricing? Blockers? Next Steps? You got it. Competitor mentions too, so you can plant a few depositioning landmines and take action before deals slip away.

Actionable tips to improve transcription accuracy

While we’re constantly improving Wingman’s transcription (more on this next), there are two things you can do right now to help. Just two little things that will ensure better accuracy.

1. Improve the audio quality of your meetings.

Use a better microphone (or noise-cancelling headphones with mic) to ensure there is little to no background noise. If it’s a video call, join from the app instead of dialing in from the phone for higher quality recording.

2. Avoid crosstalk. Well, as much as you can.

This one’s tough, ‘coz most of us seem to be hardwired to interrupt, but resist the temptation to speak over each other on calls. It’s okay if it happens, but try to keep overlapping dialogue to a minimum (so the AI doesn’t get confused).

That’s it. Better audio quality and minimal crosstalk will immediately make your Wingman transcripts more accurate. In the background, we’re working on some improvements too.

Coming soon to your Wingman

We use Wingman ourselves. Or as one of our sales heroes (sorry not sorry, Neha!) says: “we drink our own beer.” So we know each percentage-point improvement in long-form transcription accuracy counts: it means more actionable intelligence for sales, marketing and product teams.

We’re building a few things to make Wingman a little more accurate every day.

Add custom vocabulary rules

Wingman admins will soon be able to add pre-set rules to automatically correct misspellings, specific words or common mistakes in call transcripts.
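
As a rough idea of what a correction rule could look like under the hood, here is a minimal sketch that applies a table of "wrong phrase → right phrase" replacements to a transcript. The feature isn't built yet, so this is purely an assumption about its shape: `apply_vocabulary_rules` and the example rules are hypothetical.

```python
import re
from typing import Dict


def apply_vocabulary_rules(transcript: str, rules: Dict[str, str]) -> str:
    """Replace each mis-transcribed phrase with its preferred spelling."""
    for wrong, right in rules.items():
        # Whole-word, case-insensitive match so "wing man" and "Wing Man" both hit.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement avoids treating backslashes in `right` as escapes.
        transcript = pattern.sub(lambda match, right=right: right, transcript)
    return transcript


rules = {"wing man": "Wingman", "sales force": "Salesforce"}
fixed = apply_vocabulary_rules("We use wing man and Sales Force daily.", rules)
```

Here `fixed` ends up as "We use Wingman and Salesforce daily." Rules like these run after transcription, so they clean up recurring mistakes without touching the ASR layer itself.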

Conversation hints for NLP

When speakers use brand names, features and acronyms, you can quickly add a note so Wingman’s AI contextually updates call transcripts.

Edit transcripts in-line

Even as we improve the accuracy of our NLP algorithms, we know there will be a miss here and there. So we’re building in the ability to edit/correct transcripts directly in Wingman too.

Conclusion

So that’s an overview of modern ASR systems, how Wingman works, why 85% accuracy is super useful and simple ways to improve it. Plus, what we’re building to make it better as we go.

Taking a moment here to thank our engineers for putting up with all my questions, but if you like technical stuff — keep an eye out for our new Engineering blog. Toodles!
