Best approach to get streaming Meeting Audio (or streaming transcript)

gibron · October 13, 2022, 8:17pm

Today

We are developing a Zoom App and are using the {userId}/recordings endpoint to get the Audio Transcript for Cloud Recorded Meetings after the meeting is done.

We process the transcript once ready, after the audio is transcribed by Zoom. This approach has limitations:

Requires User to have Business License
Requires Cloud Recording
Must occur after a meeting cloud recording audio is transcribed
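
For context, the post-meeting flow above ends with a transcript file that has to be parsed. Here is a minimal sketch of that step, assuming the transcript is downloaded as a WebVTT file and that each cue may start with a "Speaker Name: " prefix. Both are assumptions about the file layout, and `Segment` and `parse_vtt` are hypothetical names, not part of any Zoom SDK.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # cue start, in seconds
    end: float     # cue end, in seconds
    speaker: str   # empty when the cue has no "Name: " prefix (assumed layout)
    text: str


_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")


def _seconds(ts: str) -> float:
    """Convert an HH:MM:SS.mmm timestamp to seconds."""
    m = _TIMESTAMP.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"unrecognised timestamp: {ts!r}")
    h, mi, s, ms = map(int, m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000


def parse_vtt(vtt: str) -> list[Segment]:
    """Split a WebVTT document into timed segments."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        timing = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing is None:
            continue  # "WEBVTT" header or a note block, not a cue
        start, end = lines[timing].split("-->")
        end = end.split()[0]  # drop any cue settings after the end time
        text = " ".join(lines[timing + 1:]).strip()
        # Heuristic: treat everything before the first ": " as the speaker.
        speaker, sep, rest = text.partition(": ")
        if sep:
            segments.append(Segment(_seconds(start), _seconds(end), speaker, rest))
        else:
            segments.append(Segment(_seconds(start), _seconds(end), "", text))
    return segments
```

The speaker split is deliberately naive: a cue without a speaker prefix whose text contains ": " would be split incorrectly, so treat it as a starting point only.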

What we want

To process a transcript of the meeting live, while the meeting is running, as Closed Captioning might, regardless of recording status.
To enable our app's functionality for non-Business License users, as the Audio Transcript is the only Business License level feature we are using.

Our specific technical goal

Programmatically get the audio stream of a meeting (not via livestream, which, if I understand correctly, forces the meeting to be public). What is the right way to do this?
Alternatively, programmatically retrieve the audio transcript segments (as text) as a stream, as events, through a webhook-based solution, etc.

Would it be possible via…

The Zoom Meeting Web SDK inside of a Zoom App?
As a use case for 3rd Party Closed Captioning?
RTMP Live Streaming through Zoom Video SDK for Web?

Note: I have seen a number of questions here on the devforum on this general topic over the years, but have not found a recent one that factors in the latest functionality: the Live Transcription and Closed Captioning features of the Zoom Meeting Web SDK. Or perhaps I just haven't seen clear guidance on the ideal approach to retrieving meeting audio from a Zoom App.

Thank you in advance for any and all guidance.

amanda-recallai · October 16, 2022, 3:29am

@gibron, a Zoom App doesn’t allow you to capture the real-time audio or transcript. You must use a separate method to capture the data, but you can pipe it back to your Zoom App to display.

Unfortunately, there are no direct API endpoints to access the real-time transcript. However, here are 4 other ways you could explore to create a real-time transcript from a Zoom meeting.

1. Use the Zoom RTMP live-streaming API

Pros:

Doesn’t require any 3rd party services
Lighter weight than building and running a Zoom bot

Cons:

Needs to be initiated on a per-meeting basis
You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
Participants can get spooked by the “live” badge that appears in the meeting (even if it’s a private meeting)
No speaker separation
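
To give a sense of the receiving side, here is a rough sketch that uses `ffmpeg` as a single-connection RTMP listener and reads mono 16-bit PCM from its stdout in fixed-size chunks, ready to hand to a speech-to-text engine. The listen URL, sample rate, and the names `build_ffmpeg_audio_cmd` and `iter_pcm_chunks` are assumptions for illustration. A production setup would more likely sit behind a dedicated RTMP server.

```python
import io
from typing import BinaryIO, Iterator


def build_ffmpeg_audio_cmd(listen_url: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that accepts one RTMP publisher and
    writes audio only, as raw signed 16-bit little-endian mono PCM, to stdout."""
    return [
        "ffmpeg",
        "-listen", "1",          # act as an RTMP server for a single connection
        "-i", listen_url,        # e.g. rtmp://0.0.0.0:1935/live/meeting (assumed URL)
        "-vn",                   # discard the video track
        "-ac", "1",              # downmix to mono
        "-ar", str(sample_rate), # resample for the transcription engine
        "-f", "s16le",           # raw PCM, no container
        "pipe:1",                # write to stdout
    ]


def iter_pcm_chunks(stream: BinaryIO, chunk_ms: int = 1000,
                    sample_rate: int = 16000) -> Iterator[bytes]:
    """Yield chunks of roughly chunk_ms of mono 16-bit audio until EOF."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 2 bytes per sample
    while chunk := stream.read(bytes_per_chunk):
        yield chunk


# Usage sketch (not run here, requires ffmpeg on PATH and a live publisher):
#   import subprocess
#   proc = subprocess.Popen(build_ffmpeg_audio_cmd("rtmp://0.0.0.0:1935/live/meeting"),
#                           stdout=subprocess.PIPE)
#   for chunk in iter_pcm_chunks(proc.stdout):
#       send_to_transcriber(chunk)  # hypothetical function
```

Because the stream is a single mixed audio track, whatever consumes these chunks has no per-speaker information, which is the "no speaker separation" limitation listed above.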

2. Build a desktop app to capture users’ computer audio

Pros:

One of the most cost effective solutions

Cons:

You need to build a separate app for Windows, Mac and Linux
It is especially difficult to tap into computer audio on Mac
The app runs on the user’s computer, so it can slow the machine down or make its fans spin up
No speaker separation

3. Build a Zoom bot

Pros:

Can get the separate audio streams per participant for perfect diarization / speaker labels

Cons:

It is very heavy-weight as you would need to spin up multiple servers to run the Zoom client for the bot
Running the infrastructure for a Zoom bot costs more than live streaming
You need to encode the raw video and audio yourself
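
As a small illustration of the "encode it yourself" point, here is a sketch that wraps a buffer of raw PCM samples in a WAV container using only the Python standard library. The 32 kHz, mono, 16-bit defaults are assumptions, so check what format your bot actually receives. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 32000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)            # header sizes are patched on close
    return buf.getvalue()
```

WAV is uncompressed, so this only covers packaging. Compressing to a codec such as AAC or Opus, and encoding video, would need additional tooling.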

4. Use Recall.ai

It’s a unified API that lets you send meeting bots to video conferencing platforms to capture the audio, video and transcription in real-time.

Pros:

Handles spinning up the servers, and providing the real-time raw audio/transcript so all you interact with is a simple API.
Works on any Zoom plan (including Free)
Gets speaker diarization / speaker labels
Works agnostic of meeting platform

Cons:

It’s another 3rd party service in your stack

Let me know if you have any questions!

MaxM · October 17, 2022, 10:56pm

@amanda-recallai Amazing, thank you for offering your insight here! I’ll just add a couple of relevant links for @gibron or anyone else:

Meeting Bots: Accessing Media Streams

Some additional info on using RTMP:

gibron · October 25, 2022, 12:31am

@amanda-recallai thank you for your thoughtful response and overall support!
And @MaxM thank you also!
Both have given us a lot to think through and determine our best path forward. Cheers!
