Help on How to Start Building Live Transcription App within Zoom

parthasarathy.madhav · June 30, 2024, 7:21pm

I am struggling to get started on a Zoom app that would transcribe meeting audio live and push the transcription to an external API. I am not sure where to start: should I use the Meeting SDK, the Video SDK, Zoom Apps, OAuth, or Server-to-Server OAuth?

The idea is that I would like to have this application transcribe audio from every meeting the user is a part of, and after every meeting, push this transcription to an external API.

amanda-recallai · July 13, 2024, 3:10am

Hey @parthasarathy.madhav, unfortunately, there are no direct API endpoints for accessing the real-time transcript. However, here are 4 other ways you could explore to create a real-time transcript from a Zoom meeting.

1. Use the Zoom RTMP live-streaming API

Pros:

Doesn’t require any 3rd party services
Lighter weight than building and running a Zoom bot

Cons:

Needs to be initiated on a per-meeting basis
You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
Participants can get spooked by the “live” badge that appears in the meeting (even if it’s a private meeting)
No speaker separation
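
To give a feel for the per-meeting setup in option 1, here is a minimal sketch that only builds the two REST requests (configure the stream, then start it) without sending them. The endpoint paths and body fields are written from memory of the Zoom Meeting API's live-stream endpoints and should be treated as assumptions to check against the current API reference. The meeting ID, RTMP URL, stream key, and page URL are hypothetical placeholders, and `build_livestream_requests` is an illustrative helper, not part of any Zoom SDK.

```python
import json
from urllib.parse import quote

# Assumption: base URL and paths follow the Zoom Meeting API live-stream
# endpoints as I remember them; verify against the current reference.
ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id, stream_url, stream_key, page_url):
    """Describe the two calls that configure and then start a custom live stream.

    Nothing is sent over the network; each request is returned as a plain dict
    so it can be inspected or handed to any HTTP client together with an
    OAuth bearer token.
    """
    base = f"{ZOOM_API_BASE}/meetings/{quote(str(meeting_id), safe='')}/livestream"
    configure = {
        "method": "PATCH",
        "url": base,
        "body": {
            "stream_url": stream_url,   # your own RTMP ingest server
            "stream_key": stream_key,
            "page_url": page_url,
        },
    }
    start = {
        "method": "PATCH",
        "url": f"{base}/status",
        "body": {
            "action": "start",
            "settings": {
                "active_speaker_name": True,
                "display_name": "Transcription stream",
            },
        },
    }
    return [configure, start]


# Hypothetical values, for illustration only.
requests_to_send = build_livestream_requests(
    meeting_id=1234567890,
    stream_url="rtmp://rtmp.example.com/live",
    stream_key="hypothetical-stream-key",
    page_url="https://example.com/live",
)

for req in requests_to_send:
    print(req["method"], req["url"])
    print(json.dumps(req["body"], indent=2))
```

Both calls have to be repeated for each meeting, which is the "per-meeting basis" con above. The RTMP server at `stream_url` still has to be built and run separately, and the audio it receives is a single mixed track.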

2. Build a desktop app to capture users’ computer audio

Pros:

One of the most cost-effective solutions

Cons:

You need to build a separate app for Windows, Mac and Linux
It is especially difficult to tap into computer audio on Mac
The app runs on the user’s computer, so it can slow the machine down or make the fans spin up
No speaker separation
Not compliant with Zoom’s recording policies

3. Build a Zoom bot

Pros:

Can get the separate audio streams per participant for perfect diarization / speaker labels

Cons:

It is very heavyweight, as you would need to spin up multiple servers to run the Zoom client for the bot
Running the infrastructure for a Zoom bot costs more than live streaming
You need to encode the raw video and audio yourself
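
To illustrate why separate per-participant streams make speaker labels easy, here is a minimal sketch of the merge step. It assumes each participant's audio has already been transcribed into timestamped segments by whatever speech-to-text engine you choose. `Segment` and `merge_streams` are hypothetical names for this example and are not part of any Zoom SDK.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # known up front, because each stream belongs to one participant
    start: float   # seconds since the start of the meeting
    text: str


def merge_streams(streams):
    """Interleave per-participant segments into one speaker-labelled transcript."""
    segments = [seg for stream in streams.values() for seg in stream]
    segments.sort(key=lambda s: s.start)

    turns = []
    for seg in segments:
        if turns and turns[-1][0] == seg.speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1] = (seg.speaker, turns[-1][1] + " " + seg.text)
        else:
            turns.append((seg.speaker, seg.text))
    return [f"{speaker}: {text}" for speaker, text in turns]


# Hypothetical transcribed segments for two participants.
streams = {
    "Alice": [Segment("Alice", 0.0, "Hi everyone."), Segment("Alice", 6.5, "Sounds good.")],
    "Bob": [Segment("Bob", 2.1, "Hello."), Segment("Bob", 3.0, "Shall we start?")],
}

transcript = merge_streams(streams)
for line in transcript:
    print(line)
```

With a single mixed track (options 1 and 2) the `speaker` field would have to be guessed by a diarization model, whereas here it comes directly from which stream the audio arrived on.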

4. Use Recall.ai

Recall.ai is a unified API that lets you send meeting bots to video conferencing platforms to capture the audio, video and transcription in real-time.

Pros:

Handles spinning up the servers and providing the real-time raw audio/transcript, so all you interact with is a simple API
Works on any Zoom plan (including Free)
Gets speaker diarization / speaker labels
Works regardless of meeting platform

Cons:

It’s another 3rd party service in your stack

Let me know if you have any questions!
