How can I speed up transcription availability for Zoom calls?

petert · December 27, 2022, 4:42pm

I’m building a Zoom integration with the Zoom API.

After a call, I would like to access the call transcript within 5 minutes. However, the Zoom API only makes it available later: I often experience delays of 15-60 minutes.

How could I get the call transcript faster at a reasonable cost?

Options I have considered:

Send bots to meetings. But it’s costly, and having a bot in a meeting can be a bad experience.
Get people to use Zoom in their browser and capture captions from there. But this requires a significant change in user behavior.
Build Mac and Windows apps that use Zoom local recordings and captions. But this increases build complexity significantly.
Wait until Zoom speeds up the availability of their transcript after a call. But they might do this only after several years.
Educate users that waiting for 15-60 minutes is OK. But they will likely not agree.
Use a 3rd party transcription service. But this gets expensive.

freelancer.nak · December 27, 2022, 6:27pm

Hi @petert, the best option as per my experience is

1. Zoom Recording Downloader Bot → once Zoom triggers the recording.completed event, the bot pulls the recording and sends it to a third-party transcription service provider (the best is assembly.ai), then sends emails or displays the transcription on your dashboard, if you have one.
2. Zoom Recording Bot → the bot gets the meeting streams in real time and sends them to the transcription service provider, so the full transcription is ready once the meeting has ended.
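A minimal sketch of the first step in option 1, picking the audio file out of a recording.completed webhook body so it can be handed to a transcription service. The field names (`payload.object.recording_files`, `recording_type`, `download_url`) are written from memory of Zoom's webhook payload and should be checked against the current webhook reference; the sample event and its URLs are made up for illustration.

```python
from typing import Optional


def pick_audio_download_url(event: dict) -> Optional[str]:
    """Return the download URL of the audio-only recording file, if any.

    Assumes the webhook body has the shape
    {"event": ..., "payload": {"object": {"recording_files": [...]}}}.
    """
    if event.get("event") != "recording.completed":
        return None

    files = event.get("payload", {}).get("object", {}).get("recording_files", [])

    # Prefer the audio-only file: it is smaller and is all a transcriber needs.
    for f in files:
        if f.get("recording_type") == "audio_only":
            return f.get("download_url")

    # Otherwise fall back to the first file that has a download URL at all.
    for f in files:
        if f.get("download_url"):
            return f["download_url"]
    return None


# Hypothetical payload, trimmed to the fields used above.
sample_event = {
    "event": "recording.completed",
    "payload": {
        "object": {
            "recording_files": [
                {"recording_type": "shared_screen_with_speaker_view",
                 "download_url": "https://example.com/rec/video.mp4"},
                {"recording_type": "audio_only",
                 "download_url": "https://example.com/rec/audio.m4a"},
            ]
        }
    },
}

print(pick_audio_download_url(sample_event))
```

This only runs after Zoom has finished processing the cloud recording, so it does not remove the post-call delay by itself.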

I have bot solutions for both (1. & 2.) if interested please ping me via my Upwork profile link which is in my description. Thanks

petert · December 27, 2022, 7:50pm

Thanks, but this seems to require an additional bot-participant in the call which I am hoping to avoid.

freelancer.nak · December 27, 2022, 7:53pm

@petert 1. is without bot in meeting and 2. is with a bot as a participant in the meeting.

freelancer.nak · December 27, 2022, 7:55pm

Option 2 will be super fast because the transcription will be ready as soon as the meeting ends.

petert · December 28, 2022, 8:49am

As I understand it:

1. The problem here is that you get a lag, as the cloud recording is not ready immediately after the call, so I would have to wait.
2. The problem here is that there is a bot in every meeting, which many of our users don’t like because they get questions about it from other meeting participants.

freelancer.nak · December 28, 2022, 8:54am

Then you can use a custom live stream service to record the meeting on the server without a bot participant.

petert · December 28, 2022, 9:11am

Interesting, that’s an option I haven’t explored before. I assume the live-stream destination would still need to transcribe this. If so, is there a cost-efficient way to do this transcription? I’ve seen costs typically at something like ~$0.8/h for high-quality transcription which is too much for my use case (it could be maybe $0.1/h at max).

freelancer.nak · December 28, 2022, 9:23am

@petert AWS will be cheaper. Assembly.ai charges $0.9/hr for core transcription only.

petert · December 28, 2022, 1:37pm

It’s cheaper but will still be too high. Even at the 5M minute tier, it seems to be $0.42/h while we would need <$0.1/h
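To make the gap concrete, here is a small arithmetic sketch comparing the hourly prices quoted in this thread with the $0.1/h target. The prices are the ones mentioned above; the 5,000,000 minutes per month used for the monthly figure is only an illustrative volume borrowed from the tier name, not a number from the thread.

```python
BUDGET_PER_HOUR = 0.10  # target stated in the thread

# Hourly prices as quoted in the thread.
QUOTED_PRICES = {
    "typical high-quality transcription": 0.80,
    "Assembly.ai core transcription": 0.90,
    "5M-minute tier": 0.42,
}


def times_over_budget(price_per_hour: float, budget_per_hour: float) -> float:
    """How many times the budget a given hourly price is."""
    return price_per_hour / budget_per_hour


def monthly_cost(minutes: float, price_per_hour: float) -> float:
    """Total cost for a number of audio minutes at an hourly price."""
    return minutes / 60 * price_per_hour


for name, price in QUOTED_PRICES.items():
    ratio = times_over_budget(price, BUDGET_PER_HOUR)
    print(f"{name}: ${price:.2f}/h is {ratio:.1f}x the budget")

# Illustrative volume only (assumption): 5,000,000 minutes in a month.
ASSUMED_MINUTES = 5_000_000
print(f"at $0.42/h: ${monthly_cost(ASSUMED_MINUTES, 0.42):,.0f}")
print(f"at $0.10/h: ${monthly_cost(ASSUMED_MINUTES, BUDGET_PER_HOUR):,.0f}")
```

Even the cheapest quoted price is more than four times the target, so a different pricing tier alone would not close the gap.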

freelancer.nak · December 28, 2022, 1:56pm

Please review Symbl.ai and rev.ai.

amanda-recallai · January 4, 2023, 8:18pm

@petert, the best way to speed up transcription availability is with a real-time transcript that is available immediately after the call is done. Unfortunately, there are no API endpoints to access the real-time transcript. However, here are 4 other ways you could explore to create a real-time transcript from a Zoom meeting.

1. Use the Zoom RTMP live-streaming API

Pros:

Doesn’t require any 3rd party services
Lighter weight than building and running a Zoom bot

Cons:

Needs to be initiated on a per-meeting basis
You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
Participants can get spooked by the “live” badge that appears in the meeting (even if it’s a private meeting)
No speaker separation
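As a rough sketch of what the receiving side of the RTMP option could look like: once an RTMP server is accepting the stream, ffmpeg can strip the video and emit raw mono PCM, which is then cut into short chunks for a streaming transcriber. The RTMP URL, the 16 kHz sample rate, and the 5-second chunk length are assumptions for illustration; the ffmpeg flags used (`-vn`, `-ac`, `-ar`, `-f s16le`) are standard.

```python
import io
from typing import BinaryIO, Iterator, List

SAMPLE_RATE = 16_000   # Hz; assumed input rate for the transcriber
BYTES_PER_SAMPLE = 2   # signed 16-bit little-endian, mono


def ffmpeg_audio_command(rtmp_url: str) -> List[str]:
    """Build an ffmpeg command that reads the stream and writes raw PCM to stdout."""
    return [
        "ffmpeg",
        "-i", rtmp_url,           # hypothetical URL of your own RTMP server
        "-vn",                    # drop the video track
        "-ac", "1",               # downmix to mono
        "-ar", str(SAMPLE_RATE),  # resample
        "-f", "s16le",            # raw PCM, no container
        "pipe:1",                 # write to stdout
    ]


def pcm_chunks(stream: BinaryIO, seconds: float = 5.0) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM until the stream ends."""
    chunk_size = int(SAMPLE_RATE * BYTES_PER_SAMPLE * seconds)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Demo with 12 seconds of silence standing in for ffmpeg's stdout.
fake_audio = io.BytesIO(b"\x00" * (SAMPLE_RATE * BYTES_PER_SAMPLE * 12))
for chunk in pcm_chunks(fake_audio):
    print(len(chunk), "bytes")
```

In a real deployment the stream would come from `subprocess.Popen(ffmpeg_audio_command(url), stdout=subprocess.PIPE).stdout`, and each chunk would be sent to whichever transcription service is chosen. The output is a single mixed audio track, which is why this route gives no speaker separation.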

2. Build a desktop app to capture users’ computer audio

Pros:

One of the most cost-effective solutions

Cons:

You need to build a separate app for Windows, Mac and Linux
It is especially difficult to tap into computer audio on Mac
The app runs on the user’s computer, so it can slow the computer down or make its fans spin up
No speaker separation

3. Build a Zoom bot

Pros:

Can get the separate audio streams per participant for perfect diarization / speaker labels

Cons:

It is very heavy-weight as you would need to spin up multiple servers to run the Zoom client for the bot
Running infrastructure for Zoom bot costs more than live streaming.
You need to encode the raw video and audio yourself
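To illustrate why separate per-participant audio streams make speaker labels straightforward: if each participant's stream is transcribed on its own, diarization reduces to merging the resulting segments by timestamp. This is a sketch under that assumption; the `Segment` type, the speaker names, and the timings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start: float  # seconds from the start of the meeting
    text: str


def merge_by_speaker(per_speaker: Dict[str, List[Segment]]) -> List[str]:
    """Interleave per-participant transcripts into one speaker-labelled transcript."""
    labelled = [
        (seg.start, speaker, seg.text)
        for speaker, segments in per_speaker.items()
        for seg in segments
    ]
    labelled.sort(key=lambda item: item[0])  # order by start time
    return [f"[{start:.1f}s] {speaker}: {text}" for start, speaker, text in labelled]


# Hypothetical output of transcribing two participant streams separately.
transcripts = {
    "Alice": [Segment(0.0, "Hi everyone."), Segment(4.0, "Let's start.")],
    "Bob": [Segment(2.5, "Hello!")],
}

for line in merge_by_speaker(transcripts):
    print(line)
```

With a single mixed stream, as in options 1 and 2, the same labels would have to be inferred from the audio itself.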

4. Use Recall.ai

It’s a unified API that lets you send meeting bots to video conferencing platforms to capture the audio, video and transcription in real-time.

Pros:

Handles spinning up the servers, and providing the real-time raw audio/transcript so all you interact with is a simple API.
Works on any Zoom plan (including Free)
Gets speaker diarization / speaker labels
Works regardless of meeting platform

Cons:

It’s another 3rd party service in your stack

Let me know if you have any questions!
