Can we get real-time speech-to-text transcription of a meeting from the Zoom API?

Hey @mukhayyo.tashpulatov
Unfortunately, you can't right now.
Cheers,
Elisa

@mukhayyo.tashpulatov, there are 3 ways you could get the real-time transcription from Zoom:

1. Use the Zoom live-streaming API and feed the audio stream to a transcription provider

Pros:

  • Doesn’t require any 3rd party services
  • Lighter weight than building and running a Zoom bot

Cons:

  • Needs to be initiated on a per-meeting basis
  • You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
  • Participants can get spooked by the “live” badge that appears in the meeting, depending on the use case
  • No speaker separation
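Here is a rough Python sketch of option 1's plumbing, offered as an illustration rather than a tested integration. It builds (without sending) the two Zoom REST calls that point a meeting at your RTMP server and start the stream. These are `PATCH /meetings/{meetingId}/livestream` and `PATCH /meetings/{meetingId}/livestream/status` as I understand Zoom's API v2 reference, so double-check the field names against the current docs. It then builds an ffmpeg command that pulls just the audio off your RTMP server as raw PCM and chunks it for a streaming transcription provider. The helper names, the 16 kHz mono PCM format, and the 100 ms chunk size are my own assumptions. The RTMP source URL ffmpeg reads from also depends on how your RTMP server is configured.

```python
from dataclasses import dataclass, field

ZOOM_API_BASE = "https://api.zoom.us/v2"  # Zoom REST API v2 base URL

# Assumed audio format: 16 kHz, 16-bit signed little-endian, mono PCM.
# Check what your transcription provider actually expects.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1


@dataclass
class HttpRequest:
    """Hypothetical container for a request you would send with your HTTP client."""
    method: str
    url: str
    body: dict = field(default_factory=dict)


def build_livestream_requests(meeting_id: str, rtmp_url: str,
                              stream_key: str, page_url: str) -> list[HttpRequest]:
    """Build the two calls needed (per meeting) to push it to an RTMP server."""
    base = f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream"
    return [
        # 1. Tell Zoom where to push the stream for this meeting.
        HttpRequest("PATCH", base, {"stream_url": rtmp_url,
                                    "stream_key": stream_key,
                                    "page_url": page_url}),
        # 2. Start the live stream (this is what shows the "live" badge).
        HttpRequest("PATCH", f"{base}/status", {"action": "start"}),
    ]


def ffmpeg_audio_argv(rtmp_source: str) -> list[str]:
    """ffmpeg command that reads the RTMP stream and writes raw PCM audio to stdout."""
    return [
        "ffmpeg", "-i", rtmp_source,
        "-vn",                    # drop the video track
        "-ac", str(CHANNELS),     # downmix to mono
        "-ar", str(SAMPLE_RATE),  # resample to 16 kHz
        "-f", "s16le",            # raw 16-bit little-endian PCM
        "-",                      # write to stdout
    ]


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks for a streaming transcription API."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

In a real deployment you would run the ffmpeg command with `subprocess.Popen(..., stdout=subprocess.PIPE)`, read one chunk at a time from its stdout, and forward each chunk to your provider's streaming endpoint. Because everything arrives as a single mixed stream, this path has no speaker separation.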

2. Build a Zoom bot and feed the audio stream to a transcription provider

Pros:

  • Can get the separate audio streams per participant for perfect diarization / speaker labels
  • Doesn’t spook participants

Cons:

  • It is very heavyweight, as you need to spin up multiple servers to run the Zoom client for the bot
  • Running the infrastructure for a Zoom bot costs more than live streaming
  • You need to encode the raw video and audio yourself
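To show why per-participant streams make speaker labels trivial, here is a small Python sketch. It assumes you have already sent each participant's audio stream from the bot to a transcription provider and received back timestamped segments. The `Segment` type and the function names are hypothetical. Since every transcript comes from one person's stream, the speaker label is simply the stream it came from, and building the meeting transcript is just a sort by start time. No diarization model is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical timestamped transcript segment returned for one stream."""
    start: float  # seconds from the start of the meeting
    end: float
    text: str


def merge_participant_transcripts(
        per_speaker: dict[str, list[Segment]]) -> list[tuple[float, str, str]]:
    """Interleave per-participant transcripts into one speaker-labeled timeline."""
    lines = [
        (seg.start, speaker, seg.text)
        for speaker, segments in per_speaker.items()
        for seg in segments
    ]
    # Sort by start time only; the sort is stable, so ties keep input order.
    return sorted(lines, key=lambda line: line[0])


def format_transcript(lines: list[tuple[float, str, str]]) -> str:
    """Render the merged timeline as '[  t.s] Speaker: text' lines."""
    return "\n".join(f"[{start:6.1f}s] {speaker}: {text}"
                     for start, speaker, text in lines)
```

For example, Alice's segments at 0.0 s and 5.0 s and Bob's segment at 2.5 s come out interleaved as Alice, Bob, Alice.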

3. Use Recall.ai

It’s a unified API that lets you send meeting bots to video conferencing platforms to capture the audio, video, and transcription in real time.

Pros:

  • Handles spinning up the servers and provides the real-time raw audio/transcript, so all you interact with is a simple API
  • Gets speaker diarization / speaker labels
  • Works regardless of meeting platform

Cons:

  • It’s another 3rd party service in your stack

Let me know if you have any questions!

Hello Amanda,

I’m looking for your pricing, as I want to use Deepgram. Could you contact me at astowny(at)gmail.com?

Thanks.
Tony