Streaming closed captions produced by Zoom to a third-party app via API

API Endpoint(s) and/or Zoom API Event(s)

Question
Is there any facility that will allow me to stream closed captions (or real-time transcriptions) produced within a Zoom meeting to a 3rd-party app in real time?

Description
I have not seen a satisfactory solution/answer to this yet. There is an obvious solution:

  • Create an app that joins the meeting, captures the audio, and transcribes it in real time to display the text within the 3rd-party app. However, this is a very heavyweight solution, as the 3rd-party app has to run a Zoom client in order to capture the audio and transcribe it.

The ideal solution would be for Zoom to provide a way to send the closed captions to a third party via an API. I would even be willing to pay extra per month for this. Speaker identification and timecodes would be very useful features. Zoom already provides closed captions in real time and also provides a full transcript of the meeting that can be downloaded after the meeting has concluded, so it doesn’t seem that this would be a very onerous feature to implement. Does this feature exist? Is this feature planned?

Any help would be greatly appreciated.

There are 3 ways you can stream the real-time transcription from Zoom to a 3rd party app.

1. Use the Zoom live-streaming API

Pros:

  • Doesn’t require any 3rd party services
  • Lighter weight than building and running a Zoom bot

Cons:

  • Needs to be initiated on a per-meeting basis
  • You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
  • Participants can get spooked by the “live” badge that appears in the meeting, depending on the use case
  • No speaker separation, since the stream carries a single mixed audio track
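For option 1, the flow can be sketched roughly as follows, assuming you already have a Server-to-Server OAuth access token and that live streaming is enabled for the meeting. The two REST calls (`PATCH /meetings/{meetingId}/livestream` to set the RTMP target, then `PATCH /meetings/{meetingId}/livestream/status` with `"action": "start"`) are written from memory of the Zoom Meeting API, so check the current API reference before relying on them. On the receiving side, ffmpeg's RTMP `listen` option can stand in for a dedicated RTMP server for a single stream: it drops the video and pipes raw 16 kHz mono PCM to stdout, which you then cut into frames for whatever streaming speech-to-text engine you use. All helper names here are hypothetical, and nothing is sent or spawned until you do it yourself.

```python
import io
import json
import urllib.request

ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id: str, access_token: str,
                              stream_url: str, stream_key: str,
                              page_url: str) -> list[urllib.request.Request]:
    """Build (but do not send) the 'configure' and 'start' live-stream requests.

    Endpoint paths and body fields are assumptions based on the Zoom Meeting
    API; verify them against the current API reference.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    configure = urllib.request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream",
        data=json.dumps({
            "stream_url": stream_url,  # public RTMP address of your ffmpeg host
            "stream_key": stream_key,
            "page_url": page_url,      # page where viewers would watch the stream
        }).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    start = urllib.request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream/status",
        data=json.dumps({"action": "start"}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    return [configure, start]


def ffmpeg_rtmp_audio_args(listen_url: str, sample_rate: int = 16000) -> list[str]:
    """ffmpeg argv that accepts one RTMP publish and writes raw mono PCM to stdout."""
    return [
        "ffmpeg",
        "-listen", "1",           # act as the RTMP server for one incoming stream
        "-i", listen_url,         # e.g. rtmp://0.0.0.0:1935/live/<stream_key>
        "-vn",                    # discard the video track
        "-ac", "1",               # mono: the live stream has one mixed audio track
        "-ar", str(sample_rate),  # resample to what the speech-to-text engine expects
        "-f", "s16le",            # raw signed 16-bit little-endian PCM
        "pipe:1",                 # write to stdout
    ]


def pcm_frames(stream: io.BufferedIOBase, sample_rate: int = 16000,
               frame_ms: int = 100):
    """Yield fixed-duration 16-bit mono PCM frames; the last one may be shorter."""
    frame_bytes = sample_rate * 2 * frame_ms // 1000  # 2 bytes per sample
    while True:
        chunk = stream.read(frame_bytes)
        if not chunk:  # EOF: the live stream ended
            break
        yield chunk
```

In practice you would launch the command with `subprocess.Popen(args, stdout=subprocess.PIPE)` before calling the start endpoint (so something is listening when Zoom connects), send each request with `urllib.request.urlopen(req)`, and iterate `pcm_frames(proc.stdout)` to feed your transcriber.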

2. Build a Zoom bot

Pros:

  • Can get the separate audio streams per participant for perfect diarization / speaker labels
  • Doesn’t spook participants

Cons:

  • As you said, it is very heavyweight, since you need to spin up multiple servers to run the Zoom client for the bots.
  • Running the infrastructure for a Zoom bot costs more than live streaming
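For option 2, the payoff of per-participant audio is that speaker labels come almost for free: each participant's stream is transcribed on its own, and the results only need to be interleaved by time. A minimal sketch, assuming your bot already produces timestamped segments for each participant; the `Segment` type and helper functions are hypothetical, and capturing the raw per-participant audio through the Zoom client or SDK is not shown.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One recognized utterance from a single participant's audio stream (hypothetical)."""
    speaker: str  # participant the bot associated with this audio stream
    start: float  # seconds since the meeting started
    end: float
    text: str


def merge_streams(per_speaker: dict[str, list[Segment]]) -> list[Segment]:
    """Interleave per-participant transcripts into one time-ordered transcript.

    Each participant's list is assumed to be sorted by start time already,
    which is what a streaming recognizer on a single audio stream produces.
    """
    return list(heapq.merge(*per_speaker.values(), key=lambda seg: seg.start))


def format_transcript(segments: list[Segment]) -> str:
    """Render merged segments as '[seconds] Speaker: text' lines."""
    return "\n".join(
        f"[{seg.start:07.2f}] {seg.speaker}: {seg.text}" for seg in segments
    )
```

Because each stream is already sorted, `heapq.merge` does a lazy k-way merge instead of re-sorting the whole transcript, which also works if you feed it live iterators rather than finished lists.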

3. Use Recall.ai

It’s a unified API that lets you send meeting bots to video conferencing platforms (like Zoom) to capture the audio and video in real-time.

Pros:

  • We handle spinning up the servers and piping the audio to transcription providers, so all you interact with is a simple API.

Cons:

  • It’s another service in your stack
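With option 3, the integration on your side shrinks to receiving transcript events over HTTP. The sketch below is a generic webhook receiver using only the Python standard library. The payload shape (`event`, `data.speaker`, `data.words[].text`) is a made-up placeholder, not Recall.ai's documented schema, so map the field names to whatever their real-time transcription docs actually specify.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Optional


def transcript_line(payload: dict) -> Optional[str]:
    """Turn one transcript event into a display line.

    The payload shape used here is a hypothetical placeholder.
    """
    if payload.get("event") != "transcript":
        return None  # ignore status or other non-transcript events
    data = payload.get("data", {})
    speaker = data.get("speaker") or "Unknown speaker"
    text = " ".join(word["text"] for word in data.get("words", []))
    return f"{speaker}: {text}" if text else None


class TranscriptWebhook(BaseHTTPRequestHandler):
    """Minimal HTTP endpoint a bot provider could POST transcript events to."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        line = transcript_line(payload)
        if line:
            print(line)  # in a real app, push this to your UI (WebSocket, queue, ...)
        self.send_response(200)  # acknowledge so the sender does not retry
        self.end_headers()
```

To try it, run `HTTPServer(("", 8000), TranscriptWebhook).serve_forever()` (also from `http.server`) behind a public HTTPS URL and register that URL wherever the provider lets you configure real-time transcript delivery.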
