API Endpoint(s) and/or Zoom API Event(s)
Is there any facility that will allow me to stream closed captions (or real-time transcriptions) produced within a Zoom meeting to a 3rd party app in real time?
I have not seen a satisfactory solution/answer to this yet. There is an obvious solution:
- Create an app that joins the meeting, captures the audio, and transcribes it in real time to display the text within the 3rd party app. However, this is a very heavy-weight solution, as the 3rd party app has to run a Zoom client in order to capture the audio and transcribe it.
The ideal solution would be to have Zoom provide a way to send the closed captions to a third party via an API. I would even be willing to pay extra per month to achieve this. Speaker identification and time codes would be very useful features. Zoom already provides closed captions in real time and also provides a full transcript of the meeting that can be downloaded after the meeting has concluded, so it doesn’t seem that this would be a very onerous feature to implement. Does this feature exist? Is this feature planned?
Any help would be greatly appreciated.
There are 3 ways you can stream the real-time transcription from Zoom to a 3rd party app.
1. Use the Zoom live-streaming API
- Doesn’t require any 3rd party services
- Lighter weight than building and running a Zoom bot
- Needs to be initiated on a per-meeting basis
- You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
- Participants can get spooked by the “live” badge that appears in the meeting, depending on the use case
- No speaker separation
2. Build a Zoom bot
- Can get the separate audio streams per participant for perfect diarization / speaker labels
- Doesn’t spook participants
- Like you said, it is very heavy-weight as you would need to spin up multiple servers to run the Zoom client for the bot.
- Running the infrastructure for a Zoom bot costs more than live streaming
3. Use Recall.ai
It’s a unified API that lets you send meeting bots to video conferencing platforms (like Zoom) to capture the audio and video in real-time.
- We handle spinning up the servers and piping the audio to transcription providers, so all you interact with is a simple API.
- It’s another service in your stack
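
To make option 1 a bit more concrete, here is a rough Python sketch of the two calls that, as far as I understand the Zoom Meeting API, configure and then start a live stream for a single meeting. It only builds the requests and does not send them. The endpoint paths and body fields (`stream_url`, `stream_key`, `page_url`, `action`, `settings`) reflect my reading of the API reference and should be checked against the current docs. The function name, the RTMP URL, the stream key, and the display name are placeholders I made up for illustration. The RTMP server on the receiving end and the transcription step are not shown.

```python
import json
import urllib.request

# Assumption: the standard Zoom REST base URL.
ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id, access_token, rtmp_url, stream_key, page_url):
    """Build (but do not send) the requests that point a meeting's live
    stream at your own RTMP server and then start it.

    Hypothetical helper: the paths and body fields are assumptions based on
    the Zoom Meeting API reference and may differ in the current version.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }

    def patch(path, payload):
        # urllib sends a JSON body as bytes; method must be set explicitly.
        return urllib.request.Request(
            url=f"{ZOOM_API_BASE}{path}",
            data=json.dumps(payload).encode("utf-8"),
            headers=headers,
            method="PATCH",
        )

    # Step 1: tell Zoom where to push the stream (your RTMP ingest endpoint).
    configure = patch(
        f"/meetings/{meeting_id}/livestream",
        {"stream_url": rtmp_url, "stream_key": stream_key, "page_url": page_url},
    )

    # Step 2: start streaming. This has to be repeated for every meeting.
    start = patch(
        f"/meetings/{meeting_id}/livestream/status",
        {
            "action": "start",
            "settings": {
                "active_speaker_name": True,
                "display_name": "Caption relay",  # placeholder label
            },
        },
    )
    return [configure, start]
```

To actually send them you would pass each request to `urllib.request.urlopen(...)` in order, using a valid OAuth access token. Because the start call is per meeting, you would typically trigger it from whatever code already knows a meeting has begun.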