Live Captioning?

Is there a way to get captions in near-real-time so my app can know what is being said during a meeting?

I’ve had a hard time getting responses to questions on this forum. It’s kind of frustrating. Why bother having one if you’re not going to monitor it and help people out?

Hi @dmyers,

Thank you for your patience and feedback! We are working to improve response times while balancing our other Developer Advocate responsibilities.

You can use the caption API token to get captions in near-real-time for a 3rd party app.

Check out our Postman workspace for the endpoint.

Does this help?

Thank you for the response. I’m looking for Zoom to provide my app with text representing what is being said during the meeting. WebEx has this available. Does Zoom?

@dmyers There are 4 ways you can stream the real-time transcription from Zoom to a 3rd party app.

1. Use the Zoom live-streaming API

Pros:

  • Doesn’t require any 3rd party services
  • Lighter weight than building and running a Zoom bot

Cons:

  • Needs to be initiated on a per-meeting basis
  • You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
  • Participants can get spooked by the “live” badge that appears in the meeting, depending on the use case
  • No speaker separation
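
To make the per-meeting setup concrete, here is a rough sketch of the two calls involved. It assumes the Zoom REST endpoints `PATCH /meetings/{meetingId}/livestream` and `PATCH /meetings/{meetingId}/livestream/status` with the body fields shown, so please check the field names against the current Zoom API reference. The helper name `build_livestream_requests`, the RTMP URL, the stream key, and the token are all placeholders of mine. The sketch only builds the requests; nothing is sent until you call `urlopen`.

```python
import json
from urllib import request

ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id, stream_url, stream_key, page_url, access_token):
    """Build (but do not send) the two PATCH requests for one meeting.

    Hypothetical helper: the first request points the meeting at your RTMP
    server, the second asks Zoom to start streaming. Body field names are
    assumptions based on the Zoom Meeting API and should be verified.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    configure = request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream",
        data=json.dumps(
            {"stream_url": stream_url, "stream_key": stream_key, "page_url": page_url}
        ).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    start = request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream/status",
        data=json.dumps({"action": "start"}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    return configure, start


# Usage (placeholders throughout); uncomment the loop to actually send.
configure_req, start_req = build_livestream_requests(
    meeting_id="123456789",
    stream_url="rtmp://rtmp.example.invalid/live",
    stream_key="my-stream-key",
    page_url="https://example.invalid/watch",
    access_token="YOUR_OAUTH_TOKEN",
)
# for req in (configure_req, start_req):
#     request.urlopen(req)
```

Because this has to run for every meeting, you would typically trigger it from whatever schedules or detects meetings in your app. The transcription itself still has to happen on your side, from the audio your RTMP server receives.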

2. Build a desktop app

Pros:

  • Will work agnostic of meeting platform
  • Very simple to build

Cons:

  • No speaker diarization, only one audio stream
  • Runs on the user’s computer, so any processing slows their computer down and drains their battery
  • Recording video is especially resource intensive on user computers
  • Requires the user to install software, which some may be hesitant to do

3. Build a Zoom bot

Pros:

  • Can get the separate audio streams per participant for perfect diarization / speaker labels
  • Doesn’t spook participants

Cons:

  • It is very heavyweight, as you would need to spin up multiple servers to run the Zoom client for the bot
  • Running infrastructure for a Zoom bot costs more than live streaming
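
To illustrate why separate audio streams make speaker labels easy: if each participant's audio is transcribed on its own, every segment already carries its speaker, and building the transcript is just a merge by timestamp. The sketch below assumes you already have per-participant transcript segments from some transcription step; `Segment`, `merge_streams`, and `format_transcript` are hypothetical names, not part of any Zoom SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the meeting
    end: float
    text: str


def merge_streams(streams):
    """Interleave per-speaker segments into one timeline ordered by start time.

    `streams` maps a speaker name to that speaker's list of Segment objects.
    Ties on start time are broken by speaker name so the order is stable.
    """
    labeled = [(speaker, seg) for speaker, segs in streams.items() for seg in segs]
    return sorted(labeled, key=lambda pair: (pair[1].start, pair[0]))


def format_transcript(timeline):
    """Render the merged timeline as speaker-labeled lines."""
    return [f"{seg.start:.1f}s {speaker}: {seg.text}" for speaker, seg in timeline]


# Usage with made-up data
streams = {
    "Alice": [Segment(0.0, 2.0, "Hi"), Segment(5.0, 6.0, "Sure")],
    "Bob": [Segment(2.5, 4.0, "Hello")],
}
for line in format_transcript(merge_streams(streams)):
    print(line)
```

With a single mixed audio stream (options 1 and 2) you would instead need a diarization model to guess who is speaking, which is where the labeling errors come from.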

4. Use Recall.ai

It’s a unified API that lets you send meeting bots to video conferencing platforms (like Zoom) to capture the audio and video in real-time.

Pros:

  • We handle spinning up the servers and piping the audio to transcription providers, so all you interact with is a simple API
  • Gets near-perfect diarization / speaker labels
  • Supports video capture
  • Works agnostic of meeting platform

Cons:

  • It’s another service in your stack