How to get realtime meeting video and audio data

Hello everyone! Happy to join :wink:

So, my use case -
I need to get the meeting video and audio (raw) in real-time, right to my back-end, to do my logic and display meeting insights in the meeting chat in real time :slight_smile:

Is there any other way to get real-time data other than using the Livestream Meeting feature?
API? Webhook?

Thanks a lot,

@sahar.b1, here are 3 methods people typically use to get the real-time raw video and audio from Zoom.

1. Use the Zoom RTMP live-streaming API

Pros:

  • Doesn’t require any 3rd party services
  • Lighter weight than building and running a Zoom bot

Cons:

  • Needs to be initiated on a per-meeting basis
  • You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
  • No speaker separation
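
To make the "per-meeting" point concrete, here is a minimal sketch of the two REST calls involved: one to point the meeting at your RTMP server and one to start the stream. It assumes the `PATCH /meetings/{meetingId}/livestream` and `PATCH /meetings/{meetingId}/livestream/status` endpoints as I understand them from the Zoom REST API, so please check the field names against the current docs. The function name, token, and URLs are placeholders. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request

ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id, access_token, stream_url, stream_key, page_url):
    """Build (but do not send) the requests that configure and start a live stream."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    # Step 1: tell Zoom where to push the RTMP stream for this one meeting.
    configure_body = {
        "stream_url": stream_url,
        "stream_key": stream_key,
        "page_url": page_url,
    }
    configure = urllib.request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream",
        data=json.dumps(configure_body).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    # Step 2: start the stream (the meeting has to be in progress).
    start = urllib.request.Request(
        f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream/status",
        data=json.dumps({"action": "start"}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    return configure, start
```

You would send each request with `urllib.request.urlopen(request)` (or your HTTP client of choice) and repeat this for every meeting you want to capture, which is where the per-meeting overhead comes from.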

2. Build a Zoom bot

Pros:

  • Can get the separate audio and video streams per participant for perfect diarization / speaker labels

Cons:

  • It is very heavyweight, as you would need to spin up multiple servers to run the Zoom client for the bot
  • Running the infrastructure for a Zoom bot costs more than live streaming
  • You need to encode the raw video and audio yourself
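
As a rough illustration of the "encode it yourself" part, here is a minimal sketch that keeps one buffer of raw PCM per participant and wraps it in a WAV container using only the standard library. The `on_audio` callback is hypothetical: you would wire it to whatever per-participant raw-audio hook your bot exposes. The 32 kHz, 16-bit, mono format is an assumption, so use the sample rate and channel count your bot actually reports.

```python
import io
import wave

SAMPLE_RATE_HZ = 32000   # assumption: adjust to what your bot reports
SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
CHANNELS = 1             # mono


class ParticipantAudioBuffers:
    """Collects raw PCM chunks per participant and encodes them as WAV."""

    def __init__(self):
        self._chunks = {}

    def on_audio(self, participant_id, pcm_bytes):
        # Hypothetical callback: call this from your bot's raw-audio hook.
        self._chunks.setdefault(participant_id, bytearray()).extend(pcm_bytes)

    def seconds_buffered(self, participant_id):
        """Return how many seconds of audio are waiting for this participant."""
        frame_bytes = SAMPLE_WIDTH_BYTES * CHANNELS
        buffered = len(self._chunks.get(participant_id, b""))
        return buffered / frame_bytes / SAMPLE_RATE_HZ

    def pop_wav(self, participant_id):
        """Return the buffered audio as WAV bytes and clear the buffer."""
        pcm = bytes(self._chunks.pop(participant_id, b""))
        out = io.BytesIO()
        with wave.open(out, "wb") as wav:
            wav.setnchannels(CHANNELS)
            wav.setsampwidth(SAMPLE_WIDTH_BYTES)
            wav.setframerate(SAMPLE_RATE_HZ)
            wav.writeframes(pcm)
        return out.getvalue()
```

Because each participant has a separate buffer, every WAV chunk you hand to a transcription service already carries its speaker label, which is the diarization benefit mentioned above. Video needs a real encoder, so this only covers the audio side.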

3. Use the Recall.ai API

It’s a unified API that lets you send meeting bots to video conferencing platforms to capture the audio, video and transcription in real-time.

Pros:

  • Zoom has an official Meeting Bot Starter Kit they created with Recall.ai
  • Gets the real-time video and real-time audio data for you - you just need to call an API endpoint.
  • You can send messages in the Zoom meeting chat via their APIs as well.
  • Works on any Zoom plan (including Free)
  • Gets speaker diarization / speaker labels
  • Works regardless of meeting platform

Cons:

  • It’s another 3rd party service in your stack

Let me know if you have any questions!

Many thanks @amanda-recallai !
Your helpful answer narrowed it down to option #2 or #3 for me.

Are there any latency details for both options?

Can anyone actually help with this? What I’m trying to achieve is to build a real-time Zoom call assistant, so I need to get the raw, real-time audio and video into my Python app for processing. How do I get the raw video and audio in real time (without using Recall.ai)?

Hi @gjorgji ,

I’m working on Python bindings for the Zoom Meeting SDK here: GitHub - noah-duncan/py-zoom-meeting-sdk: Python Bindings for the Zoom Meeting SDK

The bindings are in beta and do not cover the entire SDK yet, but they will allow you to bring real-time Zoom meeting data into a Python application.

Cool. Is there an easier, more lightweight solution for getting real-time transcripts to a 3rd party service, again without using Recall.ai?

If you need real-time raw audio and video directly in your backend, Livestreaming is currently the most direct route. If you’re looking to augment or complement that, you might consider using webhooks for real-time events (like participant join/leave, screen share start/stop) and the Meeting SDKs to access media streams client-side and relay them yourself. There’s no official API that gives you raw media directly server-side at this time, so Livestream remains your best bet for that.
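
If you go the webhook route for events, your backend should verify that each request really came from Zoom before acting on it. Below is a minimal sketch of that check. It assumes the scheme as I understand it from Zoom's webhook docs (an HMAC-SHA256 over `v0:{timestamp}:{body}` keyed with your app's secret token, sent in the `x-zm-signature` header alongside `x-zm-request-timestamp`), so please confirm the header names and format there. The function names are my own, and `headers` is assumed to be a dict with lowercase keys.

```python
import hashlib
import hmac


def zoom_signature(secret_token, timestamp, raw_body):
    """Compute the signature expected for a webhook request body."""
    message = f"v0:{timestamp}:{raw_body}".encode("utf-8")
    digest = hmac.new(secret_token.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return f"v0={digest}"


def is_valid_zoom_webhook(secret_token, headers, raw_body):
    """Return True if the signature header matches the body we received."""
    timestamp = headers.get("x-zm-request-timestamp", "")
    received = headers.get("x-zm-signature", "")
    expected = zoom_signature(secret_token, timestamp, raw_body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Use the raw request body exactly as received rather than re-serialized JSON, otherwise the signature will not match. Note that webhooks only carry events such as join/leave, not media, so this complements the livestream rather than replacing it.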