austinmw
(Austin)
August 23, 2022, 3:29am
Hi, I’d like to build a patient monitoring interface on top of Zoom by performing object detection of people in real time. To do this I need access to the raw real-time video data, and I found this discouraging post from 2020: “Access to streaming data for object detection”
Is it still true that this is unavailable or is there a way to do this nowadays?
Thanks
Hi @austinmw
Thanks for reaching out to the Zoom Developer Forum, I am happy to help here!
Unfortunately, this is still not available.
Cheers,
Elisa
@austinmw, this is definitely possible. There are three common ways to access the real-time raw video data from Zoom.
1. Use the Zoom live-streaming API
Pros:
Doesn’t require any 3rd party services
Lighter weight than building and running a Zoom bot
Cons:
Needs to be initiated on a per-meeting basis
You need to set up an RTMP server to receive the data, which requires engineering effort to deploy, scale, and monitor
Participants can get spooked by the “live” badge that appears in the meeting, depending on the use case
Can’t get separate video stream per participant
No speaker diarization
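To make the "initiated on a per-meeting basis" point concrete, here is a minimal sketch of the two REST calls option 1 would involve: one to point the meeting at your RTMP server and one to start the stream. The endpoint paths and field names (`/meetings/{meetingId}/livestream`, `stream_url`, `stream_key`, `page_url`, `action`) are written from my recollection of the Zoom REST API and should be checked against the current docs. The function name, the RTMP URL, and the token are placeholders I made up for illustration. The sketch only builds the requests and does not send anything.

```python
import json
import urllib.request

# Assumption: base URL of the Zoom REST API (v2).
ZOOM_API_BASE = "https://api.zoom.us/v2"


def build_livestream_requests(meeting_id, access_token, stream_url, stream_key, page_url):
    """Build (but do not send) the two PATCH requests for a custom live stream.

    Hypothetical helper: the endpoint paths and JSON fields are assumptions
    based on the Zoom REST API as I remember it; verify before relying on them.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }

    # Step 1: tell the meeting where to push the RTMP stream.
    configure = urllib.request.Request(
        url=f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream",
        data=json.dumps(
            {"stream_url": stream_url, "stream_key": stream_key, "page_url": page_url}
        ).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )

    # Step 2: start streaming. This has to be repeated for every meeting.
    start = urllib.request.Request(
        url=f"{ZOOM_API_BASE}/meetings/{meeting_id}/livestream/status",
        data=json.dumps({"action": "start"}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    return configure, start


# Placeholder values for illustration only.
configure_req, start_req = build_livestream_requests(
    meeting_id="123456789",
    access_token="YOUR_OAUTH_TOKEN",
    stream_url="rtmp://rtmp.example.com/live",
    stream_key="patient-room-1",
    page_url="https://example.com/monitor",
)
```

Each request could then be sent with `urllib.request.urlopen(...)`. The RTMP server on the receiving end is still yours to run, and the stream it receives is the single composited meeting view, not one stream per participant.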
2. Build a Zoom bot
Pros:
Can get the separate audio streams per participant for perfect diarization / speaker labels
Can get separate video streams per participant
Doesn’t spook participants
Cons:
It is extremely heavy-weight as you would need to spin up multiple servers to run the Zoom client for the bot
Running the infrastructure for a Zoom bot costs more than live streaming
You need to encode the raw video and audio yourself
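As a rough illustration of what "encode it yourself" can look like for option 2, the sketch below prepares an `ffmpeg` command that reads raw frames on stdin and writes an H.264 file. It assumes the bot hands you planar YUV 4:2:0 (I420) frames of a known size; the thread does not state the pixel format, so treat that as an assumption to confirm against the SDK you use. The helper names and the output filename are hypothetical.

```python
def i420_frame_size(width, height):
    """Bytes in one I420 frame: a full-resolution Y plane plus
    U and V planes subsampled by 2 in each dimension."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


def build_ffmpeg_command(width, height, fps, output):
    """Argument list for an ffmpeg process that encodes raw I420 frames
    read from stdin. Assumes ffmpeg with libx264 is installed."""
    return [
        "ffmpeg",
        "-f", "rawvideo",          # input has no container
        "-pix_fmt", "yuv420p",     # assumed raw frame layout (I420)
        "-s", f"{width}x{height}", # raw input carries no size metadata
        "-r", str(fps),            # nor a frame rate
        "-i", "-",                 # read frames from stdin
        "-c:v", "libx264",
        "-preset", "veryfast",
        output,
    ]


# Hypothetical per-participant stream at 640x360, 30 fps.
frame_bytes = i420_frame_size(640, 360)
command = build_ffmpeg_command(640, 360, 30, "participant.mp4")
```

In practice you would launch the command with `subprocess.Popen(command, stdin=subprocess.PIPE)` and write exactly `frame_bytes` bytes to its stdin for every frame the bot receives, with one such process per participant stream. Audio would need a similar pipeline of its own.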
3. Use Recall.ai
It’s a unified API that lets you send meeting bots to video conferencing platforms to capture the audio and video in real-time.
Pros:
Handles spinning up the servers and delivering the real-time raw video/audio, so all you interact with is a simple API
Can get separate video streams per participant
Can get the separate audio streams per participant for perfect diarization / speaker labels
Works regardless of meeting platform
Cons:
It’s another 3rd party service in your stack