Guidance Needed on Handling Media Streams for a Zoom Bot to Enable Interactive Communication

I am developing a Zoom bot that needs to both receive and send real-time media streams within Zoom meetings, so that it can listen and respond audibly during calls. The bot is backed by a service exposed over a WebSocket, which handles these media streams in near real time.
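To make the intended data flow concrete, here is a minimal Python sketch of the receive half of such a bridge. It is not Zoom SDK code. The audio format (16 kHz, 16-bit mono PCM), the 20 ms frame size, and the names `FrameChunker` and `pump` are all assumptions made for illustration, and two `asyncio.Queue` objects stand in for the SDK's raw-audio callback and the WebSocket connection.

```python
import asyncio

# Assumed audio format; use whatever the SDK actually delivers.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000


class FrameChunker:
    """Re-slices arbitrarily sized PCM buffers into fixed-size frames."""

    def __init__(self, frame_bytes: int = FRAME_BYTES) -> None:
        self.frame_bytes = frame_bytes
        self._pending = bytearray()

    def push(self, pcm: bytes) -> list[bytes]:
        """Add raw audio and return every complete frame now available."""
        self._pending.extend(pcm)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            frames.append(bytes(self._pending[: self.frame_bytes]))
            del self._pending[: self.frame_bytes]
        return frames


async def pump(incoming: asyncio.Queue, outgoing: asyncio.Queue,
               chunker: FrameChunker) -> None:
    """Forward meeting audio to the backend until a None sentinel arrives.

    `incoming` is a hypothetical stand-in for the SDK audio callback and
    `outgoing` for the WebSocket send side.
    """
    while True:
        pcm = await incoming.get()
        if pcm is None:
            await outgoing.put(None)  # propagate end-of-stream
            return
        for frame in chunker.push(pcm):
            await outgoing.put(frame)
```

The send half would mirror this: frames coming back from the backend are queued and handed to whatever audio-injection mechanism the SDK provides, which is the part I need guidance on.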

I am looking for detailed guidance or resources on how to access incoming media streams and send media back, so that the bot can participate in meetings as a speaker and not only as a listener. Please point me to any resources or documentation that specifically address this capability. I have already read the general guide on Zoom’s Meeting Bots SDK Media Streams page and need more specific information to implement this.

@devjianomads, the easiest way to build a Zoom bot that receives real-time media streams is to use the Recall.ai Meeting Bot API.

Recall.ai is an API that lets you use meeting bots to get raw audio/video from meetings without spending months building, scaling, and maintaining the bots yourself.

Zoom made an official meeting bot starter kit with them if you want to check that out:

@devjianomads If you want to accomplish this with just the Zoom Meeting SDK, we have resources on building your own bot from the ground up.

Outside of the documentation that you mentioned, we do have a sample application that you can use here:

As well as blog posts on meeting bots that you can find here:

Finally, here is a short presentation I gave at our Developer Summit on using real-time video streams with OpenCV:

Let me know if that helps!