Hey @filip.baranczuk,
It sounds like you’re currently trying to use the Zoom Apps SDK. Is that right? If the end goal is to receive and process the raw audio and video streams, you’re correct that you’ll need to connect a bot to the meeting and process the media on your backend. That means you’ll want a different approach than the Zoom Apps SDK.
There are a few options here:
1. Meeting SDK
You can use the Windows or Linux Meeting SDK to access raw audio and video data. We recommend the Linux SDK, since this is currently what we use to power our own meeting bots at scale.
2. RTMP streaming
Another option is to use RTMP streaming. This post is a great starting point for learning more about RTMP streaming and example use cases.
3. Recall.ai
A third option is to use Recall.ai for your meeting bots. It’s a simple third-party API that lets you use meeting bots to get raw audio/video from meetings, without spending months building, scaling, and maintaining the bots yourself.
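To make option 2 (RTMP streaming) a bit more concrete, here is a minimal Python sketch of how a backend might pull audio from an RTMP stream and read it as raw PCM. It assumes you run your own RTMP server that the meeting's live stream is pushed to, and that `ffmpeg` is installed with RTMP support. The URL, function names, and chunk size are hypothetical and only for illustration. As far as I know, an RTMP stream carries a single mixed feed rather than per-participant streams, so treat this as a starting point rather than a full solution.

```python
import subprocess
from typing import Iterator, List


def pcm_chunk_size(sample_rate: int, chunk_ms: int) -> int:
    """Bytes in one chunk of 16-bit mono PCM lasting chunk_ms milliseconds."""
    return sample_rate * 2 * chunk_ms // 1000


def build_ffmpeg_cmd(rtmp_url: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes the stream's audio to raw PCM on stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-i", rtmp_url,           # hypothetical URL of your own RTMP server
        "-vn",                    # drop the video track
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the requested rate
        "-f", "s16le",            # signed 16-bit little-endian raw PCM
        "pipe:1",                 # write to stdout
    ]


def read_pcm_chunks(
    rtmp_url: str, chunk_ms: int = 100, sample_rate: int = 16000
) -> Iterator[bytes]:
    """Yield PCM chunks from the stream until ffmpeg exits."""
    size = pcm_chunk_size(sample_rate, chunk_ms)
    proc = subprocess.Popen(
        build_ffmpeg_cmd(rtmp_url, sample_rate), stdout=subprocess.PIPE
    )
    try:
        while True:
            chunk = proc.stdout.read(size)
            if not chunk:
                break
            yield chunk
    finally:
        proc.kill()
        proc.wait()
```

You would then loop over something like `read_pcm_chunks("rtmp://localhost/live/zoom")` and hand each chunk to whatever processing you need (transcription, analysis, etc.).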
Let me know if you have any questions!