I’m trying to get the raw audio stream data from a meeting so that I can run transcription and translation in real time (like otter.ai does). The SDK I’m using is the Web Meeting SDK v3.8.0. Currently it can only join the meeting as a bot with a given invitation link.
From the Meeting SDK guides, it seems that only the Windows, Linux, and macOS SDKs can access the raw audio data in real time.
So, as I understand it, I first need to build a server with one of the desktop SDKs to access the raw data, then send that data over a WebSocket (or some other way) to my app (in this case, my Web Meeting SDK project).
Is this correct, or is there an alternative way to do it?
Thanks for your time. I’m looking forward to your reply.
To access the raw audio stream data, you need to build a meeting bot.
To build a meeting bot, you can use the Zoom Linux SDK, Zoom Windows SDK, or Zoom Mac SDK. The Linux SDK is recommended for the bot use case.
After you pick an SDK type to use, do the following steps:
Spin up a server. We recommend AWS, GCP, or DigitalOcean.
Use the Zoom SDK to launch an instance of the Zoom client.
Once you have the Zoom SDK launched, use the Raw Data functionality to extract the video and audio streams.
This returns the video as raw I420 frames and the audio as raw PCM 16LE (16-bit little-endian), so you’ll need to encode the audio and video yourself afterwards.
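As a rough illustration of what handling that raw output can look like, here is a minimal Python sketch. It is not Zoom SDK code: the function names are made up for this example, and the 32 kHz mono audio format is only an assumption, so check the actual sample rate and channel count reported by the SDK's raw data callbacks. It wraps a PCM 16LE buffer in a WAV container and splits one I420 frame into its Y, U, and V planes.

```python
import io
import wave

# Assumed values for illustration only; confirm them against what the SDK reports.
SAMPLE_RATE_HZ = 32000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # PCM 16LE means 2 bytes per sample


def pcm16le_to_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE_HZ,
                   channels: int = CHANNELS) -> bytes:
    """Wrap raw PCM 16LE samples in a WAV container and return the file bytes."""
    if len(pcm) % (SAMPLE_WIDTH_BYTES * channels) != 0:
        raise ValueError("PCM buffer does not contain a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def split_i420(frame: bytes, width: int, height: int):
    """Split one I420 frame into (Y, U, V) planes; assumes even width and height."""
    y_size = width * height
    # Each chroma plane is subsampled by 2 in both directions.
    c_size = (width // 2) * (height // 2)
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("unexpected I420 frame size")
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:]
    return y, u, v
```

A WAV file is still uncompressed, but most transcription tools accept it directly. For compressed audio or video (for example AAC or H.264) you would hand these buffers to an encoder such as FFmpeg instead.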
Once you have one instance working, you’ll need to scale it across several servers to run multiple bots simultaneously, which is what it takes to have bots in multiple meetings at the same time.
Another option is Recall.ai. It’s a simple third-party API that lets you use meeting bots to get raw audio/video from meetings without spending months building, scaling, and maintaining these bots yourself.
It also works across Google Meet, Microsoft Teams, and other meeting platforms.