Making a live translation app

I am planning to build a live translation app for Zoom meetings. To do this, I believe I need access to the live audio stream of each speaker in the meeting, so that I can convert the audio to text and translate it. As far as I know, this can be done with a bot built on either the Windows Meeting SDK or the Linux Meeting SDK. Is there a difference between the two? If so, which one is recommended for my translation use case?

@woojin7879, the difference is that the Linux Meeting SDK is designed to run headless, which makes it much easier to run a meeting bot on.

The Windows SDK existed way before the Linux SDK, so that’s why you might see posts recommending the Windows SDK. The most recent recommendation is to use the Linux SDK.
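
Whichever SDK you pick, the translation side of the pipeline described in the question looks roughly the same. Here is a minimal Python sketch of it, not a definitive implementation. It assumes your bot hands you raw PCM chunks tagged with a speaker ID (16 kHz, 16-bit mono is an assumption here, check what your bot actually delivers). `SpeakerTranslator`, `on_audio`, `transcribe` and `translate` are hypothetical names for illustration, not part of any Zoom SDK; you would plug in whatever speech-to-text and translation services you choose.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

SAMPLE_RATE = 16_000   # assumed: 16 kHz mono audio
BYTES_PER_SAMPLE = 2   # assumed: 16-bit PCM


class SpeakerTranslator:
    """Buffers audio per speaker and emits translated text per fixed window."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],  # hypothetical speech-to-text hook
        translate: Callable[[str], str],     # hypothetical translation hook
        window_seconds: float = 2.0,
    ) -> None:
        self._transcribe = transcribe
        self._translate = translate
        # Window size in bytes, kept aligned to whole samples.
        self._window_bytes = int(window_seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
        self._buffers: Dict[str, bytearray] = defaultdict(bytearray)

    def on_audio(self, speaker_id: str, pcm: bytes) -> List[Tuple[str, str]]:
        """Call this for every audio chunk the bot receives for a speaker."""
        buf = self._buffers[speaker_id]
        buf.extend(pcm)
        results: List[Tuple[str, str]] = []
        # Process every complete window; leftover bytes wait for the next chunk.
        while len(buf) >= self._window_bytes:
            window = bytes(buf[: self._window_bytes])
            del buf[: self._window_bytes]
            text = self._transcribe(window)
            if text:  # skip windows where nothing was recognized
                results.append((speaker_id, self._translate(text)))
        return results
```

Keeping one buffer per speaker is what lets you attribute each translated line to the right person. Fixed-size windows are the simplest option; cutting on silence instead would usually give cleaner sentences at the cost of more code.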

If you don’t want to deal with the complexities of implementing meeting bots yourself, an alternative is to use Recall.ai for your meeting bots instead. It’s a simple third-party API that lets you use meeting bots to get raw audio/video from meetings without needing to spend months building, scaling and maintaining these bots.

We even made an easy meeting bot starter kit with the Zoom team: Introducing the Meeting bot starter kit with Recall.ai

Let me know if you have any questions!