We are planning a Zoom meeting in 25 languages.
We have an existing platform for live translation (an internal product built on top of Janus WebRTC) that allows multiple translators per language (which we cannot do in Zoom) and also integrates with other services.
We would like to programmatically replace the translator audio stream in Zoom with the audio coming from our platform.
Schema for clarity: https://i.imgur.com/qcsAd6d.png
Which API, if any, would support this setup?
What you describe is possible, but latency will be an issue.
Here’s one possible way of doing it.
- Use the Linux or Windows Meeting SDK to retrieve the raw audio (the source audio shown in your architecture diagram) in PCM format.
- Feed the raw audio stream into a translation service or to your translation team.
- Use WebSockets or another low-latency, real-time streaming protocol to stream the translated text or audio to the end users.
Because these steps add extra round trips, expect some latency in the translated audio or text that users receive in step 3.
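To make the latency trade-off in step 3 more concrete, here is a minimal Python sketch of how a raw PCM 16LE stream might be re-chunked into fixed-duration frames before being sent over a WebSocket or similar transport (the transport itself is left out). The 32 kHz mono format and 20 ms frame length are assumptions for illustration, not values confirmed for the Zoom SDK, and `frame_bytes` / `frame_pcm` are hypothetical helper names.

```python
from typing import Iterable, Iterator

BYTES_PER_SAMPLE = 2  # PCM 16LE: one signed 16-bit little-endian integer per sample


def frame_bytes(sample_rate: int, channels: int, frame_ms: int) -> int:
    """Number of bytes in one frame of `frame_ms` milliseconds of PCM 16LE audio."""
    samples_per_channel = sample_rate * frame_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE


def frame_pcm(stream: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Re-chunk arbitrarily sized PCM buffers into fixed-size frames.

    A frame must be completely filled before it is sent, so each frame adds up
    to its own duration of buffering delay on top of network and translation
    latency. Any trailing partial frame is flushed when the stream ends.
    """
    buf = bytearray()
    for chunk in stream:
        buf.extend(chunk)
        while len(buf) >= size:
            yield bytes(buf[:size])
            del buf[:size]
    if buf:
        yield bytes(buf)


if __name__ == "__main__":
    # Assumed format: 32 kHz mono, 20 ms frames (illustrative values only).
    size = frame_bytes(sample_rate=32000, channels=1, frame_ms=20)
    print(size)  # 1280
    # Simulate audio callbacks delivering uneven buffer sizes.
    incoming = [b"\x00" * 1000, b"\x00" * 2000, b"\x00" * 500]
    frames = list(frame_pcm(incoming, size))
    print([len(f) for f in frames])  # [1280, 1280, 940]
```

Shorter frames reduce buffering delay but mean more messages per second; around 20 ms is a common choice for real-time voice, though the right value depends on what your translation service and transport expect.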
@yasha, to expand on the first step in Chun Siong’s response: the most common way to access real-time meeting audio with low latency is to build a meeting bot.
The meeting bot runs on the Linux or Windows Meeting SDK, as Chun Siong mentioned. Here are the steps to build it:
- Spin up a server. We recommend AWS, GCP, or DigitalOcean.
- Use either the Windows or Linux Zoom SDK to launch an instance of the Zoom client.
- Once the Zoom SDK is running, use the Raw Data functionality to extract the video and audio streams.
- This returns video as raw I420 frames and audio as raw PCM 16LE, so you’ll need to encode the audio and video yourself afterwards.
- Once one instance is working, you’ll need to scale it across several servers to run multiple bots simultaneously, which is required if you want bots in multiple meetings.
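The raw-data step returns unencoded buffers. As a minimal sketch of the simplest packaging you might do once those buffers are out of the SDK, the Python below wraps PCM 16LE audio in a WAV container with the standard `wave` module and computes and splits the size of an I420 frame. This is not the Zoom SDK API: the helper names are hypothetical, the 32 kHz mono format in the example is an assumption, and `split_i420` assumes the three planes are tightly packed in one buffer with no stride padding. WAV is only a container, so for real delivery you would likely pass these buffers to an actual codec instead.

```python
import io
import wave


def pcm16le_to_wav(pcm: bytes, sample_rate: int, channels: int) -> bytes:
    """Wrap raw PCM 16LE samples in an uncompressed WAV container."""
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length is not a whole number of 16-bit frames")
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples; WAV stores them little-endian, matching PCM 16LE
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return out.getvalue()


def i420_frame_size(width: int, height: int) -> int:
    """Bytes in one I420 frame: a full-resolution Y plane plus quarter-resolution U and V planes."""
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return width * height + 2 * chroma


def split_i420(frame: bytes, width: int, height: int) -> tuple[bytes, bytes, bytes]:
    """Split one tightly packed I420 frame into its Y, U and V planes (assumes no stride padding)."""
    if len(frame) != i420_frame_size(width, height):
        raise ValueError("unexpected I420 frame length")
    y_size = width * height
    c_size = (len(frame) - y_size) // 2
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:]
    return y, u, v


if __name__ == "__main__":
    # Assumed: one second of 32 kHz mono silence (illustrative format only).
    pcm = b"\x00\x00" * 32000
    wav_bytes = pcm16le_to_wav(pcm, sample_rate=32000, channels=1)
    print(len(wav_bytes))  # 64044 (44-byte header + 64000 bytes of samples)
    print(i420_frame_size(640, 360))  # 345600
```

Because each bot produces these buffers continuously, the per-frame sizes above are also a rough guide to the bandwidth and memory each bot server needs when you scale to many meetings.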
Finally, another option is Recall.ai, a third-party API that lets you use meeting bots to get raw audio and video from meetings without spending months building, scaling, and maintaining these bots yourself.
Let me know if you have any questions!