Live stream audio data format

Hi there,

I’ve been working on getting the live stream data from a Zoom meeting, and sending it over RTMP to my media server works quite well.

But as I’m rather new to the whole streaming thing, I’m a bit confused about exactly what data is sent and in which format. I’m only interested in the audio, so the video data is simply dropped. I’d like to know the following:

  1. What is the audio codec? (I believe it’s AAC but just want to make sure)
  2. What is the sample rate? (44.1 kHz?)
  3. How many channels are sent? (I’m guessing only one, with all participants’ audio mixed into it? If so, is there an easy way to figure out who is talking at what time, e.g. with the active speaker API?)
  4. Is there some kind of inherent latency? I notice an offset of a few seconds, but I’m not sure if it’s my setup.

Thanks a lot for the great developer documentation!

Hi @son, thanks for that compliment :smiley:

I’m not as familiar with our output audio data as I am with the API, so let me know if I can provide any better details.

From what I gather, all this data is available in the metadata of the RTMP stream.
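If you want to check this on your own media server, one option is to look at the first byte of each RTMP audio message, which follows the FLV audio tag layout (SoundFormat, SoundRate, SoundSize, SoundType). For AAC, the sequence header that follows carries an AudioSpecificConfig with the real sample rate and channel count. Below is a minimal sketch, assuming you already have the raw audio message payloads from your server; `parse_audio_message` and `AudioInfo` are hypothetical names, and it only covers the legacy FLV layout, so a codec such as Opus sent via the newer "Enhanced RTMP" extension would show up as an unknown format here.

```python
"""Sketch: decode the audio header of a legacy FLV/RTMP audio message.

Assumption: payloads use the classic FLV audio tag layout. Enhanced RTMP
(FourCC-based audio such as Opus) is NOT handled and reports as unknown.
"""

from dataclasses import dataclass

# Subset of FLV SoundFormat values.
SOUND_FORMATS = {
    0: "Linear PCM, platform endian",
    2: "MP3",
    3: "Linear PCM, little endian",
    10: "AAC",
    11: "Speex",
}

# Nominal FLV SoundRate values in Hz. For AAC the real rate is taken
# from the AudioSpecificConfig in the sequence header instead.
SOUND_RATES = {0: 5512, 1: 11025, 2: 22050, 3: 44100}

# AAC samplingFrequencyIndex table (ISO/IEC 14496-3).
AAC_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                    22050, 16000, 12000, 11025, 8000, 7350]


@dataclass
class AudioInfo:
    codec: str
    sample_rate: int
    channels: int
    is_sequence_header: bool = False


def parse_audio_message(payload: bytes) -> AudioInfo:
    """Return codec, sample rate and channel count for one audio message."""
    first = payload[0]
    sound_format = first >> 4          # upper 4 bits
    sound_rate = (first >> 2) & 0x03   # 2 bits
    sound_type = first & 0x01          # 0 = mono, 1 = stereo
    codec = SOUND_FORMATS.get(sound_format, f"unknown ({sound_format})")
    info = AudioInfo(codec, SOUND_RATES[sound_rate], 2 if sound_type else 1)

    # AACPacketType 0 means an AudioSpecificConfig follows.
    if sound_format == 10 and len(payload) >= 4 and payload[1] == 0:
        asc = payload[2:]
        # 5 bits audioObjectType, then 4 bits samplingFrequencyIndex,
        # then 4 bits channelConfiguration.
        freq_index = ((asc[0] & 0x07) << 1) | (asc[1] >> 7)
        channel_config = (asc[1] >> 3) & 0x0F
        if freq_index < len(AAC_SAMPLE_RATES):
            info.sample_rate = AAC_SAMPLE_RATES[freq_index]
        if channel_config != 0:  # 0 means "defined elsewhere"; keep nominal
            info.channels = channel_config
        info.is_sequence_header = True
    return info


if __name__ == "__main__":
    # AAC-LC sequence header: 48 kHz, mono.
    print(parse_audio_message(b"\xaf\x00\x11\x88"))
```

Only the AAC sequence header gives trustworthy numbers; for ordinary AAC frames the FLV header byte is conventionally set to 44.1 kHz stereo regardless of the actual stream, so it is worth logging the sequence header when the stream starts.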

We only send one combined stream, not individual streams per participant.

Opus is the codec. I’m trying to figure out the maximum kbps I can get so I can optimize my microphone equipment to match.

Hey @txtrumpet,

You can see info about the kbps here:

For additional questions unrelated to the Zoom App Marketplace please reach out to, they will be able to better assist. :slight_smile: