Zoom Linux SDK V5.16.10 - Mapping Raw Streams

Zoom Meeting SDK for Linux V5.16.10

I am accessing raw audio and video streams using StartRawRecording() and receiving the raw data frames as well. I have a couple of questions regarding the mapping of Audio and Video data.

  1. Does Zoom provide a complete recording of all the users in the meeting? (Using Local Recording?)
  2. If not, can we map the raw audio and video streams that we are getting from the SDK?

@striver.strikes could you elaborate what you meant by “Complete recording”?

By complete recording, I mean audio and video recordings of all the participants in the meeting.
Just like in raw recording, where we get the video and audio streams for all the participants, I want to access the recordings of all participants after the meeting has ended, without managing the individual user streams.

@striver.strikes ,

For Audio, you can choose to either get the

  • mixed channel (everyone speaking), or
  • individual channel (each user’s audio).

For Video, you can only get

  • individual channel (each user’s video)

In summary: for audio you can just get the mixed channel, but for video you will need to get every user’s video one by one.

For Individual Audio

  • There is a userID associated with each individual channel

For Individual Video

  • You will need to subscribe to the user’s video by userID

I am getting individual streams for both audio and video for each user. The issue I face is mapping audio and video on my end, since the timestamps for the two renderers might vary. For example, my camera might be ON while the mic is OFF, so I’m getting the video stream but not the audio stream, and vice versa.
Preparing a final video for each user becomes difficult when this happens.
So my question to you is:

  1. How can we map audio stream + video stream for each user such that they fit in perfectly (A final MP4 should be generated at the end where audio sits over video and if the screen is OFF we get a black screen)
  2. Again, does Zoom provide a recording of all participants post-meeting, is there a way to do so?

@striver.strikes

  1. How can we map audio stream + video stream for each user such that they fit in perfectly (A final MP4 should be generated at the end where audio sits over video and if the screen is OFF we get a black screen)

I would propose using a library such as FFmpeg, which can be told to fill the gaps with black frames where no raw video was received. The same goes for the audio track in the same MP4 file that FFmpeg will output: where no raw audio was received, there will be silence.

  2. Again, does Zoom provide a recording of all participants post-meeting, is there a way to do so?

Without using raw streams, you can use the Video SDK Cloud Recording feature.

@striver.strikes, we use the Linux SDK extensively to power our meeting bots, and this is something we’ve encountered before.

It is pretty complicated to stitch together a composite video from the separate video and audio streams provided by the SDK. @chunsiong.zoom is correct that you can use a library such as FFmpeg to produce a composite video. The high-level approach for video would be to:

  1. Keep track of when each participant’s stream starts and stops.
  2. Use a library such as FFmpeg to pad each individual stream to the length of the full meeting.
  3. Use a library such as FFmpeg to composite the individual streams into a combined view showing everyone’s video, and the screen share if enabled.
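The padding step above can be sketched by building an FFmpeg invocation that overlays a user’s raw I420 capture, shifted to its real start time, on a black canvas spanning the whole meeting. This is only a sketch: it assumes `ffmpeg` is on PATH, and the file names, geometry, frame rate, and offsets are hypothetical.

```python
def pad_video_cmd(raw_yuv, width, height, fps, offset_s, meeting_s, out_mp4):
    """Build an ffmpeg command that places a raw I420 capture on a black
    canvas lasting the whole meeting, starting at its real offset."""
    return [
        "ffmpeg",
        # black canvas matching the stream geometry, for the full meeting
        "-f", "lavfi",
        "-i", f"color=c=black:s={width}x{height}:r={fps}:d={meeting_s}",
        # raw I420 frames have no container, so describe them explicitly
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", raw_yuv,
        # shift the user's stream by its start offset, then overlay it;
        # eof_action=pass keeps the black canvas after the stream ends
        "-filter_complex",
        f"[1:v]setpts=PTS-STARTPTS+{offset_s}/TB[v];"
        "[0:v][v]overlay=eof_action=pass[out]",
        "-map", "[out]",
        out_mp4,
    ]

# hypothetical example: Alice's camera came on 12.5 s into a 120 s meeting
cmd = pad_video_cmd("alice.yuv", 640, 360, 25, 12.5, 120, "alice_padded.mp4")
```

Running one such command per participant produces equal-length clips that can then be composited with an `xstack` or `hstack`/`vstack` filter graph.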

For audio, the approach would be similar:

  1. Keep track of when each person’s audio stream started or stopped.
  2. Pad the audio streams to the same length as the full meeting, which aligns them with the video.
  3. Either mix the separate audio streams into a single track to put into an MP4, or, if you’re looking to analyze the audio (e.g. transcription), analyze each speaker separately for increased accuracy.
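The audio steps can be sketched the same way with FFmpeg’s `adelay` and `amix` filters (a sketch only; file names and offsets are hypothetical, and the `s16le`/mono/32 kHz input options match the SDK’s raw audio format as noted later in this thread):

```python
def mix_audio_cmd(pcm_files, offsets_ms, out_file):
    """Build an ffmpeg command that shifts each user's raw PCM track to its
    real start time (adelay) and mixes the results into one track (amix)."""
    cmd = ["ffmpeg"]
    for f in pcm_files:
        # raw 16-bit little-endian PCM, mono, 32 kHz
        cmd += ["-f", "s16le", "-ar", "32000", "-ac", "1", "-i", f]
    delays = ";".join(f"[{i}:a]adelay={ms}[a{i}]"
                      for i, ms in enumerate(offsets_ms))
    inputs = "".join(f"[a{i}]" for i in range(len(pcm_files)))
    graph = f"{delays};{inputs}amix=inputs={len(pcm_files)}[out]"
    cmd += ["-filter_complex", graph, "-map", "[out]", out_file]
    return cmd

# hypothetical example: Bob joined audio 4.5 s after Alice
cmd = mix_audio_cmd(["alice.pcm", "bob.pcm"], [0, 4500], "meeting_audio.mp3")
```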

If you want to get a recording of all participants without all this processing, you have two options:

Option 1

Ask your users to record to the cloud on Zoom, then pull the recording from their Zoom cloud storage using the Zoom cloud API.

Option 2

Use Recall.ai. It’s an API for meeting bots that gets the raw audio/video from meetings and outputs video/audio, without you needing to spend months building, scaling, and maintaining these bots.

Let me know if you have any questions!

I would propose using a library such as FFmpeg, which can be told to fill the gaps with black frames where no raw video was received. The same goes for the audio track in the same MP4 file that FFmpeg will output: where no raw audio was received, there will be silence.

The FFmpeg library requires some information for processing the data (raw audio and raw video), like bit rate, frame rate, etc.

Do you know how we can convert the char* data that we receive in the OnDataReceived fn to MP3 and MP4 formats using the FFmpeg library?

@nandakishor2010608 the format is 16-bit PCM, mono, with a 32000 Hz sample rate.

The use of ffmpeg is beyond the support scope of this forum.
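For what it’s worth, given the 16-bit/mono/32 kHz format stated above, the byte-rate math and a minimal container conversion can be sketched like this (the file names are hypothetical, and it assumes the raw frames were appended to one file in arrival order):

```python
# 2 bytes per sample (16-bit) * 32000 samples per second, mono
BYTES_PER_SECOND = 2 * 32000

def pcm_duration_s(num_bytes):
    """Duration of a raw PCM capture; handy for aligning audio with video."""
    return num_bytes / BYTES_PER_SECOND

# wrapping the raw dump into an MP3: ffmpeg only needs the sample format,
# rate, and channel count, since raw PCM carries no header
to_mp3 = ["ffmpeg", "-f", "s16le", "-ar", "32000", "-ac", "1",
          "-i", "user.pcm", "user.mp3"]
```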

What is the format of the raw video data, and what is the difference between YUVRawDataI420->GetBuffer() and YUVRawDataI420->GetYBuffer(), GetUBuffer(), and GetVBuffer()?

Is the buffer the same as ybuffer + ubuffer + vbuffer?

Yes @nandakishor2010608 , you’re absolutely correct.

GetBuffer() just returns the Y, U, and V buffers concatenated together.

GetYBuffer(), GetUBuffer(), and GetVBuffer() provide the corresponding planes individually.
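As a quick illustration of that layout (I420 uses 4:2:0 chroma subsampling, so the U and V planes are each a quarter the size of the Y plane; the 640x360 geometry below is just an example):

```python
def i420_plane_sizes(width, height):
    """Byte sizes of the Y, U, and V planes of one I420 frame."""
    y = width * height
    # chroma is subsampled 2x in each dimension
    u = v = (width // 2) * (height // 2)
    return y, u, v

y, u, v = i420_plane_sizes(640, 360)
# GetBuffer() is y + u + v bytes long: the Y plane at offset 0,
# the U plane at offset y, and the V plane at offset y + u.
```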

Let me know if there’s anything else I can do to help!