How to subscribe to multiple users' audio/video in the headless SDK, in sync

In the above app, how could I subscribe to multiple people's audio and video and get a final output like the recording I get in the normal Zoom app?

Hey @swatantra12singh ,

We use the Linux SDK to run meeting bots ourselves and have encountered the same question - happy to provide some guidance here!

Option 1: Subscribing to participant video/audio streams + mixing the video

Subscribing to participant video/audio streams

To subscribe to each participant’s video, you can follow these steps:

  1. Implement an instance of the IZoomVideoSDKRawDataPipeDelegate.
  2. Use callback functions provided by the IZoomVideoSDKRawDataPipeDelegate to receive each frame of the raw video data.
  3. Pass the delegate into the video pipe of a specific user.

Zoom provides an example of this in the Linux SDK guide for receiving raw video (here):

// Implement the delegate that receives the raw video data
class ZoomVideoSDKRawDataPipeDelegate : public IZoomVideoSDKRawDataPipeDelegate
{
public:
    // Called for each frame of raw YUV420 video data
    virtual void onRawDataFrameReceived(YUVRawDataI420* data)
    {
        // handle the frame here
    }

    // Called when the raw data subscription status changes
    virtual void onRawDataStatusChanged(RawDataStatus status)
    {
        // handle status changes here
    }
};

// Pass the delegate into the video pipe of a specific user
ZoomVideoSDKRawDataPipeDelegate* dataDelegate = new ZoomVideoSDKRawDataPipeDelegate();
IZoomVideoSDKRawDataPipe* pipe = user->GetVideoPipe();
pipe->subscribe(ZoomVideoSDKResolution_360P, dataDelegate);

For raw audio, you can get either mixed audio or participant-separated audio.

In both cases, you will need to:

  1. Access the IZoomVideoSDKAudioHelper via getAudioHelper().
  2. Call startAudio() on the IZoomVideoSDKAudioHelper.
  3. Call subscribe() on the IZoomVideoSDKAudioHelper to subscribe to raw audio.
  4. Listen for the raw audio callbacks (e.g. onMixedAudioRawDataReceived and onOneWayAudioRawDataReceived) in your listener.
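The call order above, in pseudocode using the Linux SDK's C++ names (verify these against the headers of your SDK version):

```
// 1. Get the audio helper from the SDK instance
audioHelper = videoSDK->getAudioHelper()
// 2. Start audio so raw data can flow
audioHelper->startAudio()
// 3. Subscribe to raw audio
audioHelper->subscribe()
// 4. Receive raw audio in your IZoomVideoSDKDelegate listener:
//    onMixedAudioRawDataReceived(data)        - mixed audio of all participants
//    onOneWayAudioRawDataReceived(data, user) - per-participant audio
```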

In the demo app you linked, the StartRawRecording() method is a good example of this and should provide some solid guidance on what the implementation looks like.

I would also recommend viewing the subscribe() method here for an example of how to subscribe to the video/audio streams.

Mixing the video

Once you have the individual video streams and want to produce a single recording from them, you need to mix the video.

The specific implementation will vary depending on what you want the output to look like, but in general you would likely want to leverage something like GStreamer or FFmpeg for this.

If you have specific video layouts in mind, then you might consider using GStreamer, similar to this example.
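As a rough sketch of the layout side: the helper below computes tile geometry for a simple grid of participants on a fixed canvas. The function name and the grid policy are illustrative assumptions, but the resulting x/y/width/height values are the kind of thing you would feed to GStreamer's compositor pad properties (xpos/ypos/width/height) or FFmpeg's xstack layout.

```cpp
#include <vector>

struct Tile { int x, y, w, h; };

// Compute tile geometry for N participants laid out on a grid over a
// fixed canvas. Uses the smallest square grid that fits everyone.
std::vector<Tile> gridLayout(int participants, int canvasW, int canvasH)
{
    int cols = 1;
    while (cols * cols < participants) ++cols;       // e.g. 3 users -> 2x2 grid
    int rows = (participants + cols - 1) / cols;     // rows actually needed
    int w = canvasW / cols, h = canvasH / rows;

    std::vector<Tile> tiles;
    for (int i = 0; i < participants; ++i) {
        // Fill left-to-right, top-to-bottom
        tiles.push_back({(i % cols) * w, (i / cols) * h, w, h});
    }
    return tiles;
}
```

Each tile then becomes one input branch of the compositor; how you scale each participant's stream into its tile is up to your pipeline.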

If you want a “speaker view” that shows the current speaker’s video, then you might consider using FFmpeg to cut the per-user recordings and mix them into a composite recording using active speaker events.
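To make the speaker-view idea concrete, here is a small sketch that turns a hypothetical list of active-speaker events into an FFmpeg concat-demuxer playlist, cutting each user's recording to the window in which they were the active speaker. The event struct and the per-user file naming are assumptions for illustration, not part of the SDK.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// A hypothetical active-speaker event: which user became the active
// speaker, and at what offset (seconds into the meeting).
struct SpeakerEvent {
    std::string userId;
    double startSec;
};

// Build an FFmpeg concat-demuxer playlist. Each entry plays the slice of
// that user's recording ("<userId>.mp4") during which they were speaking.
std::string buildConcatPlaylist(const std::vector<SpeakerEvent>& events,
                                double meetingEndSec)
{
    std::string out;
    for (size_t i = 0; i < events.size(); ++i) {
        // A speaker's segment ends when the next speaker starts,
        // or at the end of the meeting for the last event.
        double end = (i + 1 < events.size()) ? events[i + 1].startSec
                                             : meetingEndSec;
        char line[256];
        // inpoint/outpoint tell the concat demuxer which slice to play.
        std::snprintf(line, sizeof(line),
                      "file '%s.mp4'\ninpoint %.2f\noutpoint %.2f\n",
                      events[i].userId.c_str(), events[i].startSec, end);
        out += line;
    }
    return out;
}
```

You would then render the playlist with something like `ffmpeg -f concat -safe 0 -i playlist.txt ...` (re-encoding rather than stream-copying, since inpoint/outpoint cuts are only keyframe-accurate with `-c copy`).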

Option 2: Using a third-party API

If you don’t want to deal with these complexities, an alternative is to use a third-party meeting bot API instead. These services let you use meeting bots to get raw audio/video and generate composite video recordings from meetings, without needing to spend months building, scaling, and maintaining the bots yourself.

Let me know if you have any questions!

@amanda-recallai How can we build a pipeline in GStreamer to merge both audio and video? There are a couple of issues when merging them, even if we create an audio and video map:

  1. Delay between the audio and video streams.
  2. Frame rate inconsistencies.

The combined video doesn’t sync up properly. I used FFmpeg initially but am now thinking of shifting to GStreamer. Can we manage these issues in GStreamer if we process audio and video data in real time? Can you share some code samples as well?

Thanks @amanda-recallai, it was a great help!