Syncing Send Video and Send Audio in the Meeting SDK

I’m trying to create a meeting bot that streams a video file along with audio into a Zoom meeting. I was able to get the code from GitHub - zoom/meetingsdk-linux-raw-recording-sample working to send both audio and video.

But after a few seconds, the video slows down and goes out of sync with the audio. How can I ensure that both remain in sync?

The video plays at 30fps, and I’m adding a sleep between frames to hold that frame rate.

// Requires <opencv2/opencv.hpp>, <chrono>, <cstring>, <iostream>, <thread>, <vector>
// Assumes globals int width, height and int video_play_flag, plus
// using namespace cv / std, as in the sample project.
void PlayVideoFileToVirtualCamera(IZoomSDKVideoSender* video_sender, const std::string& video_source)
{
    // I420 frame size: full-resolution Y plane plus quarter-resolution U and V planes
    const int frameLen = width * height * 3 / 2;
    std::vector<char> frameBuffer(frameLen);

    const int fps = 30;
    const std::chrono::microseconds frameDuration(1000000 / fps); // 1 second = 1,000,000 microseconds

    // execute in a thread.
    while (video_play_flag > 0 && video_sender) {
        Mat frame;
        VideoCapture cap;
        cap.open(video_source); // re-opened on each pass so the file loops
        if (!cap.isOpened()) {
            cerr << "ERROR! Unable to open video source\n";
            video_play_flag = 0;
            break;
        }
        //--- GRAB AND SEND LOOP
        std::cout << "Start grabbing" << endl;
        while (video_play_flag > 0)
        {
            // steady_clock is monotonic, so it is the right clock for measuring elapsed time
            auto start = std::chrono::steady_clock::now();

            // read the next frame from the file and store it into 'frame'
            cap.read(frame);
            if (frame.empty()) {
                cerr << "End of file or blank frame grabbed\n";
                break;
            }
            Mat resizedFrame;
            resize(frame, resizedFrame, Size(width, height), 0, 0, INTER_LINEAR);

            // convert Mat to an I420 buffer; cap.read() yields 3-channel BGR frames,
            // so the correct conversion is COLOR_BGR2YUV_I420 (not BGRA)
            Mat yuv;
            cv::cvtColor(resizedFrame, yuv, COLOR_BGR2YUV_I420);
            // copy row by row in case the Mat rows are padded (yuv.step > width)
            for (int i = 0; i < height / 2 * 3; ++i) {
                memcpy(frameBuffer.data() + i * width, yuv.ptr(i), width);
            }
            SDKError err = video_sender->sendVideoFrame(frameBuffer.data(), width, height, frameLen, 0);
            if (err != SDKERR_SUCCESS) {
                std::cout << "sendVideoFrame failed: Error " << err << endl;
            }

            auto end = std::chrono::steady_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);

            // Sleep off the remainder of the frame period. Note that this paces each
            // frame relative to its own start, so scheduling and decode jitter
            // accumulate over time -- one source of the drift discussed below.
            if (elapsed < frameDuration) {
                std::this_thread::sleep_for(frameDuration - elapsed);
            }
        }
        cap.release();
    }
    video_play_flag = -1;
}

The audio is read from the file and sent in chunks:

// Requires <chrono>, <fstream>, <thread>, <vector>
// Assumes global int audio_play_flag and using namespace std, as in the sample project.
void PlayAudioFileToVirtualMic(IZoomSDKAudioRawDataSender* audio_sender, string audio_source)
{
    printf("PlayAudioFileToVirtualMic invoked!\n");

    const int chunkSize = 4096;      // bytes per chunk
    const int sampleRate = 44100;    // samples per second
    const int bytesPerSample = 2;    // 16-bit audio

    // execute in a thread.
    while (audio_play_flag > 0 && audio_sender) {

        // Open at the end (ios::ate) to learn the size, then rewind to the start
        ifstream file(audio_source, ios::binary | ios::ate);
        if (!file.is_open()) {
            std::cout << "Error: File not found. Tried to open " << audio_source << std::endl;
            return;
        }
        streampos file_size = file.tellg(); // file size, if you need it
        file.seekg(0, ios::beg);

        vector<char> buffer(chunkSize);
        while (file.read(buffer.data(), buffer.size()) || file.gcount() > 0) {
            size_t bytesRead = file.gcount();
            SDKError err = audio_sender->send(buffer.data(), bytesRead, sampleRate);
            if (err != SDKERR_SUCCESS) {
                cout << "Error: Failed to send audio data to virtual mic. Error code: " << err << endl;
                break;
            }
            // Sleep for the playback duration of the bytes actually read.
            // At 44100 Hz, 16-bit mono, a 4096-byte chunk lasts 4096 / 88200 s,
            // roughly 46.44 ms. The original millisecond integer math slept a flat
            // 46 ms per chunk, so the audio crept ahead of real time -- and of the
            // video -- by roughly 10 ms per second of playback.
            auto chunkDuration = std::chrono::microseconds(
                bytesRead * 1000000LL / (sampleRate * bytesPerSample));
            std::this_thread::sleep_for(chunkDuration);
        }
        file.close();
        audio_play_flag = -1; // play the file once, then stop
    }
}

Great question, @purnam-admin! Handling synchronized audio and video data is complex, and there’s no one-size-fits-all solution. I suggest using a tool like GStreamer to handle the synchronization of video and audio, as discussed in the forum posts linked below.

Thank you @donte.zoom for the quick response and resources shared :pray:

I do see that both of those posts are about the recording implementation rather than sending, and I wasn’t able to figure out from them how to implement this for my use case.

Are there any high-level implementation steps you could share with me as a starting point for sending audio and video streams in sync in the Meeting SDK using GStreamer, in the context of the sample code or otherwise?

That would help me start moving in the right direction. Thank you so much!

@donte.zoom @chunsiong.zoom @amanda-recallai

You’re welcome, @purnam-admin! Currently there isn’t high-level sample code for that implementation, but I’m working on it and will get back to you. Please feel free to follow up on this thread.


Thank you so much, I appreciate your support :pray: This component would solve a big problem for us, and if it works out, it’ll be a great win.

Hey @purnam-admin, happy to help!

The approach we would recommend here, as @donte.zoom mentioned, is to use GStreamer, since manually implementing audio and video synchronisation can be complex.

To give a specific example of how you’d accomplish this, you’d want a pipeline similar to the following (in textual format, which can be executed on the command line with gst-launch-1.0):

uridecodebin name=dec uri=file:///vid.mp4 ! \
videoconvert ! videoscale ! videorate ! video/x-raw,format=I420,width=1280,height=720,framerate=(fraction)30/1 ! appsink name=vidsink sync=true \
dec. ! audioconvert ! audioresample ! audio/x-raw,format=S16LE,rate=16000,channels=1 ! appsink name=audsink sync=true

The uridecodebin demuxes and decodes the MP4 into raw audio and video streams, and we use videoconvert, videoscale, and videorate to bring the video into the format the Zoom SDK expects. We do the same with the audio, converting it with audioconvert and audioresample.

We terminate both branches of the pipeline at an appsink, a GStreamer element that lets you extract media from the pipeline into your application.

In your application, you’d attach to the new-sample signal emitted by the appsink, and call audio_sender->send() or video_sender->sendVideoFrame() when new audio or video media is available.

In this arrangement, the GStreamer pipeline maintains an internal clock and handles the synchronisation of the data reaching the appsinks, which drives the callbacks to the audio sender and video sender.
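To sketch what that looks like in code (a rough outline rather than tested sample code — RunPipeline and the two callback names are placeholders, and the buffer parameters assume the caps from the pipeline above):

#include <gst/gst.h>
#include <gst/app/gstappsink.h>

static GstFlowReturn on_video_sample(GstAppSink* sink, gpointer user_data) {
    IZoomSDKVideoSender* video_sender = static_cast<IZoomSDKVideoSender*>(user_data);
    GstSample* sample = gst_app_sink_pull_sample(sink);
    if (!sample) return GST_FLOW_EOS;
    GstBuffer* buf = gst_sample_get_buffer(sample);
    GstMapInfo map;
    if (gst_buffer_map(buf, &map, GST_MAP_READ)) {
        // 1280x720 I420, matching the video caps in the pipeline
        video_sender->sendVideoFrame(reinterpret_cast<char*>(map.data), 1280, 720, map.size, 0);
        gst_buffer_unmap(buf, &map);
    }
    gst_sample_unref(sample);
    return GST_FLOW_OK;
}

static GstFlowReturn on_audio_sample(GstAppSink* sink, gpointer user_data) {
    IZoomSDKAudioRawDataSender* audio_sender = static_cast<IZoomSDKAudioRawDataSender*>(user_data);
    GstSample* sample = gst_app_sink_pull_sample(sink);
    if (!sample) return GST_FLOW_EOS;
    GstBuffer* buf = gst_sample_get_buffer(sample);
    GstMapInfo map;
    if (gst_buffer_map(buf, &map, GST_MAP_READ)) {
        // 16 kHz mono S16LE, matching the audio caps in the pipeline
        audio_sender->send(reinterpret_cast<char*>(map.data), map.size, 16000);
        gst_buffer_unmap(buf, &map);
    }
    gst_sample_unref(sample);
    return GST_FLOW_OK;
}

void RunPipeline(IZoomSDKVideoSender* video_sender, IZoomSDKAudioRawDataSender* audio_sender) {
    gst_init(nullptr, nullptr);
    GstElement* pipeline = gst_parse_launch(
        "uridecodebin name=dec uri=file:///vid.mp4 "
        "! videoconvert ! videoscale ! videorate "
        "! video/x-raw,format=I420,width=1280,height=720,framerate=30/1 "
        "! appsink name=vidsink sync=true emit-signals=true "
        "dec. ! audioconvert ! audioresample "
        "! audio/x-raw,format=S16LE,rate=16000,channels=1 "
        "! appsink name=audsink sync=true emit-signals=true", nullptr);

    GstElement* vidsink = gst_bin_get_by_name(GST_BIN(pipeline), "vidsink");
    GstElement* audsink = gst_bin_get_by_name(GST_BIN(pipeline), "audsink");
    g_signal_connect(vidsink, "new-sample", G_CALLBACK(on_video_sample), video_sender);
    g_signal_connect(audsink, "new-sample", G_CALLBACK(on_audio_sample), audio_sender);

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    // ... run a GMainLoop (or your own loop) until EOS, then clean up the refs and pipeline.
}

Note that emit-signals=true must be set on each appsink for new-sample to fire, and sync=true is what keeps delivery paced by the pipeline clock.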

Alternate Solution

If you don’t want to deal with managing all of this yourself, an alternative is to use the Recall.ai API for your meeting bots instead.

It’s a simple 3rd party API that lets you use meeting bots to send raw audio/video into meetings without needing to spend months building, scaling, and maintaining these bots.

Let me know if you have any questions!


To keep the video and audio in sync throughout your Zoom meeting bot’s operation, both streams need to be paced against a common timeline. Here are some suggestions for achieving that:

  1. Keep the Nominal Rates Accurate: Your video plays at 30fps and your audio is sampled at 44100Hz, so every second of sent video (30 frames) must correspond to exactly one second of sent audio (44100 samples, or 88200 bytes at 16 bits); any rounding error in the pacing math will break this correspondence over time.
  2. Adjust Sleep Duration Dynamically: Instead of relying on a fixed sleep duration to control frame rate, calculate the actual time taken to process each frame and adjust the sleep duration dynamically to maintain synchronization (see the sketch after this list).
  3. Implement Audio-Video Sync Mechanism: Create a mechanism to synchronize the audio and video streams. You can achieve this by timestamping each audio and video frame and ensuring that they are played back in sync. For example, you can timestamp each audio chunk and match it with the corresponding video frame.
  4. Consider Using Multimedia Frameworks: Instead of manually handling audio and video playback, consider using multimedia frameworks or libraries that provide synchronization features out of the box. These frameworks often handle synchronization intricacies more efficiently.
  5. Monitor and Fine-Tune: Continuously monitor the synchronization during playback and fine-tune your synchronization mechanism as needed. It may require experimentation and adjustments to achieve optimal synchronization.
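To make points 2 and 3 concrete, here is a rough sketch (an assumption-laden outline, not Zoom sample code: playing, PaceVideo, and PaceAudio are placeholders, and the actual decode/send work is elided as comments). Both threads sleep until absolute deadlines derived from one shared start time, so per-iteration jitter cannot accumulate:

#include <atomic>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

std::atomic<bool> playing{true}; // placeholder for your play flags

// Video thread: frame n is due at start + n * (1/30 s), measured from one
// shared start time, so jitter in decode/send cannot accumulate.
void PaceVideo(Clock::time_point start) {
    const auto frameDuration = std::chrono::microseconds(1000000 / 30);
    for (long long n = 1; playing; ++n) {
        // ... decode and sendVideoFrame() here ...
        std::this_thread::sleep_until(start + n * frameDuration);
    }
}

// Audio thread: chunk n of 4096 bytes (16-bit mono at 44100 Hz) is due at
// start + n * (4096 / 88200) s, roughly n * 46.44 ms. A fixed
// sleep_for(46ms), as in the original code, truncates this to whole
// milliseconds, which alone lets the audio creep ahead by on the order of
// 10 ms per second of playback.
void PaceAudio(Clock::time_point start) {
    using FpSeconds = std::chrono::duration<double>;
    const double chunkSeconds = 4096.0 / (44100.0 * 2.0);
    for (long long n = 1; playing; ++n) {
        // ... read and send() the next chunk here ...
        std::this_thread::sleep_until(
            start + std::chrono::duration_cast<Clock::duration>(FpSeconds(n * chunkSeconds)));
    }
}

// Both threads share the same start point, which acts as the common clock:
//   auto start = Clock::now();
//   std::thread video(PaceVideo, start), audio(PaceAudio, start);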


By implementing these strategies and fine-tuning your synchronization mechanism, you can ensure that both the audio and video streams remain in sync throughout the Zoom meeting.

Thank you so much Amanda! Using your pointers I was able to implement it, and the video and audio are now streaming in sync.

I just had a minor question: is the resolution of sendVideoFrame capped at 480p? If I try sending anything larger than 480p, the video looks zoomed in and cropped, even though I have adjusted the height and width parameters.

Height and width set to 640x480, with the source video at the same resolution

Height and width set to 1280x720, with the source video at the same resolution

Figured it out. I needed to write to the Zoom support team to have Group HD enabled on my account, and also use the setting service’s video settings context to enable HD video.

720p streaming is working now.
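In case it helps anyone else, the settings call looks roughly like this (method names as I recall them from the Windows Meeting SDK headers — verify against your platform’s ISettingService):

ZOOM_SDK_NAMESPACE::ISettingService* settingService = nullptr;
if (ZOOM_SDK_NAMESPACE::CreateSettingService(&settingService) == ZOOM_SDK_NAMESPACE::SDKERR_SUCCESS) {
    // The video settings context hangs off the setting service
    ZOOM_SDK_NAMESPACE::IVideoSettingContext* videoSettings = settingService->GetVideoSettings();
    if (videoSettings) {
        videoSettings->EnableHDVideo(true); // only takes effect once Group HD is enabled on the account
    }
}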

Glad to hear you are up and running, @purnam-admin! Here is our documentation on sending 720p video.