Video stream is 2-3x faster than the audio stream

Hi,
So I’m using the Linux SDK to record the host’s video & audio streams.
The SDK successfully creates YUV and PCM files.

I’m trying to deal with 2 different situations (maybe related to each other) that I would like some help with:

  1. The SDK creates a .yuv file for the video. After taking the file and running an ffmpeg command to convert it to mp4, it looks like the video is playing very fast.
    Here is the command I’m running. To deal with it I played with the -framerate attribute in the command, but I don’t think that really solves the situation.
    Usually I used -framerate 25
ffmpeg -y -f rawvideo -pix_fmt yuv420p -video_size {video_file_width}x{video_file_height} -framerate 17 -i temp-{video_file} -f mp4 video.mp4
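
(As I understand it, ffmpeg simply plays back whatever frames are in the raw file at the -framerate it is given, so a chunk that happens to contain, say, 80 frames lasts only 80 / 25 = 3.2 seconds at -framerate 25, no matter how long it actually took to record those frames.)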

The bot subscribes to the stream this way:

videoHelper1->setRawDataResolution(ZoomSDKResolution_720P);
videoHelper1->subscribe(getUserObj(i)->GetUserID(), RAW_DATA_TYPE_VIDEO);

The function that creates the yuv file:

void ZoomSDKRenderer::SaveToRawYUVFile(YUVRawDataI420* data) {
    // Build the output file name from the source ID and the stream dimensions
    std::string filename = "video_output-";
    int number = data->GetSourceID();
    int width = data->GetStreamWidth();
    int height = data->GetStreamHeight();

    filename += std::to_string(number);
    filename += "---";
    filename += std::to_string(width);
    filename += "---";
    filename += std::to_string(height);
    filename += ".yuv";

    // Open the file for appending in binary mode
    std::ofstream outputFile(filename, std::ios::out | std::ios::binary | std::ios::app);
    if (!outputFile.is_open())
    {
        std::cout << "Error opening file." << std::endl;
        return;
    }

    // Calculate the sizes of the Y, U, and V planes (I420: U and V are each 1/4 of Y)
    size_t ySize = static_cast<size_t>(width) * height;
    size_t uvSize = ySize / 4;

    // Write the Y, U, and V planes to the output file
    outputFile.write(data->GetYBuffer(), ySize);
    outputFile.write(data->GetUBuffer(), uvSize);
    outputFile.write(data->GetVBuffer(), uvSize);

    // Flush and close the file
    outputFile.flush();
    outputFile.close();
}
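
Side note: to see how many frames actually arrive per second, I’m thinking of calling a small counter like this from SaveToRawYUVFile — just a rough sketch, nothing Zoom-specific (CountFrame is a name I made up):

#include <chrono>
#include <cstdio>

// Counts received frames and elapsed wall-clock time so the real frame rate
// can be passed to ffmpeg instead of a guessed value.
void CountFrame() {
    static auto start = std::chrono::steady_clock::now();  // time of the first frame
    static size_t frameCount = 0;
    ++frameCount;
    double elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
    if (elapsed > 0.0) {
        // e.g. "received 83 frames in 10.1 s (~8.2 fps)"
        std::printf("received %zu frames in %.1f s (~%.1f fps)\n",
                    frameCount, elapsed, frameCount / elapsed);
    }
}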

Is there any explanation for what could cause that?

  2. The audio and the video are not aligned. This is the flow I’m doing:
  • Make sure there are YUV & PCM files, and that the host’s camera & mic are open.
  • Delete both files at the same time.
  • The SDK recreates the files immediately.
  • After 10 seconds I copy the files to ‘temp’ YUV & PCM files.
  • Run ffmpeg commands to convert the video and combine it with the audio this way:
ffmpeg -y -f rawvideo -pix_fmt yuv420p -video_size {video_file_width}x{video_file_height} -framerate 17 -i temp-{video_file} -f mp4 video.mp4
ffmpeg -y -i video.mp4 -f s16le -ar 32000 -ac 1 -i temp-{audio_file} -c:v copy -c:a aac -strict experimental final.mp4
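
One sanity check on this flow (back-of-the-envelope numbers only): 10 seconds of PCM at 32 kHz, mono, 16-bit is 32000 × 2 × 10 = 640,000 bytes, which always plays back as exactly 10 seconds of audio, whereas the matching YUV chunk plays for (number of frames) / 17 seconds with -framerate 17. Unless the chunk really contains 170 frames, the converted video comes out shorter or longer than the audio, so the two drift apart.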

The resulting ‘final.mp4’ file isn’t perfect because the audio is not 100% aligned with the video. When the person speaks, there is a delay between their lip movements and the audio.

What am I missing here?

CC: @chunsiong.zoom I would really appreciate your help :slight_smile:

Thanks!

@gofmannir muxing is probably out of scope for this developer forum.

One way to solve this is to use ffmpeg or gstreamer at the code level to first encode the YUV frames into mkv or mp4.

Thereafter, muxing the audio and video together should keep them in sync.

What about the fact that the YUV stream file plays back relatively fast?

@gofmannir if you use gstreamer or ffmpeg at the code level to encode each frame as you receive the callback, the result will have the same length as the wav file.

Currently I’m not encoding the frames on each callback, but at ~10-second intervals.
The process is separate for video and audio.
The video output file runs at about 2x real time, which is weird. What is the frequency at which the callback is called? What FPS / frame rate?

Thanks.

@gofmannir and you are using the command line to encode the video and audio every 10 seconds? That’s likely the issue.

The solution is to encode it at runtime.

Why does it matter?
If I take the yuv file after 10 seconds (let’s leave muxing aside) and convert it to mp4, I get a very fast video.
Are you saying this approach is causing the video to be fast?

Hey @gofmannir!

Why does it matter?
If I take the yuv file after 10 seconds (let’s leave muxing aside) and convert it to mp4, I get a very fast video.
Are you saying this approach is causing the video to be fast?

When the video is running fast, this is because the frame rate you’re specifying to ffmpeg is too high.

ffmpeg receives the input frames but needs to know how long to show each frame for. In the case where your video is too fast, this means that the frame rate is too high and you should lower it accordingly.

In general, you shouldn’t use a fixed frame rate when converting the video from the Zoom SDK. The reason for this is that the frame rate can actually vary. For instance, if the network connection is bad or experiences a disruption, you could actually get a lower frame rate or drop frames.

We recommend using something like gstreamer to encode the video in real time. This will also solve the issue you’re seeing around audio and video becoming desynchronized. When you encode the audio and video simultaneously, this will keep them in sync regardless of whether you have a gap in the video due to your network, or any other reason.
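
To give a rough idea of what encoding at runtime can look like, here is a minimal sketch (not production code): it pipes each raw frame into an ffmpeg process from the raw-data callback, assumes a fixed resolution and a roughly constant callback rate, and assumes ffmpeg is on the PATH; RealtimeEncoder and pushFrame are placeholder names. For a truly variable frame rate you would want per-frame timestamps (e.g. a gstreamer appsrc with do-timestamp=true), but the idea is the same:

#include <cstdio>
#include <string>

// Sketch of a runtime encoder: raw I420 frames are written straight into the
// stdin of an ffmpeg process instead of being dumped to a .yuv file first.
class RealtimeEncoder {
public:
    RealtimeEncoder(int width, int height, int fps)
        : width_(width), height_(height) {
        std::string cmd =
            "ffmpeg -y -f rawvideo -pix_fmt yuv420p"
            " -video_size " + std::to_string(width) + "x" + std::to_string(height) +
            " -framerate " + std::to_string(fps) +
            " -i - -c:v libx264 -pix_fmt yuv420p video.mp4";
        pipe_ = popen(cmd.c_str(), "w");   // ffmpeg reads raw frames from its stdin
    }

    ~RealtimeEncoder() {
        if (pipe_) pclose(pipe_);          // closing stdin lets ffmpeg finalize the mp4
    }

    // Call this from the same callback that currently calls SaveToRawYUVFile().
    void pushFrame(YUVRawDataI420* data) {
        if (!pipe_) return;
        size_t ySize  = static_cast<size_t>(width_) * height_;
        size_t uvSize = ySize / 4;
        fwrite(data->GetYBuffer(), 1, ySize, pipe_);
        fwrite(data->GetUBuffer(), 1, uvSize, pipe_);
        fwrite(data->GetVBuffer(), 1, uvSize, pipe_);
    }

private:
    int   width_, height_;
    FILE* pipe_ = nullptr;
};

Note that with a fixed -framerate the output duration is still (frames / fps), so if the callback rate fluctuates a lot, the gstreamer route with per-frame timestamps is the more robust option; either way, encoding as frames arrive avoids the copy-every-10-seconds step and keeps the audio much closer to the video.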

Let me know if this helps and if you have other questions here!

Another alternative is to use Recall.ai for your meeting bots instead. It’s a simple 3rd party API that lets you use meeting bots to get raw audio/video from meetings without you needing to spend months to build, scale and maintain these bots. We’ve encountered all of the same issues you’ve experienced and developed a service that allows you to abstract away the complexities and implementation details of meeting bots so that you can focus on building your core product features.

@amanda-recallai Can you please share a fuller example of how to encode the frames in real time in the callback?

@gofmannir did you have any luck with encoding frames in real time? Would you mind sharing your learnings? Thanks!