Audio and video inconsistently out-of-sync when receiving raw data

I’ve been using the raw audio and video provided by the onMixedAudioRawDataReceived and onRawDataStatusChanged callbacks. Ultimately, I create an MP4 from these streams. After several months of complex coding, it has become apparent that there is an inconsistent time difference between the audio and video buffers received.

When the audio buffers start arriving, the video buffers do not arrive at around the same time. Instead, onMixedAudioRawDataReceived ends up getting called around 200 times before any frames arrive at onRawDataStatusChanged. Once the video frames do arrive, the audio and video appear to be in sync. However, those first 200 audio buffers are 10 milliseconds apiece, which works out to about 2 seconds. What isn’t clear is why so many audio buffers are received with no video frames during that initial startup period. The result is that the audio and video end up out of sync. The number 200 isn’t fixed; it varies, but it is around 200.

One solution I tried was to ignore all audio and video buffers until 200 audio buffers have been received, and only then start using them. This does seem to help somewhat. The audio and video are still a little out of sync, but at least they are consistently out of sync by the same amount of time. Technically, the next step would be to compensate for that remaining difference with a fixed offset.
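In code, that workaround (plus the fixed compensation) looks roughly like this. Everything here is simplified, and the handler names are placeholders rather than Zoom SDK callbacks:

```kotlin
// Placeholder names throughout; handleAudio/handleVideo stand in for whatever
// feeds the encoder. Drop the initial burst of audio, then shift all timestamps
// by a fixed, empirically measured offset.
const val WARMUP_AUDIO_BUFFERS = 200        // observed burst size; it varies
const val FIXED_OFFSET_US = 2_000_000L      // roughly 2 s, measured empirically

var audioBuffersSeen = 0

fun onAudioBuffer(pcm: ByteArray, ptsUs: Long) {
    audioBuffersSeen++
    if (audioBuffersSeen <= WARMUP_AUDIO_BUFFERS) return   // skip warm-up audio
    handleAudio(pcm, ptsUs - FIXED_OFFSET_US)
}

fun onVideoFrame(frame: ByteArray, ptsUs: Long) {
    if (audioBuffersSeen <= WARMUP_AUDIO_BUFFERS) return   // skip matching video
    handleVideo(frame, ptsUs - FIXED_OFFSET_US)
}

fun handleAudio(pcm: ByteArray, ptsUs: Long) { /* feed the encoder */ }
fun handleVideo(frame: ByteArray, ptsUs: Long) { /* feed the encoder */ }
```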

But this all seems rather iffy. What the API really needs to provide are real presentation times for both the audio and video buffers, meaning the microseconds or nanoseconds at which each buffer or frame was captured. These presentation times need not originate from the source from which the media is sent. They could be determined internally upon arrival and simply passed on to the onMixedAudioRawDataReceived and onRawDataStatusChanged callbacks.

As it stands, there is no real way of guaranteeing how the audio will sync with the video, which makes recording videos extremely painful.

I would appreciate some deeper insight as to what Zoom considers the proper solution here.

Hi @AndroidDev,

Sorry to hear you are running into issues with the audio sampling being out of sync with video frames. There are a lot of different factors here which could be causing this, so let’s try to narrow it down a bit with some more information.

First, you mention that you are creating an MP4 file from these streams. This immediately stands out as a potential cause for a couple of issues. How are you creating this file and ensuring the correct timing of the audio/video samples/frames?

Thanks!

The problem comes and goes, so let me do some more testing and I’ll get back to you if I can’t resolve the issue. The presentation times are generated by my code using a running stopwatch based on the system time. The start time begins with the arrival of the first audio buffer, and presentation times for both audio buffers and video frames are relative to that start time.
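In simplified form, the stopwatch amounts to something like the following (here I use System.nanoTime() as the clock source, but any monotonic clock would do):

```kotlin
// The first audio buffer anchors time zero; every later buffer or frame gets a
// presentation time relative to that instant.
object PtsClock {
    @Volatile private var startNs = -1L

    /** Call when the first audio buffer arrives to anchor the timeline. */
    fun startIfNeeded() {
        if (startNs < 0) startNs = System.nanoTime()
    }

    /** Presentation time in microseconds for "now", relative to the anchor. */
    fun nowUs(): Long {
        check(startNs >= 0) { "clock not started" }
        return (System.nanoTime() - startNs) / 1_000
    }
}
```

The audio callback calls startIfNeeded() before asking for nowUs(); the video callback only ever calls nowUs().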

Hi @AndroidDev,

When you are doing the file I/O to write the raw data to an MP4, is this being done within the raw data callbacks, or are you doing it on a separate thread? If it is being done in the callback directly, I strongly suspect that this is the cause of the issue. Excessive operations performed directly on the thread in which the callbacks are executed can delay subsequent callbacks from being sent, which could cause the behavior you are seeing, especially combined with the fact that you are using the current system time of the callback to determine the presentation time.
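Just as a rough illustration (this is not SDK code, and writeSample() is a placeholder), handing samples off to a dedicated writer thread could look something like this:

```kotlin
import java.util.concurrent.LinkedBlockingQueue

data class Sample(val data: ByteArray, val ptsUs: Long, val isVideo: Boolean)

class SampleWriter {
    private val queue = LinkedBlockingQueue<Sample>(1024)   // bounded, so memory stays in check

    private val worker = Thread {
        try {
            while (true) {
                writeSample(queue.take())   // slow I/O happens here, off the callback thread
            }
        } catch (ignored: InterruptedException) {
            // shutdown requested
        }
    }.apply { start() }

    /** Call from the raw-data callbacks; this returns immediately. */
    fun enqueue(sample: Sample) {
        queue.offer(sample)                 // decide how to handle a full queue (drop or back off)
    }

    fun stop() = worker.interrupt()

    private fun writeSample(sample: Sample) {
        // placeholder: feed MediaCodec / MediaMuxer here
    }
}
```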

Thanks!

Processing of the raw data is done on a separate thread. I managed to figure out the problem; the timing code itself was correct. I actually have two modes of recording video sessions to MP4. The “Live” mode encodes the MP4 on-the-fly as stream data is received. The other is what I call “cached” mode. In cached mode, I don’t encode the streams on-the-fly but rather store the received audio/video buffers to storage, and only after a session ends do I encode them into an MP4. Cached recordings are needed for development work to avoid having to debug code while a Zoom session is actually in progress. My MP4s are also fairly complex and not just a mux of audio and video: I inject an intro screen and will later add other pre and post elements into the video, so having cached buffers makes debugging easier.

The live recording always worked; the issue was with the cached recording, even though both use the same modules to create the MP4. The only difference is that for live recordings the encoding system is fed the buffers as they arrive, while cached recording feeds them in from storage. However, because I have the entire recording in storage, I was feeding the buffers from the cache into the encoding system at a much higher rate than a live stream would feed them in at. This allowed me to debug the code significantly faster without having to encode the MP4 at the same rate as a live recording: no need to wait a full minute to encode a video when it could be done in less than 30 seconds. I was careful to make sure not to feed the encoder buffer data at a rate that was too fast, but it turned out there was one area in my code where certain non-buffered images were being fed into the encoder faster than the encoder could handle.

The result of feeding the encoder data faster than it can handle is a range of very bad side effects: out-of-sync audio/video, slow video, very long delays and so on. An encoder that is fed data too fast will not crash; it just stops operating correctly. This took me a long time to determine, as I was certain the problem was something in my own code. Once I fixed the bug and prevented the encoder from being fed too quickly, the problem disappeared.
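For reference, the fix amounts to pacing the feed against the codec itself rather than against the cache. A simplified sketch of that idea with MediaCodec (not my exact code):

```kotlin
import android.media.MediaCodec

// Block until the codec actually has a free input buffer instead of pushing
// samples at whatever rate the cache can supply them.
fun feedEncoder(codec: MediaCodec, data: ByteArray, ptsUs: Long) {
    var index = -1
    while (index < 0) {
        // Returns -1 (INFO_TRY_AGAIN_LATER) while no input buffer is free yet.
        index = codec.dequeueInputBuffer(10_000 /* microseconds */)
    }
    val input = codec.getInputBuffer(index) ?: return
    input.clear()
    input.put(data)
    codec.queueInputBuffer(index, 0, data.size, ptsUs, 0)
}
```

The output side still has to be drained regularly; if the output buffers are never released, the input side will eventually stall as well.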

I’m using Android’s native MediaCodec and MediaMuxer. So if anyone reading this post has similar problems, they may want to check the rate at which they are feeding their encoder. One last thing: you cannot use Android’s Log.i to write to the log console during the buffer-receiving events. Log.i is significantly slower than the rate at which buffers are received, and this will mess up the encoding as well. For this reason, I created a separate logger that runs on its own thread, accumulates the log messages, and writes them out at its own rate.
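The logger itself is nothing fancy; simplified, it looks roughly like this (the tag name is arbitrary):

```kotlin
import android.util.Log
import java.util.concurrent.LinkedBlockingQueue

// Callbacks enqueue a message and return immediately; a daemon thread flushes
// to Log.i at its own pace so logging never stalls the buffer-receiving path.
object AsyncLogger {
    private val messages = LinkedBlockingQueue<String>()

    private val worker = Thread {
        try {
            while (true) {
                Log.i("Recorder", messages.take())
            }
        } catch (ignored: InterruptedException) {
            // exit on shutdown
        }
    }.apply { isDaemon = true; start() }

    fun log(msg: String) {
        messages.offer(msg)   // non-blocking when called from the raw-data callbacks
    }
}
```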

That’s great that you were able to resolve this!

This was certainly an interesting read, thank you for providing context in case other developers run into similar issues. :blue_heart:
