Audio-Video Sync Issues with Raw Data from Zoom SDK on Linux

Description:

We’re experiencing persistent audio-video synchronization issues when processing raw YUV video and PCM audio data from the Zoom Meeting SDK on Linux. Despite multiple approaches, we cannot achieve proper sync.

Setup:

  • Linux Ubuntu 22.04, Zoom Meeting SDK
  • Capturing raw YUV420 video via onVideoRawDataReceived()
  • Capturing raw PCM audio via onMixedAudioRawDataReceived()
  • Using FFmpeg for post-processing and composition

Issues:

  1. Variable video speed: Video starts at ~1.5x speed, then settles to 1x
  2. Audio-video drift: Constant lag between audio and video streams
  3. Timestamp inconsistencies: Raw data timestamps appear unreliable

Attempted Solutions:

  • FFmpeg setpts and atempo filters with various values
  • GStreamer automatic synchronization with videomixer
  • Multiple participant layouts (e.g., 1 large + 3 small videos)
  • Forced constant frame rates with fps=30 filter
  • Zoom SDK’s GetTimeStamp() method
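For what it's worth, the correction factors behind the setpts/atempo attempts can be derived from measurements rather than tried by hand. A minimal sketch (the function names are illustrative, not part of FFmpeg or the SDK), assuming you can measure the real capture duration against the media's reported duration:

```python
# Sketch: deriving FFmpeg setpts/atempo values from measured durations.
# Assumes you measured wall-clock capture time vs. media duration.

def setpts_factor(real_duration_s: float, media_duration_s: float) -> float:
    """Factor F for setpts=F*PTS so the track plays over real_duration_s."""
    return real_duration_s / media_duration_s

def atempo_chain(speed: float) -> str:
    """Build an atempo filter chain; each atempo stage accepts only 0.5-2.0,
    so larger corrections must be chained."""
    stages = []
    while speed > 2.0:
        stages.append("atempo=2.0")
        speed /= 2.0
    while speed < 0.5:
        stages.append("atempo=0.5")
        speed /= 0.5
    stages.append(f"atempo={speed:.6f}")
    return ",".join(stages)

print(setpts_factor(60.0, 40.0))  # video that ran ~1.5x fast -> stretch by 1.5
print(atempo_chain(3.0))          # -> atempo=2.0,atempo=1.500000
```

A single guessed value can't fix the "starts at 1.5x, settles to 1x" pattern, though, because the speed error is not constant over the clip.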

Question:
What’s the recommended approach for maintaining proper audio-video sync when processing raw data from multiple participants? Are there specific timing considerations or SDK methods we should use for frame-accurate synchronization?

“We would also like to know how we can start recording automatically, without the host’s help, whether the host is from our organization or a different one.”
(i.e., for both internal and external meetings)

Any guidance would be greatly appreciated!

Hey @Venkat_Koushik, you could be seeing A/V drift and speed swings for several reasons:

  • Early frames are using wall time or callback order instead of SDK timestamps
  • Video pacing isn’t tied to a master clock, so the first seconds can run ~1.5× before settling
  • PTS origin varies per stream, so audio and video start at different zeros and then drift
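The last point is easy to demonstrate. A minimal sketch with made-up timestamps (not real SDK values):

```python
# Sketch: why differing PTS origins cause a constant A/V lag, and the fix.
# Timestamps in milliseconds; values are illustrative only.

audio_ts = [1000, 1020, 1040, 1060]  # audio stream's clock starts at 1000 ms
video_ts = [1250, 1283, 1316, 1349]  # video stream starts at 1250 ms (~30 fps)

# Naive mux: use raw timestamps as PTS -> constant 250 ms offset between tracks.
naive_offset = video_ts[0] - audio_ts[0]

# Fix: normalize each track to its own first timestamp so both start at t=0.
audio_pts = [t - audio_ts[0] for t in audio_ts]
video_pts = [t - video_ts[0] for t in video_ts]

print(naive_offset)                # 250
print(audio_pts[0], video_pts[0])  # 0 0
```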

A few things you could try to improve A/V sync:

  • Align all timings from SDK timestamps, not arrival time or system clock, using AudioRawData::GetTimeStamp and the Linux raw data callbacks
  • Buffer ~150–300 ms per stream as a jitter buffer, pick audio as master, normalize each track to t=0, and pace the video by drop/dup to your target FPS
  • If capture rate drifts, resample audio to the master clock and keep video aligned; Zoom’s guidance is to timestamp each audio and video frame and ensure they are played back in sync
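The drop/dup pacing from the second bullet can be sketched as below. This is a simplified model, assuming timestamps in milliseconds and frames already pulled out of the SDK callbacks, not a production implementation:

```python
# Sketch: pace video to a fixed FPS by dropping frames that arrive early
# and duplicating the last frame when input is late. (timestamp_ms, frame_id)
# pairs stand in for real frames; timestamps would come from the SDK.
from collections import deque

def pace_video(frames, target_fps=30):
    """frames: list of (timestamp_ms, frame_id), sorted by timestamp.
    Returns one frame_id per output slot at target_fps."""
    if not frames:
        return []
    period = 1000.0 / target_fps
    t0 = frames[0][0]
    end = frames[-1][0]
    queue = deque(frames)
    out = []
    last = None
    slot = 0
    while t0 + slot * period <= end:
        deadline = t0 + slot * period
        # Drop: consume every frame due by this slot, keep only the newest.
        while queue and queue[0][0] <= deadline:
            last = queue.popleft()[1]
        # Dup: if nothing new arrived, repeat the last frame.
        out.append(last)
        slot += 1
    return out

# Two frames in one 100 ms slot -> the older one is dropped.
print(pace_video([(0, "a"), (10, "b"), (100, "c")], target_fps=10))  # ['a', 'c']
# A 250 ms gap at 10 fps -> 'a' is duplicated to fill the gap.
print(pace_video([(0, "a"), (250, "b")], target_fps=10))  # ['a', 'a', 'a']
```

In a real pipeline you would run this continuously against the jitter buffer, with the audio track (resampled if needed) defining the clock that `deadline` is derived from.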

On auto-starting recording for internal or external meetings: you can request a meeting’s join token for local recording, which lets your bot start recording automatically after it joins the call. Note that this generally works only for meetings owned by the authenticated user/app.
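A minimal sketch of requesting that token via the Zoom REST API, assuming a Server-to-Server OAuth access token with the appropriate scopes (the meeting ID and bearer token below are placeholders):

```python
# Sketch: build the request for a meeting's local-recording join token
# (GET /v2/meetings/{meetingId}/jointoken/local_recording per Zoom's REST API).
# Placeholders only; add your own error handling and token refresh.
import urllib.request

def local_recording_token_request(meeting_id: str, access_token: str):
    url = f"https://api.zoom.us/v2/meetings/{meeting_id}/jointoken/local_recording"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )

req = local_recording_token_request("123456789", "YOUR_OAUTH_ACCESS_TOKEN")
print(req.full_url)
# The bot then passes the returned token when joining so it can start
# local recording without host intervention.
```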

If you’d rather not build and maintain the buffering and sync layer yourself, teams often use Recall.ai’s meeting bot API to pull real-time Zoom audio, video, and transcripts, offloading the multi-participant timing and layout orchestration.