Audio-Video Sync Issues with Raw Data from Zoom SDK on Linux

Hey @Venkat_Koushik, you could be seeing A/V drift and speed swings for several reasons:

  • Early frames are timed by wall-clock time or callback order instead of SDK timestamps
  • Video pacing isn’t tied to a master clock, so the first seconds can run ~1.5× before settling
  • PTS origin varies per stream, so audio and video start at different zeros and then drift

Here are a few things you could try to improve A/V sync:

  • Derive all timing from SDK timestamps rather than arrival time or the system clock, for example via AudioRawData::GetTimeStamp in the Linux raw data callbacks
  • Keep a ~150–300 ms jitter buffer per stream, pick audio as the master clock, normalize each track to t=0, and pace video to your target FPS by dropping or duplicating frames
  • If capture rate drifts, resample audio to the master clock and keep video aligned; Zoom’s guidance is to timestamp each audio and video frame and ensure they are played back in sync
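
To make the normalize-and-pace idea concrete, here is a minimal sketch in Python (the SDK itself is C++, so treat this as logic to port, not SDK code). Everything in it is an assumption for illustration: the Frame type, the normalize and pace_video helpers, timestamps being in milliseconds, and audio and video timestamps sharing one clock so that a single origin (the first audio timestamp) can be subtracted from both.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    ts_ms: int      # SDK-provided timestamp; milliseconds is an assumption
    payload: bytes  # raw frame data (placeholder)


def normalize(frames: List[Frame], origin_ms: int) -> List[Frame]:
    """Shift timestamps so that origin_ms becomes t=0 (hypothetical helper)."""
    return [Frame(f.ts_ms - origin_ms, f.payload) for f in frames]


def pace_video(video: List[Frame], fps: int, duration_ms: int) -> List[Frame]:
    """Resample video onto a fixed-FPS grid (hypothetical helper).

    For each output tick, emit the newest frame whose timestamp is <= the tick.
    Extra frames between ticks are dropped; if no newer frame has arrived,
    the previous frame is duplicated.
    """
    out: List[Frame] = []
    last: Optional[Frame] = None
    i = 0
    n_ticks = duration_ms * fps // 1000
    for k in range(n_ticks):
        tick = k * 1000 / fps
        # Consume every frame that is due at or before this tick.
        while i < len(video) and video[i].ts_ms <= tick:
            last = video[i]
            i += 1
        if last is not None:
            out.append(Frame(round(tick), last.payload))
    return out


# Example: audio is the master, so its first timestamp defines t=0 for both tracks.
audio = [Frame(10_000, b"a0"), Frame(10_020, b"a1")]
video = [Frame(10_000, b"v0"), Frame(10_010, b"v1"),
         Frame(10_020, b"v2"), Frame(10_250, b"v3")]
origin = audio[0].ts_ms
paced = pace_video(normalize(video, origin), fps=10, duration_ms=400)
```

In this example v1 is dropped (two frames arrived before the 100 ms tick) and v2 is duplicated at 200 ms because nothing newer had arrived. In a real pipeline you would run this on frames coming out of the jitter buffer rather than on a complete list.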

On auto-starting recording for internal or external meetings: you can request a meeting’s join token for local recording so that your bot starts recording automatically once it enters the call. This generally works for meetings owned by the authenticated user/app.

If you’d rather not build and maintain the buffering and sync layer, teams often use Recall.ai’s meeting bot API to pull real-time Zoom audio, video, and transcripts and offload multi-participant timing and layout orchestration.