I’ve been using the raw audio and video provided by the SDK’s raw data callbacks (onMixedAudioRawDataReceived and onRawDataStatusChanged). Ultimately, I create an MP4 from these streams. After several months of complex coding, it is becoming apparent that there is an inconsistent time difference between the audio and video buffers received.
The video buffers do not start arriving at the same time as the audio buffers. Instead,
onMixedAudioRawDataReceived ends up being called around 200 times before any frames arrive at onRawDataStatusChanged. Once the video frames do arrive, the audio and video appear to be in sync. However, those first ~200 audio buffers are 10 milliseconds apiece, which adds up to about 2 seconds. What isn’t clear is why so many audio buffers are received with no video frames during that initial startup period. The result is that the audio and video end up out of sync. The number 200 isn’t fixed; it varies, but it is around 200.
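For what it’s worth, here is roughly how I measure that lead. This is only a sketch: the class, member names, and logging are mine, and I’m calling into it from whichever delegate methods deliver the raw audio and video in my setup.

```cpp
// Sketch: count how much audio arrives before the first video frame.
// Assumes each mixed-audio buffer covers ~10 ms; the class is my own,
// not part of the SDK.
#include <atomic>
#include <cstdio>

class RawDataLeadMeter {
public:
    // Call from onMixedAudioRawDataReceived(...)
    void OnAudioBuffer() {
        if (!m_firstVideoSeen.load())
            ++m_audioBuffersBeforeVideo;
    }

    // Call from the video raw data callback once frames start flowing
    void OnVideoFrame() {
        bool expected = false;
        if (m_firstVideoSeen.compare_exchange_strong(expected, true)) {
            int n = m_audioBuffersBeforeVideo.load();
            std::printf("Audio lead before first video frame: %d buffers (~%d ms)\n",
                        n, n * 10);
        }
    }

private:
    std::atomic<bool> m_firstVideoSeen{false};
    std::atomic<int>  m_audioBuffersBeforeVideo{0};
};
```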
One solution I tried was to ignore all audio and video buffers until 200 audio buffers have been received and only then start using them. This does seem to help somewhat: the audio and video are still a little out of sync, but they are at least consistently out of sync by the same amount of time. Technically, the next step would be to compensate for this difference by a fixed amount.
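To make that concrete, this is the kind of fixed compensation I mean. It is only a sketch: the offset constant is something I measured empirically, not anything the SDK guarantees, and the function name is mine.

```cpp
// Sketch: shift audio presentation times back by a fixed, empirically
// measured lead before handing them to the MP4 muxer.
#include <algorithm>
#include <cstdint>

// ~200 buffers * 10 ms, measured in my sessions; NOT guaranteed by the SDK.
constexpr int64_t kAudioLeadUs = 2'000'000;

int64_t CompensatedAudioPtsUs(int64_t rawAudioPtsUs) {
    // Clamp at zero so the first buffers don't get negative timestamps.
    return std::max<int64_t>(0, rawAudioPtsUs - kAudioLeadUs);
}
```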
But this all seems rather iffy. What the API really needs to provide are real-time presentation times for both audio and video buffers, i.e., the microsecond or nanosecond timestamp at which each buffer or frame was recorded. These presentation times need not originate from the source video being sent; they could be determined internally upon arrival and simply passed on to the callbacks.
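In the meantime, I am approximating those presentation times myself by stamping each buffer with a monotonic clock reading the moment the callback fires, and using that as the PTS when muxing. A rough sketch of the idea, with my own struct and function names:

```cpp
// Sketch: stamp each raw buffer with its arrival time in microseconds and
// carry that through to the MP4 muxer as the presentation time. Only the
// std::chrono usage is standard; the rest is my own scaffolding.
#include <chrono>
#include <cstdint>
#include <vector>

struct StampedBuffer {
    std::vector<uint8_t> data;  // copy of the callback payload
    int64_t arrivalUs;          // monotonic arrival time, used as PTS
};

int64_t NowMonotonicUs() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Inside each raw data callback (audio or video):
//   StampedBuffer b{copyOfPayload, NowMonotonicUs()};
//   queueForMuxing(std::move(b));   // hypothetical handoff to the muxer thread
```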
As it stands, there is no real way of guaranteeing how the audio will sync with the video, which makes recording videos extremely painful.
I would appreciate some deeper insight as to what Zoom considers the proper solution here.