Audio/video re-sync issues with RTMS when toggling camera off then on (trying to livestream meeting)

We are ingesting the RTMS data into an Express.js app, then having it flow through FFmpeg and into AWS IVS (Amazon Interactive Video Service), where a subscriber can watch the video in a browser. So we're basically re-broadcasting the meeting as close to real time as possible. We used the RTMS-to-YouTube sample as a guide to get going.

We are struggling to keep audio and video in sync once the camera is turned off and then back on. We inject black video frames during the camera-off periods and have tried a few different ways to make this work, from having FFmpeg handle the alignment to having JavaScript realign the video and audio streams before sending them to FFmpeg. We even tried implementing some drift logic to keep things in sync, but to no avail.

When the camera is turned back on, the video and audio sometimes end up off by multiple seconds.
The only solution that seems to work is restarting FFmpeg when the video comes back on, but that causes a disruption in the stream. Any help is much appreciated.


Hey @8fiftydigital,

Thank you for reaching out with details on the issue you're encountering. Just to make sure I understand: are you saying that the timestamps themselves are a couple of seconds off, or that the timestamps are correct but you haven't found a method to mux the audio and video streams together?

Thanks,

Max

Thanks for the quick response!
We haven't found a working method to mux the audio and video streams, specifically when the camera gets toggled off and on.

Got it, thanks for the clarification! I'll go ahead and test this out today to see if I can replicate the issue. Even if I don't see the same behavior, this will give me an opportunity to experiment with muxing and advise on the best method. I think you're on the right track using FFmpeg.

A couple quick questions to help me understand your setup:

  1. How are you feeding frames to FFmpeg? Are you piping raw frames via stdin or using a different method?

  2. Are you setting explicit presentation timestamps (PTS)? The key to keeping A/V in sync when video drops out is ensuring both streams maintain a continuous timeline with proper PTS values.

  3. What’s your FFmpeg command/config? Specifically interested in the -vsync, -async, and any timestamp-related flags you’re using.

One thing that's critical with intermittent video: when you inject black frames during camera-off periods, those frames need to carry PTS values that align with the audio timeline. If the timestamps aren't synchronized to a common clock, you'll see exactly that kind of drift.
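
Purely as an illustration of what I mean (a rough sketch, not your code; the state fields are hypothetical), the injected black frames would take their timestamps from the same clock the audio is on, rather than from the wall-clock time of injection:

    // Hypothetical sketch: stamp filler frames on the shared meeting clock
    // so they line up with the audio timeline.
    const FRAME_INTERVAL_MS = 40; // 25 fps

    interface TimelineState {
        audioClockMs: number;         // last audio timestamp seen from RTMS
        lastVideoTimestampMs: number; // last real video timestamp seen from RTMS
    }

    // Timestamps (on the shared clock) at which black frames should sit to
    // bridge the gap between the last real video frame and the audio position.
    function blackFramePtsForGap(state: TimelineState): number[] {
        const pts: number[] = [];
        let next = state.lastVideoTimestampMs + FRAME_INTERVAL_MS;
        while (next <= state.audioClockMs) {
            pts.push(next);
            next += FRAME_INTERVAL_MS;
        }
        return pts;
    }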

Will dig in on my end and follow up soon!

1. How are you feeding frames to FFmpeg?

Using pipes.

  • Video input: pipe:3 (stdio[3])
  • Audio input: pipe:4 (stdio[4])

2. Are you setting explicit presentation timestamps (PTS)?

PTS is adjusted via filters.

Tracking:

        if (msg.msg_type === MessageType.MEDIA_DATA_AUDIO && msg.content?.data) {
            const buffer = Buffer.from(msg.content.data, 'base64');
            const timestamp = msg.content.timestamp;

            // Track base timestamp for synchronization
            if (this._baseAudioTimestamp === undefined) {
                this._baseAudioTimestamp = timestamp;
                logger.debug(`${logPrefix} First audio packet timestamp: ${timestamp}`);
            }

            this._lastAudioTimestamp = timestamp;
            this._ffmpegHandler.processAudioPacket({ data: buffer, timestamp });

            return;
        }

        if (msg.msg_type === MessageType.MEDIA_DATA_VIDEO && msg.content?.data) {
            const buffer = Buffer.from(msg.content.data, 'base64');
            const timestamp = msg.content.timestamp;

            // Track base timestamp for synchronization
            if (this._baseVideoTimestamp === undefined) {
                this._baseVideoTimestamp = timestamp;
                logger.debug(`${logPrefix} First video packet timestamp: ${timestamp}`);
            }

            this._lastVideoTimestamp = timestamp;

Offset calculation and PTS adjustment:

    private configureOffset(offsetSeconds: number): { videoFilter: string; audioFilter: string } {
        let videoFilter =
            'scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1';
        let audioFilter = 'aresample=48000,aformat=channel_layouts=stereo';

        /*
         * offsetSeconds is the difference in seconds between the video and audio arrival times.
         * If offsetSeconds is positive, the audio arrived before the video.
         * If offsetSeconds is negative, the video arrived before the audio.
         * If offsetSeconds is zero, the video and audio arrived at the same time.
         *
         * We buffer packets from whichever arrives first. When we flush the buffer,
         * the early-arriving stream gets a "head start" in PTS timeline.
         * We need to shift the late-arriving stream's PTS forward to catch up.
         */

        if (offsetSeconds > 0) {
            // Shift video PTS forward by offsetSeconds to match audio
            videoFilter = `setpts=PTS+${offsetSeconds}/TB,${videoFilter}`;
            audioFilter = `asetpts=PTS,${audioFilter}`;

            logger.debug(`${logPrefix} Configuring offset: Video PTS shifted forward by ${offsetSeconds.toFixed(3)}s`);
        } else if (offsetSeconds < 0) {
            // Shift audio PTS forward by abs(offsetSeconds) to match video
            videoFilter = `setpts=PTS,${videoFilter}`;
            audioFilter = `asetpts=PTS+${Math.abs(offsetSeconds)}/TB,${audioFilter}`;
            logger.debug(
                `${logPrefix} Configuring offset: Audio PTS shifted forward by ${Math.abs(offsetSeconds).toFixed(3)}s`,
            );
        } else {
            // Reset both to start at 0
            videoFilter = `setpts=PTS-STARTPTS,${videoFilter}`;
            audioFilter = `asetpts=PTS-STARTPTS,${audioFilter}`;
            logger.debug(`${logPrefix} Configuring offset: No offset (streams already synchronized)`);
        }

        return { videoFilter, audioFilter };
    }

3. FFmpeg command/config

Built here:

    public configureStream(offsetSeconds: number): void {
        const { videoFilter, audioFilter } = this.configureOffset(offsetSeconds);
        const processFlags = ['-filter_complex', `[0:v]${videoFilter}[vout];[1:a]${audioFilter}[aout]`];

        const ffmpegCommand = [
            ...this.videoInputFlags,
            ...this.audioInputFlags,
            ...processFlags,
            ...this.mapFlags,
            ...this.videoEncodeFlags,
            ...this.audioEncodeFlags,
            ...this.syncFlags,
            ...this.whipOutputFlags,
        ];

        this.spawnProcess(ffmpegCommand);
    }

Sync flags:

    private get syncFlags(): string[] {
        return ['-async', '1', '-fps_mode', 'cfr'];
    }

Here are the contents of my ffmpeg.ts file:

import type { ChildProcess, StdioOptions } from 'node:child_process';
import { spawn } from 'node:child_process';
import type internal from 'node:stream';
import logger from '~/lib/useConsoleLogger';
import fs from 'node:fs';
import path from 'node:path';

type BufferedPacket = {
    data: Buffer;
    timestamp: number;
};

const logPrefix = 'FfmpegHandler:';

const resourcesPath = path.join(import.meta.dirname, '..', '..', '..', 'resources');
const blackFramesLoopPath = path.join(resourcesPath, 'black_frames_loop.h264');
const blackFramesLoop = fs.readFileSync(blackFramesLoopPath);

export class FfmpegHandler {
    private _token: string;
    private _ffmpeg?: ChildProcess;
    private _videoStream?: internal.Writable;
    private _audioStream?: internal.Writable;
    private _videoBuffer: BufferedPacket[] = [];
    private _audioBuffer: BufferedPacket[] = [];
    private _blackFrameInterval?: NodeJS.Timeout;
    private _shouldInjectBlackFrames = true;

constructor(token: string) {
    this._token = token;
}

private get videoInputFlags(): string[] {
    return ['-framerate', '25', '-f', 'h264', '-i', 'pipe:3'];
}

private get videoEncodeFlags(): string[] {
    return [
        '-c:v',
        'libx264',
        '-profile:v',
        'baseline',
        '-preset',
        'veryfast',
        '-tune',
        'zerolatency',
        '-g',
        '50', // 25 fps * 2 sec
        '-keyint_min',
        '50',
        '-sc_threshold',
        '0',
        '-b:v',
        '3000k', // 720p bitrate
        '-maxrate',
        '3000k',
        '-bufsize',
        '6000k',
        '-force_key_frames',
        'expr:gte(t-prev_forced_t,2)', // Force I-frame every 2 seconds
    ];
}

private get audioInputFlags(): string[] {
    return ['-f', 's16le', '-ar', '16000', '-ac', '1', '-i', 'pipe:4'];
}

private get audioEncodeFlags(): string[] {
    return ['-c:a', 'libopus', '-b:a', '128k', '-ar', '48000', '-ac', '2'];
}

private get mapFlags(): string[] {
    return ['-map', '[vout]', '-map', '[aout]'];
}

private get syncFlags(): string[] {
    return ['-async', '1', '-fps_mode', 'cfr'];
}

private get whipOutputFlags(): string[] {
    return ['-f', 'whip', '-authorization', this._token, 'https://global.whip.live-video.net'];
}

private get stdioOptions(): StdioOptions {
    return ['ignore', 'inherit', 'inherit', 'pipe', 'pipe'];
}

private flushBuffers(): void {
    logger.debug(
        `${logPrefix} Flushing buffered packets (audio: ${this._audioBuffer.length}, video: ${this._videoBuffer.length})`,
    );

    for (const packet of this._audioBuffer) {
        if (this._audioStream) {
            this._audioStream.write(packet.data);
        }
    }

    for (const packet of this._videoBuffer) {
        if (this._videoStream) {
            this._videoStream.write(packet.data);
        }
    }

    this._audioBuffer = [];
    this._videoBuffer = [];
    logger.debug(`${logPrefix} FFmpeg started and buffers flushed successfully`);
}

private spawnProcess(command: string[]): void {
    logger.debug(`${logPrefix} Command spawned: /opt/ffmpeg-whip/bin/ffmpeg ${command.join(' ')}`);

    this._ffmpeg = spawn('/opt/ffmpeg-whip/bin/ffmpeg', command, { stdio: this.stdioOptions });
    this._videoStream = this._ffmpeg.stdio[3] as internal.Writable;
    this._audioStream = this._ffmpeg.stdio[4] as internal.Writable;

    this._ffmpeg.on('spawn', () => {
        logger.info(`${logPrefix} Process spawned`);
        this.flushBuffers();

        // Only inject black frames on initial start, not on restart
        if (this._shouldInjectBlackFrames) {
            this.startBlackFrameInjection();
        }
    });

    this._ffmpeg.on('error', (err) => logger.error(new Error(`${logPrefix} Process error: ${err.message}`)));

    this._ffmpeg.on('exit', (code, signal) => {
        if (code !== 0) {
            logger.error(new Error(`${logPrefix} Process exited with error ${JSON.stringify({ code, signal })}`));
            return;
        }

        logger.info(`${logPrefix} Process exited successfully ${JSON.stringify({ code, signal })}`);
    });

    this._ffmpeg.stdout?.on('data', (data: Buffer) => {
        const message = data.toString().trim();
        if (!message) return;

        logger.debug(`${logPrefix} Process stdout: ${message}`);
    });

    this._ffmpeg.stderr?.on('data', (data: Buffer) => {
        const message = data.toString().trim();
        if (!message) return;

        logger.debug(`${logPrefix} Process stderr: ${message}`);
    });
}

private configureOffset(offsetSeconds: number): { videoFilter: string; audioFilter: string } {
    let videoFilter =
        'scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1';
    let audioFilter = 'aresample=48000,aformat=channel_layouts=stereo';

    /*
     * offsetSeconds is the difference in seconds between the video and audio arrival times.
     * If offsetSeconds is positive, the audio arrived before the video.
     * If offsetSeconds is negative, the video arrived before the audio.
     * If offsetSeconds is zero, the video and audio arrived at the same time.
     *
     * We buffer packets from whichever arrives first. When we flush the buffer,
     * the early-arriving stream gets a "head start" in PTS timeline.
     * We need to shift the late-arriving stream's PTS forward to catch up.
     */

    if (offsetSeconds > 0) {
        // Shift video PTS forward by offsetSeconds to match audio
        videoFilter = `setpts=PTS+${offsetSeconds}/TB,${videoFilter}`;
        audioFilter = `asetpts=PTS,${audioFilter}`;

        logger.debug(`${logPrefix} Configuring offset: Video PTS shifted forward by ${offsetSeconds.toFixed(3)}s`);
    } else if (offsetSeconds < 0) {
        // Shift audio PTS forward by abs(offsetSeconds) to match video
        videoFilter = `setpts=PTS,${videoFilter}`;
        audioFilter = `asetpts=PTS+${Math.abs(offsetSeconds)}/TB,${audioFilter}`;
        logger.debug(
            `${logPrefix} Configuring offset: Audio PTS shifted forward by ${Math.abs(offsetSeconds).toFixed(3)}s`,
        );
    } else {
        // Reset both to start at 0
        videoFilter = `setpts=PTS-STARTPTS,${videoFilter}`;
        audioFilter = `asetpts=PTS-STARTPTS,${audioFilter}`;
        logger.debug(`${logPrefix} Configuring offset: No offset (streams already synchronized)`);
    }

    return { videoFilter, audioFilter };
}

get started(): boolean {
    return this._ffmpeg !== undefined;
}

public configureStream(offsetSeconds: number): void {
    const { videoFilter, audioFilter } = this.configureOffset(offsetSeconds);
    const processFlags = ['-filter_complex', `[0:v]${videoFilter}[vout];[1:a]${audioFilter}[aout]`];

    const ffmpegCommand = [
        ...this.videoInputFlags,
        ...this.audioInputFlags,
        ...processFlags,
        ...this.mapFlags,
        ...this.videoEncodeFlags,
        ...this.audioEncodeFlags,
        ...this.syncFlags,
        ...this.whipOutputFlags,
    ];

    this.spawnProcess(ffmpegCommand);
}

public startBlackFrameInjection() {
    if (!this._videoStream || this._blackFrameInterval) return;

    logger.debug(`${logPrefix} Starting black frame injection loop`);
    this._videoStream.write(blackFramesLoop);

    this._blackFrameInterval = setInterval(() => {
        if (!this._videoStream) return;
        this._videoStream.write(blackFramesLoop);
    }, 2000); // 2 second loop to match your black frame file duration
}

public stopBlackFrameInjection() {
    if (!this._blackFrameInterval) return;

    logger.debug(`${logPrefix} Stopped black frame injection loop`);
    clearInterval(this._blackFrameInterval);
    this._blackFrameInterval = undefined;
}

public reset(): void {
    this.stopBlackFrameInjection();

    this._ffmpeg = undefined;
    this._videoStream = undefined;
    this._audioStream = undefined;
    this._videoBuffer = [];
    this._audioBuffer = [];
    this._shouldInjectBlackFrames = true; // Reset flag for next initial start
}

private killProcess(): void {
    if (!this._ffmpeg) return;
    this._ffmpeg.kill('SIGINT');
    this._ffmpeg = undefined;
    this._videoStream = undefined;
    this._audioStream = undefined;
}

public close(): void {
    this.killProcess();
    this.reset();
}

public processAudioPacket(packet: BufferedPacket): void {
    if (this.started && this._audioStream) {
        this._audioStream.write(packet.data);
    } else {
        this._audioBuffer.push(packet);
    }
}

public processVideoPacket(packet: BufferedPacket): void {
    if (this.started && this._videoStream) {
        this._videoStream.write(packet.data);
    } else {
        this._videoBuffer.push(packet);
    }
}

public clearVideoBuffer(): void {
    logger.debug(`${logPrefix} Clearing video buffer (${this._videoBuffer.length} packets)`);
    this._videoBuffer = [];
}
}

Hope that helps… but yeah, we've been at this for a while and can't get it to work. We're trying to figure out whether it's even possible, so any help would be wonderful!

It sounds like you're facing an A/V sync issue when the video track stops and restarts in your RTMS → FFmpeg → AWS IVS pipeline. This happens because the audio and video timestamps drift apart when one stream pauses. Restarting FFmpeg fixes it since that resets the timestamps, but it causes a visible interruption.

To fix it without restarting, you'll need to maintain continuous, aligned timestamps: either send silent audio and black video continuously (so FFmpeg never sees a gap), or use a custom muxer/timestamp-alignment layer before FFmpeg to keep PTS/DTS in sync.
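
For the "never let FFmpeg see a gap" option, here's a minimal sketch (assuming the 16 kHz mono s16le audio input shown in your FFmpeg config; `gapMs` would come from your own gap detection) of generating silence sized exactly to a hole in the audio:

    // Zeroed s16le PCM is silence, so a buffer sized to the gap keeps the
    // audio timeline continuous while real packets are missing.
    const SAMPLE_RATE = 16000;  // matches '-ar 16000' on the audio input
    const CHANNELS = 1;         // matches '-ac 1'
    const BYTES_PER_SAMPLE = 2; // s16le

    function silenceForGap(gapMs: number): Buffer {
        const samples = Math.round((gapMs / 1000) * SAMPLE_RATE);
        return Buffer.alloc(samples * CHANNELS * BYTES_PER_SAMPLE);
    }

    // e.g. audioStream.write(silenceForGap(480)) before resuming real audio
    // after a 480 ms hole.

The same idea applies on the video side, with black frames sized to the gap instead of zeroed samples.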

@8fiftydigital Here's something you can try as well, in addition to syncing the timestamps.

Do note that these are naive implementations; you could probably fine-tune them for better precision.

Here’s what I would do.

Create a loop that checks for gaps of more than 500 ms in the incoming video buffers (the camera-off state). If no video buffer has been received for more than 500 ms, send in empty video keyframes to fill that exact timespan (anywhere from ~500–1000 ms). This loop would live in the .js or .ts file where the video buffer is received from Zoom RTMS.

This gap filling has to be rather precise to prevent drifting. I've pre-generated black H.264 keyframes in durations of 40 ms (one frame at 25 fps = 40 ms), 80 ms, 160 ms, 320 ms and so on, in denominations such as 1, 2, 4, 8, 16, and 32 frames.

They are pre-generated because that's less taxing on compute, and they are intentionally generated as keyframes.

When the video buffer from Zoom RTMS resumes, calculate the timespan gap and fill it in again.

The gap filling loop will only start running after the first buffer (video or audio) is received from Zoom RTMS.

The above applies to audio as well.
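
To make the idea concrete, here's a rough sketch of the greedy fill (the segment file paths are placeholders; you'd load whatever denominations you pre-generated):

    import fs from 'node:fs';

    const FRAME_MS = 40; // one frame at 25 fps

    // Pre-generated black H.264 keyframe segments, longest first
    // (placeholder paths; durations in ms = 8, 4, 2, 1 frames).
    const blackSegments = new Map<number, Buffer>([
        [320, fs.readFileSync('resources/black_8f.h264')],
        [160, fs.readFileSync('resources/black_4f.h264')],
        [80, fs.readFileSync('resources/black_2f.h264')],
        [40, fs.readFileSync('resources/black_1f.h264')],
    ]);

    // Pick segments that cover the detected gap as closely as possible
    // without overshooting, so the filler stays on the frame grid.
    function segmentsForGap(gapMs: number): Buffer[] {
        const out: Buffer[] = [];
        let remaining = Math.floor(gapMs / FRAME_MS) * FRAME_MS;
        for (const [durationMs, segment] of blackSegments) {
            while (remaining >= durationMs) {
                out.push(segment);
                remaining -= durationMs;
            }
        }
        return out;
    }

The detection loop would call this whenever more than ~500 ms has passed since the last real video buffer, then write the returned segments to the video pipe.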

Thanks for the feedback!
Restarting FFmpeg doesn't work for us because Zoom delivers the speaker view for video, and when a participant without video is actively talking, the video stream from Zoom stops. That led to constant FFmpeg restarts whenever a video participant was in the meeting alongside a non-video participant who was speaking.

We’ve tried different ways to sync and just can’t get it right.

Thanks @chunsiong.zoom

We are injecting pre-generated black frames and trying to sync up the streams, but to no avail.

Also, Zoom delivers audio consistently even when everyone is muted, which is great, but video doesn't work that way. So we haven't been able to successfully sync the streams.

Maybe we just have to wait for gallery view to become available.
We've tried many iterations with AI, using different models, and still can't get it :frowning:

We'll hack at it some more, but we're close to giving up, and this was our primary use case: rebroadcasting Zoom meetings. It works well when there is no video at all, of course, but managing the video-to-audio stream sync seems impossible.

We haven't tried it, but I would imagine even the livestream sample apps provided by Zoom would not stay in sync (we didn't try HLS, but that wouldn't work for us anyway). We are using the WHIP protocol to AWS IVS.

@8fiftydigital, can you take a look at this sample, which I've prepared for your scenario?

It still streams to YouTube, but the concept should be applicable. Tag me and let me know if you have any questions.