Audio/video re-sync issues with RTMS when toggling camera off then on (trying to livestream meeting)

We are ingesting the RTMS data into an Express.js app, then having it flow through FFmpeg and into AWS IVS (Amazon Interactive Video Service), where a subscriber can watch the video in a browser. So we're basically re-broadcasting the meeting as close to real time as possible. We used the RTMS-to-YouTube sample as a guide to get going.

We are struggling to keep audio and video in sync once the camera is turned off and then back on. We inject black video frames during the camera-off periods and have tried a few different ways to make this work, from having FFmpeg handle the alignment to having JavaScript realign the video and audio streams before sending them to FFmpeg. We even tried implementing some drift logic to keep things in sync, but to no avail.

When the camera is turned back on, the video and audio sometimes end up off by multiple seconds.
The only solution that seems to work is restarting FFmpeg when the video comes back on, but that causes a disruption in the stream. Any help is much appreciated.


Hey @8fiftydigital,

Thank you for reaching out with details on the issue you're encountering. Just to make sure I understand: are you saying that the timestamps themselves are a couple of seconds off, or that the timestamps are correct but you haven't found a method to mux the audio and video streams together?

Thanks,

Max

Thanks for the quick response!
We haven't found a working method to mux the audio and video streams, specifically when the camera gets toggled off and on.

Got it, thanks for the clarification! I'll go ahead and test this out today to see if I can replicate the issue. Even if I don't see the same behavior, this will give me an opportunity to experiment with muxing and advise on the best method. I think you're on the right track using FFmpeg.

A couple quick questions to help me understand your setup:

  1. How are you feeding frames to FFmpeg? Are you piping raw frames via stdin or using a different method?

  2. Are you setting explicit presentation timestamps (PTS)? The key to keeping A/V in sync when video drops out is ensuring both streams maintain a continuous timeline with proper PTS values.

  3. What’s your FFmpeg command/config? Specifically interested in the -vsync, -async, and any timestamp-related flags you’re using.

One thing that's critical with intermittent video: when you inject black frames during camera-off periods, those frames need to carry PTS values that align with the audio timeline. If the timestamps aren't synchronized to a common clock, you'll see exactly that kind of drift.
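
Purely as an illustration of what I mean (a rough sketch, not your code; the state fields are hypothetical), the injected black frames would take their timestamps from the same clock the audio is on, rather than from the wall-clock time of injection:

    // Hypothetical sketch: stamp filler frames on the shared meeting clock
    // so they line up with the audio timeline.
    const FRAME_INTERVAL_MS = 40; // 25 fps

    interface TimelineState {
        audioClockMs: number;         // last audio timestamp seen from RTMS
        lastVideoTimestampMs: number; // last real video timestamp seen from RTMS
    }

    // Timestamps (on the shared clock) at which black frames should sit to
    // bridge the gap between the last real video frame and the audio position.
    function blackFramePtsForGap(state: TimelineState): number[] {
        const pts: number[] = [];
        let next = state.lastVideoTimestampMs + FRAME_INTERVAL_MS;
        while (next <= state.audioClockMs) {
            pts.push(next);
            next += FRAME_INTERVAL_MS;
        }
        return pts;
    }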

Will dig in on my end and follow up soon!

1. How are you feeding frames to FFmpeg?

Using pipes.

  • Video input: pipe:3 (stdio[3])
  • Audio input: pipe:4 (stdio[4])

2. Are you setting explicit presentation timestamps (PTS)?

PTS is adjusted via filters.

Tracking:

        if (msg.msg_type === MessageType.MEDIA_DATA_AUDIO && msg.content?.data) {
            const buffer = Buffer.from(msg.content.data, 'base64');
            const timestamp = msg.content.timestamp;

            // Track base timestamp for synchronization
            if (this._baseAudioTimestamp === undefined) {
                this._baseAudioTimestamp = timestamp;
                logger.debug(`${logPrefix} First audio packet timestamp: ${timestamp}`);
            }

            this._lastAudioTimestamp = timestamp;
            this._ffmpegHandler.processAudioPacket({ data: buffer, timestamp });

            return;
        }

        if (msg.msg_type === MessageType.MEDIA_DATA_VIDEO && msg.content?.data) {
            const buffer = Buffer.from(msg.content.data, 'base64');
            const timestamp = msg.content.timestamp;

            // Track base timestamp for synchronization
            if (this._baseVideoTimestamp === undefined) {
                this._baseVideoTimestamp = timestamp;
                logger.debug(`${logPrefix} First video packet timestamp: ${timestamp}`);
            }

            this._lastVideoTimestamp = timestamp;

Offset calculation and PTS adjustment:

    private configureOffset(offsetSeconds: number): { videoFilter: string; audioFilter: string } {
        let videoFilter =
            'scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1';
        let audioFilter = 'aresample=48000,aformat=channel_layouts=stereo';

        /*
         * offsetSeconds is the difference in seconds between the video and audio arrival times.
         * If offsetSeconds is positive, the audio arrived before the video.
         * If offsetSeconds is negative, the video arrived before the audio.
         * If offsetSeconds is zero, the video and audio arrived at the same time.
         *
         * We buffer packets from whichever arrives first. When we flush the buffer,
         * the early-arriving stream gets a "head start" in PTS timeline.
         * We need to shift the late-arriving stream's PTS forward to catch up.
         */

        if (offsetSeconds > 0) {
            // Shift video PTS forward by offsetSeconds to match audio
            videoFilter = `setpts=PTS+${offsetSeconds}/TB,${videoFilter}`;
            audioFilter = `asetpts=PTS,${audioFilter}`;

            logger.debug(`${logPrefix} Configuring offset: Video PTS shifted forward by ${offsetSeconds.toFixed(3)}s`);
        } else if (offsetSeconds < 0) {
            // Shift audio PTS forward by abs(offsetSeconds) to match video
            videoFilter = `setpts=PTS,${videoFilter}`;
            audioFilter = `asetpts=PTS+${Math.abs(offsetSeconds)}/TB,${audioFilter}`;
            logger.debug(
                `${logPrefix} Configuring offset: Audio PTS shifted forward by ${Math.abs(offsetSeconds).toFixed(3)}s`,
            );
        } else {
            // Reset both to start at 0
            videoFilter = `setpts=PTS-STARTPTS,${videoFilter}`;
            audioFilter = `asetpts=PTS-STARTPTS,${audioFilter}`;
            logger.debug(`${logPrefix} Configuring offset: No offset (streams already synchronized)`);
        }

        return { videoFilter, audioFilter };
    }

3. FFmpeg command/config

Built here:

    public configureStream(offsetSeconds: number): void {
        const { videoFilter, audioFilter } = this.configureOffset(offsetSeconds);
        const processFlags = ['-filter_complex', `[0:v]${videoFilter}[vout];[1:a]${audioFilter}[aout]`];

        const ffmpegCommand = [
            ...this.videoInputFlags,
            ...this.audioInputFlags,
            ...processFlags,
            ...this.mapFlags,
            ...this.videoEncodeFlags,
            ...this.audioEncodeFlags,
            ...this.syncFlags,
            ...this.whipOutputFlags,
        ];

        this.spawnProcess(ffmpegCommand);
    }

Sync flags:

    private get syncFlags(): string[] {
        return ['-async', '1', '-fps_mode', 'cfr'];
    }

Here are the contents of my ffmpeg.ts file:

import type { ChildProcess, StdioOptions } from 'node:child_process';
import { spawn } from 'node:child_process';
import type internal from 'node:stream';
import logger from '~/lib/useConsoleLogger';
import fs from 'node:fs';
import path from 'node:path';

type BufferedPacket = {
    data: Buffer;
    timestamp: number;
};

const logPrefix = 'FfmpegHandler:';

const resourcesPath = path.join(import.meta.dirname, '..', '..', '..', 'resources');
const blackFramesLoopPath = path.join(resourcesPath, 'black_frames_loop.h264');
const blackFramesLoop = fs.readFileSync(blackFramesLoopPath);

export class FfmpegHandler {
    private _token: string;
    private _ffmpeg?: ChildProcess;
    private _videoStream?: internal.Writable;
    private _audioStream?: internal.Writable;
    private _videoBuffer: BufferedPacket[] = [];
    private _audioBuffer: BufferedPacket[] = [];
    private _blackFrameInterval?: NodeJS.Timeout;
    private _shouldInjectBlackFrames = true;

constructor(token: string) {
    this._token = token;
}

private get videoInputFlags(): string[] {
    return ['-framerate', '25', '-f', 'h264', '-i', 'pipe:3'];
}

private get videoEncodeFlags(): string[] {
    return [
        '-c:v',
        'libx264',
        '-profile:v',
        'baseline',
        '-preset',
        'veryfast',
        '-tune',
        'zerolatency',
        '-g',
        '50', // 25 fps * 2 sec
        '-keyint_min',
        '50',
        '-sc_threshold',
        '0',
        '-b:v',
        '3000k', // 720p bitrate
        '-maxrate',
        '3000k',
        '-bufsize',
        '6000k',
        '-force_key_frames',
        'expr:gte(t-prev_forced_t,2)', // Force I-frame every 2 seconds
    ];
}

private get audioInputFlags(): string[] {
    return ['-f', 's16le', '-ar', '16000', '-ac', '1', '-i', 'pipe:4'];
}

private get audioEncodeFlags(): string[] {
    return ['-c:a', 'libopus', '-b:a', '128k', '-ar', '48000', '-ac', '2'];
}

private get mapFlags(): string[] {
    return ['-map', '[vout]', '-map', '[aout]'];
}

private get syncFlags(): string[] {
    return ['-async', '1', '-fps_mode', 'cfr'];
}

private get whipOutputFlags(): string[] {
    return ['-f', 'whip', '-authorization', this._token, 'https://global.whip.live-video.net'];
}

private get stdioOptions(): StdioOptions {
    return ['ignore', 'inherit', 'inherit', 'pipe', 'pipe'];
}

private flushBuffers(): void {
    logger.debug(
        `${logPrefix} Flushing buffered packets (audio: ${this._audioBuffer.length}, video: ${this._videoBuffer.length})`,
    );

    for (const packet of this._audioBuffer) {
        if (this._audioStream) {
            this._audioStream.write(packet.data);
        }
    }

    for (const packet of this._videoBuffer) {
        if (this._videoStream) {
            this._videoStream.write(packet.data);
        }
    }

    this._audioBuffer = [];
    this._videoBuffer = [];
    logger.debug(`${logPrefix} FFmpeg started and buffers flushed successfully`);
}

private spawnProcess(command: string[]): void {
    logger.debug(`${logPrefix} Command spawned: /opt/ffmpeg-whip/bin/ffmpeg ${command.join(' ')}`);

    this._ffmpeg = spawn('/opt/ffmpeg-whip/bin/ffmpeg', command, { stdio: this.stdioOptions });
    this._videoStream = this._ffmpeg.stdio[3] as internal.Writable;
    this._audioStream = this._ffmpeg.stdio[4] as internal.Writable;

    this._ffmpeg.on('spawn', () => {
        logger.info(`${logPrefix} Process spawned`);
        this.flushBuffers();

        // Only inject black frames on initial start, not on restart
        if (this._shouldInjectBlackFrames) {
            this.startBlackFrameInjection();
        }
    });

    this._ffmpeg.on('error', (err) => logger.error(new Error(`${logPrefix} Process error: ${err.message}`)));

    this._ffmpeg.on('exit', (code, signal) => {
        if (code !== 0) {
            logger.error(new Error(`${logPrefix} Process exited with error ${JSON.stringify({ code, signal })}`));
            return;
        }

        logger.info(`${logPrefix} Process exited successfully ${JSON.stringify({ code, signal })}`);
    });

    this._ffmpeg.stdout?.on('data', (data: Buffer) => {
        const message = data.toString().trim();
        if (!message) return;

        logger.debug(`${logPrefix} Process stdout: ${message}`);
    });

    this._ffmpeg.stderr?.on('data', (data: Buffer) => {
        const message = data.toString().trim();
        if (!message) return;

        logger.debug(`${logPrefix} Process stderr: ${message}`);
    });
}

private configureOffset(offsetSeconds: number): { videoFilter: string; audioFilter: string } {
    let videoFilter =
        'scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1';
    let audioFilter = 'aresample=48000,aformat=channel_layouts=stereo';

    /*
     * offsetSeconds is the difference in seconds between the video and audio arrival times.
     * If offsetSeconds is positive, the audio arrived before the video.
     * If offsetSeconds is negative, the video arrived before the audio.
     * If offsetSeconds is zero, the video and audio arrived at the same time.
     *
     * We buffer packets from whichever arrives first. When we flush the buffer,
     * the early-arriving stream gets a "head start" in PTS timeline.
     * We need to shift the late-arriving stream's PTS forward to catch up.
     */

    if (offsetSeconds > 0) {
        // Shift video PTS forward by offsetSeconds to match audio
        videoFilter = `setpts=PTS+${offsetSeconds}/TB,${videoFilter}`;
        audioFilter = `asetpts=PTS,${audioFilter}`;

        logger.debug(`${logPrefix} Configuring offset: Video PTS shifted forward by ${offsetSeconds.toFixed(3)}s`);
    } else if (offsetSeconds < 0) {
        // Shift audio PTS forward by abs(offsetSeconds) to match video
        videoFilter = `setpts=PTS,${videoFilter}`;
        audioFilter = `asetpts=PTS+${Math.abs(offsetSeconds)}/TB,${audioFilter}`;
        logger.debug(
            `${logPrefix} Configuring offset: Audio PTS shifted forward by ${Math.abs(offsetSeconds).toFixed(3)}s`,
        );
    } else {
        // Reset both to start at 0
        videoFilter = `setpts=PTS-STARTPTS,${videoFilter}`;
        audioFilter = `asetpts=PTS-STARTPTS,${audioFilter}`;
        logger.debug(`${logPrefix} Configuring offset: No offset (streams already synchronized)`);
    }

    return { videoFilter, audioFilter };
}

get started(): boolean {
    return this._ffmpeg !== undefined;
}

public configureStream(offsetSeconds: number): void {
    const { videoFilter, audioFilter } = this.configureOffset(offsetSeconds);
    const processFlags = ['-filter_complex', `[0:v]${videoFilter}[vout];[1:a]${audioFilter}[aout]`];

    const ffmpegCommand = [
        ...this.videoInputFlags,
        ...this.audioInputFlags,
        ...processFlags,
        ...this.mapFlags,
        ...this.videoEncodeFlags,
        ...this.audioEncodeFlags,
        ...this.syncFlags,
        ...this.whipOutputFlags,
    ];

    this.spawnProcess(ffmpegCommand);
}

public startBlackFrameInjection() {
    if (!this._videoStream || this._blackFrameInterval) return;

    logger.debug(`${logPrefix} Starting black frame injection loop`);
    this._videoStream.write(blackFramesLoop);

    this._blackFrameInterval = setInterval(() => {
        if (!this._videoStream) return;
        this._videoStream.write(blackFramesLoop);
    }, 2000); // 2 second loop to match your black frame file duration
}

public stopBlackFrameInjection() {
    if (!this._blackFrameInterval) return;

    logger.debug(`${logPrefix} Stopped black frame injection loop`);
    clearInterval(this._blackFrameInterval);
    this._blackFrameInterval = undefined;
}

public reset(): void {
    this.stopBlackFrameInjection();

    this._ffmpeg = undefined;
    this._videoStream = undefined;
    this._audioStream = undefined;
    this._videoBuffer = [];
    this._audioBuffer = [];
    this._shouldInjectBlackFrames = true; // Reset flag for next initial start
}

private killProcess(): void {
    if (!this._ffmpeg) return;
    this._ffmpeg.kill('SIGINT');
    this._ffmpeg = undefined;
    this._videoStream = undefined;
    this._audioStream = undefined;
}

public close(): void {
    this.killProcess();
    this.reset();
}

public processAudioPacket(packet: BufferedPacket): void {
    if (this.started && this._audioStream) {
        this._audioStream.write(packet.data);
    } else {
        this._audioBuffer.push(packet);
    }
}

public processVideoPacket(packet: BufferedPacket): void {
    if (this.started && this._videoStream) {
        this._videoStream.write(packet.data);
    } else {
        this._videoBuffer.push(packet);
    }
}

public clearVideoBuffer(): void {
    logger.debug(`${logPrefix} Clearing video buffer (${this._videoBuffer.length} packets)`);
    this._videoBuffer = [];
}
}

Hope that helps… but yeah, we've been at this for a while and can't get it to work. We're trying to figure out whether it's even possible, so any help would be wonderful!

It sounds like you're facing an A/V sync issue when the video track stops and restarts in your RTMS → FFmpeg → AWS IVS pipeline. This happens because the audio and video timestamps drift apart when one stream pauses. Restarting FFmpeg fixes it since that resets the timestamps, but it causes a visible interruption.

To fix it without restarting, you'll need to maintain continuous, aligned timestamps: either send silent audio and black video continuously (so FFmpeg never sees a gap), or use a custom muxer/timestamp-alignment layer before FFmpeg to keep PTS/DTS in sync.
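
For the "never let FFmpeg see a gap" option, here's a minimal sketch (assuming the 16 kHz mono s16le audio input shown in your FFmpeg config; `gapMs` would come from your own gap detection) of generating silence sized exactly to a hole in the audio:

    // Zeroed s16le PCM is silence, so a buffer sized to the gap keeps the
    // audio timeline continuous while real packets are missing.
    const SAMPLE_RATE = 16000;  // matches '-ar 16000' on the audio input
    const CHANNELS = 1;         // matches '-ac 1'
    const BYTES_PER_SAMPLE = 2; // s16le

    function silenceForGap(gapMs: number): Buffer {
        const samples = Math.round((gapMs / 1000) * SAMPLE_RATE);
        return Buffer.alloc(samples * CHANNELS * BYTES_PER_SAMPLE);
    }

    // e.g. audioStream.write(silenceForGap(480)) before resuming real audio
    // after a 480 ms hole.

The same idea applies on the video side, with black frames sized to the gap instead of zeroed samples.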

@8fiftydigital Here's something you can try as well, in addition to syncing the timestamps.

Do note that these are naive implementations; you could probably fine-tune them for better precision.

Here’s what I would do.

Create a loop that checks for gaps of more than 500 ms in the incoming video buffers (the camera-off state). If no video buffer has been received for more than 500 ms, send in empty video keyframes to fill that exact timespan (anywhere from ~500–1000 ms). This loop would live in the .js or .ts file where the video buffer is received from Zoom RTMS.

This gap filling has to be rather precise to prevent drifting. I've pre-generated black H.264 keyframes in durations of 40 ms (one frame at 25 fps = 40 ms), 80 ms, 160 ms, 320 ms and so on, in denominations such as 1, 2, 4, 8, 16, and 32 frames.

They are pre-generated because that's less taxing on compute, and they are intentionally generated as keyframes.

When the video buffer from Zoom RTMS resumes, calculate the timespan gap and fill it in again.

The gap filling loop will only start running after the first buffer (video or audio) is received from Zoom RTMS.

The above applies to audio as well.
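
To make the idea concrete, here's a rough sketch of the greedy fill (the segment file paths are placeholders; you'd load whatever denominations you pre-generated):

    import fs from 'node:fs';

    const FRAME_MS = 40; // one frame at 25 fps

    // Pre-generated black H.264 keyframe segments, longest first
    // (placeholder paths; durations in ms = 8, 4, 2, 1 frames).
    const blackSegments = new Map<number, Buffer>([
        [320, fs.readFileSync('resources/black_8f.h264')],
        [160, fs.readFileSync('resources/black_4f.h264')],
        [80, fs.readFileSync('resources/black_2f.h264')],
        [40, fs.readFileSync('resources/black_1f.h264')],
    ]);

    // Pick segments that cover the detected gap as closely as possible
    // without overshooting, so the filler stays on the frame grid.
    function segmentsForGap(gapMs: number): Buffer[] {
        const out: Buffer[] = [];
        let remaining = Math.floor(gapMs / FRAME_MS) * FRAME_MS;
        for (const [durationMs, segment] of blackSegments) {
            while (remaining >= durationMs) {
                out.push(segment);
                remaining -= durationMs;
            }
        }
        return out;
    }

The detection loop would call this whenever more than ~500 ms has passed since the last real video buffer, then write the returned segments to the video pipe.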

Thanks for the feedback!
Restarting FFmpeg doesn't work for us because Zoom delivers the speaker view for video, and when a participant without video is actively talking, the video stream from Zoom stops. That led to constant FFmpeg restarts whenever a video participant was in the meeting alongside a non-video participant who was speaking.

We’ve tried different ways to sync and just can’t get it right.

Thanks @chunsiong.zoom

We are injecting pre-generated black frames and trying to sync up the streams, but to no avail.

Also, Zoom delivers audio consistently even when everyone is muted, which is great, but video doesn't work that way. So we haven't been able to successfully sync the streams.

Maybe we just have to wait for gallery view to become available.
We've tried many iterations with AI, using different models, and still can't get it :frowning:

We'll hack at it some more, but we're close to giving up, and this was our primary use case: rebroadcasting Zoom meetings. It works well when there is no video at all, of course, but managing the video-to-audio stream sync seems impossible.

We haven't tried it, but I would imagine even the livestream sample apps provided by Zoom would not stay in sync (we didn't try HLS, but that wouldn't work for us anyway). We are using the WHIP protocol to AWS IVS.

@8fiftydigital, can you take a look at this sample, which I've prepared for your scenario?

It still streams to YouTube, but the concept should be applicable. Tag me and let me know if you have any questions.