How to integrate AI STT/TTS with Zoom Video SDK on a headless server

Hi Zoom Dev Community,

I’m working on a project where I want to integrate AI speech-to-text (STT) and text-to-speech (TTS) into a Zoom Video SDK application running on a remote/headless server. The idea is to have an AI “agent” that can:

  1. Listen to participants in real time (via STT).

  2. Generate a response (via LLM or other AI logic).

  3. Speak back into the Zoom session (via TTS).
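
Conceptually, the loop is just those three steps chained together. Here's a trivial sketch of what I mean (the sttTranscribe / llmRespond / ttsSynthesize helpers are hypothetical placeholders for whichever STT/LLM/TTS providers I end up using, not real APIs):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for real STT/LLM/TTS provider calls.
std::string sttTranscribe(const std::vector<char>& pcm);
std::string llmRespond(const std::string& transcript);
std::vector<char> ttsSynthesize(const std::string& reply);

// For each chunk of participant audio pulled out of the Zoom session,
// produce a chunk of synthesized speech to inject back in.
std::vector<char> handleAudioChunk(const std::vector<char>& pcmFromZoom)
{
    std::string transcript = sttTranscribe(pcmFromZoom); // 1. listen
    std::string reply      = llmRespond(transcript);     // 2. think
    return ttsSynthesize(reply);                         // 3. speak
}
```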

I’ve already managed to run the Zoom Video SDK on a server and successfully join sessions with audio/video. The challenges I’m facing are:

  • Since the server has no microphone or speakers, I can’t use normal input/output devices for the AI tool.

  • I need a way to capture participant audio directly from the Zoom SDK and feed it into my STT service (rough sketch of what I’m picturing just after this list).

  • I also need to inject the AI-generated TTS audio back into the Zoom session as if it were microphone input.
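
On the capture side, here's the shape of what I'm picturing, based on my reading of the raw-data sections of the Video SDK docs/headers. The onMixedAudioRawDataReceived / onOneWayAudioRawDataReceived callbacks and the AudioRawData accessors are my best guesses, so please correct me if I have the interfaces wrong:

```cpp
// Sketch only -- callback and accessor names are my assumptions from the
// Video SDK docs/headers; exact header names and signatures depend on the
// SDK version, so verify before relying on this.

class AgentDelegate : public IZoomVideoSDKDelegate
{
public:
    // Mixed audio of everyone in the session.
    void onMixedAudioRawDataReceived(AudioRawData* data) override
    {
        forwardToSTT(data->GetBuffer(),     // raw PCM bytes
                     data->GetBufferLen(),  // length in bytes
                     data->GetSampleRate(),
                     data->GetChannelNum());
    }

    // Per-participant audio, which would let me attribute speech to a speaker.
    void onOneWayAudioRawDataReceived(AudioRawData* data,
                                      IZoomVideoSDKUser* user) override
    {
        forwardToSTT(data->GetBuffer(), data->GetBufferLen(),
                     data->GetSampleRate(), data->GetChannelNum());
    }

    // ...remaining IZoomVideoSDKDelegate overrides omitted for brevity...

private:
    // Hypothetical helper that streams PCM to my STT service.
    void forwardToSTT(char* pcm, unsigned int len, int sampleRate, int channels);
};
```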

Questions:

  1. What is the recommended way to capture raw audio from participants inside the Video SDK?

  2. Can I continuously stream AI-generated PCM audio into sendAudioRawData() to make the bot “speak” in the meeting? (A sketch of what I have in mind follows these questions.)

  3. Are there constraints on the audio format (e.g., 16-bit PCM, 16 kHz vs. 48 kHz)?

  4. Is this the right approach, or is there a better way to implement an AI voice agent inside a Zoom Video SDK session?
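
Regarding question 2, my current plan (based on how I understand the virtual-device approach in the SDK headers) is to register a virtual mic and push TTS PCM through the sender it hands me. The IZoomVideoSDKVirtualAudioMic / IZoomVideoSDKAudioSender names below are taken from my reading of the headers, so treat them as assumptions:

```cpp
// Sketch only -- interface and method names from my reading of the Video SDK
// headers; please verify signatures against your SDK version.
#include <atomic>

class TTSVirtualMic : public IZoomVideoSDKVirtualAudioMic
{
public:
    void onMicInitialize(IZoomVideoSDKAudioSender* sender) override
    {
        sender_ = sender;  // keep the sender so the TTS thread can push audio
    }
    void onMicStartSend() override     { sending_ = true;  }
    void onMicStopSend() override      { sending_ = false; }
    void onMicUninitialized() override { sender_ = nullptr; }

    // Called by my TTS pipeline whenever a chunk of synthesized speech is
    // ready. Assumes 16-bit mono PCM; the right sample rate is my question 3.
    void pushTTSChunk(char* pcm, unsigned int lengthBytes, int sampleRate)
    {
        if (sending_ && sender_)
            sender_->Send(pcm, lengthBytes, sampleRate);
    }

private:
    IZoomVideoSDKAudioSender* sender_ = nullptr;
    std::atomic<bool> sending_{false};
};
```

I'm also assuming I should pace the Send() calls at real-time rate (e.g., 10–20 ms chunks) rather than dumping a whole utterance at once — is that correct?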

Ultimately, I want to create a bot that can “listen and talk” naturally in real time, without needing physical audio devices.

Any guidance, examples, or best practices would be greatly appreciated!

Thanks in advance :folded_hands: