Live Transcripts

Does RTMS expose any partial/interim transcript events that correspond to the live captions shown in the Zoom client?

The current RTMS server doesn’t stream transcripts. It delivers each transcript at the end of a sentence.

@Amit11 I’m assuming you are asking whether we support streaming transcripts. At the moment we do not; transcripts are sent only when there is a pause or the end of a sentence.

@chunsiong.zoom Thanks for the quick reply!

Even if I sort the transcripts by timestamp, the rendering experience will not look good from the end user's point of view.

How does the Zoom native client show transcripts/captions in real time?

Is something planned for the future, or do you have suggestions for fixing this behavior?

@Amit11 if you are looking for streaming transcripts, you might want to explore passing the RTMS audio to a streaming speech-to-text service.

Here are some samples

The transcription service that RTMS uses is different from the one in the Zoom client.

I’ll float up your feedback regarding streaming transcript.

Hey @Amit11, RTMS transcripts only fire at sentence boundaries, which is why the UX feels choppy. The fix is to pipe RTMS raw audio into a streaming STT service that gives you word-level partial results.
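To make the "pipe the audio" step concrete, here is a minimal sketch of the buffering part only. It assumes the RTMS audio arrives as raw 16-bit mono PCM at 16 kHz (check the format your RTMS session actually uses). PcmChunker is a hypothetical helper, not part of RTMS or any STT SDK; it just regroups incoming buffers into fixed-duration chunks, which is a shape streaming STT sockets generally handle well.

```python
class PcmChunker:
    """Regroup arbitrary-sized PCM buffers into fixed-duration chunks.

    Assumption: raw PCM input; the defaults (16 kHz, 16-bit, mono) are
    illustrative and must match the audio format of your RTMS session.
    """

    def __init__(self, sample_rate=16000, sample_width=2, channels=1, chunk_ms=100):
        # Number of bytes that make up one chunk of `chunk_ms` milliseconds.
        self.chunk_bytes = sample_rate * sample_width * channels * chunk_ms // 1000
        self._buf = bytearray()

    def feed(self, data: bytes):
        """Add incoming audio; return a list of complete chunks (possibly empty)."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[:self.chunk_bytes]))
            del self._buf[:self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return any leftover audio (a short final chunk) and clear the buffer."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest
```

In your RTMS audio callback you would call chunker.feed(buffer) and forward each returned chunk to whatever send function your STT client provides; call flush() when the meeting stream ends.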

The rtms-samples repo on GitHub already has working examples for AssemblyAI, Deepgram, and local Whisper under the audio/ folder, so you are not starting from scratch.

For STT: Deepgram Nova-3 has the lowest streaming latency at around 450 ms median, making it great for live captions. AssemblyAI is solid too and has a dedicated RTMS integration with a real-time streaming API. Both emit partial (interim) results before each final one (Deepgram flags them with is_final: false), so you can render words as they arrive, which is exactly the UX the Zoom native client produces.
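On the rendering side, here is a rough sketch of how partial and final results can be combined into one caption line. It assumes each STT event has already been reduced to a (text, is_final) pair; the real payloads differ per provider, so that mapping is left out. CaptionState is a hypothetical helper, not something from the rtms-samples repo.

```python
class CaptionState:
    """Track finalized segments plus the in-progress partial text."""

    def __init__(self, max_final=3):
        self.finals = []            # finalized segments, oldest first
        self.partial = ""           # latest interim text
        self.max_final = max_final  # how many finalized segments stay on screen

    def update(self, text: str, is_final: bool) -> str:
        """Apply one STT event and return the caption string to display."""
        text = text.strip()
        if is_final:
            if text:
                self.finals.append(text)
                # Keep only the most recent finalized segments.
                self.finals = self.finals[-self.max_final:]
            # The final result supersedes whatever partial was showing.
            self.partial = ""
        else:
            # Interim results replace each other; they are never appended.
            self.partial = text
        return self.render()

    def render(self) -> str:
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

The key design choice is that a partial overwrites the previous partial instead of being appended, so the caption grows word by word and then settles once the final result for that segment arrives.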

Thanks,
Naeem Ahmed

Thanks @freelancer.nak