Controlling Partial vs. Full Transcripts in Zoom RTMS

Hi,

Is there any way I can control whether Zoom sends me a partial or a full transcript?

I noticed that during a meeting, closed captions appear instantly on the screen, but the transcript is only sent to my webhook when the person talking pauses.

Any help would be great.

Hi Edward, thanks for reaching out! Just to confirm, are you referring to a scenario where the speaker continues talking without pausing or taking a breath?

I understand what you mean about the difference between closed captions appearing instantly and the RTMS transcript behavior. I believe this is expected behavior, as RTMS sends transcript data whenever there is even a brief pause or a change in speaker. It does include the full transcript in real time, though, so your situation might be an edge case. I would need to do some testing to confirm this and determine at what point RTMS sends what has been said so far, even if the active speaker does not pause or take a breath.

Before I test, could you clarify what you mean by “partial or full transcript”? Are you asking whether it is possible to receive only segments of speech as they occur, or the entire transcript after a section is complete? Or are you asking whether there is a way for a section to be treated as complete without the speaker ever pausing or changing?

Hi Jen, I appreciate the response.

So what I’m asking is whether it’s possible to receive segments of speech as they occur in real time. It would be nice if the transcript could mimic the closed caption behavior.

Thanks, Edward. RTMS can send transcripts in real time, but it is not designed to mimic closed caption behavior. The RTMS pipeline is built to deliver transcript data at the utterance level rather than the word or sentence level.

We also work with transcription partners such as AssemblyAI and Deepgram, who may be able to provide additional real-time processing options for your use case. Another possible approach would be to send the audio stream continuously to a live transcription service, though that would really only be necessary in a use case where the speaker never pauses at all.
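
As a rough illustration of the audio-forwarding approach, here is a minimal Python sketch of how a consumer might handle caption-like interim results alongside finished utterances. Everything in it is an assumption made for illustration: `TranscriptUpdate`, `FakeStreamingTranscriber`, and `LiveCaptionView` are hypothetical names, not part of RTMS or any partner SDK, and the fake transcriber simply treats each audio chunk as one already-recognized word so the example can run without a real speech-to-text service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TranscriptUpdate:
    """Hypothetical result shape: interim text vs. a finished utterance."""
    text: str
    is_final: bool  # False = interim (caption-like), True = utterance complete


class FakeStreamingTranscriber:
    """Hypothetical stand-in for a streaming speech-to-text client.

    A real client would send raw audio to a provider and receive results
    asynchronously. Here each chunk is pretend-decoded as a single word.
    """

    def __init__(self, on_update: Callable[[TranscriptUpdate], None]) -> None:
        self._on_update = on_update
        self._words: List[str] = []

    def send_audio(self, chunk: bytes) -> None:
        # Emit an interim result after every chunk, without waiting for a pause.
        self._words.append(chunk.decode("utf-8"))
        self._on_update(TranscriptUpdate(" ".join(self._words), is_final=False))

    def end_utterance(self) -> None:
        # Emit the final result once the speaker pauses or changes.
        if self._words:
            self._on_update(TranscriptUpdate(" ".join(self._words), is_final=True))
            self._words = []


@dataclass
class LiveCaptionView:
    """Keeps one in-progress caption line plus the finalized utterances."""
    finalized: List[str] = field(default_factory=list)
    current: str = ""

    def handle(self, update: TranscriptUpdate) -> None:
        if update.is_final:
            # Utterance-level delivery: commit the text and clear the live line.
            self.finalized.append(update.text)
            self.current = ""
        else:
            # Caption-like delivery: overwrite the live line with the latest text.
            self.current = update.text
```

In this sketch, calling `send_audio` for each incoming audio chunk updates `current` immediately, which is the closed-caption-style behavior, while `end_utterance` moves the text into `finalized`, which corresponds to the utterance-level delivery described above. How interim and final results are actually flagged depends on the transcription provider you choose.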