Description
I’ve been developing an application that uses Zoom transcriptions of spoken audio. After a meeting, I wait for the recording.transcript_completed webhook. Everything works, but development and testing are extremely slow because we have to wait for the transcript to be completed, sometimes for many hours and in some cases for over a full day. Other times, transcriptions come back within a few minutes. Are there any factors we can control to get transcriptions back as quickly as possible (meeting duration, time of day, upgraded API access, anything else)?
Which App Type (OAuth / Chatbot / JWT / Webhook)?
Both OAuth and JWT integrations, using webhooks.
Which Endpoint/s?
How To Reproduce (If applicable)
Steps to reproduce the behavior:
In the Zoom app configuration on marketplace.zoom.us, navigate to Feature > Event Subscriptions > Recording and make sure that "Recording Transcript files have completed" is enabled with the correct event notification endpoint URL.
Create, start, and end a Zoom meeting.
Wait for the recording.transcript_completed event to be received at the event notification endpoint URL. Note that this could take a few minutes, many hours, or more than a day.
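To put a number on the wait in the last step, here is a minimal sketch of how the delay could be computed when the webhook arrives. It assumes the event body carries the recording end time at payload.object.recording_files[].recording_end as an ISO 8601 UTC string; that field path and the helper names parse_utc and transcript_delay are assumptions for illustration, so check them against a payload you have actually received.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2021-03-18T05:41:36Z' as aware UTC."""
    # fromisoformat() only accepts a trailing 'Z' on Python 3.11+, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def transcript_delay(event: dict, received_at: datetime) -> Optional[timedelta]:
    """Return how long after the recording ended the webhook arrived.

    received_at must be a timezone-aware datetime.
    Returns None for other event types or when no end time is present.
    """
    if event.get("event") != "recording.transcript_completed":
        return None
    # Assumed payload shape: payload.object.recording_files[].recording_end
    files = event.get("payload", {}).get("object", {}).get("recording_files", [])
    ends = [parse_utc(f["recording_end"]) for f in files if f.get("recording_end")]
    if not ends:
        return None
    # Measure from the latest recording end time found in the payload.
    return received_at - max(ends)
```

Calling transcript_delay(body, datetime.now(timezone.utc)) inside the endpoint handler and logging the result alongside the meeting ID would give a per-meeting record of the wait.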
Thanks for reaching out about this, and good question. Several of the factors you mentioned can impact the amount of time that it takes for a transcription to be completed, but the duration of the meeting would likely have the biggest impact.
While it’s normal for the transcript to be completed a couple of hours after the meeting, do you have an example Meeting ID for one that took much longer (such as an entire day)? I’m happy to take a closer look for you.
Thanks for sharing this. I’m wondering if this might be because the meeting was so short, or because the active speaker wasn’t captured. I’ve reached out to my team to help confirm, and we’re looking into this. (ZOOM-231561)
Thanks for checking. To try to avoid long transcription times, we’ve tried 20-second calls, 1-minute calls, 5-minute calls, and 15-minute calls. The call duration didn’t seem to have much impact on the transcription time, which usually takes several hours. We’ve tried single-participant and multi-participant calls. For the speech, we’ve tried reading a paragraph out of a book, rambling on about various stuff, having a real conversation, or simply counting out numbers. None of that seems to noticeably affect transcription time. We’ve also tried making calls at different times of day. The time of day does seem to affect the transcription time, but in an odd way: a late-night call might transcribe within minutes, while a call from earlier that day still hasn’t finished transcribing and might take hours longer to complete.
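In case it helps compare notes on the time-of-day effect, here is a rough sketch of how logged delays could be summarized by the hour the meeting ended. The function name median_delay_by_hour and the (meeting end, delay) input format are our own choices for illustration, not anything from the Zoom API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def median_delay_by_hour(samples):
    """Return the median transcription delay per hour of day.

    samples: iterable of (meeting_ended_at: datetime, delay: timedelta) pairs.
    Result: dict mapping hour (0-23) to the median delay as a timedelta.
    """
    buckets = defaultdict(list)
    for ended_at, delay in samples:
        # Bucket by the hour in which the meeting ended.
        buckets[ended_at.hour].append(delay.total_seconds())
    return {
        hour: timedelta(seconds=median(values))
        for hour, values in sorted(buckets.items())
    }
```

With enough meetings logged, a large gap between the medians for different hours would back up the pattern described above.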
Any insight to help us along is highly appreciated.