I am helping to support an undergraduate class focused on Zoom-oriented development. Many of the possible projects involve building an app that can respond based on what is being said in the meeting.
I have read through the documentation, and it seems like the only way to do this is to live stream the meeting into some kind of speech-to-text service.
How would you recommend doing this? I know there are some third-party services, such as Otter, that claim to offer real-time transcription of Zoom meetings, and I know there is a /closed_captioning API, but I am not sure which route is best.
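To make the goal concrete, here is a rough sketch of the "respond to what is being said" part. It assumes some transcription service already hands the app text segments; how those segments get out of the meeting is exactly the open question. `Segment`, `respond_to_transcript`, and the trigger phrases are hypothetical names for illustration only, not part of any Zoom or Otter API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Segment:
    """One chunk of transcribed speech (hypothetical shape, not a Zoom type)."""
    speaker: str
    text: str


def respond_to_transcript(
    segments: Iterable[Segment],
    triggers: Dict[str, Callable[[Segment], str]],
) -> List[str]:
    """Run each segment past the trigger phrases and collect the responses.

    Matching is a case-insensitive substring check, which is only a
    placeholder for whatever logic a student project actually needs.
    """
    responses = []
    for segment in segments:
        lowered = segment.text.lower()
        for phrase, handler in triggers.items():
            if phrase in lowered:
                responses.append(handler(segment))
    return responses
```

A student project would swap the `segments` iterable for whatever feed the chosen transcription route provides, and replace the handlers with calls back into the meeting (for example, posting a chat message).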
Thanks for reaching out about this. It has come up before, and while we don’t have an out-of-the-box way to do this, some other devs have had luck with third-party tools like the ones you’ve mentioned. You may find some of the info in this thread helpful: