I am helping to support an undergraduate class focused on Zoom-oriented development. Many of the possible projects involve building an app that responds to what is being said in the meeting.
I have read through the documentation, and it seems like the only way to do this is to live stream the meeting into some kind of speech-to-text service.
How would you recommend doing this? I know there are some third-party services, such as Otter, that claim to offer real-time transcription of Zoom meetings, and I know there is a /closed_captioning API, but I am not sure which route is best.