Thank you for reaching out to the Zoom Developer Forum. This sounds like a good use case for an upcoming Zoom feature called Zoom Apps. You can sign up at that link to make sure you’re kept in the loop.
For now, the best way to accomplish something like this seems to be having your AI join the meeting as a participant. It could then process the meeting audio and listen for voice commands. The AI could also speak in the meeting through a virtual microphone.
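To illustrate the "listen for voice commands" step, here is a minimal sketch in Python. It assumes you already have the meeting audio transcribed to text by a speech-to-text engine of your choice; that part is not shown and is not something Zoom provides here. The wake word `assistant` and the function name `extract_command` are hypothetical and only for illustration.

```python
import re
from typing import Optional

# Hypothetical wake word; choose whatever suits your assistant.
WAKE_WORD = "assistant"


def extract_command(transcript: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text spoken after the wake word, or None if there is none."""
    # Match the wake word as a whole word, skip punctuation and spaces after it,
    # and capture the rest of the utterance.
    pattern = re.compile(
        rf"\b{re.escape(wake_word)}\b[\s,:.!?]*(.*)",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(transcript)
    if match is None:
        return None
    # Normalize: trim whitespace and trailing sentence punctuation, lowercase.
    command = match.group(1).strip().rstrip(".!?").strip()
    return command.lower() or None


if __name__ == "__main__":
    print(extract_command("Hey Assistant, start the recording."))
    print(extract_command("Let's start the meeting"))
```

Your bot would call something like this on each transcribed utterance and act only when a command is returned, so ordinary conversation in the meeting is ignored.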
The biggest blocker is that with the Web SDK, there isn’t a method to automatically join audio or video. Due to security and privacy restrictions implemented by the browser and Zoom, user interaction is required to enable microphone or camera devices even if they are virtually created by your AI.
You may be able to use a Native Client SDK (macOS or Windows) to join the meeting audio/video programmatically, but I wasn’t able to confirm this in our documentation. Using a native SDK will also give you better access to the compute resources available on the device.
For questions about our Windows or macOS SDKs, I recommend posting in our #desktop-client-sdk:windows or #desktop-client-sdk:macos categories. The teams there will be able to advise further on that topic.
I hope that helps! Let me know if you have any other questions.