I am excited to share my project involving a voice bot designed to conduct job interviews on video conferencing platforms like Zoom. The voice bot has been trained to simulate real recruiter-led job interviews, where it asks questions and evaluates candidate responses to determine the next steps of the interview process.
To achieve this, I need the platform to provide APIs that allow me to manage the flow of the interview seamlessly. Here are the key functionalities I am seeking through APIs:
1-Joining a Zoom meeting using the interviewer’s host account.
2-Capturing the candidate’s audio responses and feeding them back to the voice bot.
3-Sending voice responses from the voice bot to Zoom, allowing the interviewer to deliver them.
While researching similar topics, I found some discussions dating back to 2020. However, I believe Zoom’s APIs have undergone significant development since then. As someone new to setting up APIs, I would greatly appreciate your input on this matter. Specifically, I would like to know which APIs are most suitable for achieving these functionalities and whether it is possible to accomplish them using Zoom’s APIs.
I am grateful for the excellent developer documentation provided and look forward to your valuable insights.
Happy to break down how to do it. To build a meeting bot:
Spin up a server. We recommend AWS, GCP, or Digital Ocean.
Use either the Windows or Mac Zoom SDK to launch an instance of the Zoom client.
Once you have the Zoom SDK launched, use the Raw Data functionality to extract the video and audio streams.
This will return the video in I420 raw frames and audio in PCM 16LE raw format, so you’ll need to encode the audio and video yourself afterwards.
Run the audio through a transcription provider like AWS Transcribe.
Output audio from the bot to respond.
Once you have one instance of this working, you’ll need to scale this across several servers if you want to run multiple bots simultaneously, which is required to have bots for multiple meetings.
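To make the raw-data step a bit more concrete, here is a minimal Python sketch of what handling those buffers could look like once the SDK has handed them to you. It is not Zoom SDK code: the function names are made up for illustration, and the 32 kHz mono audio format is an assumption (the steps above only say "PCM 16LE"), so check the sample rate and channel count your SDK callback actually reports. It wraps PCM 16LE bytes in a WAV container so the audio can be saved or passed to a transcription service, and splits an I420 frame into its Y, U, and V planes before you hand it to a video encoder.

```python
import io
import wave

# Assumed capture format. The sample rate and channel count are NOT taken
# from the thread; read them from the SDK's audio callback in a real bot.
SAMPLE_RATE_HZ = 32000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def pcm16le_to_wav(pcm: bytes,
                   sample_rate: int = SAMPLE_RATE_HZ,
                   channels: int = CHANNELS) -> bytes:
    """Wrap raw PCM 16LE samples in a WAV container (hypothetical helper)."""
    if len(pcm) % (SAMPLE_WIDTH_BYTES * channels) != 0:
        raise ValueError("PCM buffer does not contain whole audio frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # WAV stores 16-bit PCM little-endian, so no conversion
    return buf.getvalue()


def i420_frame_size(width: int, height: int) -> int:
    """Bytes in one I420 frame: a full-size Y plane plus quarter-size U and V planes."""
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


def split_i420(frame: bytes, width: int, height: int):
    """Split one I420 frame into its (Y, U, V) planes (hypothetical helper)."""
    if len(frame) != i420_frame_size(width, height):
        raise ValueError("buffer size does not match an I420 frame of this size")
    y_size = width * height
    c_size = ((width + 1) // 2) * ((height + 1) // 2)
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:]
    return y, u, v
```

In a real bot you would call something like `pcm16le_to_wav` on buffered audio before uploading it for transcription, and feed the planes from `split_i420` to whatever video encoder you choose.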
Another option is Recall.ai. It’s a simple third-party API that lets you use meeting bots to get the raw audio/video from meetings and output video/audio, without needing to spend months building, scaling, and maintaining these bots yourself.