I’m exploring the idea of an accessibility app that would provide a voice description of the speaker. Once the speaker activates the app, it takes a screenshot of the speaker’s video and sends it to an Eden AI API, which returns a text description. I would then send that text to a text-to-speech API that outputs the final audio to the user.
While setting up the app, Zoom asks me to choose among the available scopes, and I’m new to all this.
From the description of the app, an experienced Zoom app developer can probably already tell which scopes apply, so please help me choose the ones I will need among these:
When it comes to finding out which scopes your app needs, there are a couple of methods. First, if you call an API without the necessary scopes, the error returned will indicate which scopes are required.
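As a rough illustration of that first method, here is a minimal Python sketch that pulls the scope names out of an error body. The message format shown in the comment ("... does not contain scopes:[...]") is an assumption about what the error looks like, and missing_scopes is a hypothetical helper name, so adjust the pattern to whatever your request actually returns.

```python
import re

# Assumed message format (not confirmed in this thread), e.g.:
#   "Invalid access token, does not contain scopes:[meeting:read, user:read]"
_SCOPES_PATTERN = re.compile(r"scopes:\s*\[([^\]]*)\]")


def missing_scopes(error_body: dict) -> list[str]:
    """Return the scope names listed in an error body, or [] if none are found."""
    message = str(error_body.get("message", ""))
    match = _SCOPES_PATTERN.search(message)
    if match is None:
        return []
    # Split the bracketed list on commas and drop empty entries.
    return [scope.strip() for scope in match.group(1).split(",") if scope.strip()]
```

You would pass in the parsed JSON body of the failed response and add whatever scopes it reports to your app configuration.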
However, a better method is to identify the APIs you need to use in our documentation. As part of constructing your API request, you can check the scope parameter on the API that you are using:
With that, the first step will be to determine what APIs you need to use. Based on your description, I’m not sure that you would need to use a specific REST API.
Instead, the best plan is likely to use the Meeting SDK to capture raw video data (your screenshot in this case) and then use a Zoom App to play the audio for the user in the meeting.
As Max already mentioned, based on your use case, you may not need to use the REST API for this at all, and so additional scopes may not be necessary.
To get an image of a meeting in progress, you would likely want to use the Meeting SDK to access the raw video as outlined here: Meeting Bots Accessing Media Streams
Then, to output the audio, you would do this from your Zoom application, e.g. through the Windows SDK, as outlined here: Use raw data | Meeting SDK | Windows
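To make the overall flow concrete, the capture, describe, speak, and play steps discussed in this thread could be wired together roughly like this. This is only a sketch: all four callables are hypothetical placeholders that you would implement yourself (the frame capture and audio playback via the Meeting SDK raw data features, the description and speech steps via whichever third-party APIs you choose), and none of them are real Zoom or Eden AI function names.

```python
from typing import Callable


def describe_speaker(
    capture_frame: Callable[[], bytes],
    describe_image: Callable[[bytes], str],
    synthesize_speech: Callable[[str], bytes],
    play_audio: Callable[[bytes], None],
) -> str:
    """Run one capture -> describe -> speak cycle and return the description."""
    # Hypothetical step: grab one video frame of the speaker as image bytes.
    frame = capture_frame()
    # Hypothetical step: send the image to an image-description service.
    description = describe_image(frame)
    if not description.strip():
        # Nothing useful came back, so skip speech synthesis and playback.
        return ""
    # Hypothetical step: turn the text into audio, then play it to the user.
    audio = synthesize_speech(description)
    play_audio(audio)
    return description
```

Passing the steps in as functions keeps the pipeline independent of any particular SDK or vendor, so you can test it with simple stubs before connecting the real services.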
Another alternative is to use Recall.ai for your meeting bots instead. It’s a simple third-party API that lets you use meeting bots to receive raw audio/video from meetings, output audio, etc., without needing to spend months building, scaling, and maintaining these bots.