Track active speaker activity with background service

I want to develop an app that tracks how active each speaker is during a given call. For example: speaker A talked for 60% of the time, speaker B for 15%, and speaker C for 25%.

I know that client apps somehow have this information, because the active speaker is highlighted. But so far I haven’t found an API endpoint that returns this kind of detail for a call. If those metrics are not provided by the API, I was thinking that maybe I could develop some sort of client that joins the call, receives those “highlighted user” events, and keeps track of the timestamps when they occur. However, this needs to happen in the background, as part of the backend of a web service. Do you think this is possible with the Web SDK and Node.js?

Hey @fonsecajavier,

We do not have an API for this. That being said, you could probably pull this off with our Client SDKs:

Thanks,
Tommy

Thanks @tommy. That’s what I guessed. However, my question is more about whether it’s possible to use the Web SDK to do this as part of a backend service, in a “headless” fashion, without anyone needing to fire up a client in the browser. That’s why I thought about doing something with Node.js, but maybe you could give me some insight into how feasible this is before I go down a rabbit hole (like, “yes sure, that’s possible, …”, or “no way, don’t even try”).

Hey @fonsecajavier,

Since the Web SDK only supports Speaker and Self view right now, this would not be possible with it.

Thanks,
Tommy

That’s a bummer, @tommy. However, I thought of another thing…

Given that a user has already enabled access to recordings with timestamps, what if I create an app that listens to the “recording.completed” webhook ( https://marketplace.zoom.us/docs/api-reference/webhook-reference/recording-events/recording-completed )? The app would extract the Timeline file URL from the payload, download the file, and process it to get the active speaker information I need, based on the “ts” and “users” attributes of the timeline JSON.
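To make the idea concrete, here is a minimal Python sketch of that processing step. Everything about the data shape is an assumption on my side, not something confirmed in this thread: that the webhook body lists files under `payload.object.recording_files` with a `file_type` of `TIMELINE` and a `download_url`, that the timeline JSON has a top-level `timeline` list, that each “ts” is an `HH:MM:SS.mmm` offset, and that each entry in “users” has a `username`. The function names and the `end_seconds` parameter are hypothetical too. Each entry is treated as lasting until the next entry’s “ts”.

```python
from collections import defaultdict


def parse_ts(ts: str) -> float:
    """Convert an 'HH:MM:SS.mmm' offset into seconds (format is assumed)."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def find_timeline_url(webhook_body: dict):
    """Return the timeline file's download URL, or None if there is none.

    The payload layout used here is an assumption.
    """
    recording = webhook_body.get("payload", {}).get("object", {})
    for item in recording.get("recording_files", []):
        if item.get("file_type") == "TIMELINE":
            return item.get("download_url")
    return None


def speaker_shares(timeline: dict, end_seconds: float) -> dict:
    """Return each speaker's percentage of the total talk time.

    An entry's users are credited with the time until the next entry.
    The last entry runs until end_seconds (the recording length).
    """
    entries = sorted(timeline.get("timeline", []),
                     key=lambda entry: parse_ts(entry["ts"]))
    talk_time = defaultdict(float)
    for index, entry in enumerate(entries):
        start = parse_ts(entry["ts"])
        if index + 1 < len(entries):
            stop = parse_ts(entries[index + 1]["ts"])
        else:
            stop = end_seconds
        duration = max(0.0, stop - start)
        for user in entry.get("users", []):
            talk_time[user.get("username", "unknown")] += duration
    total = sum(talk_time.values())
    if total == 0:
        return {}
    return {name: 100.0 * t / total for name, t in talk_time.items()}


if __name__ == "__main__":
    # Hand-written sample data, not a real timeline file.
    sample = {"timeline": [
        {"ts": "00:00:00.000", "users": [{"username": "A"}]},
        {"ts": "00:01:00.000", "users": [{"username": "B"}]},
        {"ts": "00:01:15.000", "users": [{"username": "C"}]},
    ]}
    print(speaker_shares(sample, end_seconds=100.0))
```

Note that entries with an empty “users” list (silence) are not credited to anyone, so the percentages are shares of total talk time rather than of the whole call duration.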

Moments later I found this thread, “Access to zoom recording and speaker activity data”, in which you proposed that same idea, and that made my day :slight_smile:

Thank you @tommy and Tommy of the past!


Hey @fonsecajavier,

That is a great solution and currently how a few other apps do this! :slight_smile:

Thanks,
Tommy

Saw your later thoughts on using a webhook (after the fact), but what about using one of the numerous web automation toolkits available in most languages, like Node.js or Python? For instance, Selenium. You could then run such a service headless using ChromeDriver on some cloud instance, or heck, possibly even serverless on AWS API Gateway using Zappa or an equivalent.

I thought about doing something like that, and that’s why I asked if the Web SDK could help me with that task. But it looks like the current Web SDK has fewer features than the SDKs for other platforms. I’m also a bit worried about how much computing resources it would take to run multiple Selenium instances if this app ever scales. Maybe at some point I’ll experiment with those other approaches, but using the timeline file will be good enough for a proof of concept. Thanks for bringing up those ideas.


Hey @fonsecajavier, @brian.quandt,

Yeah, the timeline file is currently the best approach! :slight_smile:

-Tommy
