I’ve been working on the Zoom RTMS integration and have successfully built a solution to capture live meeting transcripts.
My backend service establishes an Event Subscriber WebSocket connection (instead of using webhooks) and receives events such as meeting.started, rtms.started, and others through this channel.
Since this service runs on Kubernetes, there may be brief periods of downtime during scaling events or pod restarts. During such windows, there is a possibility of missing events delivered over the WebSocket connection.
What is the recommended approach for catching up on missed events in this scenario? Does Zoom provide an API to query historical events related to my application for a specific time window? If not, could you suggest best practices for handling this use case and ensuring event consistency?
@Amit11 the websocket or webhook receiver service should be always on. The RTMS message should then be passed to a Docker container instance, and the connection should be established within 60 seconds of the initial receipt of the message.
For websocket, we do not re-fire the event.
For webhook we re-fire the event, but for RTMS the subsequent messages are usually expired and no longer valid for connection.
The solution is to ensure that your websocket / webhook receiver service is always on (and in an HA config where possible).
Even with an HA deployment, we can’t guarantee that the websocket service will be available 100% of the time. There could be a brief failover window (a few seconds) during which no instance is running, resulting in missed events.
To address this, I was considering performing a reconciliation step when the service restarts by calling Zoom APIs to retrieve live meetings and meeting activities from the last few seconds/minutes. So far, I’ve come across the following APIs:
/metrics/meetings
/report/meeting_activities
However, /report/meeting_activities does not include RTMS-related events. It only returns the following activity types:
Meeting Created
Meeting Started
User Joined
User Left
Remote Control
In-Meeting Chat
Meeting Ended
Is there any API that can provide the list of events that occurred during a specific time window (for example, the last 10 seconds), including RTMS events? Alternatively, is there a recommended approach for recovering missed RTMS events after a temporary websocket disconnect?
@Amit11 I would recommend establishing 2 websocket instances and adding a comparison service. Alternatively, establish one websocket instance and one webhook instance, then use a comparison service to make sure you don’t miss an event.
There is currently no “GET” that lets you retrieve or poll for the payload that is essential to initiate the handshake with the signaling and media servers. It is “pushed” via websocket and webhook only.
@chunsiong.zoom On the meeting.started event, I invoke the RTMS Start API, which subsequently results in an rtms.started notification containing the signaling server details.
I noticed that the /metrics/meetings API can return both live and past meetings for a specified time window. However, it does not directly provide information about which meetings started or ended during that window. To determine newly started or ended meetings, I would need to compare the meetings returned by the API against the meeting state stored in my system and infer the changes. Is my understanding correct, or do you see any issues with this approach?
One concern is the scenario where I miss the rtms.started event. As far as I can tell, there does not appear to be an API that allows me to query RTMS session details after the fact. However, since the meeting would still be returned by the live meetings API, I should be able to recover by reissuing the RTMS Start API call based on my stored meeting state and continue recording. Does that sound correct?
My goal is to handle service outages gracefully. Running the Event Subscriber WebSocket in an HA configuration, or combining it with webhooks, reduces the likelihood of missing events but does not eliminate the possibility entirely. During maintenance windows, deployments, or unexpected outages, there can still be periods when no instance is connected.
Because of this, I am looking for a recovery mechanism where I can periodically fetch the current state from Zoom, compare it against my stored state, and identify any changes that occurred while my services were unavailable.
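A minimal sketch of that comparison step, assuming the set of live meeting IDs has already been fetched from /metrics/meetings and the stored state has been loaded from my database. The names `Reconciliation`, `reconcile`, and the three input sets are hypothetical, and the shape of the API response is not modeled here:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reconciliation:
    started: set[str]     # live at Zoom, but not known locally as live
    ended: set[str]       # live locally, but no longer live at Zoom
    needs_rtms: set[str]  # live at Zoom, but no RTMS session recorded locally


def reconcile(
    live_at_zoom: set[str],
    stored_live: set[str],
    stored_with_rtms: set[str],
) -> Reconciliation:
    """Infer missed changes by diffing Zoom's live meetings against stored state.

    All inputs are sets of meeting IDs (hypothetical; how they are fetched
    and stored is up to the integration).
    """
    return Reconciliation(
        started=live_at_zoom - stored_live,
        ended=stored_live - live_at_zoom,
        needs_rtms=live_at_zoom - stored_with_rtms,
    )
```

Meetings in `needs_rtms` would be the candidates for reissuing the RTMS Start API call, and meetings in `ended` would be marked as finished locally.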
I assume this challenge is not unique to my application and would apply to any integration that relies on event delivery. In such cases, is the recommended approach from Zoom simply to run the event processing services in a highly available configuration, or is there another recommended pattern for state reconciliation and recovery from missed events?
@chunsiong.zoom If I implement HA for the WebSocket-based event subscriber, would the following configuration work?
I will have a single OAuth application for production, and my solution will be deployed across 8 regions.
Currently, I am planning to create one Event Subscriber WebSocket connection per region, resulting in a total of 8 WebSocket event subscribers. To support HA, I would create an additional Event Subscriber WebSocket connection in each region, bringing the total to 16 WebSocket event subscribers.
Are there any limitations on the number of Event Subscriber WebSocket connections that can be established for a single OAuth application? If so, what is the maximum supported limit?
Zoom Apps allows you to query the RTMS status, and you might be able to restart RTMS using that logic in Zoom Apps.
You don’t have to have 8 pairs of websocket event subscribers. That would mean comparing 8 services for duplicates before coming to a consensus across 8 different regions.
I would just stick to one region, with 1 websocket and 1 webhook receiver. The first to arrive will wait for the 2nd to arrive within 2-3 seconds (or some other short arbitrary number). Store this in a persistent store so that the subsequent message is not processed.
If you do not de-duplicate the message, the second message will kick out the first message’s successful connection. If there is retry logic on the first message’s disconnect, it might result in a race condition where the two connections keep kicking each other out.
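A rough sketch of the de-duplication, written as a simple first-wins variant: the first copy of an event is processed immediately and any later copy with the same key is dropped. The class name `EventDeduplicator`, the method `first_delivery`, and the key format are all hypothetical, and the in-memory dict only stands in for the persistent store. A real deployment would likely need an atomic set-if-absent operation in a store shared by both receivers:

```python
import time


class EventDeduplicator:
    """Remembers event keys for a limited time so duplicates can be dropped."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._seen = {}          # event key -> expiry timestamp

    def first_delivery(self, key):
        """Return True only for the first copy of an event seen within the TTL."""
        now = self._clock()
        # Forget keys whose TTL has passed so the store does not grow forever.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if key in self._seen:
            return False
        self._seen[key] = now + self._ttl
        return True


# Usage: the key format below is an assumption, not something Zoom prescribes.
dedup = EventDeduplicator(ttl_seconds=300.0)
if dedup.first_delivery("rtms.started:example-meeting-uuid"):
    pass  # connect to the signaling server here
```

Whichever receiver delivers the event first gets `True` and proceeds with the connection. The other copy gets `False` and is ignored, so it never kicks out the established connection.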
Our solution is deployed across 8 regions. Due to GDPR and data residency requirements, meetings must be recorded and processed within the region where they are hosted. To enforce this, when a meeting notification is received, we validate the meeting ID against the meeting records stored in the corresponding regional database to determine whether the meeting should be recorded by that region.
As a result, we currently have 8 Event Subscriber WebSocket connections, one running in each region and listening for Zoom events.
If we were to implement high availability, we would need an additional Event Subscriber instance in each region, effectively doubling the number of Event Subscriber connections.
Is there a maximum limit on the number of Event Subscriber WebSocket connections that can be established for a single application? If so, what is the supported limit?