RTMS Transcript Webhook Architecture Question

Hi,

My team and I are planning to implement the RTMS package in our Next.js application.

From what I can tell, the Node.js library appears to be a thin wrapper around C++ code that only runs on Linux. My main concern is that the C++ layer might be establishing an active outbound connection to Zoom (for example, to handle the transcription webhook when a meeting starts).

If that’s the case, it could be difficult to manage connection lifecycles in a horizontally scaled environment, since coordination between instances would be required.

Could you provide more details about the overall network architecture of the @zoom/rtms Node.js package? Specifically, does it maintain a persistent connection to Zoom, or does it act more as a passive listener for incoming webhook events?

We’d like to design our system in a way that avoids having to vertically scale a single instance.

Hi @JasonKiwi,

There are two main things the @zoom/rtms package does. The list below is not exhaustive, but I'll highlight the main events:

  1. Listen for the webhooks related to meeting.rtms_started and meeting.rtms_stopped. These webhooks provide the payload necessary to connect to the signaling and media servers (a rough example of this payload is shown after this list). Once received, the payload is handed over to point 2 below.

  2. Connect to the signaling and media servers. These two server connections are necessary for each RTMS stream. The media server is the one that will eventually pass the data over WebSocket to your server. The SDK maintains the connections to the signaling and media servers, including things like reconnection in case a connection breaks.
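
For reference, the rtms_started webhook body looks roughly like the following. The event/payload envelope is the standard Zoom webhook format; only the fields discussed in this thread are shown here, so treat everything else as omitted rather than absent.

{
  event: "meeting.rtms_started",
  payload: {
    meeting_uuid: "meeting-uuid",
    rtms_stream_id: "stream-id",
    server_urls: "wss://rtms.zoom.us"
  }
}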

Here’s what you can do for scalability.

  1. Set up a server or serverless service that listens for webhooks from Zoom and passes each event into a service bus or queue.
  2. Have a service orchestrator that spins up Docker instances (scaling horizontally) based on the messages in the queue. Remember to pop each message so that you end up with one message per Docker instance.
  3. Within each Docker instance, use the message, which looks something like this:

{
  meeting_uuid: "meeting-uuid",
  rtms_stream_id: "stream-id",
  server_urls: "wss://rtms.zoom.us"
}

and use the command below to connect to the signaling and media servers:
rtms.join({
  meeting_uuid: "meeting-uuid",
  rtms_stream_id: "stream-id",
  server_urls: "wss://rtms.zoom.us"
});
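
To illustrate steps 2 and 3, here is a rough sketch of what could run inside each Docker instance. It makes two assumptions that are not from the SDK itself: the orchestrator hands the popped queue message to the container through an environment variable (RTMS_JOIN_MESSAGE is a made-up name), and the transcript callback is named onTranscriptData as in the quickstart, so verify both against your own setup and the package README.

// Worker sketch: one RTMS stream per container (assumptions noted above).
import rtms from "@zoom/rtms";

// The orchestrator injects the popped queue message; RTMS_JOIN_MESSAGE is a
// hypothetical variable name, not something the SDK defines.
const message = JSON.parse(process.env.RTMS_JOIN_MESSAGE);

// Transcript callback name and signature follow the quickstart; check them
// against the @zoom/rtms version you install.
rtms.onTranscriptData((data, size, timestamp, metadata) => {
  // Forward or store the transcript line; your own logic goes here.
  console.log(`[${message.meeting_uuid}] ${timestamp}: ${data}`);
});

// Connect to the signaling and media servers for this one stream.
rtms.join({
  meeting_uuid: message.meeting_uuid,
  rtms_stream_id: message.rtms_stream_id,
  server_urls: message.server_urls,
});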

Are you intending to do some heavy processing on the machine?

Hi Chun,

This makes sense. I have a couple of questions.

  1. I imagine we would want multiple meetings handled by one Docker container, since all we are interested in is the transcript and we won't need to do a lot of CPU-bound work. I assume that is fine in this scenario. We would prefer getting the transcript via webhook, but it doesn't appear to be possible. Correct me if I'm wrong.

  2. How does cleanup get handled in this scenario? I noticed in the example linked below that the Node.js process keeps a map of clients. How do we safely clean up that memory once the meeting is over? We won't have a way to signal a disconnect from our webhook to our Docker container.
    repo path on github: zoom/rtms-quickstart-js/blob/main/index.js

I am specifically interested in question #2.

Thanks!

@JasonKiwi,

If you are doing number 1, then you are going towards a monolithic app. If you are only working with the transcript, a single VM or a single Docker instance should be able to handle it easily.
You can get the transcript via webhook, but that goes through cloud recording, and it won't be real time.

For number 2, there are a number of ways you can tell that a meeting has stopped. One is via the webhook; if you use the webhook, you will probably need an orchestrator to know which Docker instance to terminate. Another way is from the event received on the RTMS SDK side. A rough sketch of the webhook route is below.
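
As an illustration of the webhook route, here is the quickstart's map-of-clients pattern with the cleanup added. The Client method names (join, leave, onTranscriptData) are taken from the quickstart, so double-check them against the @zoom/rtms version you install.

// One process handling many meetings, with cleanup driven by the
// meeting.rtms_stopped webhook. Client/join/leave/onTranscriptData follow
// the quickstart; treat them as assumptions to verify.
import express from "express";
import rtms from "@zoom/rtms";

const app = express();
app.use(express.json());

const clients = new Map(); // meeting_uuid -> rtms.Client

app.post("/zoom/webhook", (req, res) => {
  const { event, payload } = req.body;

  if (event === "meeting.rtms_started") {
    const client = new rtms.Client();
    client.onTranscriptData((data) => {
      // Handle the transcript for payload.meeting_uuid here.
    });
    client.join(payload);
    clients.set(payload.meeting_uuid, client);
  }

  if (event === "meeting.rtms_stopped") {
    // Close the signaling/media connections and drop the reference so the
    // client can be garbage collected.
    const client = clients.get(payload.meeting_uuid);
    if (client) {
      client.leave();
      clients.delete(payload.meeting_uuid);
    }
  }

  res.sendStatus(200);
});

app.listen(3000);

If you rely on the SDK-side event instead of the webhook, the same leave-and-delete cleanup applies; only the trigger changes.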