Pricing & Feasibility Validation: RTMS vs. Linux SDK for External Meeting Capture

Hi,

Our organization is building an integration to capture audio and insights for our Sales team. A critical constraint of our use case is that our users are joining meetings hosted by external customers (we do not own the meeting infrastructure/tenancy).

We are evaluating two architectural approaches and need confirmation on the pricing and feasibility for each, specifically regarding the “Guest” context.

Use Case Definition:

  • M: Total meeting minutes per month.

  • B: Number of concurrent bot instances.

  • Context: Our app/bots join meetings hosted by external tenants. We need participant-separated audio.

Approach 1: Real-time Media Streams (RTMS)

Architecture: Our app installs on the Seller’s account. When the Seller joins an external meeting, we attempt to initiate an RTMS session to stream audio to our backend.

Questions:

  1. Feasibility: Can an app installed on a participant’s account initiate an RTMS stream for a meeting hosted by a different organization? Or is RTMS strictly a host-side privilege? Can the host organization’s admin block our app so that we cannot even request the meeting host’s approval?

  2. Pricing: If feasible, does this require the Zoom Developer Pack? Is the cost consumption-based (e.g., per minute) or license-based?

Approach 2: Linux Meeting SDK (Headless Bots)

Architecture: We spin up headless Linux instances (Docker) using the Meeting SDK. These “bots” join the external meeting as a guest participant alongside the Seller to capture Raw Audio.

Questions:

  3. Bot Licensing: If these bots join using ZAK tokens generated from Basic (Free) users within our account, are there any usage costs? Do we need to provision a “Pro/Licensed” seat for every bot instance (B), or can they function as free guests?

  4. Raw Data Cost: Does accessing Raw Audio data via the Linux SDK incur a per-minute fee (similar to Video SDK) or require a specific “Raw Data” license add-on, or is this included in the standard SDK usage for free?

We are trying to estimate the exact operational costs for these models. Any clarification on the latest pricing model for these specific scenarios would be appreciated.

Thanks!

RTMS is not strictly “host-only,” but the host org controls whether your app can access content: admins can enable/disable RTMS and hosts can enforce “Require host approval” (including flows where “the app may be blocked entirely based on organizational policy”). If it’s allowed, RTMS is designed to deliver “per-participant structured data streams,” which is the right primitive for participant-separated audio.
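The access gates described above can be sketched as a small decision function. This is only an illustration of the logic, assuming three policy inputs; the class and field names (`HostOrgPolicy`, `rtms_enabled`, `app_blocked`, `require_host_approval`) are hypothetical and are not Zoom API fields.

```python
from dataclasses import dataclass
from enum import Enum


class RtmsOutcome(Enum):
    BLOCKED_BY_POLICY = "blocked by host organization policy"
    NEEDS_HOST_APPROVAL = "waiting for host approval"
    STREAMING_ALLOWED = "streaming allowed"


@dataclass
class HostOrgPolicy:
    # Hypothetical fields that model the controls described above;
    # they are not names from any Zoom API.
    rtms_enabled: bool           # host org admin has RTMS turned on
    app_blocked: bool            # host org policy blocks this app entirely
    require_host_approval: bool  # host must approve each request


def rtms_outcome(policy: HostOrgPolicy, host_approved: bool) -> RtmsOutcome:
    """Return what a participant-installed app can expect in an external meeting."""
    # Organizational policy is checked first: the host is never prompted.
    if not policy.rtms_enabled or policy.app_blocked:
        return RtmsOutcome.BLOCKED_BY_POLICY
    # Policy permits the app, but the host still has to say yes.
    if policy.require_host_approval and not host_approved:
        return RtmsOutcome.NEEDS_HOST_APPROVAL
    return RtmsOutcome.STREAMING_ALLOWED
```

For planning purposes, the first branch is the important one: with external hosts you should expect some share of meetings where the request never reaches the host at all.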

Zoom positions RTMS as “available through the Zoom Developer Pack, a flexible, credit-based add-on” (pricing is via sales, not a public per-minute schedule). So for cost modeling, treat it as consumption against Developer Pack credits rather than “free with app install.”

Linux Meeting SDK raw audio isn’t available to an arbitrary “guest” participant: startRawRecording requires the bot to have host/co-host/recording permission or to use a host-provided recording_token. On cost and entitlements, Zoom staff indicate the Meeting SDK is available to Pro, Business, and Enterprise accounts at no additional cost, and there is no documented per-minute “raw data” fee. Access can still depend on the account-level “Allow access to raw data” entitlement.

Keep in mind that if you build with the Linux Meeting SDK, you will also need to implement either ZAK or OBF tokens, following Zoom’s latest changes to Zoom client join flows.
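To turn this into a rough estimate using M and B from the question, a minimal sketch might look like the following. Every rate in `Rates` is a placeholder to be filled in from a sales quote and your own hosting bills; none of them are published Zoom prices. Whether each bot needs its own licensed seat is left as a parameter because it is not settled above.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # All values are placeholders, not published Zoom prices.
    credits_per_minute: float         # Developer Pack credits per RTMS minute
    price_per_credit: float           # negotiated price of one credit
    seat_price_per_month: float       # price of one licensed seat, if required
    infra_cost_per_bot_minute: float  # your own compute cost for a headless bot


def rtms_monthly_cost(meeting_minutes: float, rates: Rates) -> float:
    """Approach 1: consumption against Developer Pack credits, scales with M."""
    return meeting_minutes * rates.credits_per_minute * rates.price_per_credit


def sdk_bot_monthly_cost(
    meeting_minutes: float,
    concurrent_bots: int,
    rates: Rates,
    licensed_seat_per_bot: bool,
) -> float:
    """Approach 2: optional seat cost scales with B, hosting cost scales with M."""
    seats = concurrent_bots * rates.seat_price_per_month if licensed_seat_per_bot else 0.0
    infra = meeting_minutes * rates.infra_cost_per_bot_minute
    return seats + infra
```

Running both functions over a range of M and B with quoted rates shows where the two approaches cross over, since one is purely consumption-based and the other has a fixed per-bot component.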