Description
We are building a real-time interpretation bot on top of the Zoom Meeting SDK
for Linux (C++, headless, deployed to AWS ECS Fargate). The bot joins meetings
as a co-host, follows participants into breakout rooms, captures per-participant
audio via onOneWayAudioRawDataReceived, and streams it into an STT and MT
pipeline. Output is delivered back to a web frontend in real time.
When the bot is inside a breakout room, we cannot reliably distinguish two
sources of audio that both arrive through the same callback:
- Speech from a participant inside the same breakout room: the audio we
want to translate.
- The host’s “Broadcast Voice” from the main session: audio we want to
ignore (or at least route differently, e.g. label as a host announcement).
Both arrive via:
void onOneWayAudioRawDataReceived(AudioRawData* data, uint32_t node_id)
and we have not found a reliable way to classify the source.
Environment
- SDK: Zoom Meeting SDK for Linux (C++)
- SDK version: <PLEASE FILL IN, e.g. 6.x.x>
- Deployment: Headless raw-data subscriber bot, no UI
- Use case: Real-time multilingual interpretation across enterprise customers
What we tried
1) Filter by the current breakout-room (BO) participant roster
Hypothesis: if the node_id from the audio callback is not present in the
current breakout room’s participant list, the audio must be the broadcast from
the main session.
void onOneWayAudioRawDataReceived(AudioRawData* data, uint32_t node_id) {
    auto* ctrl = GetMeetingService()->GetMeetingParticipantsController();
    if (!ctrl) return;  // guard against a null controller
    IUserInfo* user = ctrl->GetUserByUserID(node_id);  // may be nullptr
    // ... check membership against BO roster ...
}
Problem: in our tests, node_id values overlap between the main session and
the breakout-room context. A node_id that resolves to a valid BO participant
via GetUserByUserID() can actually correspond to broadcast audio originating
from a different user in the main session. In other cases the lookup returns
nullptr. Either way, the lookup result is not a reliable source classifier.
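To make the failure mode concrete, here is a minimal, SDK-independent sketch of the roster check from the hypothesis above. `AudioSource`, `Classify`, and the plain `std::unordered_set` roster are illustrative names of ours, not Zoom SDK types, and the IDs are made up; in the real bot the roster would be filled from the participants controller.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Illustrative labels only; the Zoom SDK does not define this enum.
enum class AudioSource { LocalBO, AssumedBroadcast };

// Approach 1 as a pure function: a node_id found in the current breakout-room
// roster is treated as local speech, anything else as main-session broadcast.
inline AudioSource Classify(std::uint32_t node_id,
                            const std::unordered_set<std::uint32_t>& bo_roster) {
    return bo_roster.count(node_id) != 0 ? AudioSource::LocalBO
                                         : AudioSource::AssumedBroadcast;
}

// Usage check with made-up IDs. If broadcast audio arrives carrying 101 (an ID
// that also belongs to a BO participant), it is labelled LocalBO.
inline void DemoRosterCheck() {
    const std::unordered_set<std::uint32_t> roster{101u, 102u};
    assert(Classify(101u, roster) == AudioSource::LocalBO);
    assert(Classify(999u, roster) == AudioSource::AssumedBroadcast);
}
```

The check itself is trivial; the issue is its input. Because the only key is `node_id`, a broadcast frame whose `node_id` collides with a roster entry is indistinguishable from local speech.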
2) Use IUserInfo::GetPersistentId() as a stable key
persistentId is documented as stable across BO ↔ main transitions for the
same user, which is great for tracking. But the audio callback only provides
node_id, and resolving it goes through the same context-scoped
IMeetingParticipantsController. So persistentId does not help us at the
classification step — it would only help if we already had a trusted
node_id → user mapping, which is exactly what we are trying to establish.
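A rough sketch of the tracking side, to show where it stops helping. `UserTracker` is a hypothetical helper of ours, and it assumes the persistent ID is available to the bot as a string; nothing here is Zoom SDK API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical tracker: key is a user's persistent ID, value is the node_id
// last seen for that user in the bot's current context.
class UserTracker {
public:
    // Called whenever the roster reports a user, e.g. after a room transition.
    void Update(const std::string& persistent_id, std::uint32_t node_id) {
        node_by_pid_[persistent_id] = node_id;
    }

    // Reverse lookup needed by the audio callback. It can only answer
    // "which tracked user currently holds this node_id", so it inherits the
    // same ambiguity when node_ids overlap across contexts.
    bool Resolve(std::uint32_t node_id, std::string* out) const {
        for (const auto& entry : node_by_pid_) {
            if (entry.second == node_id) {
                *out = entry.first;
                return true;
            }
        }
        return false;
    }

private:
    std::unordered_map<std::string, std::uint32_t> node_by_pid_;
};

// Usage check with made-up values: the key survives a node_id change.
inline void DemoTracker() {
    UserTracker tracker;
    tracker.Update("pid-alice", 101u);
    tracker.Update("pid-alice", 303u);  // same user, new node_id after a move
    std::string pid;
    assert(tracker.Resolve(303u, &pid) && pid == "pid-alice");
    assert(!tracker.Resolve(101u, &pid));
}
```

Tracking by persistent ID works as long as `Update` is fed correct pairs. The audio callback only ever enters through `Resolve`, which is the step we cannot trust.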
3) Inject a customer_key at join
Works only for participants we control. End-users join via the regular Zoom
client, so we cannot rely on this.
Questions
- Is there an officially supported way, at the Meeting SDK level, to
distinguish raw audio originating from a host’s “Broadcast Voice” from raw
audio of a breakout-room participant?
- Is there a callback or flag indicating when broadcast voice starts/stops,
so that a bot inside a BO can apply windowed filtering? (We did not find
one such as onBroadcastBOVoiceStatus in the Linux SDK headers.)
- Could you confirm the intended scope of the node_id parameter delivered
to onOneWayAudioRawDataReceived? Our observation suggests it is scoped
per room/context rather than globally unique within the meeting. Is this
by design?
- If no current API supports the above, would the SDK team consider exposing
one of the following? Any of these would unblock our use case:
  - A source flag on AudioRawData (e.g. MainSessionBroadcast vs. LocalBO)
  - A delegate event that fires on the broadcast-voice receiving side
    (begin/end), visible to BO participants
  - A globally unique identifier (e.g. persistentId) delivered alongside
    node_id in audio callbacks
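For reference, this is roughly how we would consume a begin/end event on the receiving side if one existed. `BroadcastWindow`, `OnBroadcastBegin`, and `OnBroadcastEnd` are hypothetical names; we found no such callback in the Linux SDK headers, so this is a sketch of the requested shape, not of an existing API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical windowed filter driven by a begin/end broadcast event.
// The flag is atomic on the assumption that the event and the audio
// callback may arrive on different threads.
class BroadcastWindow {
public:
    void OnBroadcastBegin() { active_.store(true); }
    void OnBroadcastEnd() { active_.store(false); }

    // While a broadcast is active, frames cannot be attributed safely, so the
    // bot would hold them back or label them as a host announcement.
    bool ShouldTranslate() const { return !active_.load(); }

private:
    std::atomic<bool> active_{false};
};

// Usage check: translation pauses only inside the broadcast window.
inline void DemoBroadcastWindow() {
    BroadcastWindow window;
    assert(window.ShouldTranslate());
    window.OnBroadcastBegin();
    assert(!window.ShouldTranslate());
    window.OnBroadcastEnd();
    assert(window.ShouldTranslate());
}
```

Windowed filtering is coarse, since it also suppresses local BO speech while the host is broadcasting. A per-frame source flag would be the more precise option.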
This affects every bot use case that needs accurate per-speaker attribution
inside breakout rooms — interpretation, transcription, meeting minutes,
analytics — so we believe the benefit extends well beyond our team.
Thanks in advance for any guidance, and happy to provide additional logs or
a minimal reproduction if helpful.