Multiple audio tracks/Observer users


We’re currently looking at a migration from Twilio for our application (group chats about market research). We currently run two separate audio streams, which lets a translator user provide an audio feed for users monitoring the session in another language. This works roughly as follows:

Two audio channels: “Main” and “Translator”

  • Respondents can only ever hear the Main channel, and their microphone audio is sent to the Main channel
  • Moderators hear and speak in the Main channel by default, but can switch to the Translator audio channel to both hear and speak there instead
  • Observers are silent participants, not visible to Respondents, but can choose to either hear the Main or Translator audio channel
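For illustration, the rules above can be sketched as a small routing check. This is a minimal sketch that follows the three bullets literally; every type and function name in it is hypothetical and not part of any SDK.

```typescript
// Hypothetical model of the roles and channels described above.
type Role = "respondent" | "moderator" | "observer";
type Channel = "main" | "translator";

interface Participant {
  id: string;
  role: Role;
  // Channel the participant has selected (ignored for respondents).
  channel: Channel;
}

// Respondents are pinned to Main; moderators and observers may choose.
function listenChannel(p: Participant): Channel {
  return p.role === "respondent" ? "main" : p.channel;
}

// Channel a participant's microphone feeds, or null if they are silent.
function speakChannel(p: Participant): Channel | null {
  if (p.role === "observer") return null;
  return listenChannel(p);
}

// True when `listener` should receive audio published by `speaker`.
function shouldHear(listener: Participant, speaker: Participant): boolean {
  if (listener.id === speaker.id) return false;
  return speakChannel(speaker) === listenChannel(listener);
}
```

Any subscription or muting mechanism then only needs to enforce `shouldHear` for each listener/speaker pair.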

All users are expected to see video feeds from all Respondents and Moderators at all times, but no video feeds are provided for Observers. We’re also looking to expand this going forwards so Observers have a third audio channel where they can speak in private.

We currently manage this with Twilio’s Track Subscription APIs, which let us choose which users receive audio from each user - which gives pretty flexible control over how tracks are sent/received. I’ve not seen anything in the Web Video SDK that looks equivalent - is there any good way to achieve the above?

Thanks in advance!

Edit: Probably worth noting that we’re planning PSTN integration as a fallback, so any solution would need to make sure PSTN users aren’t just hearing everyone.

Hi @vl-jamescheese ,

Similar to the two video tracks we currently support in production, a second audio channel is in the works and will be released soon.

Additionally, if you have two users in the session (translator and main speaker), you could lower the volume of the main speaker, using this method, when the translator is speaking. This would be most similar to how the Zoom client handles translator use cases.


Hi Rehema,

If it’s anything like the existing support for multiple video tracks (i.e. just allowing each user to push multiple streams), then it’s not really relevant to our scenario.

We might be able to use Stream.adjustUserAudioVolumeLocally to achieve part of this - though given the users we had, it would be more like “mute all users who you shouldn’t be hearing” than just adjusting the volume of an individual - so presumably muteUserAudioLocally would be more useful.
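A rough sketch of the “mute all users who you shouldn’t be hearing” idea follows. It assumes the stream object exposes muteUserAudioLocally and a matching unmuteUserAudioLocally that each take a numeric user ID; the interface below is a hand-written stand-in for that slice of the SDK, and planLocalMutes / applyLocalMutes are hypothetical helper names.

```typescript
// Assumed slice of the SDK stream object; check the SDK reference for the
// exact signatures before relying on this.
interface LocalAudioControl {
  muteUserAudioLocally(userId: number): unknown;
  unmuteUserAudioLocally(userId: number): unknown;
}

interface MutePlan {
  mute: number[];
  unmute: number[];
}

// Decide, for the local user only, which remote users to silence.
// `audible` holds the IDs this user is allowed to hear.
function planLocalMutes(
  userIds: number[],
  audible: Set<number>,
  selfId: number,
): MutePlan {
  const plan: MutePlan = { mute: [], unmute: [] };
  for (const id of userIds) {
    if (id === selfId) continue; // never touch our own audio
    (audible.has(id) ? plan.unmute : plan.mute).push(id);
  }
  return plan;
}

// Apply the plan on this client; resolves once every call has settled.
function applyLocalMutes(
  stream: LocalAudioControl,
  plan: MutePlan,
): Promise<unknown[]> {
  return Promise.all([
    ...plan.mute.map((id) => stream.muteUserAudioLocally(id)),
    ...plan.unmute.map((id) => stream.unmuteUserAudioLocally(id)),
  ]);
}
```

The plan would need to be recomputed whenever someone joins, leaves, or switches channel, and it only affects clients that actually run this code.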

Any thoughts regarding how this would impact PSTN? Without any particular support for it, I’d assume that PSTN users would hear both main and translated audio at full volume, rendering the PSTN audio somewhat useless?



Hey @vl-jamescheese ,

Exactly. This is how the Zoom Client handles the language interpretation feature. The original language volume is low, and the interpreter volume is high. This is how TV networks do it as well when they are interviewing someone speaking a different language.

For PSTN, you can’t control the client side of the user who joined via phone. However, since they are considered a user within the session, you can still lower/stop their volume for other users.


Thanks again for following up.

Yeah, sadly that’s a bit of a problem - PSTN users are still users, and if you can’t stop them hearing the translator at the same volume as the rest of the audio, they’ll be unable to take part in any practical manner.

Makes it a bit of a deal-breaker for us, but in case it helps with future development - something like the old API in Twilio Video for handling Track Subscriptions solves this sort of problem pretty well:

Basically a centralized API for controlling which user can subscribe to which streams - so you can implement these once centrally for all client types.

Certainly aware that that sort of thing won’t be implemented for one client, but figured it’s worth pointing out just in case!

Hey @vl-jamescheese ,

There is an option with PSTN audio: if the user who is connecting to session audio via phone is already in the session on the web, you should still be able to use the stream.adjustUserAudioVolumeLocally function. Simply set callMe: true in the stream.inviteByPhone function.

With the PSTN call out function, there are two use cases:

  1. A user is a PSTN only user, meaning they join completely over the phone.
  2. A user joins on the web, but uses the above mentioned PSTN audio option, meaning they are in the session on the web, but their audio choice is PSTN vs. computer audio (VOIP).
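For the second use case, the call might look roughly like the sketch below. The thread only confirms that stream.inviteByPhone accepts callMe: true; the parameter order (country code, phone number, display name, options) is an assumption here, and usePhoneForMyAudio is a hypothetical wrapper name.

```typescript
// Assumed slice of the SDK stream object; the parameter list is an
// assumption and should be checked against the SDK reference.
interface PhoneAudio {
  inviteByPhone(
    countryCode: string,
    phoneNumber: string,
    name: string,
    option?: { callMe?: boolean },
  ): unknown;
}

// Hypothetical helper: ask the session to call the current web user's phone
// so that their audio runs over PSTN while they stay in the session on the web.
function usePhoneForMyAudio(
  stream: PhoneAudio,
  countryCode: string,
  phoneNumber: string,
  displayName: string,
): unknown {
  return stream.inviteByPhone(countryCode, phoneNumber, displayName, {
    callMe: true,
  });
}
```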

That being said, I am curious how you accomplished this with Twilio even with the track-subscription approach you sent, since the PSTN connection does not have a client side to call this function for.


Hi @tommy,

That sounds like an interesting option - given all of our dialins are users who are already connected in the browser (to consume video content etc), that might be viable. I’ll take a look into it - thanks!

Re: how we implemented it with Twilio - their Track Subscription API is a REST API, separate from their client SDK, so we call it from our backend. Twilio fires a webhook call to our backend whenever a participant (PSTN included) joins, then we look up the user and set subscriptions as appropriate.
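As a simplified sketch of what the backend computes in that webhook handler: the function below builds a list of Subscribe Rules for one participant. The rule fields shown (type, kind, track) are a subset of what the Track Subscriptions API accepts; the track naming convention ("main-audio", "translator-audio") and the rulesFor name are assumptions made for illustration, not our actual setup.

```typescript
// Subset of the Subscribe Rule fields used by this sketch.
interface SubscribeRule {
  type: "include" | "exclude";
  all?: boolean;
  kind?: "audio" | "video" | "data";
  track?: string;
  publisher?: string;
}

type AudioChannel = "main" | "translator";

// Assumption: every publisher names its audio track after the channel it
// feeds, e.g. "main-audio" or "translator-audio".
function rulesFor(channel: AudioChannel): SubscribeRule[] {
  return [
    // Everyone sees all published video (observers publish none).
    { type: "include", kind: "video" },
    // Only audio from the participant's assigned channel is included.
    { type: "include", track: `${channel}-audio` },
  ];
}
```

The resulting array is what would be sent to the participant's Subscribe Rules resource; because this runs server-side, it applies to PSTN dial-ins in the same way as web clients.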

The one notable issue we had with this setup is that rules couldn’t be set before a participant connected. Whilst web clients can just go with “don’t subscribe to your assigned tracks until they’ve been assigned”, PSTN dial-ins, as you’ve rightly identified, don’t have the ability to do that, so there were technically a few ms where all PSTN dial-ins received all audio - though practically speaking, our tests indicate that this was short enough to be inaudible.

Hope that clarifies - certainly aware it’s a bit of an odd scenario!




Makes sense, thanks @vl-jamescheese .

Glad to hear the users are connected on the web, it sounds feasible then. :)

Let me know if you face any issues.