We’re currently looking at migrating our application (group chats about market research) away from Twilio. We currently run two separate audio streams, which lets a translator user provide an audio feed for users monitoring the session in another language. This works roughly as follows:
Two audio channels: “Main” and “Translator”
Respondents can only ever hear the Main channel, and their microphone audio is sent to the Main channel
Moderators hear and speak in the Main channel by default, but can switch to the Translator audio channel to both hear and speak there instead
Observers are silent participants, not visible to Respondents, but can choose to hear either the Main or the Translator audio channel
All users are expected to see video feeds from all Respondents and Moderators at all times, but no video feeds are provided for Observers. We’re also looking to expand this going forward so that Observers have a third audio channel where they can speak in private.
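The rules above can be written down as a small routing policy, which also gives a concrete spec to check any SDK-level approach against. This is a sketch with hypothetical names (`Participant`, `canHear`, `canSee`, `switchChannel`) rather than anything from Twilio or Zoom. It covers only the current two channels; the planned Observer channel would be an extra `Channel` value with its own speaking rules.

```typescript
// Hypothetical model of the audio/video routing rules described above.
type Role = "respondent" | "moderator" | "observer";
type Channel = "main" | "translator";

interface Participant {
  id: number;
  role: Role;
  // Channel this participant listens to (and speaks in, if allowed to speak).
  channel: Channel;
}

// Channels each role is allowed to select.
const allowedChannels: Record<Role, Channel[]> = {
  respondent: ["main"],
  moderator: ["main", "translator"],
  observer: ["main", "translator"],
};

// Observers are silent; everyone else speaks into their current channel.
function canSpeak(p: Participant): boolean {
  return p.role !== "observer";
}

// Should `listener` hear `speaker`'s microphone audio?
function canHear(listener: Participant, speaker: Participant): boolean {
  if (listener.id === speaker.id || !canSpeak(speaker)) return false;
  return listener.channel === speaker.channel;
}

// Everyone sees Respondents and Moderators; nobody sees Observers.
function canSee(viewer: Participant, publisher: Participant): boolean {
  return viewer.id !== publisher.id && publisher.role !== "observer";
}

// Returns a copy on the new channel, rejecting switches the role doesn't allow.
function switchChannel(p: Participant, channel: Channel): Participant {
  if (allowedChannels[p.role].indexOf(channel) === -1) {
    throw new Error(`${p.role} cannot select the ${channel} channel`);
  }
  return { ...p, channel };
}
```

Keeping the policy as pure functions means the same rules can drive client-side muting and any server-side subscription logic without drifting apart.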
We currently manage this with Twilio’s Track Subscription APIs, which let us choose which users receive audio from each user and give us pretty flexible control over how tracks are sent and received. I’ve not seen anything equivalent in the Web Video SDK; is there a good way to achieve the above?
Thanks in advance!
Edit: Probably worth noting that we’re planning PSTN integration as a fallback, so any solution would need to make sure PSTN users aren’t just hearing everyone.
Similar to our current support for using two video tracks, a second audio channel is in the works and is expected to be released soon.
Additionally, if you have two users in the session (the translator and the main speaker), you could use this method to lower the volume of the main speaker while the translator is speaking. This would be most similar to how the Zoom client handles translator use cases.
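A sketch of that ducking idea, assuming the method is something like `adjustUserAudioVolumeLocally(userId, volume)` on a 0 to 100 scale (both are assumptions to check against the SDK reference). `LocalVolumeControl`, `DuckingConfig`, `computeVolumes` and `applyVolumes` are hypothetical names. This would run only on clients that want the translation, recomputed whenever the SDK reports a change in who is speaking.

```typescript
// Hypothetical stand-in for the SDK stream object; the method shape is assumed.
interface LocalVolumeControl {
  adjustUserAudioVolumeLocally(userId: number, volume: number): unknown;
}

interface DuckingConfig {
  mainSpeakerIds: number[]; // users speaking in the original language
  translatorId: number;
  normalVolume: number;     // assumed scale, e.g. 100
  duckedVolume: number;     // quieter level used while the translator talks
}

type UserVolume = { userId: number; volume: number };

// Main speakers drop to duckedVolume while the translator is speaking;
// the translator always stays at normalVolume.
function computeVolumes(cfg: DuckingConfig, translatorSpeaking: boolean): UserVolume[] {
  const mainVolume = translatorSpeaking ? cfg.duckedVolume : cfg.normalVolume;
  const result: UserVolume[] = cfg.mainSpeakerIds.map(userId => ({ userId, volume: mainVolume }));
  result.push({ userId: cfg.translatorId, volume: cfg.normalVolume });
  return result;
}

// Push the computed levels to the (assumed) local volume API.
function applyVolumes(stream: LocalVolumeControl, volumes: UserVolume[]): void {
  volumes.forEach(v => stream.adjustUserAudioVolumeLocally(v.userId, v.volume));
}
```

Note that this only changes relative loudness on one client; the main audio is still audible underneath the translation.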
If it’s anything like the existing support for multiple video tracks (i.e. just allowing each user to push multiple streams), then it’s not really relevant to our scenario.
We might be able to use Stream.adjustUserAudioVolumeLocally to achieve part of this, though given the user roles we have, it would be closer to “mute all users you shouldn’t be hearing” than to adjusting the volume of an individual, so presumably muteUserAudioLocally would be more useful.
Any thoughts on how this would affect PSTN? Without any particular support for it, I’d assume that PSTN users would hear both the main and translated audio at full volume, rendering the PSTN audio somewhat useless?
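A minimal sketch of the “mute everyone you shouldn’t hear” approach, run on each web client whenever the listener changes channel or someone joins or switches. `LocalMuteControl` is a hypothetical stand-in: it assumes `muteUserAudioLocally(userId)` as named above plus a matching `unmuteUserAudioLocally(userId)`, so check both against the Web Video SDK reference. Because this only changes what the local client plays back, it does nothing for PSTN dial-ins.

```typescript
type Channel = "main" | "translator";

interface RemoteUser {
  userId: number;
  // Channel this user currently speaks in; null for silent Observers.
  speakingChannel: Channel | null;
}

// Hypothetical stand-in for the SDK stream object; method shapes are assumed.
interface LocalMuteControl {
  muteUserAudioLocally(userId: number): unknown;
  unmuteUserAudioLocally(userId: number): unknown;
}

type MutePlan = { mute: number[]; unmute: number[] };

// Unmute users speaking in the listener's channel, mute everyone else.
function planLocalMutes(listening: Channel, others: RemoteUser[]): MutePlan {
  const mute: number[] = [];
  const unmute: number[] = [];
  others.forEach(u => {
    if (u.speakingChannel === listening) unmute.push(u.userId);
    else mute.push(u.userId);
  });
  return { mute, unmute };
}

// Mute first, then unmute, so both channels are never briefly audible together.
function applyLocalMutes(stream: LocalMuteControl, plan: MutePlan): void {
  plan.mute.forEach(id => stream.muteUserAudioLocally(id));
  plan.unmute.forEach(id => stream.unmuteUserAudioLocally(id));
}
```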
Yeah, sadly that’s a bit of a problem - PSTN users are still users, and if you can’t stop them hearing the translator at the same volume as the rest of the audio, they’ll be unable to take part in any practical manner.
That makes it a bit of a deal-breaker for us, but in case it helps with future development, something like the old API in Twilio Video for handling Track Subscriptions solves this sort of problem pretty well.
That sounds like an interesting option. Given that all of our dial-ins are users who are already connected in the browser (to consume video content etc.), that might be viable. I’ll take a look into it, thanks!
Re: how we implemented it with Twilio: their Track Subscription API is a REST API separate from their client SDK, so we call it from our backend. Twilio fires a webhook to our backend whenever a participant (PSTN included) joins, and we then look up the user and set their subscriptions as appropriate.
The one notable issue we had with this setup is that rules couldn’t be set before a participant connected. Whilst web clients can simply follow “don’t subscribe to any tracks until your subscriptions have been assigned”, PSTN dial-ins, as you’ve rightly identified, can’t do that, so there was technically a window of a few milliseconds in which every PSTN dial-in received all audio. Practically speaking, though, our tests indicated that this was short enough to be inaudible.
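For reference, the rule-building step of that webhook flow could look roughly like this. It assumes the rule shape from Twilio’s Track Subscriptions documentation (`type` of `include` or `exclude`, with `all`, `kind` and `publisher` filters), which should be checked against the current docs; `KnownUser`, `buildSubscribeRules` and the in-memory directory standing in for the user lookup are hypothetical. The resulting array would be serialized and sent to the participant’s subscribe-rules endpoint, and rebuilt for affected listeners whenever a Moderator switches channel.

```typescript
type Channel = "main" | "translator";

// Twilio-style subscribe rule (shape assumed from the Track Subscriptions docs).
interface SubscribeRule {
  type: "include" | "exclude";
  all?: boolean;
  kind?: "audio" | "video" | "data";
  publisher?: string; // participant identity
}

// Hypothetical record from our own user lookup.
interface KnownUser {
  identity: string;
  listensTo: Channel;
  speaksIn: Channel | null; // null for silent Observers
}

function buildSubscribeRules(identity: string, directory: KnownUser[]): SubscribeRule[] {
  const me = directory.filter(u => u.identity === identity)[0];
  if (!me) {
    // Unknown participant: default-deny, subscribe to nothing until identified.
    return [{ type: "exclude", all: true }];
  }
  // Everyone gets all video (Observers publish none).
  const rules: SubscribeRule[] = [{ type: "include", kind: "video" }];
  // Audio only from other users speaking in the channel this user listens to.
  directory.forEach(u => {
    if (u.identity !== identity && u.speaksIn === me.listensTo) {
      rules.push({ type: "include", kind: "audio", publisher: u.identity });
    }
  });
  return rules;
}
```

The default-deny branch mirrors what the web clients do, but it doesn’t close the connect-time window described above, since the rules can still only be posted after the participant has connected.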
Hope that clarifies - certainly aware it’s a bit of an odd scenario!