Description
We’re using Puppeteer to launch a Chrome instance with the --use-fake-device-for-media-stream flag. However, we occasionally encounter an issue where the speaker doesn’t work—we can’t hear the other participants speaking.
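For reference, here is a minimal sketch of how a bot launches Chrome this way. The helper name `buildLaunchOptions` is ours, and the `--use-fake-ui-for-media-stream` flag (which suppresses the permission prompt) is an illustrative addition, not something the original setup is confirmed to use:

```javascript
// Sketch of a Puppeteer launch configuration for a media-stream bot.
// --use-fake-device-for-media-stream makes Chromium expose fake
// microphone/camera devices; --use-fake-ui-for-media-stream skips
// the getUserMedia permission prompt (illustrative addition).
function buildLaunchOptions() {
  return {
    headless: true,
    args: [
      '--use-fake-device-for-media-stream',
      '--use-fake-ui-for-media-stream',
    ],
  };
}

// Usage (assumes puppeteer is installed):
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(buildLaunchOptions());
```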
Upon checking the Zoom management page, we found that the microphone status showed an error: getUserMedia failed.
Could you help investigate the cause of this issue? It only occurs intermittently; in most cases, everything works as expected.
Here is the session ID for reference: 3SS0uapFTLyfj7n33YeFGA==
After analyzing the log, we found that when attempting to capture audio, the browser returned a NotReadableError. This may be due to the improper configuration of the fake media. Did you also set the --use-file-for-fake-audio-capture argument?
If your goal is only to hear remote users’ audio, you can set the speakerOnly option to true.
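For example, a receive-only bot could start audio like this (a sketch; `stream` is assumed to be the media stream obtained from the SDK client):

```javascript
// Start audio in receive-only mode: with speakerOnly set to true the SDK
// does not capture a microphone, so no audio-input getUserMedia call is
// made (sketch, assuming `stream` comes from the Zoom Video SDK client).
async function startSpeakerOnlyAudio(stream) {
  await stream.startAudio({ speakerOnly: true });
}
```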
We only want to hear remote users’ audio. If I set the speakerOnly option to true, will startAudio still work and allow us to hear others even if the browser throws a NotReadableError?
Or does setting speakerOnly = true ensure that the NotReadableError will never occur?
Could you help investigate why we’re unable to render the video stream from other users?
Could this be related to the microphone issue mentioned earlier?
@vic.yang Could you help troubleshoot whether the delay was caused by a Zoom SDK problem, a network issue, or resource limitations on the client side (e.g., memory or CPU usage)?
@vic.yang Could you help investigate the issue where neither the microphone nor the speaker is accessible for the RECORDING_BOT when starting audio with the speakerOnly option set to true in the following sessions:
After analyzing the logs, these two BOT users did not successfully start audio, possibly due to the browser’s autoplay policy requiring user interaction.
You can listen for the auto-play-audio-failed event.
If you’re using Puppeteer, consider adding the launch flag --autoplay-policy=no-user-gesture-required.
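Concretely, the flag can be appended to the bot's existing launch arguments (a sketch; the other flag shown is the one already in use above):

```javascript
// Chromium arguments for a headless bot that must play audio without
// any user gesture. --autoplay-policy=no-user-gesture-required relaxes
// the autoplay policy that otherwise blocks audio playback (sketch).
const launchArgs = [
  '--use-fake-device-for-media-stream',
  '--autoplay-policy=no-user-gesture-required',
];

// Usage (assumes puppeteer is installed):
// const browser = await puppeteer.launch({ headless: true, args: launchArgs });
```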
@vic.yang Do you know why this happens randomly with the same code? Could it be related to the timing of the startAudio call? Maybe it's being called too soon?

If stream.startAudio is called immediately upon joining a session, it might fail. Use the auto-play-audio-failed event to get notified of this issue so you can prompt users to interact with the page to restore audio.
‘Called immediately upon joining a session’ here means there’s no opportunity for user interaction between joining the session and calling startAudio.
For more details on auto-play policy, you can refer to the documentation on MDN.
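Putting the advice above together, a handler might look like this (a sketch; `client` is assumed to be the Video SDK client, and `promptUser` is a hypothetical UI callback of yours):

```javascript
// Listen for the SDK's auto-play-audio-failed event and ask the user to
// interact with the page; after a user gesture the SDK can resume the
// audio join process (sketch; promptUser is a hypothetical callback).
function registerAutoplayHandler(client, promptUser) {
  client.on('auto-play-audio-failed', () => {
    // Any user gesture (click/keypress) satisfies the autoplay policy.
    promptUser('Please click anywhere on the page to enable audio.');
  });
}
```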
May I ask a bit more? Is it possible that startAudio sometimes doesn't return an error directly, and instead we need to listen for the auto-play-audio-failed event? Similarly, for video: in cases where errors aren't caught by startVideo, how should I handle those situations?
startAudio has a timeout mechanism — if it fails to join audio within 40 seconds, it will return a timeout error.
When auto-play requirements aren’t met, the Video SDK interrupts the join process and emits the auto-play-audio-failed event. In that event, simply guiding the user to interact with the page will allow the SDK to resume the join process.
startVideo doesn’t have this issue, as browsers don’t impose autoplay restrictions on video capture.
@vic.yang Following up on why BOT users sometimes fail to start audio:
it may be due to the browser's autoplay policy, which requires user interaction before playing audio.
To start a BOT joining a meeting, we launch a separate EC2 instance for each BOT using a headless browser, which joins the meeting without any manual user interaction. However, this issue only occurs in rare cases — most BOT sessions do not encounter autoplay policy errors.
At this time, we haven’t been able to reliably reproduce the error, which makes it difficult to pinpoint and address the root cause with supporting evidence. Do you have any suggestions or recommendations that might help us investigate further? Thank you very much!
Our problem is that we haven’t been able to reproduce the error to confirm whether it’s actually caused by the autoplay policy (user-gesture-required). Because of this, we’re uncertain if adding --autoplay-policy=no-user-gesture-required will fully resolve the issue.
In normal cases, when we check the autoplay policy via about://media-engagement, it already shows no-user-gesture-required—even without explicitly setting the flag.
Could you recommend the best practice for when to start audio?
Should we call startAudio immediately after joining the session, or wait for a specific event?
Is it safe to call stream.startAudio and stream.switchSpeaker in parallel?
Are there any potential issues, and what would be the recommended approach for this scenario?
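To make the question concrete, here are the two call orders we're comparing (a sketch only; whether the parallel form is safe is exactly what we're asking):

```javascript
// Sequential: switch the speaker only after audio has fully started.
async function startAudioSequential(stream, speakerId) {
  await stream.startAudio({ speakerOnly: true });
  await stream.switchSpeaker(speakerId);
}

// Parallel: fire both calls at once. Is this safe?
async function startAudioParallel(stream, speakerId) {
  await Promise.all([
    stream.startAudio({ speakerOnly: true }),
    stream.switchSpeaker(speakerId),
  ]);
}
```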