We are trying to set up alerting for bad Zoom calls.
What should the recommended threshold values be for good call quality on the following metrics?
Jitter
Avg Packet Loss
Latency
Bitrate
Jitter - ~125-200 ms
Latency - ~300-400 ms
Packet Loss - ~20%
CPU Usage - ~90%
With the above values configured we get an email every minute, because the meetings happen over a mix of networks: office Internet, home Internet, and 3G/4G.
When jitter and latency were raised to 1000 ms, the alerts slowed to roughly one every 5 minutes.
Given our global user base (898 active users worldwide), there will always be some calls somewhere in the world that are affected, so we will always get an alert.
Are there sensible finite values we can set so that we only capture genuinely bad-quality calls?
You can go to http://speedtest.net to check your network bandwidth. Generally, we recommend 1.2 to 1.5 Mbps upload/download for an optimal experience on desktop or room systems.
However, you mentioned a variety of connection sources (office Internet, home Internet, 3G/4G), and I wasn't able to locate specific guidance on what your monitoring thresholds should be set to.
I asked some of our Customer Success Managers if we had “Optimal Operating Target” values for these metrics, and they provided me with the following information:
| Metrics | Ideal Threshold | Notes |
|-------------|-----------------|------------------------------------------------------|
| Jitter | < 150ms | Variation in time between packets arriving |
| Latency | < 300ms | Delay between packets being sent/received |
| Packet Loss | < 20% | Number of packets failing to reach final destination |
| CPU Usage   | < 90%           | Processor load on the participant's device           |
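If it helps, the thresholds in that table can be encoded directly for alerting. The sketch below is illustrative only; the metric names are made up for the example, not the QoS API's actual field names:

```python
# Thresholds from the table above; the dict keys are hypothetical metric names.
THRESHOLDS = {
    "jitter_ms": 150,       # jitter should stay below 150 ms
    "latency_ms": 300,      # latency should stay below 300 ms
    "packet_loss_pct": 20,  # average packet loss should stay below 20%
    "cpu_usage_pct": 90,    # CPU usage should stay below 90%
}

def breached(metrics: dict) -> list:
    """Return the names of any metrics that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]
```

For example, `breached({"jitter_ms": 180, "latency_ms": 250})` returns `["jitter_ms"]`, which you could feed into whatever notification path you use.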
Yes, we are polling QoS API stats into Splunk for alerting, so we know which user is impacted and what might be causing their meeting to go wrong.
Below is the query we use in Splunk:
earliest=-5m@m index=connect2018 source="/opt/splunk/bin/scripts/ZoomLiveMeetingQuality.py"
| rename participants{}.user_id AS userID, participants{}.user_name AS userName, participants{}.user_qos{}.date_time AS DateTime
| stats
max(participants{}.user_qos{}.cpu_usage.zoom_avg_cpu_usage) AS AvgCPU,
max(participants{}.user_qos{}.audio_input.latency) AS AudioInLatency,
max(participants{}.user_qos{}.audio_output.latency) AS AudioOutLatency,
max(participants{}.user_qos{}.video_input.latency) AS VideoInLatency,
max(participants{}.user_qos{}.video_output.latency) AS VideoOutLatency,
max(participants{}.user_qos{}.audio_input.jitter) AS AudioInJitter,
max(participants{}.user_qos{}.audio_output.jitter) AS AudioOutJitter,
max(participants{}.user_qos{}.video_input.jitter) AS VideoInJitter,
max(participants{}.user_qos{}.video_output.jitter) AS VideoOutJitter,
max(participants{}.user_qos{}.video_input.max_loss) AS VideoInMaxLoss,
max(participants{}.user_qos{}.video_output.max_loss) AS VideoOutMaxLoss,
max(participants{}.user_qos{}.audio_input.max_loss) AS AudioInMaxLoss,
max(participants{}.user_qos{}.audio_output.max_loss) AS AudioOutMaxLoss,
max(participants{}.user_qos{}.audio_input.avg_loss) AS AudioInAvgLoss,
max(participants{}.user_qos{}.audio_output.avg_loss) AS AudioOutAvgLoss,
max(participants{}.user_qos{}.video_input.avg_loss) AS VideoInAvgLoss,
max(participants{}.user_qos{}.video_output.avg_loss) AS VideoOutAvgLoss
by meetingId, userName
| table meetingId, userName, AvgCPU, Audio*,Video*
| convert rmunit(Audio*)
| convert rmunit(Video*)
| convert rmunit(AvgCPU)
| where ((VideoInLatency>1500 OR VideoOutLatency>1500) AND (AudioInLatency>1500 OR AudioOutLatency>1500) AND (AudioOutJitter>1500 OR AudioInJitter>1500) AND (VideoOutJitter>1500 OR VideoInJitter>1500)) OR ((AudioOutAvgLoss>20 OR AudioInAvgLoss>20) AND (VideoOutAvgLoss>20 OR VideoInAvgLoss>20))
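For readers who don't write SPL, here is a rough Python rendering of what that where clause appears to intend: latency and jitter breaches must occur on both audio and video, or average loss must breach on both audio and video. The 1500 ms and 20 values mirror the query above, and the record keys mirror the stats aliases:

```python
def is_bad_call(q: dict) -> bool:
    """Mirror the intended grouping of the SPL where clause above."""
    # Both audio and video latency exceed 1500 ms on at least one direction.
    latency_bad = ((q["VideoInLatency"] > 1500 or q["VideoOutLatency"] > 1500)
                   and (q["AudioInLatency"] > 1500 or q["AudioOutLatency"] > 1500))
    # Both audio and video jitter exceed 1500 ms on at least one direction.
    jitter_bad = ((q["AudioInJitter"] > 1500 or q["AudioOutJitter"] > 1500)
                  and (q["VideoInJitter"] > 1500 or q["VideoOutJitter"] > 1500))
    # Both audio and video average loss exceed 20 (percent, if the units match).
    loss_bad = ((q["AudioInAvgLoss"] > 20 or q["AudioOutAvgLoss"] > 20)
                and (q["VideoInAvgLoss"] > 20 or q["VideoOutAvgLoss"] > 20))
    return (latency_bad and jitter_bad) or loss_bad
```

Writing it out this way makes the AND/OR grouping explicit, which the flat SPL expression leaves to operator precedence.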
I provided the table above which contains what Zoom considers the “Optimal Thresholds”.
You also stated this about your use of Splunk…
"Yes, we are polling QoS API stats into Splunk for alerting, so we know which user is impacted and what might be causing their meeting to go wrong."
I'm not a Splunk expert and I'm unfamiliar with its DSL, but the where clause at the bottom of your filter appears to compare VideoOutAvgLoss against a bare value of 20 rather than 20%. Are you certain the values you're testing against match the threshold values and units in the table I provided earlier?
What other specific issues or questions would you like us to address?
@suhailpuri, do you happen to have the Python script "/opt/splunk/bin/scripts/ZoomLiveMeetingQuality.py" in GitHub or any other SCM? I am starting a Zoom QoS monitoring project in Python and it would be interesting to see how you approached this.
I know this is a very old post, but I was also trying to achieve the same thing: using Zoom QoS data to create alerts in Splunk for users experiencing issues with Zoom calls. Do you happen to have the Python script ZoomLiveMeetingQuality.py?
@tommy,
Agreed. However, those webhook alerts are only triggered when a meeting already has a performance issue. We would like to ingest QoS data into Splunk (or a similar tool) so that we can track trends and predict issues.
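As a sketch of that ingestion path, assuming the Dashboard API's live-meeting QoS endpoint and a Splunk HTTP Event Collector on the receiving side (the hostname, tokens, and HEC URL below are placeholders for whatever your environment uses):

```python
import json
import urllib.request

ZOOM_API = "https://api.zoom.us/v2"  # Zoom Dashboard API base
# Placeholder HEC endpoint; substitute your own collector URL and port.
SPLUNK_HEC = "https://splunkhf.mycompany.com:8088/services/collector/event"

def qos_url(meeting_id: str) -> str:
    """Per-participant QoS for a live meeting (Dashboard API)."""
    return f"{ZOOM_API}/metrics/meetings/{meeting_id}/participants/qos?type=live"

def hec_event(participant: dict, meeting_id: str) -> bytes:
    """Wrap one participant's QoS record in a Splunk HEC event payload."""
    return json.dumps({
        "sourcetype": "zoom:qos",
        "event": {"meeting_id": meeting_id, **participant},
    }).encode()

def push_qos(meeting_id: str, zoom_token: str, hec_token: str) -> None:
    """Fetch live QoS for one meeting and forward each participant to Splunk."""
    req = urllib.request.Request(
        qos_url(meeting_id),
        headers={"Authorization": f"Bearer {zoom_token}"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    for participant in data.get("participants", []):
        event = urllib.request.Request(
            SPLUNK_HEC,
            data=hec_event(participant, meeting_id),
            headers={"Authorization": f"Splunk {hec_token}"})
        urllib.request.urlopen(event)
```

Run `push_qos` on a schedule (cron, or a Splunk scripted input like the ZoomLiveMeetingQuality.py mentioned above) and the trend data lands in Splunk for dashboards and prediction.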
Navigate to marketplace.zoom.us and log in to your Zoom account
Click Develop > Build App
Follow the steps to create a Webhook Only App
Fill in the following App Information:
App Name
Short Description
Company Name
Developer Name
Developer Email Address
Click Continue.
Enable Event Subscriptions.
Click the Add new event subscription button.
Enter the following information:
Subscription Name (For example, Splunk)
Event notification endpoint URL (For example, https://splunkhf.mycompany.com:4443)
Click the Add events button.
Subscribe to any Webhook Events you want. See the Zoom Webhook Reference page for more information.
Click Save.
Click Continue.
Activate the Webhook Only App
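For anyone terminating the event notification endpoint themselves rather than on a Splunk HF, a minimal receiver might look like the following. The port and the Authorization-header verification token are placeholders; adjust to however your app is configured:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

VERIFICATION_TOKEN = "replace-me"  # placeholder; use your app's verification token

def handle(auth_header, body):
    """Validate the token and extract the event name; returns (status, event)."""
    if auth_header != VERIFICATION_TOKEN:
        return 401, None
    payload = json.loads(body or b"{}")
    return 200, payload.get("event")

class ZoomWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, event = handle(self.headers.get("Authorization"),
                               self.rfile.read(length))
        if event:
            print(event)  # e.g. "meeting.alert"; forward the payload to Splunk here
        self.send_response(status)
        self.end_headers()

# To run: HTTPServer(("0.0.0.0", 4443), ZoomWebhook).serve_forever()
```

This only acknowledges and logs events; in practice you would forward the JSON body on to Splunk (for example via HEC) instead of printing it.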
The question is: what are the Zoom IPs to whitelist? Currently we are seeing 52.202.62.224, 52.202.62.227, and 52.202.62.196. Is that the whole list?
Please advise.
Thanks
Hi Mike, could you please share which security device you chose for the rule whitelisting the Zoom IPs and the connection to the Splunk HF on port 4443, that is, was it on the firewall or on the proxy server?
Did you also need to add a DNS record for the Splunk HF to map to splunkhf.mycompany.com?
@bdeanindy I've been looking through this thread and wanted to check the ideal thresholds in your table for Latency and Packet Loss. These values (Latency < 300 ms versus < 150 ms, and Packet Loss < 20% versus < 2%) are quite different from what is listed in the following link. Is there a reason for this, or is there something I'm missing?
Thanks for reaching out about this. While these values are estimates, the ones listed earlier in this thread date back to 2018, so I would recommend deferring to the values in that article.
If you have further questions about these values or how they were derived, please feel free to reach out to our Technical Support team as well, as they’re the experts in this domain.
Would it be possible for you to reach out to your tech team and post their response on how these values were derived, please? Also, are these the thresholds consulted when you issue meeting alerts (the Meetings Alert Webhook)?
This would help make this a self-contained thread that would be very useful to future visitors and we’d reduce the load on your tech support (avoid prompting them to answer the same questions repeatedly to different individuals).