[urgent] Connectivity & data center issues in Tokyo region

Hi,

We have received a large number of reports from our users that they are having trouble connecting to meetings from within our application.
This started an hour ago and is still ongoing.

After checking our monitoring platform, it appears (based on our past experience) that the issues come from the Zoom network & data center hosting.

Description of the issue

  1. When joining a Zoom meeting with the Web SDK, users get an error about a WebSocket connection issue, which resets the web client.

  2. We also noticed that almost none of our meetings are hosted in Japan, while 100% of them should be and usually are. Instead, meetings are hosted in either Singapore or the US, which are the two other regions we selected in our account settings for the gateway ping selection.

Javascript error

Additional information

There does not seem to be any real issue with the gateway itself: the ping time looks good. But when trying to join on the Tokyo DC (since the Tokyo gateway is the fastest to reply), users get the error mentioned above.
When they finally manage to join the meeting, they actually connect to a Singapore or US gateway / data center, even though the Tokyo gateway was the fastest to reply.
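
To make the mismatch described above easier to check, here is a minimal sketch of how it could be detected from join records. The record layout (`user`, `ping_ms`, `connected_region`), the region codes and the sample values are all hypothetical and made up for illustration; they are not Zoom's schema or our real data.

```python
from collections import Counter


def find_region_mismatches(joins):
    """Return (user, expected, actual) for joins that did not land on the fastest region."""
    mismatches = []
    for join in joins:
        # Ignore failed pings (None) when picking the fastest gateway.
        pings = {region: ms for region, ms in join["ping_ms"].items() if ms is not None}
        if not pings:
            continue
        fastest = min(pings, key=pings.get)
        if join["connected_region"] != fastest:
            mismatches.append((join["user"], fastest, join["connected_region"]))
    return mismatches


# Made-up sample records, for illustration only.
sample_joins = [
    {"user": "a", "ping_ms": {"JP": 50, "SG": 120, "US": 180}, "connected_region": "SG"},
    {"user": "b", "ping_ms": {"JP": 48, "SG": 115, "US": None}, "connected_region": "JP"},
    {"user": "c", "ping_ms": {"JP": 52, "SG": 130, "US": 175}, "connected_region": "US"},
]

mismatches = find_region_mismatches(sample_joins)
# Summarize where the mismatched joins actually ended up.
print(Counter(actual for _, _, actual in mismatches))
```

A non-empty result means users were connected somewhere other than the gateway that answered fastest, which is the symptom we are reporting.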

@donte.S Can you please investigate this ASAP?

Checking a few meetings that were hosted in Singapore instead of Tokyo, all users faced the issue multiple times, not being able to connect at all.

When checking ping times, they all look normal, with Tokyo replying in a few milliseconds (50 ms) as usual.
Although the users could not actually join the Zoom meeting, the Zoom meeting dashboard shows that some attempts (not all) went far enough in the connection process for you to count the users as being in the meeting.

Looking at the entries recorded on the Zoom meeting dashboard, they all appear to connect to either a Singapore or a US data center and gateway, which should not happen.

Also, we found that the ping to the New York gateway is always failing, but this is because you deactivated it ahead of the maintenance you are planning on the New York data center tomorrow.
Could this be causing the issues we have? We are not sure, but it is something to investigate.

To be clearer about how this is affecting our users: a meeting is a 45-minute lesson. Users have to try many times before they are able to join the meeting. Some users do not manage to connect properly even once within these 45 minutes, which means no meeting at all.

When they are lucky enough that all participants can connect to the meeting, the quality is poor because they are all going through different regions / data centers / gateways, which increases latency and degrades both audio & video quality.

In conclusion, under these conditions, 0% of our meetings can be conducted.

I have just tested it and I was able to replicate it 4 out of 5 times I tried to join a zoom meeting. Here are the errors I got on my Chrome browser console.

I got the web socket connection error as soon as it tries to join the Zoom meeting. I have already sent the meeting IDs I used during testing via the Zoom Support Ticket we submitted a while ago.

Here is the Zoom support ticket URL
https://support.zoom.us/hc/en-us/requests/14271009

We look forward to your reply as soon as possible, as this issue has been affecting almost all of our lessons for the past 3 hours and is causing revenue loss.

Thanks,
Lara

This started around 6:10 p.m. JST and is still ongoing.

This graph shows how many times per minute users hit the issue, from when it started until the end of our business day today.
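
For reference, a per-minute count like this can be derived from raw error timestamps roughly as follows. This is only a sketch: the function name, the log format (ISO 8601 strings) and the sample timestamps are assumptions made up for illustration, not our actual monitoring data.

```python
from collections import Counter
from datetime import datetime


def errors_per_minute(timestamps):
    """Count error events per minute from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        counts[minute.strftime("%Y-%m-%d %H:%M")] += 1
    # Sort by minute so the result can be plotted directly.
    return dict(sorted(counts.items()))


# Made-up timestamps (placeholder date), for illustration only.
sample_errors = [
    "2000-01-01T18:10:05",
    "2000-01-01T18:11:02",
    "2000-01-01T18:10:40",
]

per_minute = errors_per_minute(sample_errors)
print(per_minute)
```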

We really hope this will be fixed before tomorrow 7:00 a.m. JST.

Yesterday, we temporarily removed Japan from the list of regions pinged by the Web SDK on our production account in order to mitigate this issue.
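
Conceptually, the effect of this mitigation can be sketched as below. This is not the Web SDK API or our actual configuration: `pick_region`, the region codes and the ping values are hypothetical, and the sketch only assumes that the fastest responding region among the allowed ones is selected.

```python
def pick_region(ping_ms, excluded=frozenset()):
    """Return the fastest responding region that is not excluded, or None."""
    candidates = {
        region: ms
        for region, ms in ping_ms.items()
        # A failed ping is represented as None and is never a candidate.
        if ms is not None and region not in excluded
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Made-up ping results in milliseconds, for illustration only.
pings = {"JP": 50, "SG": 120, "US": None}

print(pick_region(pings))                   # normal behaviour
print(pick_region(pings, excluded={"JP"}))  # with Japan removed from the list
```

With Japan excluded, the next fastest region is chosen, so users trade higher latency for a connection that actually succeeds.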

This morning, testing the Tokyo gateway & DC again from our staging account, we can no longer reproduce the issue. TY is correctly pinged again and our meetings are hosted in Tokyo, the same as before the incident.

So:

  • Did you fix anything? If yes, could you please tell us what happened and how you fixed it?
  • If not, there is no guarantee this won’t happen again soon, so please continue to investigate the issue.

We saw the same symptoms again, for one user only, around 10:44 a.m. JST.

But this time it looks like multiple gateways went wrong.