Uptick in recording.started webhook delays

Thanks Tommy.

Could you give a bit more data on the times when you’re seeing these delays? I’ve taken a look at the response times on our webhook ingestion service for the past week:

As you can see, we do have some outliers with response times close to 3s, and we’ll try to get those down. But our 99th percentile across the board is under 1s. If you can provide more data, such as time ranges, we can dig in a bit more.

Hi @Tommy, we’re seeing another significant uptick in delayed webhooks this morning, particularly recording.* and meeting.participant_*. I can provide more data if it’s helpful.

Confirmed here with meeting.participant_*, biggest uptick to date with events showing a full hour later. Happy to give UUIDs too.

The outage appears to be continuing. >40 minute delays for these webhooks.

The webhook outage appears to have now cleared up, I’m assuming as US east coast people go offline. Any update on the status here @Tommy? Should we communicate with our users that we expect it to be better tomorrow or should we expect a similar outage?

Hey @BenS, I don’t see any contact info in your profile, but if you’d like to compare data at some point, feel free to email me at ryan@grain.co.

Hey @ryan, @BenS,

These delays are due to peak usage. We are working to fix the issue.

Thanks,
Tommy

@tommy

I just created this and updated it with fresh data:

The issue is still very much ongoing. On the plus side, no major delay in the past few days. Events still routinely take several minutes to be sent.

As recommended, I started keeping track of how long our server takes to process the events to make sure they are below 3 seconds (the second sheet). None were above 3s in the past few days, but they can definitely be close enough that we’ll most likely see one at some point. I do intend to investigate further, but I also think this is a separate issue.
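For anyone wanting to track the same thing, here is a minimal sketch of timing a webhook handler against the 3-second threshold mentioned above. The names `handle_with_timing`, `near_limit`, `timings`, and the 80% warning fraction are hypothetical and not part of any Zoom SDK; a real setup would likely write the samples to a spreadsheet or metrics store instead of a list.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

LIMIT_SECONDS = 3.0   # response-time threshold discussed in this thread
WARN_FRACTION = 0.8   # hypothetical: flag anything above 80% of the limit

# (event name, seconds spent in the handler), kept in memory for illustration
timings: List[Tuple[str, float]] = []


def handle_with_timing(
    event: Dict[str, Any],
    handler: Callable[[Dict[str, Any]], None],
) -> float:
    """Run the handler and record how long it took, even if it raises."""
    start = time.perf_counter()
    try:
        handler(event)
    finally:
        elapsed = time.perf_counter() - start
        timings.append((event.get("event", "unknown"), elapsed))
    return elapsed


def near_limit(
    samples: List[Tuple[str, float]],
    limit: float = LIMIT_SECONDS,
    fraction: float = WARN_FRACTION,
) -> List[Tuple[str, float]]:
    """Return the samples that took at least `fraction` of the limit."""
    return [(name, secs) for name, secs in samples if secs >= limit * fraction]
```

Calling `near_limit(timings)` periodically gives a short list of events that came close to the threshold, which makes it easier to tell slow processing on the receiving side apart from late delivery.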

Please let me know if you need more IDs, or anything that can help you get these events to us without several minutes of delay.

For context, we are creating a virtual space layer below Zoom (much like Remo, Gather Town and all the other products filling this virtual space niche), and the timely delivery of events is critical to representing an accurate map of people’s locations.

We might open source the project for others wanting to leverage Zoom in this way.

Thank you as always for your attention to this.

What’s the plan for solving these delays @tommy?

The reason I ask is that many apps, including ours, depend on the timely arrival of webhooks. With the issue ongoing for months, when can we actually expect it to be fixed? Our customers depend on it.

Hey @BenS, @jimig,

We have improved the performance of the Webhooks. Are you still seeing delays?

Thanks,
Tommy

@tommy I’m afraid so, do you want any IDs? I just updated the spreadsheet and it shows the usual 1 to 3 minutes delay.

Please let me know if I can do anything to help figure it out.

I’m seeing similar stats to @BenS, spikes of 2-4 minutes around the top of the hour:

Hey @ryan, @BenS,

Thanks for the additional info. We are looking into this and will get back to you.

-Tommy

We are also still seeing delays and this is adversely affecting the performance of our app. Time after time we are having to explain to customers why our app fails and it does not look good on either of us.

Here is an example from yesterday.

[2020-09-08 16:28:59] [Zoom] Webhook meeting.created {"event":"meeting.created","payload":{"account_id":"orHJZnGMSmOlh38YXIGneA","operator":"xxx","operator_id":"4aLd114pS3-1p_sOZ8hiOw","object":{"uuid":"Lq+xLAc0RIeSzjT8nU+FAQ==","id":88144547509,"host_id":"4aLd114pS3-1p_sOZ8hiOw","topic":"yyy","type":2,"start_time":"2020-09-08T16:15:00Z","duration":60,"timezone":"America/New_York"}}}

[2020-09-08 16:28:58] [Zoom] Webhook recording.started {"event":"recording.started","payload":{"account_id":"orHJZnGMSmOlh38YXIGneA","object":{"uuid":"Lq+xLAc0RIeSzjT8nU+FAQ==","id":88144547509,"host_id":"4aLd114pS3-1p_sOZ8hiOw","topic":"xxx","type":2,"start_time":"2020-09-08T16:22:54Z","timezone":"America/New_York","duration":60,"recording_file":{"recording_start":"2020-09-08T16:23:39Z","recording_end":""}}}}

Not only are the events 5 minutes late, they are also out of sequence. Will this ever be fixed? We would rather have an honest answer so we can start to re-architect our systems. Bear in mind we’ve been waiting for several months.
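The delay in a log line like the recording.started one above can be computed by comparing the receive timestamp with `recording_start` in the payload. The following is a rough sketch only: it assumes the bracketed log timestamp is in UTC and that lines follow the `[timestamp] [Zoom] Webhook <event> <json>` layout shown above. The function names are hypothetical.

```python
import json
import re
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

# Matches lines such as: [2020-09-08 16:28:58] [Zoom] Webhook recording.started {...}
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[Zoom\] Webhook (?P<event>\S+) (?P<body>\{.*\})$"
)


def parse_log_line(line: str) -> Tuple[datetime, str, Dict[str, Any]]:
    """Split a log line into (received time, event name, decoded payload)."""
    match = LOG_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized log line")
    # Assumption: the log timestamp is UTC; adjust if the server logs local time.
    received = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return received, match["event"], json.loads(match["body"])


def recording_started_delay(line: str) -> float:
    """Seconds between recording_start in the payload and the log timestamp."""
    received, event, body = parse_log_line(line)
    if event != "recording.started":
        raise ValueError("expected a recording.started line, got " + event)
    started = body["payload"]["object"]["recording_file"]["recording_start"]
    started_at = datetime.strptime(started, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return (received - started_at).total_seconds()
```

Under the UTC assumption, the recording.started line above gives 319 seconds (16:23:39 to 16:28:58), which matches the roughly 5 minutes reported.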

Delays are pretty significant today, now trending into double-digit minutes and holding steady.

We have three cases from yesterday evening (UK time) with a huge delay, and this is crucial for our customers. Keep in mind these are only the reported ones… It’s getting frustrating that this hasn’t been solved for a few months…

81183882713 - 12 minutes
88225683678 - 5 minutes
82058445302 - 4 minutes

@tommy, could we get some kind of an update here, please? We haven’t heard from you in a while…

We’re seeing sustained double-digit-minute delays again today.

Hey @ryan, @jimig, @plamen.hristanov,

Which webhooks are you experiencing delays with?

-Tommy

Generally across the board aside from meeting.started and meeting.ended, but the ones that are most impactful to our application are recording.started and recording.stopped.