How to protect against losing refresh_token response

Description
The response carrying a new refresh_token may be lost in transit due to a network failure.

Error
The Zoom OAuth2 documentation https://marketplace.zoom.us/docs/guides/auth/oauth#refreshing
says “The latest refresh token must always be used for the next refresh request.”
So if we request a new access token, Zoom receives the request and issues new tokens, but a network failure drops the response on the way back, we lose the new refresh token. The integration then stops working, through no fault of the user.

This would create a bad experience for the user: they would be unauthenticated, and would either stop receiving updates or be asked to re-authenticate. They may associate the failure with both Zoom and our application.

Do you have any workarounds that allow retrying the token refresh, or do you plan to add an idempotent version of the API so that we can protect against network errors?

Which App Type (OAuth / Chatbot / JWT / Webhook)?
OAuth

Which Endpoint/s?
/oauth/token?grant_type=refresh_token

How To Reproduce (If applicable)
Steps to reproduce the behavior:

  1. Configure OAuth App
  2. Create a fake network error when refreshing a token
  3. The app is now unauthorized, and it is impossible to retry the request
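
To make step 2 concrete, here is a minimal in-memory sketch of the failure. It does not call Zoom; `RotatingAuthServer` and `refresh_over_flaky_network` are hypothetical stand-ins that only assume the behavior described above (the old refresh token becomes invalid as soon as a new one is issued).

```python
import secrets


class RotatingAuthServer:
    """Toy stand-in for an OAuth server that rotates refresh tokens on every use."""

    def __init__(self):
        self.valid_refresh_token = secrets.token_hex(8)

    def refresh(self, refresh_token):
        if refresh_token != self.valid_refresh_token:
            raise PermissionError("invalid_grant")
        # The old token is invalidated the moment the new one is issued.
        self.valid_refresh_token = secrets.token_hex(8)
        return {
            "access_token": secrets.token_hex(8),
            "refresh_token": self.valid_refresh_token,
        }


def refresh_over_flaky_network(server, refresh_token, drop_response):
    """Call the server, optionally losing the response on the way back."""
    response = server.refresh(refresh_token)  # the server has already rotated
    if drop_response:
        raise ConnectionError("response lost in transit")
    return response


def reproduce():
    server = RotatingAuthServer()
    stored = server.valid_refresh_token  # the only token the client has saved
    outcomes = []
    try:
        refresh_over_flaky_network(server, stored, drop_response=True)
    except ConnectionError:
        outcomes.append("lost")
    try:
        # Retry with the stored token, which the server no longer accepts.
        refresh_over_flaky_network(server, stored, drop_response=False)
        outcomes.append("recovered")
    except PermissionError:
        outcomes.append("invalid_grant")
    return outcomes
```

Calling `reproduce()` shows the first attempt failing at the transport level and the retry being rejected, which is the state described in step 3.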

Screenshots (If applicable)

Additional context

Hey @engineering1,

The type of network error you are referring to is very rare. If you do see this network issue happening, we can discuss how to add a margin to your refreshes.

In the meantime, if this happens, you can simply direct the user to the install URL, which will re-authenticate them automatically and give you a new set of tokens.
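
On the client side, that fallback could look roughly like the sketch below. Everything named here is an assumption for illustration: `INSTALL_URL` is a placeholder, `refresh_call` stands for whatever function your app uses to hit the token endpoint, and `RefreshRejected` is a hypothetical exception your HTTP layer would raise when the endpoint rejects the refresh token.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical placeholder; use the install URL of your own Marketplace app.
INSTALL_URL = "https://example.com/zoom/install"


class RefreshRejected(Exception):
    """The token endpoint rejected the refresh token (e.g. invalid_grant)."""


@dataclass
class RefreshResult:
    tokens: Optional[dict] = None
    reauth_url: Optional[str] = None
    retry_later: bool = False


def try_refresh(refresh_call: Callable[[str], dict], refresh_token: str) -> RefreshResult:
    """Attempt a refresh and classify the outcome for the caller."""
    try:
        return RefreshResult(tokens=refresh_call(refresh_token))
    except ConnectionError:
        # Outcome unknown: the request may or may not have reached the server.
        # A later retry either succeeds or is rejected, which settles it.
        return RefreshResult(retry_later=True)
    except RefreshRejected:
        # The stored token is no longer usable; only re-authentication helps.
        return RefreshResult(reauth_url=INSTALL_URL)
```

The point of separating the two error paths is that a transport error alone does not prove the token was rotated, so the user is only sent to the install URL once the endpoint has actually rejected the stored token.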

Let me know if that makes sense! :slight_smile:

Thanks,
Tommy

The approach of invalidating refresh_token looks like a protocol flaw. The protocol stops being idempotent, and users may suffer for reasons that are entirely outside the control of either party.

Other apps (Google, Asana, Jira, Figma etc.) never invalidate refresh_token, they just return the same refresh_token which was passed (or don’t return it at all), exactly for the explained reasons.

We could’ve also said that “packet loss in TCP is rare”, but there is still a retry mechanism implemented in the protocol. Instead of implementing TCP sequence numbers and retransmits, they could’ve just asked the user to reconnect, right? Not quite. This logic of something being “rare” doesn’t work for protocols.

Tommy, may I ask you to escalate this question to your software engineers for future consideration, please? It may actually impact your client base: imagine all the integrated apps losing all their refresh_tokens due to some temporary problem in your datacenter. There would be significant user churn in that case (not everyone will follow the install URL).

In the OAuth 2.0 RFC (RFC 6749), it’s said that if the API wants to issue a new refresh_token, it MAY invalidate the old one (i.e. there is no requirement to invalidate it). If refresh_token is so transient, then what’s the theoretical difference between access_token and refresh_token? They both play very similar roles.

Hello @DimitriKo
You’re correct, concurrency issues on distributed systems may arise, and in cases where network connectivity is poor (on mobile in a poor coverage area, or when the network switches between WiFi and cellular), the probability of this edge case occurring increases.

However, we don’t want to increase the replay attack vulnerability either (which is likely the driver behind our engineering team’s implementation here, but that’s not been verified by our engineers…so take it with a grain of salt please).

All this being said, I understand the issue you’ve described.

Here are a couple of solutions that come to mind that our engineering team “could” implement (please note, these are just my ideas; it doesn’t mean engineering will agree to or adopt any of them):

  • Only invalidate original refresh_token AFTER the new access_token has made at least one request (indicating to Zoom the application successfully received the refreshed tokens)

  • Add a time-delay grace period (perhaps 60 or 120 seconds) before invalidating the refresh_token, allowing the application to re-execute the refresh flow if the refreshed tokens are not received within a specific time.

  • Implement an “offline” access_type similar to Google, but that would require some serious refactoring of some critical systems methinks.
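
As an illustration of the grace-period idea only, here is a toy server-side sketch. It is not Zoom’s implementation; `GracePeriodTokenStore`, its fields, and the 60-second default are all hypothetical. A retry with the previous refresh token inside the window returns the token that was already issued, which makes the refresh idempotent for that window.

```python
import secrets
import time


class GracePeriodTokenStore:
    """Toy token store: the previous refresh token stays usable for a short window."""

    def __init__(self, grace_seconds=60, clock=time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.current = secrets.token_hex(8)
        self.previous = None  # token replaced by the most recent rotation
        self.previous_expires_at = 0.0

    def refresh(self, presented):
        now = self.clock()
        if presented == self.current:
            # Normal rotation: remember the old token for the grace window.
            self.previous = self.current
            self.previous_expires_at = now + self.grace_seconds
            self.current = secrets.token_hex(8)
            return self.current
        if presented == self.previous and now < self.previous_expires_at:
            # Retry after a lost response: return the token already issued
            # instead of rotating again.
            return self.current
        raise PermissionError("invalid_grant")
```

The trade-off is the one raised above: for the length of the window, a captured old refresh token can also be replayed, so a shorter window limits that exposure at the cost of fewer successful retries.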

We will raise this issue and see what engineering has to say on this matter, and share what we learn.

Thanks @bdeanindy for the explanation and for reaching out to your team. Please follow up when you can. As @DimitriKo mentioned, there is currently a significant risk, not just of applications like ours losing the connection, but also to the reputation of Zoom’s marketplace. A single incident might have an outsized impact and hurt users’ goodwill.

At Slapdash, we like your focus on “quality over quantity” in the Zoom marketplace, and we hope this feedback helps make the experience even more reliable for your customers.

Hey @engineering1,

After speaking with our engineers, the way we will currently handle this is that if a refresh token is lost, we will increase your token expiry tolerance on a case-by-case basis.

Thanks,
Tommy

Hi @tommy - thanks for reaching out. Can you provide more details about the process you have in mind for this case-by-case basis? It sounds like a manual and slow process, and meanwhile users might not be able to get value out of Zoom Marketplace apps. We’re also worried about the security implications of even communicating these cases.

Hey @engineering1,

You would reach out here or at developersupport@zoom.us.

We are discussing increasing the token expiry tolerance for all OAuth apps.

In the meantime, please reach out if you have this issue and we will be happy to help.

Thanks,
Tommy