Creating a transcription service for Zoom


I am working on a tool that can take the live audio (maybe also video?) stream from a Zoom call and provide an augmented transcript with extra information about topics and keywords.

I am aware that one non-streaming solution would be to work with recordings, but this is not what I need.

I have seen that there is an API for providing closed captions (CC) to Zoom, but I can’t figure out how to access the audio stream from the call.

A post that seems to describe something similar is this one: Questions about creating a custom streaming service for zoom meetings

Does anyone have experience with this? It seems like accessing raw audio/video should be a common use case.

Thank you!


Hi @gurumov92,

There are currently two ways to access the audio stream of a meeting:

  • RTMP live streaming: By setting up a custom live streaming service, you can receive the audio/video streams from a given meeting. This approach is ideal if you don’t need to access each user’s individual feeds, as all of the audio will be in one stream.
  • Raw recording: Built on top of the desktop SDK’s local recording feature, you can join a meeting through the SDK and initiate a raw recording to access each user’s individual audio streams. This requires that an instance of the macOS or Windows Meeting SDK be in the meeting and have permission to start a local recording. For a high-level overview of how to set up this flow, see our documentation.
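
As a rough illustration of the RTMP option, here is a minimal Python sketch of the receiving side. It is not an official Zoom sample. It assumes that ffmpeg is installed, that the meeting's custom live stream is configured to push to an RTMP URL you control, and that your transcription engine accepts 16 kHz mono 16-bit PCM. The function names and the sample rate are hypothetical choices made for this example.

```python
import io
import subprocess
from typing import Iterator, List

SAMPLE_RATE = 16_000   # Hz; assumed input rate of the transcription engine
BYTES_PER_SAMPLE = 2   # 16-bit signed PCM, mono


def build_ffmpeg_command(rtmp_url: str) -> List[str]:
    """Build an ffmpeg command that waits for an RTMP push and writes raw PCM to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-listen", "1",            # act as the RTMP server and wait for a publisher
        "-i", rtmp_url,
        "-vn",                     # drop the video track, keep audio only
        "-ac", "1",                # downmix to mono
        "-ar", str(SAMPLE_RATE),   # resample
        "-f", "s16le",             # raw little-endian 16-bit samples, no container
        "pipe:1",                  # write to stdout
    ]


def read_chunks(stream: io.BufferedIOBase, seconds: float = 1.0) -> Iterator[bytes]:
    """Yield fixed-duration PCM chunks until the stream ends (last chunk may be shorter)."""
    chunk_size = int(SAMPLE_RATE * seconds) * BYTES_PER_SAMPLE
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_audio(rtmp_url: str, seconds: float = 1.0) -> Iterator[bytes]:
    """Run ffmpeg and yield PCM chunks from the incoming live stream."""
    proc = subprocess.Popen(build_ffmpeg_command(rtmp_url), stdout=subprocess.PIPE)
    try:
        yield from read_chunks(proc.stdout, seconds)
    finally:
        proc.terminate()
        proc.wait()
```

You would then iterate over something like `stream_audio("rtmp://0.0.0.0:1935/live/zoom")` (a placeholder URL) and pass each chunk to your speech-to-text engine. Keep in mind that this gives you the mixed audio of the whole meeting; per-participant audio requires the raw recording route.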