Real-time per-user audio/transcript extraction and sending real-time audio into the meeting

We are exploring whether we can access real-time audio from ongoing Zoom meetings through Zoom's APIs. Zoom provides cloud recording, but that does not give us access to the audio in real time.

Our specific requirements are as follows:

  1. Real-Time Audio Access: We need to capture the audio data from ongoing meetings in real time for processing purposes.
  2. Real-Time Response Delivery: After processing the audio, we generate a response in audio format which needs to be sent back to the meeting. This response should be spoken within the meeting through a bot.

Kindly suggest an approach that does not rely on any third-party API.

Thank you!

Hi Xcelyst,

Thanks for reaching out!

To achieve real-time audio access and response delivery within Zoom meetings without third-party solutions, you’ll need to consider the following approaches:

1. Capturing Real-Time Audio (Options Available):

To access real-time audio from Zoom meetings, you have multiple options:

  • Building a Meeting SDK Bot (Recommended):
    • The native Meeting SDK for Linux lets a bot join the meeting, capture audio per participant, and send audio back into the meeting.
    • This approach provides the most control and flexibility for real-time processing.
  • Live Streaming Option:
    • You can live stream the meeting audio to a custom RTMP server for processing.
    • However, this method provides a mixed audio stream rather than per-participant audio.
  • Realtime Media Streams (RTMS):
    • Zoom has an RTMS offering, but its real-time audio features are not live yet, so this option is unavailable for now.
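
To illustrate what "per-participant audio" means on the processing side, here is a minimal Python sketch of how a bot might buffer raw audio separately for each speaker before handing it to a transcription or processing step. It is not Zoom SDK code: `ParticipantAudioBuffer`, `on_participant_audio`, and `on_chunk` are hypothetical names, and the 32 kHz, 16-bit mono PCM format is an assumption you should check against whatever your bot actually receives.

```python
from collections import defaultdict
from typing import Callable, Dict

SAMPLE_RATE_HZ = 32_000  # assumption: sample rate of the incoming PCM
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


class ParticipantAudioBuffer:
    """Accumulates raw PCM per participant and emits fixed-length chunks."""

    def __init__(self, chunk_ms: int, on_chunk: Callable[[int, bytes], None]) -> None:
        # Number of bytes that make up one chunk of `chunk_ms` milliseconds.
        self._chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
        self._on_chunk = on_chunk
        self._buffers: Dict[int, bytearray] = defaultdict(bytearray)

    def on_participant_audio(self, participant_id: int, pcm: bytes) -> None:
        """Hypothetical hook: call this from your bot's per-participant audio callback."""
        buf = self._buffers[participant_id]
        buf.extend(pcm)
        # Emit every complete chunk; keep the remainder for the next call.
        while len(buf) >= self._chunk_bytes:
            chunk = bytes(buf[: self._chunk_bytes])
            del buf[: self._chunk_bytes]
            self._on_chunk(participant_id, chunk)
```

With the RTMP option you would have a single mixed stream instead, so there would be only one buffer and no participant ID to key on.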

2. Sending Back Processed Audio (Options Available):

Once you’ve processed the audio, sending it back to the meeting requires one of the following approaches:

  • Native Meeting SDK (Preferred Option):
    • The same SDK bot that captures the audio can transmit the processed audio back into the meeting, where it is heard as the bot's own audio.
    • Because the bot controls what is sent and when, you get precise control over playback.
  • Zoom App (embedded in the Zoom client):
    • You can build an app that runs inside the Zoom client and plays the processed audio there.
    • However, with this approach your app is responsible for delivering the audio itself (e.g., translated audio based on each end user's preferences).
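
For the SDK bot option, the processed response usually has to be fed back in small, evenly paced frames rather than as one large buffer. Here is a minimal Python sketch of that framing step. It is not Zoom SDK code: `iter_frames`, `play_response`, and the `send_frame` callable are hypothetical, and the 32 kHz, 16-bit mono format and 20 ms frame size are assumptions.

```python
import time
from typing import Callable, Iterator


def iter_frames(pcm: bytes, sample_rate_hz: int = 32_000,
                frame_ms: int = 20, bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Split raw PCM into equal-length frames, padding the last one with silence."""
    frame_bytes = sample_rate_hz * bytes_per_sample * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        # Zero bytes are silence in signed 16-bit PCM.
        yield frame.ljust(frame_bytes, b"\x00")


def play_response(pcm: bytes, send_frame: Callable[[bytes], None],
                  frame_ms: int = 20,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Send frames at roughly real-time pace; returns the number of frames sent.

    `send_frame` is a hypothetical stand-in for whatever call your bot uses
    to push audio into the meeting.
    """
    sent = 0
    for frame in iter_frames(pcm, frame_ms=frame_ms):
        send_frame(frame)
        sleep(frame_ms / 1000)  # simple pacing; a real bot may need drift correction
        sent += 1
    return sent
```

The `sleep` parameter is injectable so the pacing can be replaced in tests or swapped for a clock-based scheduler later.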

Let me know if you need further details or guidance in setting up any of these solutions.

Best,
Naeem Ahmed