Need help on Zoom meeting bot

Hi, Zoom talents.
I am working on a Zoom meeting bot that can join a meeting and transcribe it per user.
I used zoom/meetingsdk-headless-linux-sample on GitHub (a demo on creating a headless meeting bot using the Zoom Meeting SDK for Linux and Docker) as boilerplate.
At first, I took the mixed audio and sent it to the Google STT API. This works for the content.
But I want to add the speaker's name to each line of the transcription, like:
Yaskiv: Hi, friend, how are you doing?
Mycola: I am good, what about you?

Please give me detailed technical advice.
Thanks

@yaskivartur0830 you should use the one-way audio, where each individual’s audio is returned to you separately as raw audio.
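
To make the one-way audio idea concrete, here is a minimal sketch in Python (not SDK code). It assumes your bot forwards each per-participant raw audio chunk, along with that participant's ID, to a handler like `on_audio`. The class name, the `transcribe` function, and the `names` lookup are all hypothetical placeholders for whatever STT client and participant map you actually use.

```python
from collections import defaultdict
from typing import Callable, Dict


class PerSpeakerTranscriber:
    """Buffers raw PCM per participant and transcribes each buffer separately."""

    def __init__(self, transcribe: Callable[[bytes], str], names: Dict[int, str]):
        self.transcribe = transcribe  # hypothetical STT function: PCM bytes -> text
        self.names = names            # participant ID -> display name (assumed known)
        self.buffers: Dict[int, bytearray] = defaultdict(bytearray)

    def on_audio(self, user_id: int, pcm: bytes) -> None:
        # Call this from wherever the per-participant audio arrives.
        self.buffers[user_id].extend(pcm)

    def flush(self, user_id: int) -> str:
        # Transcribe everything buffered for one participant and label it.
        pcm = bytes(self.buffers.pop(user_id, b""))
        if not pcm:
            return ""
        name = self.names.get(user_id, f"User {user_id}")
        return f"{name}: {self.transcribe(pcm)}"
```

Because each buffer only ever holds one person's audio, the speaker label comes for free and no diarization step is needed. When to call `flush` (on silence, on a timer, etc.) is a design choice left open here.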

Hey @yaskivartur0830,

It sounds like you’re generating a transcript already but are interested in diarizing it. We run meeting bots at scale to record/transcribe video conferences and so this is definitely something we are very familiar with - hopefully I can provide some guidance here!

Option 1: Linux SDK speaker changes

The key here is that you need to know:

  1. Which participant ID is speaking at any given time
  2. What the underlying name of a given participant ID is

Since you’re using the Linux SDK already, you’ll likely want to look into the onActiveSpeakerVideoUserChanged() callback, which will tell you when the active speaker changes.

Then, you can use the IMeetingParticipantsController’s GetUserByUserID method to get the underlying user’s info, including their display name. Once you have this, you’ll be able to map a given transcript utterance to its corresponding speaker label.
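
As a rough illustration of that mapping step, here is a small Python sketch (not SDK code). It assumes you have logged each active-speaker change as a `(timestamp, user_id)` pair, built a `names` dict from the participant lookups, and that your STT results carry a start time on the same clock. The function name and the data shapes are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def label_utterances(
    speaker_changes: List[Tuple[float, int]],  # (seconds, user_id) per speaker change
    names: Dict[int, str],                     # user_id -> display name
    utterances: List[Tuple[float, str]],       # (start seconds, text) from STT
) -> List[str]:
    """Prefix each utterance with whoever was the active speaker when it started."""
    changes = sorted(speaker_changes)
    times = [t for t, _ in changes]
    lines = []
    for start, text in utterances:
        # Index of the last speaker change at or before the utterance start.
        i = bisect_right(times, start) - 1
        if i < 0:
            speaker = "Unknown"  # utterance began before any speaker event
        else:
            user_id = changes[i][1]
            speaker = names.get(user_id, f"User {user_id}")
        lines.append(f"{speaker}: {text}")
    return lines
```

For example, with changes `[(0.0, 1), (4.2, 2)]`, names `{1: "Yaskiv", 2: "Mycola"}`, and utterances starting at 0.5 and 4.5 seconds, the first line is attributed to Yaskiv and the second to Mycola. Note that this only looks at the utterance start, so overlapping speech or a long utterance spanning a speaker change would be attributed to a single person.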

Option 2: Recall.ai

Another option is to use Recall.ai. It’s a simple third-party API that lets you use meeting bots to get raw audio/video, diarized transcriptions, and metadata from meetings in just a few lines of code.

Let me know if you have questions!

Hi @yaskivartur0830

I’m working on an open source API for creating Zoom Bots called Attendee. The source code shows how to transcribe meeting audio and add the speaker names. It uses the one-way audio streams instead of mixed audio.

Some relevant parts of the codebase:

Please let me know if you have any questions