Hey everyone, I’ve built a Linux app using the Zoom Linux SDK that joins Zoom meetings and records the audio of each participant separately.
I’m now facing 3 problems that I need solutions for:
1. Each participant gets a separate audio file, so a 30-minute meeting with 3 participants produces 3 audio files with a combined length of 90 minutes, which is expensive to transcribe.
2. I want to combine the transcripts of all the users from the different audio files, like:
User1: Hi
User2: Hey, what’s up?
3. I want timestamps for those transcripts as well.
P.S. I have my own ASR model for speech-to-text, but speaker diarization is currently missing, so I’m not looking for any 3rd-party service providers.
The final result I’m expecting is similar to how Zoom provides transcripts:
00:00:47.000 → 00:00:52.000
David: Hey, Good Morning
00:00:53.154 → 00:00:55.025
Goggins: Very Good Morning, David
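A minimal sketch of the combining step, assuming each participant's audio has already been transcribed into segments whose start and end times are measured in seconds from the start of the meeting. The `Segment` type and the `format_timestamp` and `merge_transcripts` helpers are hypothetical names for illustration, not part of any Zoom SDK. Because each file belongs to exactly one participant, the speaker label comes from the file itself and only the chronological interleaving remains.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # display name of the participant (assumed known per file)
    start: float   # seconds from the start of the meeting
    end: float     # seconds from the start of the meeting
    text: str      # ASR output for this stretch of audio


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def merge_transcripts(per_speaker: dict[str, list[Segment]]) -> str:
    """Interleave every speaker's segments in chronological order."""
    all_segments = [seg for segs in per_speaker.values() for seg in segs]
    # Sort by start time; ties are broken by end time.
    all_segments.sort(key=lambda seg: (seg.start, seg.end))
    lines = []
    for seg in all_segments:
        lines.append(f"{format_timestamp(seg.start)} → {format_timestamp(seg.end)}")
        lines.append(f"{seg.speaker}: {seg.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    per_speaker = {
        "Goggins": [Segment("Goggins", 53.154, 55.025, "Very Good Morning, David")],
        "David": [Segment("David", 47.0, 52.0, "Hey, Good Morning")],
    }
    print(merge_transcripts(per_speaker))
```

This only works if all files share one clock, so each recording's offset from the meeting start has to be captured when recording begins.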
@rakshithgowdahr if you are using raw recording, you will need to handle the timestamps on your side.
There might be a way for you to achieve this, but I can’t guarantee it will work.
This is a suggestion, and it does not represent the functionality of Zoom’s SDK.
Use the onOneWayAudioRawDataReceived callback to identify who is speaking, and record the speaking times as metadata.
However, you will need to handle overlapping audio.
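The bookkeeping this suggestion implies could look roughly like the sketch below. It is written in Python as a stand-in for the C++ callback and does not reproduce the SDK's actual signature: `SpeakingTracker`, `on_frame`, and `overlaps` are hypothetical names, the caller is assumed to supply a user label and a millisecond timestamp for each frame, the audio is assumed to be 16-bit little-endian mono PCM, and the loudness threshold and gap values are arbitrary placeholders that would need tuning.

```python
import math
import struct


def rms(pcm: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian mono PCM."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SpeakingTracker:
    """Collects per-user speaking spans as [start_ms, end_ms] pairs."""

    def __init__(self, threshold: float = 500.0, max_gap_ms: int = 700):
        self.threshold = threshold      # assumed loudness cut-off
        self.max_gap_ms = max_gap_ms    # pauses shorter than this stay in one span
        self.intervals: dict[str, list[list[int]]] = {}

    def on_frame(self, user: str, t_ms: int, pcm: bytes, frame_ms: int = 20) -> None:
        """Call once per audio frame received for a single user."""
        if rms(pcm) < self.threshold:
            return  # treat quiet frames as silence
        spans = self.intervals.setdefault(user, [])
        if spans and t_ms - spans[-1][1] <= self.max_gap_ms:
            spans[-1][1] = t_ms + frame_ms          # extend the current span
        else:
            spans.append([t_ms, t_ms + frame_ms])   # start a new span


def overlaps(a: list[list[int]], b: list[list[int]]) -> list[tuple[int, int]]:
    """Return the stretches where two users' spans overlap."""
    result = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                result.append((start, end))
    return result
```

Only the recorded spans would then need to be sent to the ASR model rather than each participant's full-length file, and `overlaps` flags the stretches where two people talk at once so they can be handled separately.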
Thanks @chunsiong.zoom,
I have a follow-up question on this.
I’ve built a Zoom bot using the Linux SDK to access meeting transcripts, but I don’t want to record raw audio and process it into transcripts myself. So I have 2 solutions in mind, and a few questions about them:
Can the Linux Zoom bot access Closed Captions or Live transcripts? I could not find support for Closed Captions in the Linux SDK, though it is present in the Mac and Windows SDKs.
If I build a Zoom Marketplace App and publish it, can it access Closed Captions or Live transcripts?
If the Linux SDK does not support ZoomSDKCloseCaptionController, when will support be added?
Can the Linux Zoom bot access Closed Captions or Live transcripts? I could not find support for Closed Captions in the Linux SDK, though it is present in the Mac and Windows SDKs.
No, it is not available on Linux.
If I build a Zoom Marketplace App and publish it, can it access Closed Captions or Live transcripts?
Not on Linux, but it is available on other platforms.
If the Linux SDK does not support ZoomSDKCloseCaptionController, when will support be added?
There are no plans to do so at the moment. If you could elaborate on the use case / user story, I can put in a feature request for this.
Thanks for the detailed explanation @chunsiong.zoom,
I have a couple more questions:
If I build a Zoom bot using the Windows SDK, can I access Closed Captions or Live transcripts with speaker identification?
Does the bot have to be the host to get this data, or can any participant access it?
Does the host have to enable Live transcripts for every meeting?
Can the bot enable Live transcripts from the bot’s account? A bot is like any other participant, so if the bot tries to enable Live transcripts and the meeting host grants permission, can I access those transcripts from my Windows SDK?
What about the Web SDK instead of the Windows SDK? Can the Web SDK do the things mentioned above, or do I still need the Windows SDK for that?
@chunsiong.zoom Sorry, the post was not updated; there are a few more questions left:
Can the bot enable Live transcripts from the bot’s account? A bot is like any other participant, so if the bot tries to enable Live transcripts and the meeting host grants permission, can I access those transcripts from my Windows SDK?
What about the Web SDK instead of the Windows SDK? Can the Web SDK do the things mentioned above, or do I still need the Windows SDK for that?
Can the bot enable Live transcripts from the bot’s account? A bot is like any other participant, so if the bot tries to enable Live transcripts and the meeting host grants permission, can I access those transcripts from my Windows SDK?
Yes, the bot can start the transcription service, and the bot can also receive those transcripts through the Windows Meeting SDK.
What about the Web SDK instead of the Windows SDK? Can the Web SDK do the things mentioned above, or do I still need the Windows SDK for that?
The Web SDK cannot programmatically start transcription, but it can receive the transcripts.
Hey @chunsiong.zoom,
Based on my research and understanding, I believe the Web SDK does receive transcripts but does not provide speaker diarization. Am I wrong? If speaker identification is supported in the Web SDK, can you point me to some good docs?
As Chun Siong said, the support for each platform varies:
Zoom Linux SDK
Does not support closed captions
Does not support live transcripts
Zoom Windows SDK
Supports closed captions
Supports live transcripts with speaker identification
Can start transcription service
Web SDK
Can receive transcripts
Cannot programmatically start transcription
Does not provide speaker diarization detail
These are all important implementation details to consider when deciding on how to build your bot solution.
Another alternative is to use Recall.ai for your meeting bots instead. It’s a simple 3rd party API that lets you use meeting bots to get raw audio/video from meetings without you needing to spend months to build, scale and maintain these bots. Recall also provides the transcripts from Zoom out-of-the-box without having to worry about these implementation details.
In the Windows Meeting SDK documentation, the StartLiveTranscription() method says: “If the meeting allows multi-language transcription, all users can start live transcription. Otherwise only the host can start it.”
Does this mean the Windows meeting bot (which is not the host) cannot start the transcription service? If so, what requirements or actions are needed for the bot to start the transcription service?
If transcription is disabled by the host, no one can start live transcription.
If transcription is enabled by the host, the bot can start the transcription service.
@chunsiong.zoom
Thanks, I’ve used the demo apps you’ve set up.
Currently, I’m using the CaptionDemo app and I have a few questions.
Can it start live transcription even though it’s not the host?
Also, for some reason, it’s not joining the meeting.
The meeting ID, password, and jwt_token are all correct.
@chunsiong.zoom
It’s not an external meeting; the meeting belongs to the same user who created the app on the Marketplace. The app is still in development, so we haven’t published it yet, but we will in the future. One more thing: using the same credentials, I’m able to join the meeting with the SDK demo app included in the Windows SDK.