Need help extracting transcripts from the audio files

rakshithgowdahr · March 12, 2024, 11:09am

Hey everyone, I’ve built a Linux app using the Zoom Linux SDK that joins Zoom meetings and records the audio of individual participants.
Now I’m facing three different problems that need a solution:

1. Because each participant’s audio is recorded to a separate file, a 30-minute meeting with 3 participants produces 3 audio files with a combined length of 90 minutes, which is expensive to transcribe.
2. I want to combine the transcripts of all users from the different audio files into a single conversation, like:
   User1: Hi
   User2: Hey, what’s up?
3. I want timestamps for those transcripts as well.

P.S. I have my own ASR model for speech-to-text, so I’m not looking for any third-party service providers; what it currently lacks is speaker diarization.
The final result I’m expecting is similar to the transcripts Zoom provides:

00:00:47.000 → 00:00:52.000
David: Hey, Good Morning

00:00:53.154 → 00:00:55.025
Goggins: Very Good Morning, David
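For problems 2 and 3, once each participant’s file has been through your ASR model, combining them is mostly a sort by time. Below is a minimal sketch in JavaScript (to match the Web SDK snippets later in this thread). It assumes your ASR returns segments shaped like `{ start, end, text }` with times in seconds, and that you know when each participant’s recording started relative to the meeting; the `startOffsets` map is a hypothetical input you would have to record yourself. The output uses the `-->` cue separator from WebVTT, which appears to be what the arrow in the example above stands for.

```javascript
// Sketch: merge per-participant ASR segments into one Zoom-style transcript.
// Assumed input (not an SDK type): { [speaker]: Array<{ start, end, text }> },
// with start/end in seconds relative to the start of that speaker's file.

// Format seconds as HH:MM:SS.mmm.
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const h = Math.floor(totalSec / 3600);
  const m = Math.floor(totalSec / 60) % 60;
  const s = totalSec % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// startOffsets (hypothetical): seconds between meeting start and the start of
// each speaker's recording; defaults to 0 when a speaker is missing.
function mergeTranscripts(transcriptsBySpeaker, startOffsets = {}) {
  const merged = [];
  for (const [speaker, segments] of Object.entries(transcriptsBySpeaker)) {
    const offset = startOffsets[speaker] ?? 0;
    for (const seg of segments) {
      merged.push({ speaker, text: seg.text, start: seg.start + offset, end: seg.end + offset });
    }
  }
  // Sort by start time, then end time; Array.prototype.sort is stable,
  // so exact ties keep their insertion order.
  merged.sort((a, b) => a.start - b.start || a.end - b.end);
  return merged;
}

// Render merged segments as "start --> end" cues followed by "Speaker: text".
function toZoomStyleTranscript(merged) {
  return merged
    .map((seg) => `${formatTimestamp(seg.start)} --> ${formatTimestamp(seg.end)}\n${seg.speaker}: ${seg.text}`)
    .join("\n\n");
}

// Example using the two lines from the post above.
console.log(toZoomStyleTranscript(mergeTranscripts({
  David: [{ start: 47.0, end: 52.0, text: "Hey, Good Morning" }],
  Goggins: [{ start: 53.154, end: 55.025, text: "Very Good Morning, David" }],
})));
```

Overlapping speech simply shows up as interleaved cues here; the sort does not split or merge segments, so overlap handling still has to happen elsewhere.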

chunsiong.zoom · March 14, 2024, 4:09am

@rakshithgowdahr if you are using raw recording, you will need to handle the timestamps on your side.

There might be a way for you to achieve this, but I do not guarantee it will work.
This is a suggestion, and it does not represent the functionality of Zoom’s SDK.

Use the onOneWayAudioRawDataReceived callback to identify who is speaking, and record the speaking times as metadata.

However, you will need to handle overlapping audio.
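The suggestion above could look roughly like the sketch below. The callback itself belongs to the C++ Meeting SDK, so this JavaScript version only shows the bookkeeping, under assumptions you would need to check against your build: each frame has already been converted to an `Int16Array` of mono 16-bit PCM for one user, you attach your own `timeSec` (frame start, seconds since meeting start) and `durationSec`, and `RMS_THRESHOLD` and `HANGOVER_SEC` are hypothetical tuning values. The resulting intervals give per-speaker timestamps, and `findOverlaps` flags the overlapping stretches. The same intervals could also be used to skip silent audio before running ASR, which may reduce the 90 minutes of processing from the original question.

```javascript
// Sketch: track who is speaking from per-user raw audio frames.
// Assumptions (not SDK behaviour): samples are mono 16-bit PCM in an
// Int16Array, and timeSec/durationSec are supplied by your own code.
const RMS_THRESHOLD = 500; // hypothetical loudness cutoff
const HANGOVER_SEC = 0.5;  // hypothetical silence needed to end an interval

// Root-mean-square amplitude of one frame.
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

class SpeakingTracker {
  constructor() {
    this.open = new Map(); // userId -> { start, lastVoiced }
    this.intervals = [];   // closed intervals: { userId, start, end }
  }

  onFrame(userId, samples, timeSec, durationSec) {
    const voiced = rms(samples) >= RMS_THRESHOLD;
    const cur = this.open.get(userId);
    if (voiced) {
      if (cur && timeSec - cur.lastVoiced < HANGOVER_SEC) {
        cur.lastVoiced = timeSec + durationSec; // still the same utterance
      } else {
        if (cur) this.close(userId, cur);        // gap too long: end the old one
        this.open.set(userId, { start: timeSec, lastVoiced: timeSec + durationSec });
      }
    } else if (cur && timeSec - cur.lastVoiced >= HANGOVER_SEC) {
      this.close(userId, cur);
      this.open.delete(userId);
    }
  }

  close(userId, cur) {
    this.intervals.push({ userId, start: cur.start, end: cur.lastVoiced });
  }

  // Close anything still open and return all intervals sorted by start time.
  finish() {
    for (const [userId, cur] of this.open) this.close(userId, cur);
    this.open.clear();
    return this.intervals.sort((a, b) => a.start - b.start);
  }
}

// Pairs of intervals from different users that overlap in time.
function findOverlaps(sortedIntervals) {
  const overlaps = [];
  for (let i = 0; i < sortedIntervals.length; i++) {
    const a = sortedIntervals[i];
    for (let j = i + 1; j < sortedIntervals.length; j++) {
      const b = sortedIntervals[j];
      if (b.start >= a.end) break; // sorted by start: no later b can overlap a
      if (a.userId !== b.userId) {
        overlaps.push({ users: [a.userId, b.userId], start: b.start, end: Math.min(a.end, b.end) });
      }
    }
  }
  return overlaps;
}
```

A voiced frame that arrives more than `HANGOVER_SEC` after the user’s last voiced frame starts a new interval, so gaps in frame delivery (for example while a user is muted) do not merge two separate utterances.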

rakshithgowdahr · March 14, 2024, 5:15am

Thanks @chunsiong.zoom,
I have a follow-up question on this.
I’ve built a Zoom bot using the Linux SDK to access meeting transcripts, but I don’t want to raw-record the audio and process it into transcripts myself, so I have two solutions in mind:

1. Can the Linux Zoom bot access closed captions or live transcripts? I could not find closed caption support in the Linux SDK, though it is present in the Mac and Windows SDKs.
2. If I build a Zoom Marketplace app and publish it, can it access closed captions or live transcripts?

If the Linux SDK does not support ZoomSDKCloseCaptionController, when will support be added?

chunsiong.zoom · March 14, 2024, 5:49am

@rakshithgowdahr

Can the Linux Zoom bot access closed captions or live transcripts? I could not find closed caption support in the Linux SDK, though it is present in the Mac and Windows SDKs.

No, it is not available on Linux.

If I build a Zoom Marketplace app and publish it, can it access closed captions or live transcripts?

Not on Linux, but it is available on other platforms.

If the Linux SDK does not support ZoomSDKCloseCaptionController, when will support be added?

There are no plans to do so at the moment. If you could elaborate on the use-case / user story, I can put in a feature request for this.


rakshithgowdahr · March 14, 2024, 6:07am

Thanks for the detailed explanation @chunsiong.zoom,
I have a few more questions:

If I build a Zoom bot using the Windows SDK, can I access closed captions or live transcripts with speaker identification?
Does the bot have to be the host to get this data, or can any participant access it?
Does the host have to enable live transcripts for every meeting?

Can the bot enable live transcripts from the bot’s own account? A bot is just like any other participant, so if the bot tries to enable live transcripts and the meeting host grants permission, can I access those transcripts from my Windows SDK app?

What about the Web SDK instead of the Windows SDK? Can the Web SDK do the things mentioned above, or do I still need the Windows SDK for that?

chunsiong.zoom · March 14, 2024, 6:54am

@rakshithgowdahr

If I build a Zoom bot using the Windows SDK, can I access closed captions or live transcripts with speaker identification?

Yes.

Does the bot have to be the host to get this data, or can any participant access it?

All participants have access to it; you do not need to be the host.

Does the host have to enable live transcripts for every meeting?

Yes, it needs to be enabled, and it can be enabled from the participant’s side.

rakshithgowdahr · March 14, 2024, 7:01am

@chunsiong.zoom Sorry, the post was not updated; there are a few more questions left:

Can the bot enable live transcripts from the bot’s own account? A bot is just like any other participant, so if the bot tries to enable live transcripts and the meeting host grants permission, can I access those transcripts from my Windows SDK app?

What about the Web SDK instead of the Windows SDK? Can the Web SDK do the things mentioned above, or do I still need the Windows SDK for that?

chunsiong.zoom · March 14, 2024, 7:38am

@rakshithgowdahr

Can the bot enable live transcripts from the bot’s own account? A bot is just like any other participant, so if the bot tries to enable live transcripts and the meeting host grants permission, can I access those transcripts from my Windows SDK app?

Yes, the bot can start the transcription service, and it can also receive those transcripts through the Windows Meeting SDK.

What about the Web SDK instead of the Windows SDK? Can the Web SDK do the things mentioned above, or do I still need the Windows SDK for that?

The Web SDK cannot programmatically start transcription. It can, however, receive the transcripts.

rakshithgowdahr · March 15, 2024, 5:37am

Hey @chunsiong.zoom,
Based on my research, I believe the Web SDK does receive transcripts but does not provide speaker diarization. Am I wrong? If speaker identification is supported in the Web SDK, can you point me to some good docs?

ZoomMtg.onCaptions(data => {
    console.log('Closed caption data:', data);
});

Sample data

{
    "sequence": 3,
    "type": 1,
    "lang": "en-US",
    "text": "Hello, this is an example of closed caption text."
}
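Since the payload carries no speaker or time field, it can help to store each message with a local receive time as it arrives. A small JavaScript sketch for payloads shaped like the sample above; it assumes (without confirmation from the SDK docs) that a later payload reusing a `sequence` number replaces the earlier text, and it passes `type` through untouched because its meaning isn’t described here.

```javascript
// Sketch: keep caption payloads (shaped like the sample above) in sequence
// order, stamped with a local receive offset. Assumption: a payload that
// reuses a `sequence` number replaces the earlier text for that number.
class CaptionLog {
  constructor(startedAtMs = Date.now()) {
    this.startedAtMs = startedAtMs; // e.g. the moment the bot joined
    this.bySequence = new Map();
  }

  add(data, receivedAtMs = Date.now()) {
    const prev = this.bySequence.get(data.sequence);
    // Keep the first receive time for a sequence so updates don't move it.
    const offsetSec = prev ? prev.offsetSec : (receivedAtMs - this.startedAtMs) / 1000;
    this.bySequence.set(data.sequence, { ...data, offsetSec });
  }

  // Entries sorted by sequence; note there is no speaker field to report.
  entries() {
    return [...this.bySequence.values()].sort((a, b) => a.sequence - b.sequence);
  }
}
```

In the page, this could be wired up with something like `ZoomMtg.onCaptions(data => log.add(data));`, creating `log` when the meeting is joined so that the offsets are relative to join time.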

chunsiong.zoom · March 15, 2024, 6:20am

@rakshithgowdahr ,

Closed captions are not transcription.

Closed captions are for a person to caption the meeting manually.

Transcription is a speech-to-text service, and it is most likely what you are looking for.

ZoomMtg.inMeetingServiceListener('onReceiveTranscriptionMsg', function (data) {
    console.log('onReceiveTranscriptionMsg', data);
});

The Web SDK does support receiving transcription, but you cannot programmatically start the transcription service from the Web SDK.

No, we do not provide speaker diarization details in this callback.

amanda-recallai · March 15, 2024, 10:29pm

Hey @rakshithgowdahr!

As Chun Siong said, support varies by platform:

Zoom Linux SDK
- Does not support closed captions
- Does not support live transcripts

Zoom Windows SDK
- Supports closed captions
- Supports live transcripts with speaker identification
- Can start the transcription service

Web SDK
- Can receive transcripts
- Cannot programmatically start transcription
- Does not provide speaker diarization details

These are all important implementation details to consider when deciding on how to build your bot solution.

Another alternative is to use Recall.ai for your meeting bots. It’s a simple third-party API that lets you use meeting bots to get raw audio/video from meetings without spending months building, scaling, and maintaining these bots yourself. Recall also provides Zoom transcripts out of the box, without your having to worry about these implementation details.

Let me know if you have any questions!

rakshithgowdahr · March 23, 2024, 4:43pm

Hi @chunsiong.zoom ,

In the Windows Meeting SDK documentation, the StartLiveTranscription() method says:
"If the meeting allows multi-language transcription, all users can start live transcription. Otherwise only the host can start it."
Does this mean the Windows meeting bot (which is not the host) cannot start the transcription service? If so, what requirements or actions are needed for the bot to start the transcription service?

chunsiong.zoom · March 25, 2024, 2:45am

@rakshithgowdahr

If the transcription is disabled by host, no one can start live transcription.
If the transcription is enabled by host, the bot can start the transcription service.

rakshithgowdahr · April 2, 2024, 8:10am

@chunsiong.zoom
Thanks, I’ve used the demo apps you’ve set up.
Currently, I’m using the CaptionDemo app and have a few questions.
Can it start live transcription even though it’s not the host?
Also, for some reason, it’s not joining the meeting.
The meeting ID, password, and JWT token are all correct.

chunsiong.zoom · April 2, 2024, 8:37am

@rakshithgowdahr error 63 means you are trying to join an external meeting without publishing your app.

You will need to publish your meeting SDK app if you want to join a meeting created by an external host.

rakshithgowdahr · April 2, 2024, 9:42am

@chunsiong.zoom
It’s not an external meeting; the meeting belongs to the same user who created the app on the Marketplace. The app is still in development, so we haven’t published it yet, but we will in the future. Also, using the same credentials, I’m able to join the meeting with the SDK demo app included in the Windows SDK kit.

chunsiong.zoom · April 2, 2024, 10:12am

I’ll PM you for more details.

rakshithgowdahr · April 2, 2024, 10:47am

@chunsiong.zoom Here are the details:

    {
        "sdk_jwt": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzZGtLZXkiOiI1UGd6bTFtTlFudXVRTHlWTEt3eW93IiwiYXBwS2V5IjoiNVBnem0xbU5RbnV1UUx5VkxLd3lvdyIsIm1uIjo4Mzc2MzI5OTczMywicm9sZSI6MCwiaWF0IjoxNzEyMDU0Mzk4LCJleHAiOjE3MTIwNjE1OTgsInRva2VuRXhwIjoxNzEyMDYxNTk4fQ.4c6lU725xS9tSYBosnoHodi0HXmHLF8IS3uLxlFkkc8",
        "meeting_number": 83763299733,
        "passcode": "redacted",
        "video_source": "Big_Buck_Bunny_1080_10s_1MB.mp4",
        "zak": ""
    }
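A quick way to rule out token problems before digging into error 63 is to decode the JWT payload and compare its claims with the join parameters; decoding the token above shows, among others, an `mn` claim carrying the meeting number and an `exp` expiry timestamp. A minimal Node.js sketch, assuming Node 16 or later for `base64url` support in `Buffer`. It only reads the claims and does not verify the signature, and `checkSdkJwt` is a hypothetical helper, not part of any Zoom SDK.

```javascript
// Sketch: read (not verify) the claims of a Meeting SDK JWT and compare them
// with the join parameters. Needs Node 16+ for "base64url" in Buffer.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected header.payload.signature");
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// Hypothetical helper: returns the claims plus a list of obvious problems.
function checkSdkJwt(token, meetingNumber, nowSec = Math.floor(Date.now() / 1000)) {
  const claims = decodeJwtPayload(token);
  const problems = [];
  if (typeof claims.exp === "number" && claims.exp <= nowSec) {
    problems.push("token expired");
  }
  if (claims.mn !== undefined && String(claims.mn) !== String(meetingNumber)) {
    problems.push("mn claim does not match meeting number");
  }
  return { claims, problems };
}
```

With the object above parsed as `details`, you would call `checkSdkJwt(details.sdk_jwt, details.meeting_number)`. An empty `problems` list only rules out these two issues; it says nothing about whether the app is allowed to join the meeting.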

chunsiong.zoom · April 2, 2024, 11:12am

@rakshithgowdahr could you try with a legacy Meeting SDK app and check whether this happens as well?

rakshithgowdahr · April 2, 2024, 12:14pm

@chunsiong.zoom I’ve tried all of the SDK versions below, and the result is the same for all of them:
5.15.7.20385
5.16.5.24346
5.17.5.31085
