Get captions in real time

Hey there,

I wanted to ask whether it is possible to get the live captions from a meeting. I am building a headless bot using the Linux SDK and want to retrieve the captions. Ideally, I would query every x seconds and get the captions added since the last query. Is there any way to do this?

Thanks

@noahviktorschenk it seems the Meeting SDK for Linux does not have this capability at the moment.

On the other hand, the Meeting SDK for Windows does have the IClosedCaptionControllerEvent, which provides you with captions and transcription.

@chunsiong.zoom yeah, okay. We have already built the application using the Linux SDK.

Am I correct in thinking that we would be able to get the live audio track and from that transcribe it ourselves using a third party?

If so, do you have any resources on how this could be implemented that you could point to?

@noahviktorschenk if you are looking at a live-stream type of scenario, you will need to:

  1. get the raw audio in PCM (this is provided by Zoom)
  2. convert the buffer/frames/stream into a format the 3rd party accepts
  3. send the converted audio frames/stream to the 3rd party service
  4. get back the transcribed captions from the 3rd party service

You will need to implement steps 2, 3, and 4 on your end.

Hey @noahviktorschenk ,

If you’re open to using a third party API, you could consider the Recall.ai API.

Here is the guide to get real-time transcription from Zoom with the Recall API: Real-Time Transcription

@chunsiong.zoom I will most likely need them as m4a, mp3, mp4, mpeg, mpga, wav, or webm in 10-second intervals.

Just to make sure, this is possible, correct?

Also, is the audio stream split up into the different participants or just one singular?

@noahviktorschenk , we only provide you with step 1.

You will need to convert the raw PCM audio to m4a, mp3, mp4, mpeg, etc. yourself.

There are 2 callbacks: one returns a separate audio stream for each individual participant, and the other returns a single mixed stream with everyone in it.

Okay, no problem.

Do you have some sample code or resources for how to implement step 1?

@noahviktorschenk

In this sample, there is a boolean variable GetAudioRawData in meeting_sdk_demo.cpp which shows some of the methods to call.

Once the audio has been subscribed, the callback can be found in ZoomSDKAudioRawData.cpp.

The sample code inside that class saves the audio to a PCM file:

void ZoomSDKAudioRawData::onMixedAudioRawDataReceived(AudioRawData* audioRawData)
{
	std::cout << "Received onMixedAudioRawDataReceived" << std::endl;
	//add your code here

	// Append the raw PCM frames to a file
	std::ofstream pcmFile;
	pcmFile.open("audio.pcm", std::ios::out | std::ios::binary | std::ios::app);

	if (!pcmFile.is_open()) {
		std::cout << "Failed to open PCM file" << std::endl;
		return;
	}

	// Write the audio data to the file
	pcmFile.write((char*)audioRawData->GetBuffer(), audioRawData->GetBufferLen());
	//std::cout << "buffer length: " << audioRawData->GetBufferLen() << std::endl;

	// Close the file (this also flushes it)
	pcmFile.close();
}

@chunsiong.zoom Ah, perfect. Thank you so much.

Just read through the documentation for the meeting SDK, and it has an IClosedCaptionController, which has a StartLiveTranscription function. Would these not work?

@noahviktorschenk
Yes, it will work, but only on Windows.
There is no such controller in the Linux SDK.