Hey there,
I wanted to ask whether it is possible to get the live captions from a meeting. I am building a headless bot using the Linux SDK and want to retrieve the captions. Ideally, I would poll every x seconds and get the captions produced since the last poll. Is there any way to do this?
Thanks
chunsiong.zoom
(Chun Siong (tag me for response))
August 7, 2024, 5:25am
2
@noahviktorschenk it seems the Meeting SDK for Linux does not have this capability at the moment.
On the other hand, the Meeting SDK for Windows does have IClosedCaptionControllerEvent, which provides you with captions and transcription.
@chunsiong.zoom yeah, okay. We have already built the application using the Linux SDK.
Am I correct in thinking that we would be able to get the live audio track and from that transcribe it ourselves using a third party?
If so, do you have any resources on how this could be implemented that you could point to?
chunsiong.zoom
(Chun Siong (tag me for response))
August 8, 2024, 2:59am
4
@noahviktorschenk if you are looking at a live-stream type of scenario, you will need to:
1. get the raw audio in PCM (this is provided by Zoom)
2. convert the buffer/frames/stream into a format the 3rd party service accepts
3. send the converted audio frames/stream to the 3rd party service
4. get back the transcribed captions from the 3rd party service
You will need to implement steps 2, 3, and 4 on your end.
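The hand-off between step 1 and the later steps can be sketched as a small thread-safe buffer: the audio callback appends raw PCM bytes, and a poller drains everything received since the previous poll. This is only a minimal sketch; `PcmAccumulator`, `append`, and `drain` are hypothetical names and are not part of the Zoom SDK.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper (not part of the Zoom SDK): collects raw PCM bytes
// from the audio callback and hands them out in bulk to a poller.
class PcmAccumulator {
public:
    // Called from the audio callback with each incoming buffer.
    void append(const char* data, std::size_t len) {
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_.insert(buffer_.end(), data, data + len);
    }

    // Called by the poller: returns everything received since the last
    // call and leaves the accumulator empty.
    std::vector<char> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<char> out;
        out.swap(buffer_);
        return out;
    }

private:
    std::mutex mutex_;          // guards buffer_ across threads
    std::vector<char> buffer_;  // PCM bytes not yet drained
};
```

In this arrangement the audio callback would call `append` with the buffer pointer and length it receives, and a separate timer thread would call `drain` every x seconds, then convert and send the result (steps 2 and 3).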
Hey @noahviktorschenk,
If you’re open to using a third party API, you could consider the Recall.ai API.
Here is the guide to get real-time transcription from Zoom with the Recall API: Real-Time Transcription
@chunsiong.zoom I will most likely need the audio as m4a, mp3, mp4, mpeg, mpga, wav or webm, in 10-second intervals.
Just to make sure, this is possible, correct?
Also, is the audio stream split up by participant, or is it a single combined stream?
chunsiong.zoom
(Chun Siong (tag me for response))
August 12, 2024, 1:57am
7
@noahviktorschenk, we only provide you with step 1.
You will need to convert the raw audio in PCM format to m4a, mp3, mp4, mpeg, etc.
There are 2 callbacks: one returns a separate audio stream for each participant, and the other returns a single mixed stream with everyone in it.
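Of the formats listed, WAV is the simplest target because it is just the PCM bytes with a 44-byte header in front. A minimal sketch follows, assuming the raw audio is 16-bit signed little-endian PCM; `pcmToWav` and `putLE` are hypothetical helpers, and the sample rate and channel count are parameters that should be taken from the SDK's audio data rather than guessed.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append the low `bytes` bytes of `value` to `out` in little-endian order.
static void putLE(std::vector<char>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical helper: wraps PCM bytes in a canonical 44-byte WAV header.
// Assumption: samples are 16-bit signed little-endian.
std::vector<char> pcmToWav(const std::vector<char>& pcm,
                           std::uint32_t sampleRate,
                           std::uint16_t channels) {
    const std::uint16_t bitsPerSample = 16;
    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    const std::uint32_t byteRate = sampleRate * blockAlign;
    const std::uint32_t dataSize = static_cast<std::uint32_t>(pcm.size());
    const std::string riff = "RIFF", wave = "WAVE", fmt = "fmt ", data = "data";

    std::vector<char> wav;
    wav.reserve(44 + pcm.size());
    wav.insert(wav.end(), riff.begin(), riff.end());
    putLE(wav, 36 + dataSize, 4);  // RIFF chunk size: rest of the file
    wav.insert(wav.end(), wave.begin(), wave.end());
    wav.insert(wav.end(), fmt.begin(), fmt.end());
    putLE(wav, 16, 4);             // fmt chunk size for plain PCM
    putLE(wav, 1, 2);              // audio format 1 = uncompressed PCM
    putLE(wav, channels, 2);
    putLE(wav, sampleRate, 4);
    putLE(wav, byteRate, 4);
    putLE(wav, blockAlign, 2);
    putLE(wav, bitsPerSample, 2);
    wav.insert(wav.end(), data.begin(), data.end());
    putLE(wav, dataSize, 4);       // number of PCM bytes that follow
    wav.insert(wav.end(), pcm.begin(), pcm.end());
    return wav;
}
```

Compressed formats such as mp3 or m4a need an encoder library or an external tool, so they are outside the scope of this sketch.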
Okay, no problem.
Do you have some sample code or resources for how to implement step 1?
chunsiong.zoom
(Chun Siong (tag me for response))
August 13, 2024, 4:34am
9
@noahviktorschenk
In this sample, zoom/meetingsdk-linux-raw-recording-sample on GitHub, there is a boolean variable GetAudioRawData in meeting_sdk_demo.cpp, which shows some of the methods to call.
Once the audio has been subscribed, the callback can be found in ZoomSDKAudioRawData.cpp. The sample code inside the class saves the audio into a PCM file:
void ZoomSDKAudioRawData::onMixedAudioRawDataReceived(AudioRawData* audioRawData)
{
    std::cout << "Received onMixedAudioRawDataReceived" << std::endl;
    // add your code here (requires <fstream> and <iostream>)
    static std::ofstream pcmFile;
    // Open in append mode so each callback adds to the same headerless PCM file
    pcmFile.open("audio.pcm", std::ios::out | std::ios::binary | std::ios::app);
    if (!pcmFile.is_open()) {
        std::cout << "Failed to open PCM file" << std::endl;
        return;
    }
    // Write the audio data to the file
    pcmFile.write((char*)audioRawData->GetBuffer(), audioRawData->GetBufferLen());
    // Log the length, not the buffer: the buffer is binary data, not a C string
    std::cout << "buffer length: " << audioRawData->GetBufferLen() << std::endl;
    // Flush pending bytes, then close the PCM file
    pcmFile.flush();
    pcmFile.close();
}
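For the 10-second intervals mentioned earlier in the thread, the collected PCM can be cut by byte count, since duration follows directly from sample rate, channel count, and sample width. The following is a minimal sketch assuming 16-bit samples; `splitPcm` is a hypothetical helper, not something provided by the sample repository.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits 16-bit PCM into chunks holding at most
// `seconds` of audio each. The last chunk may be shorter.
std::vector<std::vector<char>> splitPcm(const std::vector<char>& pcm,
                                        std::uint32_t sampleRate,
                                        std::uint16_t channels,
                                        std::uint32_t seconds) {
    // 2 bytes per sample per channel (assumption: 16-bit PCM)
    const std::size_t bytesPerSecond =
        static_cast<std::size_t>(sampleRate) * channels * 2;
    const std::size_t chunkBytes = bytesPerSecond * seconds;

    std::vector<std::vector<char>> chunks;
    if (chunkBytes == 0) {
        return chunks;  // nothing sensible to do with a zero-length interval
    }
    for (std::size_t offset = 0; offset < pcm.size(); offset += chunkBytes) {
        const std::size_t end = std::min(offset + chunkBytes, pcm.size());
        chunks.emplace_back(pcm.begin() + static_cast<std::ptrdiff_t>(offset),
                            pcm.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return chunks;
}
```

Because the chunk size is a whole multiple of the frame size, every cut lands on a sample boundary, so each chunk can be converted and sent to the transcription service independently.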
@chunsiong.zoom Ah, perfect. Thank you so much.
Just read through the documentation for the meeting SDK, and it has an IClosedCaptionController, which has a StartLiveTranscription function. Would these not work?
chunsiong.zoom
(Chun Siong (tag me for response))
August 14, 2024, 5:08am
11
@noahviktorschenk
Yes it will work, for Windows.
There is no such controller in Linux.