I am accessing raw audio and video streams using StartRawRecording() and receiving the raw data frames as well. I have a couple of questions regarding the mapping of Audio and Video data.
Does Zoom provide a complete recording of all the users in the meeting? (Using local recording?)
If not, can we map the raw audio and video streams that we are getting from the SDK?
By complete recording, I mean audio and video recordings of all the participants in the meeting.
Just like in raw recording, where we get the video and audio streams for all the participants, I want to access the recordings of all participants after the meeting has ended, without managing the individual user streams.
I am getting individual streams for both audio and video for each user. The issue I face is mapping the audio and video on my end. The timestamps for the two renderers might vary. For example: my camera might be ON but the mic OFF, so I’m getting the video stream but not the audio stream, and vice versa.
Preparing a final video for each user becomes difficult when this happens.
So my question to you is:
How can we map the audio stream + video stream for each user so that they align perfectly? (A final MP4 should be generated at the end where the audio sits over the video, and if the camera is OFF we get a black screen.)
Again, does Zoom provide a recording of all participants post-meeting? Is there a way to do so?
How can we map the audio stream + video stream for each user so that they align perfectly? (A final MP4 should be generated at the end where the audio sits over the video, and if the camera is OFF we get a black screen.)
I would propose using a library such as FFmpeg, which can insert black frames wherever no raw video is received. The same goes for the audio track in the same MP4 file that FFmpeg will output: wherever no raw audio is received, there will be silence.
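To make this concrete, here is a minimal sketch of building such an FFmpeg invocation. The formats below are assumptions you should verify against your SDK settings (the Zoom raw data callbacks typically deliver YUV420 video and 16-bit mono PCM audio), and the file names are hypothetical:

```python
# Sketch: build an FFmpeg command that muxes one participant's raw video and
# raw audio into an MP4. Resolution, frame rate, and sample rate are
# assumptions -- check what your raw data subscription actually delivers.

def build_mux_command(video_file, audio_file, out_file,
                      width=640, height=360, fps=25, sample_rate=32000):
    return [
        "ffmpeg",
        # Raw video carries no header, so the format must be stated explicitly.
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}", "-r", str(fps), "-i", video_file,
        # Raw 16-bit little-endian mono PCM audio.
        "-f", "s16le", "-ar", str(sample_rate), "-ac", "1", "-i", audio_file,
        "-c:v", "libx264", "-c:a", "aac",
        out_file,
    ]

cmd = build_mux_command("user1.yuv", "user1.pcm", "user1.mp4")
print(" ".join(cmd))
```

You would run the resulting command once per participant; the black-frame and silence padding for gaps is a separate step on top of this.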
Again, does Zoom provide a recording of all participants post-meeting? Is there a way to do so?
@striver.strikes, we use the Linux SDK extensively to power our meeting bots, and this is something we’ve encountered before.
It is pretty complicated to stitch together a composite video from the separate video and audio streams provided by the SDK. @chunsiong.zoom is correct that you can use a library such as FFmpeg to produce a composite video. The high-level approach for video would be to:
Keep track of when each participant’s stream starts and stops.
Use a library such as FFmpeg to pad each individual stream to the length of the full meeting.
Use a library such as FFmpeg to composite the individual streams into a combined view showing everyone’s video, plus the screen share if enabled.
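The padding step above can be sketched with FFmpeg's `tpad` filter, which inserts black frames before and after a clip. The offsets are hypothetical values you would compute from the stream start/stop events you tracked:

```python
# Sketch: pad one participant's clip with black frames so it spans the whole
# meeting. lead_in_sec is how long after the meeting started this stream
# began; tail_sec is how long before the meeting ended it stopped.

def pad_video_command(clip, out, lead_in_sec, tail_sec):
    vf = (f"tpad=start_duration={lead_in_sec}:stop_duration={tail_sec}"
          ":start_mode=add:stop_mode=add:color=black")
    return ["ffmpeg", "-i", clip, "-vf", vf, "-c:v", "libx264", out]

cmd = pad_video_command("user1.mp4", "user1_padded.mp4", 12, 34)
```

Once every stream is padded to the same length, the compositing step can combine them with a layout filter such as `hstack` or `xstack`.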
For audio, the approach would be similar:
Keep track of when each person’s audio stream started or stopped.
Pad the audio streams to the length of the full meeting, which aligns them with the video.
Either mix the separate audio streams into a single audio stream to put into an MP4, or, if you’re looking to analyze the audio (e.g. transcription), analyze each speaker separately for increased accuracy.
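The audio steps can be sketched the same way: `adelay` shifts each track by its join offset, `apad` extends it with trailing silence, and `amix` combines the tracks. The offsets and file names here are hypothetical:

```python
# Sketch: align per-participant audio tracks and mix them into one stream.

def mix_audio_command(tracks, meeting_sec, out):
    """tracks: list of (path, join_offset_ms) pairs."""
    inputs, filters, labels = [], [], []
    for i, (path, offset_ms) in enumerate(tracks):
        inputs += ["-i", path]
        # adelay pads the front with silence; apad pads the tail.
        filters.append(f"[{i}:a]adelay={offset_ms}:all=1,apad[a{i}]")
        labels.append(f"[a{i}]")
    filters.append("".join(labels) + f"amix=inputs={len(tracks)}[mix]")
    return (["ffmpeg"] + inputs +
            ["-filter_complex", ";".join(filters),
             # apad makes each track endless, so trim to the meeting length.
             "-map", "[mix]", "-t", str(meeting_sec), out])

cmd = mix_audio_command([("u1.wav", 0), ("u2.wav", 1500)], 3600,
                        "meeting_audio.m4a")
```

If you are transcribing instead of mixing, you would skip `amix` and keep each padded track as its own file per speaker.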
If you want to get a recording of all participants without all this processing, you have two options:
Option 1
Ask your users to record to the cloud on Zoom, then pull the recording from their Zoom cloud storage using the Zoom cloud API.
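For reference, here is a minimal sketch of the API call for that, assuming you already have an access token from Zoom's server-to-server OAuth flow; the endpoint is the Cloud Recording API's `GET /v2/meetings/{meetingId}/recordings`:

```python
import urllib.request

# Sketch: build the request for a meeting's cloud recordings. The meeting ID
# and token below are placeholders.
def recordings_request(meeting_id, access_token):
    return urllib.request.Request(
        f"https://api.zoom.us/v2/meetings/{meeting_id}/recordings",
        headers={"Authorization": f"Bearer {access_token}"})

req = recordings_request("123456789", "ACCESS_TOKEN")
# urllib.request.urlopen(req) would return JSON listing the recording files,
# each with a download_url you can fetch.
```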
Option 2
Use Recall.ai. It’s an API for meeting bots that gets the raw audio/video from meetings and outputs video/audio, without you needing to spend months building, scaling, and maintaining these bots.
I would propose using a library such as FFmpeg, which can insert black frames wherever no raw video is received. The same goes for the audio track in the same MP4 file that FFmpeg will output: wherever no raw audio is received, there will be silence.
The FFmpeg library requires some info for processing the data (raw audio and raw video), like bit rate, frame rate, etc.
Do you know how we can convert the char* data that we receive in the OnDataReceived function to MP3 and MP4 formats using the FFmpeg library?