Provide video frames as bitmaps to allow higher performance recording

This request is for apps using the Video SDK for Android, although it applies to virtually all of the other SDKs.

On my Android device, I host multiple participants in a meeting. I want to record the meeting to an mp4 file, and I want the recording to look similar to the way the meeting looks while it is actually happening (grid layout, etc.). Please note that I don’t want a screen capture: the final mp4 will contain effects that are not visible during the live meeting.

Recording a video is very memory and CPU/GPU intensive. Even if a lot of the work is offloaded to the GPU, the recording process needs to contend with multiple video frames and still produce a single video at a minimum of 30 fps. Having the ability to include effects in the recording would be nice too, such as zooming in on one speaker, doing animated transitions, etc. These are nice-to-have but expensive when recording in real time, so I can live without them, although technically, high-end mobile devices are fully capable of creating these effects and have enough processing power to generate the kind of mp4 I am interested in.

The biggest bottleneck is video frame data. A single YUV420P frame for a 480x640 video is about 460 KB (480 × 640 × 1.5 bytes) for a single participant. Multiply that by 6 participants at 30 fps and the memory/GPU costs climb quickly.
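As a quick sanity check of those numbers, here is a rough calculation in Kotlin, assuming 1.5 bytes per pixel for YUV420P:

```kotlin
// Rough memory cost of raw YUV420P (I420) frames: 1 byte of luma per pixel
// plus two chroma planes at quarter resolution, i.e. 1.5 bytes per pixel.
fun yuv420FrameBytes(width: Int, height: Int): Long = width.toLong() * height * 3 / 2

fun main() {
    val perFrame = yuv420FrameBytes(480, 640)   // 460,800 bytes, ~460 KB
    val perSecond = perFrame * 6 * 30           // 6 participants at 30 fps
    println("Per frame: $perFrame bytes")
    println("Per second: ${perSecond / (1024 * 1024)} MiB of raw YUV to handle")
}
```

That works out to roughly 80 MiB of raw frame data per second before any compositing or encoding even starts.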

You can forget about adding “interactive views” the way Zoom’s new feature works, as that would increase the processing time significantly.

Recording a video requires converting each of the frames from each participant to a bitmap and merging them into a single bitmap. On Android, this bitmap is part of a canvas or EGLSurface that gets rendered onto a surface, which then gets fed into a video encoder and finally mixed with the audio. Doing all of that while keeping up with the fps is challenging. Using hardware acceleration will help.
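To make that pipeline concrete, here is a minimal Kotlin sketch of the compositing/encoding step, assuming the participant frames are already available as Bitmaps (which is exactly what this request asks for). The MediaMuxer/audio side and output-buffer draining are omitted, and the class is illustrative rather than a complete recorder:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.RectF
import android.media.MediaCodec
import android.media.MediaCodecInfo
import android.media.MediaFormat
import android.view.Surface

// Sketch: composite participant bitmaps onto the encoder's input surface.
// Muxing, audio, and the bitmap source are all omitted.
class FrameCompositor(width: Int, height: Int) {
    private val format = MediaFormat.createVideoFormat(
        MediaFormat.MIMETYPE_VIDEO_AVC, width, height
    ).apply {
        setInteger(MediaFormat.KEY_COLOR_FORMAT,
            MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
        setInteger(MediaFormat.KEY_BIT_RATE, 4_000_000)
        setInteger(MediaFormat.KEY_FRAME_RATE, 30)
        setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
    }
    private val encoder: MediaCodec =
        MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC).apply {
            configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        }
    // Drawing into this surface feeds frames directly to the encoder.
    private val inputSurface: Surface = encoder.createInputSurface()

    fun start() = encoder.start()

    // Draw each participant bitmap into a simple grid on the encoder surface.
    fun encodeFrame(participantBitmaps: List<Bitmap>) {
        if (participantBitmaps.isEmpty()) return
        val canvas: Canvas = inputSurface.lockHardwareCanvas() // GPU-backed canvas
        try {
            val cols = 2
            val rows = (participantBitmaps.size + cols - 1) / cols
            val cellW = canvas.width / cols
            val cellH = canvas.height / rows
            participantBitmaps.forEachIndexed { i, bmp ->
                val left = (i % cols) * cellW.toFloat()
                val top = (i / cols) * cellH.toFloat()
                canvas.drawBitmap(bmp, null,
                    RectF(left, top, left + cellW, top + cellH), null)
            }
        } finally {
            inputSurface.unlockCanvasAndPost(canvas)
        }
        // Drained output buffers would be written to a MediaMuxer here.
    }
}
```

Even in this simplified form, the surface-backed encoder and the hardware canvas keep the heavy work on the GPU; the expensive part left over is producing the participant Bitmaps in the first place.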

Zoom could help out here by providing video frames as bitmaps and not just as YUV data. Internally, Zoom is already using OpenGL ES, SurfaceTexture and SurfaceView to generate a custom SurfaceView to display the video of a user via the ZoomInstantSDKVideoView class. It appears that you are then converting the data from this surface to YUV and sending it as raw data to the client’s onRawDataFrameReceived. Regardless of how you are creating the YUV data sent to onRawDataFrameReceived, it is being created from data that has already gone through a conversion from the video stream received over the web.
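For context, this is roughly what an app has to do today to turn a YUV420P frame delivered to onRawDataFrameReceived into a Bitmap. The plane parameters below are illustrative stand-ins for whatever the SDK’s raw-data object actually exposes, and the JPEG round-trip is the simple approach; it is far too slow for multiple streams at 30 fps, which is exactly the per-frame cost this request would remove:

```kotlin
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import java.io.ByteArrayOutputStream

// Illustrative only: convert one I420 (YUV420P) frame to a Bitmap.
// The plane buffers stand in for the SDK's raw-data accessors; the names here
// are hypothetical, not the actual SDK API.
fun i420ToBitmap(y: ByteArray, u: ByteArray, v: ByteArray, width: Int, height: Int): Bitmap {
    // Repack I420 planes (Y, U, V) into NV21 (Y, interleaved V/U), which YuvImage accepts.
    val nv21 = ByteArray(width * height * 3 / 2)
    System.arraycopy(y, 0, nv21, 0, width * height)
    var out = width * height
    for (i in 0 until width * height / 4) {
        nv21[out++] = v[i]
        nv21[out++] = u[i]
    }
    // Encode to JPEG and decode back: simple, but far too slow for 6 streams at 30 fps.
    val jpegStream = ByteArrayOutputStream()
    YuvImage(nv21, ImageFormat.NV21, width, height, null)
        .compressToJpeg(Rect(0, 0, width, height), 90, jpegStream)
    val bytes = jpegStream.toByteArray()
    return BitmapFactory.decodeByteArray(bytes, 0, bytes.size)
}
```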

I should also note that in my app, users can leave the screen hosting the video meeting without terminating the meeting. The meeting continues and the user is still recorded. This lets the user access a different part of the app during the meeting, or even leave the app to use a different one. For this reason, no activity needs to be shown while a video session is under way, and if you decide to implement the feature to provide bitmaps for frames, you should take this into consideration. You also should not add this feature to ZoomInstantSDKRawDataPipeDelegate as just another method unless you provide a way to disable generating YUV frames and sending them to onRawDataFrameReceived. If an app only wants to work with bitmap data, YUV data should not have to be sent to onRawDataFrameReceived; it makes no sense to waste processing generating data the app doesn’t need. Likewise, if an app wants YUV data, don’t generate bitmap data.
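Purely as a hypothetical sketch of how that choice could be expressed (none of these names exist in the current SDK), the app could subscribe for exactly one frame representation, so the other is never generated:

```kotlin
import android.graphics.Bitmap

// Hypothetical API sketch only; these interfaces are not part of the SDK.
enum class RawFrameFormat { YUV420, BITMAP }

interface RawFramePipe {
    // The app subscribes with the single format it actually needs.
    fun subscribe(format: RawFrameFormat, delegate: RawFrameDelegate)
}

interface RawFrameDelegate {
    // Only one of these is ever called, depending on the subscribed format.
    fun onYuvFrame(yuv: ByteArray, width: Int, height: Int)
    fun onBitmapFrame(bitmap: Bitmap)
}
```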

I would also recommend against a callback that pushes the bitmap data to the app. A push model simply assumes the callback can keep up with the rate at which frames are delivered; if it can’t, its only option is to skip frames, which means time was wasted creating data that just gets ignored. A better approach is for the app itself to decide when it needs frame data and request it. Normally this will be at a rate equal to or less than the meeting’s fps: if the processing can’t handle, say, 30 fps, the app could drop the rate to 25. It would still miss frames, but at least no time is wasted on your side creating frames that are never going to be used.
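Again as a hypothetical sketch (the names are illustrative, not actual SDK API), a pull-based source might look like this, with the recording loop driving the request rate rather than the SDK:

```kotlin
import android.graphics.Bitmap

// Hypothetical pull-based API: the app asks for frames only when it is ready to
// encode the next composited frame. None of these names exist in the current SDK.
interface BitmapFrameSource {
    // Returns the most recent frame for a participant, or null if none is available yet.
    fun latestFrameBitmap(participantId: String): Bitmap?
}

// The recorder, not the SDK, drives the frame rate (e.g. 30 fps, or 25 if it can't keep up).
fun recordLoop(
    source: BitmapFrameSource,
    participantIds: List<String>,
    targetFps: Int,
    isRecording: () -> Boolean,
    encodeFrame: (List<Bitmap>) -> Unit
) {
    val frameIntervalMs = 1000L / targetFps
    while (isRecording()) {
        val start = System.currentTimeMillis()
        val frames = participantIds.mapNotNull { source.latestFrameBitmap(it) }
        encodeFrame(frames) // e.g. composite onto the encoder's input surface
        val elapsed = System.currentTimeMillis() - start
        if (elapsed < frameIntervalMs) Thread.sleep(frameIntervalMs - elapsed)
    }
}
```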

If you do decide to add support for bitmap frames, just make sure that you use hardware acceleration to create them, as that will give the best performance.
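One way to do that on newer devices (API 29+) is to expose a GPU-resident HardwareBuffer as a Bitmap instead of copying pixels to the CPU. This is only an illustration of the idea, not a statement about how the SDK renders internally:

```kotlin
import android.graphics.Bitmap
import android.graphics.ColorSpace
import android.hardware.HardwareBuffer

// Illustrative: on API 29+, a frame rendered into a HardwareBuffer can be exposed as
// a GPU-backed Bitmap without a CPU-side pixel copy (the buffer must be allocated
// with GPU-sampled-image usage).
fun bitmapFromHardwareBuffer(buffer: HardwareBuffer): Bitmap? =
    Bitmap.wrapHardwareBuffer(buffer, ColorSpace.get(ColorSpace.Named.SRGB))
```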

Finally, it should be noted that most Android developers will almost never work with YUV data directly. Some might use OpenGL ES, but most will prefer 2D rendering, which the Canvas API supports, including hardware acceleration. The Canvas API is also far easier to work with than OpenGL ES.
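As an example of how little Canvas code a typical effect takes, here is a small, illustrative "zoom in on the active speaker" helper (names and layout are my own, not from the SDK):

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Matrix

// Small 2D "zoom in on the active speaker" effect using the Canvas API.
// zoom = 1.0 means no zoom; larger values scale the frame about the canvas center.
fun drawZoomedSpeaker(canvas: Canvas, speakerFrame: Bitmap, zoom: Float) {
    val matrix = Matrix().apply {
        // Scale the frame to fill the canvas and center it.
        val fill = maxOf(
            canvas.width / speakerFrame.width.toFloat(),
            canvas.height / speakerFrame.height.toFloat()
        )
        postScale(fill, fill)
        postTranslate(
            (canvas.width - speakerFrame.width * fill) / 2f,
            (canvas.height - speakerFrame.height * fill) / 2f
        )
        // Apply the extra zoom about the center of the canvas.
        postScale(zoom, zoom, canvas.width / 2f, canvas.height / 2f)
    }
    canvas.drawBitmap(speakerFrame, matrix, null)
}
```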
