Applying Machine Learning to a Zoom Video Feed

Hi all,

I am new to the dev community. I am working on a project that relies on the Zoom video feed of participants. I want to grab frames from the video, run an ML model on them, and then display some text over the video feed as the result. Is this possible in any way (maybe using the Video SDK)? If so, where can I start?
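To make the idea more concrete, here is a rough sketch of the pipeline I have in mind, independent of Zoom. Everything in it is a hypothetical placeholder: `Frame` is a stand-in for whatever raw frame format the SDK would actually provide, `brightness_model` stands in for a real ML model, and `annotate_stream` is just a name I made up. It does not call any Zoom API, since I don't know yet which one applies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for a video frame: rows of grayscale pixels (0-255).
Frame = List[List[int]]


@dataclass
class AnnotatedFrame:
    pixels: Frame
    caption: str  # text that would be drawn over the frame


def brightness_model(frame: Frame) -> str:
    """Placeholder for a real ML model: labels a frame by mean brightness."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    mean = total / count if count else 0.0
    return "bright" if mean >= 128 else "dark"


def annotate_stream(
    frames: Iterable[Frame], model: Callable[[Frame], str]
) -> Iterator[AnnotatedFrame]:
    """Run the model on each incoming frame and attach its output as a caption."""
    for frame in frames:
        yield AnnotatedFrame(pixels=frame, caption=model(frame))


if __name__ == "__main__":
    # Two tiny fake frames in place of a real participant feed.
    fake_feed = [[[0, 10], [20, 30]], [[200, 210], [220, 230]]]
    for annotated in annotate_stream(fake_feed, brightness_model):
        print(annotated.caption)
```

In a real version I assume the caption would be rendered onto the pixels (for example with something like OpenCV's `cv2.putText`) before the frame is shown, but the part I am unsure about is how to get the frames out of Zoom and back in.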