Hello,
I would like to ask whether it is possible to build an AI avatar that can communicate in real time with other participants and has access to the audio/video. Is there any API in the Video SDK to create such a conversational bot that joins the meeting as a participant?
Hi @Piotr, at the moment there is no AI avatar feature for the Video SDK. However, we do have a raw data feature that you could use to implement this in your application.
Video SDK’s native SDKs (Linux, Windows, macOS) support raw data access for remote participants, so you can process audio/video streams directly for your AI avatar.
For the Web SDK, direct raw data access isn't provided, so you'll need an alternative: for example, access the underlying <video> and <audio> elements or their MediaStream objects, then pipe or process those streams as needed.
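To illustrate the web-side approach, here is a minimal sketch. The data attribute and element lookup are assumptions about how your app renders participant elements, so adjust the selector to your own DOM:

```typescript
// Hypothetical helper: build a selector for a participant's media element.
// The data attribute is an assumption; match it to however your app tags
// the <video>/<audio> elements it renders for each participant.
function mediaSelector(kind: "video" | "audio", participantId: string): string {
  return `${kind}[data-participant-id="${participantId}"]`;
}

// Grab the element the Web SDK rendered and capture its MediaStream.
// captureStream() mirrors the element's playback, and the resulting stream
// can be fed to Web Audio, a MediaRecorder, or a WebRTC peer connection.
function captureParticipantStream(participantId: string): MediaStream {
  const el = document.querySelector(mediaSelector("video", participantId)) as
    | (HTMLVideoElement & { captureStream(): MediaStream })
    | null;
  if (!el) {
    throw new Error(`no <video> element found for participant ${participantId}`);
  }
  return el.captureStream();
}
```

From there your avatar pipeline can analyze the captured stream (e.g. run speech-to-text on the audio track) and generate responses.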
In both the native and web cases, I suggest you:
Create virtual audio and video devices and pipe input/output streams to and from these devices.
On Linux, for example, you can use tools like v4l2loopback (for virtual video devices) and snd-aloop (for virtual audio devices) to route and process streams cleanly.
Optionally explore third-party avatar/AI APIs that provide conversational or visual avatars, such as HeyGen, Tavus, or similar platforms, which can integrate with your media streams.
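To make the Linux virtual-device step concrete, a minimal setup sketch might look like the following. The device numbers, card label, and input file are placeholders for whatever your avatar pipeline produces, and loading kernel modules requires root:

```shell
# Create a virtual video device (/dev/video10) and a virtual ALSA loopback card.
# video_nr and card_label are illustrative; pick values free on your system.
sudo modprobe v4l2loopback video_nr=10 card_label="AvatarCam" exclusive_caps=1
sudo modprobe snd-aloop

# Pipe your rendered avatar video into the virtual camera with ffmpeg,
# e.g. from a file or any stream your avatar pipeline emits:
ffmpeg -re -i avatar_output.mp4 -f v4l2 -pix_fmt yuv420p /dev/video10

# Route the generated audio into one side of the loopback card; the SDK
# (or any app) can then open the other side as a normal capture device:
ffmpeg -re -i avatar_output.mp4 -vn -f alsa hw:Loopback,0,0
```

The meeting client then simply selects "AvatarCam" and the loopback card as its camera and microphone, so no SDK-level raw data injection is needed on the send side.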
Let me know if you’d like guidance on setting up virtual devices or integrating external avatar APIs!