Cloud recordings with separated audio channels?


We run a service that gives companies an intranet where people can watch internal meetings asynchronously. While I’m aware that you can get a local recording with all the audio channels separated out, we prefer to import cloud recordings (they’re more likely to be available, and we don’t need to worry about which computer the recording was made on, whether it got deleted, etc.).

One of our weaknesses is the lack of good speaker diarization. We can transcribe what is said, but it’s hard to accurately identify who is saying what when we only have a single audio channel. I know that some other services use a “Zoom bot” to work around this limitation, or rely on Zoom’s provided transcripts if they become available. In our case we would prefer to transcribe the audio ourselves (per-word timestamps, custom vocab, etc.).

Today I was on a call with someone using Gong and I was surprised to see there was no Gong bot in the call, but we still ended up with a transcribed recording afterward that had a timeline of who spoke when, along with transcripts marked with speakers. Are there any new Zoom API features that would make this possible? I was curious how they were doing it, if anyone knows.

Which App Type (OAuth / Chatbot / JWT / Webhook)?

Hey @scottjg,

Thanks for reaching out about this, and good question. Have you considered using our Cloud Recording APIs and Webhooks to get audio data, in conjunction with a timeline file? The timeline file will show who was actively speaking when, which might be what you’re looking for.
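To make that concrete, here is a rough Python sketch of how per-word timestamps from your own transcription could be combined with active-speaker events to get speaker-labeled turns. The input shapes here are assumptions for illustration only: `timeline` is a hypothetical, already-parsed list of `(start_seconds, speaker)` events, not the actual schema of the timeline file, and `label_words` and `group_turns` are made-up helper names. You would need to map the real file into this shape yourself.

```python
from bisect import bisect_right


def label_words(words, timeline):
    """Attach the active speaker to each transcribed word.

    words:    list of (start_seconds, text) from your own transcription.
    timeline: list of (start_seconds, speaker) sorted by time, where each
              entry means "this speaker became active at this moment".
              ASSUMPTION: this simplified shape is hypothetical and is not
              the real timeline file schema.
    """
    starts = [t for t, _ in timeline]
    labeled = []
    for start, text in words:
        # Find the last speaker event at or before the word's start time.
        i = bisect_right(starts, start) - 1
        speaker = timeline[i][1] if i >= 0 else None
        labeled.append((speaker, text))
    return labeled


def group_turns(labeled):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, text in labeled:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]


if __name__ == "__main__":
    timeline = [(0.0, "Alice"), (5.0, "Bob")]
    words = [(0.5, "hi"), (1.0, "there"), (5.2, "hello")]
    for speaker, text in group_turns(label_words(words, timeline)):
        print(f"{speaker}: {text}")
```

Words that start before the first speaker event get `None` as the speaker, so you can decide how to handle them. Note that this only tells you who was the active speaker at a given moment; overlapping speech on a single mixed audio channel would still be attributed to one person.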

Let me know if this helps!