Video SDK - Session transcript retrieval

Hello,

We have integrated the Video SDK to host some sessions between the users. The sessions are one hour long and have <10 users. The sessions are accessed from the client side (mobile apps) using a session name, key and JWT generated on our backend (NestJS application).

What we need is to obtain the transcript of the call.

The problem

We do not need the transcript as captions on the client side. We need it during or at the end of the call, but on the server side. This is where the problem comes in.

What solutions do we have here?

  1. The Zoom SDK from npm does not work in a server environment. Can we listen to the transcription stream without using the SDK? Via websockets, maybe?
  2. If we do it via websockets, do we still need to activate the recording of the session, given that we could listen to the transcript in real time?
  3. If we do need to activate the recording, we would want to delete it once the transcript is completed. How does this affect costs? Is the paid storage freed up for reuse once the recording is deleted?

Thank you in advance!

Hey there - I think you are indeed looking for a websocket - session.recording_transcript_completed (Zoom API Events - Video SDK)

I believe you do need to start a recording to get the transcript from the recording, yes.

I’m not 100% sure how the billing for video storage works. However, if you don’t have any need for the recording afterwards, you can delete it.

Let me know if this helps.

Thank you for the response.

We have been playing around with the PATCH request to https://api.zoom.us/v2/videosdk/sessions/{sessionId}/events that starts the recording. We want to call this endpoint server-side. However, using a token generated here, we receive

{
    "code": 124,
    "message": "Invalid access token."
}

and status code 401.
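
For context, the request we are sending can be sketched roughly as below. This is a minimal sketch, not our exact code: the helper name `buildRecordingRequest` and the `RecordingRequest` type are made up for illustration, and the `{ "method": "recording.start" }` body is our reading of the API reference, so treat it as an assumption.

```typescript
// Plain description of the HTTP request, independent of any HTTP client.
interface RecordingRequest {
  url: string;
  method: "PATCH";
  headers: Record<string, string>;
  body: string;
}

// Assumed action names for the session events endpoint.
type RecordingAction = "recording.start" | "recording.stop";

// Hypothetical helper: builds the PATCH request that controls a session recording.
function buildRecordingRequest(
  sessionId: string,
  apiToken: string,
  action: RecordingAction,
): RecordingRequest {
  return {
    // URL-encode the session ID so characters such as "/" or "=" stay inside one path segment.
    url: `https://api.zoom.us/v2/videosdk/sessions/${encodeURIComponent(sessionId)}/events`,
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ method: action }),
  };
}
```

The returned object can be handed to whatever HTTP client the backend already uses.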

We also generate tokens server-side, with the following payload

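    const payload = {
      app_key: Config.zoom.videoSdkAppKey, // Video SDK key
      role_type: 1, // 1 = host, 0 = participant
      tpc: meetingId, // session name (topic)
      version: 1,
      iat, // issued-at time, in epoch seconds
      exp, // expiry time, in epoch seconds
      session_key: meetingId,
      cloud_recording_option: 0,
      cloud_recording_election: 0,
    };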

but to no success so far, even if we switch the flags of cloud_recording_option and cloud_recording_election from 0 to 1.

Any ideas?

You will want to use this → Make API requests

The JWT for a client session and the JWT for the API are two different tokens.
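
To illustrate the difference, here is a minimal sketch of signing an API token by hand with Node's built-in `crypto` module. It assumes the API JWT is an HS256 token whose claims are `iss` (your API key), `iat` and `exp`; please check the claim names against the "Make API requests" page. The function names `signApiJwt` and `encodeSegment` are hypothetical, and in a real project a JWT library would probably do this for you.

```typescript
import { createHmac } from "node:crypto";

// Serialize an object to JSON and encode it as an unpadded base64url segment.
function encodeSegment(value: object): string {
  return Buffer.from(JSON.stringify(value), "utf8").toString("base64url");
}

// Hypothetical helper: signs an HS256 JWT for REST API calls.
// Assumption: the API expects iss/iat/exp claims, unlike the session JWT,
// which carries app_key, tpc, role_type and so on.
function signApiJwt(
  apiKey: string,
  apiSecret: string,
  nowSeconds: number,
  ttlSeconds: number = 3600,
): string {
  const header = { alg: "HS256", typ: "JWT" };
  const claims = { iss: apiKey, iat: nowSeconds, exp: nowSeconds + ttlSeconds };
  const signingInput = `${encodeSegment(header)}.${encodeSegment(claims)}`;
  // The signature is an HMAC-SHA256 over "header.claims", keyed with the secret.
  const signature = createHmac("sha256", apiSecret)
    .update(signingInput)
    .digest("base64url");
  return `${signingInput}.${signature}`;
}
```

You would call it as `signApiJwt(apiKey, apiSecret, Math.floor(Date.now() / 1000))` and send the result in the `Authorization: Bearer` header.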

Let me know if this helps.

Thank you, that worked!

I managed to start and stop the recording for a video session. I still have some follow-up questions.

  1. Can we turn on auto-recordings for our sessions? So we don’t have to start and stop them manually?
  2. Also, will the transcript be automatically generated? How do we start it or trigger the speech-to-text for a session recording?

Regards,
Victor

I think the way to start recordings automatically would be to programmatically call the startCloudRecording method. Users would still need to approve, but it could at least start the process.

Transcripts are generated for you, and you will be notified with a webhook when they are complete (Zoom API Events - Video SDK).
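
On the receiving side, a rough sketch of picking that event out of incoming webhook bodies could look like this. The event name is the one mentioned earlier in this thread; the `payload.object` envelope and the names `ZoomWebhookEvent` and `extractTranscriptSession` are assumptions for illustration, so check the actual payload in the events reference. Verifying the webhook signature is left out here.

```typescript
// Assumed envelope of a webhook notification.
interface ZoomWebhookEvent {
  event?: string;
  payload?: { object?: Record<string, unknown> };
}

const TRANSCRIPT_COMPLETED = "session.recording_transcript_completed";

// Hypothetical helper: returns the session object from a transcript-completed
// notification, or null for malformed bodies and all other events.
function extractTranscriptSession(rawBody: string): Record<string, unknown> | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch (_err) {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null; // valid JSON, but not an object
  }
  const evt = parsed as ZoomWebhookEvent;
  if (evt.event !== TRANSCRIPT_COMPLETED) {
    return null; // some other event type
  }
  return evt.payload?.object ?? null;
}
```

Your webhook controller can pass the raw request body to this function and only go on to fetch the transcript when it returns a non-null object.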

You can also add live transcription and translation with the VideoSDK → Video SDK - web - Live transcription and translation

Let me know if this helps.

Thank you, that is clear.

My question is: do the transcriptions start working automatically without any API call?
Our flow does not include any session transcription control from the client side. The clients just enter the calls. We inform them that the call is recorded. We record an audio file of the session by starting the recording on the API.

Will I automatically get a webhook event that the transcription is complete, without activating it through any call?

Hope this makes sense.
Thank you!

You would need to start the recording and the transcription separately. You will need to use the live transcription feature to add transcription from the session. Video SDK - web - Cloud recording

But we don’t have a web client, just the mobile client.

We want to start the transcription without any client code, from server-side, so we can get it at the end via the webhook event. Is this currently doable?

My fault, I misunderstood which platform you were using. However, the process is the same, and the Live Transcription service would have to be used with the mobile client.