Identifying the speaker in onAudio metadata no longer works

Help!
I have a Zoom app working with @zoom/rtms 0.0.2 and just tried to upgrade to @zoom/rtms 1.0.1.
I had to add duration and frameSize to my audio params:

  const audioParams: AudioParams = {
    channel: rtms.AudioChannel.MONO,
    sampleRate: rtms.AudioSampleRate.SR_16K,
    codec: rtms.AudioCodec.L16,
    contentType: rtms.AudioContentType.RAW_AUDIO,
    dataOpt: rtms.AudioDataOption.AUDIO_MULTI_STREAMS,
    duration: 20,
    frameSize: 320, // 16000 * 0.02 * 1
  };
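To show where the 320 in the comment comes from, here is the arithmetic spelled out (constant names are mine, for illustration): frameSize is the number of samples per frame, derived from the sample rate, the frame duration, and the channel count.

```typescript
// Samples per frame = sampleRate (Hz) * duration (ms) / 1000 * channels
const SAMPLE_RATE_HZ = 16000; // rtms.AudioSampleRate.SR_16K
const DURATION_MS = 20;       // duration: 20
const CHANNELS = 1;           // rtms.AudioChannel.MONO

const frameSize = (SAMPLE_RATE_HZ * DURATION_MS * CHANNELS) / 1000;
console.log(frameSize); // 320
```
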

And everything seemed to work until I received data in my onAudio ClientEventHandler: it never makes it past the check that compares the selected participantId with metadata.userId:

import type { Metadata } from '@zoom/rtms';
import type { ServerContext } from '../../context.js';
import { type ClientEventHandler, RecordingState } from '../../services/client-registry/types.js';

export const onAudio: ClientEventHandler = async (
  context: ServerContext,
  meetingUuid: string,
  buffer: Buffer,
  _size: number,
  _timestamp: number,
  metadata: Metadata
) => {
  try {
    const selectedParticipantId = context.services.clients.getData(meetingUuid)?.participant?.participantId;

    if (
      metadata.userId == null ||
      !meetingUuid ||
      metadata.userId === -1 ||
      (selectedParticipantId != null && selectedParticipantId !== metadata.userId.toString())
    ) {
      return;
    }

    const state = context.services.clients.getState(meetingUuid);

    if (state === RecordingState.Recording_Start) {
      if (context.services.audio.hasReachRecordingMaxTime(context, meetingUuid)) {
        context.services.clients.setState(meetingUuid, RecordingState.Recording_Stop);
        context.log.info(`Max recording time reached for meeting ${meetingUuid}, stopping RTMS.`);

        // BUG: Will this cause any issues? What is the flow here?
        context.services.clients.pause(meetingUuid, 'audio');
        await context.services.audio.processStream(context, meetingUuid);
        return;
      }

      context.services.audio.pushBytes(context, meetingUuid, buffer, metadata.userId);
    }
  } catch (error: any) {
    context.log.error(`Error handling audio data for ${meetingUuid}:`, error.message);
  }
};

In debugging I’ve found that metadata.userId = -1 and metadata.userName = "" for both rtms.AudioDataOption.AUDIO_MULTI_STREAMS and rtms.AudioDataOption.AUDIO_MIXED_STREAM.

Hopefully, there is a simple way to get information about which speaker is speaking so I can isolate to just their voice and the app will continue to work the way it did in the past.

@Canary_Speech Thank you for reaching out about this! I was able to replicate this issue and am working to identify the root cause. I’ll be sure to keep you in the loop as I work to release a fix.


@Canary_Speech Thanks for your patience on this. I’ve released v1.0.2 today which includes some fixes around setting audio parameters.

It seems that AUDIO_MULTI_STREAMS should have been the default but there was a bug preventing that from being set in all cases. I also ensured that calling setAudioParams after setting the OnAudioData callback configured the parameters as expected. In my testing, this resolved the edge cases preventing metadata from being displayed.

When it comes to using AUDIO_MIXED_STREAM there is only one stream that is sent for all participants. In this case, you’ll want to use the OnActiveSpeakerEvent function to receive information about the active speaker.
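To make the mixed-stream flow concrete, here is a small self-contained sketch of the pattern: keep track of whoever the active-speaker event last reported, and attribute incoming mixed-stream audio to that participant. The callback shape and the registration call (something like client.onActiveSpeakerEvent) are assumptions; check the SDK docs for the exact signature.

```typescript
type Speaker = { userId: number; userName: string };

let activeSpeaker: Speaker | null = null;

// Would be registered via something like client.onActiveSpeakerEvent(...);
// the callback shape here is an assumption.
function handleActiveSpeaker(speaker: Speaker): void {
  activeSpeaker = speaker;
}

// In the mixed-stream onAudioData handler, attribute each buffer to
// whoever was last reported as the active speaker.
function attributeAudio(buffer: Buffer): { speaker: Speaker | null; buffer: Buffer } {
  return { speaker: activeSpeaker, buffer };
}

// Simulated usage:
handleActiveSpeaker({ userId: 16778240, userName: "Alice" });
const chunk = attributeAudio(Buffer.from([0, 1, 2]));
console.log(chunk.speaker?.userName); // "Alice"
```

Note this only approximates per-speaker isolation: audio from overlapping speakers is still mixed into one stream, so AUDIO_MULTI_STREAMS is the better fit when you need a clean per-participant signal.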

You can find the latest version here. I hope that helps! Let me know if you are still running into any issues.


Thank you! I will pull it in and try it now.

@MaxM That unfortunately didn’t work… The code may be fine, but now I can’t get any events to fire at all. I even downloaded your rtms-quickstart-js app and tried it, and I’m getting what I think is a CSRF error from ngrok.

Here are the details:

POST / HTTP/1.1
Host: 8f1d22f5efc6.ngrok.app
User-Agent: Zoom Marketplace/1.0a
Content-Length: 267
Authorization: BLABLA
Clientid: kkYCzJpZQFKbZsbHd2FOzQ
Content-Type: application/json; charset=utf-8
Traceparent: 00-ea21930aea161ac612edb563b03b268c-4f8fbe0ffe928160-00
X-Forwarded-For: 170.114.6.87
X-Forwarded-Host: 8f1d22f5efc6.ngrok.app
X-Forwarded-Proto: https
X-Zm-Request-Id: 9cc5bc35_3068_49a2_9666_2e223ca4e117
X-Zm-Request-Timestamp: 1769827344
X-Zm-Signature: v0=2976ed88baee0dc410a71c6f74468d796508e0112f5994c57a8750e6758bd161
X-Zm-Trackingid: Webhook_7c0044ad39584fbaa05439a77a2080c2
Zm-Trace-Upstream: Meeting_Web_marketplace-consumer
Accept-Encoding: gzip

{"event":"meeting.rtms_started","payload":{"meeting_uuid":"qt2W4HBAQgSzdPnhvKzYWQ==","operator_id":"yzsAxeKsRx-MXL2-EnxLfg","rtms_stream_id":"d5002d86af074c53bd7757b532d12544","server_urls":"wss://zoomsjc144-195-54-108zssgw.sjc.zoom.us:443"},"event_ts":1769827344365}
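For reference, the X-Zm-Signature header in the dump above can be verified against the raw request body. This sketch assumes Zoom's documented v0 scheme (HMAC-SHA256 of `v0:{timestamp}:{rawBody}` keyed with the app's Secret Token); the secret below is a placeholder.

```typescript
import { createHmac } from "node:crypto";

// Recompute the v0 signature and compare it with the X-Zm-Signature header.
function verifyZoomWebhook(
  secretToken: string,
  timestamp: string,
  rawBody: string,
  signatureHeader: string
): boolean {
  const message = `v0:${timestamp}:${rawBody}`;
  const hash = createHmac("sha256", secretToken).update(message).digest("hex");
  return `v0=${hash}` === signatureHeader;
}

// Example with a placeholder secret; a mismatch means the request should be rejected.
const body = '{"event":"meeting.rtms_started"}';
const ts = "1769827344";
const sig = "v0=" + createHmac("sha256", "MY_SECRET").update(`v0:${ts}:${body}`).digest("hex");
console.log(verifyZoomWebhook("MY_SECRET", ts, body, sig)); // true
```

If signature verification passes but the handler still never fires, the issue is more likely in front of the app (for example an ngrok interstitial or a stale tunnel URL) than in the webhook itself.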

@Canary_Speech On my end when I test the onAudioData function out of the box on v1.0.2 I’m seeing the metadata is returned. I’m using this code to obtain the metadata:

client.onAudioData((buffer, size, timestamp, metadata) => {
  console.log(`${metadata.userName} - ${metadata.userId}`)
})

// To set the data option explicitly (AUDIO_MULTI_STREAMS is now the default)
const audioParams = {
  dataOpt: rtms.AudioDataOption.AUDIO_MULTI_STREAMS,
};

client.setAudioParams(audioParams);

// This works for all dataOpt types
client.onParticipantEvent((event, timestamp, participants) => {
  console.log(`${event} ${timestamp} ${JSON.stringify(participants)}`)
})

Let me know if that’s helpful. If not, I’ve sent you a DM so we can coordinate a time to meet.