Unicode block character (�) appears in recording.completed webhook

API Endpoint(s) and/or Zoom API Event(s)
recording.completed webhook

Description
Since April 17th JST, the Unicode block character (�) has started to appear in the request body of the recording.completed webhook, inside payload.object.participant_audio_files[0].file_name.

For example, if you set the user’s display name to プロフィット and start a meeting with cloud recording, the file_name used to be Audio only - プロフィッ㲁 before April 14th, but since April 17th it has become Audio only - プロフィッ�.
The Unicode block character appears in many other cases as well. It causes our app to skip processing the affected participant_audio_file, so our users cannot use our app correctly.

Is this change intentional, or is it a bug?
If it’s a bug, please fix it as soon as possible.

Sample

{
    "payload": {
        "object": {
            "participant_audio_files": [
                {
                    "id": "c7f88d6f-4ab5-4338-8db9-1056e3396b3d",
                    "recording_start": "2023-04-22T00:23:12Z",
                    "recording_end": "2023-04-22T00:23:20Z",
                    "file_name": "Audio only - プロフィッ�",
                    "file_type": "M4A",
                    "file_extension": "M4A",
                    "file_size": 111758,
                    "play_url": "edited",
                    "download_url": "edited",
                    "status": "completed"
                }
            ]
        }
    },
    "event_ts": 1682123090696,
    "event": "recording.completed"
}

How To Reproduce
  1. Set the user’s display name to “プロフィット”
  2. Turn on the setting “Setting > Recording > Cloud recording > Record audio-only files > Record a separate audio file of each participant”.
  3. Start a meeting and cloud recording.
  4. Wait for recording.completed webhook.
  5. Check payload.object.participant_audio_files[0].file_name. The last character is the Unicode block character.
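
The check in step 5 can be automated. The following is a minimal sketch in Python, assuming the webhook body has already been parsed into a dict shaped like the sample above. The helper name find_damaged_file_names is hypothetical and not part of any Zoom SDK.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, usually rendered as �


def find_damaged_file_names(event):
    """Return every participant audio file_name that contains U+FFFD."""
    files = (
        event.get("payload", {})
        .get("object", {})
        .get("participant_audio_files", [])
    )
    return [
        f["file_name"]
        for f in files
        if REPLACEMENT_CHAR in f.get("file_name", "")
    ]


# Trimmed-down version of the sample payload from this post.
sample_event = {
    "payload": {
        "object": {
            "participant_audio_files": [
                {"file_name": "Audio only - プロフィッ\ufffd", "file_type": "M4A"}
            ]
        }
    },
    "event": "recording.completed",
}

print(find_damaged_file_names(sample_event))
```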

On further investigation, it started to appear around noon JST on April 16th.

I have not received any reply. Can anyone confirm whether this is a bug or not?

This definitely feels like a character encoding problem. My guess is that the display name is being truncated, and then additional portions of the file name, like the file extension, are being appended. That would leave an invalid sequence of bytes, and a validation step then swaps in the Unicode replacement character (U+FFFD) you’re seeing in an attempt to recover from the problem.
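
To illustrate the kind of failure described here, this is a small Python sketch of byte-level truncation cutting through a multi-byte UTF-8 character. The 30-byte limit is purely an assumption chosen so the cut lands inside ト; the real limit on Zoom’s side, if there is one, is not known. truncate_utf8 is a hypothetical helper showing one way to cut on a character boundary instead.

```python
name = "Audio only - プロフィット"
raw = name.encode("utf-8")  # 13 ASCII bytes + 6 kana * 3 bytes = 31 bytes

LIMIT = 30  # hypothetical byte limit, for illustration only

# Naive truncation slices through the 3-byte sequence for ト (E3 83 88).
cut = raw[:LIMIT]

# A lenient decoder replaces the incomplete sequence with U+FFFD.
damaged = cut.decode("utf-8", errors="replace")
print(damaged)


def truncate_utf8(text, max_bytes):
    """Truncate to at most max_bytes of UTF-8, dropping any partial trailing character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8(name, LIMIT))
```

With a boundary-aware cut the last character is still lost, but no replacement character is produced.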

I don’t feel that Zoom is very transparent or accurate about character length limitations — for example, they’ll document “Max 64 chars” or “This value cannot exceed more than 12 Chinese characters.” which I doubt accurately reflects the actual limitation, or they wouldn’t be using those measurement units.

I’m guessing UTF-8 encoding is being used at some point.

Aside from getting official changes, perhaps you can ask people to use shorter display names, or express it as romaji (or at least the last few characters that are likely to be truncated) so that truncation will always result in a valid UTF-8 sequence?
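
If you want to check names ahead of time, something like the following Python sketch could tell you whether a given file name would still be valid UTF-8 after a byte-level cut. The limit of 30 bytes is an assumption for illustration only, and survives_truncation is a hypothetical helper name.

```python
def survives_truncation(text, max_bytes):
    """Return True if cutting text at max_bytes of UTF-8 leaves a valid sequence."""
    cut = text.encode("utf-8")[:max_bytes]
    try:
        cut.decode("utf-8")  # strict decoding raises on a partial character
        return True
    except UnicodeDecodeError:
        return False


ASSUMED_LIMIT = 30  # hypothetical; the actual limit is not documented in this thread

for candidate in ("Audio only - プロフィット", "Audio only - Purofitto"):
    print(candidate, survives_truncation(candidate, ASSUMED_LIMIT))
```

ASCII-only names always pass, because every character is a single byte and any cut lands on a boundary.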

Thanks for sharing your thoughts!

I wish we could do as you suggest, but in Japan it’s not very common to use romaji, and we cannot force our customers to use only alphabetic characters. (It is even more difficult because our customers’ customers’ names are often written in Chinese characters.)

For the time being, we have changed our app so that it no longer stops processing on Unicode block characters. But with one character less of information, the file name has only 5 valid characters, which is almost useless for identifying whose audio file it is.
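
For reference, the workaround is roughly like the Python sketch below. It assumes the file_name always starts with the “Audio only - ” prefix seen in the sample; participant_label is a hypothetical helper name, not our actual code.

```python
PREFIX = "Audio only - "      # prefix seen in the sample file_name
REPLACEMENT_CHAR = "\ufffd"   # U+FFFD


def participant_label(file_name):
    """Drop trailing U+FFFD characters and the prefix, keeping what is left of the name."""
    cleaned = file_name.rstrip(REPLACEMENT_CHAR)
    if cleaned.startswith(PREFIX):
        cleaned = cleaned[len(PREFIX):]
    return cleaned


label = participant_label("Audio only - プロフィッ\ufffd")
print(label, len(label))
```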

I hope I can get a good answer from Zoom. (Hopefully, they can extend the length or offer user_id.)