Difference between recording transcript and vtt transcript

Hi there. I use Zoom for a lot of qualitative research in my work. I’ve noticed that the scrolling transcript on the meeting recording and the VTT transcript file I receive afterward are different. Each seems to be more accurate in different instances.

Can someone help me with: 1) How is each of these transcripts created? 2) Why are they different? 3) Which tends to be more accurate, and why?

I’ve searched online and can’t find a clear answer. Unfortunately, I don’t have the plan needed to request live tech support.

I do dozens, potentially hundreds, of interviews and focus groups a year, so this would be very helpful information. Thank you for your help!