Concatenated audio doesn't transcribe correctly with Whisper.

So I tried the approach of concatenating the marker and the audio, but I've seen cases where Whisper transcribes only the marker and omits the actual audio, even though when you play the WAV file you can clearly hear the marker followed by the actual audio.
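The note doesn't say how the files were joined, so one possible cause is worth ruling out (an assumption, not something confirmed here). If the clips are glued together at the byte level, or the marker and the clip have different sample rates, channel counts, or sample widths, the resulting file's header may not describe all of the audio. A media player might still play everything, while a decoder that trusts the header stops after the marker. Below is a minimal sketch that uses Python's standard `wave` module to join WAV inputs by decoded frames, write one consistent header, and refuse to join mismatched formats. The function name `concat_wavs` is hypothetical.

```python
import wave

def concat_wavs(sources, out):
    """Join WAV inputs (paths or binary file objects) into one WAV.

    Hypothetical helper: every input must share the same channel count,
    sample width and sample rate, otherwise a ValueError is raised
    instead of writing a file whose header misdescribes the audio.
    """
    fmt = None
    chunks = []
    for src in sources:
        with wave.open(src, "rb") as w:
            src_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = src_fmt
            elif src_fmt != fmt:
                raise ValueError(f"{src!r} has format {src_fmt}, expected {fmt}")
            # Read the decoded PCM frames only, not the source's header bytes.
            chunks.append(w.readframes(w.getnframes()))
    with wave.open(out, "wb") as w:
        nchannels, sampwidth, framerate = fmt
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        # A single header is written, and its length covers marker + clip.
        w.writeframes(b"".join(chunks))
```

Usage would look something like `concat_wavs(["marker.wav", "clip.wav"], "combined.wav")`, where the file names are placeholders. If this raises because the formats differ, resample or convert the marker to match the clip before joining.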