Why is my music louder than my voice?
The music was dropped in at full level, your voice was recorded a touch quiet, and your ears stopped hearing the music ten plays ago. Here is why it happens and how to put the voice back on top in a few minutes.
By Thomas, founder of CutScore · Updated June 2026
I have shipped this exact mistake. A clip I was proud of, music that felt cinematic in my headphones, and a comment within the hour: "great, but I can't hear what you're saying." The track was not too loud in any absolute sense. It was too loud next to a voice I had recorded slightly low, and I had stopped noticing because I had heard the song forty times.
That is the trap. Music and speech fight for the same space, especially in the low-mid frequencies where a warm voice lives and where most pads and basslines also park themselves. This is called masking. When two sounds share a frequency range, the louder one hides the quieter one, even when both are technically present in the file. So your voice is there. The viewer just cannot pull it out from under the music bed.
And here is the part nobody warns you about: your gear lies. Good headphones separate the voice and music cleanly, so the balance sounds fine to you. Then someone plays it on a single phone speaker on a noisy train, where everything collapses into one thin band, and the music swallows the speech whole. You are mixing for the best case. Your audience is in the worst case.
How to put the voice back on top.
Four moves, fastest first. You can do the first one in thirty seconds, and most clips are saved by it alone.
| Move | What to do | Why it works |
|---|---|---|
| 1. Pull the music down | cut 4 to 6 dB | The single fastest fix. Lower the music bed under speech until the voice clearly wins. |
| 2. Set the gap | music 15 to 20 dB under voice | A sane default for talking-over-music. Wide enough that the words never have to compete. |
| 3. Add ducking | sidechain or keyframes | Music drops automatically when you talk and lifts back in the gaps, so it never masks a word. |
| 4. Lift the voice first | gain stage, then mix | If the voice was recorded quiet, raise it to a healthy level before you even think about music. |
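The dB figures in the table turn into plain multipliers once you do the arithmetic. Here is a minimal sketch, assuming you can read approximate voice and music levels off your editor's meters. The function names and the 18 dB default gap (picked from inside the 15 to 20 dB range) are illustrative assumptions, not part of CutScore or any editor.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def music_trim_db(voice_db: float, music_db: float, target_gap_db: float = 18.0) -> float:
    """Gain change (in dB) to apply to the music so it sits target_gap_db under the voice.

    A negative result means "turn the music down by this much".
    Returns 0.0 when the music is already far enough below the voice.
    """
    current_gap_db = voice_db - music_db
    return min(0.0, current_gap_db - target_gap_db)


# Illustrative meter readings, not measurements from a real clip.
voice_level_db = -20.0
music_level_db = -26.0

trim_db = music_trim_db(voice_level_db, music_level_db)
print(f"Change the music by {trim_db:.1f} dB "
      f"(multiply its samples by {db_to_gain(trim_db):.3f})")
```

In this made-up case the music is only 6 dB under the voice, so the sketch asks for a further 12 dB cut to reach the 18 dB gap.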
Nobody can judge their own mix by ear after the tenth playback. CutScore measures the voice-to-music balance for you and tells you the exact gain change, with the timestamp where the music wins.
Loudness, balance and the one target everyone gets wrong.
Overall loudness is not the same as balance
This is where most people get confused, so let me separate the two ideas cleanly. Overall loudness is how loud the whole video is compared to every other video in the feed. You want that near −14 LUFS for YouTube, and a similar ballpark for the other platforms, because they all normalise toward roughly that level anyway. Balance is something else entirely: it is how loud the music is relative to the voice inside your own mix. A video can be perfectly on target at −14 LUFS and still have the music drowning the speech. The platform fixes total volume. It never fixes balance. That part is on you.
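A small sketch of why normalisation cannot fix balance: one gain applied to the whole mix moves voice and music by the same amount, so the gap between them stays put. It uses plain RMS level as a stand-in for loudness, which is a simplification (real LUFS metering adds frequency weighting and gating), and the two sine-wave "stems" are toy data.

```python
import math


def rms_db(samples):
    """RMS level of a signal in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def apply_gain_db(samples, gain_db):
    """Scale every sample by the same dB amount."""
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples]


SAMPLE_RATE = 48000

# Toy stems: a quiet "voice" and a louder "music" tone, one second each.
voice = [0.2 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
music = [0.3 * math.sin(2 * math.pi * 110 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

gap_before_db = rms_db(voice) - rms_db(music)

# Platform-style normalisation: one gain for the whole mix,
# which is the same as applying that gain to each stem.
normalisation_db = -5.0
gap_after_db = (rms_db(apply_gain_db(voice, normalisation_db))
                - rms_db(apply_gain_db(music, normalisation_db)))

print(f"gap before: {gap_before_db:.2f} dB, gap after: {gap_after_db:.2f} dB")
```

The voice is below the music before the normalisation step and exactly as far below it afterwards.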
Keep an eye on peaks while you are in there
When you lift the voice to win over the music, you can accidentally push the loudest syllables into distortion. Watch your true peak and keep it at or below −1 dBTP, so nothing crackles after the platform re-encodes your file. The order matters: raise the voice to a healthy level first, then bring the music in underneath it, then check the peak last. Raising music to match a loud voice is backwards, and it is exactly how you end up here, with the bed creeping over the speech again.
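The peak check at the end of that order can be sketched like this. One loud assumption: the code measures the sample peak, which is only an approximation of true peak. A real dBTP meter oversamples the signal to catch peaks between samples, so treat this as a lower bound and leave yourself some margin. The function names are illustrative.

```python
import math


def sample_peak_db(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def gain_to_ceiling_db(samples, ceiling_db=-1.0):
    """dB change needed to bring the peak down to the ceiling.

    Returns 0.0 when the peak is already at or below the ceiling,
    so this never suggests turning anything up.
    """
    return min(0.0, ceiling_db - sample_peak_db(samples))


# Toy mix that touches full scale on one sample.
mix = [0.0, 0.5, -1.0, 0.25]
print(f"peak: {sample_peak_db(mix):.2f} dB, "
      f"suggested change: {gain_to_ceiling_db(mix):.2f} dB")
```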
Ducking is the move that makes it effortless
Once you understand ducking, you stop fighting your mix by hand. Ducking pulls the music down automatically the instant the voice starts, then lets it swell back up in the silences between sentences. You set it once and it tracks the whole video. Most editors do this with a sidechain compressor, where the voice triggers the dip, or with simple volume keyframes if your tool has no sidechain. For talking-head clips, vlogs and especially anything where the energy of the music matters, this is the difference between a mix that feels designed and one that feels like a wrestling match. CutScore checks for it as part of what we analyze, because a missing duck is one of the most common reasons a voice gets buried.
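To make the mechanism concrete, here is a rough sketch of a ducker: an envelope follower watches the voice, and whenever the voice is above a threshold the music gain glides down, then glides back up once the voice stops. Every number in it (threshold, 12 dB of ducking, attack, release and hold times) is an illustrative assumption, and this is not how CutScore or any particular editor implements it.

```python
import math


def duck_music(music, voice, sample_rate=48000, threshold=0.02,
               duck_db=-12.0, attack_ms=10.0, release_ms=300.0, hold_ms=50.0):
    """Return the music samples, turned down while the voice is active."""
    duck_gain = 10 ** (duck_db / 20)

    def smoothing(ms):
        # One-pole smoothing coefficient for a time constant given in ms.
        return math.exp(-1.0 / (sample_rate * ms / 1000.0))

    attack = smoothing(attack_ms)      # how fast the music dips
    release = smoothing(release_ms)    # how slowly it swells back
    env_decay = smoothing(hold_ms)     # keeps the envelope up between syllables

    envelope = 0.0
    gain = 1.0
    ducked = []
    for m, v in zip(music, voice):
        # Follow the voice level: jump up instantly, decay gradually.
        envelope = max(abs(v), envelope * env_decay)
        target = duck_gain if envelope > threshold else 1.0
        # Move toward the target quickly when ducking, slowly when recovering.
        coeff = attack if target < gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        ducked.append(m * gain)
    return ducked
```

The slow release is the design choice that matters most here: if the music snaps back the instant you pause, the mix pumps audibly between words.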
Frequency, not just volume
Sometimes the music is not actually louder; it just lives in the same range as your voice. A track heavy in the low-mids will mask speech even at a polite volume, because both are crowding the same shelf. A light touch of EQ on the music, scooping a little around where the voice sits, can clear room without you dropping the music further. You do not need to be a mastering engineer for this. Pull a gentle dip in the music between roughly 1 and 4 kHz, where the consonants that make speech intelligible sit, and you will often hear the voice step forward on its own.
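For the curious, that kind of gentle dip can be sketched as a single peaking filter applied to the music only. The coefficients follow the widely used Audio EQ Cookbook peaking formula. The centre frequency (2 kHz, the middle of the 1 to 4 kHz range on a log scale), the 3 dB depth and the wide Q of 0.7 are assumptions chosen for illustration, not recommended settings.

```python
import cmath
import math


def peaking_eq(center_hz, gain_db, q, sample_rate):
    """Biquad coefficients (b, a) for a peaking EQ; negative gain_db makes a dip."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / amp
    b = [(1 + alpha * amp) / a0, -2 * math.cos(w0) / a0, (1 - alpha * amp) / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha / amp) / a0]
    return b, a


def apply_biquad(samples, b, a):
    """Run the filter over a list of samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def response_db(b, a, freq_hz, sample_rate):
    """Filter gain in dB at one frequency, to check what the dip does."""
    z = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))


# Assumed settings: a wide 3 dB dip centred on 2 kHz at a 48 kHz sample rate.
b, a = peaking_eq(center_hz=2000.0, gain_db=-3.0, q=0.7, sample_rate=48000)
for freq in (100.0, 2000.0):
    print(f"{freq:.0f} Hz: {response_db(b, a, freq, 48000):+.2f} dB")
```

You would run `apply_biquad` on the music stem before mixing it under the voice; the low end of the track is left essentially untouched, so the bed keeps its weight.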
Here is a real CutScore report for an everyday video: loudness, peaks and the voice-to-music balance, scored, with timestamps and the exact fixes.
The fastest path to a clean mix.
Most of the jump from "I can't hear you" to "this sounds produced" comes from these three. Do them in this order.
Hear the problem before your viewers do.
CutScore measures the voice-to-music balance, the loudness and the peaks, then tells you exactly what to fix and where. Join the waitlist for early access.