AUDIO MIX BLOG / 8 MIN READ

Why is my music louder than my voice?

The music was dropped in at full level, your voice was recorded a touch quiet, and your ears stopped hearing the music ten plays ago. Here is why it happens and how to put the voice back on top in a few minutes.


4 to 6 dB · typical music cut

−14 LUFS · overall loudness target

15 to 20 dB · music below voice

1 · phone-speaker test

By Thomas, founder of CutScore · Published June 4, 2026 · Updated June 11, 2026

MIX CHECK · talk_over_music.mp4

Two people talking into microphones at a podcast desk, the exact setup where background music quietly creeps up over the voice if nobody checks the balance.

the music is winning, by a lot

Music masks the voice · pull bed −5 dB · 00:12

No ducking under speech · add sidechain · 00:31

Overall loudness fine · −14 LUFS ✓

The 30-second answer

Your music is louder than your voice for three reasons that stack up. The music was added at its default level and nobody pulled it down. The voice was recorded a little quiet, so it had less room to start with. And you have heard the track so many times that your ears have tuned it out, which makes the buried voice sound fine to you, and only to you. The fix: lower the music until the voice clearly wins (usually a 4 to 6 dB cut under speech), keep the bed roughly 15 to 20 dB below the voice while anyone is talking, add ducking so the music steps back during words, and keep the whole mix near −14 LUFS. Then test it on a phone speaker, not your good headphones.

WHY IT KEEPS HAPPENING

I have shipped this exact mistake. A clip I was proud of, music that felt cinematic in my headphones, and a comment within the hour: "great, but I can't hear what you're saying." The track was not too loud in any absolute sense. It was too loud next to a voice I had recorded slightly low, and I had stopped noticing because I had heard the song forty times.

That is the trap. Music and speech fight for the same space, especially in the low-mid frequencies where a warm voice lives and where most pads and basslines also park themselves. This is called masking. When two sounds share a frequency range, the louder one hides the quieter one, even when both are technically present in the file. So your voice is there. The viewer just cannot pull it out from under the bed.

And here is the part nobody warns you about: your gear lies. Good headphones separate the voice and music cleanly, so the balance sounds fine to you. Then someone plays it on a single phone speaker on a noisy train, where everything collapses into one thin band, and the music swallows the speech whole. You are mixing for the best case. Your audience is in the worst case.

THE FIX, IN ORDER

How to put the voice back on top.

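Four moves, fastest first. You can do the first one in thirty seconds, and most clips are saved by it alone. If the voice itself was recorded quiet, though, start with move 4: lift it before you touch the music.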

Move	What to do	Why it works
1. Pull the music down	cut 4 to 6 dB	The single fastest fix. Lower the music bed under speech until the voice clearly wins.
2. Set the gap	music 15 to 20 dB under voice	A sane default for talking-over-music. Wide enough that the words never have to compete.
3. Add ducking	sidechain or keyframes	Music drops automatically when you talk and lifts back in the gaps, so it never masks a word.
4. Lift the voice first	gain stage, then mix	If the voice was recorded quiet, raise it to a healthy level before you even think about music.
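Moves 1 and 2 are plain decibel arithmetic, which is easier to trust once you see it. Here is a minimal Python sketch, assuming you have the voice and music stems as lists of float samples between −1.0 and 1.0. It uses plain RMS as a rough stand-in for a proper loudness meter, and the function names are illustrative, not taken from any editor or library.

```python
import math


def rms_db(samples):
    """RMS level of float samples (-1.0..1.0), in dB relative to full scale."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def db_to_gain(db):
    """Turn a dB change into a linear amplitude multiplier (-6 dB is about 0.5)."""
    return 10 ** (db / 20)


def music_cut_needed(voice, music, target_gap_db=15.0):
    """dB to pull the music down so it sits target_gap_db under the voice.

    Returns 0.0 when the music is already quiet enough.
    """
    gap = rms_db(voice) - rms_db(music)
    return max(0.0, target_gap_db - gap)


def apply_gain_db(samples, db):
    """Scale every sample by a dB change (negative values make it quieter)."""
    g = db_to_gain(db)
    return [s * g for s in samples]


# Usage: a voice stem only about 6 dB above the music bed (test tones stand in for real audio).
sr = 48000
voice = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
music = [0.25 * math.sin(2 * math.pi * 110 * n / sr) for n in range(sr)]
cut = music_cut_needed(voice, music)  # about 9 dB
music = apply_gain_db(music, -cut)
print(round(rms_db(voice) - rms_db(music), 1))  # 15.0
```

For scale, a 4 to 6 dB cut multiplies the music's amplitude by roughly 0.63 to 0.50, which is why such a small-sounding number makes such an audible difference.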

The one test that settles every argument

Bounce the mix and play it on the worst speaker you own, a laptop or a single phone speaker, at a normal volume. If you can hear every word without leaning in, you are done. If you reach for the rewind to catch a sentence, the music is still too loud. Trust the cheap speaker over your headphones every time.

CAN'T TRUST YOUR OWN EARS?

Nobody can, after the tenth playback. CutScore measures the voice-to-music balance for you and tells you the exact gain change, with the timestamp where the music wins.


THE NUMBERS THAT ACTUALLY MATTER

Loudness, balance and the one target everyone gets wrong.

Overall loudness is not the same as balance

This is where most people get confused, so let me separate the two ideas cleanly. Overall loudness is how loud the whole video is compared to every other video in the feed. You want that near −14 LUFS for YouTube, and a similar ballpark for the other platforms, because they all normalise toward roughly that level anyway. Balance is something else entirely: it is how loud the music is relative to the voice inside your own mix. A video can be perfectly on target at −14 LUFS and still have the music drowning the speech. The platform fixes total volume. It never fixes balance. That part is on you.
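One way to convince yourself that the platform cannot rescue the balance: loudness normalisation is one fixed gain applied to the whole file, and the same gain on both stems leaves the gap between them exactly where it was. A small sketch, assuming a hypothetical measured loudness of −9 LUFS passed in by hand (a real meter would measure it from the file) and plain RMS as a rough stand-in for a loudness meter.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (a rough stand-in for a loudness meter)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq)


def platform_normalise(voice, music, measured_lufs, target_lufs=-14.0):
    """Model a platform turning the whole mix up or down by one fixed gain."""
    gain = 10 ** ((target_lufs - measured_lufs) / 20)
    return [s * gain for s in voice], [s * gain for s in music]


sr = 48000
voice = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
music = [0.45 * math.sin(2 * math.pi * 110 * n / sr) for n in range(sr)]

before = rms_db(voice) - rms_db(music)
voice_out, music_out = platform_normalise(voice, music, measured_lufs=-9.0)
after = rms_db(voice_out) - rms_db(music_out)
print(round(before, 2), round(after, 2))  # the gap is unchanged
```

The whole file got 5 dB quieter, and the music is still less than 1 dB under the voice.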

Keep an eye on peaks while you are in there

When you lift the voice to win over the music, you can accidentally push the loudest syllables into distortion. Watch your true peak and keep it at or below −1 dBTP, so nothing crackles after the platform re-encodes your file. The order matters: raise the voice to a healthy level first, then bring the music in underneath it, then check the peak last. Raising music to match a loud voice is backwards, and it is exactly how you end up here, with the bed creeping over the speech again.
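A sample peak can sit comfortably under −1 dB while the waveform between the samples goes higher, which is why the ceiling is written in dBTP. Here is a rough pure-Python sketch of the idea, assuming float samples: it interpolates four points per sample with a Hann-windowed sinc. That is an approximation of the oversampling meter described in ITU-R BS.1770, not the standard's exact filter, so treat its number as an estimate. The function names are illustrative.

```python
import math


def sample_peak_db(samples):
    """Highest absolute sample value, in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def approx_true_peak_db(samples, oversample=4, half_width=16):
    """Estimate the true peak by interpolating between samples.

    Uses a Hann-windowed sinc kernel at `oversample` points per sample.
    Positions too close to either end to fit the whole kernel are skipped.
    """
    peak = 0.0
    for i in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = i + k / oversample  # position between sample i and i + 1
            acc = 0.0
            for m in range(i - half_width + 1, i + half_width + 1):
                x = t - m
                sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
                window = 0.5 * (1 + math.cos(math.pi * x / half_width))
                acc += samples[m] * sinc * window
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


# A tone whose samples all land beside its crests, never on them.
sig = [0.95 * math.sin(math.pi / 2 * n + math.pi / 4) for n in range(200)]
print(round(sample_peak_db(sig), 1))    # about -3.5 dBFS: looks safe
print(approx_true_peak_db(sig) > -1.0)  # True: over the -1 dBTP ceiling
```

The sample peak says there are 3.5 dB of headroom; the reconstructed waveform actually peaks around −0.4 dBTP.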

People leaning over a mixing board covered in faders and knobs, the place where the voice fader should always sit higher than the music fader. — Voice fader high, music fader low, and ducking doing the rest. Photo: cottonbro studio / Pexels.

Ducking is the move that makes it effortless

Once you understand ducking, you stop fighting your mix by hand. Ducking pulls the music down automatically the instant the voice starts, then lets it swell back up in the silences between sentences. You set it once and it tracks the whole video. Most editors do this with a sidechain compressor, where the voice triggers the dip, or with simple volume keyframes if your tool has no sidechain. For talking-head clips, vlogs and especially anything where the energy of the music matters, this is the difference between a mix that feels designed and one that feels like a wrestling match. CutScore checks for it as part of what we analyze, because a missing duck is one of the most common reasons a voice gets buried.
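Here is what a sidechain ducker does under the hood, as a minimal pure-Python sketch: an envelope follower listens to the voice, and whenever it is above a threshold the music gain glides down to a set depth, then glides back up when the voice stops. The −40 dB threshold, −12 dB depth and 10 ms / 300 ms attack and release are illustrative assumptions, not recommended settings. This is also a simplification: a real sidechain compressor scales the dip with how far the voice exceeds the threshold rather than switching to a fixed depth.

```python
import math


def duck(music, voice, sample_rate, threshold_db=-40.0, depth_db=-12.0,
         attack_ms=10.0, release_ms=300.0):
    """Lower `music` whenever `voice` is above threshold_db (sidechain-style)."""
    duck_gain = 10 ** (depth_db / 20)
    threshold = 10 ** (threshold_db / 20)
    # One-pole smoothing coefficients: closer to 1.0 means a slower glide.
    att = math.exp(-1.0 / (sample_rate * attack_ms / 1000))
    rel = math.exp(-1.0 / (sample_rate * release_ms / 1000))
    env = 0.0
    gain = 1.0
    out = []
    for m, v in zip(music, voice):
        # Peak envelope follower on the voice: jumps up instantly, decays slowly.
        level = abs(v)
        env = level if level > env else env * rel
        target = duck_gain if env > threshold else 1.0
        # Dip quickly (attack), recover slowly (release).
        coeff = att if target < gain else rel
        gain = coeff * gain + (1 - coeff) * target
        out.append(m * gain)
    return out


sr = 8000
music = [0.5] * (2 * sr)  # a steady bed, for clarity
voice = [0.0] * sr + [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
ducked = duck(music, voice, sr)
print(round(ducked[sr // 2], 3), round(ducked[-1], 3))  # 0.5 before speech, about 0.126 during it
```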

Frequency, not just volume

Sometimes the music is not actually louder; it just lives in the same range as your voice. A track heavy in the low-mids will mask speech even at a polite volume, because both are crowding the same shelf. A light touch of EQ on the music, scooping a little out of the range that carries the voice, can clear room without you dropping the music any further. You do not need to be a mastering engineer for this. Pull a gentle dip in the music between roughly 1 and 4 kHz, where much of what makes speech intelligible sits, and you will often hear the voice step forward on its own.
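If you are curious what that gentle dip is, it is usually a single peaking EQ band. The sketch below uses the peaking-filter formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The centre of 2 kHz, the −3 dB depth and the Q of 0.67 (about two octaves wide, so roughly 1 to 4 kHz) are illustrative assumptions, not settings from any particular plugin.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs):
    """Biquad coefficients (b, a) for a peaking EQ, per the RBJ Audio EQ Cookbook."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    # Normalise so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))


def biquad(samples, b, a):
    """Filter samples with a direct form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


fs = 48000
b, a = peaking_eq(2000.0, -3.0, 0.67, fs)
print(round(response_db(b, a, 2000.0, fs), 2))  # -3.0 at the centre
print(round(response_db(b, a, 100.0, fs), 2))   # close to 0 dB down in the bass

tone = [math.sin(2 * math.pi * 2000 * n / fs) for n in range(fs // 10)]
dipped = biquad(tone, b, a)  # the 2 kHz tone comes out about 3 dB quieter
```

The point of a wide, shallow band is that the music keeps its character; it just stops sitting exactly where the words are.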

WANT TO SEE IT SCORED?

A real CutScore report for an everyday video shows what this looks like in practice: loudness, peaks and the voice-to-music balance, scored, with timestamps and the exact fixes.

IF YOU ONLY DO THREE THINGS

The fastest path to a clean mix.

Most of the jump from "I can't hear you" to "this sounds produced" comes from these three. Do them in this order.

30-SEC FIX · AUDIO

Pull the music down 4 to 6 dB

Before anything clever, just lower the music. Grab the bed, drop it a few decibels, and listen back. Nine times out of ten the voice was only losing by a small margin, and this one move hands the win straight back to the speech.

How: lower the music track gain, or let CutScore measure the balance and tell you the exact cut.

SETUP · AUDIO

Add ducking so the music steps back when you talk

Set the music to dip automatically whenever the voice is present and rise in the gaps. One sidechain or a few keyframes covers the whole video, so you never have to ride the fader by hand again or argue with a swelling chorus.

How: sidechain the music to the voice, or keyframe the music volume down under each line of speech.
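If your editor has no sidechain, the keyframe route can be generated from a list of speech timings instead of placed by hand. A minimal sketch, assuming you already have (start, end) times in seconds for each line of speech, from a transcript for example. The −12 dB depth and quarter-second fades are illustrative assumptions, and the function names are hypothetical, not any editor's API.

```python
def merge_close(speech, min_gap):
    """Join lines of speech separated by less than min_gap seconds."""
    merged = []
    for start, end in sorted(speech):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def duck_keyframes(speech, depth_db=-12.0, fade=0.25):
    """Music volume keyframes as (time_s, gain_db) that dip under each line.

    Each line gets a ramp down that lands as the speech starts and a ramp
    back up that begins as it ends. Lines closer together than two fades are
    merged so the music does not bob up for a split second between them.
    """
    keys = []
    for start, end in merge_close(speech, 2 * fade):
        keys += [(max(0.0, start - fade), 0.0), (start, depth_db),
                 (end, depth_db), (end + fade, 0.0)]
    return keys or [(0.0, 0.0)]


def gain_at(keys, t):
    """Music gain in dB at time t, interpolating linearly between keyframes."""
    if t <= keys[0][0]:
        return keys[0][1]
    for (t0, g0), (t1, g1) in zip(keys, keys[1:]):
        if t0 <= t <= t1 and t1 > t0:
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    return keys[-1][1]


# Speech at 1-3 s, 3.2-5 s and 8-9 s (made-up timings for illustration).
keys = duck_keyframes([(1.0, 3.0), (3.2, 5.0), (8.0, 9.0)])
print(keys[:4])  # [(0.75, 0.0), (1.0, -12.0), (5.0, -12.0), (5.25, 0.0)]
```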

QUICK · AUDIO

Test it on the worst speaker you own

Your headphones are flattering you. Play the bounce on a laptop or a single phone speaker at normal volume. If you can hear every word without effort, you are done. If you lean in to catch a line, the music is still too loud, full stop.

How: export, play it on a phone, and listen the way a viewer on a bus actually would.

How CutScore checks the balance for you

CutScore is an AI video quality coach for pre-publish QC. It measures your audio deterministically: the overall loudness with an EBU R128 meter, the true peak and, crucially, the level of the voice against the music across the whole timeline. That lets it tell you not just "the music is loud" but "the bed masks the voice at 00:12, pull it down about 5 dB." You get one score, the evidence behind it, and the fix, before anyone leaves a comment about not being able to hear you. It judges the craft of the video, so it sits next to a growth tool rather than competing with one.
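To make the idea of "the level of the voice against the music across the whole timeline" concrete, here is a hypothetical sketch of the general approach. It is not CutScore's implementation. It steps through the voice and music stems one second at a time, skips seconds where nobody is talking, and flags any second where the music sits less than 15 dB under the voice, with a timestamp and a suggested cut. The one-second window, the 15 dB target, the −50 dB speech floor and plain RMS as the level measure are all assumptions for illustration.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale; -inf for silence."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def flag_masking(voice, music, sample_rate, min_gap_db=15.0, speech_floor_db=-50.0):
    """Return (mm:ss, suggested_cut_db) for each one-second window where the
    music sits less than min_gap_db below the voice while someone is talking."""
    flags = []
    length = min(len(voice), len(music))
    for start in range(0, length - sample_rate + 1, sample_rate):
        v = rms_db(voice[start:start + sample_rate])
        if v < speech_floor_db:
            continue  # nobody is talking in this window
        gap = v - rms_db(music[start:start + sample_rate])
        if gap < min_gap_db:
            secs = start // sample_rate
            flags.append((f"{secs // 60:02d}:{secs % 60:02d}", round(min_gap_db - gap, 1)))
    return flags


# 20 s of stand-in "speech" with a bed 20 dB down, except a swell during second 12.
sr = 1000
voice = [0.5 * math.sin(2 * math.pi * 50 * n / sr) for n in range(20 * sr)]
music = [(0.4 if 12 * sr <= n < 13 * sr else 0.05) * math.sin(2 * math.pi * 50 * n / sr)
         for n in range(20 * sr)]
print(flag_masking(voice, music, sr))  # [('00:12', 13.1)]
```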


QUESTIONS

Frequently asked.

Why is my music louder than my voice?

Usually because the music track was added at a default level and never pulled down, while the voice was recorded a little quiet to begin with. Your ears also stop hearing the music after the tenth playback, so it feels fine to you while it buries the speech for everyone else. The fix is to lower the music until the voice clearly wins, normally a 4 to 6 dB cut under speech.

How much quieter than the voice should background music be?

As a starting point, sit the music roughly 15 to 20 dB below the voice whenever someone is talking. That is not a law; it is a sane default. The real test is simpler: play it on a phone speaker and check you can hear every word without effort. If the music makes you lean in to catch the speech, it is still too loud.

What is ducking and do I need it?

Ducking automatically lowers the music whenever the voice is present and lifts it back up in the gaps. Most editors do this with a keyframe or a sidechain compressor. You do not strictly need it for a short clip, but for anything with talking over a bed of music it is the difference between a clean mix and a constant fight between the two.

Will the platform fix my voice-to-music balance for me?

No. YouTube, TikTok and Instagram normalise the overall loudness of your file toward roughly −14 LUFS, but they do not touch the balance inside your mix. If the music is louder than the voice in your export, it stays louder than the voice after upload. The balance is your job; the platform only adjusts the total volume.

EARLY ACCESS

Hear the problem before your viewers do.

CutScore measures the voice-to-music balance, the loudness and the peaks, then tells you exactly what to fix and where. Join the waitlist for early access.
