F · AUDIO QUALITY & BALANCE
Voice-to-music balance
How far the music bed should sit under the speech.
By Thomas Linck, founder · Updated June 2026
Voice-to-music balance is how far the music bed sits below the speech in your mix. For spoken content, common guidance is to keep the music 15–20 dB under the voice while anyone is talking, ducking the bed under each line. Get it wrong and the viewer rewinds to catch the words.
WHY IT MATTERS
Platforms normalize your overall loudness, but they never touch the balance inside your mix — if the music buries the voice in your export, it stays buried after upload. The tell is unmistakable: viewers rewind to catch words. Keep the bed 15–20 dB under the speech, duck it while anyone talks, then test on a single phone speaker, where the balance collapses first.
TARGET · STANDARD
| Setting | Target | Detail |
| --- | --- | --- |
| Music under voice | 15–20 dB below | while anyone is talking |
| Ducking | drop the bed under speech | sidechain or keyframes |
| The test | single phone speaker | every word, no effort |
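To put the 15–20 dB target in linear terms, the short sketch below converts a decibel offset into the amplitude multiplier you would apply to the music. The function names are illustrative only; the conversion itself is the standard 20·log10 amplitude relationship.

```python
import math

def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier back to decibels."""
    return 20 * math.log10(gain)

for offset_db in (-15, -20):
    print(f"{offset_db} dB -> music amplitude x{db_to_gain(offset_db):.3f}")
# -15 dB -> music amplitude x0.178
# -20 dB -> music amplitude x0.100
```

In other words, a bed sitting 20 dB under the voice has one tenth of the voice's amplitude, and one sitting 15 dB under has a little under one fifth.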
How CutScore measures it
CutScore measures how far the music bed sits under your speech across the file and flags the stretches where the music is winning, with timestamps and the exact decibel cut to make — so you fix the masking before a viewer has to rewind.
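As a rough illustration of this kind of check, and not a description of CutScore's actual method, the sketch below assumes you have separate voice and music stems as lists of float samples (full scale = 1.0). It compares their RMS levels window by window and reports where the gap falls short of the target. The function names, the 0.5-second window, and the -40 dBFS "voice is active" threshold are all assumptions chosen for the example.

```python
import math

def rms_db(samples, floor_db=-120.0):
    """RMS level of a block of samples in dBFS (full scale = 1.0)."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else floor_db

def flag_masked_windows(voice, music, sample_rate, window_s=0.5,
                        target_gap_db=15.0, voice_active_db=-40.0):
    """Return (start_seconds, gap_db, cut_db) for each window where the
    music sits less than target_gap_db under an active voice."""
    size = int(sample_rate * window_s)
    flags = []
    for start in range(0, min(len(voice), len(music)) - size + 1, size):
        v = rms_db(voice[start:start + size])
        if v < voice_active_db:
            continue  # nobody is talking in this window, so nothing to mask
        m = rms_db(music[start:start + size])
        gap = v - m
        if gap < target_gap_db:
            cut = target_gap_db - gap  # how much to lower the music here
            flags.append((start / sample_rate, round(gap, 1), round(cut, 1)))
    return flags

# Synthetic check: constant-level "stems" at a 1 kHz sample rate.
voice = [0.5] * 1000
music = [0.25] * 500 + [0.05] * 500   # too loud first, then 20 dB under
print(flag_masked_windows(voice, music, sample_rate=1000))
# [(0.0, 6.0, 9.0)]
```

The first half-second is flagged because the music is only about 6 dB under the voice, so a cut of roughly 9 dB would bring it to the 15 dB target. The second half already sits 20 dB under and passes.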
QUESTIONS
Frequently asked.
The music should sit roughly 15–20 dB below the voice whenever someone is talking. The real test is a phone speaker: if you catch every word without effort while still hearing the music, the balance is right.
Ducking automatically lowers the music whenever the voice is present and lifts it back in the gaps. Most editors do it with a sidechain compressor or keyframes, and it is the difference between a clean mix and a constant fight between the two.
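The sidechain idea can be sketched in a few lines. This is a simplified illustration rather than how any particular editor implements it: the voice drives an envelope follower, and the music gain glides down while the envelope is above a threshold and back up in the gaps. All parameter names and default values here (the -15 dB duck amount, the threshold, the attack and release times) are assumptions for the example. A fixed duck amount only lands the bed 15–20 dB under the voice if the two started at similar levels.

```python
import math

def duck_music(voice, music, sample_rate, duck_db=-15.0,
               threshold=0.02, attack_s=0.05, release_s=0.4):
    """Lower `music` while `voice` is present; return the ducked samples."""
    duck_gain = 10 ** (duck_db / 20)
    # One-pole smoothing coefficients: fast drop (attack), slow recovery (release).
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    # Peak envelope decay, so zero crossings in speech do not read as silence.
    env_decay = math.exp(-1.0 / (0.05 * sample_rate))
    env = 0.0
    gain = 1.0
    out = []
    for v, m in zip(voice, music):
        env = max(abs(v), env * env_decay)
        target = duck_gain if env > threshold else 1.0
        coeff = attack if target < gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(m * gain)
    return out

# One second of "speech" followed by three seconds of silence, at 1 kHz.
voice = [0.5] * 1000 + [0.0] * 3000
music = [1.0] * 4000
ducked = duck_music(voice, music, sample_rate=1000)
print(round(ducked[999], 3), round(ducked[-1], 2))
```

Keyframing achieves the same result by hand: you draw the gain curve yourself instead of letting the voice signal drive it.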