Why is my speech hard to understand in my video?
Muffled, distant, mumbled or buried under the music: unclear speech almost always traces to a short list of fixable causes. Here is how to find which one is hurting you, and exactly what to change.
By Thomas, founder of CutScore · Updated June 2026
I have shipped videos where I could understand every word and not one viewer could. That is the trap. You wrote the script, you know what comes next, so your brain fills in the mumbled bits automatically. The viewer has no script. They get the audio cold, on a phone speaker, on a train, and a word that is 80 percent clear to you is a coin flip to them.
There is also a comforting lie we tell ourselves: that clarity is a hardware problem. Buy a better mic and the mumbling stops. It does not. A 30-dollar mic held a palm-width from your mouth, in a room with a rug and a sofa, will out-clarify a studio mic sitting across an empty kitchen. Most "bad mic" problems are really distance problems, room problems, or delivery problems wearing a mic costume.
So before you spend a cent, rule out the five usual suspects. Clarity is a chain: your mouth, the air in the room, the mic, the mix, and the listener's speaker. The weakest link decides everything. Here is how to test each one.
Why your speech is hard to understand.
Run down this table top to bottom. The first row you fail is usually the one doing the most damage, so fix it before you touch the next.
| Cause | How it sounds | The fix |
|---|---|---|
| Room reverb | Distant and washy, a hollow tail on every word | Move closer, add soft surfaces (rug, sofa, curtains, blankets). |
| Mic too far | Thin, weak, the room as loud as you are | Get the mic to roughly a palm-width from your mouth. |
| Loudness too low | You keep reaching for the volume knob | Normalise the mix toward −14 LUFS, peaks ≤ −1 dBTP. |
| Music on the voice | You hear the track fine, the words less so | Pull music down 4 to 5 dB, duck it further under speech. |
| Talking too fast | Word endings clipped, no time to follow | Slow to 140–160 wpm, pause at the commas. |
CutScore listens to the actual file, finds the reverb, the masking and the low loudness, and tells you which one to fix first, with the timestamp.
Five fixes, in the order that pays off.
1. Kill the room before you touch the mic
Reverb is the clarity killer almost nobody hears in themselves, because your ear filters out the room you live in. Record ten seconds and play it back on headphones. If every word has a faint hollow tail, like you are talking in a small hall, the room is the problem. You cannot remove reverb cleanly in the edit, so you stop it at the source: get closer to the mic, and break up the hard parallel walls. A rug, a sofa, a bookshelf, even a duvet pinned behind the camera will soak up the slap. Soft room, clear voice. Empty room, soup.
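To put a rough number on why the rug and sofa matter, you can use Sabine's classic formula. It is a back-of-envelope tool: RT60 ≈ 0.161 × V / A. Here V is the room volume in cubic metres and A is the total absorption, meaning each surface's area multiplied by its absorption coefficient. Treat the sketch below as an illustration, not a measurement. The room size and the absorption coefficients are assumed ballpark values, and Sabine's formula gets less accurate in very dead rooms.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    area_m2: float
    absorption: float  # fraction of sound energy absorbed, 0..1 (assumed ballpark values)


def rt60_sabine(volume_m3: float, surfaces: list[Surface]) -> float:
    """Sabine reverberation time in seconds: RT60 = 0.161 * V / A (metric units)."""
    # Total absorption A in metric sabins: sum of area times absorption coefficient.
    total_absorption = sum(s.area_m2 * s.absorption for s in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 4 m x 3 m room with a 2.5 m ceiling.
VOLUME = 4 * 3 * 2.5
WALLS = 2 * (4 + 3) * 2.5

bare = [
    Surface("plaster walls", WALLS, 0.03),
    Surface("ceiling", 4 * 3, 0.03),
    Surface("hard floor", 4 * 3, 0.03),
]

# The sofa and curtains are simply added on top of the wall area they cover,
# which slightly overcounts but is fine for a before/after comparison.
furnished = [
    Surface("plaster walls", WALLS, 0.03),
    Surface("ceiling", 4 * 3, 0.03),
    Surface("floor with rug", 4 * 3, 0.3),
    Surface("sofa", 3.0, 0.6),
    Surface("curtains", 4.0, 0.5),
]

print(f"bare room:      {rt60_sabine(VOLUME, bare):.2f} s")
print(f"furnished room: {rt60_sabine(VOLUME, furnished):.2f} s")
```

With these assumed numbers, the bare room rings for well over two seconds. The furnished one drops to around half a second, which is the "soup versus clear voice" difference in figures.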
2. Get the mic close, then check the level
Distance is the single biggest lever you are probably not pulling. The closer the mic, the more of your voice and the less of the room it captures, which is why a cheap lav clipped to your collar often beats a fancy mic on a far stand. Aim for roughly a palm-width from your mouth, just out of frame. Then set the overall loudness near −14 LUFS for YouTube with a true peak at or below −1 dBTP. A clear voice that is also too quiet still loses, because the viewer turns it up, the noise floor comes up with it, and now they are straining. Quiet audio and unclear audio are cousins.
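Both halves of this step are simple arithmetic.

- **Moving the mic closer:** under the inverse-square law, this raises your direct voice by roughly 20 × log10(old distance / new distance) dB. The diffuse room sound reaching the mic stays roughly the same, so your voice-to-room ratio improves by about that much.
- **Normalising:** this is a plain gain change equal to the target minus your measured integrated loudness. That same gain moves your true peaks.

The sketch assumes you already have an integrated LUFS and true-peak reading from a meter. The function names, the distances and the readings are all hypothetical.

```python
import math


def direct_level_change_db(old_distance_cm: float, new_distance_cm: float) -> float:
    """Change in direct-voice level at the mic (inverse-square law); positive = louder.

    Halving the distance gives about +6 dB of voice, while the diffuse room sound
    reaching the mic stays roughly the same, so voice-to-room improves by about as much.
    """
    return 20 * math.log10(old_distance_cm / new_distance_cm)


def normalisation_gain(integrated_lufs: float, true_peak_dbtp: float,
                       target_lufs: float = -14.0, ceiling_dbtp: float = -1.0):
    """Gain needed to hit the loudness target, and whether peaks would then overshoot.

    Returns (gain_db, peak_after_dbtp, needs_limiter). A plain gain change shifts
    every level, including the true peak, by the same number of dB.
    """
    gain_db = target_lufs - integrated_lufs
    peak_after = true_peak_dbtp + gain_db
    return gain_db, peak_after, peak_after > ceiling_dbtp


# Hypothetical: mic moved from about an arm's length (60 cm) to about a palm-width (10 cm).
print(f"voice gain from moving closer: {direct_level_change_db(60, 10):+.1f} dB")

# Hypothetical meter readings for a quiet mix: -20 LUFS integrated, -4 dBTP true peak.
gain, peak, limit = normalisation_gain(-20.0, -4.0)
print(f"apply {gain:+.1f} dB -> peaks at {peak:+.1f} dBTP, limiter needed: {limit}")
```

If the peak after gain lands above −1 dBTP, gain alone cannot get you there cleanly. You need a true-peak limiter on the output, or less gain.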
3. Let the voice win the fight with the music
If you can hum the backing track but keep missing words, the music is masking the voice. Our ears are bad at separating two things in the same frequency range at the same volume, and most music lives right where speech consonants live. Two moves fix it. First, pull the music down 4 to 5 dB as a baseline, because it is almost always too loud to the person who chose it. Second, duck it: drop the music another few dB automatically whenever you speak, and bring it back in the gaps. This is the most common single tell I flag, and it is covered in detail in why your music is louder than your voice.
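Under the hood, ducking is just a gain envelope on the music that follows a voice-activity signal. Here is a minimal sketch. It assumes you already have one True/False flag per short frame saying whether you are talking, from a voice-activity detector or your own markers. Several values are illustrative choices, not recommendations from any particular editor:

- the 10 ms frame size;
- the −4.5 dB baseline (the middle of the 4 to 5 dB range);
- the extra −6 dB duck depth;
- the attack and release lengths.

```python
def db_to_linear(db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def duck_envelope(voice_active: list[bool], baseline_db: float = -4.5,
                  duck_db: float = -6.0, attack_frames: int = 2,
                  release_frames: int = 8) -> list[float]:
    """Per-frame music gain in dB.

    The music sits at baseline_db all the time and drops by a further |duck_db|
    while voice_active is True. Gain ramps linearly: down over attack_frames,
    back up over release_frames, so the music dips quickly and recovers gently.
    """
    depth = abs(duck_db)
    down_step = depth / max(attack_frames, 1)
    up_step = depth / max(release_frames, 1)
    current = baseline_db
    gains = []
    for active in voice_active:
        target = baseline_db - depth if active else baseline_db
        if current > target:
            current = max(target, current - down_step)
        elif current < target:
            current = min(target, current + up_step)
        gains.append(current)
    return gains


# Hypothetical voice-activity flags, one per 10 ms frame: silence, a phrase, silence.
flags = [False] * 3 + [True] * 5 + [False] * 10
envelope = duck_envelope(flags)
print([round(g, 2) for g in envelope])

# To apply it, multiply each music sample in frame i by db_to_linear(envelope[i]).
```

Fast attack and slow release is the usual shape. The music gets out of the way as the first word starts, then creeps back in the gap rather than jumping.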
4. Slow down and let the words land
When you are nervous on camera, you speed up, and speed eats consonants. The ends of words get clipped, "going to" becomes "gonna" becomes "gnna," and the listener spends the next sentence reconstructing the last one. Most clear delivery sits around 140 to 160 words per minute, with real pauses where the commas are. The pauses matter as much as the speed, because they give the ear a beat to catch up. Pile filler words on top of a fast pace and you pay a double clarity tax. More on this in how fast you should talk.
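If your editor or captioning tool gives you word-level timestamps, you can check your pace instead of guessing. This sketch assumes a hypothetical list of (word, start, end) entries in seconds. It treats any gap of 0.3 s or more as a pause. That threshold is an assumption, so tune it to taste.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


def pace_report(words: list[Word], min_pause_s: float = 0.3,
                low_wpm: float = 140, high_wpm: float = 160) -> dict:
    """Words per minute over the spoken span, plus a count of pauses."""
    if len(words) < 2:
        raise ValueError("need at least two timed words")
    minutes = (words[-1].end - words[0].start) / 60
    wpm = len(words) / minutes
    # A pause is any gap between one word's end and the next word's start
    # that is at least min_pause_s long (the threshold is an assumption).
    gaps = [b.start - a.end for a, b in zip(words, words[1:])]
    pauses = sum(1 for g in gaps if g >= min_pause_s)
    return {"wpm": round(wpm), "in_range": low_wpm <= wpm <= high_wpm, "pauses": pauses}


# Hypothetical timestamps: 30 evenly spaced words, 0.3 s each with 0.1 s gaps,
# plus one longer 0.5 s breath after the fifteenth word.
words = []
t = 0.0
for i in range(30):
    words.append(Word(f"w{i}", t, t + 0.3))
    t += 0.4 + (0.4 if i == 14 else 0.0)

print(pace_report(words))
```

This measures rate over the whole spoken span, pauses included, which is how the 140 to 160 wpm range is usually meant. Short 0.1 s gaps between words are not counted as pauses.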
5. Shape the tone so consonants cut through
Once it is close, dry, loud and well-paced, a light EQ pass is the finishing touch. Intelligibility lives in the consonants, and the consonants live up high, roughly 2 to 6 kHz. A gentle cut in the low-mids (200 to 400 Hz) removes the boxy mud, and a small presence lift around 3 to 5 kHz pushes the "t," "s" and "k" sounds forward so words separate. Go easy: too much top end turns into harshness and sibilance. If you still hear hiss or hum under everything, that is a separate job, and cleaning up background noise comes before EQ, not after.
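A peaking (bell) filter is the standard tool for both EQ moves. Below is a sketch using the peaking-EQ biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The settings are illustrative picks inside the ranges above, not a preset to copy blind:

- −3 dB at 300 Hz;
- +2 dB at 4 kHz;
- a Q of 1;
- a 48 kHz sample rate.

```python
import cmath
import math


def peaking_eq(fs: float, f0: float, gain_db: float, q: float):
    """Peaking (bell) EQ biquad per the RBJ Audio EQ Cookbook.

    Returns (b, a) normalised so that a[0] == 1.
    """
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


def biquad(samples, b, a):
    """Filter a list of samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


FS = 48_000
mud_cut = peaking_eq(FS, 300, -3.0, 1.0)    # gentle low-mid cut
presence = peaking_eq(FS, 4_000, 2.0, 1.0)  # small presence lift

for label, (b, a) in [("mud cut", mud_cut), ("presence", presence)]:
    print(label, {f: round(response_db(b, a, FS, f), 2) for f in (100, 300, 1_000, 4_000, 10_000)})

# To process audio, run the voice through both filters in series:
# voice_eq = biquad(biquad(voice, *mud_cut), *presence)
```

A bell filter only touches the band around its centre frequency and returns to 0 dB well away from it. That is why small, separate moves like these stay gentle.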
Here is a real CutScore report on an everyday talking-head clip: the reverb, the masking, the loudness and the pace, each scored with timestamps and the exact fix.
If you only fix three things.
Most of the jump from "what did they say?" to "crystal clear" comes from these three. Do them first, in this order.
By ear, by meter, or in one pass.
By ear, on bad speakers
Free and surprisingly good. Play it on phone speakers, in a noisy room, and turn the volume to normal, not editor-loud. If you miss a word, your viewer will miss two. The catch: you know the script, so your brain cheats. Get someone who has not heard it to listen instead.
With a meter and an EQ
Accurate, if you know what you are doing. A loudness meter confirms −14 LUFS, a spectrum analyser or spectrogram shows low-mid mud and lingering reverb tails, and an EQ lets you carve. The cost is time and a learning curve. You have to read the tools correctly for every video, and most people would rather be making the next one.
With a coach in one pass
Hand the file or a link to CutScore. It measures loudness, finds reverb and music masking, checks your pace, and returns a 0 to 100 score with timestamps and the exact fix for each. No meters to read. See a sample report.
Find out why before you publish.
CutScore listens to your audio and tells you exactly why the words are not landing, with the timestamp and the fix. Join the waitlist for early access.