VOICE CLARITY BLOG / 9 MIN READ

Why is my speech hard to understand in my video?

Muffled, distant, mumbled or buried under the music: unclear speech almost always traces to a short list of fixable causes. Here is how to find which one is hurting you, and exactly what to change.

5 causes to rule out
−14 LUFS loudness target
140–160 words per minute
0–100 craft score

By Thomas, founder of CutScore · Updated June 2026

SPEECH CLARITY CHECK · talking_head.mp4
A presenter speaking to camera, where clear speech depends far more on mic distance, the room and the mix than on an expensive microphone.
CRAFT SCORE
FIXES ADVISED
why the words are not landing
Room reverb on voice · washy tail · 00:12
Music masking speech · pull −5 dB · 01:24
Loudness on target · −14 LUFS
The 30-second answer

Your speech is hard to understand for one of five reasons, and almost none of them are your microphone. The room is too echoey, so words arrive with a washy tail. The mic is too far away, so the voice sounds distant and thin. The loudness is too low, so listeners crank the volume and still miss words. The music is sitting on top of the voice. Or you are talking too fast and swallowing the start and end of words. Fix them in that order: room, distance, loudness, music, pace. If you would rather have the cause pointed out for you, that is exactly what CutScore does in one pass.
THE PART NOBODY WARNS YOU ABOUT

I have shipped videos where I could understand every word and not one viewer could. That is the trap. You wrote the script, you know what comes next, so your brain fills in the mumbled bits automatically. The viewer has no script. They get the audio cold, on a phone speaker, on a train, and a word that is 80 percent clear to you is a coin flip to them.

There is also a comforting lie we tell ourselves: that clarity is a hardware problem. Buy a better mic and the mumbling stops. It does not. A 30 dollar mic held a palm-width from your mouth, in a room with a rug and a sofa, will out-clarify a studio mic sitting across an empty kitchen. Most "bad mic" problems are really distance problems, room problems, or delivery problems wearing a mic costume.

So before you spend a cent, rule out the five usual suspects. Clarity is a chain: your mouth, the air in the room, the mic, the mix, and the listener's speaker. The weakest link decides everything. Here is how to test each one.

THE FIVE CAUSES

Why your speech is hard to understand.

Run down this table top to bottom. The first row you fail is usually the one doing the most damage, so fix it before you touch the next.

| Cause | How it sounds | The fix |
| --- | --- | --- |
| Room reverb | Distant and washy, a hollow tail on every word | Move closer, add soft surfaces (rug, sofa, curtains, blankets). |
| Mic too far | Thin, weak, the room as loud as you are | Get the mic to roughly a palm-width from your mouth. |
| Loudness too low | You keep reaching for the volume knob | Normalise the mix toward −14 LUFS, peaks ≤ −1 dBTP. |
| Music on the voice | You hear the track fine, the words less so | Pull music down 4 to 5 dB, duck it further under speech. |
| Talking too fast | Word endings clipped, no time to follow | Slow to 140–160 wpm, pause at the commas. |

The sixth, sneaky one: muddy EQ. Even close, dry and loud, a voice can sound boxy if there is too much low-mid build-up. A gentle cut around 200 to 400 Hz and a small lift around 3 to 5 kHz brings the consonants forward, which is where intelligibility actually lives.
STOP GUESSING WHICH ONE IT IS

CutScore listens to the actual file, finds the reverb, the masking and the low loudness, and tells you which one to fix first, with the timestamp.

Join the waitlist
HOW TO ACTUALLY FIX EACH ONE

Five fixes, in the order that pays off.

1. Kill the room before you touch the mic

Reverb is the clarity killer almost nobody hears in themselves, because your ear filters out the room you live in. Record ten seconds and play it back on headphones. If every word has a faint hollow tail, like you are talking in a small hall, the room is the problem. You cannot remove reverb cleanly in the edit, so you stop it at the source: get closer to the mic, and break up the hard parallel walls. A rug, a sofa, a bookshelf, even a duvet pinned behind the camera will soak up the slap. Soft room, clear voice. Empty room, soup.

2. Get the mic close, then check the level

Distance is the single biggest lever you are probably not pulling. The closer the mic, the more of your voice and the less of the room it captures, which is why a cheap lav clipped to your collar often beats a fancy mic on a far stand. Aim for roughly a palm-width from your mouth, just out of frame. Then set the overall loudness near −14 LUFS for YouTube with a true peak at or below −1 dBTP. A clear voice that is also too quiet still loses, because the viewer turns it up, the noise floor comes up with it, and now they are straining. Quiet audio and unclear audio are cousins.
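The loudness step is simple arithmetic once a meter has given you two numbers. The sketch below assumes you already have an integrated loudness reading and a true-peak reading from whatever meter you use. The function name and the example readings are hypothetical, and it models a plain static gain only, with no limiter.

```python
def gain_to_target(measured_lufs, measured_peak_dbtp,
                   target_lufs=-14.0, peak_ceiling_dbtp=-1.0):
    """Return (gain_db, peak_limited) for a plain static gain change.

    A static gain moves integrated loudness and true peak by the same
    number of dB, so the usable gain is whichever is smaller: the gain
    needed to reach the loudness target, or the headroom left under
    the true-peak ceiling.
    """
    wanted_db = target_lufs - measured_lufs
    headroom_db = peak_ceiling_dbtp - measured_peak_dbtp
    gain_db = min(wanted_db, headroom_db)
    # peak_limited is True when the ceiling stopped us short of the target
    return gain_db, gain_db < wanted_db


# Hypothetical meter readings, for illustration only
gain_db, peak_limited = gain_to_target(-23.0, -6.0)
print(f"apply {gain_db:+.1f} dB, stopped by peak ceiling: {peak_limited}")
```

With those example readings the clip needs 9 dB to reach −14 LUFS but has only 5 dB of headroom under −1 dBTP, so static gain alone lands it at −18 LUFS. Closing the rest of the gap takes a limiter or some compression, which this sketch does not attempt.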

A presenter speaking into a handheld microphone held close to the mouth, the simplest fix for distant, unclear speech that no amount of post-processing can match.
Distance does more than gear: a palm-width from the mouth beats an expensive mic across the room. Photo: Henri Mathieu-Saint-Laurent / Pexels.

3. Let the voice win the fight with the music

If you can hum the backing track but keep missing words, the music is masking the voice. Our ears are bad at separating two things in the same frequency range at the same volume, and most music lives right where speech consonants live. Two moves fix it. First, pull the music down 4 to 5 dB as a baseline, because it is almost always too loud to the person who chose it. Second, duck it: drop the music another few dB automatically whenever you speak, and bring it back in the gaps. This is the most common single tell I flag, and it is covered in detail in "Why your music is louder than your voice."
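To make the two moves concrete, here is a rough sketch of the gain a music track would get at any moment. The speech spans, the function names and the 4 dB duck depth are assumptions for illustration (the text above only says "a few dB"). A real ducker also fades in and out with attack and release times rather than switching instantly.

```python
def music_gain_db(t, speech_spans, baseline_db=-5.0, duck_db=-4.0):
    """Gain in dB applied to the music at time t (seconds).

    baseline_db is the always-on pull-down; duck_db is the extra drop
    applied only while someone is speaking. speech_spans is a list of
    (start, end) times in seconds.
    """
    speaking = any(start <= t < end for start, end in speech_spans)
    return baseline_db + (duck_db if speaking else 0.0)


def db_to_linear(db):
    """Convert a dB gain to the amplitude multiplier an editor applies."""
    return 10 ** (db / 20)


# Hypothetical clip: speech from 1.0 s to 3.0 s
spans = [(1.0, 3.0)]
for t in (0.5, 2.0, 3.5):
    g = music_gain_db(t, spans)
    print(f"t={t:.1f}s  music at {g:+.1f} dB  (x{db_to_linear(g):.2f})")
```

The point of the conversion helper is scale: 9 dB down is roughly a third of the original amplitude, which is why a "small" pull on the music fader makes such a difference to the voice.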

4. Slow down and let the words land

When you are nervous on camera, you speed up, and speed eats consonants. The ends of words get clipped, "going to" becomes "gonna" becomes "gnna," and the listener spends the next sentence reconstructing the last one. Most clear delivery sits around 140 to 160 words per minute, with actual pauses where the commas are. The pauses matter as much as the speed, because they give the ear a beat to catch up. Pile filler words on top of a fast pace and you have a double clarity tax. More on this in "How fast you should talk."
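If you have a script or transcript and know how long the take ran, checking your pace is one division. This is a minimal sketch, assuming words are separated by whitespace. The function names are made up for the example, and the 140 to 160 band is the one quoted above.

```python
def words_per_minute(transcript, duration_seconds):
    """Average speaking rate over the whole take, pauses included."""
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def pace_verdict(wpm, low=140.0, high=160.0):
    """Compare a rate against the 140-160 wpm band from the article."""
    if wpm > high:
        return "too fast"
    if wpm < low:
        return "slower than the target band"
    return "on target"


# Hypothetical take: a 95-word script read in 30 seconds
script = " ".join(["word"] * 95)
rate = words_per_minute(script, 30.0)
print(f"{rate:.0f} wpm: {pace_verdict(rate)}")
```

Because this is an average over the whole take, it hides bursts. A clip can average 150 wpm and still have one rushed sentence, so it is worth running the same count on any passage that sounds hurried.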

5. Shape the tone so consonants cut through

Once it is close, dry, loud and well-paced, a light EQ pass is the finishing touch. Intelligibility lives in the consonants, and the consonants live up high, roughly 2 to 6 kHz. A gentle cut in the low-mids (200 to 400 Hz) removes the boxy mud, and a small presence lift around 3 to 5 kHz pushes the "t," "s" and "k" sounds forward so words separate. Go easy: too much top end turns into harshness and sibilance. If you still hear hiss or hum under everything, that is a separate job, and cleaning up background noise comes before EQ, not after.
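For the curious, the two EQ moves can be sketched as a pair of peaking filters. The coefficients below follow the widely used Audio EQ Cookbook peaking-filter formulas. The specific gains (−3 dB and +2 dB), the Q of 1.0 and the 48 kHz sample rate are my assumptions for illustration, not settings taken from any particular tool.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs):
    """Biquad coefficients (b, a) for a peaking EQ band.

    f0 is the centre frequency in Hz, gain_db the boost (+) or cut (-)
    at f0, q the bandwidth control, fs the sample rate in Hz.
    """
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = (1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp)
    return b, a


def response_db(b, a, freq, fs):
    """Magnitude of the filter's frequency response at freq, in dB."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))


FS = 48000  # assumed sample rate
mud_cut = peaking_eq(300, -3.0, 1.0, FS)    # gentle low-mid cut
presence = peaking_eq(4000, 2.0, 1.0, FS)   # small presence lift

for freq in (100, 300, 1000, 4000, 10000):
    total = response_db(*mud_cut, freq, FS) + response_db(*presence, freq, FS)
    print(f"{freq:>5} Hz: {total:+.2f} dB")
```

Printing the combined curve is a sanity check before you trust your ears. If any band moves by much more than a few dB, the EQ has stopped being a finishing touch.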

RATHER SEE IT THAN READ IT?

Here is a real CutScore report on an everyday talking-head clip: the reverb, the masking, the loudness and the pace, each scored with timestamps and the exact fix.

See a sample report
SHORT ON TIME

If you only fix three things.

Most of the jump from "what did they say?" to "crystal clear" comes from these three. Do them first, in this order.

1 · FREE FIX · RECORDING
Get the mic close and soften the room
This one move fixes the two biggest causes at once: distance and reverb. Bring the mic to a palm-width from your mouth and record where there are soft surfaces, not bare walls. A close, dry voice is already 80 percent of the way to clear before you open an editor.
How: Record ten seconds, listen on headphones. Hollow tail means move closer and add soft stuff.
2 · EDIT · MIX
Make the voice win against the music
If a backing track is playing, pull it down 4 to 5 dB and duck it under your speech. Music and voice fight for the same frequencies, and the music almost always wins by default because the person who chose it already loves it. The voice has to be the loudest thing in the room.
How: Listen on phone speakers. If the words blur into the track, the music is still too loud.
3 · DELIVERY · PACE
Slow down to about 140 to 160 words a minute
Speed clips your consonants and leaves the listener no time to assemble each sentence. Slowing slightly and leaving real pauses at the commas does more for clarity than any plugin, and it is free. Read your script out loud once before you hit record so the rushed bits surface.
How: Count roughly 2 to 3 words per second. If you are past that, you are racing the viewer.
THREE WAYS TO DIAGNOSE IT

By ear, by meter, or in one pass.

OPTION 01

By ear, on bad speakers

Free and surprisingly good. Play it on phone speakers, in a noisy room, and turn the volume to normal, not editor-loud. If you miss a word, your viewer will miss two. The catch: you know the script, so your brain cheats. Get someone who has not heard it to listen instead.

OPTION 02

With a meter and an EQ

Accurate, if you know what you are doing. A loudness meter confirms −14 LUFS, a spectrum analyser shows reverb and mud, and an EQ lets you carve. The cost is time and a learning curve. You have to read the tools correctly for every video, and most people would rather be making the next one.

OPTION 03

With a coach in one pass

Hand the file or a link to CutScore. It measures loudness, finds reverb and music masking, checks your pace, and returns a 0 to 100 score with timestamps and the exact fix for each. No meters to read. See a sample report.

How CutScore catches unclear speech for you

CutScore is an AI video quality coach for pre-publish QC. It measures the audio deterministically (loudness with an EBU R128 meter, true peak, the music-to-voice balance, signs of reverb and noise) and uses AI for the genuinely subjective calls, like whether your pace is fighting the viewer. You get one score, the evidence behind it, and a prioritised list of fixes, so you know whether it is the room, the mix or your delivery before you publish. It judges the craft of the video itself. More on the method and the standards.
QUESTIONS

Frequently asked.

Why is my speech hard to understand in my video?
Usually one of five things: the room is too echoey, the mic is too far from your mouth, the overall loudness is too low, the music is sitting on top of your voice, or you are talking too fast and swallowing word endings. Clarity is rarely about an expensive microphone. It is about distance, the room, the mix, and your delivery, and all four are fixable.

How do I make my speech clearer?
Get the mic closer (a palm-width from your mouth), record in a softer room with fewer hard walls, then in the edit cut a little low rumble, add a touch of presence around 3 to 5 kHz, and set your loudness near −14 LUFS with peaks under −1 dBTP. If music is fighting the voice, pull it down four or five decibels. Do those in order and most clarity problems disappear.

Is it my microphone or my room?
More often the room than the mic. A cheap mic up close in a soft room beats an expensive mic across an empty, hard-walled room. If your speech sounds distant and washy with a hollow tail on every word, that is reverb, and no microphone fixes it. Move closer, add soft surfaces, and the same mic suddenly sounds clear.

Does talking too fast make speech harder to understand?
Yes. When you rush, you clip the start and end of words, and the listener has no time to assemble the sentence. Most clear on-camera delivery sits around 140 to 160 words per minute with real pauses at the commas. Slowing down slightly and leaving gaps does more for clarity than any plugin, and it costs nothing.
EARLY ACCESS

Find out why before you publish.

CutScore listens to your audio and tells you exactly why the words are not landing, with the timestamp and the fix. Join the waitlist for early access.
