VOICE LEVELS BLOG / 8 MIN READ

What is a good audio level for voice in a video?

Your viewers forgive a soft shot. They do not forgive a voice they have to strain to hear. Here are the real numbers for voice level, the targets I actually use, and how to hit them without owning a single plugin you cannot pronounce.

−14 LUFS integrated target
−1 dBTP true peak ceiling
4 to 6 dB voice over music
0 to 100 craft score

By Thomas, founder of CutScore · Updated June 2026

Sample report: VOICE LEVEL CHECK · interview_cut.mp4
Image: An audio engineer leaning over a mixing console with a pair of monitoring headphones, the moment where a video's voice level gets set before it is exported.
Craft score: fixes advised
How the voice actually measures:
Integrated loudness on target · −14 LUFS
True peak too hot · −0.2 dBTP, pull to −1 (02:14)
Music masking voice · duck 5 dB under speech (00:48)

The 30-second answer

A good audio level for voice in a video means the whole file is normalised to an integrated loudness near −14 LUFS, with your spoken voice sitting in the −16 to −12 LUFS short-term range while you talk. Keep the true peak at or below −1 dBTP so nothing crackles after the platform re-encodes your file, and keep the voice roughly 4 to 6 dB above the music whenever someone is speaking. On a peak meter, that usually puts your dialogue peaks around −12 to −6 dBFS, never touching zero. Hit those four numbers and your voice sounds present and clear on cheap phone speakers and good headphones alike. If checking them by hand sounds tedious, that is the exact job CutScore does in one pass.

WHY THIS IS CONFUSING

There is a reason "what is a good audio level for voice" has no single, clean answer floating around. The number you grew up with, the one your editor shows on its meter, is a peak reading in dBFS. It tells you how close your loudest spike is to zero, which is the point where digital audio clips and distorts. Useful, but it says almost nothing about how loud your voice actually feels.

Two voices can peak at the exact same level and feel wildly different in loudness. One is a calm, steady narration that sits at a consistent level. The other is a whispered intro with three shouted words that spike to the top. Same peak, very different experience. That gap is why platforms stopped caring about peaks and started measuring loudness instead, in a unit called LUFS, which roughly tracks how loud a human actually perceives the sound over time.

So the honest version of the answer needs both. A peak target so your voice never clips, and a loudness target so it sits at the right perceived level next to every other video in the feed. I have shipped videos that nailed one and ignored the other, and they sounded amateur either way. The fix is knowing which number does which job.
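To put numbers on that gap, here is a minimal Python sketch. It uses plain RMS as a rough stand-in for loudness, which is my simplification for illustration: a real LUFS meter adds frequency weighting and gating on top. Both signals are made up, and both peak at the same level.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS: the loudest single sample relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """Average (RMS) level in dB relative to full scale: a crude loudness proxy, not LUFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

# Hypothetical test signals: one second each at 8 kHz, both peaking at 0.5.
RATE = 8000
# A steady tone standing in for calm narration.
steady = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]
# A quiet tone with one short full-level burst, standing in for a whisper plus a shout.
spiky = [(0.5 if 4000 <= n < 4080 else 0.05) * math.sin(2 * math.pi * 200 * n / RATE)
         for n in range(RATE)]

print(f"steady: peak {peak_dbfs(steady):.1f} dBFS, average {rms_dbfs(steady):.1f} dB")
print(f"spiky:  peak {peak_dbfs(spiky):.1f} dBFS, average {rms_dbfs(spiky):.1f} dB")
```

The two peaks come out identical while the averages sit well over 10 dB apart, which is the difference a peak meter cannot show you.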

THE NUMBERS

The voice level targets I actually use.

Four numbers carry almost all the weight. Hit these and your voice will sound clear, present and roughly as loud as everything else people are watching.

What to measure | Target for voice | Why it matters
Integrated loudness | ≈ −14 LUFS | The perceived loudness of the whole video, matched to what YouTube normalises toward.
Short-term voice level | −16 to −12 LUFS | Where your dialogue should sit while you are actually speaking, moment to moment.
True peak | ≤ −1 dBTP | The ceiling that stops the voice crackling after the platform re-encodes the file.
Voice over music | +4 to +6 dB | How far the voice should sit above the background track while someone speaks.
Dialogue peaks (dBFS) | −12 to −6 dBFS | A rough peak-meter range that leaves headroom and keeps you off the zero line.

One caveat on the platform

The −14 LUFS target is the YouTube and Spotify ballpark. Short-form is often mixed a touch louder, so TikTok and Reels mixes frequently land nearer −13 to −10 LUFS, where the feed feels punchier. The voice-over-music gap and the −1 dBTP ceiling stay the same everywhere.
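If you already have readings from a meter, the targets above can be checked in a few lines. This is a minimal sketch, not how CutScore works internally: the `VoiceTargets` and `check_levels` names are hypothetical, and the 1 LU tolerance is my assumption rather than a figure from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceTargets:
    integrated_lufs: float = -14.0        # whole-file loudness target
    tolerance_lu: float = 1.0             # ASSUMED tolerance, not from the article
    true_peak_ceiling_dbtp: float = -1.0  # true-peak ceiling
    voice_over_music_min_db: float = 4.0  # lower end of the 4 to 6 dB gap

def check_levels(integrated_lufs, true_peak_dbtp, voice_over_music_db,
                 targets=VoiceTargets()):
    """Return a list of plain-English fixes; an empty list means every target is met."""
    fixes = []
    offset = targets.integrated_lufs - integrated_lufs
    if abs(offset) > targets.tolerance_lu:
        fixes.append(f"change master gain by {offset:+.1f} dB")
    if true_peak_dbtp > targets.true_peak_ceiling_dbtp:
        fixes.append(f"true peak too hot, limit to {targets.true_peak_ceiling_dbtp:g} dBTP")
    if voice_over_music_db < targets.voice_over_music_min_db:
        duck = targets.voice_over_music_min_db - voice_over_music_db
        fixes.append(f"duck music by at least {duck:.1f} dB under speech")
    return fixes

# Hypothetical readings: loudness on target, peak at -0.2 dBTP, voice only 1 dB over music.
for fix in check_levels(-14.0, -0.2, 1.0):
    print(fix)
```

For a short-form mix you would pass a different `VoiceTargets(integrated_lufs=...)`; the peak ceiling and the music gap stay as they are.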
SKIP THE METER-READING

Reading loudness, true peak and the voice-to-music balance by hand on every video is a chore. CutScore measures all three in one pass and tells you the exact gain change to make.

HOW TO ACTUALLY HIT THEM

How do I get my voice to the right level?

1. Record with headroom, not at full tilt

Set your recording level so your normal speaking peaks land around −12 to −6 dBFS, with your loudest moments staying off zero. People hear that range and panic that it looks quiet on the meter. It is meant to. You want room above the voice for the louder moments, because once a recording clips at zero, the distortion is baked in and no plugin can rescue it. Quiet-but-clean beats loud-but-clipped every single time. You can always raise a clean recording later. You cannot un-clip one.
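If your recorder shows raw sample values rather than dBFS, the conversion is one logarithm. A minimal sketch, assuming samples are normalised so that 1.0 is full scale; the function names are mine, for illustration only.

```python
import math

def amplitude_to_dbfs(amplitude):
    """Convert a linear sample amplitude (1.0 = full scale) to dBFS."""
    if amplitude == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(abs(amplitude))

def speech_peak_in_range(peak_amplitude, low_db=-12.0, high_db=-6.0):
    """True if a speech peak lands inside the -12 to -6 dBFS recording range."""
    return low_db <= amplitude_to_dbfs(peak_amplitude) <= high_db

# A peak at 0.35 of full scale sits at roughly -9 dBFS, inside the range.
print(round(amplitude_to_dbfs(0.35), 1), speech_peak_in_range(0.35))
```

The range looks small on a meter because decibels are logarithmic: the whole −12 to −6 dBFS window is only the span between a quarter and a half of full scale.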

2. Even out the dynamics with light compression

A raw voice swings a lot. You lean in, you lean back, you get excited, you trail off. Compression gently squeezes that range so the quiet words come up and the loud ones stay in check, which makes the whole thing sit at a steadier, more present level. Use a modest amount: you are smoothing the voice, not crushing it into a flat wall. Done well, the listener never notices it, they just notice they can hear every word without reaching for the volume.
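Under the hood, "a modest amount" comes down to a threshold and a ratio. The sketch below shows only the static curve of a compressor; the −18 dB threshold and 3:1 ratio are illustrative assumptions, not settings from this article, and a real compressor also has attack and release times that this ignores.

```python
def compressed_level_db(input_db, threshold_db=-18.0, ratio=3.0):
    """Static compressor curve: above the threshold, output rises 1 dB per `ratio` dB of input."""
    if input_db <= threshold_db:
        return input_db  # quiet material passes through untouched
    return threshold_db + (input_db - threshold_db) / ratio

# A hypothetical take that swings between -24 dB (trailing off) and -6 dB (excited).
quiet, loud = -24.0, -6.0
before = loud - quiet
after = compressed_level_db(loud) - compressed_level_db(quiet)
print(f"swing before: {before:.0f} dB, after: {after:.0f} dB")
```

The swing shrinks from 18 dB to 10 dB. The loud words came down, so you then raise the whole voice with make-up gain, and that is what brings the quiet words up.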

Two people working over a mixing board crowded with faders and level meters, setting the voice level on a recording before it is mixed against music.
Even out the dynamics first, then normalise the whole thing toward −14 LUFS. Photo: cottonbro studio / Pexels.

3. Normalise the finished mix toward −14 LUFS

Once the voice is recorded clean and lightly compressed, the last step is loudness. Run a loudness meter over the finished mix and nudge the master gain until the integrated reading sits near −14 LUFS for YouTube. This is the number that decides whether your video feels confident or timid next to the one that autoplays after it. If you upload at, say, −20 LUFS, do not count on the platform to fix it: YouTube's normalisation mainly turns loud uploads down rather than quiet ones up, so the video can simply play quieter than its neighbours. And wherever the extra gain does get added, by you or by a platform that boosts, the noise floor comes up with the voice, which is why a quiet, clean mix survives the boost and a quiet, noisy one does not.
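The "nudge" is simple arithmetic: the gain change in dB is the target minus the reading. A minimal sketch with hypothetical helper names, assuming you already have an integrated LUFS reading from a meter:

```python
def normalise_gain_db(measured_lufs, target_lufs=-14.0):
    """Gain change in dB that moves an integrated reading onto the target."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db):
    """Convert a gain in dB to the linear factor applied to each sample."""
    return 10 ** (gain_db / 20)

# A mix that measures -20 LUFS needs +6 dB, roughly a doubling of amplitude.
gain = normalise_gain_db(-20.0)
print(f"{gain:+.1f} dB, linear factor {db_to_linear(gain):.3f}")
```

The same gain lifts everything in the file, peaks and noise included, so check the true peak again after applying it.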

4. Set a true-peak ceiling at −1 dBTP

Before you export, put a limiter on the master with its ceiling at −1 dBTP. Here is why that extra decibel matters: when the platform compresses your audio into AAC, the encoding process can push peaks slightly higher than they were in your file. A track that peaked at exactly 0 in your edit can end up over the line after upload, and that is where the crackle comes from. Leaving a decibel of true-peak headroom keeps the voice clean through the squashing. It costs you nothing audible and saves you a re-upload.
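To see how much work the limiter has to do, add any gain you applied to the measured true peak and compare the result with the ceiling. A minimal sketch; `limiter_reduction_db` is a hypothetical helper, and it only estimates the reduction needed on the loudest peak rather than modelling a real limiter.

```python
def limiter_reduction_db(true_peak_dbtp, gain_db=0.0, ceiling_dbtp=-1.0):
    """dB a limiter must shave off the loudest peak once `gain_db` has been applied."""
    peak_after_gain = true_peak_dbtp + gain_db
    return max(0.0, peak_after_gain - ceiling_dbtp)

# A peak at -0.2 dBTP with no extra gain is 0.8 dB over a -1 dBTP ceiling.
print(round(limiter_reduction_db(-0.2), 1))
# A quiet mix peaking at -5 dBTP, raised by 6 dB, would overshoot the ceiling by 2 dB.
print(round(limiter_reduction_db(-5.0, gain_db=6.0), 1))
```

If the number is large, the limiter will be audibly squashing the voice, which is a hint to go back and even out the dynamics with compression first.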

5. Duck the music under the voice

The voice level can be perfect and still get buried. Whenever someone is speaking, the music should drop so the voice sits roughly 4 to 6 dB on top of it. The cleanest way is sidechain ducking, where the track automatically dips the instant the voice comes in and rises again in the gaps. No sidechain plugin? Just automate the music volume down under every line by hand. Music drowning the voice is the most common amateur tell in the whole feed, and it is entirely a mixing decision, not a gear problem.
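If you are automating by hand, the amount to pull the music down follows from two readings. A minimal sketch, assuming you have measured the voice and the music separately in dB; the helper names and the 5 dB default gap are illustrative, and a real sidechain ducker also smooths the dips with attack and release times.

```python
def duck_amount_db(voice_db, music_db, gap_db=5.0):
    """dB to lower the music so the voice sits gap_db above it (0.0 if it already does)."""
    return max(0.0, music_db - (voice_db - gap_db))

def music_gain_automation(speech_active, duck_db):
    """Per-block music gain in dB: pulled down by duck_db under speech, 0 dB in the gaps."""
    return [-duck_db if speaking else 0.0 for speaking in speech_active]

# Hypothetical readings: voice at -16 dB, music at -18 dB, so the voice leads by only 2 dB.
duck = duck_amount_db(voice_db=-16.0, music_db=-18.0)
# One flag per block of the timeline: is someone speaking in it?
automation = music_gain_automation([False, True, True, False], duck)
print(duck, automation)
```

Here the music needs to come down 3 dB under the two spoken blocks and can return to full level in the gaps.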

RATHER SEE IT THAN READ IT?

Here is a real CutScore coaching report for an everyday talking-head video: loudness, true peak and the voice-to-music balance, scored, with timestamps and the exact fixes.

SHORT ON TIME

If you only fix three things.

Most of the perceived jump from "homemade audio" to "this person knows what they are doing" comes from these three. Fix them first.

1. Normalise the mix to about −14 LUFS (2-MIN FIX · AUDIO)

This is the number that decides whether your voice feels confident or timid against the next video. Run a loudness meter over the export and nudge the master until the integrated reading lands near −14 LUFS, with the true peak under −1 dBTP.

How: Use a loudness meter on the finished mix, or let CutScore measure it and hand you the exact gain change.

2. Get the voice 4 to 6 dB above the music (QUICK · AUDIO)

Music burying the speech is the most common amateur tell there is. Duck the track under every line so the voice clearly sits on top. If you catch every word without effort but still hear the music, you are in the right place.

How: Sidechain the music to the voice, or automate the music volume down by hand under each line. See fixing music over voice.

3. Record with headroom, never at full tilt (SETUP · AUDIO)

Set your input so normal speech peaks around −12 to −6 dBFS and the loudest words stay off zero. A clipped recording is ruined for good, but a clean, quiet one is trivial to raise later. Headroom is free insurance.

How: Watch your input meter while you do a loud test line. If it kisses zero, turn the gain down a notch.

THREE WAYS TO CHECK YOUR VOICE LEVEL

By ear, by meter, or in one pass.

OPTION 01

By ear, on the worst speakers you own

Free, and a real test. Play the video on a single phone speaker, not your studio headphones. If you can hear every word clearly over the music, you are close. The catch is that ears adapt: after an hour in the edit, quiet starts to sound normal, so test fresh or on someone else's video.

OPTION 02

With a loudness meter

Accurate and honest. A LUFS meter reads integrated and short-term loudness, and a true-peak meter catches the −1 dBTP ceiling. The cost is time and knowledge: you have to know the targets, open the meter and read it correctly on every video. Great if you enjoy this. Most people do not.

OPTION 03

With a coach in one pass

Hand the file (or a link) to CutScore. It measures integrated loudness, true peak and the voice-to-music balance against the right target for your platform, then gives you a 0 to 100 score with timestamped evidence and the exact fixes. No meters to read. See a sample report.

How CutScore checks your voice level for you

CutScore is an AI video quality coach for pre-publish QC. It measures the audio deterministically with an EBU R128 loudness meter, so the integrated loudness, the short-term voice level, the true peak and the voice-to-music balance are real numbers, not opinions. You get one score, the evidence behind it, and a prioritised list of fixes with the exact decibel changes to make, before anyone else hears the video. It judges the craft of the audio itself, so it sits happily next to a growth tool rather than competing with one. More on the method and the standards.

QUESTIONS

Frequently asked.

What is a good audio level for voice in a video?
Aim for an integrated loudness around −14 LUFS for the whole video, with your spoken voice sitting in the −16 to −12 LUFS short-term range while you talk. Keep the true peak at or below −1 dBTP so nothing crackles after the platform re-encodes the file. That combination reads as clear, present and professional on phone speakers and headphones alike.

What dBFS should my voice peak at?
On a peak meter, aim for your voice peaks to land roughly between −12 dBFS and −6 dBFS, with the loudest moments never touching 0. But dBFS only tells you the peaks, not how loud it feels. LUFS is the better target, so normalise the finished mix to about −14 LUFS and check the true peak stays under −1 dBTP.

How loud should the voice be compared to the music?
As a starting point, keep the voice roughly 4 to 6 dB above the background music whenever someone is speaking. If you can hear the music clearly but still catch every word without effort, you are close. When the words start to blur into the track, the music is winning, so pull it down a few decibels and duck it under the voice.

Why does my voice still sound quiet when the peaks are already near zero?
Usually because the loud peaks are fine but the average level is low, so you have lots of quiet gaps and a few spikes. Raising the master only pushes the peaks into distortion. The fix is compression to even out the dynamics first, then normalise the whole thing toward −14 LUFS, which raises the perceived loudness without clipping.

EARLY ACCESS

Stop guessing whether your voice is loud enough.

CutScore measures your integrated loudness, true peak and voice-to-music balance, then tells you the exact decibel changes to make. Join the waitlist for early access.
