ON-CAMERA DELIVERY BLOG / 9 MIN READ

How do I improve my on-camera delivery?

Stiff, flat, too many ums: the things that make people click away from a talking head are almost all fixable, and most of them are habits, not talent. Here is what actually moves delivery, with targets you can hit before you publish.

140–160 words per minute
3 s to earn the view
<3 fillers per minute
0–100 craft score

By Thomas, founder of CutScore · Updated June 2026

DELIVERY CHECK · talking_head.mp4
A creator sitting in front of a camera and a softbox light, mid-sentence, the everyday talking-head setup where on-camera delivery is either earned or lost.
CRAFT SCORE
FIXES ADVISED
how your delivery reads to a stranger
Pace on target · 152 wpm
Filler words high · 7 per minute · 00:42
Energy dips mid-video · flat after 2:10 · 02:10
The 30-second answer
To improve your on-camera delivery, work on five things in order: cut filler words (aim for under three "ums" or "likes" a minute); fix your pace (most engaging talking-head delivery sits around 140 to 160 words per minute, with deliberate variation); lift your energy about one notch past what feels natural, because the camera flattens everyone; talk to the lens instead of your own face in the preview; and script your first three seconds so they give the viewer a reason to stay. Drill one at a time, record, and watch yourself back honestly. That last part is the hard bit, which is exactly the job CutScore does in one pass.
WHY IT FEELS SO HARD

Delivery is the one part of video that nobody can fix for you in post. You can pay an editor to grade your colour and balance your audio. You cannot pay anyone to give you eye contact you never gave the lens. So the thing that most separates a video that holds attention from one that loses it is the part most creators avoid looking at: how you actually came across when you pressed record.

And it is uncomfortable, because you are watching yourself. I have shipped talking-head videos that I thought were warm and confident, then rewatched them six months later and physically winced. Flat voice, dead eyes, a verbal tic I had no idea I had. The performance that felt fine in the room read as half-asleep on screen. That gap between how it feels and how it lands is the whole problem.

Here is the reassuring part. Almost none of this is about charisma you were born with. Delivery is a stack of habits, and habits have targets. Filler words have a count. Pace has a number. Energy has a baseline you can push. None of it requires a personality transplant. It requires watching yourself back, honestly, and changing one thing at a time. Let me give you the five things that move the needle.

WHAT ACTUALLY MOVES DELIVERY

The five things that fix on-camera delivery.

Ranked by how much they change the viewer's experience per minute of effort. Fix them top to bottom, not all at once.

Habit | Target to hit | What it costs you if you skip it
Filler words | < 3 / min | A dozen "ums" a minute quietly tells people you are not sure of yourself.
Speaking pace | ≈ 140–160 wpm | Too slow and the scroll wins; a flat, even pace reads as boring.
Energy level | one notch up | The camera flattens everyone, so "natural" reads as flat on screen.
Eye contact | to the lens | Looking at your own preview reads as shifty, like you are avoiding the viewer.
Pitch variation | rise and fall | A monotone flattens even good writing into a lecture nobody finishes.
Pauses | on purpose | Filling every silence with sound makes you sound nervous, not natural.
Body language | open, still-ish | Crossed arms and constant sway pull focus off what you are saying.
First 3 seconds | one reason to stay | Most of your drop-off happens right here, before you have warmed up.
The one nobody wants to hear
Rewatch yourself. Every fix on this list starts with watching your own footage back, with the sound up, pretending it is a stranger. It is the least fun part of getting better on camera and the single most effective one.
SKIP THE SELF-CRINGE

Counting your own fillers and judging your own energy is brutal. CutScore measures pace, filler rate and where your energy dips, with timestamps, so you can fix the take instead of agonising over it.

Join the waitlist
HOW TO ACTUALLY FIX EACH ONE

Five fixes, one at a time.

1. Cut the filler words first

This is the fastest visible win, so start here. Record a normal take, then play it back and count every "um," "like," "you know" and "so." Most people are shocked by the number. The fix is not willpower, it is the pause: when you feel an um coming, close your mouth and let the silence sit for half a second instead. Silence reads as confident on camera. The verbal tic reads as nervous. You can track this directly as filler words per minute, and a good target for talking-head video is under three a minute. A separate piece on how to stop saying um and like goes deeper on the drills.
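If you would rather not count by hand, the filler rate is simple arithmetic: occurrences divided by minutes. Here is a minimal sketch in Python, assuming you already have a transcript of the take and its length in seconds. The function name and the filler list are my own illustration (the list is just the four words named above), not how CutScore counts.

```python
import re

# Assumed filler list: the four named in this article. Swap in your own tics.
FILLERS = ("you know", "um", "like", "so")


def fillers_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return filler occurrences per minute of the take."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    text = transcript.lower()
    count = 0
    for filler in FILLERS:
        # \b stops "um" matching inside "summer" and "so" inside "some".
        count += len(re.findall(rf"\b{re.escape(filler)}\b", text))
    return count * 60 / duration_seconds


take = "Um, so today I want to, like, talk about, you know, summer lighting."
print(fillers_per_minute(take, 30))  # 4 fillers in 30 seconds
```

One caveat: "like" and "so" are also ordinary words, so a plain count like this overstates the real number. Treat it as an upper bound and trust your ears on the playback.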

2. Get your pace right, then vary it

Most engaging talking-head delivery runs around 140 to 160 words per minute, a touch faster than ordinary conversation. But the number is the easy part. The thing that actually kills delivery is a pace that never changes, the same metronome rhythm from the first line to the last. Speed up on the connective tissue, the bits nobody needs to remember. Then slow right down on the one sentence you want to land, and let a pause hang after it. If you are not sure whether you drone, the detail is in the separate piece "How fast you should talk in a video."
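Both halves of that advice can be checked with numbers: the overall pace, and how much it moves from sentence to sentence. A rough sketch, assuming you have timestamped segments from a caption file or a transcription tool as (text, start, end) tuples. The names and the report fields are hypothetical, chosen for illustration.

```python
from statistics import pstdev


def words_per_minute(text: str, duration_seconds: float) -> float:
    """Words spoken per minute over one stretch of speech."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(text.split()) * 60 / duration_seconds


def pace_report(segments):
    """segments: list of (text, start_seconds, end_seconds) tuples."""
    rates = [words_per_minute(text, end - start) for text, start, end in segments]
    total_words = sum(len(text.split()) for text, _, _ in segments)
    # Only time inside segments is counted, so pauses between them are excluded.
    spoken_seconds = sum(end - start for _, start, end in segments)
    return {
        "overall_wpm": total_words * 60 / spoken_seconds,
        "slowest_wpm": min(rates),
        "fastest_wpm": max(rates),
        # Standard deviation across segments: near zero means a metronome.
        "spread_wpm": pstdev(rates),
    }


segments = [
    ("one two three four five six seven eight nine ten", 0.0, 4.0),
    ("a b c d e", 4.0, 7.0),
]
print(pace_report(segments))
```

If the overall figure sits in the 140 to 160 range but the spread is close to zero, the pace is on target and still flat, which is the case the paragraph above warns about.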

Delivery is half performance, half a clean recording chain. Photo: Tima Miroshnichenko / Pexels.

3. Push your energy one notch past comfortable

The camera flattens everyone, by roughly a notch. The level that feels natural and warm to you, sitting in your room, arrives on screen as muted and a bit tired. So the move that feels like over-acting in the moment usually reads as completely normal to the viewer. Sit up. Smile before the first word, even if you cut it. Let your voice rise and fall instead of holding one flat pitch, because monotone is what people actually mean when they say a video is "boring." If you suspect that is your problem, the separate piece "Why you sound boring on camera" breaks it down.
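Energy is harder to put a number on than fillers or pace, but loudness over time is a usable stand-in for spotting where a take sags. The sketch below assumes you have exported one loudness reading per second from your editor or audio tool. It flags stretches that sit well under your own median level. The 0.6 ratio and the three-second minimum are arbitrary starting points of mine, and this is an illustration only, not a description of how CutScore measures energy.

```python
from statistics import median


def find_energy_dips(levels, ratio=0.6, min_seconds=3):
    """Return (start_second, end_second) runs where loudness stays low.

    levels: one loudness reading per second, in any linear unit.
    A reading counts as low when it is below ratio * median(levels).
    end_second is exclusive.
    """
    if not levels:
        return []
    threshold = median(levels) * ratio
    dips = []
    run_start = None
    for second, level in enumerate(levels):
        if level < threshold:
            if run_start is None:
                run_start = second
        else:
            if run_start is not None and second - run_start >= min_seconds:
                dips.append((run_start, second))
            run_start = None
    # Close a dip that runs to the end of the take.
    if run_start is not None and len(levels) - run_start >= min_seconds:
        dips.append((run_start, len(levels)))
    return dips


levels = [1.0] * 10 + [0.4] * 4 + [1.0] * 6
print(find_energy_dips(levels))  # [(10, 14)]
```

Loudness says nothing about pitch, so a take can pass this check and still be monotone. Use the flagged timestamps as places to rewatch, not as a verdict.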

4. Talk to the lens, not to yourself

This one is small and it changes everything. If your camera shows a live preview, you will instinctively watch yourself instead of the lens, and on screen that reads as shifty eyes that never quite meet the viewer. Cover the preview, or stick a tiny arrow above the lens, and talk to the glass as if it were one person you like. Eye contact is most of what we read as confidence. Pair it with open body language, hands visible, shoulders down, and stop the nervous sway. Stillness is not stiffness. Stillness is authority.

5. Script and rehearse the first three seconds

Your delivery is judged hardest at the exact moment you are least warmed up: the opening. Most drop-off happens in the first three seconds, so do not waste them clearing your throat or saying "hey guys, so today." Write that opening line out, say it out loud ten times, and deliver it with your best energy of the whole video. Everything after it has a little forgiveness baked in. The first three seconds have none, so over-prepare them. The separate piece "What the first three seconds should be" covers what to actually say.
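It helps to see how little room three seconds gives you. At the 140 to 160 words per minute target, three seconds holds seven or eight words, which is why a throwaway greeting eats most of the budget. A quick sketch of the arithmetic follows. The function names are hypothetical, and the 150 wpm default is simply the middle of the article's range.

```python
def words_in_window(wpm: float, seconds: float = 3.0) -> int:
    """Whole words that fit in the window at a given speaking pace."""
    return int(wpm * seconds / 60)


def fits_opening(line: str, wpm: float = 150, seconds: float = 3.0) -> bool:
    """True if the line can be said inside the window at that pace."""
    return len(line.split()) <= words_in_window(wpm, seconds)


print(words_in_window(140))  # 7
print(words_in_window(160))  # 8
print(fits_opening("Your camera is lying about your energy"))  # True
```

A line that fits is not automatically a good hook. "Hey guys, so today" fits easily and gives the viewer nothing, so use the word budget to trim a strong line, not to approve a weak one.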

RATHER SEE IT THAN READ IT?

Here is a real CutScore coaching report for an everyday talking-head video: pace, filler rate, energy dips and the hook, scored, with timestamps and the exact fixes.

See a sample report
SHORT ON TIME

If you only fix three things.

Most of the jump from "stiff and forgettable" to "this person knows how to hold a camera" comes from these three. Fix them first.

1
2-MIN FIX · SPEECH
Replace your fillers with silence
A dozen "ums" a minute is the loudest amateur tell in delivery, and it costs nothing to fix. When you feel a filler coming, close your mouth and pause for half a second. The silence reads as confidence. The tic reads as nerves.
How: Record a take, count your fillers, then halve it next time. Or let CutScore count them and mark every timestamp.
2
PERFORMANCE · ENERGY
Push your energy one notch up
The camera flattens everyone, so the level that feels natural arrives muted. Sit up, vary your pitch, and let what feels like over-acting happen. To you it feels too much. To the viewer it lands as a person who actually wants to be there.
How: Record the same line at "normal" and at "too much," then watch both back. The "too much" one is almost always the keeper.
3
QUICK · NARRATIVE
Nail the first three seconds
Your delivery is judged hardest at the opening, when you are least warmed up. Most drop-off happens here. Skip the throat-clear and the "hey guys," write your first line, and deliver it with your best energy of the entire video.
How: Say the opening line out loud ten times before you record. See the hook.
THREE WAYS TO IMPROVE IT

Practice, a real critic, or one pass.

OPTION 01

Reps and the rewatch

Free, and it works, but slowly. Record, watch yourself back honestly, pick one habit, fix it, repeat. The catch is the same one we opened with: judging your own delivery is hard, and most people quietly stop watching the cringe-worthy parts, which are the parts that need the work.

OPTION 02

A blunt human critic

Honest feedback from someone who will not spare your feelings is gold. The cost is finding that person, and getting it for every video. Friends say "it was great." You want the one who says "you said 'like' eleven times and you trailed off at the end." Those people are rare.

OPTION 03

A coach in one pass

Hand the file (or a link) to CutScore. It measures your pace, your filler rate, where your energy sags and how your hook lands, then gives you a 0 to 100 score with timestamped evidence and the fixes. No friend to bribe, no cringe to push through alone. See a sample report.

Where delivery sits in the bigger picture
CutScore is an AI video quality coach. Delivery is one part of what it reads: it measures speaking pace, filler words per minute and where your energy dips, alongside the rest of the craft (loudness, exposure, pacing of the edit, captions, export settings). You get one score, the evidence behind it, and a prioritised list of fixes, before anyone else sees the video. It judges the craft of the video itself, not your tags or thumbnails, so it sits next to a growth tool rather than competing with one. More on everything we check.
QUESTIONS

Frequently asked.

What is the fastest way to improve my on-camera delivery?
Pick one fix and drill it. The fastest single win is cutting filler words: record one take, count every um and like, and aim to halve it on the next take. After that, work on pace (around 140 to 160 words per minute for most talking-head video) and on looking at the lens, not at yourself in the preview. One change per session beats trying to fix everything at once.

How fast should I talk in a video?
Most engaging talking-head delivery sits around 140 to 160 words per minute, a touch faster than ordinary conversation. The exact number matters less than the variation. A flat, even pace is what reads as boring. Speed up on the throwaway lines, slow right down on the one sentence you actually want people to remember, and let yourself pause.

Why do I sound boring on camera?
Usually it is flat energy and a flat pitch, not a flat personality. The camera flattens everyone by about a notch, so the level that feels natural in the room reads as half-asleep on screen. Push your energy slightly past comfortable, vary your pitch, and use pauses on purpose. What feels like over-acting to you usually lands as normal to the viewer.

Can editing fix weak delivery?
Partly. A good editor can cut filler words, tighten pauses and lift sagging energy with pace, and a jump cut hides a stumble cleanly. But editing cannot add eye contact you never gave the lens, and it cannot manufacture energy that was not recorded. Editing rescues a decent take. It cannot resurrect a dead one, so fix the delivery at the source first.
EARLY ACCESS

Stop guessing how you came across.

CutScore measures your pace, your filler words and where your energy dips, then tells you exactly what to fix, with timestamps. Join the waitlist for early access.
