What Is Dictation? How AI Voice-to-Text Actually Works

Dictation is speaking words out loud so that they get written down. On a computer or phone, dictation means using speech-to-text software that listens to your voice and types what you say in real time — so you can write an email, a note, or a document by talking instead of using a keyboard.

That's the short answer. Below is how it actually works, how it differs from transcription, and why modern AI dictation is suddenly good enough to replace typing for a lot of people.

Where the word comes from

Long before computers, "dictation" meant one person speaking while another wrote down their words — an executive dictating a letter to a secretary, a doctor dictating notes for a typist. The meaning hasn't changed; the stenographer has just been replaced by software. Today, when people say dictation they almost always mean talking to a device and watching the text appear.

How dictation actually works

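Conceptually, turning sound into text happens in a few stages. Many current AI models learn these stages jointly inside a single neural network rather than as separate parts, but each one still has to happen, and modern systems do it fast enough to feel instant: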

Capture. Your microphone records the sound of your voice as a stream of audio.
Acoustic modeling. The software splits that audio into short, overlapping slices (typically about 25 milliseconds each) and estimates which basic speech sounds, or phonemes, each slice contains.
Language modeling. A model trained on enormous amounts of text predicts which actual words those sounds most likely form, using the surrounding words to resolve ambiguity: in "they parked their car," it picks "their" rather than "there," even though the two sound identical.
Output. The recognized words are inserted as text, often with automatic punctuation and capitalization added.
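To make the two middle stages concrete, here is a toy sketch in Python. It assumes 16 kHz audio and the 25 ms slice / 10 ms step that many speech systems use, and it replaces a real language model with a handful of made-up word-pair counts. The names `frame_audio`, `BIGRAMS`, and `pick_word` are hypothetical, not any product's API; real recognizers use neural networks for both stages. The point is only the idea: slice the audio, then choose between sound-alike words by how well each fits its neighbors.

```python
from collections import Counter

SAMPLE_RATE = 16_000       # assumed: 16 kHz mono audio, common for speech models
FRAME_MS, HOP_MS = 25, 10  # assumed: 25 ms slices, a new slice every 10 ms


def frame_audio(samples: list[float], sample_rate: int = SAMPLE_RATE) -> list[list[float]]:
    """Split raw audio samples into overlapping fixed-length frames."""
    frame_len = sample_rate * FRAME_MS // 1000  # 400 samples at 16 kHz
    hop = sample_rate * HOP_MS // 1000          # 160 samples at 16 kHz
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


# Made-up counts of how often word pairs appear in text: a stand-in for a language model.
BIGRAMS = Counter({
    ("parked", "there"): 40, ("parked", "their"): 30,  # previous word alone slightly favors "there"
    ("their", "car"): 90, ("there", "car"): 1,         # but the next word strongly favors "their"
    ("over", "there"): 300, ("over", "their"): 20,
    ("there", "now"): 50, ("their", "now"): 2,
})


def pick_word(previous: str, candidates: list[str], following: str) -> str:
    """Pick the sound-alike candidate that best fits the words on either side."""
    def score(word: str) -> int:
        # Add one to each count so an unseen pair lowers the score instead of zeroing it.
        return (BIGRAMS[(previous, word)] + 1) * (BIGRAMS[(word, following)] + 1)
    return max(candidates, key=score)


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE
    print(len(frame_audio(one_second)))                    # 98
    print(pick_word("parked", ["there", "their"], "car"))  # their
    print(pick_word("over", ["their", "there"], "now"))    # there
```

Note that the previous word alone ("parked") would have chosen the wrong spelling here; it is the following word ("car") that settles it, which is why language models look at context in both directions when they can.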

Older dictation tools needed long training sessions to learn your voice. Today's AI models are trained on many thousands of hours of varied speech (hundreds of thousands, for the largest), so they work well for most people straight away, with no enrollment required.

Dictation vs. transcription vs. voice commands

These terms get used loosely, but they describe different things built on the same speech-recognition foundation:

Dictation is live. You speak and text appears as you go. It's for creating text — writing messages, documents, and notes by voice.
Transcription works from a recording. You feed in audio that was already captured — a meeting, an interview, a lecture — and get text back afterward. It's for converting existing audio.
Voice commands use recognition to do something — "set a timer," "play music" — rather than to write text.

In practice the line blurs: a good voice tool can do both dictation and transcription of recordings, because the engine underneath is the same.
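Because the same recognizer can sit under both modes, the difference really is timing. In the hypothetical sketch below, `recognize` stands in for a real speech model and audio chunks are faked as short strings so the example runs on its own: `dictate` emits updated text after every chunk, while `transcribe` waits until it has the whole recording. Real streaming recognizers also tend to revise recent words as more context arrives, which this sketch leaves out.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical recognizer type: takes one chunk of "audio", returns its text.
Recognizer = Callable[[str], str]


def dictate(audio_stream: Iterable[str], recognize: Recognizer) -> Iterator[str]:
    """Live dictation: recognize each chunk as it arrives and yield the text so far."""
    words: list[str] = []
    for chunk in audio_stream:
        words.append(recognize(chunk))
        yield " ".join(words)  # what the user sees on screen after this chunk


def transcribe(recording: list[str], recognize: Recognizer) -> str:
    """Transcription: the recording already exists, so return all the text at once."""
    return " ".join(recognize(chunk) for chunk in recording)


if __name__ == "__main__":
    fake_model = str.capitalize  # placeholder "model" for the demo
    chunks = ["dear", "team", "thanks"]
    for screen_text in dictate(iter(chunks), fake_model):
        print(screen_text)  # prints "Dear", then "Dear Team", then "Dear Team Thanks"
    print(transcribe(chunks, fake_model))  # Dear Team Thanks
```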

On-device vs. cloud dictation

One of the biggest differences between dictation tools today is where the recognition happens:

Cloud dictation sends your audio to a remote server to be processed. It needs an internet connection, and your voice leaves your device. Much built-in phone and OS dictation has traditionally worked this way, although some platforms now process at least some languages on-device.
On-device dictation runs the model locally on your own computer. It works offline, keeps your voice private, and doesn't depend on a service being reachable.

On-device used to mean lower accuracy, but modern local models have closed most of that gap — which is why privacy-first dictation is now practical. Vowen runs transcription on your device by default, so dictation keeps working offline and your voice stays with you.

How accurate is dictation today?

For clear speech in a supported language, modern AI dictation often exceeds 95% word accuracy (fewer than 5 word errors per 100 words), which is accurate enough that correcting a few words is faster than typing the whole thing. Accuracy still drops with loud background noise, an accent the model wasn't trained on, or industry-specific jargon. The last one is fixable: many tools let you add a custom vocabulary of names, acronyms, and terms so they stop being misheard.
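Accuracy figures like this are usually derived from word error rate (WER): the substitutions, deletions, and insertions needed to turn the reference transcript into what the recognizer produced, divided by the number of reference words. Word accuracy is then commonly reported as 1 − WER. The sketch below computes WER with a standard edit-distance table; the sentences are invented for illustration, with one misheard name in 20 words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    # Compares words exactly; real scoring usually normalizes case and punctuation first.
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words into the first j output words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j output words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion: a reference word was dropped
                dist[i][j - 1] + 1,         # insertion: an extra word in the output
                dist[i - 1][j - 1] + cost,  # match, or substitution if the words differ
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    spoken = ("please send the quarterly report to Priya before the board "
              "meeting on Thursday so she has time to review it")
    heard = ("please send the quarterly report to prior before the board "
             "meeting on Thursday so she has time to review it")
    wer = word_error_rate(spoken, heard)
    print(f"WER {wer:.2f}, word accuracy {1 - wer:.0%}")  # WER 0.05, word accuracy 95%
```

Here the single error is a misheard name, the kind of mistake a custom vocabulary targets. Because insertions count as errors, WER can exceed 1.0 on very poor output, so 1 − WER is a convenient summary rather than a strict percentage.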

What people use dictation for

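Writing faster. Most people speak far faster than they type (conversational speech often runs above 100 words per minute, while typical typing speeds are closer to 40), so drafting emails, messages, and documents by voice is quicker.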
Accessibility. For anyone with RSI, limited mobility, or a condition that makes typing hard, dictation is a primary way to use a computer.
Hands-free capture. Jotting an idea while walking, cooking, or driving.
Professional documentation. Clinicians, lawyers, and others dictate notes to cut hours of typing; medical dictation is a common example.

Getting started with dictation

Every major platform has built-in dictation you can try right now — press Windows + H on Windows, or enable Dictation in Keyboard settings on a Mac. They're a fine starting point, but they tend to be cloud-dependent, inconsistent app to app, and limited on custom terms. If you find yourself relying on dictation daily, a dedicated tool that runs system-wide and on-device is the upgrade — it works the same everywhere, keeps your voice private, and learns your vocabulary.

The bottom line

Dictation is simply writing by speaking — speech-to-text software turns your spoken words into typed text in real time. The technology has quietly become excellent, and modern on-device tools make it private and reliable enough to use all day. Vowen is free to download, runs on Mac and Windows, and works in every app — a fast way to find out whether talking beats typing for you.

Frequently asked questions

What is dictation in simple terms?

Dictation is the act of speaking words out loud so they're written down. In computing, dictation means using speech-to-text software that converts your spoken words into typed text in real time as you talk.

What is the difference between dictation and transcription?

Dictation is live: you speak and text appears as you go. Transcription works from an existing recording, converting audio that was already captured into text after the fact. The underlying speech recognition is similar; the difference is timing.

Is dictation the same as speech recognition?

Speech recognition is the underlying technology that identifies words in audio. Dictation is one application of it — using speech recognition to type text as you speak. Voice commands and voice assistants are other applications of the same core technology.

How accurate is modern dictation?

Modern AI dictation is highly accurate for clear speech in a supported language, often above 95% word accuracy. Accuracy drops with heavy background noise, strong accents the model wasn't trained on, or specialized jargon — though custom vocabularies can fix the last one.

Talk instead of type.

Vowen is free voice-to-text that works in any app, on Mac and Windows. No account required.
