Text to Speech Tool | Read Text Aloud
Paste text and listen in your browser with selectable voices, rate, pitch, volume, live word highlighting, queue navigation, speaking-time estimates, and copyable cue sheets.
What Is the Text to Speech Tool?
The Text to Speech Tool converts any written text into spoken audio directly in your browser using the Web Speech API — no download, no subscription, no plugin required. Paste any text and click Speak to hear it read aloud in the voice, language, speed, and pitch of your choice. The tool works on any modern desktop browser including Chrome, Firefox, Edge, and Safari, using the voices installed on your operating system.
This free online text to speech reader is used for proofreading (hearing your writing often reveals errors that the eye skips), language learning (hearing pronunciation alongside reading), accessibility (making written content listenable for people with visual impairments or dyslexia), script rehearsal (checking how dialogue flows at speaking pace), multitasking (listening to articles or notes while doing other tasks), and converting personal notes into audio for commutes. The split mode breaks long texts into paragraphs or sentences so you can click any section of the queue to jump directly to it, making navigation through long documents easy.
Text to Speech Tool Formula and Method
Estimated speaking time = word count ÷ (130 wpm × rate multiplier).
Rate 0.5× = 65 wpm (very slow). Rate 1.0× = 130 wpm (normal). Rate 1.5× = 195 wpm (fast). Rate 2.0× = 260 wpm (very fast).
Paragraph mode: split on blank lines (\n\n). Sentence mode: split on sentence-ending punctuation [.!?]. Whole text: speak as a single utterance.
Voice quality: local voices (★) use on-device synthesis. Remote voices require an internet connection.
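The estimate and split rules above can be sketched in a few lines of TypeScript. This is an illustrative sketch, not the tool's actual source: the names `estimateSeconds`, `splitText`, and `SplitMode` are hypothetical, and the sentence splitter is deliberately naive (it would also split at the decimal point in "3.14" or after "Dr.").

```typescript
// Baseline pace taken from the formula above: 130 words per minute at rate 1.0.
const BASE_WPM = 130;

type SplitMode = "whole" | "paragraphs" | "sentences";

// Count words as runs of non-whitespace characters.
function countWords(text: string): number {
  const words = text.trim().match(/\S+/g);
  return words ? words.length : 0;
}

// Estimated speaking time in seconds: words / (130 wpm * rate), converted from minutes.
function estimateSeconds(text: string, rate: number): number {
  if (rate <= 0) throw new RangeError("rate must be positive");
  return (countWords(text) / (BASE_WPM * rate)) * 60;
}

// Split text into queue items according to the selected mode.
function splitText(text: string, mode: SplitMode): string[] {
  const trimmed = text.trim();
  if (trimmed === "") return [];
  if (mode === "whole") return [trimmed];
  if (mode === "paragraphs") {
    // A blank line (optionally containing spaces) separates paragraphs.
    return trimmed
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p !== "");
  }
  // Sentences: text up to and including a run of . ! or ?, plus any unterminated tail.
  const sentences = trimmed.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return sentences.map((s) => s.trim()).filter((s) => s !== "");
}
```

For example, a 130-word text at rate 1.0 gives an estimate of 60 seconds, and the same text at rate 2.0 gives 30 seconds.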
How to Use
1. Paste your text into the large editor panel. Any plain text works: articles, emails, scripts, notes, study material, or foreign-language content. The word count and estimated speaking time appear in the status bar at the top.
2. Select a language from the Language dropdown. The Voice dropdown then filters to voices available in that language on your device. Voices marked with ★ are local (on-device) and work offline. Unmarked voices may require an internet connection.
3. Use the Rate slider to adjust speaking speed. Rate 1.0 is normal conversational pace (about 130 words per minute). Rate 1.25–1.5 is comfortable for audiobook-style listening. Rate 0.75 is helpful for language learning or difficult technical content.
4. Adjust Pitch (0 to 2, default 1) to raise or lower the voice tone. Adjust Volume (0 to 1) as needed. These settings let you customise the voice to be more natural and comfortable for extended listening.
5. Choose a Split mode: Whole reads the entire text as one utterance. Paragraphs splits on blank lines, creating one queue item per paragraph. Sentences splits on sentence-ending punctuation for the most granular control. The queue panel shows all parts when Paragraphs or Sentences is selected.
6. Click Speak to start playback from the beginning. Use Pause and Resume to control playback. Click any item in the queue to jump directly to that paragraph or sentence. Click Stop to end playback completely and return to idle.
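The playback steps above map onto the Web Speech API roughly as follows. This is a minimal sketch under stated assumptions, not the tool's own code: `normaliseSettings` and `speakFrom` are hypothetical helper names, and the synthesiser and utterance factory are passed in as parameters so the logic can be exercised outside a browser. The clamping ranges (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1) are those defined for `SpeechSynthesisUtterance`; the tool's sliders may expose a narrower range.

```typescript
interface PlaybackSettings {
  rate: number;
  pitch: number;
  volume: number;
}

// Structural subset of SpeechSynthesisUtterance used by this sketch.
interface UtteranceLike {
  rate: number;
  pitch: number;
  volume: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fill in defaults and keep each value inside the range the API accepts.
function normaliseSettings(input: Partial<PlaybackSettings>): PlaybackSettings {
  return {
    rate: clamp(input.rate ?? 1, 0.1, 10),
    pitch: clamp(input.pitch ?? 1, 0, 2),
    volume: clamp(input.volume ?? 1, 0, 1),
  };
}

// Queue every part from startIndex onward; returns how many parts were queued.
// Passing startIndex > 0 models "click a queue item to jump to it".
function speakFrom<U extends UtteranceLike>(
  synth: { cancel(): void; speak(utterance: U): void },
  createUtterance: (text: string) => U,
  parts: string[],
  startIndex: number,
  settings: PlaybackSettings,
): number {
  synth.cancel(); // drop anything still queued from an earlier run
  const remaining = parts.slice(Math.max(0, startIndex));
  for (const part of remaining) {
    const utterance = createUtterance(part);
    utterance.rate = settings.rate;
    utterance.pitch = settings.pitch;
    utterance.volume = settings.volume;
    synth.speak(utterance);
  }
  return remaining.length;
}

// In a browser this might be called as:
//   speakFrom(window.speechSynthesis, (t) => new SpeechSynthesisUtterance(t),
//             parts, 0, normaliseSettings({ rate: 1.25 }));
// Pause, Resume, and Stop correspond to speechSynthesis.pause(), resume(), and cancel().
```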
Text to Speech Tool Example
A PhD student uses the text to speech tool to proofread a 4,000-word thesis chapter. She pastes the text, sets the rate to 1.2× (slightly faster than normal), and selects Sentence mode. As the tool reads each sentence aloud, she follows along with her eyes. Two grammatical errors and one repeated word become immediately obvious — errors she had read past dozens of times without noticing in visual review. The audio catches them because hearing forces the brain to process each word sequentially rather than pattern-matching familiar phrases.
A language learner uses the tool to practise French listening comprehension. He pastes a French news article, selects a French (fr-FR) voice, and sets the rate to 0.75× for a comfortable comprehension speed. He listens to each paragraph with the text visible, then plays it again without looking. Combining reading and listening in this way reinforces both vocabulary recognition and pronunciation.
Understanding Text to Speech
How the Web Speech API Powers Browser Text to Speech
This text to speech converter uses the SpeechSynthesis interface of the Web Speech API, a browser-native technology that needs no plugin, account, or API key. The API is available in current versions of Chrome, Edge, Firefox, and Safari on both desktop and mobile, although behaviour differs between browsers. It mainly exposes the voices installed on the operating system: Apple's system voices on macOS and iOS, Microsoft's voices on Windows, and Google's text-to-speech voices on Android. Some browsers also add their own network-backed voices, such as the Google voices in desktop Chrome and the "Online (Natural)" voices in Edge.
Voice quality varies significantly by operating system and by whether a voice is "local" (processed on-device) or "remote" (processed in the cloud). Local voices are available offline and respond without network delay. Remote voices, such as those labelled "Online" in Edge or prefixed "Google" in Chrome, require an internet connection but may have noticeably higher quality, more natural intonation, and better handling of unusual words. On macOS, the "Enhanced" voices are higher-quality local voices that you download once: open System Settings (System Preferences on older macOS versions) → Accessibility → Spoken Content → System Voice → Manage Voices.
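The local/remote distinction is exposed by the API as the `localService` property of each `SpeechSynthesisVoice`. As a rough sketch of how a voice dropdown could mark local voices with ★ (the helper names `voiceLabel` and `sortLocalFirst` and the `VoiceInfo` shape are assumptions for illustration, not the tool's code):

```typescript
// Structural subset of SpeechSynthesisVoice used by this sketch.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
}

// Prefix local (on-device) voices with a star, as in the Voice dropdown.
function voiceLabel(voice: VoiceInfo): string {
  const marker = voice.localService ? "★ " : "";
  return `${marker}${voice.name} (${voice.lang})`;
}

// List local voices first, then sort alphabetically within each group.
function sortLocalFirst(voices: VoiceInfo[]): VoiceInfo[] {
  return voices
    .slice()
    .sort(
      (a, b) =>
        Number(b.localService) - Number(a.localService) ||
        a.name.localeCompare(b.name),
    );
}

// In a browser the input would come from window.speechSynthesis.getVoices().
```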
Proofreading by Listening: Why It Works
Proofreading with text to speech is one of the most effective methods for catching errors that visual proofreading misses. The human visual system predicts text based on context — familiar phrases are processed as units, meaning the brain often reads what it expects rather than what is actually there. Hearing the text removes this predictive shortcut. Every word is processed individually and in real time. Missing words, repeated words, incorrect tenses, and awkward sentence rhythms become audible as the synthesised voice reads them. Many professional editors combine screen proofreading with audio playback as standard practice. The Sentence mode in this tool is particularly useful for proofreading — each sentence is a discrete queue item, making it easy to stop and review a specific sentence that sounded wrong.
Text to Speech for Language Learning
Language learners use text to speech tools to train listening comprehension and pronunciation recognition alongside reading. Selecting a voice in the target language and setting the rate to 0.6–0.8× gives a slow, clear playback that makes individual words and sounds distinguishable. This is useful for: learning how written words are pronounced in languages with irregular spelling (French, English, Irish); hearing the rhythm and intonation patterns of a new language; checking whether you recognise vocabulary you have been studying before seeing the translation; and matching audio to text while reading news articles or book excerpts in the target language.
Accessibility: Making Written Content Listenable
Screen readers are specialised software (NVDA, JAWS, VoiceOver) designed for blind or visually impaired users to navigate entire operating systems and applications. This text to speech tool serves a different, complementary need: making specific blocks of text audible for anyone who prefers listening to reading, experiences reading fatigue, has dyslexia or a visual processing difference, or wants to consume written content in a hands-free context. The paragraph and sentence queue makes long-form content navigable — click to jump to any section rather than listening from the beginning every time.
Rate, Pitch, and Volume: Getting the Best Experience
The Rate slider controls how quickly the voice speaks. Many listeners find a rate between 1.2× and 1.7× comfortable for focused listening to familiar content, and 1.0× better for difficult or technical content. Language learners typically use 0.6–0.8× for comprehension practice. Pitch adjustment changes the fundamental frequency of the voice; raising it slightly can make some voices sound clearer and less monotone. Volume sets the synthesiser's output level within whatever the system volume allows. For extended listening sessions, set volume to a comfortable level and use system volume controls for fine adjustment.
Limitations of Browser-Based Text to Speech
The Web Speech API has practical limitations compared to cloud TTS services like Google Cloud Text-to-Speech or Amazon Polly. Voice quality is limited by what the operating system provides — older operating systems have robotic-sounding voices. Some browsers (particularly Firefox) have limited voice selection. Long texts split into many utterances may occasionally have a slight pause between sections. Mobile browsers may suspend audio when the tab is not active, interrupting playback. For the highest quality text-to-speech audio — for producing podcasts, voiceovers, or audio content for distribution — a dedicated cloud TTS service with natural language processing is more appropriate.
Frequently Asked Questions
Why are no voices available in the dropdown?
Voices load asynchronously after the browser initialises the Web Speech API. On some browsers (particularly Chrome), the voice list is empty at first and is filled in a moment later, when the browser fires a voiceschanged event; in some environments the list only appears after a user interaction such as a button click or keystroke. If the voice list is empty, try clicking in the editor or pressing a key, then wait 1–2 seconds for the list to populate. If it remains empty, refresh the page. In some restricted environments, such as corporate networks with browser policies or in-app browsers inside other apps, the Web Speech API may be blocked entirely. In that case, try opening the page in a standard desktop browser like Chrome or Edge.
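A common way to handle asynchronous voice loading is to read the list once and then read it again whenever the browser signals a change. The sketch below assumes a hypothetical helper `watchVoices` and takes the synthesiser as a parameter so it can be tried outside a browser; `getVoices()` and the `voiceschanged` event are part of the Web Speech API, though not every browser fires the event.

```typescript
// Structural subsets of SpeechSynthesisVoice and SpeechSynthesis used by this sketch.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

interface VoiceSource {
  getVoices(): VoiceLike[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
}

// Call onVoices with the current list now (if it is non-empty) and again on every change.
function watchVoices(
  source: VoiceSource,
  onVoices: (voices: VoiceLike[]) => void,
): void {
  const publish = (): void => {
    const voices = source.getVoices();
    if (voices.length > 0) onVoices(voices);
  };
  publish(); // some browsers already have the list at this point
  source.addEventListener("voiceschanged", publish); // others fill it in later
}

// In a browser this might be called as:
//   watchVoices(window.speechSynthesis, (voices) => populateDropdown(voices));
// where populateDropdown is whatever function rebuilds the Voice dropdown.
```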
What is the difference between local and remote (cloud) voices?
Local voices (marked with ★ in the dropdown) are installed on your device and processed entirely on-device by your operating system. They work offline, respond with no network latency, and keep your text on the device. Remote or cloud voices are processed on external servers. They often have higher quality and more natural intonation, although downloadable local voices such as Apple's "Enhanced" voices on macOS can be comparably good. Remote voices require an internet connection and send your text to the voice provider's servers (for example Google's or Microsoft's) for processing. For sensitive content, stick to local (★) voices.
Why does playback stop when I switch browser tabs on mobile?
Mobile operating systems aggressively suspend background processes to conserve battery, and most mobile browsers suspend JavaScript execution (including the Web Speech API) when a tab is moved to the background or another app is opened. This interrupts TTS playback. To maintain uninterrupted listening on mobile, keep the browser tab in the foreground. Some mobile browsers may behave differently with "Request Desktop Site" enabled, though this is not guaranteed. Alternatively, use the device's built-in read-aloud feature (Spoken Content on iOS, Select to Speak on Android), which operates at the OS level rather than inside a browser tab.
Can I download the audio as an MP3 or WAV file?
No — browser-based text to speech using the Web Speech API does not provide access to the audio output stream in a format that can be saved as a file. The audio is rendered directly to the device's speaker output without exposing an audio buffer for download. To generate downloadable audio from text, you need a cloud text-to-speech service such as Google Cloud Text-to-Speech, Amazon Polly, Microsoft Azure Cognitive Services Text to Speech, or ElevenLabs. These services produce high-quality MP3 or WAV files but require API keys and incur usage costs at scale.
How long can the text be?
There is no hard length limit in this tool. However, the Web Speech API in some browsers (notably Chrome on Windows) has an undocumented limit on single utterance length — typically around 200–300 words per utterance. The Paragraphs or Sentences split modes work around this by breaking the text into smaller chunks that each fall within the browser limit. If you are reading very long text (thousands of words), use Paragraphs or Sentences mode rather than Whole text mode to ensure reliable playback across all browsers.
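If a single paragraph or sentence is still longer than a browser will reliably speak, it can be broken into fixed-size word chunks before queuing. The sketch below is one hedged way to do that: `chunkByWords` is a hypothetical helper, and the word limit is a parameter because the actual browser threshold is undocumented and the 200-word figure is only the lower end of the range quoted above. Note that it collapses whitespace, so it is best applied to one queue item at a time rather than to the whole text.

```typescript
// Break text into chunks of at most maxWords words each.
function chunkByWords(text: string, maxWords: number): string[] {
  if (Math.floor(maxWords) !== maxWords || maxWords < 1) {
    throw new RangeError("maxWords must be a positive integer");
  }
  const words = text.trim().match(/\S+/g) ?? [];
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    // Re-join each slice with single spaces; original line breaks are not preserved.
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Example: keep every queue item at or below an assumed 200-word limit.
//   const safeParts = parts.flatMap((part) => chunkByWords(part, 200));
```

Chunking on word counts can cut a sentence in half, which produces an audible pause mid-sentence; splitting on sentence boundaries first and chunking only the oversized items keeps that to a minimum.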
Does the text to speech tool support non-English languages?
Yes. The tool supports any language for which your browser or operating system provides a voice. The Language dropdown shows all language tags available in your browser's voice list. Common languages on macOS and Windows include English (various regional accents), French, Spanish, German, Italian, Portuguese, Japanese, Chinese (Mandarin), Korean, Arabic, and Hindi. Voice availability varies by platform and operating system version. To add more languages on macOS, go to System Settings (System Preferences on older versions) → Accessibility → Spoken Content → System Voice → Manage Voices. On Windows, add language packs in Settings → Time & Language → Language (named Language & region on Windows 11).
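The Language and Voice dropdowns can be derived from the `lang` tag that each voice reports. The following sketch uses hypothetical helper names (`languageTags`, `voicesForLanguage`) and assumes that some platforms report tags with an underscore (such as `en_GB`) that should be treated like the hyphenated form.

```typescript
// Structural subset of SpeechSynthesisVoice used by this sketch.
interface VoiceEntry {
  name: string;
  lang: string;
}

// Treat "en_GB" and "en-GB" as the same tag.
function normaliseTag(tag: string): string {
  return tag.replace(/_/g, "-");
}

// Unique, sorted language tags for the Language dropdown.
function languageTags(voices: VoiceEntry[]): string[] {
  const tags = voices.map((v) => normaliseTag(v.lang));
  return tags.filter((tag, index) => tags.indexOf(tag) === index).sort();
}

// Voices matching a full tag ("fr-FR") or a bare language code ("fr").
function voicesForLanguage(voices: VoiceEntry[], tag: string): VoiceEntry[] {
  const wanted = normaliseTag(tag).toLowerCase();
  return voices.filter((v) => {
    const lang = normaliseTag(v.lang).toLowerCase();
    return lang === wanted || lang.indexOf(wanted + "-") === 0;
  });
}
```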