Voice Prompting lets you optionally shape the tone of your voice-over with instructions about who’s speaking and how they should sound.
Setting Initial Voice Direction
1. Start your AI video generation as usual.
2. On the Set Direction screen, open the Options dialog (shown below).
3. Select the desired language from the dropdown.
4. Enter a prompt in the Voice Direction text field. (Additional guidance on prompting follows these instructions.)
5. Select your desired format.
6. Click Apply.
Note: Voice Direction can also be added later, during the editing process.
Note: The entered Voice Direction prompt will apply to additional videos created using the Generate Again button (shown below). However, any instructions added in the Add details on direction textbox (also shown below) will not apply to voice direction. Additional changes to voice direction must be made in the Voice-Over Panel during the editing process.
Editing Voice Direction
To edit the voice direction after the initial video has been generated, follow these instructions.
1. From the Editor, select Edit in the Voice-Over panel.
2. Select Set Voice Direction.
3. In the Set Voice Direction dialog, use the Add details on direction text field to edit your previously entered voice direction prompt or, if you have not yet provided any voice direction, to enter a new prompt.
Tips on Writing Voice Direction Prompts
What to include
Include only the instructions needed to get the results you want. For example, there is no need to specify speaker, accent, or pace if that element is already typical of the voice. Aim for brevity, then add more specific instructions if you are not getting the results you want.
Define who’s speaking and their attitude.
“You are a calm, serious historian.”
“Speak with great joy and excitement.”
Specify a language or accent.
“Speak French, in a style that you would hear on the radio in Paris or Lyon.”
“Speak English with a natural, modern New Zealand accent. Make sure to keep the accent consistent and avoid illogical variations within a single take.”
Describe the pace or which words to stress.
“Speak quickly; emphasize the word 'never'.”
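If you draft many prompts, the three elements above (who is speaking, language or accent, and pace or stress) can be assembled before pasting the result into the Voice Direction text field. The following is only a minimal sketch in Python. The function `build_voice_direction` and its parameter names are hypothetical and are not part of the product; it simply joins whichever elements you supply and skips the rest, in keeping with the advice to aim for brevity.

```python
def build_voice_direction(persona=None, language_or_accent=None, delivery=None):
    """Join only the supplied elements into one short prompt (hypothetical helper)."""
    sentences = []
    for part in (persona, language_or_accent, delivery):
        # Skip any element that was left unspecified or blank.
        if part and part.strip():
            text = part.strip()
            # Make sure each element ends as a complete sentence.
            if text[-1] not in ".!?":
                text += "."
            sentences.append(text)
    return " ".join(sentences)


prompt = build_voice_direction(
    persona="You are a calm, serious historian",
    delivery="Speak quickly; emphasize the word 'never'.",
)
print(prompt)
```

Start with one or two elements, listen to the result, and add more only if the voice-over is not what you want.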
What to avoid
Avoid the following:
Specific time references: Do not ask the system to control the final audio duration (e.g., "Make the audio exactly 30 seconds").
References to prior voice-over generations: Do not ask the system to change based on the last audio you made (e.g., "make the voice faster than the last time"). Every request is a new conversation.
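As an illustration, the two cautions above can be turned into a rough pre-check that runs before a prompt is pasted into the product. This is a sketch under assumptions: the function `check_voice_direction` and both regular expressions are hypothetical heuristics written for this example, and they will miss many phrasings.

```python
import re

# Heuristic: a number followed by a time unit, e.g. "30 seconds".
DURATION = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:seconds?|secs?|minutes?|mins?)\b", re.IGNORECASE
)
# Heuristic: phrases that point back to an earlier generation.
PRIOR = re.compile(
    r"\b(?:last time|last audio|previous(?:ly)?|than before|earlier take)\b",
    re.IGNORECASE,
)


def check_voice_direction(prompt):
    """Return a list of warning labels for a draft prompt (hypothetical helper)."""
    warnings = []
    if DURATION.search(prompt):
        warnings.append("duration")
    if PRIOR.search(prompt):
        warnings.append("prior generation")
    return warnings


print(check_voice_direction("Make the audio exactly 30 seconds"))
print(check_voice_direction("make the voice faster than the last time"))
print(check_voice_direction("Speak with great joy and excitement."))
```

An empty list means neither heuristic matched; it does not guarantee that the prompt is free of these problems.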
English examples
"Speak English in a natural, modern New Zealand accent similar to a person you would hear on the radio in Wellington. Make sure to keep the accent consistent and avoid illogical variations within a single take."
Spanish examples
“Speak in Spanish, in the style of a fast-paced and energetic Mexican female from California.”
“Speak in Spanish, in the style of a regional Mexican male DJ voice with strong intonations (for concerts, restaurants, etc.)”
“Speak in Spanish, in the style of a second-generation Cuban-American: someone who grew up in Miami but maintains Cuban Spanish characteristics.”
“Speak in neutral Caribbean Spanish: a standardized, professional voice that maintains the general Caribbean rhythm and warmth but avoids heavy regional characteristics from any specific island.”
Regional / Character examples
“Read this in a casual South Boston accent: laid-back but assertive. Drop most ‘r’ sounds, flatten vowels slightly, and speak with quick, punchy energy, like someone chatting at a local pub.”
Using Non-Verbal Cues and Pauses
Non-verbal vocal sounds and cues
You can include bracketed cues for non-verbal vocal sounds or reading instructions, such as [sigh], [laughs], or [whispering], in the body of the script. These cues are recognized inconsistently, and sometimes they are read aloud. There is no officially supported list of cues. Use single-word descriptors in brackets, try generating multiple times, and reword the cue if it is not working. Sound effects (like [applause]) aren’t supported.
A more reliable approach:
Describe the tone instead of inserting sounds.
For example: “Highly amused, barely suppressing laughter.”
Pauses
Exact pause durations aren’t supported. Instead, use “…” or a bracketed cue such as [hesitates].
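The guidance in this section (single-word bracketed cues, no exact pause durations) can be sketched as a quick script review. The function `review_cues` below is a hypothetical helper written for this example, not a product feature. It only flags cues that contain digits or more than one word, and it cannot predict whether a given cue will be recognized.

```python
import re

# Matches any text enclosed in square brackets, e.g. "[sigh]".
CUE = re.compile(r"\[([^\[\]]+)\]")


def review_cues(script):
    """Return (cue, issue) pairs for bracketed cues worth rewording."""
    findings = []
    for match in CUE.finditer(script):
        cue = match.group(1).strip()
        if any(ch.isdigit() for ch in cue):
            # A number inside a cue usually means an exact duration was requested.
            findings.append((cue, "exact durations are not supported"))
        elif len(cue.split()) > 1:
            findings.append((cue, "prefer a single-word descriptor"))
    return findings


script = "Well [sigh] I suppose [pause 2 seconds] we could [laughs out loud] try."
for cue, issue in review_cues(script):
    print(f"[{cue}]: {issue}")
```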
Troubleshooting
If you are having difficulty, try the following:
Add detailed pronunciation instructions
Use AI or your own knowledge of a language's sounds to provide specific guidance on how to say certain words or sounds.
Example (prompt): Native English speaker performing French accent - warm, charming tone with characteristic French pronunciation patterns (th→z, the→ze, have→'ave, with→wiz), slightly emphasized Rs, and natural French phrasing like "it is magnifique" and "it transports you."
Match script content to the language
Spell words in the script the way they should sound, much as a novelist writes an accent into dialogue. You will also get better results if the accent makes sense for the content.
Example (script): Welcome to Colson Patisserie. We 'ave ze freshest croissants, macarons, and eclairs in ze city. Every morning, we bake wiz traditional French techniques and ze finest ingredients. Our pain au chocolat is magnifique! Come taste ze authentic flavors of Paris. Open Tuesday to Sunday, seven in ze morning until six. Colson Patisserie - where every bite transports you to France.
Add instructions to eliminate specific mistakes
If you hear sounds or words that you know are wrong, add instructions to the prompt to correct them. (Side note: The example below counteracts a common bias towards Eastern European-sounding accents when the AI struggles.)
Example (added to existing prompt): Do not trill your r's or speak in any way that sounds more Eastern European than French.
Copies, Cutdowns, and Translations
Voice direction is saved and applies to copies of the video and to cut-down Variations. The prompt does not persist for Translations (Variations) because the language itself changes. If desired, write and save a new voice direction prompt in the editor for those ad spots.