What This Is
This is a voiceover script generator for property tour videos. It’s designed specifically for real estate agents who need professional, timed narration that matches their video footage. The prompt works step by step:
- It asks you for your final-cut video footage and a few simple details (property description, tone, call to action, etc.).
- It analyzes the video second by second to determine what spaces and features appear on screen.
- It builds a walkthrough outline and then generates a narration script timed to the video’s length.
- The script is formatted in XML, a markup language similar to HTML. Here the XML uses SSML (Speech Synthesis Markup Language) tags that add instructions for pacing, emphasis, and pauses, so the voiceover sounds natural, not robotic.
- You’ll take that XML script and paste it directly into ElevenLabs, a text-to-speech platform known for realistic voice cloning. You can generate narration in your own cloned voice or a professional-sounding AI voice.
The end result: a done-for-you voiceover that syncs seamlessly with your property tour, ready to drop into your video editor (CapCut, Descript, Premiere, etc.).
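To make the XML format concrete, here is a minimal Python sketch that assembles a tiny script from the tags the prompt below lists (`<speak>`, `<p>`, `<s>`, `<break>`, `<emphasis>`). The narration text is invented for illustration; your real script comes from ChatGPT, not from this code.

```python
import xml.etree.ElementTree as ET

# Hypothetical narration text, made up purely to show the structure.
speak = ET.Element("speak")           # root element of the script
p = ET.SubElement(speak, "p")         # one paragraph of narration

s1 = ET.SubElement(p, "s")            # first sentence
s1.text = "Welcome home to a kitchen built around a "
em = ET.SubElement(s1, "emphasis", level="strong")
em.text = "large island"              # the phrase to punch
em.tail = "."

ET.SubElement(p, "break", time="400ms")  # short pause for a scene change

s2 = ET.SubElement(p, "s")            # second sentence
s2.text = "Upstairs, the primary bedroom is filled with natural light."

script = ET.tostring(speak, encoding="unicode")
print(script)
```

The printed result is one line of XML: plain narration wrapped in tags that mark sentences, a pause, and an emphasized phrase.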
How To Use It
1. Copy the Prompt
   - Copy the entire prompt exactly as provided. Do not change or edit anything.
2. Paste into ChatGPT
   - Start a new chat and paste the prompt in.
   - Hit enter to run it.
3. Answer the Questions
   - The prompt will ask you questions one at a time:
     - Upload your exact video footage (final cut).
     - Provide a short property description.
     - Choose where the video will be posted (Reels, TikTok, YouTube, etc.).
     - Select a vibe/tone (professional, warm, cinematic, etc.).
     - Decide if you want a call to action.
     - Specify anything to include or avoid (e.g., number of bedrooms, avoid Fair Housing issues).
4. Review the Walkthrough Outline
   - ChatGPT will analyze the video second by second and create a timestamped outline of what’s happening in it.
   - This ensures the narration matches the flow of the footage.
   - Confirm you’re ready to move forward.
5. Get Your Script in XML
   - ChatGPT will generate a voiceover script in XML, inside a preformatted text block.
   - The XML includes instructions for pauses, emphasis, and tone, so the speech sounds natural.
   - Copy the entire XML script as is.
6. Paste into ElevenLabs
   - Go to Elevenlabs.io.
   - Use the Text-to-Speech feature.
   - Paste the XML script directly into the text box.
   - Choose your cloned voice or another voice option.
   - Generate and download the audio file.
7. Add to Your Video
   - Import the downloaded voiceover into your video editor (CapCut, Descript, Premiere, etc.).
   - Align it with your footage. It should already be close to the video’s timing, so expect only small adjustments.
   - Export your finished video with a professional narration track.
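Before pasting a script into ElevenLabs, a quick local check can catch problems early. The sketch below is an optional helper, not part of the prompt. It confirms the XML is well-formed, flags any tag outside the list the prompt allows, and gives a rough length estimate. The name `check_script` is hypothetical, and the 150 words-per-minute pace is an assumption; measure your chosen voice and adjust. The estimate ignores `<prosody rate>` changes.

```python
import re
import xml.etree.ElementTree as ET

# Tags the prompt allows in the script.
ALLOWED_TAGS = {"speak", "p", "s", "break", "emphasis", "prosody"}

# Assumption: an average narration pace; real voices vary.
WORDS_PER_MINUTE = 150


def check_script(xml_text: str) -> dict:
    """Parse the script, list unexpected tags, and estimate spoken length."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed

    unexpected = sorted({el.tag for el in root.iter()} - ALLOWED_TAGS)

    # Count narration words (text between tags only).
    words = len(" ".join(root.itertext()).split())

    # Sum explicit pauses written as time="###ms".
    pause_ms = 0
    for br in root.iter("break"):
        match = re.fullmatch(r"(\d+)ms", br.get("time", ""))
        if match:
            pause_ms += int(match.group(1))

    seconds = words / WORDS_PER_MINUTE * 60 + pause_ms / 1000
    return {
        "unexpected_tags": unexpected,
        "words": words,
        "estimated_seconds": round(seconds, 1),
    }


sample = (
    '<speak><p><s>Step inside to a bright, open kitchen.</s>'
    '<break time="500ms"/>'
    '<s>Upstairs, the primary bedroom waits.</s></p></speak>'
)
print(check_script(sample))
# {'unexpected_tags': [], 'words': 12, 'estimated_seconds': 5.3}
```

If the estimate is far from your video's runtime, ask ChatGPT for a shorter or longer version. A `ParseError` often means curly quotes (`”`) slipped into an attribute where straight quotes (`"`) are required.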
Prompt
# ElevenLabs Ready XML Prompt
**Role:** You are tasked with writing expressive narration in **ElevenLabs Ready XML**.
**Rule:** Do **not** produce XML until all questions are answered **and** the exact video cut is attached.
**Output:** All outlines and scripts must be returned in **preformatted code blocks (``` … ```).** XML must contain *only* narration and supported SSML tags.
---
## Intake (ask one at a time)
1. Please attach the **SUBJECT VIDEO FOOTAGE** (the exact cut we’re voicing to).
2. Provide a short **PROPERTY DESCRIPTION** (style, key features, neighborhood, etc.).
3. Where will this video be **posted**? (Instagram Reel, TikTok, YouTube Long-Form, Widescreen Cinematic, Social Media Fun, etc.)
4. What **VIBE / TONE** should the narration carry? (Professional, Warm, Excited, Cinematic, Energetic.)
5. Do you want to include a **CALL TO ACTION** at the end? If yes, what should it say?
6. Are there any specific things you **DO or DON’T want mentioned**? (e.g., must-say facts like beds/baths/square footage, brand-voice preferences, or restrictions such as avoiding Fair Housing issues or certain phrases.)
---
## After Inputs + Footage Are Received
A. **Target Length:** Infer automatically from the video runtime. The primary script must match this length.
B. **1-Second Analysis (optical recognition required):**
   - Analyze the uploaded video at one-second intervals.
   - Identify what appears in the frame: the room, space, feature, or transition.
   - Group consecutive seconds that show the same space into clusters.
   - Use **logic + context** to ensure accuracy. Example: if footage shows moving upstairs, then subsequent segments must describe spaces consistent with being upstairs.
   - Avoid low-level descriptors (colors, brightness, textures). Focus on semantic property details: rooms, finishes, features, flow.
   - Incorporate user-provided facts (e.g., user says “marble” not “granite”). Defer to the user if visual inference conflicts.
C. **Walkthrough Outline (semantic + contextual):**
   - Return a timestamped outline like a table of contents for the tour.
   - Each segment must describe what the viewer is seeing in property-tour terms.
   - Examples:
```
WALKTHROUGH OUTLINE
00:00–00:04 - Exterior approach: curb appeal, two-car garage, modern stucco
00:05–00:09 - Entry/Foyer: glass door, sidelights, staircase in view
00:10–00:16 - Kitchen: large island, stainless appliances, marble counters
00:17–00:22 - Living Room: fireplace, built-ins, sliding doors to patio
00:23–00:28 - Staircase Up: ascending to second floor
00:29–00:34 - Primary Bedroom: accent wall, large windows
```
   - The length of descriptions should be flexible. They should be **detailed enough** to support natural narration that will fit the video timing.
   - Ask the user if they want you to proceed to the narration script.
---
## Once User Confirms to Proceed
1. Produce **ONE complete XML script** that matches the video length.
2. Return the XML inside a **single preformatted code block (``` … ```)** so it can be copy/pasted directly into ElevenLabs.
3. The XML must contain **only narration and supported SSML tags**: no timestamps, no outline notes, no comments.
4. Use **only** ElevenLabs-supported tags:
   - `<speak>` (root)
   - `<p>` and `<s>` (structure)
   - `<break time="###ms"/>` (pauses)
   - `<emphasis level="moderate|strong">` (key highlights)
   - `<prosody rate="slow|medium|fast" pitch="+/-N%" volume="+/-N dB">` (tone, pacing, inflection)
5. **Narration must be expressive.**
   - Punch key property highlights with `<emphasis>`.
   - Use `<prosody>` to vary pacing and tone: faster for energy, slower for luxury details.
   - Insert `<break>` for scene changes or dramatic beats.
   - Match the requested vibe/tone (cinematic, energetic, professional, etc.).
   - Ensure pacing fits runtime: estimate the word count that fits the video length at the chosen speaking pace, including pause time, and write to that budget.
6. **Voiceover Script Rules:**
   - Narration must follow the **Walkthrough Outline** exactly.
   - Describe the property tour footage (the spaces and features the viewer is seeing) in logical order.
   - Use context: if the outline says “stairs up,” narration should naturally lead to a bedroom, not back to the kitchen.
   - Language must be conversational, vivid, and aligned with the user’s inclusions/exclusions.
---
## After Outputting the Primary Script
- Ask the user if they would like alternate versions (**Shorter ~-10%**, **Longer ~+10%**).
- Only generate alternates if the user confirms.
- If generated, each alternate must also be in its **own preformatted block** and must be **pure XML narration only**.