How to Create Long-Form AI Talking Videos with Consistent Characters

Learn to make long, consistent AI talking videos from a single image, using mostly free tools. This guide covers a step-by-step process with Gemini’s Nano Banana, ChatGPT, and Google Flow.

This article is a step-by-step guide to creating a long-form AI talking video with one consistent character, starting from a single image. The process, developed through extensive trial and error, combines free and paid AI tools: Google’s Nano Banana and Flow, and ChatGPT. You’ll learn how to generate a starting image, craft a detailed JSON video prompt, and stitch multiple scenes together into a seamless AI video blog that is not bound by the 8-second limit of a single clip.

Prerequisites & Materials

A single, clear image of a person: This will be used as the starting point for your video.
Google AI Pro account: Required to access Google Flow.¹ As of the video’s date, this account costs ₹2,000 per month, but a one-year free offer is available for university students in India until September 15th. Alternative methods to get a free account may exist on YouTube.
Access to Google’s Nano Banana tool: A free image editor/creator.²
Access to ChatGPT (free version): Used to generate JSON prompts.
Google Flow: The primary tool for video generation. Access is tied to the Google AI Pro account.³
Time: The process involves multiple steps, including generating images and prompts and then creating the video.⁴

Before You Start: Key Concepts

Nano Banana: A “secret tool” from Google that acts as an image editor or creator.⁵ It is the nickname for Google’s Gemini 2.5 Flash Image model and is used to create the initial image for your video.⁶
JSON Prompts: Prompts written in JSON, a structured data format made of named fields and values.⁷ According to the speaker, video models respond to JSON prompts more effectively than to free-form text, much as a person responds better to questions asked in their native language, and the result is more natural motion.
Google Flow: The tool used to convert your initial image and JSON prompts into a video. It allows you to create long, continuous videos by chaining together multiple scenes.
Veo 3 Fast: A video generation model within Google Flow that the speaker recommends using.⁸ It is suitable for creating high-quality, continuous video clips.
Consistent Frames: The key to creating a long video that looks like a single, continuous recording is to save the last frame of each generated scene and use it as the starting image for the next scene. This keeps the character, lighting, and camera angle consistent from scene to scene.⁹
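
To make the idea of a JSON prompt concrete, here is a minimal Python sketch that builds one scene prompt and serializes it. The field names (`scene`, `camera_style`, `dialogue`, `duration_seconds`) and the sample dialogue are illustrative assumptions. They are not the schema produced by the speaker's master prompt, which this article does not reproduce.

```python
import json

# Hypothetical field names chosen for illustration only; the JSON that the
# speaker's master prompt produces may be structured differently.
def build_scene_prompt(scene, camera_style, dialogue, duration_seconds=8):
    """Return one scene description as a JSON-formatted string."""
    prompt = {
        "scene": scene,
        "camera_style": camera_style,
        "dialogue": dialogue,
        "duration_seconds": duration_seconds,
    }
    return json.dumps(prompt, indent=2, ensure_ascii=False)

example_prompt = build_scene_prompt(
    scene="Indian crowded market",
    camera_style="vlog",
    dialogue="Hi everyone, welcome to the market!",  # made-up sample line
)
print(example_prompt)
```

The point of the structure is that each instruction sits in its own named field, so the scene, the camera style, and the spoken line cannot blur into one another the way they can in a single free-form sentence.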

Step-by-Step Guide

Step 1: Create Your Video’s Initial Frame

The initial image is crucial as it determines the properties of your entire video, such as the camera perspective and character’s attire.

Action: Go to Google and search for “Google Nano Banana.” Click the top link, which is likely for Gemini 2.5 Flash Image.
Settings: Ensure the Nano Banana model is selected in the top-right corner.
Action: Click the plus icon in the search bar to upload a basic image of a person you want to use.
Action: Enter a prompt to set the scene. For example, to create a crowded Indian market scene, the speaker used: “girl posing in Indian crowded market.” You can also specify clothing, such as “yellow kurta.”
Expected Result: Nano Banana will generate an image of your character in the described setting.¹⁰
Action: Download this image.
Action: To get a “vlog-style” selfie perspective, re-upload the newly generated image and add the prompt “vlog selfie perspective.”
Expected Result: A new image showing the person holding a phone as if recording a vlog. This is your final, initial video frame. Download it and optionally crop out any unwanted parts, like the phone itself.

Step 2: Generate JSON Prompts with ChatGPT

This is where you create the script and motion instructions for your video.

Action: Open ChatGPT (the free version).
Action: Copy the master prompt provided by the speaker (found in the video description; its structure is outlined below). This master prompt generates the JSON prompts for a long-format AI video.
- Master Prompt Structure:
  - Generate a series of JSON prompts to create a long-form AI video from an image.
  - Scene description: [Market] (or another setting like Gym)
  - Dialog: [Your video script goes here]
  - Camera style: [Vlog] (or Video if you want a stationary camera angle)
  - Number of scenes: [Number of 8-second scenes you need]
Action: Edit the bracketed parts of the prompt (shown in blue in the video) to match your video’s requirements.
- Scene Description: Change Market to whatever scene you want, e.g., Gym.
- Dialog: Paste the exact dialogue you want your character to speak.¹¹
- Camera Style: Change Vlog to Video if the character is not holding a phone.
- Number of Scenes: Specify how many 8-second scenes you need. For example, for a 40-second video, enter 5.
Action: Paste the modified master prompt into ChatGPT and click send. Then, upload the image you created in Step 1.
Expected Result: ChatGPT will generate the JSON prompt for your first scene and then ask if you want to continue.
Action: Type “Yes” or “Continue” in the chat to generate the JSON prompts for subsequent scenes. Keep doing this until you have prompts for all your scenes.
Action: Copy the code for the first scene’s JSON prompt.
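
The scene-count arithmetic and the four editable fields from this step can be sketched in Python as follows. This is only an illustration built from the outline given above. The full wording of the speaker's master prompt is not included in this article, so `fill_master_prompt` and its output are assumptions, and the sample dialogue is made up.

```python
import math

SCENE_SECONDS = 8  # length of one generated scene, per the guide

def scenes_needed(video_seconds, scene_seconds=SCENE_SECONDS):
    """Round up so a final partial scene is still counted."""
    return math.ceil(video_seconds / scene_seconds)

def fill_master_prompt(scene, dialog, camera_style, video_seconds):
    """Fill the four editable fields of the master prompt outline."""
    return "\n".join([
        "Generate a series of JSON prompts to create a long-form AI video from an image.",
        f"Scene description: {scene}",
        f"Dialog: {dialog}",
        f"Camera style: {camera_style}",
        f"Number of scenes: {scenes_needed(video_seconds)}",
    ])

print(scenes_needed(40))  # 5, matching the 40-second example in this step
print(fill_master_prompt("Gym", "Welcome back to my channel.", "Vlog", 40))
```

Rounding up matters when the target length is not a multiple of 8: a 45-second video needs 6 scenes, not 5.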

Step 3: Generate the Video Using Google Flow

This is the final step where you bring all your components together.

Action: Log into your Google AI Pro account and access the Google Flow tool.
Action: Click “Create with Flow” and then “New Project.”
Settings:
- Change the mode from “Text to Video” to “Frames to Video.”
- Ensure Output is set to 1 and Model is set to Veo 3 Fast.
Action: Attach the initial image you generated in Step 1.
Action: Paste the first JSON prompt you copied from ChatGPT into the prompt area. You can also edit the dialogue here if needed.
Action: Click the generate button to create your first video scene.¹²
Action: Once the first scene is generated, click the Add to Scene button at the top of the timeline. This will switch to the scene builder interface.
Action: Move the timeline cursor to the very end of the first scene. A plus button will appear on the timeline. Hover over it and select “Save frame as asset.” This saves the last frame of the video scene as an image asset.
Action: Attach this new image asset to your timeline.
Action: Go back to ChatGPT, copy the JSON prompt for the second scene, and paste it into the prompt area for the new scene.
Action: Click the generate button again. The second scene will now be created, starting from the last frame of the previous one.
Action: Repeat the process: move the cursor to the end, save the frame as an asset, attach it, and paste the next JSON prompt.
Action: Continue this process for all your scenes.¹³
Expected Result: A continuous, long-form video composed of all your scenes.
Action: To download the final video as a single file, click the download button on the right side of the screen.
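
The chaining logic of this step can be summarized in a short Python sketch. In the guide these steps are performed by hand in the Flow interface, so the `generate_scene` callable and its signature are hypothetical stand-ins, and `fake_generator` exists only to show which frame each scene starts from.

```python
from typing import Callable, List, Tuple

# A scene generator takes (start_frame, json_prompt) and returns
# (video_clip, last_frame). This signature is hypothetical; it stands in
# for the manual generate / save-frame steps described above.
SceneGenerator = Callable[[str, str], Tuple[str, str]]

def chain_scenes(initial_frame: str, prompts: List[str],
                 generate_scene: SceneGenerator) -> List[str]:
    """Generate scenes in order, feeding each last frame into the next."""
    clips = []
    start_frame = initial_frame
    for prompt in prompts:
        clip, last_frame = generate_scene(start_frame, prompt)
        clips.append(clip)
        start_frame = last_frame  # "Save frame as asset", then reuse it
    return clips

def fake_generator(start_frame: str, prompt: str) -> Tuple[str, str]:
    """Stand-in that only records which frame each scene started from."""
    clip = f"clip[{prompt}] from {start_frame}"
    return clip, f"last_frame_of({prompt})"

clips = chain_scenes("initial.png", ["scene1", "scene2", "scene3"], fake_generator)
for clip in clips:
    print(clip)
```

Only the first scene starts from the image made in Step 1. Every later scene starts from the final frame of the scene before it, which is what keeps the character and setting continuous.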

Troubleshooting

Inconsistent character or lighting: This usually happens if you don’t save the last frame of one scene and use it as the starting image for the next. Always ensure you are saving the last frame as an asset and attaching it to begin the next scene to maintain consistency.
Video stops at 8 seconds: The default limit for a single scene is 8 seconds. This process is designed to bypass that limit by chaining scenes together. If your video is still stopping, you may not be correctly using the “Frames to Video” and “Save frame as asset” features in Google Flow.
Images are not what you expected: If the generated images from Nano Banana aren’t accurate, try being more descriptive in your prompts. For example, instead of just “market,” add details like “Indian crowded market.”
ChatGPT isn’t generating JSON prompts correctly: Make sure you are using the exact master prompt provided. Avoid changing the fixed text (shown in black in the video), as it contains the core JSON structure. Only modify the bracketed parts (shown in blue).
Cannot access Google Flow: This tool requires a Google AI Pro account. If you don’t have one, you will need to get access either through a student offer or by using one of the alternative methods mentioned on YouTube.
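
If you suspect that a prompt copied from ChatGPT is malformed, a quick syntax check can rule that out before you spend credits on it. The sketch below uses Python's standard `json` module; the sample prompt strings and field names are made up for illustration. It checks only that the text is valid JSON, not that the fields are ones the video model will act on.

```python
import json

def check_json_prompt(text):
    """Return (True, parsed) if text is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"

# A well-formed prompt parses into a dictionary.
ok, result = check_json_prompt('{"scene": "market", "camera_style": "vlog"}')
print(ok, result)

# A trailing comma is a common copy-and-edit mistake and is not valid JSON.
bad_ok, bad_result = check_json_prompt('{"scene": "market",}')
print(bad_ok, bad_result)
```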

Pro Tips & Best Practices

Plan your video: Before you start, have a clear idea of the environment, camera angle, and dialogue. This will help you create a more effective initial prompt.
Use the master prompt: The provided master prompt for ChatGPT is highly effective for generating JSON prompts that the AI understands well, leading to better results.
Experiment with prompts: While the master prompt is a great starting point, you can experiment with different scene descriptions and camera angles (e.g., video vs. vlog) to get varied results.
Mind your credits: The video generation process uses up credits. Keep an eye on your usage to ensure you don’t run out.
Check for special offers: The speaker mentioned a limited-time free account offer for students.¹⁴ Always check for current offers to save money on the AI Pro account.
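
For budgeting credits, a rough calculation like the following may help. The source does not state how many credits one scene costs, so `credits_per_scene` and the balance below are placeholder numbers, not actual Flow pricing. Check the figure shown in your own account before relying on the result.

```python
def credits_required(num_scenes, credits_per_scene):
    """Total credits needed to generate every scene once."""
    return num_scenes * credits_per_scene

def scenes_affordable(credit_balance, credits_per_scene):
    """How many whole scenes the current balance can pay for."""
    if credits_per_scene <= 0:
        raise ValueError("credits_per_scene must be positive")
    return credit_balance // credits_per_scene

# Placeholder values for illustration only.
PLACEHOLDER_CREDITS_PER_SCENE = 10
PLACEHOLDER_BALANCE = 100

print(credits_required(5, PLACEHOLDER_CREDITS_PER_SCENE))
print(scenes_affordable(PLACEHOLDER_BALANCE, PLACEHOLDER_CREDITS_PER_SCENE))
```

Remember that regenerating a scene you are not happy with spends credits again, so leave some headroom beyond the bare scene count.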

Checklist

Step 1:
- [ ] Open Google Nano Banana.
- [ ] Upload your person’s image.
- [ ] Add a descriptive scene prompt and generate the image.
- [ ] Download the image.
- [ ] Re-upload the image and add the vlog selfie perspective prompt.
- [ ] Download and crop the final initial frame.
Step 2:
- [ ] Open ChatGPT.
- [ ] Copy and paste the master prompt.
- [ ] Edit the bracketed (blue) sections (Scene Description, Dialog, Camera Style, Number of Scenes).
- [ ] Send the prompt and upload your initial frame.
- [ ] Generate JSON prompts for all scenes by typing “yes” to continue.
- [ ] Copy the JSON code for Scene 1.
Step 3:
- [ ] Log in to Google Flow with your AI Pro account.
- [ ] Create a new project.
- [ ] Set the mode to Frames to Video and the model to Veo 3 Fast.
- [ ] Attach your initial frame.
- [ ] Paste the Scene 1 JSON prompt.
- [ ] Generate the first scene.
- [ ] Move the timeline cursor to the end of the scene.
- [ ] Select Save frame as asset.
- [ ] Attach the new asset to the timeline.
- [ ] Copy the next JSON prompt from ChatGPT and paste it.
- [ ] Repeat until all scenes are generated.
- [ ] Download the final single video file.

FAQs

What is the video limit for this process? There is no video limit. You can create a video of any length, as long as you have enough credits to generate each 8-second scene.
Is this process free? The process uses a mix of free and paid tools. While Nano Banana and ChatGPT have free versions, Google Flow requires a paid Google AI Pro account.
How can I get a Google AI Pro account for free? The speaker mentioned that as of the video’s date, a one-year free offer is available for university students in India until September 15th. There may also be other methods available on YouTube.
Why do I need to use JSON prompts? According to the speaker, AI models handle JSON’s structured format very well, leading to better, more natural, and more consistent results than standard text prompts.¹⁵
What if my images are not consistent across scenes? Consistency is achieved by saving the last frame of a scene as an asset and using it as the starting point for the next one. This links the scenes together and maintains the character’s look and surroundings.
Can I change the dialogue for each scene? Yes, you can edit the dialogue in the JSON prompt before you generate each scene.¹⁶
What is the purpose of Nano Banana? Nano Banana is used to create the initial, high-quality image that will serve as the first frame of your video. It can place your character in a specific environment and add props like a selfie stick.
Can I use a different camera angle? Yes. When generating the JSON prompt, you can change the vlog camera style to video for a more stationary, direct camera view.
Why does my video look choppy? Choppiness can occur if you’re not correctly chaining the scenes. The key is to save and re-use the last frame of the previous scene to ensure a smooth transition.¹⁷
Is this process reliable? The speaker claims this process was developed through extensive trial and error and works consistently for creating long, continuous videos.

Glossary

Nano Banana: A Google image editing and creation tool.¹⁸
JSON (JavaScript Object Notation): A structured data format. Here it is used to write video prompts, which the speaker says yields better-quality motion.
Google Flow: The primary AI tool used to generate the long-form video by chaining scenes.
Veo 3 Fast: A specific video generation model within Google Flow.¹⁹
Pro Account: A paid subscription that grants access to advanced Google AI tools.²⁰

Resources & Credits

Channel/Creator: Manav (inferred: the speaker, “Geeth,” is described as a creation of “Manav”).
Video Title: [[VIDEO TITLE NOT PROVIDED IN TRANSCRIPT]]
Master Prompt: The powerful master prompt for ChatGPT is mentioned as being available in the video description.

Conclusion & Next Steps

By following this detailed guide, you can create a continuous, long-form AI talking video of any length using just a single image and a series of simple steps. The key is to combine the image editing of Nano Banana, the structured JSON prompts from ChatGPT, and the scene-chaining feature of Google Flow.

To go further, consider exploring the more advanced features of Nano Banana; the speaker mentioned that a detailed video on this topic is planned. You can also experiment with different camera angles, scenes, and dialogues to create more diverse and engaging AI-generated content.

tutorao.com