Creating precise, high-quality images often involves endless trial and error. You need a model that actually understands what you're asking for.
Built on the Gemini 3 family of models, Nano Banana 2 applies deep reasoning capabilities to fully understand your prompt before generating an image. This guide shares the exact prompting strategies and frameworks that deliver the best results.
✨ Try Nano Banana 2 for free — start creating at yesnanobanana2.com
Model Overview
Nano Banana models are advanced image generation and editing models that use real-world knowledge and deep reasoning capabilities to deliver precise, rich visual results. Nano Banana 2 shines in three key areas:
- More accurate visuals: Nano Banana 2 is powered by real-time information and images from web search. This means better educational tools, localized marketing, travel apps, and more.
- Fast, Pro-level features: Premium features are now unlocked — from text rendering and translations, to upscaling to 2K/4K. Creative teams can build cohesive narratives, storyboards, and product mockups.
- Precision control: Generate or edit images to fit any project requirement, with native support for 16:9, 9:16, 2:1, and more. Expect vibrant lighting and richer textures, whether you're generating posters, marketing mockups, or ads.
Technical Specs Breakdown
Before diving into prompting, here is a breakdown of what the models can handle:
Context Windows
- Nano Banana 2 (Gemini 3.1 Flash Image): max 131,072 input tokens
- Nano Banana Pro (Gemini 3 Pro Image): max 65,536 input tokens
- Both models support a maximum of 32,768 output tokens
Resolutions
Both models support built-in generation at 1K, 2K, and 4K. Nano Banana 2 adds the smaller 512px (0.5K) resolution option.
Aspect Ratios
Both models support: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9.
Nano Banana 2 also adds: 1:4, 4:1, 1:8, and 8:1 aspect ratios.
Image Inputs
You can mix up to 14 reference object images in a single prompt. Supported formats include PNG, JPEG, WebP, HEIC, and HEIF.
Trust & Safety
All generated images include C2PA Content Credentials and a SynthID watermark.
Best Practices for Effective Prompting
Here are the core guidelines for getting the visuals you actually asked for:
- Be specific — Provide concrete details on subject, lighting, and composition.
- Use positive framing — Describe what you want, not what you don't want (e.g. "empty street" instead of "no cars").
- Control the camera — Use photographic and cinematic terms like "low angle" and "aerial view".
- Iterate — Refine images with follow-up prompts in a conversational manner.
The key is to start a prompt with a strong verb that tells the model the primary operation you want to perform.
Five Prompting Frameworks
1. Image Generation
Your prompt structure depends on whether you are using reference images or relying solely on text.
Text-to-image generation (without references)
When starting with a blank canvas, you are the director. A simple list of keywords won't cut it — you need to describe the scene narratively.
Formula: [Subject] + [Action] + [Location/context] + [Composition] + [Style]
Example prompt:
"[Subject] A striking fashion model wearing a tailored brown dress, sleek boots, and holding a structured handbag. [Action] Posing with a confident, statuesque stance, slightly turned. [Location/context] A seamless, deep cherry red studio backdrop. [Composition] Medium-full shot, center-framed. [Style] Fashion magazine style editorial, shot on medium-format analog film, pronounced grain, high saturation, cinematic lighting effect."
Multimodal generation (with references)
Nano Banana 2 allows you to combine multiple reference images to guide the final output. This is perfect for maintaining character consistency or merging a specific product into a new environment.
Formula: [Reference images] + [Relationship instruction] + [New scenario]
Example prompt:
"Using the attached napkin sketch as the structure and the attached fabric sample as the texture [References], transform this into a high-fidelity 3D armchair render [Relationship]. Place it in a sun-drenched, minimalist living room [New Scenario]."
2. Image Editing
Editing requires a different mindset than generating. You already have a base image; your prompt needs to focus on what is changing and what is staying the same.
Conversational editing (without new references)
When you generate an image and want to tweak it conversationally:
- Semantic masking (inpainting): You can define a "mask" through text to edit a specific part of an image while leaving the rest untouched.
- Prompting tip: Be explicit about what to keep exactly the same.
Example prompt: "Remove the man from the photo"
Composition and style transfer (with new references)
Bring new images into the prompt to alter an existing one:
- Adding elements: Upload a base image and an object image, and tell the model to combine them.
- Style transfer: Upload a photo and ask the model to recreate its exact content in a different artistic style, such as transforming a photo of a modern city street into a Van Gogh-style painting.
3. Real-Time Information from Web Search
Nano Banana 2 can actively search the web to generate images based on real-time information.
Instead of describing a fictional scene, you instruct the model to retrieve real-world data and then specify how to visualize it.
Formula: [Source/Search request] + [Analytical task] + [Visual translation]
Example prompt:
"[Search for current weather and date in San Francisco] + [Use this data to modify the scene — if raining, make it look grey and rainy] + [Visualize this in a miniature city-in-a-cup concept embedded within a realistic, modern smartphone UI]."
This feature helps teams create more accurate, contextually relevant visuals.
4. Text Rendering & Localization
Nano Banana 2 and Nano Banana Pro excel at rendering sharp, legible text for impactful posters, diagrams, and product mockups. They also support state-of-the-art multilingual text generation in over 10 languages.
To get the best typographic results, follow these rules:
- Use quotes: Enclose your desired words in quotes (e.g., "Happy Birthday" or "URBAN EXPLORER").
- Choose a font: Describe the typography style or name the font. Prompt for a "bold, white, sans-serif font" or "Century Gothic 12px font".
- Translate and localize: Write your prompt in one language and specify a target language for the text output.
- Text-first hack: When generating text for an image, first converse with the model to generate the text concepts, and then ask for an image with that text.
Example prompt:
"A high-end, glossy commercial beauty shot of a sleek, minimalist nude-colored face moisturizer jar resting on a warm studio background. The lighting is soft and radiant. Next to the product, render three lines of text with the following exact styling: For the top line, the word 'GLOW' in a flowing, elegant Brush Script font. For the middle line, the text '10% OFF' in a heavy, blocky Impact font. For the bottom line, the text 'Your First Order' in a thin, minimalist Century Gothic font. Then translate the text into Korean and Arabic."
5. Prompting Like a Creative Director
To elevate your results from good to breathtaking, stop typing keywords and start directing the scene. Here is how:
Design your lighting
Tell the model exactly how the scene is illuminated.
- Studio setups: Ask for a "three-point softbox setup" to evenly light a product.
- Dramatic effects: Prompt for "Chiaroscuro lighting with harsh, high contrast" or "Golden hour backlighting creating long shadows".
Choose your camera, lens, and focus
Use specific hardware and photographic terminology to control the depth, distortion, and perspective of your shot.
- Hardware: Dictate the exact camera type to change the visual DNA of the image. Ask for a shot on a GoPro for an immersive, distorted action feel, a Fujifilm camera for authentic color science, or a cheap disposable camera for a raw, nostalgic flash aesthetic.
- Lens: Force the perspective by requesting a "low-angle shot with a shallow depth of field (f/1.8)". For vast scale, ask for a "wide-angle lens". For intricate details, specify a "macro lens".
Define the color grading and film stock
The texture and color of the final image set the emotional tone.
- For a nostalgic or gritty vibe: "Render as if on 1980s color film, slightly grainy"
- For a modern, moody aesthetic: "Cinematic color grading with muted teal tones"
Emphasize materiality and texture
When generating logos, products, or characters, define their physical makeup. Don't just ask for a suit jacket — ask for "navy blue tweed". Instead of "armor," describe "ornate elven plate armor, etched with silver leaf patterns." For mockups, specify the surface, like a "minimalist ceramic coffee mug."
Go Further: Multi-Model Workflows
Nano Banana 2 is designed to work seamlessly with other generative creation models:
- Nano Banana + Gemini: Gemini 3 can help you create prompts and provide creative direction.
- Nano Banana + Veo: Create keyframes with Nano Banana to direct an animation, then use Veo to generate the video between them.
- Nano Banana + Veo + Lyria: Generate your project's visuals, then add a custom AI soundtrack with Lyria.
Start Creating with Nano Banana 2
Ready to put these five prompting frameworks into practice? Nano Banana 2 is available now — generate professional-quality images in seconds with the power of Gemini 3.1 Flash Image.

