AI Image Model Capabilities — OpenRouter
Last updated: 2026-03-17
Source: OpenRouter model pages and documentation
Purpose: Guide for selecting the right model per creative task, particularly for Clarity Diamonds ad production
Key Finding: Text & Logo in Images
All 8 models now support text rendering in images, with varying quality. All 8 also accept image inputs (logos, reference photos, brand assets), so you can pass the Clarity Diamonds logo as a reference image and have the model incorporate it into the generated ad creative.
Model Comparison Table
| Model | Text Rendering | Image Input (logo/refs) | Best For | Speed | Price Tier |
|---|---|---|---|---|---|
| `openai/gpt-5-image` | Excellent | Yes | Complex reasoning, detailed editing, text-heavy ads | Medium | High |
| `openai/gpt-5-image-mini` | Excellent | Yes | Same as above, faster and cheaper | Fast | Low-Mid |
| `google/gemini-3-pro-image-preview` | Industry-leading — including long passages and multilingual | Yes | Professional creative, multi-subject, brand identity preservation | Medium | High |
| `google/gemini-3.1-flash-image-preview` | Good | Yes | Fast iteration, professional output | Fastest | Low |
| `google/gemini-2.5-flash-image` | Good | Yes | Cost-sensitive bulk generation | Fast | Lowest |
| `bytedance-seed/seedream-4.5` | Improved (esp. small text) | Yes | Portrait/lifestyle, colour/lighting preservation | Medium | Very Low (flat $0.04/image) |
| `sourceful/riverflow-v2-fast` | Good + custom font inputs | Yes (URLs preferred) | Production speed, custom typography | Fastest | Mid |
| `black-forest-labs/flux.2-flex` | Excellent — complex typography | Yes | Typography-heavy creative, fine detail | Medium | Mid-High |
Detailed Model Profiles
GPT-5 Image
- Text: Superior instruction following and text rendering. Handles detailed copy, pricing, CTAs reliably.
- Image input: Yes — accepts logo files and reference images. Can incorporate brand assets naturally.
- Sizes: 10 standard + 4 extended aspect ratios; 1K, 2K, 4K resolution
- Best for: Ads with copy baked in, logo placement, complex compositions
- Pricing: $10/M input, $40/M image output
- Context: 400K tokens
GPT-5 Image Mini
- Text: Same quality as GPT-5 Image at 4× lower input cost and 5× lower image-output cost
- Image input: Yes — same capabilities as full GPT-5 Image
- Sizes: Same as GPT-5 Image
- Best for: Most production work — same quality, better economics
- Pricing: $2.50/M input, $8/M image output
- Context: 400K tokens
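The price gap between the two GPT-5 image models can be checked directly from the listed rates. A minimal sketch, using only the per-million-token prices quoted in the two profiles above; the dictionary and function names are illustrative, not part of any API.

```python
# Listed prices in USD per million tokens, copied from the profiles above.
GPT5_IMAGE = {"input": 10.00, "image_output": 40.00}
GPT5_IMAGE_MINI = {"input": 2.50, "image_output": 8.00}


def price_ratio(full: dict, mini: dict) -> dict:
    """Return how many times cheaper `mini` is than `full`, per price component."""
    return {key: full[key] / mini[key] for key in full}


ratios = price_ratio(GPT5_IMAGE, GPT5_IMAGE_MINI)
print(ratios)  # {'input': 4.0, 'image_output': 5.0}
```

The saving differs by component, so the effective discount for a given job depends on how much of its spend is image output rather than input.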
Gemini 3 Pro Image (Nano Banana Pro)
- Text: Industry-leading — best in class for long text, multilingual, detailed layout
- Image input: Yes — multimodal reasoning, identity preservation for up to 5 subjects. Ideal for passing logo + product shot and asking it to compose a complete ad.
- Sizes: 2K/4K, flexible aspect ratios
- Best for: Final production ads needing precise text and logo integration, consistent brand identity
- Pricing: $2/M input, $12/M output
- Context: 65K tokens
Gemini 3.1 Flash Image (Nano Banana 2)
- Text: Good — handles single headlines and short copy well
- Image input: Yes — accepts images and text
- Sizes: 0.5K to 4K; customisable aspect ratios via `image_config`
- Best for: Fast iteration and testing, professional output at low cost
- Pricing: $0.50/M input, $3/M output, $60/M image output
- Released: February 2026 (newest Gemini image model)
Gemini 2.5 Flash Image (Nano Banana)
- Text: Good — standard text rendering
- Image input: Yes
- Sizes: Customisable aspect ratios
- Best for: High-volume or budget-sensitive generation
- Pricing: $0.30/M input, $2.50/M output — cheapest option
- Context: 32K tokens
Seedream 4.5
- Text: Improved over v4.0, particularly for small text rendering
- Image input: Yes — editing consistency, preserves subject details, lighting, colour tone
- Sizes: Variable
- Best for: Lifestyle/portrait imagery, colour-accurate product editing, preserving brand identity across variations
- Pricing: Flat $0.04 per output image — simplest pricing, great for volume
- Context: 4K tokens
Riverflow V2 Fast
- Text: Good — integrated reasoning for text accuracy. Supports custom font inputs ($0.03 each, max 2 fonts) — you can specify Inter or Montserrat exactly
- Image input: Yes — recommends image URLs rather than base64. Also supports super-resolution references ($0.20 each, max 4) to enhance specific elements.
- Sizes: 1K and 2K — no 4K support
- Best for: Production-speed generation with brand-specific typography
- Pricing: $0.02/image (1K), $0.04/image (2K)
- Limitation: 4.5MB request size limit; no 4K
- Released: February 2026
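The add-on fees change the per-image cost noticeably, so it can help to estimate them before a batch run. A rough sketch built from the prices and limits listed above; it assumes the font and super-resolution fees are simply added to the base image price for each request, which the listing does not state explicitly, and the function name is hypothetical.

```python
# Listed Riverflow V2 Fast prices in USD (from the profile above).
BASE_PRICE = {"1K": 0.02, "2K": 0.04}
FONT_PRICE, MAX_FONTS = 0.03, 2
SUPER_RES_PRICE, MAX_SUPER_RES = 0.20, 4


def riverflow_cost(resolution: str, fonts: int = 0, super_res_refs: int = 0) -> float:
    """Estimate the cost of one image, assuming add-on fees are additive per request."""
    if resolution not in BASE_PRICE:
        raise ValueError(f"Unsupported resolution {resolution!r}; only 1K and 2K are listed")
    if not 0 <= fonts <= MAX_FONTS:
        raise ValueError(f"fonts must be between 0 and {MAX_FONTS}")
    if not 0 <= super_res_refs <= MAX_SUPER_RES:
        raise ValueError(f"super_res_refs must be between 0 and {MAX_SUPER_RES}")
    total = BASE_PRICE[resolution] + fonts * FONT_PRICE + super_res_refs * SUPER_RES_PRICE
    return round(total, 2)


print(riverflow_cost("2K", fonts=2))  # 0.1
```

Under this assumption, two custom fonts more than double the cost of a 2K image, and a single super-resolution reference outweighs the base price several times over.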
FLUX.2 Flex
- Text: Excellent — best for complex typography and fine detail rendering
- Image input: Yes — multi-reference editing in a unified architecture (pass multiple reference images in one request)
- Sizes: Flexible aspect ratios, megapixel-based pricing
- Best for: Typography-driven creatives, ads where the headline IS the visual, multi-reference compositions
- Pricing: $0.06/megapixel (input + output combined)
- Note: Does not use submissions for model training. Retains prompts for 30 days only.
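Because FLUX.2 Flex bills input and output pixels together, every reference image adds to the cost of a generation. A small sketch of that arithmetic using the listed $0.06/megapixel rate; it assumes 1 megapixel means 1,000,000 pixels and that there is no rounding up to whole megapixels, neither of which is confirmed by the listing.

```python
PRICE_PER_MEGAPIXEL = 0.06  # USD, input and output combined (listed rate)


def megapixels(width: int, height: int) -> float:
    """Pixel count in megapixels, assuming 1 MP = 1,000,000 pixels."""
    return width * height / 1_000_000


def flux_flex_cost(output_size: tuple, reference_sizes=()) -> float:
    """Estimate one generation: output pixels plus the pixels of every reference image."""
    total_mp = megapixels(*output_size)
    total_mp += sum(megapixels(w, h) for w, h in reference_sizes)
    return round(total_mp * PRICE_PER_MEGAPIXEL, 4)


# A 2000x1000 output (2 MP) with one 1000x1000 reference image (1 MP).
cost = flux_flex_cost((2000, 1000), [(1000, 1000)])
print(cost)  # 0.18
```

Downscaling a logo or reference photo before sending it would, under these assumptions, reduce the input share of the bill.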
Recommendations for Clarity Diamonds Ad Production
For ads with copy baked in (headlines, pricing, CTA)
Best: `google/gemini-3-pro-image-preview` — industry-leading text, accepts logo as input
Good: `black-forest-labs/flux.2-flex` — excellent typography
Budget: `openai/gpt-5-image-mini` — reliable text at lower cost
For passing the Clarity logo as a reference
All 8 models accept image inputs. The recommended workflow:
- Pass `clarity_logo.png` as an image input in the message
- Describe placement: "include the Clarity Diamonds logo in the bottom-right corner"
- Gemini 3 Pro handles this most reliably (identity preservation up to 5 subjects)
For lifestyle/warmth (Ad 4A, ring on hand)
Best: `bytedance-seed/seedream-4.5` — excellent portrait refinement, colour warmth
For fast iteration and A/B testing
Best: `google/gemini-3.1-flash-image-preview` — fastest, still professional quality
For carousel card production (volume)
Best: `bytedance-seed/seedream-4.5` — flat $0.04/image, good consistency across variations
For custom brand typography (Inter/Montserrat)
Best: `sourceful/riverflow-v2-fast` — only model supporting custom font file inputs
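For scripted pipelines, the top pick for each task above can be captured as a simple lookup. This is only a sketch: the task keys are hypothetical labels invented here, and the model IDs are the "Best" choices from the recommendations.

```python
# "Best" model per task, taken from the recommendations above.
# The task keys are hypothetical labels chosen for this sketch.
RECOMMENDED_MODEL = {
    "copy_baked_in": "google/gemini-3-pro-image-preview",
    "logo_reference": "google/gemini-3-pro-image-preview",
    "lifestyle_warmth": "bytedance-seed/seedream-4.5",
    "fast_iteration": "google/gemini-3.1-flash-image-preview",
    "carousel_volume": "bytedance-seed/seedream-4.5",
    "custom_typography": "sourceful/riverflow-v2-fast",
}


def pick_model(task: str) -> str:
    """Return the recommended model ID for a task label, or raise on an unknown label."""
    try:
        return RECOMMENDED_MODEL[task]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


print(pick_model("carousel_volume"))  # bytedance-seed/seedream-4.5
```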
How to Pass a Logo/Reference Image via OpenRouter API
```
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-pro-image-preview",
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<BASE64_OF_LOGO>"
          }
        },
        {
          "type": "text",
          "text": "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image. [REST OF PROMPT]"
        }
      ]
    }],
    "modalities": ["image"]
  }'
```
To encode the logo: `base64 -i clarity_logo.png | tr -d '\n'`
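The same request body can also be assembled programmatically. A minimal Python sketch that mirrors the curl example above field for field; the helper names are hypothetical, the logo bytes are a placeholder standing in for the contents of `clarity_logo.png`, and nothing is sent over the network.

```python
import base64
import json


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a single-line base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_logo_request(model: str, logo_bytes: bytes, prompt: str) -> dict:
    """Build the chat-completions body: the logo image first, then the text prompt."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(logo_bytes)}},
                {"type": "text", "text": prompt},
            ],
        }],
        "modalities": ["image"],
    }


# Placeholder bytes; in practice, read them from the logo file in binary mode.
body = build_logo_request(
    "google/gemini-3-pro-image-preview",
    b"\x89PNG placeholder",
    "Create a luxury jewellery advertisement. "
    "Use the provided logo in the bottom-right corner of the image.",
)
print(len(json.dumps(body)), "characters of JSON")
```

Measuring the serialised body before sending is a cheap way to catch oversized payloads, which matters for models with a request size limit such as Riverflow V2 Fast (4.5MB).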
Reference: https://openrouter.ai/docs/guides/overview/multimodal/image-generation