
AI Image Model Capabilities — OpenRouter

Last updated: 2026-03-17
Source: OpenRouter model pages and documentation
Purpose: Guide for selecting the right model per creative task, particularly for Clarity Diamonds ad production

Key Finding: Text & Logo in Images

All 8 models now support text rendering in images, at varying levels of quality. Per the comparison table below, all 8 also accept image inputs (logos, reference photos, brand assets), so you can pass the Clarity Diamonds logo as a reference image and ask the model to incorporate it into the generated ad creative.

Model Comparison Table

Model	Text Rendering	Image Input (logo/refs)	Best For	Speed	Price Tier
`openai/gpt-5-image`	Excellent	Yes	Complex reasoning, detailed editing, text-heavy ads	Medium	High
`openai/gpt-5-image-mini`	Excellent	Yes	Same as above, faster and cheaper	Fast	Low-Mid
`google/gemini-3-pro-image-preview`	Industry-leading — including long passages and multilingual	Yes	Professional creative, multi-subject, brand identity preservation	Medium	High
`google/gemini-3.1-flash-image-preview`	Good	Yes	Fast iteration, professional output	Fastest	Low
`google/gemini-2.5-flash-image`	Good	Yes	Cost-sensitive bulk generation	Fast	Lowest
`bytedance-seed/seedream-4.5`	Improved (esp. small text)	Yes	Portrait/lifestyle, colour/lighting preservation	Medium	Very Low (flat $0.04/image)
`sourceful/riverflow-v2-fast`	Good + custom font inputs	Yes (URLs preferred)	Production speed, custom typography	Fastest	Mid
`black-forest-labs/flux.2-flex`	Excellent — complex typography	Yes	Typography-heavy creative, fine detail	Medium	Mid-High

Detailed Model Profiles

GPT-5 Image

Text: Superior instruction following and text rendering. Handles detailed copy, pricing, CTAs reliably.
Image input: Yes — accepts logo files and reference images. Can incorporate brand assets naturally.
Sizes: 10 standard + 4 extended aspect ratios; 1K, 2K, 4K resolution
Best for: Ads with copy baked in, logo placement, complex compositions
Pricing: $10/M input, $40/M image output
Context: 400K tokens

GPT-5 Image Mini

Text: Same quality as GPT-5 Image at 4× lower input cost and 5× lower image-output cost
Image input: Yes — same capabilities as full GPT-5 Image
Sizes: Same as GPT-5 Image
Best for: Most production work — same quality, better economics
Pricing: $2.50/M input, $8/M image output
Context: 400K tokens
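
The cost gap between the two GPT-5 image models can be checked directly from the listed prices. The snippet below is a small arithmetic sketch only; the dictionary names are illustrative, and the figures are the per-million-token prices quoted in the two profiles above.

```python
from fractions import Fraction

# Per-million-token prices in USD, copied from the profiles above.
GPT5_IMAGE = {"input": Fraction("10"), "image_output": Fraction("40")}
GPT5_IMAGE_MINI = {"input": Fraction("2.50"), "image_output": Fraction("8")}


def price_ratios(full, mini):
    """Return how many times more expensive `full` is than `mini`, per price type."""
    return {kind: full[kind] / mini[kind] for kind in full}


ratios = price_ratios(GPT5_IMAGE, GPT5_IMAGE_MINI)
for kind, ratio in ratios.items():
    print(f"{kind}: {float(ratio):g}x")
```

On these numbers the Mini model is 4× cheaper on input tokens and 5× cheaper on image output tokens.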

Gemini 3 Pro Image (Nano Banana Pro)

Text: Industry-leading — best in class for long text, multilingual, detailed layout
Image input: Yes — multimodal reasoning, identity preservation for up to 5 subjects. Ideal for passing logo + product shot and asking it to compose a complete ad.
Sizes: 2K/4K, flexible aspect ratios
Best for: Final production ads needing precise text and logo integration, consistent brand identity
Pricing: $2/M input, $12/M output
Context: 65K tokens

Gemini 3.1 Flash Image (Nano Banana 2)

Text: Good — handles single headlines and short copy well
Image input: Yes — accepts images and text
Sizes: 0.5K to 4K; customisable aspect ratios via image_config
Best for: Fast iteration and testing, professional output at low cost
Pricing: $0.50/M input, $3/M output, $60/M image output
Released: February 2026 (newest Gemini image model)

Gemini 2.5 Flash Image (Nano Banana)

Text: Good — standard text rendering
Image input: Yes
Sizes: Customisable aspect ratios
Best for: High-volume or budget-sensitive generation
Pricing: $0.30/M input, $2.50/M output (lowest token pricing of the models listed here)
Context: 32K tokens

Seedream 4.5

Text: Improved over v4.0, particularly for small text
Image input: Yes — editing consistency, preserves subject details, lighting, colour tone
Sizes: Variable
Best for: Lifestyle/portrait imagery, colour-accurate product editing, preserving brand identity across variations
Pricing: Flat $0.04 per output image — simplest pricing, great for volume
Context: 4K tokens

Riverflow V2 Fast

Text: Good — integrated reasoning for text accuracy. Supports custom font inputs ($0.03 each, max 2 fonts) — you can specify Inter or Montserrat exactly
Image input: Yes — recommends image URLs rather than base64. Also supports super-resolution references ($0.20 each, max 4) to enhance specific elements.
Sizes: 1K and 2K — no 4K support
Best for: Production-speed generation with brand-specific typography
Pricing: $0.02/image (1K), $0.04/image (2K)
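Limitation: 4.5MB request size limit (a base64-encoded image is about 33% larger than the source file, which is a likely reason URLs are preferred); no 4K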
Released: February 2026
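
Because Riverflow's add-ons are priced per item, the cost of a single request can be estimated up front. The sketch below assumes the font and super-resolution charges are simply added to the base image price for each request; that billing model is an assumption, not something confirmed by the source, and `riverflow_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Prices and limits as listed in the Riverflow V2 Fast profile above.
BASE_PRICE = {"1K": Decimal("0.02"), "2K": Decimal("0.04")}
FONT_PRICE = Decimal("0.03")
MAX_FONTS = 2
SUPERRES_PRICE = Decimal("0.20")
MAX_SUPERRES = 4


def riverflow_cost(size, fonts=0, superres_refs=0):
    """Estimate the USD cost of one request, assuming add-ons are purely additive."""
    if size not in BASE_PRICE:
        raise ValueError(f"unsupported size {size!r}; expected one of {sorted(BASE_PRICE)}")
    if not 0 <= fonts <= MAX_FONTS:
        raise ValueError(f"fonts must be between 0 and {MAX_FONTS}")
    if not 0 <= superres_refs <= MAX_SUPERRES:
        raise ValueError(f"superres_refs must be between 0 and {MAX_SUPERRES}")
    return BASE_PRICE[size] + fonts * FONT_PRICE + superres_refs * SUPERRES_PRICE


# A 2K image with two custom fonts (e.g. Inter and Montserrat), no super-resolution references.
print(riverflow_cost("2K", fonts=2))  # 0.10
```

Under this assumption, super-resolution references dominate the cost: four of them add $0.80 to a request whose base price is at most $0.04.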

FLUX.2 Flex

Text: Excellent — best for complex typography and fine detail rendering
Image input: Yes — multi-reference editing in a unified architecture (pass multiple reference images in one request)
Sizes: Flexible aspect ratios, megapixel-based pricing
Best for: Typography-driven creatives, ads where the headline IS the visual, multi-reference compositions
Pricing: $0.06/megapixel (input + output combined)
Note: Does not use submissions for model training. Retains prompts for 30 days only.
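
Megapixel-based pricing means the cost depends on the pixel dimensions of both the reference images and the output. The following is a rough estimator, assuming one megapixel is exactly 1,000,000 pixels and that no rounding is applied; the provider's actual billing may round differently. `flux_flex_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_MEGAPIXEL = Decimal("0.06")  # USD, input + output combined (from the profile above)


def megapixels(width, height):
    """Pixel count expressed in megapixels (assumes 1 MP = 1,000,000 pixels)."""
    return Decimal(width * height) / Decimal(1_000_000)


def flux_flex_cost(output_size, reference_sizes=()):
    """Estimate USD cost for one output image plus any reference images sent as input."""
    total_mp = megapixels(*output_size)
    for width, height in reference_sizes:
        total_mp += megapixels(width, height)
    return total_mp * PRICE_PER_MEGAPIXEL


# One 2000x1000 output composed from a single 1000x1000 reference image: 3 MP in total.
estimate = flux_flex_cost((2000, 1000), reference_sizes=[(1000, 1000)])
print(f"${estimate:.2f}")
```

Since reference images count toward the total, downscaling a large logo or product shot before sending it may reduce the per-request cost.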

Recommendations for Clarity Diamonds Ad Production

For ads with copy baked in (headlines, pricing, CTA)

Best: `google/gemini-3-pro-image-preview` — industry-leading text, accepts logo as input
Good: `black-forest-labs/flux.2-flex` — excellent typography
Budget: `openai/gpt-5-image-mini` — reliable text at lower cost

For passing the Clarity logo as a reference

All 8 models accept image inputs. The recommended workflow:

Pass clarity_logo.png as an image input in the message
Describe placement: "include the Clarity Diamonds logo in the bottom-right corner"
Gemini 3 Pro handles this most reliably (identity preservation up to 5 subjects)

For lifestyle/warmth (Ad 4A, ring on hand)

Best: `bytedance-seed/seedream-4.5` — excellent portrait refinement, colour warmth

For fast iteration and A/B testing

Best: `google/gemini-3.1-flash-image-preview` — fastest, still professional quality

For carousel card production (volume)

Best: `bytedance-seed/seedream-4.5` — flat $0.04/image, good consistency across variations

For custom brand typography (Inter/Montserrat)

Best: `sourceful/riverflow-v2-fast` — only model supporting custom font file inputs
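
The recommendations above can be captured as a simple lookup so that a production script picks a model by task. This is an illustrative sketch: the task keys and the `pick_model` helper are hypothetical names, while the model IDs and their ordering come from the recommendations in this section.

```python
# Task -> model IDs in order of preference (Best, then Good, then Budget where given).
RECOMMENDED_MODELS = {
    "copy_baked_in": [
        "google/gemini-3-pro-image-preview",
        "black-forest-labs/flux.2-flex",
        "openai/gpt-5-image-mini",
    ],
    "logo_reference": ["google/gemini-3-pro-image-preview"],
    "lifestyle": ["bytedance-seed/seedream-4.5"],
    "fast_iteration": ["google/gemini-3.1-flash-image-preview"],
    "carousel_volume": ["bytedance-seed/seedream-4.5"],
    "custom_typography": ["sourceful/riverflow-v2-fast"],
}


def pick_model(task, rank=0):
    """Return the model ID for a task; rank 0 is the top recommendation."""
    if task not in RECOMMENDED_MODELS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(RECOMMENDED_MODELS)}")
    options = RECOMMENDED_MODELS[task]
    # Fall back to the last listed option if fewer alternatives exist than requested.
    return options[min(rank, len(options) - 1)]


print(pick_model("copy_baked_in"))          # top pick for text-heavy ads
print(pick_model("copy_baked_in", rank=2))  # budget alternative
```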

How to Pass a Logo/Reference Image via OpenRouter API

curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-pro-image-preview",
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<BASE64_OF_LOGO>"
          }
        },
        {
          "type": "text",
          "text": "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image. [REST OF PROMPT]"
        }
      ]
    }],
    "modalities": ["image"]
  }'

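To encode the logo: base64 -i clarity_logo.png | tr -d '\n' (the tr step strips line breaks so the output can be pasted into the data URL in place of <BASE64_OF_LOGO>)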
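
For scripted use, the same request body can be assembled in Python instead of pasting the base64 string by hand. The sketch below only builds and serialises the payload, mirroring the curl example field for field; it does not send anything. `build_logo_request` is a hypothetical helper, and the placeholder bytes stand in for the contents of clarity_logo.png.

```python
import base64
import json


def build_logo_request(logo_bytes, prompt, model="google/gemini-3-pro-image-preview"):
    """Build a chat-completions body with the logo as a base64 data URL, as in the curl example."""
    encoded = base64.b64encode(logo_bytes).decode("ascii")  # no line breaks, so no tr step needed
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encoded}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "modalities": ["image"],  # same value as the curl example above
    }


# Placeholder bytes; in practice read them with open("clarity_logo.png", "rb").read().
placeholder_logo = b"\x89PNG\r\n\x1a\n"
payload = build_logo_request(
    placeholder_logo,
    "Create a luxury jewellery advertisement. "
    "Use the provided logo in the bottom-right corner of the image.",
)
body = json.dumps(payload)
print(f"request body is {len(body.encode('utf-8'))} bytes")
```

Printing the serialised size is a cheap way to spot oversized requests before sending, which matters for models with a request size limit such as Riverflow V2 Fast.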

Reference: https://openrouter.ai/docs/guides/overview/multimodal/image-generation