AI Image Model Capabilities — OpenRouter

Last updated: 2026-03-17
Source: OpenRouter model pages and documentation
Purpose: Guide for selecting the right model per creative task, particularly for Clarity Diamonds ad production


Key Finding: Text & Logo in Images

All 8 models now support text rendering in images, with varying quality. All 8 also accept image inputs (logos, reference photos, brand assets), so you can pass the Clarity Diamonds logo as a reference image and have the model incorporate it into the generated ad creative.


Model Comparison Table

| Model | Text Rendering | Image Input (logo/refs) | Best For | Speed | Price Tier |
|---|---|---|---|---|---|
| `openai/gpt-5-image` | Excellent | Yes | Complex reasoning, detailed editing, text-heavy ads | Medium | High |
| `openai/gpt-5-image-mini` | Excellent | Yes | Same as above, faster and cheaper | Fast | Low-Mid |
| `google/gemini-3-pro-image-preview` | Industry-leading, including long passages and multilingual | Yes | Professional creative, multi-subject, brand identity preservation | Medium | High |
| `google/gemini-3.1-flash-image-preview` | Good | Yes | Fast iteration, professional output | Fastest | Low |
| `google/gemini-2.5-flash-image` | Good | Yes | Cost-sensitive bulk generation | Fast | Lowest |
| `bytedance-seed/seedream-4.5` | Improved (esp. small text) | Yes | Portrait/lifestyle, colour/lighting preservation | Medium | Very Low (flat $0.04/image) |
| `sourceful/riverflow-v2-fast` | Good + custom font inputs | Yes (URLs preferred) | Production speed, custom typography | Fastest | Mid |
| `black-forest-labs/flux.2-flex` | Excellent (complex typography) | Yes | Typography-heavy creative, fine detail | Medium | Mid-High |

Detailed Model Profiles

GPT-5 Image

  • Text: Superior instruction following and text rendering. Handles detailed copy, pricing, CTAs reliably.
  • Image input: Yes — accepts logo files and reference images. Can incorporate brand assets naturally.
  • Sizes: 10 standard + 4 extended aspect ratios; 1K, 2K, 4K resolution
  • Best for: Ads with copy baked in, logo placement, complex compositions
  • Pricing: $10/M input, $40/M image output
  • Context: 400K tokens

GPT-5 Image Mini

  • Text: Same quality as GPT-5 Image at 4× lower input cost and 5× lower image-output cost
  • Image input: Yes — same capabilities as full GPT-5 Image
  • Sizes: Same as GPT-5 Image
  • Best for: Most production work — same quality, better economics
  • Pricing: $2.50/M input, $8/M image output
  • Context: 400K tokens
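
The price gap between the two GPT-5 image models can be checked directly from the prices quoted in these profiles. A minimal sketch: the `request_cost` helper and the token counts in the example are hypothetical placeholders, not measurements of real requests.

```python
# Per-million-token prices (USD) as quoted in the two GPT-5 profiles.
PRICES = {
    "openai/gpt-5-image": {"input": 10.00, "image_output": 40.00},
    "openai/gpt-5-image-mini": {"input": 2.50, "image_output": 8.00},
}


def request_cost(model: str, input_tokens: int, image_output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + image_output_tokens * price["image_output"]
    return total / 1_000_000


full = PRICES["openai/gpt-5-image"]
mini = PRICES["openai/gpt-5-image-mini"]
input_ratio = full["input"] / mini["input"]                  # 4.0
output_ratio = full["image_output"] / mini["image_output"]   # 5.0

# Hypothetical request: 2,000 input tokens and 4,000 image output tokens.
cost_full = request_cost("openai/gpt-5-image", 2_000, 4_000)       # about $0.18
cost_mini = request_cost("openai/gpt-5-image-mini", 2_000, 4_000)  # about $0.037
```

Because image output dominates the bill in this example, the effective saving sits closer to 5× than 4×.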

Gemini 3 Pro Image (Nano Banana Pro)

  • Text: Industry-leading — best in class for long text, multilingual, detailed layout
  • Image input: Yes — multimodal reasoning, identity preservation for up to 5 subjects. Ideal for passing logo + product shot and asking it to compose a complete ad.
  • Sizes: 2K/4K, flexible aspect ratios
  • Best for: Final production ads needing precise text and logo integration, consistent brand identity
  • Pricing: $2/M input, $12/M output
  • Context: 65K tokens

Gemini 3.1 Flash Image (Nano Banana 2)

  • Text: Good — handles single headlines and short copy well
  • Image input: Yes — accepts images and text
  • Sizes: 0.5K to 4K; customisable aspect ratios via image_config
  • Best for: Fast iteration and testing, professional output at low cost
  • Pricing: $0.50/M input, $3/M output, $60/M image output
  • Released: February 2026 (newest Gemini image model)

Gemini 2.5 Flash Image (Nano Banana)

  • Text: Good — standard text rendering
  • Image input: Yes
  • Sizes: Customisable aspect ratios
  • Best for: High-volume or budget-sensitive generation
  • Pricing: $0.30/M input, $2.50/M output (cheapest of the token-priced models listed here)
  • Context: 32K tokens

Seedream 4.5

  • Text: Improved over v4.0, particularly for small text rendering
  • Image input: Yes — editing consistency, preserves subject details, lighting, colour tone
  • Sizes: Variable
  • Best for: Lifestyle/portrait imagery, colour-accurate product editing, preserving brand identity across variations
  • Pricing: Flat $0.04 per output image — simplest pricing, great for volume
  • Context: 4K tokens

Riverflow V2 Fast

  • Text: Good — integrated reasoning for text accuracy. Supports custom font inputs ($0.03 each, max 2 fonts) — you can specify Inter or Montserrat exactly
  • Image input: Yes — recommends image URLs rather than base64. Also supports super-resolution references ($0.20 each, max 4) to enhance specific elements.
  • Sizes: 1K and 2K — no 4K support
  • Best for: Production-speed generation with brand-specific typography
  • Pricing: $0.02/image (1K), $0.04/image (2K)
  • Limitation: 4.5MB request size limit; no 4K
  • Released: February 2026
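
Riverflow's per-image price plus its paid add-ons can be budgeted with a small estimator. This is a sketch under one loud assumption: that font and super-resolution charges are simply added to the base image price for each request. The function name `riverflow_cost` is hypothetical; the prices and caps come from the profile above.

```python
# Prices (USD) and caps as quoted in the Riverflow V2 Fast profile.
BASE_PRICE = {"1K": 0.02, "2K": 0.04}     # per output image; no 4K tier
FONT_PRICE, MAX_FONTS = 0.03, 2           # per custom font input
SUPERRES_PRICE, MAX_SUPERRES = 0.20, 4    # per super-resolution reference


def riverflow_cost(resolution: str, fonts: int = 0, superres_refs: int = 0) -> float:
    """Estimate the cost of one image, ASSUMING add-on charges sum per request."""
    if resolution not in BASE_PRICE:
        raise ValueError(f"unsupported resolution {resolution!r}; use 1K or 2K")
    if not 0 <= fonts <= MAX_FONTS:
        raise ValueError(f"fonts must be between 0 and {MAX_FONTS}")
    if not 0 <= superres_refs <= MAX_SUPERRES:
        raise ValueError(f"superres_refs must be between 0 and {MAX_SUPERRES}")
    total = BASE_PRICE[resolution] + fonts * FONT_PRICE + superres_refs * SUPERRES_PRICE
    return round(total, 2)


# A 2K ad with two brand fonts and one super-resolution reference.
example = riverflow_cost("2K", fonts=2, superres_refs=1)   # 0.3
```

Note that add-ons can outweigh the base price: one super-resolution reference costs five times a 2K image.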

FLUX.2 Flex

  • Text: Excellent — best for complex typography and fine detail rendering
  • Image input: Yes — multi-reference editing in a unified architecture (pass multiple reference images in one request)
  • Sizes: Flexible aspect ratios, megapixel-based pricing
  • Best for: Typography-driven creatives, ads where the headline IS the visual, multi-reference compositions
  • Pricing: $0.06/megapixel (input + output combined)
  • Note: Does not use submissions for model training. Retains prompts for 30 days only.
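
Megapixel-based pricing makes reference images part of the bill. A rough sketch, assuming a megapixel is width × height / 1,000,000 and that nothing is rounded up per image; the provider may bill differently, so treat the result as an estimate. `flux_cost` is a hypothetical helper name.

```python
PRICE_PER_MEGAPIXEL = 0.06  # USD, input + output combined (from the profile above)


def megapixels(width: int, height: int) -> float:
    """Image area in megapixels (ASSUMPTION: 1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000


def flux_cost(output_size: tuple, reference_sizes: list = ()) -> float:
    """Estimate the cost of one generation, counting every reference image."""
    total_mp = megapixels(*output_size)
    total_mp += sum(megapixels(w, h) for w, h in reference_sizes)
    return total_mp * PRICE_PER_MEGAPIXEL


# One 1024x1024 output composed from two 1024x1024 references
# (for example a logo and a product shot).
example = flux_cost((1024, 1024), [(1024, 1024), (1024, 1024)])  # about $0.19
```

Downscaling reference images before sending them is the obvious lever if multi-reference compositions become expensive.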

Recommendations for Clarity Diamonds Ad Production

For ads with copy baked in (headlines, pricing, CTA)

Best: `google/gemini-3-pro-image-preview` — industry-leading text, accepts logo as input
Good: `black-forest-labs/flux.2-flex` — excellent typography
Budget: `openai/gpt-5-image-mini` — reliable text at lower cost

For passing the Clarity logo as a reference

All 8 models accept image inputs. The recommended workflow:

  1. Pass clarity_logo.png as an image input in the message
  2. Describe placement: "include the Clarity Diamonds logo in the bottom-right corner"
  3. Prefer Gemini 3 Pro when placement must be reliable; it handles this best (identity preservation for up to 5 subjects)
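
The workflow above can be sketched as a request builder. `build_logo_request` is a hypothetical helper, the bytes in the example are only a PNG signature standing in for a real logo file, and the `modalities` value is an assumption to check against the OpenRouter image generation guide.

```python
import base64
import json


def build_logo_request(logo_png: bytes, prompt: str,
                       model: str = "google/gemini-3-pro-image-preview") -> dict:
    """Build a chat-completions payload with the logo as an inline image input."""
    encoded = base64.b64encode(logo_png).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Step 1: the logo travels as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + encoded}},
                # Step 2: the text part describes where the logo should go.
                {"type": "text", "text": prompt},
            ],
        }],
        # ASSUMPTION: models that return text alongside images take both values.
        "modalities": ["image", "text"],
    }


placeholder_logo = b"\x89PNG\r\n\x1a\n"  # stand-in bytes, not a real image
payload = build_logo_request(
    placeholder_logo,
    "Create a luxury jewellery advertisement. "
    "Include the Clarity Diamonds logo in the bottom-right corner.",
)
body = json.dumps(payload)  # JSON string ready to POST to the chat completions endpoint
```

In practice the bytes would come from reading `clarity_logo.png` in binary mode.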

For lifestyle/warmth (Ad 4A, ring on hand)

Best: `bytedance-seed/seedream-4.5` — excellent portrait refinement, colour warmth

For fast iteration and A/B testing

Best for speed: `google/gemini-3.1-flash-image-preview` (fastest, still professional quality)
Best for predictable cost: `bytedance-seed/seedream-4.5` (flat $0.04/image, good consistency across variations)

For custom brand typography (Inter/Montserrat)

Best: `sourceful/riverflow-v2-fast` — only model supporting custom font file inputs
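
These recommendations could be captured as a lookup table so that scripts pick a model by task. A minimal sketch: the task keys and the `pick_model` helper are hypothetical labels invented for illustration, and only the top pick for each task is included.

```python
# Top recommendation per task, taken from the lists above.
# The task keys are hypothetical labels, not OpenRouter identifiers.
RECOMMENDED_MODEL = {
    "copy_baked_in": "google/gemini-3-pro-image-preview",
    "logo_reference": "google/gemini-3-pro-image-preview",
    "lifestyle": "bytedance-seed/seedream-4.5",
    "fast_iteration": "google/gemini-3.1-flash-image-preview",
    "custom_typography": "sourceful/riverflow-v2-fast",
}


def pick_model(task: str) -> str:
    """Return the recommended model id for a task, or raise on unknown tasks."""
    try:
        return RECOMMENDED_MODEL[task]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODEL))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


model_for_ring_on_hand = pick_model("lifestyle")
```

Raising on unknown tasks, rather than falling back to a default, keeps a typo from silently routing work to an expensive model.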


How to Pass a Logo/Reference Image via OpenRouter API

curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3-pro-image-preview",
    "messages": [{
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<BASE64_OF_LOGO>"
          }
        },
        {
          "type": "text",
          "text": "Create a luxury jewellery advertisement. Use the provided logo in the bottom-right corner of the image. [REST OF PROMPT]"
        }
      ]
    }],
    "modalities": ["image", "text"]
  }'
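
Once the request returns, the generated image still has to be pulled out of the JSON. The sketch below assumes the response carries images as base64 data URLs under `choices[].message.images[].image_url.url`; that shape is an assumption to verify against the guide linked at the end of this page. `extract_images` is a hypothetical helper and the sample response is fabricated for illustration.

```python
import base64


def extract_images(response: dict) -> list:
    """Decode every base64 data-URL image found in a response dict."""
    decoded = []
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        for item in message.get("images", []):   # ASSUMED field name
            url = item["image_url"]["url"]
            header, _, data = url.partition(",")
            # Only handle inline data URLs; skip anything else (e.g. http links).
            if header.startswith("data:") and header.endswith(";base64"):
                decoded.append(base64.b64decode(data))
    return decoded


# Fabricated sample response in the assumed shape.
fake_bytes = b"not-a-real-png"
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "images": [{
                "type": "image_url",
                "image_url": {
                    "url": "data:image/png;base64,"
                           + base64.b64encode(fake_bytes).decode("ascii")
                },
            }],
        }
    }]
}
images = extract_images(sample_response)
```

Each element of `images` can then be written to disk with `open(path, "wb")`.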

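To encode the logo: `base64 -i clarity_logo.png | tr -d '\n'` (macOS syntax; on GNU/Linux, `base64 -w 0 clarity_logo.png` produces the same single-line output)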
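
Base64 inflates a file by about a third, which matters for models with a request size cap such as Riverflow's 4.5MB limit. A rough pre-flight check, assuming the limit means 4,500,000 bytes and allowing a placeholder 2,000 bytes for the rest of the JSON body; both figures are assumptions, and `fits_inline` is a hypothetical helper.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def fits_inline(n_bytes: int, limit_bytes: int = 4_500_000,
                overhead_bytes: int = 2_000) -> bool:
    """True if the encoded image plus estimated JSON overhead fits under the limit."""
    return base64_length(n_bytes) + overhead_bytes <= limit_bytes


# The formula agrees with the standard library encoder on a small input.
check = len(base64.b64encode(b"abcd")) == base64_length(4)

ok_3mb = fits_inline(3_000_000)       # encodes to 4,000,000 bytes
too_big = fits_inline(3_400_000)      # encodes to 4,533,336 bytes
```

When the check fails, hosting the logo and passing its URL avoids the encoding overhead entirely.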


Reference: https://openrouter.ai/docs/guides/overview/multimodal/image-generation