Google Gemini

Configure and use Google Gemini models through InferXgate.

InferXgate supports Google’s Gemini models via the Gemini API, including the latest Gemini 3 Pro.

Configuration

Set your Gemini API key in the InferXgate environment:

GEMINI_API_KEY=your-gemini-api-key
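Assuming InferXgate reads provider credentials from its process environment (the variable name comes from the configuration line above; exporting it in a shell before launching the gateway is a common convention, not something confirmed here):

```shell
# Export the Gemini API key so InferXgate can read it at startup.
# The variable name matches the configuration line above.
export GEMINI_API_KEY="your-gemini-api-key"
```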

Available Models

Gemini 3 Series

| Model ID | Description | Context Window |
|---|---|---|
| gemini-3-pro-preview | Best multimodal understanding | 1M |
| gemini-3-pro-image-preview | Optimized for image tasks | 1M |

Gemini 2.5 Series

| Model ID | Description | Context Window |
|---|---|---|
| gemini-2.5-pro | Advanced reasoning | 1M |
| gemini-2.5-flash | Fast and efficient | 1M |
| gemini-2.5-flash-lite | Lightweight flash model | 1M |
| gemini-2.5-flash-image | Image-optimized flash | 1M |

Gemini 2.0 Series

| Model ID | Description | Context Window |
|---|---|---|
| gemini-2.0-flash | Fast responses | 1M |
| gemini-2.0-flash-lite | Lightweight model | 1M |

Usage Example

from openai import OpenAI

# Point the OpenAI client at the local InferXgate endpoint
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=[
        {"role": "user", "content": "Explain neural networks."}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a story."}],
    stream=True
)

for chunk in stream:
    # delta.content may be None for some chunks (e.g. role-only or final chunks)
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Supported Features

  • Chat completions
  • Streaming
  • Multi-turn conversations
  • Long context (up to 1M tokens)
  • Multimodal (images, video, audio, PDF)
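As a sketch of what a multimodal request could look like through the OpenAI-compatible API, the snippet below builds a chat message that pairs text with an image reference. The `image_url` content-part shape follows the OpenAI chat-completions convention; that InferXgate forwards it to Gemini unchanged is an assumption, and the URL is a placeholder.

```python
# Build a multimodal chat message: text plus an image reference.
# The content-part format follows the OpenAI chat-completions schema;
# pass-through to Gemini via InferXgate is assumed, not verified here.
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/chart.png"},
        },
    ],
}

# The message plugs into the same create() call used above:
# client.chat.completions.create(model="gemini-3-pro-preview",
#                                messages=[image_message])
```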

Pricing

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 3 Pro | $1.25 | $5.00 |
| Gemini 2.5 Pro | $1.25 | $5.00 |
| Gemini 2.5 Flash | $0.075 | $0.30 |
| Gemini 2.5 Flash Lite | $0.075 | $0.30 |
| Gemini 2.0 Flash | $0.10 | $0.40 |
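Using the rates above, per-request cost is a simple linear function of token counts. The helper below is illustrative (not part of InferXgate) and plugs in the Gemini 2.5 Flash prices from the table:

```python
# Estimate request cost from token counts and per-1M-token rates.
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Rates are USD per 1M tokens, as listed in the pricing table."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: 10,000 input tokens and 2,000 output tokens on Gemini 2.5 Flash
# ($0.075 in / $0.30 out per 1M tokens).
cost = estimate_cost(10_000, 2_000, 0.075, 0.30)
print(f"${cost:.6f}")  # → $0.001350
```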

Best Practices

  1. Use Gemini 3 Pro for quality - the strongest multimodal understanding in the lineup
  2. Use Flash models for speed - the lowest latency for real-time applications
  3. Leverage long context - Gemini models handle long documents (up to 1M tokens) well
  4. Use multimodal inputs - text, images, video, audio, and PDFs are supported