Google Gemini

Configure and use Google Gemini models through InferXgate.

InferXgate supports Google’s Gemini models via the Gemini API, including the latest Gemini 3 Pro.

Configuration

Set your Gemini API key in the InferXgate environment:

GEMINI_API_KEY=your-gemini-api-key
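Assuming InferXgate reads provider credentials from its process environment (the variable name comes from the configuration line above; exporting it in a shell before launching the gateway is a common convention, not something confirmed here):

```shell
# Export the Gemini API key so InferXgate can read it at startup.
# The variable name matches the configuration line above.
export GEMINI_API_KEY="your-gemini-api-key"
```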

Available Models

Gemini 3 Series

| Model ID | Description | Context Window |
|---|---|---|
| gemini-3-pro-preview | Best multimodal understanding | 1M |
| gemini-3-pro-image-preview | Optimized for image tasks | 1M |

Gemini 2.5 Series

| Model ID | Description | Context Window |
|---|---|---|
| gemini-2.5-pro | Advanced reasoning | 1M |
| gemini-2.5-flash | Fast and efficient | 1M |
| gemini-2.5-flash-lite | Lightweight flash model | 1M |
| gemini-2.5-flash-image | Image-optimized flash | 1M |

Gemini 2.0 Series

| Model ID | Description | Context Window |
|---|---|---|
| gemini-2.0-flash | Fast responses | 1M |
| gemini-2.0-flash-lite | Lightweight model | 1M |

Usage Example

from openai import OpenAI

# Point the OpenAI client at the local InferXgate endpoint
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=[
        {"role": "user", "content": "Explain neural networks."}
    ]
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a story."}],
    stream=True
)

for chunk in stream:
    # delta.content may be None for some chunks (e.g. role-only or final chunks)
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Supported Features

  • Chat completions
  • Streaming
  • Multi-turn conversations
  • Long context (up to 1M tokens)
  • Multimodal (images, video, audio, PDF)
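As a sketch of what a multimodal request could look like through the OpenAI-compatible API, the snippet below builds a chat message that pairs text with an image reference. The `image_url` content-part shape follows the OpenAI chat-completions convention; that InferXgate forwards it to Gemini unchanged is an assumption, and the URL is a placeholder.

```python
# Build a multimodal chat message: text plus an image reference.
# The content-part format follows the OpenAI chat-completions schema;
# pass-through to Gemini via InferXgate is assumed, not verified here.
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/chart.png"},
        },
    ],
}

# The message plugs into the same create() call used above:
# client.chat.completions.create(model="gemini-3-pro-preview",
#                                messages=[image_message])
```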

Pricing

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 3 Pro | $1.25 | $5.00 |
| Gemini 2.5 Pro | $1.25 | $5.00 |
| Gemini 2.5 Flash | $0.075 | $0.30 |
| Gemini 2.5 Flash Lite | $0.075 | $0.30 |
| Gemini 2.0 Flash | $0.10 | $0.40 |
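Using the rates above, per-request cost is a simple linear function of token counts. The helper below is illustrative (not part of InferXgate) and plugs in the Gemini 2.5 Flash prices from the table:

```python
# Estimate request cost from token counts and per-1M-token rates.
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Rates are USD per 1M tokens, as listed in the pricing table."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: 10,000 input tokens and 2,000 output tokens on Gemini 2.5 Flash
# ($0.075 in / $0.30 out per 1M tokens).
cost = estimate_cost(10_000, 2_000, 0.075, 0.30)
print(f"${cost:.6f}")  # → $0.001350
```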

Best Practices

  1. Use Gemini 3 Pro for quality - the strongest multimodal understanding in the lineup
  2. Use Flash models for speed - the lowest latency for real-time applications
  3. Leverage long context - Gemini models handle long documents (up to 1M tokens) well
  4. Use multimodal inputs - text, images, video, audio, and PDFs are supported