Configure and use Google Gemini models through InferXgate.
InferXgate supports Google’s Gemini models via the Gemini API, including the latest Gemini 3 Pro.
## Configuration

Set your Gemini API key as an environment variable:

```bash
GEMINI_API_KEY=your-gemini-api-key
```
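On macOS/Linux the variable can be exported in the shell session that starts the gateway (the exact launch command depends on your InferXgate installation, so it is omitted here):

```shell
# Export the key so InferXgate can read it at startup.
# The value is a placeholder; use your real key from Google AI Studio.
export GEMINI_API_KEY="your-gemini-api-key"
```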
## Available Models

### Gemini 3 Series

| Model ID | Description | Context Window |
|---|---|---|
| `gemini-3-pro-preview` | Best multimodal understanding | 1M |
| `gemini-3-pro-image-preview` | Optimized for image tasks | 1M |
### Gemini 2.5 Series

| Model ID | Description | Context Window |
|---|---|---|
| `gemini-2.5-pro` | Advanced reasoning | 1M |
| `gemini-2.5-flash` | Fast and efficient | 1M |
| `gemini-2.5-flash-lite` | Lightweight flash model | 1M |
| `gemini-2.5-flash-image` | Image-optimized flash | 1M |
### Gemini 2.0 Series

| Model ID | Description | Context Window |
|---|---|---|
| `gemini-2.0-flash` | Fast responses | 1M |
| `gemini-2.0-flash-lite` | Lightweight model | 1M |
## Usage Example

InferXgate exposes an OpenAI-compatible endpoint, so the official OpenAI Python client works as-is:

```python
from openai import OpenAI

# Point the client at your InferXgate instance instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=[
        {"role": "user", "content": "Explain neural networks."}
    ],
)
print(response.choices[0].message.content)
```
## Streaming

Pass `stream=True` to receive tokens as they are generated:

```python
stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a story."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta; content may be None on the final chunk.
    print(chunk.choices[0].delta.content or "", end="")
```
## Supported Features
- Chat completions
- Streaming
- Multi-turn conversations
- Long context (up to 1M tokens)
- Multimodal (images, video, audio, PDF)
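Multimodal requests use the OpenAI-style content-parts format: the `content` field becomes a list of typed parts instead of a plain string. A sketch of an image-plus-text message (whether a given deployment accepts every modality depends on your InferXgate version and the chosen model):

```python
import base64

# Placeholder bytes for illustration; in practice read a real file,
# e.g. image_bytes = open("chart.png", "rb").read()
image_bytes = b"\x89PNG-placeholder"

b64 = base64.b64encode(image_bytes).decode("ascii")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this chart show?"},
            {
                "type": "image_url",
                # Images are commonly passed inline as data URLs.
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }
]
# messages can then be passed to client.chat.completions.create(...)
```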
## Pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 3 Pro | $1.25 | $5.00 |
| Gemini 2.5 Pro | $1.25 | $5.00 |
| Gemini 2.5 Flash | $0.075 | $0.30 |
| Gemini 2.5 Flash Lite | $0.075 | $0.30 |
| Gemini 2.0 Flash | $0.10 | $0.40 |
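Using the rates above, per-request cost can be estimated from the `usage` field that chat completion responses return. A small sketch (the price table is hard-coded from this page, so keep it in sync if pricing changes):

```python
# USD per 1M tokens (input_rate, output_rate), from the pricing table above.
PRICES = {
    "gemini-3-pro-preview": (1.25, 5.00),
    "gemini-2.5-pro": (1.25, 5.00),
    "gemini-2.5-flash": (0.075, 0.30),
    "gemini-2.5-flash-lite": (0.075, 0.30),
    "gemini-2.0-flash": (0.10, 0.40),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    in_rate, out_rate = PRICES[model]
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

# e.g. 10k prompt tokens + 1k completion tokens on Gemini 2.5 Flash,
# using response.usage.prompt_tokens / response.usage.completion_tokens:
cost = estimate_cost("gemini-2.5-flash", 10_000, 1_000)
```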
## Best Practices

- **Use Gemini 3 Pro for quality**: best multimodal understanding available
- **Use Flash models for speed**: lowest latency for real-time apps
- **Leverage long context**: Gemini excels at long documents
- **Use multimodal inputs**: text, images, video, audio, and PDFs are supported