Anthropic Claude

Configure and use Anthropic Claude models through InferXgate.

InferXgate provides full support for Anthropic’s Claude models, including the latest Claude 4 and Claude 3.5 series.

Configuration

Add your Anthropic API key to the environment:

ANTHROPIC_API_KEY=sk-ant-api03-...

Available Models

Claude 4 Series

Model ID	Description	Context Window
`claude-opus-4-5-20251101`	Most capable, extended thinking	200K
`claude-sonnet-4-5-20250929`	Advanced performance and speed	200K
`claude-opus-4-1-20250414`	Previous flagship model	200K
`claude-sonnet-4-20250514`	Balanced Claude 4	200K
`claude-opus-4-20250514`	Claude 4 base	200K

Claude 3.5 & 3 Series

Model ID	Description	Context Window
`claude-3-5-haiku-20241022`	Fast and efficient	200K
`claude-3-haiku-20240307`	Legacy fast model	200K

Usage Example

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="claude-opus-4-5-20251101",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."}
    ],
    max_tokens=1000
)

print(response.choices[0].message.content)

Streaming

stream = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Write a poem."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Supported Features

Chat completions
Streaming responses
System messages
Multi-turn conversations
Tool/function calling
Vision (image inputs)
Extended thinking (Opus 4.5)

Pricing

Costs are passed through from Anthropic:

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude Opus 4.5	$15.00	$75.00
Claude Sonnet 4.5	$3.00	$15.00
Claude Haiku 4.5	$0.80	$4.00
Claude Opus 4.1	$15.00	$75.00
Claude Sonnet 4	$3.00	$15.00
Claude Opus 4	$15.00	$75.00
Claude 3.5 Haiku	$0.80	$4.00
Claude 3 Haiku	$0.25	$1.25

Best Practices

Use Haiku for simple tasks - Save costs on classification, extraction
Use Opus 4.5 for complex reasoning - Best for analysis, coding, writing with extended thinking
Set appropriate max_tokens - Avoid unnecessary token usage
Enable caching - Reduce costs for repeated queries