Anthropic Claude

Configure and use Anthropic Claude models through InferXgate.

InferXgate supports Anthropic's Claude models, including the Claude 4 series, Claude 3.5 Haiku, and Claude 3 Haiku. Requests use the OpenAI-compatible endpoint shown in the usage example.

Configuration

Add your Anthropic API key to the environment:

ANTHROPIC_API_KEY=sk-ant-api03-...
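Before starting the gateway, it can help to confirm that the key is visible to the process. The following is a minimal sketch, not part of InferXgate: `check_anthropic_key` is a hypothetical helper, and the `sk-ant-` prefix check is an assumption based on the example key above.

```python
import os


def check_anthropic_key(env=None):
    """Return True if ANTHROPIC_API_KEY is set and looks like an Anthropic key.

    Hypothetical helper for illustration; InferXgate may validate the key
    differently.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "")
    # Assumption: Anthropic keys start with "sk-ant-", as in the example above.
    return key.startswith("sk-ant-")


if __name__ == "__main__":
    if check_anthropic_key():
        print("ANTHROPIC_API_KEY found")
    else:
        print("ANTHROPIC_API_KEY is missing or malformed")
```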

Available Models

Claude 4 Series

Model ID                      Description                       Context Window
claude-opus-4-5-20251101      Most capable, extended thinking   200K
claude-sonnet-4-5-20250929    Advanced performance and speed    200K
claude-opus-4-1-20250805      Previous flagship model           200K
claude-sonnet-4-20250514      Balanced Claude 4                 200K
claude-opus-4-20250514        Claude 4 base                     200K

Claude 3.5 & 3 Series

Model ID                      Description                       Context Window
claude-3-5-haiku-20241022     Fast and efficient                200K
claude-3-haiku-20240307       Legacy fast model                 200K

Usage Example

from openai import OpenAI

# Point the OpenAI SDK at the InferXgate gateway instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="claude-opus-4-5-20251101",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing."}
    ],
    max_tokens=1000
)

print(response.choices[0].message.content)
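The single request above extends to multi-turn conversations by resending the full message history on each call. The sketch below shows one way to do that; `ask` is a hypothetical helper, and `client` is assumed to be an OpenAI SDK client pointed at InferXgate as in the usage example.

```python
def ask(client, history, user_text,
        model="claude-sonnet-4-5-20250929", max_tokens=1000):
    """Send one user turn and record both sides of the exchange in `history`.

    Hypothetical helper: `client` is an OpenAI SDK client configured with the
    InferXgate base URL, and `history` is a list of message dicts.
    """
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model=model,
        messages=history,
        max_tokens=max_tokens,
    )
    reply = response.choices[0].message.content
    # Store the assistant reply so the next call sees the whole conversation.
    history.append({"role": "assistant", "content": reply})
    return reply


# Example usage (requires a running gateway):
# history = [{"role": "system", "content": "You are a helpful assistant."}]
# ask(client, history, "Explain quantum computing.")
# ask(client, history, "Now summarize that in one sentence.")
```

Keeping the history in a plain list leaves the caller free to trim or summarize older turns when the conversation approaches the context window.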

Streaming

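# Reuses the client created in the usage example above.
stream = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Write a poem."}],
    stream=True
)

for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)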
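When the complete text is needed after streaming finishes (for logging or post-processing, say), the deltas can be accumulated instead of printed. This is a minimal sketch; `collect_stream` is a hypothetical helper that assumes chunks have the shape used in the loop above.

```python
def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        # Role-only and final chunks have no text content.
        if delta:
            parts.append(delta)
    return "".join(parts)


# Example usage (requires a running gateway):
# stream = client.chat.completions.create(
#     model="claude-sonnet-4-5-20250929",
#     messages=[{"role": "user", "content": "Write a poem."}],
#     stream=True,
# )
# poem = collect_stream(stream)
```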

Supported Features

  • Chat completions
  • Streaming responses
  • System messages
  • Multi-turn conversations
  • Tool/function calling
  • Vision (image inputs)
  • Extended thinking (Opus 4.5)
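Tool calling and vision can be sketched with the same client. The example below assumes InferXgate accepts the OpenAI request format for both features (`tools` and `image_url` content parts); that mapping is an assumption, not something this page confirms. `image_message`, `WEATHER_TOOL`, and `ask_with_tools` are hypothetical names used for illustration.

```python
import base64


def image_message(prompt, image_bytes, media_type="image/png"):
    """Build a user message with text plus an inline base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{media_type};base64,{encoded}"},
            },
        ],
    }


# Hypothetical tool definition in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def ask_with_tools(client, question, model="claude-sonnet-4-5-20250929"):
    """Send a question with the tool attached; return any requested tool calls."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        tools=[WEATHER_TOOL],
        max_tokens=500,
    )
    # tool_calls is None when the model answers directly.
    return response.choices[0].message.tool_calls or []
```

A message built with `image_message` can be passed in the `messages` list of any chat completion request in place of a plain text message.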

Pricing

Costs are passed through from Anthropic:

Model               Input (per 1M tokens)   Output (per 1M tokens)
Claude Opus 4.5     $5.00                   $25.00
Claude Sonnet 4.5   $3.00                   $15.00
Claude Haiku 4.5    $1.00                   $5.00
Claude Opus 4.1     $15.00                  $75.00
Claude Sonnet 4     $3.00                   $15.00
Claude Opus 4       $15.00                  $75.00
Claude 3.5 Haiku    $0.80                   $4.00
Claude 3 Haiku      $0.25                   $1.25
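As a worked example of how per-million-token prices translate into a per-request cost, the sketch below computes an estimate from token counts. `estimate_cost` and `PRICES` are hypothetical names, and the two entries are copied from the Claude Sonnet 4.5 and Claude 3 Haiku rows of the table above. In practice the token counts would typically come from the `usage` field of a response, assuming the gateway populates it.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from two rows of the table.
PRICES = {
    "Claude Sonnet 4.5": (Decimal("3.00"), Decimal("15.00")),
    "Claude 3 Haiku": (Decimal("0.25"), Decimal("1.25")),
}


def estimate_cost(model_name, input_tokens, output_tokens, prices=PRICES):
    """Return the estimated USD cost of one request as a Decimal."""
    input_price, output_price = prices[model_name]
    tokens_per_unit = Decimal(1_000_000)
    total = input_tokens * input_price + output_tokens * output_price
    return total / tokens_per_unit


# 2,000 input tokens and 500 output tokens on Claude Sonnet 4.5:
# (2000 * 3.00 + 500 * 15.00) / 1,000,000 = 0.0135 USD
print(estimate_cost("Claude Sonnet 4.5", 2000, 500))
```

The same request on Claude 3 Haiku comes to (2000 * 0.25 + 500 * 1.25) / 1,000,000 = 0.001125 USD, which is the kind of gap that makes model choice matter for high-volume workloads.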

Best Practices

  1. Use Haiku for simple tasks - It costs less for work such as classification and extraction
  2. Use Opus 4.5 for complex reasoning - Best for analysis, coding, and writing, with extended thinking available
  3. Set max_tokens to fit the expected output - This caps output tokens, which are billed at the higher rate
  4. Enable caching - Reduce costs for repeated queries
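The first three practices can be combined into a small routing helper. This is a sketch under stated assumptions: the task labels, the default model, and the default `max_tokens` values are hypothetical choices for illustration, while the model IDs come from the tables above.

```python
HAIKU = "claude-3-5-haiku-20241022"
OPUS = "claude-opus-4-5-20251101"
DEFAULT_MODEL = "claude-sonnet-4-5-20250929"

# Hypothetical task labels mapped to models.
MODEL_BY_TASK = {
    "classification": HAIKU,
    "extraction": HAIKU,
    "analysis": OPUS,
    "coding": OPUS,
}


def request_options(task, max_tokens=None):
    """Pick a model for the task and a max_tokens cap for the response."""
    model = MODEL_BY_TASK.get(task, DEFAULT_MODEL)
    if max_tokens is None:
        # Assumption: short labels need few tokens; longer work gets more room.
        max_tokens = 50 if model == HAIKU else 2000
    return {"model": model, "max_tokens": max_tokens}


# Example usage (requires a running gateway):
# response = client.chat.completions.create(
#     messages=[{"role": "user", "content": "Label this ticket: ..."}],
#     **request_options("classification"),
# )
```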