Caching

Configure Redis caching to reduce costs and improve response times.

InferXgate’s caching can reduce your LLM API costs by 60-90% for workloads where a large share of requests repeat. Actual savings track your cache hit rate.

How It Works

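1. A request arrives with a prompt.
2. InferXgate generates a cache key from the request.
3. If the key is cached, the stored response is returned immediately (cache hit).
4. If the key is not cached, the request is forwarded to the provider and the response is cached (cache miss).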
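
The four steps above can be sketched in a few lines of Python. This is only an illustration of the hit/miss flow, not InferXgate's actual implementation: `TTLCache` is a hypothetical in-memory stand-in for Redis, `handle_request` and `call_provider` are names invented for the example, and the cache key is simplified to the request serialized as JSON.

```python
import json
import time
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis: values expire after a TTL."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # an expired entry counts as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock() + self.ttl_seconds, value)


def handle_request(request: dict, cache: TTLCache,
                   call_provider: Callable[[dict], Any]) -> Tuple[Any, str]:
    """Return (response, cache_status) for one incoming request."""
    # Step 2: derive a key from the request (simplified for illustration).
    key = json.dumps(request, sort_keys=True)
    # Step 3: serve from the cache when possible.
    cached = cache.get(key)
    if cached is not None:
        return cached, "HIT"
    # Step 4: forward to the provider and store the response.
    response = call_provider(request)
    cache.set(key, response)
    return response, "MISS"
```

Calling `handle_request` twice with the same request invokes `call_provider` only once; the second call is answered from the cache until the TTL elapses.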

Configuration

# Redis connection
REDIS_URL=redis://localhost:6379

# Enable caching (default: true)
ENABLE_CACHING=true

# Cache TTL in seconds (default: 3600 = 1 hour)
CACHE_TTL_SECONDS=3600

# Max cache size in MB (default: 1024)
CACHE_MAX_SIZE_MB=1024

Cache Key Generation

Cache keys are generated from:

Model name
Messages content
Temperature
Max tokens
Other parameters

Requests whose parameters are all identical share a cache key, so repeats are served from the cache until the entry expires.
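
One common way to build such a key is to hash a canonical serialization of the request parameters. The sketch below assumes SHA-256 over sorted-key JSON and a `cache:` prefix. Both are illustrative choices, as is the function name `make_cache_key`; the exact scheme InferXgate uses is not specified here.

```python
import hashlib
import json


def make_cache_key(model, messages, temperature=1.0, max_tokens=None,
                   **other_params):
    """Build a deterministic key from the parameters that affect the response."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        **other_params,
    }
    # sort_keys makes the serialization independent of argument order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"cache:{digest}"  # hypothetical key prefix
```

Under this scheme, changing any single parameter (for example `temperature`) produces a different key, so the request is treated as a cache miss.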

Cache Headers

Response headers indicate cache status:

X-Cache: HIT          # Served from cache
X-Cache: MISS         # Fetched from provider
X-Cache-TTL: 3540     # Seconds until expiry
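
A client can read these headers to tell whether a response came from the cache. The helper below is a minimal sketch: `cache_status` is a hypothetical name, and it assumes that `X-Cache-TTL` may be absent (for example on a miss), which the documentation above does not confirm either way.

```python
from typing import Mapping


def cache_status(headers: Mapping[str, str]) -> dict:
    """Summarize the X-Cache and X-Cache-TTL response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("x-cache", "").strip().upper()
    ttl_raw = normalized.get("x-cache-ttl")
    ttl = None
    if ttl_raw is not None and ttl_raw.strip().isdigit():
        ttl = int(ttl_raw.strip())
    return {"hit": status == "HIT", "ttl_seconds": ttl}
```

With the default TTL of 3600 seconds, a value of `X-Cache-TTL: 3540` means the entry was cached about 60 seconds ago.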

Bypassing Cache

Force a fresh response:

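from openai import OpenAI

# Assumption: InferXgate exposes an OpenAI-compatible API at this address
client = OpenAI(base_url="http://localhost:3000/v1", api_key="your-api-key")

response = client.chat.completions.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Cache-Control": "no-cache"},  # skip the cache for this request
)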

Cache Statistics

View cache metrics:

curl http://localhost:3000/stats

Response:

{
  "cache": {
    "hits": 8500,
    "misses": 1500,
    "hit_rate": 0.85,
    "size_bytes": 104857600,
    "entries": 5000
  }
}
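
The fields in this response relate to each other in a simple way: `hit_rate` is `hits / (hits + misses)`, and `size_bytes` divided by `entries` gives the average entry size. The sketch below recomputes these from the JSON body; `summarize_cache_stats` is a hypothetical helper name, and it assumes the response has exactly the shape shown above.

```python
import json


def summarize_cache_stats(stats_json: str) -> dict:
    """Derive summary figures from the /stats response body."""
    cache = json.loads(stats_json)["cache"]
    total = cache["hits"] + cache["misses"]
    entries = cache["entries"]
    return {
        "total_requests": total,
        # Guard against division by zero on a freshly started instance.
        "hit_rate": cache["hits"] / total if total else 0.0,
        "size_mib": cache["size_bytes"] / (1024 * 1024),
        "avg_entry_bytes": cache["size_bytes"] / entries if entries else 0.0,
    }
```

For the sample response, this gives 10,000 total requests, a hit rate of 0.85, and a cache size of 100 MiB, well below the default `CACHE_MAX_SIZE_MB` of 1024.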

Redis Setup

Docker

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes

# Named volumes must be declared at the top level
volumes:
  redis_data:

Redis Cluster

For high availability:

REDIS_URL=redis://node1:6379,node2:6379,node3:6379
REDIS_CLUSTER=true

Best Practices

Set an appropriate TTL - Balance freshness against cost savings
Monitor the hit rate - Aim for 60% or higher
Size the cache appropriately - Leave enough room for your working set
Use Redis persistence - Preserve the cache across restarts

Cost Savings Example

Metric	Without Cache	With Cache (80% hit rate)
Requests	10,000	10,000
API Calls	10,000	2,000
Cost (at $0.01/call)	$100	$20
Savings	-	$80 (80%)
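
The arithmetic behind the table can be reproduced with a short calculation. This sketch assumes a flat per-call price and ignores the cost of running Redis; `estimate_savings` is a hypothetical helper, and amounts are kept in integer cents to avoid floating-point rounding.

```python
def estimate_savings(requests: int, hit_rate_percent: int,
                     cost_per_call_cents: int) -> dict:
    """Estimate provider cost with and without caching, in cents."""
    hits = requests * hit_rate_percent // 100
    api_calls = requests - hits  # only cache misses reach the provider
    cost_without = requests * cost_per_call_cents
    cost_with = api_calls * cost_per_call_cents
    saved = cost_without - cost_with
    return {
        "api_calls": api_calls,
        "cost_without_cents": cost_without,
        "cost_with_cents": cost_with,
        "saved_cents": saved,
        "saved_percent": saved * 100 // cost_without if cost_without else 0,
    }
```

Calling `estimate_savings(10_000, 80, 1)` reproduces the table: 2,000 provider calls, 2,000 cents ($20) instead of 10,000 cents ($100), a saving of $80. With a flat per-call price, the percentage saved equals the hit rate.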