Caching
Configure Redis caching to reduce costs and improve response times.
InferXgate’s caching can reduce your LLM API costs by 60-90%, depending on how often identical requests repeat.
How It Works
- A request comes in with a prompt
- InferXgate generates a cache key from the request
- If a response is cached under that key, InferXgate returns it immediately (cache hit)
- If not, InferXgate forwards the request to the provider and caches the response (cache miss)
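The flow above can be sketched in a few lines of Python. This is a minimal illustration, not InferXgate's actual implementation: the `handle_request` function, the plain dict standing in for Redis, and the `call_provider` callback are all hypothetical names chosen for the example.

```python
import json


def handle_request(request, cache, call_provider):
    """Return (response, status) where status is "HIT" or "MISS".

    `cache` is any dict-like store (a stand-in for Redis here) and
    `call_provider` is a hypothetical callback that contacts the LLM provider.
    """
    # Build a cache key from the request (simplified: canonical JSON)
    key = json.dumps(request, sort_keys=True)

    # Cache hit: return the stored response without calling the provider
    if key in cache:
        return cache[key], "HIT"

    # Cache miss: forward to the provider, then store the response
    response = call_provider(request)
    cache[key] = response
    return response, "MISS"


# Usage: the second identical request never reaches the provider
provider_calls = []


def fake_provider(request):
    provider_calls.append(request)
    return {"content": "Hi there"}


cache = {}
request = {"model": "claude-3-opus-20240229",
           "messages": [{"role": "user", "content": "Hello"}]}
first = handle_request(request, cache, fake_provider)
second = handle_request(request, cache, fake_provider)
```

In this sketch the provider is called once, and the second call returns the stored response with a `HIT` status.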
Configuration
# Redis connection
REDIS_URL=redis://localhost:6379
# Enable caching (default: true)
ENABLE_CACHING=true
# Cache TTL in seconds (default: 3600 = 1 hour)
CACHE_TTL_SECONDS=3600
# Max cache size in MB (default: 1024)
CACHE_MAX_SIZE_MB=1024
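If a deployment script needs to read or validate these settings, something like the following could work. It is a sketch under assumptions: the `CacheSettings` class and `load_cache_settings` function are hypothetical, the `REDIS_URL` fallback simply reuses the example value above, and the way InferXgate itself parses these variables (for instance, which strings count as true) is not documented here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheSettings:
    redis_url: str
    enabled: bool
    ttl_seconds: int
    max_size_mb: int


def load_cache_settings(env=None):
    """Read the cache variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    settings = CacheSettings(
        # Assumption: fall back to the example URL when REDIS_URL is unset
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        # Assumption: only the string "true" (any case) enables caching
        enabled=env.get("ENABLE_CACHING", "true").strip().lower() == "true",
        ttl_seconds=int(env.get("CACHE_TTL_SECONDS", "3600")),
        max_size_mb=int(env.get("CACHE_MAX_SIZE_MB", "1024")),
    )
    # Reject values that cannot describe a working cache
    if settings.ttl_seconds <= 0:
        raise ValueError("CACHE_TTL_SECONDS must be positive")
    if settings.max_size_mb <= 0:
        raise ValueError("CACHE_MAX_SIZE_MB must be positive")
    return settings


defaults = load_cache_settings({})
```

Passing an explicit dict, as in the last line, makes the function easy to check without touching the real environment.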
Cache Key Generation
Cache keys are generated from:
- Model name
- Messages content
- Temperature
- Max tokens
- Other parameters
Two requests with identical values for all of these parameters produce the same cache key, so the second one is served from the cache until the entry expires.
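One common way to derive a key from these inputs is to hash a canonical serialization of them. The sketch below assumes SHA-256 over sorted-key JSON; the `cache_key` function is hypothetical and InferXgate's real key format may differ.

```python
import hashlib
import json


def cache_key(model, messages, temperature=None, max_tokens=None, **other_params):
    """Derive a deterministic key from the request parameters."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "other": other_params,
    }
    # sort_keys makes the serialization independent of argument order
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


messages = [{"role": "user", "content": "Hello"}]
key_a = cache_key("claude-3-opus-20240229", messages, temperature=0.0, max_tokens=100)
key_b = cache_key("claude-3-opus-20240229", messages, max_tokens=100, temperature=0.0)
key_c = cache_key("claude-3-opus-20240229", messages, temperature=0.7, max_tokens=100)
```

Here `key_a` and `key_b` are equal because the parameters match, while `key_c` differs because the temperature changed. Any difference in a parameter, however small, yields a different key and therefore a cache miss.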
Cache Headers
Response headers indicate cache status:
X-Cache: HIT # Served from cache
X-Cache: MISS # Fetched from provider
X-Cache-TTL: 3540 # Seconds until expiry
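A client can inspect these headers to log or act on cache behavior. The helper below is a hypothetical sketch that takes any mapping of header names to values; it assumes `X-Cache-TTL` may be absent (for example on a miss) and treats that case as unknown.

```python
def cache_status(headers):
    """Return (served_from_cache, seconds_until_expiry or None)."""
    # HTTP header names are case-insensitive, so normalize before lookup
    normalized = {name.lower(): value for name, value in headers.items()}

    hit = normalized.get("x-cache", "").strip().upper() == "HIT"

    ttl_raw = normalized.get("x-cache-ttl")
    ttl = int(ttl_raw) if ttl_raw is not None and ttl_raw.strip().isdigit() else None
    return hit, ttl


hit_result = cache_status({"X-Cache": "HIT", "X-Cache-TTL": "3540"})
miss_result = cache_status({"x-cache": "MISS"})
```

With the header values shown above, `hit_result` is `(True, 3540)` and `miss_result` is `(False, None)`.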
Bypassing Cache
Force a fresh response:
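# `client` is an OpenAI-compatible SDK client already configured to point at InferXgate
response = client.chat.completions.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Cache-Control": "no-cache"},  # request a fresh response
)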
Cache Statistics
View cache metrics:
curl http://localhost:3000/stats
Response:
{
  "cache": {
    "hits": 8500,
    "misses": 1500,
    "hit_rate": 0.85,
    "size_bytes": 104857600,
    "entries": 5000
  }
}
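The fields in this response relate to each other in a simple way: `hit_rate` is hits divided by total lookups. The sketch below recomputes it and derives two extra figures from the sample response; the `summarize_cache_stats` function and the derived field names are hypothetical, not part of the `/stats` API.

```python
import json


def summarize_cache_stats(stats):
    """Derive human-friendly figures from a /stats response body."""
    cache = stats["cache"]
    total = cache["hits"] + cache["misses"]
    return {
        "total_requests": total,
        # Guard against division by zero on a freshly started instance
        "hit_rate": cache["hits"] / total if total else 0.0,
        "size_mb": cache["size_bytes"] / (1024 * 1024),
        "avg_entry_bytes": (
            cache["size_bytes"] / cache["entries"] if cache["entries"] else 0.0
        ),
    }


sample = json.loads(
    '{"cache": {"hits": 8500, "misses": 1500, "hit_rate": 0.85,'
    ' "size_bytes": 104857600, "entries": 5000}}'
)
summary = summarize_cache_stats(sample)
```

For the sample response, 8,500 hits out of 10,000 lookups gives the reported hit rate of 0.85, and 104,857,600 bytes is 100 MB.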
Redis Setup
Docker
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes

# Named volumes must be declared at the top level
volumes:
  redis_data:
Redis Cluster
For high availability:
REDIS_URL=redis://node1:6379,node2:6379,node3:6379
REDIS_CLUSTER=true
Best Practices
- Set an appropriate TTL: balance response freshness against cost savings
- Monitor the hit rate: aim for 60% or higher
- Size the cache appropriately: it should hold your working set
- Use Redis persistence: preserve the cache across restarts
Cost Savings Example
| Metric | Without Cache | With Cache (80% hit) |
|---|---|---|
| Requests | 10,000 | 10,000 |
| API Calls | 10,000 | 2,000 |
| Cost (at $0.01/call) | $100 | $20 |
| Savings | - | $80 (80%) |
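The arithmetic behind the table can be reproduced for other traffic volumes and hit rates. This sketch assumes a flat per-call price, as the table does; real provider pricing is usually per token, and the `estimate_savings` function is a hypothetical helper.

```python
def estimate_savings(requests, hit_rate, cost_per_call):
    """Estimate provider calls and cost with and without caching."""
    hits = round(requests * hit_rate)
    api_calls = requests - hits  # only cache misses reach the provider
    cost_without = requests * cost_per_call
    cost_with = api_calls * cost_per_call
    savings = cost_without - cost_with
    return {
        "api_calls": api_calls,
        "cost_without_cache": cost_without,
        "cost_with_cache": cost_with,
        "savings": savings,
        "savings_fraction": savings / cost_without if cost_without else 0.0,
    }


# Same inputs as the table: 10,000 requests, 80% hit rate, $0.01 per call
estimate = estimate_savings(requests=10_000, hit_rate=0.80, cost_per_call=0.01)
```

With a flat per-call price, the savings fraction equals the hit rate, which is why an 80% hit rate yields 80% savings in the table.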