# InferXgate vs LiteLLM Benchmark
We ran a comprehensive benchmark suite comparing InferXgate against LiteLLM, one of the most popular LLM proxy solutions. The results demonstrate InferXgate's competitive performance with superior reliability under sustained load.
## Results at a Glance
| Metric | InferXgate | LiteLLM |
|---|---|---|
| Baseline Avg Latency | 3.39s | 3.55s |
| Baseline P95 Latency | 4.66s | 5.10s |
| Sustained Load Errors | 0.00% | 0.14% |
| Max Latency Spike | 15.04s | 120s |
## Benchmark Overview
Our testing methodology included three distinct scenarios designed to evaluate real-world performance.
| Test | Description | Duration |
|---|---|---|
| Baseline Latency | Single user, sequential requests | 30 requests |
| Throughput Ramp | Scaling from 1 to 50 concurrent users | 5.5 minutes |
| Sustained Load | Constant 10 concurrent users | 10 minutes |
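For any of these scenarios, the achievable throughput can be sanity-checked with Little's law (concurrency = throughput × time each request spends in the system). A minimal sketch in Python; the 1 s pacing between iterations is an assumption about the load script, not a documented parameter:

```python
def expected_throughput(vus, latency_s, pacing_s=0.0):
    """Little's law: throughput = concurrency / time-in-system per request."""
    return vus / (latency_s + pacing_s)

# Sustained-load scenario: 10 virtual users, ~3.4 s average latency,
# plus an assumed ~1 s pacing sleep between iterations.
rate = expected_throughput(10, 3.4, pacing_s=1.0)
print(round(rate, 2))  # 2.27
```

The estimate lines up closely with the ~2.2-2.3 req/s both proxies sustain in the results below, which suggests the proxies themselves add little queueing delay at this concurrency.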
## Test Environment
- Backend: Anthropic Claude API
- Infrastructure: Docker Compose (Redis, PostgreSQL)
- Load Testing: Grafana k6
- Monitoring: Prometheus, cAdvisor
## Detailed Results
### 1. Baseline Latency Test
*Single user performing 30 sequential requests*
InferXgate demonstrated lower overhead across all latency percentiles:
| Metric | InferXgate | LiteLLM | Difference |
|---|---|---|---|
| Average | 3.39s | 3.55s | -4.5% |
| Median | 3.15s | 3.27s | -3.7% |
| P90 | 4.44s | 4.64s | -4.3% |
| P95 | 4.66s | 5.10s | -8.6% |
| Max | 5.17s | 7.58s | -31.8% |
| Error Rate | 0.00% | 0.00% | — |
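k6 computed the percentiles above; for readers post-processing their own raw latency logs, here is a minimal sketch of the same summary statistics in Python. It uses the nearest-rank percentile definition (k6 may interpolate slightly differently), and the sample data is illustrative, not actual benchmark output:

```python
import math
import statistics

def latency_summary(samples_s):
    """Summary statistics matching the columns in the table above."""
    ordered = sorted(samples_s)

    def pct(p):
        # Nearest-rank percentile: smallest sample covering p% of the data.
        return ordered[math.ceil(p / 100 * len(ordered)) - 1]

    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": pct(90),
        "p95": pct(95),
        "max": ordered[-1],
    }

# Illustrative latency samples in seconds, not the actual benchmark data.
print(latency_summary([3.1, 3.2, 3.4, 3.0, 4.6, 3.3, 5.2, 3.5, 3.2, 4.4]))
```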
Takeaway: InferXgate adds minimal proxy overhead, with significantly better tail latency performance.
### 2. Throughput Ramp Test
*Ramping from 1 to 50 concurrent users over 5.5 minutes*
Both solutions handled increasing load effectively:
| Metric | InferXgate | LiteLLM |
|---|---|---|
| Total Requests | 1,971 | 1,982 |
| Throughput | 5.93 req/s | 5.94 req/s |
| Average Latency | 3.34s | 3.31s |
| P95 Latency | 4.59s | 4.57s |
| Tokens/Second | 9.64 | 9.69 |
| Error Rate | 0.05% | 0.05% |
Takeaway: Performance is virtually identical under ramping concurrent load, with both systems experiencing a single timeout.
### 3. Sustained Load Test
*10 concurrent users for 10 minutes*
This is where InferXgate's reliability advantage becomes clear:
| Metric | InferXgate | LiteLLM |
|---|---|---|
| Total Requests | 1,372 | 1,350 |
| Throughput | 2.27 req/s | 2.23 req/s |
| Average Latency | 3.38s | 3.46s |
| P95 Latency | 4.74s | 4.61s |
| Max Latency | 15.04s | 120s (timeout) |
| Error Rate | 0.00% | 0.14% |
Takeaway: InferXgate completed 10 minutes of sustained load with zero errors, while LiteLLM experienced 2 request timeouts.
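The error-rate percentages map back to absolute request counts as follows (a quick sanity check, not part of the benchmark tooling):

```python
def failed_requests(total, error_rate_pct):
    """Convert a reported error rate back into an absolute count of failures."""
    return round(total * error_rate_pct / 100)

# Throughput ramp: 0.05% of ~1,970 requests is a single timeout per proxy.
print(failed_requests(1971, 0.05))  # 1
# Sustained load: 0.14% of LiteLLM's 1,350 requests is 2 timeouts.
print(failed_requests(1350, 0.14))  # 2
```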
## Why InferXgate?
Based on our benchmarks, InferXgate delivers measurable advantages for production workloads.
### Superior Reliability
Zero errors during sustained load testing means your production workloads won't experience unexpected failures.
### Lower Latency Overhead
Roughly 4-9% lower latency across baseline percentiles translates to a snappier experience for your users.
### Predictable Performance
A maximum latency of 15s, versus LiteLLM's 2-minute timeouts, means more consistent response times under pressure.
### Efficient Resource Usage
Roughly 4x lower network overhead (934 KB vs 3.7 MB transferred in the sustained test) reduces bandwidth costs.
## Run Your Own Benchmark
Want to verify these results in your own environment? Our benchmark suite is open source.

```bash
git clone https://github.com/jasmedia/inferxgate
cd inferxgate/benchmark
./scripts/run-benchmark.sh all
```

### Requirements
- Docker & Docker Compose
- k6 load testing tool
- A valid `ANTHROPIC_API_KEY`
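Before running the full suite, a quick smoke test can confirm the proxy is reachable. The sketch below assumes an OpenAI-compatible `/v1/chat/completions` route (which LiteLLM exposes and many proxies mirror); the host, port, and model name are illustrative assumptions, not documented InferXgate defaults:

```python
import json
import urllib.request

# Assumed endpoint; adjust host and port to match your deployment.
PROXY_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "claude-sonnet-4-20250514",  # example model id, not a verified default
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 16,
}
request = urllib.request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment to actually send once the proxy is running:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```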
## Methodology
All tests were conducted under identical conditions for a fair comparison:
- **Same hardware**: Both proxies ran on the same Docker network
- **Same backend**: Both proxied to Anthropic's Claude API
- **Same prompts**: Identical request payloads were sent to each proxy
- **Cool-down periods**: 30-60 seconds between tests to prevent interference

Results are reproducible using our open-source benchmark scripts.
## Conclusion
InferXgate delivers comparable throughput to LiteLLM while providing better reliability and lower latency overhead. For production workloads where uptime and predictable performance matter, InferXgate is the clear choice.

*Benchmark conducted December 2025. Results may vary based on hardware, network conditions, and API provider performance.*