Performance

InferXgate vs LiteLLM Benchmark

We ran a comprehensive benchmark suite comparing InferXgate against LiteLLM, one of the most popular LLM proxy solutions. The results show InferXgate matching LiteLLM's throughput while adding less proxy overhead and completing the sustained load run without a single error.

Results at a Glance

| Metric | InferXgate | LiteLLM |
|---|---|---|
| Baseline Avg Latency | 3.39s | 3.55s |
| Baseline P95 Latency | 4.66s | 5.10s |
| Sustained Load Errors | 0.00% | 0.14% |
| Max Latency Spike | 15.04s | 120s |

Benchmark Overview

Our testing methodology included three distinct scenarios designed to evaluate real-world performance.

| Test | Description | Duration |
|---|---|---|
| Baseline Latency | Single user, sequential requests | 30 requests |
| Throughput Ramp | Scaling from 1 to 50 concurrent users | 5.5 minutes |
| Sustained Load | Constant 10 concurrent users | 10 minutes |
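
The actual k6 scripts live in the benchmark repository; as a rough sketch, the three scenarios map onto k6 invocations like the ones below. The script name chat.js is a placeholder, and the ramp profile shown is illustrative rather than the exact profile we used.

# Baseline: a single virtual user issuing 30 sequential requests
k6 run --vus 1 --iterations 30 chat.js

# Throughput ramp: grow from the default 1 VU to 50 VUs over 5.5 minutes
k6 run --stage 5m30s:50 chat.js

# Sustained load: hold 10 VUs for 10 minutes
k6 run --vus 10 --duration 10m chat.js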

Test Environment

  • Backend: Anthropic Claude API
  • Infrastructure: Docker Compose (Redis, PostgreSQL)
  • Load Testing: Grafana k6
  • Monitoring: Prometheus, cAdvisor
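
Once the compose file is in place, standing up an equivalent stack is straightforward. The health-check URLs below assume Prometheus and cAdvisor are published on their default ports (9090 and 8080); your port mappings may differ.

docker compose up -d    # start the proxies, Redis, PostgreSQL, Prometheus, and cAdvisor
docker compose ps       # confirm every container is up before starting a run
curl -s http://localhost:9090/-/healthy    # Prometheus liveness check
curl -s http://localhost:8080/healthz      # cAdvisor liveness check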

Detailed Results

1. Baseline Latency Test

Single user performing 30 sequential requests

InferXgate demonstrated lower overhead across all latency percentiles:

| Metric | InferXgate | LiteLLM | Difference |
|---|---|---|---|
| Average | 3.39s | 3.55s | -4.5% |
| Median | 3.15s | 3.27s | -3.7% |
| P90 | 4.44s | 4.64s | -4.3% |
| P95 | 4.66s | 5.10s | -8.6% |
| Max | 5.17s | 7.58s | -31.8% |
| Error Rate | 0.00% | 0.00% | |

Takeaway: InferXgate adds minimal proxy overhead, with significantly better tail latency performance.
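
To get the same percentile breakdown from your own runs, you can ask k6 to print exactly these trend statistics; chat.js is again a placeholder script name.

# Report avg, median, P90, P95, and max for every latency metric, matching the columns above
k6 run --vus 1 --iterations 30 \
  --summary-trend-stats "avg,med,p(90),p(95),max" \
  chat.js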

2. Throughput Ramp Test

Ramping from 1 to 50 concurrent users over 5.5 minutes

Both solutions handled increasing load effectively:

| Metric | InferXgate | LiteLLM |
|---|---|---|
| Total Requests | 1,971 | 1,982 |
| Throughput | 5.93 req/s | 5.94 req/s |
| Average Latency | 3.34s | 3.31s |
| P95 Latency | 4.59s | 4.57s |
| Tokens/Second | 9.64 | 9.69 |
| Error Rate | 0.05% | 0.05% |

Takeaway: Performance is virtually identical under ramping concurrent load; each system experienced a single timeout.

3. Sustained Load Test

10 concurrent users for 10 minutes

This is where InferXgate's reliability advantage becomes clear:

| Metric | InferXgate | LiteLLM |
|---|---|---|
| Total Requests | 1,372 | 1,350 |
| Throughput | 2.27 req/s | 2.23 req/s |
| Average Latency | 3.38s | 3.46s |
| P95 Latency | 4.74s | 4.61s |
| Max Latency | 15.04s | 120s (timeout) |
| Error Rate | 0.00% | 0.14% |

Takeaway: InferXgate completed 10 minutes of sustained load with zero errors, while LiteLLM experienced 2 request timeouts.

Why InferXgate?

Based on our benchmarks, InferXgate delivers measurable advantages for production workloads.

Superior Reliability

Zero errors across ten minutes of sustained load means your production workloads are far less likely to hit unexpected failures.

Lower Latency Overhead

Latency that is 4-9% lower at the average, P90, and P95 marks in the baseline test translates to a snappier experience for your users.

Predictable Performance

Maximum latency of 15s vs 2-minute timeouts means more consistent response times under pressure.

Efficient Resource Usage

Significantly lower network overhead (934 KB vs 3.7 MB in sustained tests) reduces bandwidth costs.

Run Your Own Benchmark

Want to verify these results in your own environment? Our benchmark suite is open source.

git clone https://github.com/jasmedia/inferxgate
cd inferxgate/benchmark
./scripts/run-benchmark.sh all

Requirements

  • Docker & Docker Compose
  • k6 load testing tool
  • Valid ANTHROPIC_API_KEY
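
A quick pre-flight check before launching the full suite might look like the following; it assumes the scripts read the key from the ANTHROPIC_API_KEY environment variable, as the requirement above suggests.

# Verify the prerequisites are installed
docker --version && docker compose version && k6 version

# Export the key the benchmark expects, then run every scenario
export ANTHROPIC_API_KEY="sk-ant-..."    # replace with your own key
./scripts/run-benchmark.sh all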

Methodology

All tests were conducted under identical conditions for fair comparison.

  • Same hardware: both proxies ran on the same Docker network
  • Same backend: both proxied to Anthropic's Claude API
  • Same prompts: identical request payloads for fair comparison
  • Cool-down periods: 30-60 seconds between tests to prevent interference
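
To make "same prompts" concrete: each run sent the same OpenAI-style chat-completion payload through both proxies. The ports, paths, key variables, and model name below are illustrative assumptions, not the recorded benchmark configuration.

PAYLOAD='{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Summarize the benefits of connection pooling."}],"max_tokens":256}'

# Identical request sent to each proxy on the shared Docker network
curl -s -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" -H "Authorization: Bearer $LITELLM_KEY" -d "$PAYLOAD"

curl -s -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" -H "Authorization: Bearer $INFERXGATE_KEY" -d "$PAYLOAD"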

Results are reproducible using our open-source benchmark scripts.

Conclusion

InferXgate delivers comparable throughput to LiteLLM while providing better reliability and lower latency overhead. For production workloads where uptime and predictable performance matter, InferXgate is the clear choice.

Benchmark conducted December 2025. Results may vary based on hardware, network conditions, and API provider performance.