# InferXgate vs LiteLLM Benchmark
We ran a comprehensive benchmark suite comparing InferXgate against LiteLLM, one of the most popular LLM proxy solutions. The results demonstrate InferXgate's competitive performance with superior reliability under sustained load.
## Results at a Glance
| Metric | InferXgate | LiteLLM |
|---|---|---|
| Baseline Avg Latency | 3.39s | 3.55s |
| Baseline P95 Latency | 4.66s | 5.10s |
| Sustained Load Errors | 0.00% | 0.14% |
| Max Latency Spike | 15.04s | 120s |
## Benchmark Overview
Our testing methodology included three distinct scenarios designed to evaluate real-world performance.
| Test | Description | Duration |
|---|---|---|
| Baseline Latency | Single user, sequential requests | 30 requests |
| Throughput Ramp | Scaling from 1 to 50 concurrent users | 5.5 minutes |
| Sustained Load | Constant 10 concurrent users | 10 minutes |
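For any of these scenarios, the achievable throughput can be sanity-checked with Little's law (concurrency = throughput × time each request spends in the system). A minimal sketch in Python; the 1 s pacing between iterations is an assumption about the load script, not a documented parameter:

```python
def expected_throughput(vus, latency_s, pacing_s=0.0):
    """Little's law: throughput = concurrency / time-in-system per request."""
    return vus / (latency_s + pacing_s)

# Sustained-load scenario: 10 virtual users, ~3.4 s average latency,
# plus an assumed ~1 s pacing sleep between iterations.
rate = expected_throughput(10, 3.4, pacing_s=1.0)
print(round(rate, 2))  # 2.27
```

The estimate lines up closely with the ~2.2-2.3 req/s both proxies sustain in the results below, which suggests the proxies themselves add little queueing delay at this concurrency.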
## Test Environment
- Backend: Anthropic Claude API
- Infrastructure: Docker Compose (Redis, PostgreSQL)
- Load Testing: Grafana k6
- Monitoring: Prometheus, cAdvisor
## Detailed Results
### 1. Baseline Latency Test
*Single user performing 30 sequential requests*
InferXgate demonstrated lower overhead across all latency percentiles:
| Metric | InferXgate | LiteLLM | Difference |
|---|---|---|---|
| Average | 3.39s | 3.55s | -4.5% |
| Median | 3.15s | 3.27s | -3.7% |
| P90 | 4.44s | 4.64s | -4.3% |
| P95 | 4.66s | 5.10s | -8.6% |
| Max | 5.17s | 7.58s | -31.8% |
| Error Rate | 0.00% | 0.00% | — |
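k6 computed the percentiles above; for readers post-processing their own raw latency logs, here is a minimal sketch of the same summary statistics in Python. It uses the nearest-rank percentile definition (k6 may interpolate slightly differently), and the sample data is illustrative, not actual benchmark output:

```python
import math
import statistics

def latency_summary(samples_s):
    """Summary statistics matching the columns in the table above."""
    ordered = sorted(samples_s)

    def pct(p):
        # Nearest-rank percentile: smallest sample covering p% of the data.
        return ordered[math.ceil(p / 100 * len(ordered)) - 1]

    return {
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": pct(90),
        "p95": pct(95),
        "max": ordered[-1],
    }

# Illustrative latency samples in seconds, not the actual benchmark data.
print(latency_summary([3.1, 3.2, 3.4, 3.0, 4.6, 3.3, 5.2, 3.5, 3.2, 4.4]))
```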
Takeaway: InferXgate adds minimal proxy overhead, with significantly better tail latency performance.
### 2. Throughput Ramp Test
*Ramping from 1 to 50 concurrent users over 5.5 minutes*
Both solutions handled increasing load effectively:
| Metric | InferXgate | LiteLLM |
|---|---|---|
| Total Requests | 1,971 | 1,982 |
| Throughput | 5.93 req/s | 5.94 req/s |
| Average Latency | 3.34s | 3.31s |
| P95 Latency | 4.59s | 4.57s |
| Tokens/Second | 9.64 | 9.69 |
| Error Rate | 0.05% | 0.05% |
Takeaway: Performance is virtually identical under ramping concurrent load, with both systems experiencing a single timeout.
### 3. Sustained Load Test
*10 concurrent users for 10 minutes*
This is where InferXgate's reliability advantage becomes clear:
| Metric | InferXgate | LiteLLM |
|---|---|---|
| Total Requests | 1,372 | 1,350 |
| Throughput | 2.27 req/s | 2.23 req/s |
| Average Latency | 3.38s | 3.46s |
| P95 Latency | 4.74s | 4.61s |
| Max Latency | 15.04s | 120s (timeout) |
| Error Rate | 0.00% | 0.14% |
Takeaway: InferXgate completed 10 minutes of sustained load with zero errors, while LiteLLM experienced 2 request timeouts.
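The error-rate percentages map back to absolute request counts as follows (a quick sanity check, not part of the benchmark tooling):

```python
def failed_requests(total, error_rate_pct):
    """Convert a reported error rate back into an absolute count of failures."""
    return round(total * error_rate_pct / 100)

# Throughput ramp: 0.05% of ~1,970 requests is a single timeout per proxy.
print(failed_requests(1971, 0.05))  # 1
# Sustained load: 0.14% of LiteLLM's 1,350 requests is 2 timeouts.
print(failed_requests(1350, 0.14))  # 2
```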
## Why InferXgate?
Based on our benchmarks, InferXgate delivers measurable advantages for production workloads.
### Superior Reliability
Zero errors during sustained load testing means your production workloads won't experience unexpected failures.
### Lower Latency Overhead
Roughly 4-9% lower latency across baseline percentiles translates to a snappier experience for your users.
### Predictable Performance
A maximum latency of 15s, versus LiteLLM's 2-minute timeouts, means more consistent response times under pressure.
### Efficient Resource Usage
Roughly 4x lower network overhead (934 KB vs 3.7 MB transferred in the sustained test) reduces bandwidth costs.
## Run Your Own Benchmark
Want to verify these results in your own environment? Our benchmark suite is open source.

```bash
git clone https://github.com/jasmedia/inferxgate
cd inferxgate/benchmark
./scripts/run-benchmark.sh all
```

### Requirements
- Docker & Docker Compose
- k6 load testing tool
- A valid `ANTHROPIC_API_KEY`
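Before running the full suite, a quick smoke test can confirm the proxy is reachable. The sketch below assumes an OpenAI-compatible `/v1/chat/completions` route (which LiteLLM exposes and many proxies mirror); the host, port, and model name are illustrative assumptions, not documented InferXgate defaults:

```python
import json
import urllib.request

# Assumed endpoint; adjust host and port to match your deployment.
PROXY_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "claude-sonnet-4-20250514",  # example model id, not a verified default
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 16,
}
request = urllib.request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Uncomment to actually send once the proxy is running:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```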
## Methodology
All tests were conducted under identical conditions for a fair comparison:
- **Same hardware**: Both proxies ran on the same Docker network
- **Same backend**: Both proxied to Anthropic's Claude API
- **Same prompts**: Identical request payloads were sent to each proxy
- **Cool-down periods**: 30-60 seconds between tests to prevent interference

Results are reproducible using our open-source benchmark scripts.
## Conclusion
InferXgate delivers comparable throughput to LiteLLM while providing better reliability and lower latency overhead. For production workloads where uptime and predictable performance matter, InferXgate is the clear choice.

*Benchmark conducted December 2025. Results may vary based on hardware, network conditions, and API provider performance.*