Changelog

Track the latest updates, improvements, and fixes to InferXgate.

v0.3.0 (latest)

2025-12-01

Added

Semantic caching with configurable similarity threshold
Cost-based load balancing strategy
Azure OpenAI provider support
Request/response logging with configurable verbosity
Prometheus histogram metrics for latency percentiles
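
For readers unfamiliar with semantic caching, the sketch below illustrates the general idea of a similarity threshold: a cached response is reused only when the new request's embedding is close enough to a stored one. It is an illustration under stated assumptions, not InferXgate's implementation. The `SemanticCache` class, its `threshold` parameter, and the use of cosine similarity over plain lists are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, threshold=0.9):
        # Assumed knob: minimum similarity required for a cache hit.
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        # Linear scan for the most similar stored embedding.
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        # Only return the best match if it clears the threshold.
        return best_response if best_score >= self.threshold else None
```

A higher threshold trades cache hit rate for a lower risk of returning a response to a question that only looks similar.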

Improved

Redis connection pooling for better performance
Streaming response handling with proper backpressure
Error messages now include provider-specific details
Documentation site with comprehensive guides

Fixed

Memory leak in long-running streaming connections
Race condition in concurrent cache writes
Incorrect token counting for Claude 3 models

v0.2.0

2025-11-15

Added

Google Gemini provider support
JWT authentication with custom claims
Virtual API keys for team management
Health check endpoint with provider status
Configurable request timeout per provider

Improved

Reduced memory usage by 40% under load
Better error handling for network timeouts
Cache key generation is now deterministic
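
As a rough illustration of what deterministic cache key generation usually involves, the sketch below serializes the request in a canonical form (sorted keys, fixed separators) before hashing, so that logically identical requests map to the same key regardless of field order. The `cache_key` function and its parameters are hypothetical and are not taken from the InferXgate codebase.

```python
import hashlib
import json


def cache_key(model, messages, params):
    """Return a stable SHA-256 hex digest for a request (hypothetical helper)."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys and fixed separators make the JSON output independent of
    # dict insertion order and whitespace.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Note that message order is preserved on purpose: reordering a conversation changes its meaning, so it should produce a different key.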

Fixed

Streaming responses not flushing properly
Rate limiter not resetting at window boundary
Config file hot-reload causing brief downtime

v0.1.0

2025-11-01

Added

Initial release of InferXgate
OpenAI-compatible API endpoint
Anthropic Claude provider support
OpenAI provider support
Redis response caching
Round-robin load balancing
Basic rate limiting
Prometheus metrics endpoint
Docker image and docker-compose setup
