How to Use AI Gateway (LiteLLM): Unified Management of AI Large Language Models
What is LiteLLM Gateway Server? #
LiteLLM is an open-source unified interface tool designed to simplify interactions with large language models from providers such as OpenAI, Anthropic, and Google. Acting as a proxy layer, it exposes a standardized, OpenAI-compatible API, helping enterprises centrally manage access to multiple AI service providers, monitor usage, control costs, and simplify technical integration. LiteLLM is particularly suitable for enterprise environments that need to use multiple LLM services simultaneously, significantly reducing management complexity and improving security.
Enterprise Pain Points #
As AI large language models (LLMs) evolve at a rapid pace, enterprises face numerous challenges:
Multi-vendor Management Dilemma
- Different AI service providers (such as OpenAI, Anthropic, Google, etc.) require different API keys
- Each vendor has independent billing systems and usage statistics
- Teams need to manage multiple platform accounts and keys
Cost Control Challenges
- Difficulty in tracking actual API usage across different teams
- Lack of unified cost accounting and budget management
- Inability to effectively monitor and optimize API call costs
Technical Integration Complexity
- Different vendors have varying API interface formats
- Need to maintain separate integration code for each vendor
- Inconsistent model parameters and calling methods
Security and Compliance Challenges
- Scattered API key storage increases security risks
- Difficulty in implementing unified access control and permission management
- Lack of complete audit logs and compliance records
Quick Start #
1. Environment Preparation #
Before starting, ensure your system meets the following requirements:
- Docker and Docker Compose are installed
- Required API keys are obtained (such as OpenRouter API Key or OpenAI Key)
2. Installation Steps #
mkdir litellm-server
cd litellm-server
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env   # append (>>) so the master key is not overwritten
source .env
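The config.yaml in the next step resolves os.environ/OPENROUTER_API_KEY at runtime, so the key must be present in the environment that Docker Compose loads. Assuming you are routing through OpenRouter, append it to the same .env file (the value below is a placeholder):
# Placeholder OpenRouter key; LiteLLM resolves os.environ/OPENROUTER_API_KEY from the container environment
echo 'OPENROUTER_API_KEY="sk-or-xxxx"' >> .env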
3. Configure LiteLLM #
vi config.yaml
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openrouter/openai/gpt-4o-mini   # OpenRouter model IDs include the provider prefix
      api_base: https://openrouter.ai/api/v1
      api_key: os.environ/OPENROUTER_API_KEY
  - model_name: "gpt-4o"
    litellm_params:
      model: "openrouter/openai/gpt-4o"
      api_base: "https://openrouter.ai/api/v1"
      api_key: "os.environ/OPENROUTER_API_KEY"

general_settings:
  store_model_in_db: true
  store_prompts_in_spend_logs: true
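The same pattern extends to other providers. As an illustrative sketch only (it assumes an Anthropic account and an ANTHROPIC_API_KEY entry in .env, neither of which is part of this guide), one more model_list entry might look like this:
model_list:
  # ... existing entries ...
  - model_name: claude-sonnet                       # alias exposed to clients
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620   # LiteLLM provider prefix + provider model ID (illustrative)
      api_key: os.environ/ANTHROPIC_API_KEY         # resolved from the environment, never hard-coded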
4. Configure Docker Compose #
version: "3.11"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    volumes:
      - ./config.yaml:/app/config.yaml   # mount the config.yaml created in the previous step
    command:
      - "--config=/app/config.yaml"
    ports:
      - "4000:4000"   # map the container port to the host; change the host port if necessary
    environment:
      DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
      STORE_MODEL_IN_DB: "True"   # allows adding models to the proxy via the UI
    env_file:
      - .env   # load the local .env file (master key, salt key, provider keys)
    depends_on:
      - db   # ensure the Postgres service starts first
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:4000/health/liveliness || exit 1"]
      interval: 30s      # perform a health check every 30 seconds
      timeout: 10s       # health check command times out after 10 seconds
      retries: 3         # retry up to 3 times if the health check fails
      start_period: 40s  # wait 40 seconds after container start before beginning health checks

  db:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    ports:
      - "54321:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data   # persist Postgres data across container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10

  prometheus:
    image: prom/prometheus
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9099:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
    restart: always

volumes:
  prometheus_data:
    driver: local
  postgres_data:
    name: litellm_postgres_data   # named volume for Postgres data persistence
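The prometheus service above mounts a ./prometheus.yml that this guide does not otherwise provide. A minimal sketch is shown below; the job name and scrape interval are illustrative, and scraping LiteLLM's /metrics endpoint assumes Prometheus metrics are enabled in your LiteLLM version (in recent releases this requires the prometheus callback and may be an enterprise feature):
# prometheus.yml — minimal illustrative configuration
global:
  scrape_interval: 30s              # how often to scrape targets (illustrative)
scrape_configs:
  - job_name: litellm               # illustrative job name
    metrics_path: /metrics
    static_configs:
      - targets: ["litellm:4000"]   # the LiteLLM service name on the Compose network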
5. Start Services #
# Start services
docker-compose up -d
# Check service status
docker-compose ps
# View logs
docker-compose logs -f
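Once the containers report healthy, you can confirm the proxy is reachable by hitting the same liveliness endpoint the health check uses:
# Should return successfully once LiteLLM is ready
curl -f http://localhost:4000/health/liveliness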
6. API Call Example #
Create virtual key: Visit http://localhost:4000/ui/?userID=default_user_id&page=api-keys
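If you prefer the API over the UI, a virtual key can also be created through the proxy's /key/generate endpoint, authenticated with the master key from .env; the models list and max_budget below are just examples:
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gpt-4o-mini", "gpt-4o"], "max_budget": 10}'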
Call example:
export LITELLM_API_KEY=xxxx   # the virtual key created above
curl http://localhost:4000/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-d '{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are an assistant. Your name is Li Si San"
    },
    {
      "role": "user",
      "content": "How are you? What is your name?"
    }
  ]
}'
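Because every model is served through the same OpenAI-compatible endpoint, switching to the gpt-4o entry configured earlier only requires changing the model field; the rest of the request stays identical:
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]
  }'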
7. Management Interface #
Access LiteLLM management interface: http://localhost:4000/ui
Main features:
- Model Management: View and configure available models
- Usage Statistics: Monitor API calls and costs
- Key Management: Create and manage virtual keys
- Log Viewing: View detailed call logs
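Most of this information is also exposed over the API; for example, the models available to a virtual key can be listed via the OpenAI-compatible models endpoint:
# List the models this virtual key is allowed to call
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_API_KEY"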
Summary #
LiteLLM, as a unified LLM proxy layer, provides enterprises with an elegant solution:
Unified API Interface
- Provides standardized interfaces, simplifying integration work
- Ability to switch between different AI service providers without modifying existing code
- Unified model parameters and calling methods
Centralized Key Management
- Virtual key system protecting original API keys
- Unified key management and rotation mechanism
- Fine-grained access control and permission management
Complete Monitoring and Analysis
- Unified usage statistics and cost analysis
- Detailed call logs and performance monitoring
- Support for multi-team resource allocation and billing
Flexible Deployment Options
- Support for local deployment and cloud services
- Scalable architecture design
- Rich integration options