Commit 5fbcd22

[Fix] Allow overriding all constants using a .env variable (#10803)
* fix: bump: DEFAULT_MAX_RECURSE_DEPTH
* fix: bump: DEFAULT_MAX_RECURSE_DEPTH
* test: test_vertex_ai_complex_response_schema
* fix: allow all constants to be overriden
* fix: allow all numeric constants to be overriden with env vars
* fix: remove dup DEFAULT_MAX_TOKENS in constants.py
* document all constants env vars
* docs - DEFAULT_PROMPT_INJECTION_SIMILARITY_THRESHOLD
1 parent a6412cd commit 5fbcd22
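
The commit message describes letting numeric constants be overridden through environment variables. A minimal sketch of that pattern follows; the helper names `env_int` and `env_float` are hypothetical and are not taken from `constants.py`, whose actual implementation is not shown in this diff. The defaults used are the ones documented in the table below.

```python
import os


def env_int(name: str, default: int) -> int:
    """Return the env var `name` parsed as int, or `default` if it is unset.

    Hypothetical helper for illustration only.
    """
    raw = os.getenv(name)
    return default if raw is None else int(raw)


def env_float(name: str, default: float) -> float:
    """Return the env var `name` parsed as float, or `default` if it is unset.

    Hypothetical helper for illustration only.
    """
    raw = os.getenv(name)
    return default if raw is None else float(raw)


# Constants fall back to their documented defaults when no override is set.
DEFAULT_MAX_TOKENS = env_int("DEFAULT_MAX_TOKENS", 4096)
DEFAULT_TRIM_RATIO = env_float("DEFAULT_TRIM_RATIO", 0.75)
```

With this pattern, a line such as `DEFAULT_MAX_TOKENS=8192` in a `.env` file (loaded into the process environment before the module is imported) would change the value without a code change.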

3 files changed: +341 -103 lines changed

docs/my-website/docs/proxy/config_settings.md

Lines changed: 100 additions & 0 deletions
@@ -331,14 +331,19 @@ router_settings:
 | AZURE_PASSWORD | Password for Azure services, use in conjunction with AZURE_USERNAME for azure ad token with basic username/password workflow
 | AZURE_FEDERATED_TOKEN_FILE | File path to Azure federated token
 | AZURE_KEY_VAULT_URI | URI for Azure Key Vault
+| AZURE_OPERATION_POLLING_TIMEOUT | Timeout in seconds for Azure operation polling
 | AZURE_STORAGE_ACCOUNT_KEY | The Azure Storage Account Key to use for Authentication to Azure Blob Storage logging
 | AZURE_STORAGE_ACCOUNT_NAME | Name of the Azure Storage Account to use for logging to Azure Blob Storage
 | AZURE_STORAGE_FILE_SYSTEM | Name of the Azure Storage File System to use for logging to Azure Blob Storage. (Typically the Container name)
 | AZURE_STORAGE_TENANT_ID | The Application Tenant ID to use for Authentication to Azure Blob Storage logging
 | AZURE_STORAGE_CLIENT_ID | The Application Client ID to use for Authentication to Azure Blob Storage logging
 | AZURE_STORAGE_CLIENT_SECRET | The Application Client Secret to use for Authentication to Azure Blob Storage logging
+| BATCH_STATUS_POLL_INTERVAL_SECONDS | Interval in seconds for polling batch status. Default is 3600 (1 hour)
+| BATCH_STATUS_POLL_MAX_ATTEMPTS | Maximum number of attempts for polling batch status. Default is 24 (for 24 hours)
+| BEDROCK_MAX_POLICY_SIZE | Maximum size for Bedrock policy. Default is 75
 | BERRISPEND_ACCOUNT_ID | Account ID for BerriSpend service
 | BRAINTRUST_API_KEY | API key for Braintrust integration
+| CACHED_STREAMING_CHUNK_DELAY | Delay in seconds for cached streaming chunks. Default is 0.02
 | CIRCLE_OIDC_TOKEN | OpenID Connect token for CircleCI
 | CIRCLE_OIDC_TOKEN_V2 | Version 2 of the OpenID Connect token for CircleCI
 | CONFIG_FILE_PATH | File path for configuration file
| CONFIG_FILE_PATH | File path for configuration file
@@ -352,6 +357,9 @@ router_settings:
 | DATABASE_USER | Username for database connection
 | DATABASE_USERNAME | Alias for database user
 | DATABRICKS_API_BASE | Base URL for Databricks API
+| DAYS_IN_A_MONTH | Days in a month for calculation purposes. Default is 28
+| DAYS_IN_A_WEEK | Days in a week for calculation purposes. Default is 7
+| DAYS_IN_A_YEAR | Days in a year for calculation purposes. Default is 365
 | DD_BASE_URL | Base URL for Datadog integration
 | DATADOG_BASE_URL | (Alternative to DD_BASE_URL) Base URL for Datadog integration
 | _DATADOG_BASE_URL | (Alternative to DD_BASE_URL) Base URL for Datadog integration
@@ -362,15 +370,58 @@ router_settings:
 | DD_SERVICE | Service identifier for Datadog logs. Defaults to "litellm-server"
 | DD_VERSION | Version identifier for Datadog logs. Defaults to "unknown"
 | DEBUG_OTEL | Enable debug mode for OpenTelemetry
+| DEFAULT_ALLOWED_FAILS | Maximum failures allowed before cooling down a model. Default is 3
+| DEFAULT_ANTHROPIC_CHAT_MAX_TOKENS | Default maximum tokens for Anthropic chat completions. Default is 4096
+| DEFAULT_BATCH_SIZE | Default batch size for operations. Default is 512
+| DEFAULT_COOLDOWN_TIME_SECONDS | Duration in seconds to cooldown a model after failures. Default is 5
+| DEFAULT_CRON_JOB_LOCK_TTL_SECONDS | Time-to-live for cron job locks in seconds. Default is 60 (1 minute)
+| DEFAULT_FAILURE_THRESHOLD_PERCENT | Threshold percentage of failures to cool down a deployment. Default is 0.5 (50%)
+| DEFAULT_FLUSH_INTERVAL_SECONDS | Default interval in seconds for flushing operations. Default is 5
+| DEFAULT_HEALTH_CHECK_INTERVAL | Default interval in seconds for health checks. Default is 300 (5 minutes)
+| DEFAULT_IMAGE_HEIGHT | Default height for images. Default is 300
+| DEFAULT_IMAGE_TOKEN_COUNT | Default token count for images. Default is 250
+| DEFAULT_IMAGE_WIDTH | Default width for images. Default is 300
+| DEFAULT_IN_MEMORY_TTL | Default time-to-live for in-memory cache in seconds. Default is 5
+| DEFAULT_MAX_LRU_CACHE_SIZE | Default maximum size for LRU cache. Default is 16
+| DEFAULT_MAX_RECURSE_DEPTH | Default maximum recursion depth. Default is 100
+| DEFAULT_MAX_RETRIES | Default maximum retry attempts. Default is 2
+| DEFAULT_MAX_TOKENS | Default maximum tokens for LLM calls. Default is 4096
+| DEFAULT_MAX_TOKENS_FOR_TRITON | Default maximum tokens for Triton models. Default is 2000
+| DEFAULT_MOCK_RESPONSE_COMPLETION_TOKEN_COUNT | Default token count for mock response completions. Default is 20
+| DEFAULT_MOCK_RESPONSE_PROMPT_TOKEN_COUNT | Default token count for mock response prompts. Default is 10
+| DEFAULT_MODEL_CREATED_AT_TIME | Default creation timestamp for models. Default is 1677610602
+| DEFAULT_PROMPT_INJECTION_SIMILARITY_THRESHOLD | Default threshold for prompt injection similarity. Default is 0.7
+| DEFAULT_POLLING_INTERVAL | Default polling interval for schedulers in seconds. Default is 0.03
+| DEFAULT_REASONING_EFFORT_HIGH_THINKING_BUDGET | Default high reasoning effort thinking budget. Default is 4096
+| DEFAULT_REASONING_EFFORT_LOW_THINKING_BUDGET | Default low reasoning effort thinking budget. Default is 1024
+| DEFAULT_REASONING_EFFORT_MEDIUM_THINKING_BUDGET | Default medium reasoning effort thinking budget. Default is 2048
+| DEFAULT_REDIS_SYNC_INTERVAL | Default Redis synchronization interval in seconds. Default is 1
+| DEFAULT_REPLICATE_GPU_PRICE_PER_SECOND | Default price per second for Replicate GPU. Default is 0.001400
+| DEFAULT_REPLICATE_POLLING_DELAY_SECONDS | Default delay in seconds for Replicate polling. Default is 1
+| DEFAULT_REPLICATE_POLLING_RETRIES | Default number of retries for Replicate polling. Default is 5
+| DEFAULT_SLACK_ALERTING_THRESHOLD | Default threshold for Slack alerting. Default is 300
+| DEFAULT_SOFT_BUDGET | Default soft budget for LiteLLM proxy keys. Default is 50.0
+| DEFAULT_TRIM_RATIO | Default ratio of tokens to trim from prompt end. Default is 0.75
 | DIRECT_URL | Direct URL for service endpoint
 | DISABLE_ADMIN_UI | Toggle to disable the admin UI
 | DISABLE_SCHEMA_UPDATE | Toggle to disable schema updates
 | DOCS_DESCRIPTION | Description text for documentation pages
 | DOCS_FILTERED | Flag indicating filtered documentation
 | DOCS_TITLE | Title of the documentation pages
 | DOCS_URL | The path to the Swagger API documentation. **By default this is "/"**
+| EMAIL_LOGO_URL | URL for the logo used in emails
 | EMAIL_SUPPORT_CONTACT | Support contact email address
 | EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING | Flag to enable new multi-instance rate limiting. **Default is False**
+| FIREWORKS_AI_4_B | Size parameter for Fireworks AI 4B model. Default is 4
+| FIREWORKS_AI_16_B | Size parameter for Fireworks AI 16B model. Default is 16
+| FIREWORKS_AI_56_B_MOE | Size parameter for Fireworks AI 56B MOE model. Default is 56
+| FIREWORKS_AI_80_B | Size parameter for Fireworks AI 80B model. Default is 80
+| FIREWORKS_AI_176_B_MOE | Size parameter for Fireworks AI 176B MOE model. Default is 176
+| FUNCTION_DEFINITION_TOKEN_COUNT | Token count for function definitions. Default is 9
+| GALILEO_BASE_URL | Base URL for Galileo platform
+| GALILEO_PASSWORD | Password for Galileo authentication
+| GALILEO_PROJECT_ID | Project ID for Galileo usage
+| GALILEO_USERNAME | Username for Galileo authentication
 | GCS_BUCKET_NAME | Name of the Google Cloud Storage bucket
 | GCS_PATH_SERVICE_ACCOUNT | Path to the Google Cloud service account JSON file
 | GCS_FLUSH_INTERVAL | Flush interval for GCS logging (in seconds). Specify how often you want a log to be sent to GCS. **Default is 20 seconds**
@@ -402,6 +453,7 @@ router_settings:
 | GOOGLE_CLIENT_ID | Client ID for Google OAuth
 | GOOGLE_CLIENT_SECRET | Client secret for Google OAuth
 | GOOGLE_KMS_RESOURCE_NAME | Name of the resource in Google KMS
+| HEALTH_CHECK_TIMEOUT_SECONDS | Timeout in seconds for health checks. Default is 60
 | HF_API_BASE | Base URL for Hugging Face API
 | HCP_VAULT_ADDR | Address for [Hashicorp Vault Secret Manager](../secret.md#hashicorp-vault)
 | HCP_VAULT_CLIENT_CERT | Path to client certificate for [Hashicorp Vault Secret Manager](../secret.md#hashicorp-vault)
@@ -411,9 +463,13 @@ router_settings:
 | HCP_VAULT_CERT_ROLE | Role for [Hashicorp Vault Secret Manager Auth](../secret.md#hashicorp-vault)
 | HELICONE_API_KEY | API key for Helicone service
 | HOSTNAME | Hostname for the server, this will be [emitted to `datadog` logs](https://docs.litellm.ai/docs/proxy/logging#datadog)
+| HOURS_IN_A_DAY | Hours in a day for calculation purposes. Default is 24
 | HUGGINGFACE_API_BASE | Base URL for Hugging Face API
 | HUGGINGFACE_API_KEY | API key for Hugging Face API
+| HUMANLOOP_PROMPT_CACHE_TTL_SECONDS | Time-to-live in seconds for cached prompts in Humanloop. Default is 60
 | IAM_TOKEN_DB_AUTH | IAM token for database authentication
+| INITIAL_RETRY_DELAY | Initial delay in seconds for retrying requests. Default is 0.5
+| JITTER | Jitter factor for retry delay calculations. Default is 0.75
 | JSON_LOGS | Enable JSON formatted logging
 | JWT_AUDIENCE | Expected audience for JWT tokens
 | JWT_PUBLIC_KEY_URL | URL to fetch public key for JWT verification
@@ -434,6 +490,7 @@ router_settings:
 | LANGSMITH_PROJECT | Project name for Langsmith integration
 | LANGSMITH_SAMPLING_RATE | Sampling rate for Langsmith logging
 | LANGTRACE_API_KEY | API key for Langtrace service
+| LENGTH_OF_LITELLM_GENERATED_KEY | Length of keys generated by LiteLLM. Default is 16
 | LITERAL_API_KEY | API key for Literal integration
 | LITERAL_API_URL | API URL for Literal service
 | LITERAL_BATCH_SIZE | Batch size for Literal operations
@@ -454,6 +511,21 @@ router_settings:
 | LITELLM_TOKEN | Access token for LiteLLM integration
 | LITELLM_PRINT_STANDARD_LOGGING_PAYLOAD | If true, prints the standard logging payload to the console - useful for debugging
 | LOGFIRE_TOKEN | Token for Logfire logging service
+| MAX_EXCEPTION_MESSAGE_LENGTH | Maximum length for exception messages. Default is 2000
+| MAX_IN_MEMORY_QUEUE_FLUSH_COUNT | Maximum count for in-memory queue flush operations. Default is 1000
+| MAX_LONG_SIDE_FOR_IMAGE_HIGH_RES | Maximum length for the long side of high-resolution images. Default is 2000
+| MAX_REDIS_BUFFER_DEQUEUE_COUNT | Maximum count for Redis buffer dequeue operations. Default is 100
+| MAX_SHORT_SIDE_FOR_IMAGE_HIGH_RES | Maximum length for the short side of high-resolution images. Default is 768
+| MAX_SIZE_IN_MEMORY_QUEUE | Maximum size for in-memory queue. Default is 10000
+| MAX_SIZE_PER_ITEM_IN_MEMORY_CACHE_IN_KB | Maximum size in KB for each item in memory cache. Default is 512 or 1024
+| MAX_SPENDLOG_ROWS_TO_QUERY | Maximum number of spend log rows to query. Default is 1,000,000
+| MAX_TEAM_LIST_LIMIT | Maximum number of teams to list. Default is 20
+| MAX_TILE_HEIGHT | Maximum height for image tiles. Default is 512
+| MAX_TILE_WIDTH | Maximum width for image tiles. Default is 512
+| MAX_TOKEN_TRIMMING_ATTEMPTS | Maximum number of attempts to trim a token message. Default is 10
+| MAX_RETRY_DELAY | Maximum delay in seconds for retrying requests. Default is 8.0
+| MIN_NON_ZERO_TEMPERATURE | Minimum non-zero temperature value. Default is 0.0001
+| MINIMUM_PROMPT_CACHE_TOKEN_COUNT | Minimum token count for caching a prompt. Default is 1024
 | MISTRAL_API_BASE | Base URL for Mistral API
 | MISTRAL_API_KEY | API key for Mistral API
 | MICROSOFT_CLIENT_ID | Client ID for Microsoft services
@@ -462,10 +534,12 @@ router_settings:
 | MICROSOFT_SERVICE_PRINCIPAL_ID | Service Principal ID for Microsoft Enterprise Application. (This is an advanced feature if you want litellm to auto-assign members to Litellm Teams based on their Microsoft Entra ID Groups)
 | NO_DOCS | Flag to disable documentation generation
 | NO_PROXY | List of addresses to bypass proxy
+| NON_LLM_CONNECTION_TIMEOUT | Timeout in seconds for non-LLM service connections. Default is 15
 | OAUTH_TOKEN_INFO_ENDPOINT | Endpoint for OAuth token info retrieval
 | OPENAI_BASE_URL | Base URL for OpenAI API
 | OPENAI_API_BASE | Base URL for OpenAI API
 | OPENAI_API_KEY | API key for OpenAI services
+| OPENAI_FILE_SEARCH_COST_PER_1K_CALLS | Cost per 1000 calls for OpenAI file search. Default is 0.0025
 | OPENAI_ORGANIZATION | Organization identifier for OpenAI
 | OPENID_BASE_URL | Base URL for OpenID Connect services
 | OPENID_CLIENT_ID | Client ID for OpenID Connect authentication
@@ -487,21 +561,37 @@ router_settings:
 | PREDIBASE_API_BASE | Base URL for Predibase API
 | PRESIDIO_ANALYZER_API_BASE | Base URL for Presidio Analyzer service
 | PRESIDIO_ANONYMIZER_API_BASE | Base URL for Presidio Anonymizer service
+| PROMETHEUS_BUDGET_METRICS_REFRESH_INTERVAL_MINUTES | Refresh interval in minutes for Prometheus budget metrics. Default is 5
+| PROMETHEUS_FALLBACK_STATS_SEND_TIME_HOURS | Fallback time in hours for sending stats to Prometheus. Default is 9
 | PROMETHEUS_URL | URL for Prometheus service
 | PROMPTLAYER_API_KEY | API key for PromptLayer integration
 | PROXY_ADMIN_ID | Admin identifier for proxy server
 | PROXY_BASE_URL | Base URL for proxy service
+| PROXY_BATCH_WRITE_AT | Time in seconds to wait before batch writing spend logs to the database. Default is 10
+| PROXY_BUDGET_RESCHEDULER_MAX_TIME | Maximum time in seconds to wait before checking database for budget resets. Default is 605
+| PROXY_BUDGET_RESCHEDULER_MIN_TIME | Minimum time in seconds to wait before checking database for budget resets. Default is 597
 | PROXY_LOGOUT_URL | URL for logging out of the proxy service
 | LITELLM_MASTER_KEY | Master key for proxy authentication
 | QDRANT_API_BASE | Base URL for Qdrant API
 | QDRANT_API_KEY | API key for Qdrant service
+| QDRANT_SCALAR_QUANTILE | Scalar quantile for Qdrant operations. Default is 0.99
 | QDRANT_URL | Connection URL for Qdrant database
+| QDRANT_VECTOR_SIZE | Vector size for Qdrant operations. Default is 1536
+| REDIS_CONNECTION_POOL_TIMEOUT | Timeout in seconds for Redis connection pool. Default is 5
 | REDIS_HOST | Hostname for Redis server
 | REDIS_PASSWORD | Password for Redis service
 | REDIS_PORT | Port number for Redis server
+| REDIS_SOCKET_TIMEOUT | Timeout in seconds for Redis socket operations. Default is 0.1
 | REDOC_URL | The path to the Redoc Fast API documentation. **By default this is "/redoc"**
+| REPEATED_STREAMING_CHUNK_LIMIT | Limit for repeated streaming chunks to detect looping. Default is 100
+| REPLICATE_MODEL_NAME_WITH_ID_LENGTH | Length of Replicate model names with ID. Default is 64
+| REPLICATE_POLLING_DELAY_SECONDS | Delay in seconds for Replicate polling operations. Default is 0.5
+| REQUEST_TIMEOUT | Timeout in seconds for requests. Default is 6000
+| ROUTER_MAX_FALLBACKS | Maximum number of fallbacks for router. Default is 5
+| SECRET_MANAGER_REFRESH_INTERVAL | Refresh interval in seconds for secret manager. Default is 86400 (24 hours)
 | SERVER_ROOT_PATH | Root path for the server application
 | SET_VERBOSE | Flag to enable verbose logging
+| SINGLE_DEPLOYMENT_TRAFFIC_FAILURE_THRESHOLD | Minimum number of requests to consider "reasonable traffic" for single-deployment cooldown logic. Default is 1000
 | SLACK_DAILY_REPORT_FREQUENCY | Frequency of daily Slack reports (e.g., daily, weekly)
 | SLACK_WEBHOOK_URL | Webhook URL for Slack integration
 | SMTP_HOST | Hostname for the SMTP server
| SMTP_HOST | Hostname for the SMTP server
@@ -518,7 +608,17 @@ router_settings:
 | SUPABASE_KEY | API key for Supabase service
 | SUPABASE_URL | Base URL for Supabase instance
 | STORE_MODEL_IN_DB | If true, enables storing model + credential information in the DB.
+| SYSTEM_MESSAGE_TOKEN_COUNT | Token count for system messages. Default is 4
 | TEST_EMAIL_ADDRESS | Email address used for testing purposes
+| TOGETHER_AI_4_B | Size parameter for Together AI 4B model. Default is 4
+| TOGETHER_AI_8_B | Size parameter for Together AI 8B model. Default is 8
+| TOGETHER_AI_21_B | Size parameter for Together AI 21B model. Default is 21
+| TOGETHER_AI_41_B | Size parameter for Together AI 41B model. Default is 41
+| TOGETHER_AI_80_B | Size parameter for Together AI 80B model. Default is 80
+| TOGETHER_AI_110_B | Size parameter for Together AI 110B model. Default is 110
+| TOGETHER_AI_EMBEDDING_150_M | Size parameter for Together AI 150M embedding model. Default is 150
+| TOGETHER_AI_EMBEDDING_350_M | Size parameter for Together AI 350M embedding model. Default is 350
+| TOOL_CHOICE_OBJECT_TOKEN_COUNT | Token count for tool choice objects. Default is 4
 | UI_LOGO_PATH | Path to the logo image used in the UI
 | UI_PASSWORD | Password for accessing the UI
 | UI_USERNAME | Username for accessing the UI
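
The table documents `INITIAL_RETRY_DELAY` (0.5), `MAX_RETRY_DELAY` (8.0) and `JITTER` (0.75) but does not say how they combine. The sketch below shows one common way such values are used, capped exponential backoff with a random reduction. This is an assumption for illustration, not LiteLLM's confirmed formula, and `retry_delay` is a hypothetical function name.

```python
import os
import random
from typing import Callable

# Read overrides from the environment, falling back to the documented defaults.
INITIAL_RETRY_DELAY = float(os.getenv("INITIAL_RETRY_DELAY", "0.5"))
MAX_RETRY_DELAY = float(os.getenv("MAX_RETRY_DELAY", "8.0"))
JITTER = float(os.getenv("JITTER", "0.75"))


def retry_delay(
    attempt: int,
    initial: float = INITIAL_RETRY_DELAY,
    max_delay: float = MAX_RETRY_DELAY,
    jitter: float = JITTER,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return the seconds to wait before retry number `attempt` (0-based).

    Assumed formula: double the delay on each attempt, cap it at `max_delay`,
    then shrink it by a random fraction of up to `jitter`.
    """
    capped = min(initial * (2.0 ** attempt), max_delay)
    return capped * (1.0 - jitter * rng())


if __name__ == "__main__":
    for attempt in range(6):
        print(attempt, round(retry_delay(attempt), 3))
```

Passing `rng` explicitly keeps the function deterministic in tests, while the default draws a fresh random value on every call.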
