From c14b69346a67d4c426d7ddb32ef32812c449e67c Mon Sep 17 00:00:00 2001 From: Mohamed Bassem Date: Sun, 12 Oct 2025 22:25:29 +0100 Subject: feat: support passing multiple proxy values (#2039) * feat: support passing multiple proxy values * fix typo * trim and filter --- docs/docs/03-configuration.md | 55 ++++++++++++++++++++++--------------------- 1 file changed, 28 insertions(+), 27 deletions(-) (limited to 'docs') diff --git a/docs/docs/03-configuration.md b/docs/docs/03-configuration.md index 0fadd0da..f57968be 100644 --- a/docs/docs/03-configuration.md +++ b/docs/docs/03-configuration.md @@ -2,28 +2,28 @@ The app is mainly configured by environment variables. All the used environment variables are listed in [packages/shared/config.ts](https://github.com/karakeep-app/karakeep/blob/main/packages/shared/config.ts). The most important ones are: -| Name | Required | Default | Description | -| ------------------------------- | ------------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| PORT | No | 3000 | The port on which the web server will listen. DON'T CHANGE THIS IF YOU'RE USING DOCKER, instead changed the docker bound external port. | -| WORKERS_PORT | No | 0 (Random Port) | The port on which the worker will export its prometheus metrics on `/metrics`. By default it's a random unused port. If you want to utilize those metrics, fix the port to a value (and export it in docker if you're using docker). | -| WORKERS_HOST | No | 127.0.0.1 | Host to listen to for requests to WORKERS_PORT. You will need to set this if running in a container, since localhost will not be reachable from outside | -| WORKERS_ENABLED_WORKERS | No | Not set | Comma separated list of worker names to enable. If set, only these workers will run. Valid values: crawler,inference,search,tidyAssets,video,feed,assetPreprocessing,webhook,ruleEngine. | -| WORKERS_DISABLED_WORKERS | No | Not set | Comma separated list of worker names to disable. Takes precedence over `WORKERS_ENABLED_WORKERS`. | -| DATA_DIR | Yes | Not set | The path for the persistent data directory. This is where the db lives. Assets are stored here by default unless `ASSETS_DIR` is set. | -| ASSETS_DIR | No | Not set | The path where crawled assets will be stored. If not set, defaults to `${DATA_DIR}/assets`. | -| NEXTAUTH_URL | Yes | Not set | Should point to the address of your server. The app will function without it, but will redirect you to wrong addresses on signout for example. | -| NEXTAUTH_SECRET | Yes | Not set | Random string used to sign the JWT tokens. Generate one with `openssl rand -base64 36`. | -| MEILI_ADDR | No | Not set | The address of meilisearch. If not set, Search will be disabled. E.g. (`http://meilisearch:7700`) | -| MEILI_MASTER_KEY | Only in Prod and if search is enabled | Not set | The master key configured for meilisearch. Not needed in development environment. Generate one with `openssl rand -base64 36 \| tr -dc 'A-Za-z0-9'` | -| MAX_ASSET_SIZE_MB | No | 50 | Sets the maximum allowed asset size (in MB) to be uploaded | -| DISABLE_NEW_RELEASE_CHECK | No | false | If set to true, latest release check will be disabled in the admin panel. | -| PROMETHEUS_AUTH_TOKEN | No | Random | Enable a prometheus metrics endpoint at `/api/metrics`. This endpoint will require this token being passed in the Authorization header as a Bearer token. If not set, a new random token is generated everytime at startup. This cannot contain any special characters or you may encounter a 400 Bad Request response. | -| RATE_LIMITING_ENABLED | No | false | If set to true, API rate limiting will be enabled. | -| DB_WAL_MODE | No | false | Enables WAL mode for the sqlite database. This should improve the performance of the database. There's no reason why you shouldn't set this to true unless you're running the db on a network attached drive. This will become the default at some time in the future. | -| SEARCH_NUM_WORKERS | No | 1 | Number of concurrent workers for search indexing tasks. Increase this if you have a high volume of content being indexed for search. | -| WEBHOOK_NUM_WORKERS | No | 1 | Number of concurrent workers for webhook delivery. Increase this if you have multiple webhook endpoints or high webhook traffic. | -| ASSET_PREPROCESSING_NUM_WORKERS | No | 1 | Number of concurrent workers for asset preprocessing tasks (image processing, OCR, etc.). Increase this if you have many images or documents that need processing. | -| RULE_ENGINE_NUM_WORKERS | No | 1 | Number of concurrent workers for rule engine processing. Increase this if you have complex automation rules that need to be processed quickly. | +| Name | Required | Default | Description | +| ------------------------------- | ------------------------------------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| PORT | No | 3000 | The port on which the web server will listen. DON'T CHANGE THIS IF YOU'RE USING DOCKER, instead changed the docker bound external port. | +| WORKERS_PORT | No | 0 (Random Port) | The port on which the worker will export its prometheus metrics on `/metrics`. By default it's a random unused port. If you want to utilize those metrics, fix the port to a value (and export it in docker if you're using docker). | +| WORKERS_HOST | No | 127.0.0.1 | Host to listen to for requests to WORKERS_PORT. You will need to set this if running in a container, since localhost will not be reachable from outside | +| WORKERS_ENABLED_WORKERS | No | Not set | Comma separated list of worker names to enable. If set, only these workers will run. Valid values: crawler,inference,search,tidyAssets,video,feed,assetPreprocessing,webhook,ruleEngine. | +| WORKERS_DISABLED_WORKERS | No | Not set | Comma separated list of worker names to disable. Takes precedence over `WORKERS_ENABLED_WORKERS`. | +| DATA_DIR | Yes | Not set | The path for the persistent data directory. This is where the db lives. Assets are stored here by default unless `ASSETS_DIR` is set. | +| ASSETS_DIR | No | Not set | The path where crawled assets will be stored. If not set, defaults to `${DATA_DIR}/assets`. | +| NEXTAUTH_URL | Yes | Not set | Should point to the address of your server. The app will function without it, but will redirect you to wrong addresses on signout for example. | +| NEXTAUTH_SECRET | Yes | Not set | Random string used to sign the JWT tokens. Generate one with `openssl rand -base64 36`. | +| MEILI_ADDR | No | Not set | The address of meilisearch. If not set, Search will be disabled. E.g. (`http://meilisearch:7700`) | +| MEILI_MASTER_KEY | Only in Prod and if search is enabled | Not set | The master key configured for meilisearch. Not needed in development environment. Generate one with `openssl rand -base64 36 \| tr -dc 'A-Za-z0-9'` | +| MAX_ASSET_SIZE_MB | No | 50 | Sets the maximum allowed asset size (in MB) to be uploaded | +| DISABLE_NEW_RELEASE_CHECK | No | false | If set to true, latest release check will be disabled in the admin panel. | +| PROMETHEUS_AUTH_TOKEN | No | Random | Enable a prometheus metrics endpoint at `/api/metrics`. This endpoint will require this token being passed in the Authorization header as a Bearer token. If not set, a new random token is generated everytime at startup. This cannot contain any special characters or you may encounter a 400 Bad Request response. | +| RATE_LIMITING_ENABLED | No | false | If set to true, API rate limiting will be enabled. | +| DB_WAL_MODE | No | false | Enables WAL mode for the sqlite database. This should improve the performance of the database. There's no reason why you shouldn't set this to true unless you're running the db on a network attached drive. This will become the default at some time in the future. | +| SEARCH_NUM_WORKERS | No | 1 | Number of concurrent workers for search indexing tasks. Increase this if you have a high volume of content being indexed for search. | +| WEBHOOK_NUM_WORKERS | No | 1 | Number of concurrent workers for webhook delivery. Increase this if you have multiple webhook endpoints or high webhook traffic. | +| ASSET_PREPROCESSING_NUM_WORKERS | No | 1 | Number of concurrent workers for asset preprocessing tasks (image processing, OCR, etc.). Increase this if you have many images or documents that need processing. | +| RULE_ENGINE_NUM_WORKERS | No | 1 | Number of concurrent workers for rule engine processing. Increase this if you have complex automation rules that need to be processed quickly. | ## Asset Storage @@ -139,6 +139,7 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin BROWSER_COOKIE_PATH specifies the path to a JSON file containing cookies to be loaded into the browser context for crawling. The JSON file must be an array of cookie objects, each with: + - name: The cookie name (required). - value: The cookie value (required). - Optional fields: domain, path, expires, httpOnly, secure, sameSite (values: "Strict", "Lax", or "None"). @@ -219,11 +220,11 @@ Karakeep can send emails for various purposes such as email verification during If your Karakeep instance needs to connect through a proxy server, you can configure the following settings: -| Name | Required | Default | Description | -| ------------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------- | -| CRAWLER_HTTP_PROXY | No | Not set | HTTP proxy server URL for outgoing HTTP requests (e.g., `http://proxy.example.com:8080`) | -| CRAWLER_HTTPS_PROXY | No | Not set | HTTPS proxy server URL for outgoing HTTPS requests (e.g., `http://proxy.example.com:8080`) | -| CRAWLER_NO_PROXY | No | Not set | Comma-separated list of hostnames/IPs that should bypass the proxy (e.g., `localhost,127.0.0.1,.local`) | +| Name | Required | Default | Description | +| ------------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| CRAWLER_HTTP_PROXY | No | Not set | HTTP proxy server URL for outgoing HTTP requests (e.g., `http://proxy.example.com:8080`). You can pass multiple comma separated proxies and the used one will be chosen at random. | +| CRAWLER_HTTPS_PROXY | No | Not set | HTTPS proxy server URL for outgoing HTTPS requests (e.g., `http://proxy.example.com:8080`). You can pass multiple comma separated proxies and the used one will be chosen at random. | +| CRAWLER_NO_PROXY | No | Not set | Comma-separated list of hostnames/IPs that should bypass the proxy (e.g., `localhost,127.0.0.1,.local`) | :::info These proxy settings will be used by the crawler and other components that make outgoing HTTP requests. -- cgit v1.2.3-70-g09d2