Diffstat (limited to 'docs')
 docs/docs/03-configuration/01-environment-variables.md | 77
 1 file changed, 44 insertions(+), 33 deletions(-)
diff --git a/docs/docs/03-configuration/01-environment-variables.md b/docs/docs/03-configuration/01-environment-variables.md
index 8caef0df..5584e620 100644
--- a/docs/docs/03-configuration/01-environment-variables.md
+++ b/docs/docs/03-configuration/01-environment-variables.md
@@ -2,35 +2,34 @@
The app is mainly configured through environment variables. All supported environment variables are listed in [packages/shared/config.ts](https://github.com/karakeep-app/karakeep/blob/main/packages/shared/config.ts). The most important ones are:
-| Name | Required | Default | Description |
-| -------------------------------------- | ------------------------------------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| PORT                                    | No                                     | 3000            | The port on which the web server will listen. DON'T CHANGE THIS IF YOU'RE USING DOCKER; instead, change the Docker-bound external port. |
-| WORKERS_PORT                            | No                                     | 0 (Random Port) | The port on which the worker will expose its Prometheus metrics on `/metrics`. By default it's a random unused port. If you want to use those metrics, set the port to a fixed value (and expose it in Docker if you're using Docker). |
-| WORKERS_HOST                            | No                                     | 127.0.0.1       | Host to listen on for requests to WORKERS_PORT. You will need to set this if running in a container, since localhost will not be reachable from outside the container. |
-| WORKERS_ENABLED_WORKERS                 | No                                     | Not set         | Comma-separated list of worker names to enable. If set, only these workers will run. Valid values: crawler,inference,search,adminMaintenance,video,feed,assetPreprocessing,webhook,ruleEngine. |
-| WORKERS_DISABLED_WORKERS                | No                                     | Not set         | Comma-separated list of worker names to disable. Takes precedence over `WORKERS_ENABLED_WORKERS`. |
-| LOG_LEVEL | No | debug | The application log level as defined in the [winston documentation](https://github.com/winstonjs/winston?tab=readme-ov-file#logging-levels). You may want to set this to `notice` or `warning` when running Karakeep in a production environment. |
-| DATA_DIR | Yes | Not set | The path for the persistent data directory. This is where the db lives. Assets are stored here by default unless `ASSETS_DIR` is set. |
-| ASSETS_DIR | No | Not set | The path where crawled assets will be stored. If not set, defaults to `${DATA_DIR}/assets`. |
-| NEXTAUTH_URL                            | Yes                                    | Not set         | Should point to the address of your server. The app will function without it, but will, for example, redirect you to the wrong address on sign-out. |
-| NEXTAUTH_SECRET | Yes | Not set | Random string used to sign the JWT tokens. Generate one with `openssl rand -base64 36`. |
-| MEILI_ADDR                              | No                                     | Not set         | The address of Meilisearch, e.g. `http://meilisearch:7700`. If not set, search will be disabled. |
-| MEILI_MASTER_KEY                        | Only in Prod and if search is enabled  | Not set         | The master key configured for Meilisearch. Not needed in a development environment. Generate one with `openssl rand -base64 36 \| tr -dc 'A-Za-z0-9'`. |
-| MAX_ASSET_SIZE_MB                       | No                                     | 50              | Sets the maximum allowed size (in MB) of uploaded assets. |
-| DISABLE_NEW_RELEASE_CHECK               | No                                     | false           | If set to true, the latest-release check in the admin panel will be disabled. |
-| PROMETHEUS_AUTH_TOKEN                   | No                                     | Random          | Enables a Prometheus metrics endpoint at `/api/metrics`. The endpoint requires this token to be passed in the `Authorization` header as a Bearer token. If not set, a new random token is generated on every startup. The token cannot contain special characters, or you may encounter a 400 Bad Request response. |
-| RATE_LIMITING_ENABLED | No | false | If set to true, API rate limiting will be enabled. |
-| CRAWLER_DOMAIN_RATE_LIMIT_WINDOW_MS | No | Not set | Time window in milliseconds for per-domain crawler rate limiting. |
-| CRAWLER_DOMAIN_RATE_LIMIT_MAX_REQUESTS | No | Not set | Maximum crawler requests allowed per domain inside the configured window. |
-| DB_WAL_MODE                             | No                                     | false           | Enables WAL mode for the SQLite database, which should improve its performance. There's no reason not to set this to true unless you're running the database on a network-attached drive. This will become the default at some point in the future. |
-| SEARCH_NUM_WORKERS | No | 1 | Number of concurrent workers for search indexing tasks. Increase this if you have a high volume of content being indexed for search. |
-| SEARCH_JOB_TIMEOUT_SEC | No | 30 | How long to wait for a search indexing job to finish before timing out. Increase this if you have large bookmarks with extensive content that takes longer to index. |
-| WEBHOOK_NUM_WORKERS | No | 1 | Number of concurrent workers for webhook delivery. Increase this if you have multiple webhook endpoints or high webhook traffic. |
-| ASSET_PREPROCESSING_NUM_WORKERS | No | 1 | Number of concurrent workers for asset preprocessing tasks (image processing, OCR, etc.). Increase this if you have many images or documents that need processing. |
-| ASSET_PREPROCESSING_JOB_TIMEOUT_SEC | No | 60 | How long to wait for an asset preprocessing job to finish before timing out. Increase this if you have large images or PDFs that take longer to process. |
-| RULE_ENGINE_NUM_WORKERS | No | 1 | Number of concurrent workers for rule engine processing. Increase this if you have complex automation rules that need to be processed quickly. |
-| MAX_RSS_FEEDS_PER_USER | No | 1000 | The maximum number of RSS feeds a user can create. |
-| MAX_WEBHOOKS_PER_USER | No | 100 | The maximum number of webhooks a user can create. |
+| Name | Required | Default | Description |
+| -------------------------------------- | ------------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| PORT                                    | No                                     | 3000            | The port on which the web server will listen. DON'T CHANGE THIS IF YOU'RE USING DOCKER; instead, change the Docker-bound external port. |
+| WORKERS_PORT                            | No                                     | 0 (Random Port) | The port on which the worker will expose its Prometheus metrics on `/metrics`. By default it's a random unused port. If you want to use those metrics, set the port to a fixed value (and expose it in Docker if you're using Docker). |
+| WORKERS_HOST                            | No                                     | 127.0.0.1       | Host to listen on for requests to WORKERS_PORT. You will need to set this if running in a container, since localhost will not be reachable from outside the container. |
+| WORKERS_ENABLED_WORKERS                 | No                                     | Not set         | Comma-separated list of worker names to enable. If set, only these workers will run. Valid values: crawler,inference,search,adminMaintenance,video,feed,assetPreprocessing,webhook,ruleEngine. |
+| WORKERS_DISABLED_WORKERS                | No                                     | Not set         | Comma-separated list of worker names to disable. Takes precedence over `WORKERS_ENABLED_WORKERS`. |
+| LOG_LEVEL | No | debug | The application log level as defined in the [winston documentation](https://github.com/winstonjs/winston?tab=readme-ov-file#logging-levels). You may want to set this to `notice` or `warning` when running Karakeep in a production environment. |
+| DATA_DIR | Yes | Not set | The path for the persistent data directory. This is where the db lives. Assets are stored here by default unless `ASSETS_DIR` is set. |
+| ASSETS_DIR | No | Not set | The path where crawled assets will be stored. If not set, defaults to `${DATA_DIR}/assets`. |
+| NEXTAUTH_URL                            | Yes                                    | Not set         | Should point to the address of your server. The app will function without it, but will, for example, redirect you to the wrong address on sign-out. |
+| NEXTAUTH_SECRET | Yes | Not set | Random string used to sign the JWT tokens. Generate one with `openssl rand -base64 36`. |
+| MEILI_ADDR                              | No                                     | Not set         | The address of Meilisearch, e.g. `http://meilisearch:7700`. If not set, search will be disabled. |
+| MEILI_MASTER_KEY                        | Only in Prod and if search is enabled  | Not set         | The master key configured for Meilisearch. Not needed in a development environment. Generate one with `openssl rand -base64 36 \| tr -dc 'A-Za-z0-9'`. |
+| MAX_ASSET_SIZE_MB                       | No                                     | 50              | Sets the maximum allowed size (in MB) of uploaded assets. |
+| DISABLE_NEW_RELEASE_CHECK               | No                                     | false           | If set to true, the latest-release check in the admin panel will be disabled. |
+| RATE_LIMITING_ENABLED | No | false | If set to true, API rate limiting will be enabled. |
+| CRAWLER_DOMAIN_RATE_LIMIT_WINDOW_MS | No | Not set | Time window in milliseconds for per-domain crawler rate limiting. |
+| CRAWLER_DOMAIN_RATE_LIMIT_MAX_REQUESTS | No | Not set | Maximum crawler requests allowed per domain inside the configured window. |
+| DB_WAL_MODE                             | No                                     | false           | Enables WAL mode for the SQLite database, which should improve its performance. There's no reason not to set this to true unless you're running the database on a network-attached drive. This will become the default at some point in the future. |
+| SEARCH_NUM_WORKERS | No | 1 | Number of concurrent workers for search indexing tasks. Increase this if you have a high volume of content being indexed for search. |
+| SEARCH_JOB_TIMEOUT_SEC | No | 30 | How long to wait for a search indexing job to finish before timing out. Increase this if you have large bookmarks with extensive content that takes longer to index. |
+| WEBHOOK_NUM_WORKERS | No | 1 | Number of concurrent workers for webhook delivery. Increase this if you have multiple webhook endpoints or high webhook traffic. |
+| ASSET_PREPROCESSING_NUM_WORKERS | No | 1 | Number of concurrent workers for asset preprocessing tasks (image processing, OCR, etc.). Increase this if you have many images or documents that need processing. |
+| ASSET_PREPROCESSING_JOB_TIMEOUT_SEC | No | 60 | How long to wait for an asset preprocessing job to finish before timing out. Increase this if you have large images or PDFs that take longer to process. |
+| RULE_ENGINE_NUM_WORKERS | No | 1 | Number of concurrent workers for rule engine processing. Increase this if you have complex automation rules that need to be processed quickly. |
+| MAX_RSS_FEEDS_PER_USER | No | 1000 | The maximum number of RSS feeds a user can create. |
+| MAX_WEBHOOKS_PER_USER | No | 100 | The maximum number of webhooks a user can create. |
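+As a concrete starting point, here is a minimal `.env` sketch for a typical Docker deployment (hostnames and secrets are illustrative placeholders; generate your own):
+
+```shell
+# Persistent data directory: the database lives here, and assets too unless ASSETS_DIR is set
+DATA_DIR=/data
+# The public address of your instance, so redirects (e.g. on sign-out) work correctly
+NEXTAUTH_URL=https://karakeep.example.com
+# Generate with: openssl rand -base64 36
+NEXTAUTH_SECRET=replace-with-a-random-string
+# Optional: enable search by pointing at a Meilisearch instance
+MEILI_ADDR=http://meilisearch:7700
+MEILI_MASTER_KEY=replace-with-another-random-string
+```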
## Asset Storage
@@ -93,7 +92,7 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin
| Name | Required | Default | Description |
| ------------------------------------ | -------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| OPENAI_API_KEY                        | No       | Not set                | The OpenAI key used for automatic tagging. More on that [here](../integrations/openai). |
+| OPENAI_API_KEY                        | No       | Not set                | The OpenAI key used for automatic tagging. More on that [here](../integrations/openai). |
| OPENAI_BASE_URL                       | No       | Not set                | If you just want to use OpenAI, you don't need to pass this variable. If, however, you want to use another OpenAI-compatible API (e.g. the Azure OpenAI service), set this to the URL of that API. |
| OPENAI_PROXY_URL | No | Not set | HTTP proxy server URL for OpenAI API requests (e.g., `http://proxy.example.com:8080`). |
| OLLAMA_BASE_URL                       | No       | Not set                | If you want to use Ollama for local inference, set the address of the Ollama API here. |
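+For instance, a sketch of switching automatic tagging to a self-hosted Ollama instance (the hostname is an illustrative placeholder; 11434 is Ollama's default port):
+
+```shell
+# Point inference at a local Ollama server instead of OpenAI
+OLLAMA_BASE_URL=http://ollama:11434
+```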
@@ -131,7 +130,7 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin
| CRAWLER_STORE_SCREENSHOT | No | true | Whether to store a screenshot from the crawled website or not. Screenshots act as a fallback for when we fail to extract an image from a website. You can also view the stored screenshots for any link. |
| CRAWLER_FULL_PAGE_SCREENSHOT   | No       | false   | Whether to store a screenshot of the full page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, the screenshot will only include the visible part of the page. |
| CRAWLER_SCREENSHOT_TIMEOUT_SEC | No       | 5       | How long to wait for the screenshot to finish before timing out. If you are capturing full-page screenshots of long webpages, consider increasing this value. |
-| CRAWLER_STORE_PDF | No | false | Whether to store a PDF snapshot of the crawled page. Disabled by default, as it can lead to much higher disk usage. When enabled, a PDF version of each crawled page will be captured and stored as an asset, which can be viewed in the bookmark preview. |
+| CRAWLER_STORE_PDF | No | false | Whether to store a PDF snapshot of the crawled page. Disabled by default, as it can lead to much higher disk usage. When enabled, a PDF version of each crawled page will be captured and stored as an asset, which can be viewed in the bookmark preview. |
| CRAWLER_FULL_PAGE_ARCHIVE | No | false | Whether to store a full local copy of the page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, only the readable text of the page is archived. |
| CRAWLER_JOB_TIMEOUT_SEC        | No       | 60      | How long to wait for the crawler job to finish before timing out. If you have a slow internet connection or a low-powered device, you might want to bump this up a bit. |
| CRAWLER_NAVIGATE_TIMEOUT_SEC   | No       | 30      | How long to spend navigating to the page (along with its redirects). Increase this if you have a slow internet connection. |
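+As an illustrative sketch, a setup with a slow connection that wants full local copies might tune the crawler like this:
+
+```shell
+# Keep a full local copy of each page (higher disk usage)
+CRAWLER_FULL_PAGE_ARCHIVE=true
+# Allow more time for slow pages to load and crawl
+CRAWLER_NAVIGATE_TIMEOUT_SEC=60
+CRAWLER_JOB_TIMEOUT_SEC=120
+```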
@@ -188,8 +187,8 @@ Karakeep uses [tesseract.js](https://github.com/naptha/tesseract.js) to extract
You can use webhooks to trigger actions when bookmarks are created, changed, or crawled.
-| Name | Required | Default | Description |
-| ------------------- | -------- | ------- | -------------------------------------------------- |
+| Name | Required | Default | Description |
+| ------------------- | -------- | ------- | ------------------------------------------------- |
| WEBHOOK_TIMEOUT_SEC | No | 5 | The timeout for the webhook request in seconds. |
| WEBHOOK_RETRY_TIMES | No | 3 | The number of times to retry the webhook request. |
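+For example, a sketch that tolerates a slow webhook endpoint by loosening both knobs (values are illustrative):
+
+```shell
+WEBHOOK_TIMEOUT_SEC=15
+WEBHOOK_RETRY_TIMES=5
+```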
@@ -241,3 +240,15 @@ If your Karakeep instance needs to connect through a proxy server, you can confi
:::info
These proxy settings will be used by the crawler and other components that make outgoing HTTP requests.
:::
+
+## Monitoring
+
+Karakeep supports distributed tracing via OpenTelemetry. When enabled, traces are collected for tRPC API calls, background worker operations, and other key workflows. Karakeep also exports Prometheus-based metrics.
+
+| Name | Required | Default | Description |
+| --------------------------- | -------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| OTEL_TRACING_ENABLED | No | false | Set to `true` to enable OpenTelemetry tracing. When disabled, all tracing operations are no-ops. |
+| OTEL_EXPORTER_OTLP_ENDPOINT | No | Not set | The OTLP HTTP endpoint to send traces to (e.g., `http://jaeger:4318/v1/traces` or `http://otel-collector:4318/v1/traces`). If not set, traces are logged to the console. |
+| OTEL_SERVICE_NAME | No | karakeep | The service name that will appear in your tracing backend. The actual service name will include a suffix (e.g., `karakeep-api`, `karakeep-workers`). |
+| OTEL_SAMPLE_RATE | No | 1.0 | The sampling rate for traces, between 0.0 and 1.0. A value of 1.0 means all traces are sampled, while 0.1 means only 10% of traces are sampled. Lower values reduce overhead and storage costs in production. |
+| PROMETHEUS_AUTH_TOKEN       | No       | Random   | Enables a Prometheus metrics endpoint at `/api/metrics`. The endpoint requires this token to be passed in the `Authorization` header as a Bearer token. If not set, a new random token is generated on every startup. The token cannot contain special characters, or you may encounter a 400 Bad Request response. |
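+
+For example, a sketch that sends traces to a Jaeger all-in-one container and pins the metrics token (hostname and token are illustrative placeholders):
+
+```shell
+OTEL_TRACING_ENABLED=true
+OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318/v1/traces
+# Sample 10% of traces to reduce overhead in production
+OTEL_SAMPLE_RATE=0.1
+# Pin the token so metrics scrapers keep working across restarts
+PROMETHEUS_AUTH_TOKEN=replace-with-a-random-token
+```
+
+A Prometheus scraper (or `curl`) then passes that token as a Bearer token, assuming the default port 3000:
+
+```shell
+curl -H "Authorization: Bearer replace-with-a-random-token" http://localhost:3000/api/metrics
+```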