From 5576361a1afa280abb256cafe17b7a140ee42adf Mon Sep 17 00:00:00 2001 From: Mohamed Bassem Date: Sat, 5 Jul 2025 19:57:01 +0000 Subject: feat(workers): Allow custmoizing max parallelism for a bunch of workers. Fixes #724 --- docs/docs/03-configuration.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'docs') diff --git a/docs/docs/03-configuration.md b/docs/docs/03-configuration.md index 156632d7..d8981843 100644 --- a/docs/docs/03-configuration.md +++ b/docs/docs/03-configuration.md @@ -117,6 +117,18 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin | CRAWLER_ENABLE_ADBLOCKER | No | true | Whether to enable an adblocker in the crawler or not. If you're facing troubles downloading the adblocking lists on worker startup, you can disable this. | | CRAWLER_YTDLP_ARGS | No | [] | Include additional yt-dlp arguments to be passed at crawl time separated by %%: https://github.com/yt-dlp/yt-dlp?tab=readme-ov-file#general-options | +## Worker Concurrency Configs + +These settings control the number of concurrent workers for different background processing tasks. Increasing these values can improve throughput but will consume more system resources. + +| Name | Required | Default | Description | +| ------------------------------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| INFERENCE_NUM_WORKERS | No | 1 | Number of concurrent workers for AI inference tasks (tagging and summarization). Increase this if you have multiple AI inference requests and want to process them in parallel. | +| SEARCH_NUM_WORKERS | No | 1 | Number of concurrent workers for search indexing tasks. 
Increase this if you have a high volume of content being indexed for search. | +| WEBHOOK_NUM_WORKERS | No | 1 | Number of concurrent workers for webhook delivery. Increase this if you have multiple webhook endpoints or high webhook traffic. | +| ASSET_PREPROCESSING_NUM_WORKERS | No | 1 | Number of concurrent workers for asset preprocessing tasks (image processing, OCR, etc.). Increase this if you have many images or documents that need processing. | +| RULE_ENGINE_NUM_WORKERS | No | 1 | Number of concurrent workers for rule engine processing. Increase this if you have complex automation rules that need to be processed quickly. | + ## OCR Configs Karakeep uses [tesseract.js](https://github.com/naptha/tesseract.js) to extract text from images. -- cgit v1.2.3-70-g09d2
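
Note for readers of this patch: the five variables documented above all share the same shape (optional, integer, default 1). The sketch below shows one way such settings could be read from the environment. It is not taken from the Karakeep source; `readWorkerConcurrency` is a hypothetical helper, and falling back to the default when a value is missing, non-numeric, or below 1 is an assumption rather than documented behavior.

```typescript
// Variable names as listed in the "Worker Concurrency Configs" table.
const WORKER_ENV_VARS = [
  "INFERENCE_NUM_WORKERS",
  "SEARCH_NUM_WORKERS",
  "WEBHOOK_NUM_WORKERS",
  "ASSET_PREPROCESSING_NUM_WORKERS",
  "RULE_ENGINE_NUM_WORKERS",
] as const;

type WorkerEnvVar = (typeof WORKER_ENV_VARS)[number];

// Documented default for every variable in the table.
const DEFAULT_NUM_WORKERS = 1;

// Hypothetical helper: maps each variable to a positive integer.
// Assumption: unset or invalid values fall back to the default.
function readWorkerConcurrency(
  env: Record<string, string | undefined>,
): Record<WorkerEnvVar, number> {
  const result = {} as Record<WorkerEnvVar, number>;
  for (const name of WORKER_ENV_VARS) {
    const raw = env[name];
    const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
    result[name] =
      Number.isInteger(parsed) && parsed >= 1 ? parsed : DEFAULT_NUM_WORKERS;
  }
  return result;
}
```

In a Node.js process this could be called as `readWorkerConcurrency(process.env)`. Raising a value only helps if the host and any downstream service (for example the inference backend) can absorb the extra parallel load, which matches the resource caveat in the added documentation.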