feat: Add LLM-based OCR as alternative to Tesseract (#2442)

* feat(ocr): add LLM-based OCR support alongside Tesseract

Add support for using configured LLM inference providers (OpenAI or Ollama) for OCR text extraction from images as an alternative to Tesseract.

Changes:
- Add OCR_USE_LLM environment variable flag (default: false)
- Add buildOCRPrompt function for LLM-based text extraction
- Add readImageTextWithLLM function in asset preprocessing worker
- Update extractAndSaveImageText to route between Tesseract and LLM OCR
- Update documentation with the new configuration option

When OCR_USE_LLM is enabled, the system uses the configured inference model to extract text from images. If no inference provider is configured, it falls back to Tesseract.

https://claude.ai/code/session_01Y7h7kDAmqXKXEWDmWbVkDs

* format

---------

Co-authored-by: Claude <noreply@anthropic.com>
author: Mohamed Bassem <me@mbassem.com> 2026-02-01 22:57:11 +0000
committer: GitHub <noreply@github.com> 2026-02-01 22:57:11 +0000
commit: 3fcccb858ee3ef22fe9ce479af4ce458ac9a0fe1 (patch)
tree: 0d6ae299126a581f0ccc58afa89b2dd16a9a0925 /docs
parent: 54243b8cc5ccd76fe23821f6e159b954a2166578 (diff)
download: karakeep-3fcccb858ee3ef22fe9ce479af4ce458ac9a0fe1.tar.zst
1 files changed, 2 insertions, 1 deletions
diff --git a/docs/docs/03-configuration/01-environment-variables.md b/docs/docs/03-configuration/01-environment-variables.md
index 7a896fe4..dedc3406 100644
--- a/docs/docs/03-configuration/01-environment-variables.md
+++ b/docs/docs/03-configuration/01-environment-variables.md
@@ -176,13 +176,14 @@ Example JSON file:
 
 ## OCR Configs
 
-Karakeep uses [tesseract.js](https://github.com/naptha/tesseract.js) to extract text from images.
+Karakeep uses [tesseract.js](https://github.com/naptha/tesseract.js) to extract text from images by default. Alternatively, you can use an LLM-based OCR by enabling the `OCR_USE_LLM` flag. LLM-based OCR uses the configured inference model (OpenAI or Ollama) to extract text from images, which can provide better results for complex images but requires a configured inference provider.
 
 | Name                     | Required | Default   | Description                                                                                                                                                                                                                               |
 | ------------------------ | -------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | OCR_CACHE_DIR            | No       | $TEMP_DIR | The dir where tesseract will download its models. By default, those models are not persisted and stored in the OS' temp dir.                                                                                                              |
 | OCR_LANGS                | No       | eng       | Comma separated list of the language codes that you want tesseract to support. You can find the language codes [here](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html). Set to empty string to disable OCR. |
 | OCR_CONFIDENCE_THRESHOLD | No       | 50        | A number between 0 and 100 indicating the minimum acceptable confidence from tessaract. If tessaract's confidence is lower than this value, extracted text won't be stored.                                                               |
+| OCR_USE_LLM              | No       | false     | If set to true, uses the configured inference model (OpenAI or Ollama) for OCR instead of Tesseract. This can provide better results for complex images but requires a configured inference provider (`OPENAI_API_KEY` or `OLLAMA_BASE_URL`). Falls back to Tesseract if no inference provider is configured. |
 
 ## Webhook Configs
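
The routing rule described in the commit message and in the `OCR_USE_LLM` table row can be sketched as a small decision function. This is only an illustration under stated assumptions: `OcrEnv`, `OcrBackend`, and `chooseOcrBackend` are hypothetical names that do not appear in this commit, and comparing the flag against the literal string `"true"` is an assumption about how it is parsed. Only the variable names `OCR_USE_LLM`, `OPENAI_API_KEY`, and `OLLAMA_BASE_URL` come from the documentation change.

```typescript
// Hypothetical shape for the relevant environment variables.
// Only the variable names are taken from the documentation in this diff.
interface OcrEnv {
  OCR_USE_LLM?: string;
  OPENAI_API_KEY?: string;
  OLLAMA_BASE_URL?: string;
}

type OcrBackend = "llm" | "tesseract";

// Pick the OCR backend: use the LLM only when the flag is enabled AND an
// inference provider is configured; otherwise fall back to Tesseract.
function chooseOcrBackend(env: OcrEnv): OcrBackend {
  // Assumption: the flag is enabled by the exact string "true".
  const useLLM = env.OCR_USE_LLM === "true";
  // An empty or missing value counts as "not configured".
  const hasProvider = Boolean(env.OPENAI_API_KEY || env.OLLAMA_BASE_URL);
  return useLLM && hasProvider ? "llm" : "tesseract";
}
```

With this sketch, setting `OCR_USE_LLM=true` without either provider variable still selects Tesseract, which matches the fallback stated in the table row.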