feat: Add cookie support for browser page access

* feat: Add cookie support for browser page access

  Implemented cookie functionality for browser page access, including BROWSER_COOKIE_PATH configuration to specify the cookies JSON file path.

* fix the docs

---------

Co-authored-by: lizz <lizong1204@gmail.com>
author: Mohamed Bassem <me@mbassem.com> 2025-09-07 15:47:38 +0100
committer: GitHub <noreply@github.com> 2025-09-07 15:47:38 +0100
commit: c57fd5137cc29870667777a371a4d1fcdf69436b (patch)
tree: 845bec5a60ee2b43fc33d653965a6571fa92d84b /docs
parent: 492b15203807b4ceb00af4b301958344cc5a668f (diff)
download: karakeep-c57fd5137cc29870667777a371a4d1fcdf69436b.tar.zst
1 files changed, 33 insertions, 2 deletions
diff --git a/docs/docs/03-configuration.md b/docs/docs/03-configuration.md
index aae1ffa3..0f61360f 100644
--- a/docs/docs/03-configuration.md
+++ b/docs/docs/03-configuration.md
@@ -6,8 +6,8 @@ The app is mainly configured by environment variables. All the used environment
 | ------------------------------- | ------------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | PORT                            | No                                    | 3000            | The port on which the web server will listen. DON'T CHANGE THIS IF YOU'RE USING DOCKER, instead change the docker bound external port.                                                                                                                                 |
 | WORKERS_PORT                    | No                                    | 0 (Random Port) | The port on which the worker will export its prometheus metrics on `/metrics`. By default it's a random unused port. If you want to utilize those metrics, fix the port to a value (and export it in docker if you're using docker).                                   |
-| WORKERS_ENABLED_WORKERS         | No                                    | Not set         | Comma separated list of worker names to enable. If set, only these workers will run. Valid values: crawler,inference,search,tidyAssets,video,feed,assetPreprocessing,webhook,ruleEngine. |
-| WORKERS_DISABLED_WORKERS        | No                                    | Not set         | Comma separated list of worker names to disable. Takes precedence over `WORKERS_ENABLED_WORKERS`. |
+| WORKERS_ENABLED_WORKERS         | No                                    | Not set         | Comma separated list of worker names to enable. If set, only these workers will run. Valid values: crawler,inference,search,tidyAssets,video,feed,assetPreprocessing,webhook,ruleEngine.                                                                               |
+| WORKERS_DISABLED_WORKERS        | No                                    | Not set         | Comma separated list of worker names to disable. Takes precedence over `WORKERS_ENABLED_WORKERS`.                                                                                                                                                                      |
 | DATA_DIR                        | Yes                                   | Not set         | The path for the persistent data directory. This is where the db lives. Assets are stored here by default unless `ASSETS_DIR` is set.                                                                                                                                  |
 | ASSETS_DIR                      | No                                    | Not set         | The path where crawled assets will be stored. If not set, defaults to `${DATA_DIR}/assets`.                                                                                                                                                                            |
 | NEXTAUTH_URL                    | Yes                                   | Not set         | Should point to the address of your server. The app will function without it, but will redirect you to wrong addresses on signout for example.                                                                                                                         |
@@ -129,6 +129,37 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin
 | CRAWLER_VIDEO_DOWNLOAD_TIMEOUT_SEC | No       | 600     | How long to wait for the video download to finish                                                                                                                                                                                                                                                                                                                             |
 | CRAWLER_ENABLE_ADBLOCKER           | No       | true    | Whether to enable an adblocker in the crawler or not. If you're facing troubles downloading the adblocking lists on worker startup, you can disable this.                                                                                                                                                                                                                     |
 | CRAWLER_YTDLP_ARGS                 | No       | []      | Include additional yt-dlp arguments to be passed at crawl time separated by %%: https://github.com/yt-dlp/yt-dlp?tab=readme-ov-file#general-options                                                                                                                                                                                                                           |
+| BROWSER_COOKIE_PATH                | No       | Not set | Path to a JSON file containing cookies to be loaded into the browser context. The file should be an array of cookie objects, each with name and value (required), and optional fields like domain, path, expires, httpOnly, secure, and sameSite (e.g., `[{"name": "session", "value": "xxx", "domain": ".example.com"}]`).                                                   |
+
+<details>
+
+  <summary>More info on BROWSER_COOKIE_PATH</summary>
+
+BROWSER_COOKIE_PATH specifies the path to a JSON file containing cookies to be loaded into the browser context for crawling.
+
+The JSON file must be an array of cookie objects, each with:
+- name: The cookie name (required).
+- value: The cookie value (required).
+- Optional fields: domain, path, expires, httpOnly, secure, sameSite (values: "Strict", "Lax", or "None").
+
+Example JSON file:
+
+```json
+[
+  {
+    "name": "session",
+    "value": "xxx",
+    "domain": ".example.com",
+    "path": "/",
+    "expires": 1735689600,
+    "httpOnly": true,
+    "secure": true,
+    "sameSite": "Lax"
+  }
+]
+```
+
+</details>
 
 ## OCR Configs
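
The cookie file shape documented in this patch (an array of objects with required `name` and `value`, and an optional `sameSite` limited to "Strict", "Lax", or "None") can be checked before pointing `BROWSER_COOKIE_PATH` at a file. The following is a minimal sketch of such a check, not the loader this commit adds; `validate_cookies` and `validate_cookie_file` are hypothetical helper names, and Python is used only for illustration.

```python
import json

# Allowed sameSite values, as listed in the documentation added by this patch.
ALLOWED_SAME_SITE = {"Strict", "Lax", "None"}


def validate_cookies(cookies):
    """Return a list of problems; an empty list means the shape looks valid.

    Hypothetical helper: it only checks the rules stated in the docs
    (array of objects, required name/value, sameSite values).
    """
    if not isinstance(cookies, list):
        return ["top-level JSON value must be an array"]

    problems = []
    for index, cookie in enumerate(cookies):
        if not isinstance(cookie, dict):
            problems.append(f"cookie {index}: must be an object")
            continue
        # name and value are the only required fields.
        for key in ("name", "value"):
            if key not in cookie:
                problems.append(f"cookie {index}: missing required field '{key}'")
        same_site = cookie.get("sameSite")
        if same_site is not None and same_site not in ALLOWED_SAME_SITE:
            problems.append(f"cookie {index}: invalid sameSite {same_site!r}")
    return problems


def validate_cookie_file(path):
    """Load the JSON file at `path` and validate its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_cookies(json.load(handle))
```

Calling `validate_cookie_file` on the path you intend to use for `BROWSER_COOKIE_PATH` returns a list of human-readable problems, so a typo such as `"sameSite": "lax"` can be caught before the crawler starts.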