From 9198c1b7e15c79a9b0452e8c2a6b702df6a37b60 Mon Sep 17 00:00:00 2001
From: MohamedBassem
Date: Sun, 26 May 2024 10:14:42 +0000
Subject: docs: Document the new CRAWLER_FULL_PAGE_ARCHIVE flag

---
 docs/docs/03-configuration.md | 1 +
 1 file changed, 1 insertion(+)
 (limited to 'docs')

diff --git a/docs/docs/03-configuration.md b/docs/docs/03-configuration.md
index fc9e70db..277d182e 100644
--- a/docs/docs/03-configuration.md
+++ b/docs/docs/03-configuration.md
@@ -47,5 +47,6 @@ Either `OPENAI_API_KEY` or `OLLAMA_BASE_URL` need to be set for automatic taggin
 | CRAWLER_DOWNLOAD_BANNER_IMAGE | No | true | Whether to cache the banner image used in the cards locally or fetch it each time directly from the website. Caching it consumes more storage space, but is more resilient against link rot and rate limits from websites. |
 | CRAWLER_STORE_SCREENSHOT | No | true | Whether to store a screenshot from the crawled website or not. Screenshots act as a fallback for when we fail to extract an image from a website. You can also view the stored screenshots for any link. |
 | CRAWLER_FULL_PAGE_SCREENSHOT | No | false | Whether to store a screenshot of the full page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, the screenshot will only include the visible part of the page |
+| CRAWLER_FULL_PAGE_ARCHIVE | No | false | Whether to store a full local copy of the page or not. Disabled by default, as it can lead to much higher disk usage. If disabled, only the readable text of the page is archived. |
 | CRAWLER_JOB_TIMEOUT_SEC | No | 60 | How long to wait for the crawler job to finish before timing out. If you have a slow internet connection or a low powered device, you might want to bump this up a bit |
 | CRAWLER_NAVIGATE_TIMEOUT_SEC | No | 30 | How long to spend navigating to the page (along with its redirects). Increase this if you have a slow internet connection |
-- 
cgit v1.2.3-70-g09d2
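
As an illustration of the documented flag, a deployment that wants full local copies of crawled pages could opt in through its environment. This is a minimal sketch, not a verified setup: the `.env` file name is an assumption about how the variables are supplied, and only the variable name, its `false` default, and the `true`/`false` value format are taken from the configuration table.

```shell
# .env (hypothetical file name; use whatever mechanism your deployment
# already relies on to pass environment variables)

# Opt in to storing a full local copy of each crawled page.
# The default is false, in which case only the readable text is archived.
# Expect noticeably higher disk usage once this is enabled.
CRAWLER_FULL_PAGE_ARCHIVE=true
```

Leaving the variable unset keeps the documented default, so existing installations are unaffected until they opt in.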