path: root/apps/workers
Commit message | Author | Date | Files | Lines
* fix: lazy load js-tiktoken in prompts module (#2176) | Mohamed Bassem | 2025-11-28 | 2 | -4/+4
  * feat: lazy load tiktoken to reduce memory footprint

    The js-tiktoken module loads a large encoding dictionary into memory immediately on import. This change defers the loading of the encoding until it's actually needed by using a lazy getter pattern. This reduces memory usage for processes that import this module but don't actually use the token encoding functions.

  * fix: use createRequire for lazy tiktoken import in ES module

    The previous implementation used bare require() which fails at runtime in ES modules (ReferenceError: require is not defined). This fixes it by using createRequire from Node's 'module' package, which creates a require function that works in ES module contexts.

  * refactor: convert tiktoken lazy loading to async dynamic imports

    Changed from createRequire to async import() for lazy loading tiktoken, making buildTextPrompt and buildSummaryPrompt async. This is cleaner for ES modules and properly defers the large tiktoken encoding data until it's actually needed.

    Updated all callers to await these async functions:
    - packages/trpc/routers/bookmarks.ts
    - apps/workers/workers/inference/tagging.ts
    - apps/workers/workers/inference/summarize.ts
    - apps/web/components/settings/AISettings.tsx (converted to useEffect)

  * feat: add untruncated prompt builders for UI previews

    Added buildTextPromptUntruncated and buildSummaryPromptUntruncated functions that don't require token counting or truncation. These are synchronous and don't load tiktoken, making them perfect for UI previews where exact token limits aren't needed.

    Updated AISettings.tsx to use these untruncated versions, eliminating the need for useEffect/useState and avoiding unnecessary tiktoken loading in the browser.

  * fix
  * fix

  ---------

  Co-authored-by: Claude <noreply@anthropic.com>
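The lazy-loading pattern described in #2176 (defer the expensive `import()` until first use, then reuse the result) can be sketched as a small memoizing wrapper. This is a minimal sketch, not the code from that PR: `lazyAsync` and `getEncoder` are hypothetical names, the stand-in loader replaces the real module load, and the js-tiktoken call shown in the comment assumes that package's `getEncoding` export.

```typescript
// Hypothetical helper: wrap an async loader so it runs at most once.
function lazyAsync<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    // The first call starts the load; later calls reuse the same promise,
    // whether it is still pending or already settled.
    if (cached === undefined) {
      cached = loader();
    }
    return cached;
  };
}

// Stand-in loader that counts how often the "heavy" module is loaded.
// In a real module the body might instead be:
//   const { getEncoding } = await import("js-tiktoken");
//   return getEncoding("cl100k_base");
let loadCount = 0;
const getEncoder = lazyAsync(async () => {
  loadCount += 1;
  return {
    countTokens: (text: string): number => text.split(/\s+/).filter(Boolean).length,
  };
});
```

Caching the promise (rather than the resolved value) means concurrent callers share a single load. One caveat of this sketch: if the loader rejects, every later call gets the same rejected promise; a production version might clear `cached` on failure so a retry is possible.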
* fix: Propagate group ids in queue calls (#2177) | Mohamed Bassem | 2025-11-27 | 5 | -4/+18
  * fix: Propagate group ids
  * fix tests
* fix: add a way to allowlist all domains from ip validation | Mohamed Bassem | 2025-11-22 | 1 | -0/+4
* deps: upgrade hono and playwright | Mohamed Bassem | 2025-11-16 | 1 | -2/+2
* deps: Upgrade typescript to 5.9 | Mohamed Bassem | 2025-11-16 | 1 | -1/+1
* feat: add Prometheus counter for HTTP status codes (#2117) | Mohamed Bassem | 2025-11-15 | 2 | -1/+13
  * feat: add Prometheus counter for crawler status codes

    Add a new Prometheus metric to track HTTP status codes encountered during crawling operations. This helps monitor crawler health and identify patterns in response codes (e.g., 200 OK, 404 Not Found, etc.).

    Changes:
    - Add crawlerStatusCodeCounter in metrics.ts with status_code label
    - Instrument crawlerWorker.ts to track status codes after page crawling
    - Counter increments for each crawl with the corresponding HTTP status code

    The metric is exposed at the /metrics endpoint and follows the naming convention: karakeep_crawler_status_codes_total

  * fix: update counter name to follow Prometheus conventions

    Change metric name from "karakeep_crawler_status_codes" to "karakeep_crawler_status_codes_total" to comply with Prometheus naming best practices for counter metrics.

  ---------

  Co-authored-by: Claude <noreply@anthropic.com>
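To show the shape of the series that #2117 exposes, here is an in-memory stand-in for a counter with one `status_code` label, rendered in the Prometheus text exposition format. This is an illustrative sketch, not prom-client and not the project's `metrics.ts`; with prom-client the equivalent would typically be a `Counter` declared with `labelNames: ["status_code"]` and incremented via `inc({ status_code: ... })`. `LabeledCounter` is a hypothetical name.

```typescript
// Minimal in-memory stand-in for a Prometheus counter with a single label.
class LabeledCounter {
  private readonly name: string;
  private readonly labelName: string;
  private readonly values = new Map<string, number>();

  constructor(name: string, labelName: string) {
    this.name = name;
    this.labelName = labelName;
  }

  // Increment the series identified by the given label value.
  inc(labelValue: string | number, amount = 1): void {
    const key = String(labelValue);
    this.values.set(key, (this.values.get(key) ?? 0) + amount);
  }

  // Render every series, in insertion order, in the text exposition format.
  render(): string {
    const lines = [`# TYPE ${this.name} counter`];
    this.values.forEach((count, value) => {
      lines.push(`${this.name}{${this.labelName}="${value}"} ${count}`);
    });
    return lines.join("\n");
  }
}

// Hypothetical wiring: one increment per crawl with the response status.
const crawlerStatusCodes = new LabeledCounter("karakeep_crawler_status_codes_total", "status_code");
for (const status of [200, 200, 404]) {
  crawlerStatusCodes.inc(status);
}
console.log(crawlerStatusCodes.render());
```

Each distinct status code becomes its own time series, which is why a bounded label like `status_code` is a reasonable choice while something unbounded (such as the crawled URL) would not be.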
* feat: correct default prom metrics from web and worker containers | Mohamed Bassem | 2025-11-10 | 1 | -0/+1
* fix: fix crash in crawler on invalid URL in matchesNoProxy | Mohamed Bassem | 2025-11-10 | 1 | -3/+9
* feat: add crawler domain rate limiting (#2115) | Mohamed Bassem | 2025-11-09 | 1 | -4/+80
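The log entry for #2115 gives only the title, so the following is a generic sketch of per-domain rate limiting rather than the crawler's actual logic: a sliding-window limiter keyed by hostname. `DomainRateLimiter`, `maxRequests` and `windowMs` are hypothetical names.

```typescript
// Hypothetical sliding-window limiter: at most `maxRequests` per hostname
// within any `windowMs` interval.
class DomainRateLimiter {
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true (and records the request) if the URL's hostname is under its limit.
  tryAcquire(url: string, now: number = Date.now()): boolean {
    const domain = new URL(url).hostname;
    // Keep only timestamps that still fall inside the window.
    const recent = (this.hits.get(domain) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.maxRequests;
    if (allowed) {
      recent.push(now);
    }
    this.hits.set(domain, recent);
    return allowed;
  }
}
```

When `tryAcquire` returns `false`, a worker would plausibly re-enqueue the job with a delay instead of crawling immediately; that rescheduling policy is an assumption, not something the log states.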
* refactor: Allow runner functions to return results to onComplete | Mohamed Bassem | 2025-11-09 | 1 | -1/+1
* feat: add failed_permanent metric for worker monitoring (#2107) | Mohamed Bassem | 2025-11-09 | 9 | -0/+32
  * feat: add last failure timestamp metric for worker monitoring

    Add a Prometheus Gauge metric to track the timestamp of the last failure for each worker. This complements the existing failed job counter by providing visibility into when failures last occurred for monitoring and alerting purposes.

    Changes:
    - Added workerLastFailureGauge metric in metrics.ts
    - Updated all 9 workers to set the gauge on failure:
      - crawler, feed, webhook, assetPreProcessing
      - inference, adminMaintenance, ruleEngine
      - video, search

  * refactor: track both all failures and permanent failures with counter

    Remove the gauge metric and use the existing counter to track both:
    - All failures (including retry attempts): status="failed"
    - Permanent failures (retries exhausted): status="failed_permanent"

    This provides better visibility into retry behavior and permanent vs temporary failures without adding a separate metric.

    Changes:
    - Removed workerLastFailureGauge from metrics.ts
    - Updated all 9 workers to track failed_permanent when numRetriesLeft == 0
    - Maintained existing failed counter for all failure attempts

  * style: format worker files with prettier

  ---------

  Co-authored-by: Claude <noreply@anthropic.com>
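The counter-based scheme from #2107 (every failed attempt increments `status="failed"`, and the last one, when `numRetriesLeft == 0`, also increments `status="failed_permanent"`) reduces to a few lines. In this sketch a plain `Map` stands in for the Prometheus counter, and `incStat` and `recordFailure` are hypothetical names.

```typescript
type WorkerStatus = "completed" | "failed" | "failed_permanent";

// Hypothetical stand-in for a counter labeled by worker name and status.
const workerStats = new Map<string, number>();

function incStat(worker: string, status: WorkerStatus): void {
  const key = `${worker}/${status}`;
  workerStats.set(key, (workerStats.get(key) ?? 0) + 1);
}

// Every failed attempt counts as "failed"; once no retries remain,
// the same failure is also counted as "failed_permanent".
function recordFailure(worker: string, numRetriesLeft: number): void {
  incStat(worker, "failed");
  if (numRetriesLeft === 0) {
    incStat(worker, "failed_permanent");
  }
}

// A job allowed two retries fails on every attempt.
for (const retriesLeft of [2, 1, 0]) {
  recordFailure("crawler", retriesLeft);
}
```

The ratio of `failed_permanent` to `failed` then indicates how often retries eventually succeed, which is the visibility the refactor message describes.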
* fix: metascraper logo to go through proxy if one configured. fixes #1863 | Mohamed Bassem | 2025-11-03 | 1 | -1/+14
* fix: fix monolith to respect crawler proxy | Mohamed Bassem | 2025-11-02 | 1 | -0/+9
* feat(rss): Add import tags from RSS feed categories (#2031) | Mohamed Bassem | 2025-11-02 | 1 | -0/+29
  * feat(feeds): Add import tags from RSS feed categories

    - Add importTags boolean field to rssFeedsTable schema (default: false)
    - Create database migration 0063_add_import_tags_to_feeds.sql
    - Update zod schemas (zFeedSchema, zNewFeedSchema, zUpdateFeedSchema) to include importTags
    - Update Feed model to handle importTags in create and update methods
    - Update feedWorker to:
      - Read title and categories from RSS parser
      - Attach categories as tags to bookmarks when importTags is enabled
      - Log warnings if tag attachment fails

    Resolves #1996

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Mohamed Bassem <MohamedBassem@users.noreply.github.com>

  * feat(web): Add importTags option to feed settings UI

    - Add importTags toggle to FeedsEditorDialog (create feed)
    - Add importTags toggle to EditFeedDialog (edit feed)
    - Display as a bordered switch control with descriptive text
    - Defaults to false for new feeds

    Co-authored-by: Mohamed Bassem <MohamedBassem@users.noreply.github.com>

  * fix migration
  * remove extra migration

  ---------

  Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
  Co-authored-by: Mohamed Bassem <MohamedBassem@users.noreply.github.com>
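The feedWorker change in #2031 maps an item's categories to bookmark tags only when `importTags` is enabled. The sketch below shows that mapping step under stated assumptions: `FeedItem` and `tagsForItem` are hypothetical, categories are assumed to arrive as plain strings (some feeds or parsers may yield richer objects), and the trimming and de-duplication are illustrative choices rather than confirmed behavior.

```typescript
// Assumed shape of a parsed RSS item; only the fields used here.
interface FeedItem {
  title?: string;
  link?: string;
  categories?: string[];
}

// Hypothetical: turn an item's categories into tag names when the feed's
// importTags flag is enabled. Trims names and drops blanks and duplicates.
function tagsForItem(item: FeedItem, importTags: boolean): string[] {
  if (!importTags || !item.categories) {
    return [];
  }
  const tags = new Set<string>();
  for (const category of item.categories) {
    const name = category.trim();
    if (name.length > 0) {
      tags.add(name);
    }
  }
  return Array.from(tags);
}
```

Keeping this a pure function makes it easy to test separately from the step that attaches tags to the bookmark, which per the commit message only logs a warning on failure.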
* feat: Make search job timeout configurable | Mohamed Bassem | 2025-11-02 | 1 | -1/+1
* fix: Stricter SSRF validation (#2082) | Mohamed Bassem | 2025-11-02 | 7 | -98/+507
  * fix: Stricter SSRF validation
  * skip dns resolution if running in proxy context
  * more fixes
  * Add LRU cache
  * change the env variable for internal hostnames
  * make dns resolution timeout configurable
  * upgrade ipaddr
  * handle ipv6
  * handle proxy bypass for request interceptor
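One part of SSRF validation is refusing to fetch when a hostname resolves to a non-public address. The sketch below uses Node's built-in `net.BlockList` rather than the `ipaddr` package that #2082 upgrades, and its range list is an illustrative assumption, not the project's list. In a fetcher, each address returned by `dns.promises.lookup(host, { all: true })` would be checked before connecting.

```typescript
import { BlockList, isIP } from "node:net";

// Illustrative deny list of non-public ranges (an assumption, not the
// project's actual list).
const blocked = new BlockList();
blocked.addSubnet("0.0.0.0", 8, "ipv4"); // "this" network
blocked.addSubnet("10.0.0.0", 8, "ipv4"); // private
blocked.addSubnet("100.64.0.0", 10, "ipv4"); // carrier-grade NAT
blocked.addSubnet("127.0.0.0", 8, "ipv4"); // loopback
blocked.addSubnet("169.254.0.0", 16, "ipv4"); // link-local, incl. cloud metadata
blocked.addSubnet("172.16.0.0", 12, "ipv4"); // private
blocked.addSubnet("192.168.0.0", 16, "ipv4"); // private
blocked.addAddress("::1", "ipv6"); // loopback
blocked.addSubnet("fc00::", 7, "ipv6"); // unique local
blocked.addSubnet("fe80::", 10, "ipv6"); // link-local

// Returns true if a resolved address must not be fetched.
function isDisallowedAddress(address: string): boolean {
  const version = isIP(address);
  if (version === 0) {
    throw new Error(`not an IP address: ${address}`);
  }
  return blocked.check(address, version === 4 ? "ipv4" : "ipv6");
}
```

Checking the resolved addresses, not the hostname string, is what catches names that point at internal IPs; the sub-commits about skipping DNS resolution in a proxy context and caching results suggest the real implementation also handles those cases.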
* fix: More memory optimizations for crawler worker. #1748 | Mohamed Bassem | 2025-10-26 | 1 | -26/+43
* fix: fix screenshot filepath in crawler | Mohamed Bassem | 2025-10-26 | 1 | -1/+1
* fix: Respect abort signal in admin maintenance jobs | Mohamed Bassem | 2025-10-26 | 3 | -1/+11
* deps: Upgrade metascraper plugins | Mohamed Bassem | 2025-10-26 | 1 | -11/+11
* deps: Upgrade metascraper-readability 5.49.6 | Mohamed Bassem | 2025-10-26 | 1 | -1/+1
* feat: Allow configuring inline asset size threshold | Mohamed Bassem | 2025-10-26 | 3 | -7/+5
* feat: Add admin maintenance job to migrate large inline HTML (#2071) | Mohamed Bassem | 2025-10-26 | 4 | -4/+200
  * Add admin maintenance job to migrate large inline HTML
  * add cursor
  * more fixes
* refactor: generalize tidy assets queue into admin maintenance (#2059) | Mohamed Bassem | 2025-10-26 | 4 | -112/+154
  * refactor: generalize admin maintenance queue
  * more fixes
* feat: Add source field to track bookmark creation sources (#2037) | Mohamed Bassem | 2025-10-12 | 1 | -0/+1
  * feat: Add source field to track bookmark creation sources

    Add a new 'source' field to the bookmarks table to track where bookmarks were created from. Possible values: api, web, cli, mobile, singlefile, rss.

    Changes:
    - Add source field to bookmarks table schema
    - Update Zod schemas to include source field
    - Update tRPC createBookmark procedure to store source
    - Update all callsites to pass appropriate source value:
      - api: Default to "api" if not provided
      - singlefile: Set to "singlefile"
      - rss: Set to "rss" in feedWorker
      - cli: Set to "cli"
      - mobile: Set to "mobile" in all mobile app bookmark creation
      - browser-extension: Set to "web"
      - web: Set to "web" in all web app bookmark creation
    - Create migration file for database schema change

    Fixes #2036

    🤖 Generated with [Claude Code](https://claude.com/claude-code)

    Co-Authored-By: Mohamed Bassem <MohamedBassem@users.noreply.github.com>

  * feat: Add extension source type for browser extension

    - Add 'extension' to bookmark source enum
    - Update browser extension to use 'extension' instead of 'web'

    Co-authored-by: Mohamed Bassem <MohamedBassem@users.noreply.github.com>

  * fix CI
  * fix CI
  * fix the migration file
  * add import source
  * make source nullish

  ---------

  Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
  Co-authored-by: Mohamed Bassem <MohamedBassem@users.noreply.github.com>
* feat: support passing multiple proxy values (#2039) | Mohamed Bassem | 2025-10-12 | 2 | -7/+14
  * feat: support passing multiple proxy values
  * fix typo
  * trim and filter
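The "trim and filter" step in #2039 suggests a setting that lists several proxies in one value. Below is a sketch assuming a comma-separated string; the separator, `parseProxyList`, and `pickProxy` are hypothetical, and uniform random selection is only one possible strategy (the log does not say how a proxy is chosen).

```typescript
// Hypothetical parser for a comma-separated proxy setting:
// split on commas, trim whitespace, and drop empty entries.
function parseProxyList(raw: string | undefined): string[] {
  if (!raw) {
    return [];
  }
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Choose one proxy per request. Uniform random choice is an assumption;
// `random` is injectable so the choice can be made deterministic in tests.
function pickProxy(proxies: string[], random: () => number = Math.random): string | undefined {
  if (proxies.length === 0) {
    return undefined;
  }
  return proxies[Math.floor(random() * proxies.length)];
}
```

Filtering empty entries matters for values like `"a, ,b,"`, where a naive split would otherwise hand an empty string to the HTTP client as a proxy URL.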
* fix: round feed refresh hour for idempotency (#2013) | Mohamed Bassem | 2025-10-06 | 1 | -1/+6
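Rounding the refresh time for idempotency (#2013) can be illustrated with a key built from the feed id and the timestamp floored to its UTC hour, so two enqueues in the same hour yield the same key. This is a sketch of the general technique only; `roundDownToHour` and `feedRefreshKey` are hypothetical, and the actual key format in the worker is not shown in this log.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical: floor a timestamp to the start of its UTC hour.
function roundDownToHour(date: Date): Date {
  return new Date(Math.floor(date.getTime() / HOUR_MS) * HOUR_MS);
}

// Same feed + same hour => same key, so a second enqueue within that hour
// can be recognized as a duplicate by an idempotency-aware queue.
function feedRefreshKey(feedId: string, now: Date): string {
  return `${feedId}:${roundDownToHour(now).toISOString()}`;
}
```

Without rounding, a key derived from the raw timestamp would differ on every scheduler tick and provide no deduplication at all.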
* feat: Restate-based queue plugin (#2011) | Mohamed Bassem | 2025-10-05 | 1 | -2/+8
  * WIP: Initial restate integration
  * add retry
  * add delay + idempotency
  * implement concurrency limits
  * add admin stats
  * add todos
  * add id provider
  * handle onComplete failures
  * add tests
  * add pub key and fix logging
  * add priorities
  * fail call after retries
  * more fixes
  * fix retries left
  * some refactoring
  * fix package.json
  * upgrade sdk
  * some test cleanups
* feat: use jpegs for screenshots instead of pngs | Mohamed Bassem | 2025-09-28 | 1 | -2/+3
* feat: Stop downloading video/audio in playwright | Mohamed Bassem | 2025-09-28 | 1 | -0/+19
* fix: Abort dangling processing when crawler is aborted (#1988) | Mohamed Bassem | 2025-09-28 | 1 | -27/+98
  * fix: Abort dangling processing when crawler is aborted
  * comments
  * report the size
  * handle unhandled rejection
  * drop promisify
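The idea behind #1988, stopping follow-up work once a crawl job is aborted, can be sketched as a stage runner that checks the job's `AbortSignal` before each step. The sketch is synchronous for brevity; in an async crawler the same check (or `signal.throwIfAborted()`) would sit between awaited steps, and long-running child work would also need the signal passed down. `Stage` and `runStages` are hypothetical names.

```typescript
// Hypothetical post-crawl stage runner: checks the job's AbortSignal before
// each stage so nothing new starts once the job has been aborted.
interface Stage {
  name: string;
  run: () => void;
}

function runStages(stages: Stage[], signal: AbortSignal): string[] {
  const completed: string[] = [];
  for (const stage of stages) {
    if (signal.aborted) {
      throw new Error(`aborted before stage "${stage.name}"`);
    }
    stage.run();
    completed.push(stage.name);
  }
  return completed;
}
```

Throwing (rather than silently returning) lets the caller distinguish an aborted job from a completed one and clean up any partial output.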
* fix: Cleanup temp assets on monolith timeout | Mohamed Bassem | 2025-09-28 | 1 | -1/+17
* feat: Add tag search and pagination (#1987) | Mohamed Bassem | 2025-09-28 | 1 | -1/+1
  * feat: Add tag search and use in the homepage
  * use paginated query in the all tags view
  * wire the load more buttons
  * add skeleton to all tags page
  * fix attachedby aggregation
  * fix loading states
  * fix hasNextPage
  * use action buttons for load more buttons
  * migrate the tags auto complete to the search api
  * Migrate the tags editor to the new search API
  * Replace tag merging dialog with tag auto completion
  * Merge both search and list APIs
  * fix tags.list
  * add some tests for the endpoint
  * add relevance based sorting
  * change cursor
  * update the REST API
  * fix review comments
  * more fixes
  * fix lockfile
  * i18n
  * fix visible tags
* fix: fix bundling liteque in the workers | Mohamed Bassem | 2025-09-14 | 2 | -0/+2
* refactor: Move callsites to liteque to be behind a plugin | Mohamed Bassem | 2025-09-14 | 13 | -123/+134
* feat: Add cookie support for browser page access | Mohamed Bassem | 2025-09-07 | 1 | -0/+59
  * feat: Add cookie support for browser page access

    Implemented cookie functionality for browser page access, including BROWSER_COOKIE_PATH configuration to specify the cookies JSON file path.

  * fix the docs

  ---------

  Co-authored-by: lizz <lizong1204@gmail.com>
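The cookie support above reads a JSON file whose path comes from `BROWSER_COOKIE_PATH`. A sketch of loading and validating that file follows; the file format is an assumption (an array of cookie objects), modeled on the fields Playwright's `BrowserContext.addCookies` accepts, where each cookie needs either `url` or both `domain` and `path`. `BrowserCookie`, `parseCookies`, and `loadCookies` are hypothetical names.

```typescript
import { readFileSync } from "node:fs";

// Assumed cookie shape: a cookie needs either `url`, or both `domain` and `path`.
interface BrowserCookie {
  name: string;
  value: string;
  url?: string;
  domain?: string;
  path?: string;
}

// Validate the parsed contents of a cookies JSON file.
function parseCookies(json: string): BrowserCookie[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("cookie file must contain a JSON array");
  }
  return parsed.map((entry, index) => {
    const ok =
      typeof entry?.name === "string" &&
      typeof entry?.value === "string" &&
      (typeof entry?.url === "string" ||
        (typeof entry?.domain === "string" && typeof entry?.path === "string"));
    if (!ok) {
      throw new Error(`invalid cookie at index ${index}`);
    }
    return entry as BrowserCookie;
  });
}

// Read cookies from the given path (default: BROWSER_COOKIE_PATH), or none if unset.
function loadCookies(path: string | undefined = process.env.BROWSER_COOKIE_PATH): BrowserCookie[] {
  return path ? parseCookies(readFileSync(path, "utf8")) : [];
}
```

With Playwright the result could then be applied via `await context.addCookies(loadCookies())` before navigating. Validating up front turns a malformed file into one clear startup error instead of a failure on every crawl.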
* feat(workers): add worker enable/disable lists (#1885) | Mohamed Bassem | 2025-09-07 | 1 | -44/+49
* fix: fix assets being marked as pending summarization | Mohamed Bassem | 2025-09-07 | 1 | -0/+7
* feat: add gif asset type support (#1876) | Drashi | 2025-09-07 | 1 | -2/+8
  * feat: add gif asset type support
  * skip inference for gifs

  ---------

  Co-authored-by: Mohamed Bassem <me@mbassem.com>
* fix: don't mark inference job as failed when there's no content. fixes #1666 | Mohamed Bassem | 2025-09-07 | 2 | -7/+32
* fix: fix pdf detection when the header contains charset. fix: #1677 | Mohamed Bassem | 2025-09-07 | 1 | -2/+16
* fix: Fix feed worker to fetch feeds with proxy | Mohamed Bassem | 2025-09-06 | 3 | -50/+58
* fix: Change the inference worker logging when disabled to be a debug log level | Mohamed Bassem | 2025-09-06 | 2 | -2/+2
* fix: Don't attempt to fetch RSS if the user is out of quota | Mohamed Bassem | 2025-09-06 | 1 | -0/+13
* refactor: Extract quota logic into its own class | Mohamed Bassem | 2025-09-06 | 3 | -15/+13
* fix: Reduce polling interval on meilisearch tasks | Mohamed Bassem | 2025-09-06 | 1 | -1/+1
* fix: Don't enqueue video tasks when video download is disabled | Mohamed Bassem | 2025-09-06 | 1 | -8/+10
* fix: fix long worker log lines when downloading base64 images | Mohamed Bassem | 2025-08-30 | 1 | -1/+3
* fix: Respect wal mode for the queue db | Mohamed Bassem | 2025-08-30 | 1 | -1/+1
* fix: dangling assets created by changing crawling config | MohamedBassem | 2025-08-22 | 1 | -5/+6