path: root/apps/workers
Commit message | Author | Age | Files | Lines
* fix: round feed refresh hour for idempotency (#2013) | Mohamed Bassem | 2025-10-06 | 1 | -1/+6
* feat: Restate-based queue plugin (#2011) | Mohamed Bassem | 2025-10-05 | 1 | -2/+8
    * WIP: Initial restate integration
    * add retry
    * add delay + idempotency
    * implement concurrency limits
    * add admin stats
    * add todos
    * add id provider
    * handle onComplete failures
    * add tests
    * add pub key and fix logging
    * add priorities
    * fail call after retries
    * more fixes
    * fix retries left
    * some refactoring
    * fix package.json
    * upgrade sdk
    * some test cleanups
* feat: use jpegs for screenshots instead of pngs | Mohamed Bassem | 2025-09-28 | 1 | -2/+3
* feat: Stop downloading video/audio in playwright | Mohamed Bassem | 2025-09-28 | 1 | -0/+19
* fix: Abort dangling processing when crawler is aborted (#1988) | Mohamed Bassem | 2025-09-28 | 1 | -27/+98
    * fix: Abort dangling processing when crawler is aborted
    * comments
    * report the size
    * handle unhandleded rejection
    * drop promisify
* fix: Cleanup temp assets on monolith timeout | Mohamed Bassem | 2025-09-28 | 1 | -1/+17
* feat: Add tag search and pagination (#1987) | Mohamed Bassem | 2025-09-28 | 1 | -1/+1
    * feat: Add tag search and use in the homepage
    * use paginated query in the all tags view
    * wire the load more buttons
    * add skeleton to all tags page
    * fix attachedby aggregation
    * fix loading states
    * fix hasNextPage
    * use action buttons for load more buttons
    * migrate the tags auto complete to the search api
    * Migrate the tags editor to the new search API
    * Replace tag merging dialog with tag auto completion
    * Merge both search and list APIs
    * fix tags.list
    * add some tests for the endpoint
    * add relevance based sorting
    * change cursor
    * update the REST API
    * fix review comments
    * more fixes
    * fix lockfile
    * i18n
    * fix visible tags
* fix: fix bundling liteque in the workers | Mohamed Bassem | 2025-09-14 | 2 | -0/+2
* refactor: Move callsites to liteque to be behind a plugin | Mohamed Bassem | 2025-09-14 | 13 | -123/+134
* feat: Add cookie support for browser page access | Mohamed Bassem | 2025-09-07 | 1 | -0/+59
    * feat: Add cookie support for browser page access

      Implemented cookie functionality for browser page access, including BROWSER_COOKIE_PATH configuration to specify the cookies JSON file path.

    * fix the docs

    ---------

    Co-authored-by: lizz <lizong1204@gmail.com>
* feat(workers): add worker enable/disable lists (#1885) | Mohamed Bassem | 2025-09-07 | 1 | -44/+49
* fix: fix assets being marked as pending summarization | Mohamed Bassem | 2025-09-07 | 1 | -0/+7
* feat: add gif asset type support (#1876) | Drashi | 2025-09-07 | 1 | -2/+8
    * feat: add gif asset type support
    * skip inference for gis

    ---------

    Co-authored-by: Mohamed Bassem <me@mbassem.com>
* fix: don't mark inferenace job as failed when there's no content. fixes #1666 | Mohamed Bassem | 2025-09-07 | 2 | -7/+32
* fix: fix pdf detection when the header contains charset. fix: #1677 | Mohamed Bassem | 2025-09-07 | 1 | -2/+16
* fix: Fix feed worker to fetch feeds with proxy | Mohamed Bassem | 2025-09-06 | 3 | -50/+58
* fix: Change the inferance working logging when disabled to be a debug log level | Mohamed Bassem | 2025-09-06 | 2 | -2/+2
* fix: Dont attempt to fetch rss if the user if out of quota | Mohamed Bassem | 2025-09-06 | 1 | -0/+13
* refactor: Extract quota logic into its own class | Mohamed Bassem | 2025-09-06 | 3 | -15/+13
* fix: Reduce polling interval on meilisearch tasks | Mohamed Bassem | 2025-09-06 | 1 | -1/+1
* fix: Don't enqueue video tasks when video downlaod is disabled | Mohamed Bassem | 2025-09-06 | 1 | -8/+10
* fix: fix long worker log lines when downloading base64 images | Mohamed Bassem | 2025-08-30 | 1 | -1/+3
* fix: Respect wal mode for the queue db | Mohamed Bassem | 2025-08-30 | 1 | -1/+1
* fix: dangling assets created by changing crawling config | MohamedBassem | 2025-08-22 | 1 | -5/+6
* fix(workers): Drop the withTimeout wrappers | MohamedBassem | 2025-08-22 | 2 | -10/+2
* feat: Export prometheus metrics from the workers | MohamedBassem | 2025-08-22 | 14 | -5/+111
* refactor: Refactor crawlerWorker to use tryCatch | MohamedBassem | 2025-07-27 | 1 | -123/+117
* refactor: Extract meilisearch as a plugin | MohamedBassem | 2025-07-27 | 3 | -61/+45
* chore: More turbo fixes | MohamedBassem | 2025-07-27 | 1 | -2/+2
* fix: Ensure that all packages are ESM packages | MohamedBassem | 2025-07-27 | 1 | -0/+1
* deps: Upgrade vite | Mohamed Bassem | 2025-07-26 | 1 | -1/+1
* fix: Run workers in prod without tsx. Fixes #1673 | Mohamed Bassem | 2025-07-19 | 2 | -2/+26
* feat: Allow setting browserless crawling per user | Mohamed Bassem | 2025-07-19 | 1 | -1/+19
* Revert "fix: Fix the types of the bookmark types in the db query" | Mohamed Bassem | 2025-07-13 | 2 | -21/+1
    This reverts commit 4ba3e8047a5b1f160169617187436c09e91662ec.
* fix: Fix the types of the bookmark types in the db query | Mohamed Bassem | 2025-07-13 | 2 | -1/+21
* feat: Add proper proxy support. fixes #1265 | Mohamed Bassem | 2025-07-13 | 2 | -9/+87
* deps: Upgrade typescript to 5.8 | Mohamed Bassem | 2025-07-12 | 1 | -1/+1
* deps: Upgrade drizzle | Mohamed Bassem | 2025-07-12 | 1 | -1/+1
* fix: Prioritize crawling user added links over bulk imports. fixes #1717 | Mohamed Bassem | 2025-07-12 | 5 | -24/+55
* fix: Fix search indexing after content split | Mohamed Bassem | 2025-07-06 | 1 | -7/+4
* feat: Store large html content in the asset db | Mohamed Bassem | 2025-07-06 | 5 | -9/+135
* feat: Add per user storage quota | Mohamed Bassem | 2025-07-06 | 4 | -75/+183
* feat(workers): Allow custmoizing max parallelism for a bunch of workers. Fixes #724 | Mohamed Bassem | 2025-07-05 | 5 | -5/+7
* fix(workers): A more lenient JSON parsing for LLM responses. Fixes #1267 | Mohamed Bassem | 2025-07-04 | 1 | -1/+39
* fix(workers): Disable the metascraper readability as it's causing slowness in worker | Mohamed Bassem | 2025-06-22 | 1 | -2/+0
* fix(workers): Fix jsdom console logs leaking into worker logs | Mohamed Bassem | 2025-06-22 | 1 | -2/+3
* feat(workers): adding a local metascraper plugin for Reddit posts (#1302) | David Woods | 2025-06-22 | 3 | -13/+115
    * chore: metascraper 5.x comes with its own types, including @types/metascraper is now redundant; also updating to latest versions of metascraper libraries

    * feat (workers): creating a local metascraper plugin for Reddit posts

      In the past, the preview images for bookmarks from Reddit links were poorly chosen. Reddit does not use opengraph tags, so metascraper-images simply looked for all images on the page and returned the first. This tended to be the profile picture for the poster for the Reddit link.

      This new plugin, using the existing metascraper framework, provides a better selection of image for the bookmark when the URL domain is 'reddit'.

      In addition, recent changes (I believe this was a side effect of adding the metascraper-author and/or the metascaper-publisher plugins, but it could also be related to the metascraper-readibility plugin) broke what used to be a good choice of bookmark title. Previously, titles looked like 'Tinyauth just reached 1000 stars! : r/selfhosted' with both thread title and subreddit mentioned. After this update, all Reddit posts now have the same title: 'The heart of the internet'. To return to the better format, this new metascraper-reddit plugin now attempts to retrieve the better title from reddit URLs.

      Note that in order to gain precendence in title selection, the 'metascraperReddit()' inclusion in the crawlerWorkers.ts metascraper instantiation list had to be moved above metascraperReadability().

    * chore: updated Hoarder in text to Karakeep

    * chore: update metascraper versions

      fix for metascraper types has been merged; the expect-error comment can be removed

    * chore: merge with master

    ---------

    Co-authored-by: Mohamed Bassem <me@mbassem.com>
* feat(workers): migrate from puppeteer to playwright (#1296) | Mael | 2025-06-22 | 2 | -34/+39
    * feat: convert to playwright

      Convert crawling to use Playwright instead of Chrome.

      - Update Dockerfile to include Playwright
      - Update crawler worker to use Playwright API
      - Update dependencies

    * feat: convert from Puppeteer to Playwright for crawling
    * feat: update docker-compose
    * use separate browser context for better isolation
    * skip chrome download in linux script
    * readd the stealth plugin

    ---------

    Co-authored-by: Mohamed Bassem <me@mbassem.com>
* chore: More oxlint changes | Mohamed Bassem | 2025-06-22 | 3 | -7/+4
* chore: migrate away from eslint to oxlint (#1642) | xuatz | 2025-06-22 | 5 | -12/+27
    * chore: migrate away from eslint to oxlint
    * revert turbo task name lint
    * it seems like we can remove the seemingly default globals