path: root/apps/workers/crawlerWorker.ts
Commit message | Author | Age | Files | Lines
* fix: Trigger search re-index on bookmark tag manual updates. Fixes #208 (#210) | kamtschatka | 2024-06-09 | 1 | -5/+2
    * re-index of database is not scanning all places when bookmark tags are changed. Manual indexing is working as workaround #208
      introduced a new function to trigger a reindex to reduce copy/paste
      added missing reindexes when tags are deleted/bookmarks are updated
    * give functions a bit more descriptive name
    ---------
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
    Co-authored-by: MohamedBassem <me@mbassem.com>
* fix(crawler): Only update the database if full page archival is enabled | MohamedBassem | 2024-05-26 | 1 | -19/+19
* feature: Full page archival with monolith. Fixes #132 | MohamedBassem | 2024-05-26 | 1 | -1/+65
* feature(crawler): Allow connecting to browser's websocket address and launching the browser on demand. This enables support for browserless | MohamedBassem | 2024-05-15 | 1 | -28/+55
* feature: Take full page screenshots #143 (#148) | kamtschatka | 2024-05-12 | 1 | -1/+2
    Added the fullPage flag to take full screen screenshots
    updated the UI accordingly to properly show the screenshots instead of scaling it down
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
* feature(crawler): Allow increasing crawler concurrency and configure storing images and screenshots | MohamedBassem | 2024-04-26 | 1 | -0/+13
* fix(crawler): Better extraction for amazon images | MohamedBassem | 2024-04-23 | 1 | -0/+2
* fix(workers): Set a modern user agent and update the default viewport size | MohamedBassem | 2024-04-23 | 1 | -0/+7
* feature: Allow recrawling bookmarks without running inference jobs | MohamedBassem | 2024-04-20 | 1 | -7/+29
* feature: Download images and screenshots | MohamedBassem | 2024-04-20 | 1 | -28/+130
* feature: Recrawl failed links from admin UI (#95) | Ahmad Mujahid | 2024-04-11 | 1 | -0/+20
    * feature: Retry failed crawling URLs
    * fix: Enhancing visuals and some minor changes.
* fix: Increase default navigation timeout to 30s, make it configurable and add retries to crawling jobs | MohamedBassem | 2024-04-11 | 1 | -1/+1
* fix(crawler): Skip validating URLs in metascrapper as it was already being validated. Fixes #22 | MohamedBassem | 2024-04-09 | 1 | -0/+3
* fix(workers): Increase default timeout to 60s, make it configurable and improve logging | MohamedBassem | 2024-04-06 | 1 | -11/+21
* fix(workers): Add a timeout to the crawling job to prevent it from getting stuck. Fixes #63 | MohamedBassem | 2024-04-02 | 1 | -1/+2
* chore(workers): Remove unused configuration options | MohamedBassem | 2024-03-31 | 1 | -2/+0
* format: Add missing lint and format, and format the entire repo | MohamedBassem | 2024-03-30 | 1 | -5/+6
* refactor: Validate env variables using zod | MohamedBassem | 2024-03-27 | 1 | -1/+1
* docker: Use external chrome docker container | MohamedBassem | 2024-03-24 | 1 | -10/+40
* fix(workers): Fix the leaky browser instances in workers during development | MohamedBassem | 2024-03-21 | 1 | -28/+30
* fix: Simple validations for crawled URLs | MohamedBassem | 2024-03-21 | 1 | -1/+17
* structure: Create apps dir and copy tooling dir from t3-turbo repo | MohamedBassem | 2024-03-14 | 1 | -0/+201