path: root/apps/workers/crawlerWorker.ts
| Date | Commit message | Author | Files | Lines |
|---|---|---|---|---|
| 2025-04-16 | fix(workers): Close browser if connect on demand (#1151) | Chang-Yen Tseng | 1 | -0/+3 |
| 2025-04-12 | chore: Rename hoarder packages to karakeep | MohamedBassem | 1 | -8/+8 |
| 2025-03-27 | feat(workers): Add CRAWLER_SCREENSHOT_TIMEOUT_SEC (#1155) | Chang-Yen Tseng | 1 | -10/+18 |
| 2025-03-22 | feat(workers): Adds publisher and author og:meta tags to Bookmark (#1141) | erik-nilcoast | 1 | -0/+24 |
| 2025-02-17 | feat: Add PDF screenshot generation and display (#995) | Ahmad Mujahid | 1 | -0/+1 |
| 2025-02-02 | fix: Dont rearchive singlefile uploads and consider them as archives | Mohamed Bassem | 1 | -2/+6 |
| 2025-02-01 | fix: Abort all IO when workers timeout instead of detaching. Fixes #742 | Mohamed Bassem | 1 | -13/+62 |
| 2025-01-19 | feat: Change webhooks to be configurable by users | Mohamed Bassem | 1 | -2/+2 |
| 2025-01-19 | feat(webhook): Implement webhook functionality for bookmark events (#852) | 玄猫 | 1 | -0/+4 |
| 2025-01-11 | feat: Add support for singlefile extension uploads. #172 | Mohamed Bassem | 1 | -6/+30 |
| 2024-12-26 | refactor: Move asset preprocessing to its own worker out of the inference worker | Mohamed Bassem | 1 | -17/+18 |
| 2024-12-08 | feature: Store crawling status code and allow users to find broken links. Fix... | Mohamed Bassem | 1 | -4/+6 |
| 2024-11-30 | feature(workers): Allow running hoarder without chrome as a hard dependency. ... | Mohamed Bassem | 1 | -11/+35 |
| 2024-11-23 | fix(workers): Set a timeout on the screenshot call and completely skip it if ... | Mohamed Bassem | 1 | -13/+32 |
| 2024-11-21 | fix(workers): Don't block connection to chrome when failing to download adblo... | Mohamed Bassem | 1 | -6/+22 |
| 2024-11-21 | chore(workers): Add extra logging for browser connection errors | Mohamed Bassem | 1 | -1/+1 |
| 2024-11-09 | fix: Only update bookmark tagging/crawling status when worker is out of retries | Mohamed Bassem | 1 | -4/+4 |
| 2024-11-03 | fix: Pass arguments to monolith and yt-dlp as array for better escaping | Mohamed Bassem | 1 | -1/+1 |
| 2024-10-28 | feature: Archive videos using yt-dlp. Fixes #215 (#525) | kamtschatka | 1 | -49/+10 |
| 2024-10-27 | deps: Extract the queue implementation into its own repos | Mohamed Bassem | 1 | -1/+1 |
| 2024-10-06 | refactor: Start tracking bookmark assets in the assets table | MohamedBassem | 1 | -60/+83 |
| 2024-10-06 | refactor: Include userId in the assets table | MohamedBassem | 1 | -0/+5 |
| 2024-09-30 | feature(web): Add ability to manually trigger full page archives. Fixes #398 ... | kamtschatka | 1 | -3/+5 |
| 2024-09-26 | fix(workers): Log stacktrace on worker error. #424 (#429) | kamtschatka | 1 | -1/+3 |
| 2024-07-28 | fix(workers): Shutdown workers on SIGTERM | MohamedBassem | 1 | -0/+4 |
| 2024-07-21 | fix: async/await issues with the new queue (#319) | kamtschatka | 1 | -2/+2 |
| 2024-07-21 | refactor: Replace the usage of bullMQ with the hoarder sqlite-based queue (#309) | Mohamed Bassem | 1 | -31/+29 |
| 2024-07-14 | fix: monolith not embedding SVG files correctly. Fixes #289 (#306) | kamtschatka | 1 | -5/+2 |
| 2024-07-01 | refactor: added the bookmark type to the database (#256) | kamtschatka | 1 | -0/+6 |
| 2024-06-29 | refactor: remove redundant code from crawler worker and refactor handling of ... | kamtschatka | 1 | -32/+49 |
| 2024-06-23 | feature: Automatically transfer image urls into bookmared assets. Fixes #246 | MohamedBassem | 1 | -6/+16 |
| 2024-06-23 | refactor: extract assets into their own database table. #215 (#220) | kamtschatka | 1 | -29/+71 |
| 2024-06-22 | feature: add support for PDF links. Fixes #28 (#216) | kamtschatka | 1 | -57/+163 |
| 2024-06-09 | fix: Trigger search re-index on bookmark tag manual updates. Fixes #208 (#210) | kamtschatka | 1 | -5/+2 |
| 2024-05-26 | fix(crawler): Only update the database if full page archival is enabled | MohamedBassem | 1 | -19/+19 |
| 2024-05-26 | feature: Full page archival with monolith. Fixes #132 | MohamedBassem | 1 | -1/+65 |
| 2024-05-15 | feature(crawler): Allow connecting to browser's websocket address and launchi... | MohamedBassem | 1 | -28/+55 |
| 2024-05-12 | feature: Take full page screenshots #143 (#148) | kamtschatka | 1 | -1/+2 |
| 2024-04-26 | feature(crawler): Allow increasing crawler concurrency and configure storing ... | MohamedBassem | 1 | -0/+13 |
| 2024-04-23 | fix(crawler): Better extraction for amazon images | MohamedBassem | 1 | -0/+2 |
| 2024-04-23 | fix(workers): Set a modern user agent and update the default viewport size | MohamedBassem | 1 | -0/+7 |
| 2024-04-20 | feature: Allow recrawling bookmarks without running inference jobs | MohamedBassem | 1 | -7/+29 |
| 2024-04-20 | feature: Download images and screenshots | MohamedBassem | 1 | -28/+130 |
| 2024-04-11 | feature: Recrawl failed links from admin UI (#95) | Ahmad Mujahid | 1 | -0/+20 |
| 2024-04-11 | fix: Increase default navigation timeout to 30s, make it configurable and add... | MohamedBassem | 1 | -1/+1 |
| 2024-04-09 | fix(crawler): Skip validating URLs in metascrapper as it was already being va... | MohamedBassem | 1 | -0/+3 |
| 2024-04-06 | fix(workers): Increase default timeout to 60s, make it configurable and impro... | MohamedBassem | 1 | -11/+21 |
| 2024-04-02 | fix(workers): Add a timeout to the crawling job to prevent it from getting st... | MohamedBassem | 1 | -1/+2 |
| 2024-03-31 | chore(workers): Remove unused configuration options | MohamedBassem | 1 | -2/+0 |
| 2024-03-30 | format: Add missing lint and format, and format the entire repo | MohamedBassem | 1 | -5/+6 |