| Commit message | Author | Files | +/- |
|---|---|---|---|
| fix(workers): Shutdown workers on SIGTERM | MohamedBassem | 2 | -0/+9 |
| fix: async/await issues with the new queue (#319) | kamtschatka | 6 | -25/+27 |
| refactor: Replace the usage of bullMQ with the hoarder sqlite-based queue (#309) | Mohamed Bassem | 13 | -344/+128 |
| fix: monolith not embedding SVG files correctly. Fixes #289 (#306) | kamtschatka | 1 | -5/+2 |
| refactor: added the bookmark type to the database (#256) | kamtschatka | 27 | -120/+1266 |
| refactor: remove redundant code from crawler worker and refactor handling of… | kamtschatka | 3 | -65/+80 |
| feature: Automatically transfer image urls into bookmarked assets. Fixes #246 | MohamedBassem | 2 | -9/+23 |
| refactor: extract assets into their own database table. #215 (#220) | kamtschatka | 6 | -52/+1271 |
| feature: add support for PDF links. Fixes #28 (#216) | kamtschatka | 10 | -93/+1263 |
| fix: Trigger search re-index on bookmark tag manual updates. Fixes #208 (#210) | kamtschatka | 6 | -55/+41 |
| fix(crawler): Only update the database if full page archival is enabled | MohamedBassem | 1 | -19/+19 |
| feature: Full page archival with monolith. Fixes #132 | MohamedBassem | 14 | -7/+1259 |
| feature(crawler): Allow connecting to browser's websocket address and launching… | MohamedBassem | 3 | -36/+70 |
| feature: Take full page screenshots #143 (#148) | kamtschatka | 4 | -3/+9 |
| feature(crawler): Allow increasing crawler concurrency and configure storing… | MohamedBassem | 3 | -4/+26 |
| fix(crawler): Better extraction for amazon images | MohamedBassem | 3 | -0/+20 |
| fix(workers): Set a modern user agent and update the default viewport size | MohamedBassem | 1 | -0/+7 |
| feature: Allow recrawling bookmarks without running inference jobs | MohamedBassem | 4 | -9/+46 |
| feature: Download images and screenshots | MohamedBassem | 22 | -135/+1373 |
| feature: Recrawl failed links from admin UI (#95) | Ahmad Mujahid | 8 | -25/+1067 |
| fix: Increase default navigation timeout to 30s, make it configurable and add… | MohamedBassem | 5 | -6/+17 |
| fix(crawler): Skip validating URLs in metascraper as it was already being… | MohamedBassem | 1 | -0/+3 |
| fix(workers): Increase default timeout to 60s, make it configurable and improve… | MohamedBassem | 3 | -11/+29 |
| fix(workers): Add a timeout to the crawling job to prevent it from getting… | MohamedBassem | 2 | -1/+18 |
| chore(workers): Remove unused configuration options | MohamedBassem | 2 | -6/+0 |
| format: Add missing lint and format, and format the entire repo | MohamedBassem | 57 | -192/+255 |
| refactor: Validate env variables using zod | MohamedBassem | 7 | -46/+91 |
| docker: Use external chrome docker container | MohamedBassem | 8 | -33/+61 |
| fix(workers): Fix the leaky browser instances in workers during development | MohamedBassem | 3 | -29/+46 |
| fix: Simple validations for crawled URLs | MohamedBassem | 1 | -1/+17 |
| structure: Create apps dir and copy tooling dir from t3-turbo repo | MohamedBassem | 396 | -9511/+10350 |
| feature: Store html content of links in the database | MohamedBassem | 6 | -0/+818 |
| fix: Use puppeteer adblocker to block cookie notices | MohamedBassem | 3 | -0/+120 |
| feature: Store full link content and index them | MohamedBassem | 9 | -1/+878 |
| feature: Add full text search support | MohamedBassem | 17 | -12/+440 |
| db: Migrate from prisma to drizzle | MohamedBassem | 41 | -975/+2177 |
| branding: Rename app to Hoarder | MohamedBassem | 21 | -165/+164 |
| build: Fix docker images | MohamedBassem | 7 | -20/+34 |
| fix: Let the crawler wait a bit more for page load | MohamedBassem | 3 | -3/+18 |
| fix: Harden puppeteer against browser disconnections and exceptions | MohamedBassem | 3 | -16/+44 |
| feature: Add ability to refresh bookmark details | MohamedBassem | 5 | -4/+76 |
| fix: Fix build for workers package and add it to CI | MohamedBassem | 8 | -70/+106 |
| [feature] Use puppeteer for fetching websites | MohamedBassem | 3 | -18/+998 |
| [chore] Linting and formatting tweaks | MohamedBassem | 24 | -67/+157 |
| [refactor] Extract the bookmark model to be a high level model to support other… | MohamedBassem | 22 | -308/+396 |
| [refactor] Move the different packages to the package subdir | MohamedBassem | 128 | -2716/+2713 |
| [feature] Add openAI integration for extracting tags from articles | MohamedBassem | 9 | -19/+239 |
| [refactor] Rename the crawlers package to workers | MohamedBassem | 8 | -126/+126 |
| Implement metadata fetching logic in the crawler | MohamedBassem | 29 | -264/+439 |
| Init package and start bullmq workers | MohamedBassem | 12 | -8/+91 |
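
The "Shutdown workers on SIGTERM" commit above points at a common pattern for long-running worker processes: intercept the signal, drain all workers, then exit. A minimal sketch of that pattern in TypeScript — `Worker` and `registerShutdown` are illustrative names and assumptions, not Hoarder's actual API:

```typescript
// Hypothetical sketch of graceful shutdown on SIGTERM for a set of
// queue workers. Names here are illustrative, not taken from the repo.
interface Worker {
  stop(): Promise<void>;
}

function registerShutdown(workers: Worker[]): () => Promise<void> {
  const shutdown = async (): Promise<void> => {
    // Ask every worker to finish its current job and stop, in parallel.
    await Promise.all(workers.map((w) => w.stop()));
  };
  process.on("SIGTERM", () => {
    // Exit cleanly once all workers have drained.
    void shutdown().then(() => process.exit(0));
  });
  return shutdown; // returned so it can also be invoked directly
}
```

Without a handler like this, a `docker stop` (which sends SIGTERM) would kill workers mid-job; draining first lets in-flight crawl or inference jobs complete or be re-queued.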