karakeep commit log
Commit message Author Files +/-
fix(workers): Shutdown workers on SIGTERM MohamedBassem 2 -0/+9
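The SIGTERM fix above amounts to registering a signal handler that drains in-flight work before exiting, so container stops do not kill jobs mid-flight. A minimal sketch, where `Worker`, `stop`, and `registerShutdown` are illustrative names rather than hoarder's actual API:

```typescript
// Illustrative worker shape: anything that can finish its current job and stop.
interface Worker {
  stop(): Promise<void>;
}

// Registers a SIGTERM handler; returns the shutdown function so callers
// (and tests) can also invoke it directly.
function registerShutdown(
  workers: Worker[],
  exit: (code: number) => void = process.exit,
): () => Promise<void> {
  const shutdown = async () => {
    // Stop accepting new jobs and wait for in-flight ones to drain.
    await Promise.all(workers.map((w) => w.stop()));
    exit(0);
  };
  process.on("SIGTERM", shutdown);
  return shutdown;
}
```

Without such a handler, Node exits on SIGTERM immediately, which is why Docker `stop` would previously cut workers off mid-job.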
fix: async/await issues with the new queue (#319) kamtschatka 6 -25/+27
refactor: Replace the usage of bullMQ with the hoarder sqlite-based queue (#309) Mohamed Bassem 13 -344/+128
fix: monolith not embedding SVG files correctly. Fixes #289 (#306) kamtschatka 1 -5/+2
    passing in the URL of the page to have the proper URL for resolving relative paths
refactor: added the bookmark type to the database (#256) kamtschatka 27 -120/+1266
    * refactoring asset types
      Extracted out functions to silently delete assets and to update them after crawling
      Generalized the mapping of assets to bookmark fields to make extending them easier
    * Added the bookmark type to the database
      Introduced an enum to have better type safety
      cleaned up the code and based some code on the type directly
    * add BookmarkType.UNKNWON
    * lint and remove unused function
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
refactor: remove redundant code from crawler worker and refactor handling of… kamtschatka 3 -65/+80
    * refactoring asset types
      Extracted out functions to silently delete assets and to update them after crawling
      Generalized the mapping of assets to bookmark fields to make extending them easier
    * revert silentDeleteAsset and hide better-sqlite3
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
feature: Automatically transfer image urls into bookmarked assets. Fixes #246 MohamedBassem 2 -9/+23
refactor: extract assets into their own database table. #215 (#220) kamtschatka 6 -52/+1271
    * Allow downloading more content from a webpage and index it #215
      added a new table that contains the information about assets for link bookmarks
      created migration code that transfers the existing data into the new table
    * Allow downloading more content from a webpage and index it #215
      removed the old asset columns from the database
      updated the UI to use the data from the linkBookmarkAssets array
    * generalize the assets table to not be linked in particular to links
    * fix migrations post merge
    * fix missing asset ids in the getBookmarks call
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
feature: add support for PDF links. Fixes #28 (#216) kamtschatka 10 -93/+1263
    * feature request: pdf support #28
      Added a new sourceUrl column to the asset bookmarks
      Added transforming a link bookmark pointing at a pdf to an asset bookmark
      made sure the "View Original" link is also shown for asset bookmarks that have a sourceUrl
      updated gitignore for IDEA
    * remove pdf parsing from the crawler
    * extract the http logic into its own function to avoid duplicating the post-processing actions (openai/index)
    * Add 5s timeout to the content type fetch
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
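The PDF-link commit above detects PDF targets by fetching the link's Content-Type with a 5s cap before deciding how to process the bookmark. A minimal sketch of that idea, assuming Node 18+ with the built-in `fetch`; the function and constant names are illustrative, not the project's actual API:

```typescript
// Illustrative sketch: probe a URL's Content-Type with a 5s timeout.
const CONTENT_TYPE_TIMEOUT_MS = 5_000;

async function getContentType(url: string): Promise<string | null> {
  try {
    // AbortSignal.timeout (Node 17.3+) aborts the request if the server hangs.
    const resp = await fetch(url, {
      method: "HEAD",
      signal: AbortSignal.timeout(CONTENT_TYPE_TIMEOUT_MS),
    });
    return resp.headers.get("content-type");
  } catch {
    // On timeout or network error, fall back to treating the link as a page.
    return null;
  }
}

function isPdf(contentType: string | null): boolean {
  // Content-Type may carry parameters, e.g. "application/pdf; charset=binary".
  return contentType !== null && contentType.includes("application/pdf");
}
```

A crawler could then route `isPdf(...)` hits to asset-bookmark handling and everything else to the normal page pipeline.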
fix: Trigger search re-index on bookmark tag manual updates. Fixes #208 (#210) kamtschatka 6 -55/+41
    * re-index of database is not scanning all places when bookmark tags are changed. Manual indexing is working as a workaround #208
      introduced a new function to trigger a reindex to reduce copy/paste
      added missing reindexes when tags are deleted/bookmarks are updated
    * give functions a bit more descriptive name
    ---------
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
    Co-authored-by: MohamedBassem <me@mbassem.com>
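The re-index fix above centralizes the enqueue call into one helper that every mutation path (tag added, tag deleted, bookmark updated) goes through. A minimal sketch of that shape, with an in-memory array standing in for the project's real worker queue and all names hypothetical:

```typescript
// Stand-in for the real search-indexing job queue.
const searchIndexQueue: { bookmarkId: string }[] = [];

// One shared helper, so every mutation path triggers a re-index the same
// way instead of copy/pasting the enqueue call at each call site.
function triggerSearchReindex(bookmarkId: string): void {
  searchIndexQueue.push({ bookmarkId });
}

// Example mutation path: deleting a tag must also re-index the bookmark,
// otherwise search keeps returning it for the removed tag.
function deleteTag(bookmarkId: string, tag: string): void {
  // ... delete the tag row from the database (elided) ...
  triggerSearchReindex(bookmarkId);
}
```

The bug class this fixes is exactly the one in the commit: a mutation path that forgets the enqueue, leaving the search index stale until a manual re-index.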
fix(crawler): Only update the database if full page archival is enabled MohamedBassem 1 -19/+19
feature: Full page archival with monolith. Fixes #132 MohamedBassem 14 -7/+1259
feature(crawler): Allow connecting to browser's websocket address and launching… MohamedBassem 3 -36/+70
feature: Take full page screenshots #143 (#148) kamtschatka 4 -3/+9
    Added the fullPage flag to take full screen screenshots
    updated the UI accordingly to properly show the screenshots instead of scaling them down
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
feature(crawler): Allow increasing crawler concurrency and configure storing… MohamedBassem 3 -4/+26
fix(crawler): Better extraction for amazon images MohamedBassem 3 -0/+20
fix(workers): Set a modern user agent and update the default viewport size MohamedBassem 1 -0/+7
feature: Allow recrawling bookmarks without running inference jobs MohamedBassem 4 -9/+46
feature: Download images and screenshots MohamedBassem 22 -135/+1373
feature: Recrawl failed links from admin UI (#95) Ahmad Mujahid 8 -25/+1067
    * feature: Retry failed crawling URLs
    * fix: Enhancing visuals and some minor changes.
fix: Increase default navigation timeout to 30s, make it configurable and add… MohamedBassem 5 -6/+17
fix(crawler): Skip validating URLs in metascraper as it was already being… MohamedBassem 1 -0/+3
fix(workers): Increase default timeout to 60s, make it configurable and improve… MohamedBassem 3 -11/+29
fix(workers): Add a timeout to the crawling job to prevent it from getting… MohamedBassem 2 -1/+18
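Adding a timeout to the crawling job, as in the commit above, is typically done by racing the job against a timer so a hung page load cannot wedge a worker forever. A generic sketch of that pattern (the `withTimeout` name is mine, not necessarily the project's):

```typescript
// Race a job promise against a timer; whichever settles first wins.
function withTimeout<T>(job: Promise<T>, ms: number): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`job timed out after ${ms}ms`)),
      ms,
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([job, timeout]).finally(() => clearTimeout(timer));
}
```

A crawl worker would wrap each job in `withTimeout(crawlPage(url), CRAWL_TIMEOUT_MS)` and treat the rejection as a failed crawl rather than a stuck one.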
chore(workers): Remove unused configuration options MohamedBassem 2 -6/+0
format: Add missing lint and format, and format the entire repo MohamedBassem 57 -192/+255
refactor: Validate env variables using zod MohamedBassem 7 -46/+91
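The commit above uses zod for this; here is a dependency-free sketch of the same fail-fast idea: validate the environment once at startup and crash with a clear message instead of failing later mid-request. The variable names (`API_URL`, `CRAWLER_CONCURRENCY`) are hypothetical, not the project's actual configuration keys:

```typescript
// Hypothetical config shape; the real project validates with zod schemas.
interface ServerConfig {
  apiUrl: string;
  crawlerConcurrency: number;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiUrl = env.API_URL;
  if (!apiUrl) throw new Error("API_URL is required");

  const concurrency = Number(env.CRAWLER_CONCURRENCY ?? "1");
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error("CRAWLER_CONCURRENCY must be a positive integer");
  }
  return { apiUrl, crawlerConcurrency: concurrency };
}
```

Calling `loadConfig(process.env)` once at boot gives every later read a typed, already-validated object, which is the main payoff of the zod refactor.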
docker: Use external chrome docker container MohamedBassem 8 -33/+61
fix(workers): Fix the leaky browser instances in workers during development MohamedBassem 3 -29/+46
fix: Simple validations for crawled URLs MohamedBassem 1 -1/+17
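"Simple validations for crawled URLs" usually means parsing the URL and rejecting anything that is not plain http(s), so the crawler never follows `file://` or `javascript:` links. A sketch under that assumption (the function name is illustrative):

```typescript
// Reject unparsable URLs and non-http(s) schemes before handing a
// bookmark's link to the crawler.
function isCrawlableUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw); // throws on anything that is not a valid absolute URL
  } catch {
    return false;
  }
  return url.protocol === "http:" || url.protocol === "https:";
}
```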
structure: Create apps dir and copy tooling dir from t3-turbo repo MohamedBassem 396 -9511/+10350
feature: Store html content of links in the database MohamedBassem 6 -0/+818
fix: Use puppeteer adblocker to block cookies notices MohamedBassem 3 -0/+120
feature: Store full link content and index them MohamedBassem 9 -1/+878
feature: Add full text search support MohamedBassem 17 -12/+440
db: Migrate from prisma to drizzle MohamedBassem 41 -975/+2177
branding: Rename app to Hoarder MohamedBassem 21 -165/+164
build: Fix docker images MohamedBassem 7 -20/+34
fix: Let the crawler wait a bit more for page load MohamedBassem 3 -3/+18
fix: Harden puppeteer against browser disconnections and exceptions MohamedBassem 3 -16/+44
feature: Add ability to refresh bookmark details MohamedBassem 5 -4/+76
fix: Fix build for workers package and add it to CI MohamedBassem 8 -70/+106
[feature] Use puppeteer for fetching websites MohamedBassem 3 -18/+998
[chore] Linting and formatting tweaking MohamedBassem 24 -67/+157
[refactor] Extract the bookmark model to be a high level model to support other… MohamedBassem 22 -308/+396
[refactor] Move the different packages to the package subdir MohamedBassem 128 -2716/+2713
[feature] Add openAI integration for extracting tags from articles MohamedBassem 9 -19/+239
[refactor] Rename the crawlers package to workers MohamedBassem 8 -126/+126
Implement metadata fetching logic in the crawler MohamedBassem 29 -264/+439
Init package and start bullmq workers MohamedBassem 12 -8/+91