path: root/apps/workers
Commit message | Author | Age | Files | Lines
...
* feature(web): Add ability to manually trigger full page archives. Fixes #398 (#418) | kamtschatka | 2024-09-30 | 1 | -3/+5
    * [Feature Request] Ability to select what to "crawl full page archive" #398
      Added the ability to start a full page crawl for links and also in bulk operations
      added the ability to refresh links as a bulk operation as well
    * minor icon and wording changes
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
* feature(web): Add the ability to customize the inference prompts. Fixes #170 | MohamedBassem | 2024-09-29 | 1 | -39/+42
* fix(workers): Log stacktrace on worker error. #424 (#429) | kamtschatka | 2024-09-26 | 3 | -3/+7
    extended logging when an exception occurs, so it is possible to see the stacktrace of a failed execution
* deps: Upgrade drizzle and next auth drizzle adapter | MohamedBassem | 2024-09-15 | 1 | -1/+1
* feature(worker): Allow configuring inference job timeout and ollama keep alive. Fixes #389 #224 | MohamedBassem | 2024-09-15 | 2 | -1/+2
* build: Fix sherif failures by sorting deps | MohamedBassem | 2024-08-31 | 1 | -1/+1
* fix(workers): Shutdown workers on SIGTERM | MohamedBassem | 2024-07-28 | 2 | -0/+9
* fix: async/await issues with the new queue (#319) | kamtschatka | 2024-07-21 | 2 | -3/+3
* refactor: Replace the usage of bullMQ with the hoarder sqlite-based queue (#309) | Mohamed Bassem | 2024-07-21 | 5 | -72/+75
* fix: monolith not embedding SVG files correctly. Fixes #289 (#306) | kamtschatka | 2024-07-14 | 1 | -5/+2
    passing in the URL of the page to have the proper URL for resolving relative paths
* refactor: added the bookmark type to the database (#256) | kamtschatka | 2024-07-01 | 1 | -0/+6
    * refactoring asset types
      Extracted out functions to silently delete assets and to update them after crawling
      Generalized the mapping of assets to bookmark fields to make extending them easier
    * Added the bookmark type to the database
      Introduced an enum to have better type safety
      cleaned up the code and based some code on the type directly
    * add BookmarkType.UNKNWON
    * lint and remove unused function
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
* refactor: remove redundant code from crawler worker and refactor handling of asset types (#253) | kamtschatka | 2024-06-29 | 1 | -32/+49
    * refactoring asset types
      Extracted out functions to silently delete assets and to update them after crawling
      Generalized the mapping of assets to bookmark fields to make extending them easier
    * revert silentDeleteAsset and hide better-sqlite3
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
* feature: Automatically transfer image urls into bookmarked assets. Fixes #246 | MohamedBassem | 2024-06-23 | 1 | -6/+16
* refactor: extract assets into their own database table. #215 (#220) | kamtschatka | 2024-06-23 | 1 | -29/+71
    * Allow downloading more content from a webpage and index it #215
      added a new table that contains the information about assets for link bookmarks
      created migration code that transfers the existing data into the new table
    * Allow downloading more content from a webpage and index it #215
      removed the old asset columns from the database
      updated the UI to use the data from the linkBookmarkAssets array
    * generalize the assets table to not be linked in particular to links
    * fix migrations post merge
    * fix missing asset ids in the getBookmarks call
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
* feature: add support for PDF links. Fixes #28 (#216) | kamtschatka | 2024-06-22 | 1 | -57/+163
    * feature request: pdf support #28
      Added a new sourceUrl column to the asset bookmarks
      Added transforming a link bookmark pointing at a pdf to an asset bookmark
      made sure the "View Original" link is also shown for asset bookmarks that have a sourceURL
      updated gitignore for IDEA
    * remove pdf parsing from the crawler
    * extract the http logic into its own function to avoid duplicating the post-processing actions (openai/index)
    * Add 5s timeout to the content type fetch
    ---------
    Co-authored-by: MohamedBassem <me@mbassem.com>
* fix: Trigger search re-index on bookmark tag manual updates. Fixes #208 (#210) | kamtschatka | 2024-06-09 | 2 | -10/+4
    * re-index of database is not scanning all places when bookmark tags are changed. Manual indexing is working as workaround #208
      introduced a new function to trigger a reindex to reduce copy/paste
      added missing reindexes when tags are deleted/bookmarks are updated
    * give functions a bit more descriptive name
    ---------
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
    Co-authored-by: MohamedBassem <me@mbassem.com>
* fix(workers): AI inferred tags can contain " " at the beginning. Fixes #184 (#194) | kamtschatka | 2024-06-07 | 1 | -3/+5
    added a trim to tags to prevent whitespaces at the beginning/end of tags
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
* fix(crawler): Only update the database if full page archival is enabled | MohamedBassem | 2024-05-26 | 1 | -19/+19
* feature: Full page archival with monolith. Fixes #132 | MohamedBassem | 2024-05-26 | 2 | -1/+66
* feature(inference): Improve ollama tagging (#162) | kamtschatka | 2024-05-18 | 1 | -5/+12
    * Inference Failed with Ollama #20
      Changed the prompt to be split in 2, so ollama does not forget them
    * Update apps/workers/openaiWorker.ts
      Co-authored-by: Mohamed Bassem <me@mbassem.com>
    ---------
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
    Co-authored-by: Mohamed Bassem <me@mbassem.com>
* feature(crawler): Allow connecting to browser's websocket address and launching the browser on demand. This enables support for browserless | MohamedBassem | 2024-05-15 | 1 | -28/+55
* feature: Take full page screenshots #143 (#148) | kamtschatka | 2024-05-12 | 1 | -1/+2
    Added the fullPage flag to take full screen screenshots
    updated the UI accordingly to properly show the screenshots instead of scaling it down
    Co-authored-by: kamtschatka <simon.schatka@gmx.at>
* fix(inference): Attempt to reuse existing identical tags | MohamedBassem | 2024-04-26 | 1 | -22/+62
* feature(crawler): Allow increasing crawler concurrency and configure storing images and screenshots | MohamedBassem | 2024-04-26 | 1 | -0/+13
* fix(crawler): Better extraction for amazon images | MohamedBassem | 2024-04-23 | 2 | -0/+3
* fix(workers): Increase robustness of search worker and add extra logging. Fixes #118 | MohamedBassem | 2024-04-23 | 1 | -24/+45
* fix(workers): Set a modern user agent and update the default viewport size | MohamedBassem | 2024-04-23 | 1 | -0/+7
* feature: Allow recrawling bookmarks without running inference jobs | MohamedBassem | 2024-04-20 | 1 | -7/+29
* feature: Download images and screenshots | MohamedBassem | 2024-04-20 | 1 | -28/+130
* fix: Fix slice call in the content truncation logic which was resulting in excessive usage of context tokens. Fixes #94 | MohamedBassem | 2024-04-15 | 1 | -1/+1
* feature: Add title to bookmarks and allow editing them. Fixes #27 | MohamedBassem | 2024-04-15 | 1 | -1/+2
* fix: Differentiate between pending in db and in redis in admin job stats | MohamedBassem | 2024-04-12 | 1 | -1/+1
* feature: Recrawl failed links from admin UI (#95) | Ahmad Mujahid | 2024-04-11 | 1 | -0/+20
    * feature: Retry failed crawling URLs
    * fix: Enhancing visuals and some minor changes.
* fix: Increase default navigation timeout to 30s, make it configurable and add retries to crawling jobs | MohamedBassem | 2024-04-11 | 2 | -2/+1
* feature: Add PDF support (#88) | Ahmad Mujahid | 2024-04-11 | 4 | -12/+98
    * feature: Add PDF support
    * fix: PDF feature enhancements
    * fix: Freeze expo-share-intent version to prevent breaking changes
    * fix: set endOfLine to auto for cross-platform development
    * fix: Upgrading eslint/parser and eslint-plugin to 7.6.0 to solve the linting issues
    * fix: enhancing PDF feature
    * fix: Allowing null in filename for backward compatibility
    * fix: update pnpm file with pnpm 9.0.0-alpha-8
    * fix(web): PDF Preview for web
* feature(inference): Upgrade the default vision model to the new gpt-4-turbo | MohamedBassem | 2024-04-09 | 1 | -0/+1
* fix(crawler): Skip validating URLs in metascraper as it was already being validated. Fixes #22 | MohamedBassem | 2024-04-09 | 1 | -0/+3
* fix(workers): Increase default timeout to 60s, make it configurable and improve logging | MohamedBassem | 2024-04-06 | 1 | -11/+21
* feature: Include server version in the admin UI. Fixes #66 | MohamedBassem | 2024-04-02 | 1 | -0/+4
* fix(workers): Add a timeout to the crawling job to prevent it from getting stuck. Fixes #63 | MohamedBassem | 2024-04-02 | 2 | -1/+18
* feat(workers): Allow configuring the language in which the tags are generated. Fixes #68 | MohamedBassem | 2024-04-02 | 1 | -5/+5
* chore(workers): Remove unused configuration options | MohamedBassem | 2024-03-31 | 1 | -2/+0
* format: Add missing lint and format, and format the entire repo | MohamedBassem | 2024-03-30 | 7 | -25/+37
* fix: Sort search results by relevance | MohamedBassem | 2024-03-30 | 1 | -0/+1
* feature(web): Add support for attaching notes to bookmarks | MohamedBassem | 2024-03-30 | 1 | -0/+1
* fix: Drop the 2k char limit on notes. Fixes #25 | MohamedBassem | 2024-03-27 | 1 | -6/+11
* fix: Attempt to increase the reliability of the ollama inference | MohamedBassem | 2024-03-27 | 2 | -16/+40
* feature: Add support for local models using ollama | MohamedBassem | 2024-03-27 | 4 | -76/+168
* refactor: Validate env variables using zod | MohamedBassem | 2024-03-27 | 2 | -12/+12
* docker: Use external chrome docker container | MohamedBassem | 2024-03-24 | 1 | -10/+40