* WIP: Initial restate integration
* add retry
* add delay + idempotency
* implement concurrency limits
* add admin stats
* add todos
* add id provider
* handle onComplete failures
* add tests
* add pub key and fix logging
* add priorities
* fail call after retries
* more fixes
* fix retries left
* some refactoring
* fix package.json
* upgrade sdk
* some test cleanups
* fix: Abort dangling processing when crawler is aborted
* comments
* report the size
* handle unhandled rejection
* drop promisify
* feat: Add tag search and use in the homepage
* use paginated query in the all tags view
* wire the load more buttons
* add skeleton to all tags page
* fix attachedby aggregation
* fix loading states
* fix hasNextPage
* use action buttons for load more buttons
* migrate the tags auto complete to the search api
* Migrate the tags editor to the new search API
* Replace tag merging dialog with tag auto completion
* Merge both search and list APIs
* fix tags.list
* add some tests for the endpoint
* add relevance based sorting
* change cursor
* update the REST API
* fix review comments
* more fixes
* fix lockfile
* i18n
* fix visible tags
* feat: Add cookie support for browser page access
Implement cookie support for browser page access, including a BROWSER_COOKIE_PATH setting that specifies the path to the cookies JSON file.
* fix the docs
---------
Co-authored-by: lizz <lizong1204@gmail.com>
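
BROWSER_COOKIE_PATH points at a JSON file of cookies. The sketch below shows one way such a file could be parsed and validated before use. It assumes the file holds a JSON array of objects with `name`, `value`, `domain` and an optional `path`; that schema, the `BrowserCookie` type and `parseCookies` are hypothetical, loosely modeled on the cookie objects Playwright's `BrowserContext.addCookies()` accepts, and may not match what the crawler actually reads.

```typescript
// Hypothetical cookie shape; loosely follows the fields Playwright's
// BrowserContext.addCookies() accepts (name, value, domain, path).
interface BrowserCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
}

// Parse and validate the contents of the (assumed) cookies JSON file.
function parseCookies(json: string): BrowserCookie[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("cookie file must contain a JSON array");
  }
  return data.map((entry: unknown, i: number): BrowserCookie => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`cookie #${i} is not an object`);
    }
    // path is optional in this assumed schema and defaults to "/".
    const { name, value, domain, path = "/" } = entry as Record<string, unknown>;
    if (
      typeof name !== "string" ||
      typeof value !== "string" ||
      typeof domain !== "string" ||
      typeof path !== "string"
    ) {
      throw new Error(`cookie #${i} needs string name, value and domain`);
    }
    return { name, value, domain, path };
  });
}

// Example input, as it might appear in the file BROWSER_COOKIE_PATH names.
const cookies = parseCookies(
  '[{"name":"session","value":"abc","domain":".example.com"}]',
);
console.log(cookies);
```

In a Playwright-based crawler the parsed list could then be passed to `context.addCookies(...)` before navigating; the commit does not say whether Karakeep does exactly this.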
* feat: add gif asset type support
* skip inference for gifs
---------
Co-authored-by: Mohamed Bassem <me@mbassem.com>
This reverts commit 4ba3e8047a5b1f160169617187436c09e91662ec.
Fixes #724
in worker
* chore: metascraper 5.x ships its own types, so @types/metascraper is now redundant; also update the metascraper libraries to their latest versions
* feat (workers): creating a local metascraper plugin for Reddit posts
In the past, the preview images for bookmarks from Reddit links were
poorly chosen. Reddit does not use Open Graph tags, so metascraper-images
simply looked for all images on the page and returned the first one. This
tended to be the profile picture of the user who posted the Reddit link.
This new plugin, built on the existing metascraper framework, selects a
better image for the bookmark when the URL domain is 'reddit'.
In addition, recent changes (I believe this was a side effect of adding
the metascraper-author and/or metascraper-publisher plugins, but it
could also be related to the metascraper-readability plugin) broke what
used to be a good choice of bookmark title. Previously, titles looked
like 'Tinyauth just reached 1000 stars! : r/selfhosted', with both the
thread title and the subreddit mentioned. After those changes, all
Reddit posts got the same title: 'The heart of the internet'.
To restore the better format, the new metascraper-reddit plugin also
attempts to retrieve the better title from Reddit URLs. Note that to
take precedence in title selection, 'metascraperReddit()' had to be
moved above metascraperReadability() in the metascraper instantiation
list in crawlerWorkers.ts.
* chore: updated Hoarder in text to Karakeep
* chore: update metascraper versions
fix for metascraper types has been merged; the expect-error comment can
be removed
* chore: merge with master
---------
Co-authored-by: Mohamed Bassem <me@mbassem.com>
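
The ordering note in this commit message can be illustrated with a small, dependency-free TypeScript model. It assumes, as the message implies, that the first plugin in the list to return a value for a property wins. The `Plugin` type, `resolve`, `redditPlugin`, `readabilityPlugin` and the regex-based rules are hypothetical stand-ins, not the real metascraper API or the actual metascraper-reddit implementation.

```typescript
// Toy model of metascraper-style rule precedence (hypothetical, not the real API).
type Context = { url: string; html: string };
type Rule = (ctx: Context) => string | undefined;
type Plugin = Record<string, Rule[]>;

// Walk plugins in list order; the first rule returning a non-empty value wins.
function resolve(plugins: Plugin[], property: string, ctx: Context): string | undefined {
  for (const plugin of plugins) {
    for (const rule of plugin[property] ?? []) {
      const value = rule(ctx);
      if (value) return value;
    }
  }
  return undefined;
}

// Matches reddit.com and its subdomains (www., old., ...).
function isReddit(url: string): boolean {
  const host = new URL(url).hostname;
  return host === "reddit.com" || host.endsWith(".reddit.com");
}

// Stand-in for a readability-style plugin: returns the first <h1>.
const readabilityPlugin: Plugin = {
  title: [({ html }) => html.match(/<h1>([^<]+)<\/h1>/)?.[1]],
};

// Stand-in for the Reddit plugin: only answers for Reddit URLs and reads
// <title>, which in the sample HTML below holds "thread : r/subreddit".
const redditPlugin: Plugin = {
  title: [
    ({ url, html }) =>
      isReddit(url) ? html.match(/<title>([^<]+)<\/title>/)?.[1] : undefined,
  ],
};

const ctx: Context = {
  url: "https://www.reddit.com/r/selfhosted/comments/abc/",
  html:
    "<title>Tinyauth just reached 1000 stars! : r/selfhosted</title>" +
    "<h1>The heart of the internet</h1>",
};

console.log(resolve([redditPlugin, readabilityPlugin], "title", ctx));
console.log(resolve([readabilityPlugin, redditPlugin], "title", ctx));
```

With the Reddit plugin listed first the thread title wins; swapping the order yields 'The heart of the internet', which mirrors the reason given for moving metascraperReddit() ahead of metascraperReadability().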
* feat: convert to playwright
Convert crawling to use Playwright instead of Puppeteer.
- Update Dockerfile to include Playwright
- Update crawler worker to use Playwright API
- Update dependencies
* feat: convert from Puppeteer to Playwright for crawling
* feat: update docker-compose
* use separate browser context for better isolation
* skip chrome download in linux script
* re-add the stealth plugin
---------
Co-authored-by: Mohamed Bassem <me@mbassem.com>
|
* chore: migrate away from eslint to oxlint
* revert turbo task name lint
* remove the globals that appear to be enabled by default