From 2f21ef21f069485550f118e65a45a374e868c02f Mon Sep 17 00:00:00 2001 From: MohamedBassem Date: Wed, 20 Mar 2024 03:09:14 +0000 Subject: docs: Add docs for installation, configuration and development --- .github/workflows/docker.yml | 2 +- README.md | 75 +---------- apps/landing/app/page.tsx | 6 +- docker/docker-compose.yml | 14 +- docs/docs/01-intro.md | 24 ++++ docs/docs/02-installation.md | 64 +++++++++ docs/docs/03-configuration.md | 14 ++ docs/docs/04-quick-sharing.md | 17 +++ docs/docs/05-openai.md | 11 ++ docs/docs/06-Development/01-setup.md | 68 ++++++++++ docs/docs/06-Development/02-directories.md | 28 ++++ docs/docs/06-Development/03-database.md | 11 ++ docs/docs/07-security-considerations.md | 14 ++ docs/docs/configuration.md | 0 docs/docs/development.md | 0 docs/docs/installation.md | 0 docs/docs/intro.md | 5 - docs/docusaurus.config.ts | 11 +- docs/static/img/docusaurus.png | Bin 5142 -> 0 bytes docs/static/img/favicon.ico | Bin 3626 -> 15406 bytes docs/static/img/logo.png | Bin 0 -> 2362 bytes docs/static/img/logo.svg | 1 - docs/static/img/quick-sharing/extension.png | Bin 0 -> 67074 bytes docs/static/img/quick-sharing/mobile.png | Bin 0 -> 921508 bytes docs/static/img/undraw_docusaurus_mountain.svg | 171 ------------------------- docs/static/img/undraw_docusaurus_react.svg | 170 ------------------------ docs/static/img/undraw_docusaurus_tree.svg | 40 ------ 27 files changed, 284 insertions(+), 462 deletions(-) create mode 100644 docs/docs/01-intro.md create mode 100644 docs/docs/02-installation.md create mode 100644 docs/docs/03-configuration.md create mode 100644 docs/docs/04-quick-sharing.md create mode 100644 docs/docs/05-openai.md create mode 100644 docs/docs/06-Development/01-setup.md create mode 100644 docs/docs/06-Development/02-directories.md create mode 100644 docs/docs/06-Development/03-database.md create mode 100644 docs/docs/07-security-considerations.md delete mode 100644 docs/docs/configuration.md delete mode 100644 docs/docs/development.md delete mode 100644 docs/docs/installation.md delete mode 100644 docs/docs/intro.md delete mode 100644 docs/static/img/docusaurus.png create mode 100644 docs/static/img/logo.png delete mode 100644 docs/static/img/logo.svg create mode 100644 docs/static/img/quick-sharing/extension.png create mode 100644 docs/static/img/quick-sharing/mobile.png delete mode 100644 docs/static/img/undraw_docusaurus_mountain.svg delete mode 100644 docs/static/img/undraw_docusaurus_react.svg delete mode 100644 docs/static/img/undraw_docusaurus_tree.svg diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index 27c41d7e..60ef68e6 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -47,6 +47,6 @@ jobs: file: docker/Dockerfile target: ${{ matrix.package }} push: true - tags: ghcr.io/mohamedbassem/hoarder-${{ matrix.package }}:${{github.event.release.name}} + tags: ghcr.io/mohamedbassem/hoarder-${{ matrix.package }}:${{github.event.release.name}},ghcr.io/mohamedbassem/hoarder-${{ matrix.package }}:release cache-from: type=gha cache-to: type=gha,mode=max diff --git a/README.md b/README.md index 2f4197c6..0d0a1d37 100644 --- a/README.md +++ b/README.md @@ -6,54 +6,24 @@ A self-hostable bookmark-everything app with a touch of AI for the data hoarders ## Features -- 🔗 Bookmark links and take simple notes. +- 🔗 Bookmark links, take simple notes and store images. - ⬇️ Automatic fetching for link titles, descriptions and images. - 📋 Sort your bookmarks into lists. - 🔎 Full text search of all the content stored. 
- ✨ AI-based (aka chatgpt) automatic tagging. - 🔖 [Chrome plugin](https://chromewebstore.google.com/detail/hoarder/kgcjekpmcjjogibpjebkhaanilehneje) for quick bookmarking. -- 📱 [iOS shortcut](https://www.icloud.com/shortcuts/78734b46624c4a3297187c85eb50d800) for bookmarking content from the phone. A minimal mobile app might come later. +- 📱 [iOS shortcut](https://www.icloud.com/shortcuts/78734b46624c4a3297187c85eb50d800) for bookmarking content from the phone. A minimal mobile app is in the works. - 💾 Self-hosting first. - [Planned] Archiving the content for offline reading. -- [Planned] Store raw images. **⚠️ This app is under heavy development and it's far from stable.** -## Installation +## Documentation -Docker is the recommended way for deploying the app. A docker compose file is provided. - -Run `docker compose up` then head to `http://localhost:3000` to access the app. - -> NOTE: You'll need to set the env variable `OPENAI_API_KEY` without your own openai key for automatic tagging to work. Check the next section for config details. - -## Configuration - -The app is configured with env variables. - -| Name | Default | Description | -| -------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| OPENAI_API_KEY | Not set | The OpenAI key used for automatic tagging. If not set, automatic tagging won't be enabled. The app currently uses `gpt-3.5-turbo-0125` which is [extremely cheap](https://openai.com/pricing). You'll be able to bookmark 1000+ for less than $1. | -| DATA_DIR | Not set | The path for the persistent data directory. | -| REDIS_HOST | localhost | The address of redis used by background jobs | -| REDIS_POST | 6379 | The port of redis used by background jobs | -| MEILI_ADDR | Not set | The address of meilisearch. If not set, Search will be disabled. | -| MEILI_MASTER_KEY | Not set | The master key configured for meili. Not needed in development. | - -## Security Considerations - -If you're going to give app access to untrusted users, there's some security considerations that you'll need to be aware of given how the crawler works. The crawler is basically running a browser to fetch the content of the bookmarks. Any untrusted user can submit bookmarks to be crawled from your server and they'll be able to see the crawling result. This can be abused in multiple ways: - -1. Untrused users can submit crawl requests to websites that you don't want to be coming out of your IPs. -2. Crawling user controlled websites can expose your origin IP (and location) even if your service is hosted behind cloudflare for example. -3. The crawling requests will be coming out from your own network, which untrusted users can leverage to crawl internal non-internet exposed endpoints. - -To mitigate those risks, you can do one of the following: - -1. Limit access to trusted users -2. Let the browser traffic go through some VPN with restricted network policies. -3. Host the browser container outside of your network. -4. Use a hosted browser as a service (e.g. [browserless](https://browserless.io)). Note: I've never used them before. 
+- [Installation](https://docs.hoarder.app/installation) +- [Configuration](https://docs.hoarder.app/configuration) +- [Security Considerations](https://docs.hoarder.app/security-considerations) +- [Development](https://docs.hoarder.app/Development/setup) ## Stack @@ -80,34 +50,3 @@ I'm a systems engineer in my day job (and have been for the past 7 years). I did - [memos](https://github.com/usememos/memos): I love memos. I have it running on my home server and it's one of my most used self-hosted apps. I, however, don't like the fact that it doesn't preview the content of the links I dump there and to be honest, it doesn't have to because that's not what it was designed for. It's just that I dump a lot of links there and I'd have loved if I'd be able to figure which link is that by just looking at my timeline. Also, given the variety of things I dump there, I'd have loved if it does some sort of automatic tagging for what I save there. This is exactly the usecase that I'm trying to tackle with Hoarder. - [Wallabag](https://wallabag.it): Wallabag is a well-established open source read-it-later app written in php and I think it's the common recommendation on reddit for such apps. To be honest, I didn't give it a real shot, and the UI just felt a bit dated for my liking. Honestly, it's probably much more stable and feature complete than this app, but where's the fun in that? - [Shiori](https://github.com/go-shiori/shiori): Shiori is meant to be an open source pocket clone written in Go. It ticks all the marks but doesn't have my super sophisticated AI-based tagging. (JK, I only found about it after I decided to build my own app, so here we are 🤷). - -## Development - -### Docker - -You can turnup the whole development environment with: -`docker compose -f docker/docker-compose.dev.yml up` - -### Manual - -Or if you have nodejs installed locally, you can do: - -- `pnpm install` in the root of the repo. -- `pnpm db:migrate` to run the db migrations. -- `pnpm web` to start the web app. - - Access it over `http://localhost:3000`. -- `pnpm workers` to start the crawler and the openai worker. - - You'll need to have redis running at `localhost:5379` (configurable with env variables). - - An easy way to get redis running is by using docker `docker run -p 5379:5379 redis`. - - You can run the web app without the workers, but link fetching and automatic tagging won't work. - -### Codebase structure - -- `packages/db`: Where drizzle's schema lives. Shared between packages. -- `packages/shared`: Shared utilities and code between the workers and the web app. -- `packages/web`: Where the nextjs based web app lives. -- `packages/workers`: Where the background job workers (crawler and openai as of now) run. - -### Submitting PRs - -- Before submitting PRs, you'll want to run `pnpm format` and include its changes in the commit. Also make sure `pnpm lint` is successful. diff --git a/apps/landing/app/page.tsx b/apps/landing/app/page.tsx index d87962bb..1c852b80 100644 --- a/apps/landing/app/page.tsx +++ b/apps/landing/app/page.tsx @@ -6,6 +6,7 @@ import screenshot from "@/public/screenshot.png"; import { ExternalLink, Github, PackageOpen } from "lucide-react"; const GITHUB_LINK = "https://github.com/MohamedBassem/hoarder-app"; +const DOCS_LINK = "https://docs.hoarder.app"; function NavBar() { return ( @@ -15,7 +16,10 @@ function NavBar() {

Hoarder

- + Docs +
+HOARDER_VERSION=release
+
+MEILI_ADDR=http://meilisearch:7700
+MEILI_MASTER_KEY=another_random_string
+```
+
+You can use `openssl rand -base64 36` to generate the random strings.
+
+Persistent storage and the wiring between the different services are already taken care of in the docker compose file.
+
+### 4. Set up OpenAI
+
+To enable automatic tagging, you'll need to configure OpenAI. This is optional, but highly recommended.
+
+- Follow [OpenAI's help](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key) to get an API key.
+- Add `OPENAI_API_KEY=<your key>` to the env file.
+
+Learn more about the costs of using OpenAI [here](/openai).
+
+### 5. Start the service
+
+Start the service by running:
+
+```
+$ docker compose up -d
+```
diff --git a/docs/docs/03-configuration.md b/docs/docs/03-configuration.md
new file mode 100644
index 00000000..a9c02611
--- /dev/null
+++ b/docs/docs/03-configuration.md
@@ -0,0 +1,14 @@
+# Configuration
+
+The app is mainly configured by environment variables. All the supported environment variables are listed in [packages/shared/config.ts](https://github.com/MohamedBassem/hoarder-app/blob/main/packages/shared/config.ts). The most important ones are:
+
+| Name             | Required                               | Default   | Description                                                                                                                         |
+| ---------------- | -------------------------------------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------- |
+| DATA_DIR         | Yes                                    | Not set   | The path for the persistent data directory. This is where the db and the uploaded assets live.                                       |
+| NEXTAUTH_SECRET  | Yes                                    | Not set   | Random string used to sign the JWT tokens. Generate one with `openssl rand -base64 36`.                                              |
+| NEXTAUTH_URL     | Yes                                    | Not set   | The URL the service will be running on, e.g. `https://demo.hoarder.app`.                                                             |
+| REDIS_HOST       | Yes                                    | localhost | The address of Redis, used by background jobs.                                                                                        |
+| REDIS_PORT       | Yes                                    | 6379      | The port of Redis, used by background jobs.                                                                                           |
+| OPENAI_API_KEY   | No                                     | Not set   | The OpenAI key used for automatic tagging. If not set, automatic tagging won't be enabled. More on that [here](/openai).              |
+| MEILI_ADDR       | No                                     | Not set   | The address of meilisearch, e.g. `http://meilisearch:7700`. If not set, search will be disabled.                                     |
+| MEILI_MASTER_KEY | Only in prod, and if search is enabled | Not set   | The master key configured for meilisearch. Not needed in the development environment. Generate one with `openssl rand -base64 36`.   |
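+
+To tie the table together, here's a hypothetical `.env` for a docker compose deployment. The hostnames assume the service names from the compose file, and the secrets are placeholders:
+
+```
+DATA_DIR=/data
+NEXTAUTH_SECRET=replace_with_openssl_rand_output
+NEXTAUTH_URL=https://demo.hoarder.app
+REDIS_HOST=redis
+REDIS_PORT=6379
+OPENAI_API_KEY=sk-replace-me
+MEILI_ADDR=http://meilisearch:7700
+MEILI_MASTER_KEY=replace_with_openssl_rand_output
+```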
diff --git a/docs/docs/04-quick-sharing.md b/docs/docs/04-quick-sharing.md
new file mode 100644
index 00000000..05ff5448
--- /dev/null
+++ b/docs/docs/04-quick-sharing.md
@@ -0,0 +1,17 @@
+# Quick Sharing Extensions
+
+The whole point of Hoarder is making it easy to hoard the content. That's why there are a couple of quick-sharing extensions.
+
+## Mobile Apps
+
+![mobile screenshot](/img/quick-sharing/mobile.png)
+
+- iOS app: TODO
+- Android App: The app is built using a cross-platform framework (React Native), so technically the Android app should just work, but I didn't test it. If there's enough demand, I'll publish it to the Google Play Store.
+
+## Chrome Extensions
+
+![extension screenshot](/img/quick-sharing/extension.png)
+
+- To quickly bookmark links, you can also use the Chrome extension [here](https://chromewebstore.google.com/detail/hoarder/kgcjekpmcjjogibpjebkhaanilehneje).
diff --git a/docs/docs/05-openai.md b/docs/docs/05-openai.md
new file mode 100644
index 00000000..91e37c07
--- /dev/null
+++ b/docs/docs/05-openai.md
@@ -0,0 +1,11 @@
+# OpenAI Costs
+
+This service uses OpenAI for automatic tagging, which means you'll incur some costs if automatic tagging is enabled. There are two types of inference that we do:
+
+## Text Tagging
+
+For text tagging, we use the `gpt-3.5-turbo-0125` model. This model is [extremely cheap](https://openai.com/pricing). The cost per inference varies with the size of each article, but roughly, you'll be able to generate tags for 1000+ bookmarks for less than $1.
+
+## Image Tagging
+
+For image uploads, we use the `gpt-4-vision-preview` model to extract tags from the image. You can learn more about the costs of using this model [here](https://platform.openai.com/docs/guides/vision/calculating-costs). To lower the costs, we're using the low resolution mode (a fixed number of tokens regardless of image size). The gpt-4 model, however, is much more expensive than `gpt-3.5-turbo`. Currently, we're using around 350 tokens per image inference, which ends up costing around $0.01 per inference, so around 10x more expensive than text tagging.
diff --git a/docs/docs/06-Development/01-setup.md b/docs/docs/06-Development/01-setup.md
new file mode 100644
index 00000000..775a5806
--- /dev/null
+++ b/docs/docs/06-Development/01-setup.md
@@ -0,0 +1,68 @@
+# Setup
+
+## Manual Setup
+
+### First Setup
+
+- You'll need to prepare the environment variables for the dev environment.
+- The easiest approach is to set them up once in the root of the repo and then symlink the file in each app directory.
+- Start by copying the template: `cp .env.sample .env`.
+- The most important env variables to set are:
+  - `DATA_DIR`: Where the database and assets will be stored. This is the only required env variable. You can use an absolute path so that all apps point to the same dir.
+  - `REDIS_HOST` and `REDIS_PORT`: Default to `localhost` and `6379`. Change them if Redis is running on a different address.
+  - `MEILI_ADDR`: If not set, search will be disabled. You can set it to `http://127.0.0.1:7700` if you run meilisearch using the command below.
+  - `OPENAI_API_KEY`: If you want to enable auto tag inference in the dev env.
+
+### Dependencies
+
+#### Redis
+
+Redis is used as the background job queue. The easiest way to get it running is with docker: `docker run -p 6379:6379 redis:alpine`.
+
+#### Meilisearch
+
+Meilisearch is the provider for the full text search. You can get it running with `docker run -p 7700:7700 getmeili/meilisearch:v1.6`.
+
+Mount a persistent volume if you want to keep the index data across restarts. You can trigger a re-index for the entire items collection from the admin panel in the web app.
+
+#### Chrome
+
+The worker app will automatically start headless chrome on startup for crawling pages. You don't need to do anything there.
+
+### Web App
+
+- Run `pnpm web` in the root of the repo.
+- Go to `http://localhost:3000`.
+
+> NOTE: The web app mostly works without any dependencies. However, search won't work unless meilisearch is running, and newly added items won't get crawled/indexed unless redis is running.
+
+### Workers
+
+- Run `pnpm workers` in the root of the repo.
+
+> NOTE: The workers package requires redis to be running, as it's the queue provider.
+
+### iOS Mobile App
+
+- `cd apps/mobile`
+- `pnpm exec expo prebuild --no-install` to build the app.
+- Start the iOS simulator.
+- `pnpm exec expo run:ios`
+- The app will be installed and started in the simulator.
+
+Changing the code will hot reload the app. However, installing new packages requires restarting the expo server.
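+
+For example, after adding a hypothetical new dependency, one way to pick it up is a full rebuild (the package name below is illustrative, not one the app actually uses):
+
+```
+$ cd apps/mobile
+$ pnpm add expo-image      # hypothetical new native dependency
+$ pnpm exec expo run:ios   # rebuild instead of relying on hot reload
+```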
+
+### Browser Extension
+
+- `cd apps/browser-extension`
+- `pnpm dev`
+- This will generate a `dist` directory.
+- Go to the extension settings in chrome and enable developer mode.
+- Press `Load unpacked` and point it to the `dist` directory.
+- The extension will pop up in the extensions list.
+
+In dev mode, opening and closing the extension menu should reload the code.
+
+## Docker Dev Env
+
+If the manual setup is too much hassle for you, you can use a docker-based dev environment by running `docker compose -f docker/docker-compose.dev.yml up` in the root of the repo. This setup wasn't super reliable for me though.
diff --git a/docs/docs/06-Development/02-directories.md b/docs/docs/06-Development/02-directories.md
new file mode 100644
index 00000000..54552402
--- /dev/null
+++ b/docs/docs/06-Development/02-directories.md
@@ -0,0 +1,28 @@
+# Directory Structure
+
+## Apps
+
+| Directory                | Description                                            |
+| ------------------------ | ------------------------------------------------------ |
+| `apps/web`               | The main web app                                       |
+| `apps/workers`           | The background workers logic                           |
+| `apps/mobile`            | The react native based mobile app                      |
+| `apps/browser-extension` | The browser extension                                  |
+| `apps/landing`           | The landing page of [hoarder.app](https://hoarder.app) |
+
+## Shared Packages
+
+| Directory         | Description                                                                   |
+| ----------------- | ----------------------------------------------------------------------------- |
+| `packages/db`     | The database schema and migrations                                            |
+| `packages/trpc`   | Where most of the business logic lives, built as tRPC routes                  |
+| `packages/shared` | Some shared code between the different apps (e.g. loggers, configs, assetdb)  |
+
+## Tooling
+
+| Directory            | Description             |
+| -------------------- | ----------------------- |
+| `tooling/typescript` | The shared tsconfigs    |
+| `tooling/eslint`     | ESLint configs          |
+| `tooling/prettier`   | Prettier configs        |
+| `tooling/tailwind`   | Shared tailwind configs |
diff --git a/docs/docs/06-Development/03-database.md b/docs/docs/06-Development/03-database.md
new file mode 100644
index 00000000..40e2d164
--- /dev/null
+++ b/docs/docs/06-Development/03-database.md
@@ -0,0 +1,11 @@
+# Database Migrations
+
+- The database schema lives in `packages/db/schema.ts`.
+- Changing the schema requires a migration.
+- You can generate the migration by running `pnpm drizzle-kit generate:sqlite` in the `packages/db` dir.
+- You can then apply the migration by running `pnpm run migrate`.
+
+## Drizzle Studio
+
+You can start the drizzle studio by running `pnpm db:studio` in the root of the repo.
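+
+As a sketch of that flow, adding a hypothetical column to an illustrative table in `schema.ts` (not the real schema) would look like this, followed by the generate and migrate commands above:
+
+```ts
+import { integer, sqliteTable, text } from "drizzle-orm/sqlite-core";
+
+export const bookmarks = sqliteTable("bookmarks", {
+  id: text("id").primaryKey(),
+  createdAt: integer("createdAt", { mode: "timestamp" }).notNull(),
+  // Hypothetical new column; adding it is what makes a migration necessary.
+  archived: integer("archived", { mode: "boolean" }).notNull().default(false),
+});
+```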
diff --git a/docs/docs/07-security-considerations.md b/docs/docs/07-security-considerations.md
new file mode 100644
index 00000000..7cab2e07
--- /dev/null
+++ b/docs/docs/07-security-considerations.md
@@ -0,0 +1,14 @@
+# Security Considerations
+
+If you're going to give untrusted users access to the app, there are some security considerations you'll need to be aware of, given how the crawler works. The crawler is basically running a browser to fetch the content of the bookmarks. Any untrusted user can submit bookmarks to be crawled from your server, and they'll be able to see the crawling result. This can be abused in multiple ways:
+
+1. Untrusted users can submit crawl requests to websites that you don't want traffic from your IPs going to.
+2. Crawling user-controlled websites can expose your origin IP (and location) even if your service is hosted behind cloudflare, for example.
+3. The crawling requests come out of your own network, which untrusted users can leverage to crawl internal, non-internet-exposed endpoints.
+
+To mitigate those risks, you can do one of the following:
+
+1. Limit access to trusted users.
+2. Let the browser traffic go through some VPN with restricted network policies.
+3. Host the browser container outside of your network.
+4. Use a hosted browser as a service (e.g. [browserless](https://browserless.io)). Note: I've never used them before.
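+
+As one illustrative direction for isolating the browser (a sketch only; the image, service, and network names are examples and not part of Hoarder's actual compose file), you could run a headless browser as its own service on a dedicated docker network so it doesn't share a network with your other internal services:
+
+```
+networks:
+  crawler-net: {}
+
+services:
+  chrome:
+    # Example headless chrome image exposing the remote debugging port.
+    image: gcr.io/zenika-hub/alpine-chrome:123
+    command:
+      - --no-sandbox
+      - --disable-gpu
+      - --remote-debugging-address=0.0.0.0
+      - --remote-debugging-port=9222
+    networks:
+      - crawler-net
+```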
diff --git a/docs/docs/configuration.md b/docs/docs/configuration.md
deleted file mode 100644
index e69de29b..00000000
diff --git a/docs/docs/development.md b/docs/docs/development.md
deleted file mode 100644
index e69de29b..00000000
diff --git a/docs/docs/installation.md b/docs/docs/installation.md
deleted file mode 100644
index e69de29b..00000000
diff --git a/docs/docs/intro.md b/docs/docs/intro.md
deleted file mode 100644
index f0e332e5..00000000
--- a/docs/docs/intro.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-slug: /
-sidebar_position: 1
----
-
diff --git a/docs/docusaurus.config.ts b/docs/docusaurus.config.ts
index 0c5c59dd..b1414661 100644
--- a/docs/docusaurus.config.ts
+++ b/docs/docusaurus.config.ts
@@ -51,12 +51,17 @@ const config: Config = {
     // Replace with your project's social card
     image: 'img/docusaurus-social-card.jpg',
     navbar: {
-      title: 'Hoarder Docs',
+      title: 'Hoarder',
       logo: {
-        alt: 'My Site Logo',
-        src: 'img/logo.svg',
+        alt: 'Hoarder Logo',
+        src: 'img/logo.png',
       },
       items: [
+        {
+          href: 'https://hoarder.app',
+          label: 'Homepage',
+          position: 'right',
+        },
         {
           href: 'https://github.com/MohamedBassem/hoarder-app',
           label: 'GitHub',
diff --git a/docs/static/img/docusaurus.png b/docs/static/img/docusaurus.png
deleted file mode 100644
index f458149e..00000000
Binary files a/docs/static/img/docusaurus.png and /dev/null differ
diff --git a/docs/static/img/favicon.ico b/docs/static/img/favicon.ico
index c01d54bc..750e3c04 100644
Binary files a/docs/static/img/favicon.ico and b/docs/static/img/favicon.ico differ
diff --git a/docs/static/img/logo.png b/docs/static/img/logo.png
new file mode 100644
index 00000000..71ead90c
Binary files /dev/null and b/docs/static/img/logo.png differ
diff --git a/docs/static/img/logo.svg b/docs/static/img/logo.svg
deleted file mode 100644
index 9db6d0d0..00000000
--- a/docs/static/img/logo.svg
+++ /dev/null
@@ -1 +0,0 @@
-[SVG logo markup not preserved in this extract]
\ No newline at end of file
diff --git a/docs/static/img/quick-sharing/extension.png b/docs/static/img/quick-sharing/extension.png
new file mode 100644
index 00000000..4b273998
Binary files /dev/null and b/docs/static/img/quick-sharing/extension.png differ
diff --git a/docs/static/img/quick-sharing/mobile.png b/docs/static/img/quick-sharing/mobile.png
new file mode 100644
index 00000000..b1617a47
Binary files /dev/null and b/docs/static/img/quick-sharing/mobile.png differ
diff --git a/docs/static/img/undraw_docusaurus_mountain.svg b/docs/static/img/undraw_docusaurus_mountain.svg
deleted file mode 100644
index af961c49..00000000
--- a/docs/static/img/undraw_docusaurus_mountain.svg
+++ /dev/null
@@ -1,171 +0,0 @@
-[171 deleted lines of the "Easy to Use" SVG illustration; markup not preserved in this extract]
diff --git a/docs/static/img/undraw_docusaurus_react.svg b/docs/static/img/undraw_docusaurus_react.svg
deleted file mode 100644
index 94b5cf08..00000000
--- a/docs/static/img/undraw_docusaurus_react.svg
+++ /dev/null
@@ -1,170 +0,0 @@
-[170 deleted lines of the "Powered by React" SVG illustration; markup not preserved in this extract]
diff --git a/docs/static/img/undraw_docusaurus_tree.svg b/docs/static/img/undraw_docusaurus_tree.svg
deleted file mode 100644
index d9161d33..00000000
--- a/docs/static/img/undraw_docusaurus_tree.svg
+++ /dev/null
@@ -1,40 +0,0 @@
-[40 deleted lines of the "Focus on What Matters" SVG illustration; markup not preserved in this extract]
-- 
cgit v1.2.3-70-g09d2