path: root/packages/shared/prompts.ts
Commit message | Author | Age | Files | Lines
* feat(ai): Support restricting AI tags to a subset of existing tags (#2444) | Mohamed Bassem | 2026-02-09 | 1 | -1/+9
    * feat(ai): Support restricting AI tags to a subset of existing tags

    Co-authored-by: Claude <noreply@anthropic.com>
* feat: Add LLM-based OCR as alternative to Tesseract (#2442) | Mohamed Bassem | 2026-02-01 | 1 | -0/+16
    * feat(ocr): add LLM-based OCR support alongside Tesseract

    Add support for using configured LLM inference providers (OpenAI or Ollama) for OCR text extraction from images as an alternative to Tesseract.

    Changes:
    - Add OCR_USE_LLM environment variable flag (default: false)
    - Add buildOCRPrompt function for LLM-based text extraction
    - Add readImageTextWithLLM function in asset preprocessing worker
    - Update extractAndSaveImageText to route between Tesseract and LLM OCR
    - Update documentation with the new configuration option

    When OCR_USE_LLM is enabled, the system uses the configured inference model to extract text from images. If no inference provider is configured, it falls back to Tesseract.

    https://claude.ai/code/session_01Y7h7kDAmqXKXEWDmWbVkDs

    * format

    ---------

    Co-authored-by: Claude <noreply@anthropic.com>
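
The routing rule described in the commit above (use the LLM only when the flag is on and an inference provider is configured, otherwise fall back to Tesseract) could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `OcrBackend` and `chooseOcrBackend` are hypothetical names, and the two booleans stand in for however the real worker reads `OCR_USE_LLM` and detects a provider.

```typescript
// Hypothetical sketch of the routing rule from the commit message.
// The type and function names are illustrative assumptions.
type OcrBackend = "llm" | "tesseract";

function chooseOcrBackend(
  ocrUseLlm: boolean,
  inferenceConfigured: boolean,
): OcrBackend {
  // The LLM path is taken only when the flag is enabled AND a provider exists;
  // every other combination falls back to Tesseract.
  return ocrUseLlm && inferenceConfigured ? "llm" : "tesseract";
}
```

Keeping the decision in a pure function like this makes the fallback easy to test without touching either OCR backend.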
* fix(web): don't bundle tiktoken in client bundles | Mohamed Bassem | 2026-02-01 | 1 | -80/+2
* fix: more tagging tweaks | Mohamed Bassem | 2025-12-29 | 1 | -4/+3
* fix: change prompt to better recognize error pages | Mohamed Bassem | 2025-12-29 | 1 | -3/+6
* feat: add customizable tag styles (#2312) | Mohamed Bassem | 2025-12-27 | 1 | -5/+32
    * feat: add customizable tag styles
    * add tag lang setting
    * ui settings cleanup
    * fix migration
    * change look of the field
    * more fixes
    * fix tests
* fix: lazy load js-tiktoken in prompts module (#2176) | Mohamed Bassem | 2025-11-28 | 1 | -26/+100
    * feat: lazy load tiktoken to reduce memory footprint

    The js-tiktoken module loads a large encoding dictionary into memory immediately on import. This change defers the loading of the encoding until it's actually needed by using a lazy getter pattern. This reduces memory usage for processes that import this module but don't actually use the token encoding functions.

    * fix: use createRequire for lazy tiktoken import in ES module

    The previous implementation used bare require() which fails at runtime in ES modules (ReferenceError: require is not defined). This fixes it by using createRequire from Node's 'module' package, which creates a require function that works in ES module contexts.

    * refactor: convert tiktoken lazy loading to async dynamic imports

    Changed from createRequire to async import() for lazy loading tiktoken, making buildTextPrompt and buildSummaryPrompt async. This is cleaner for ES modules and properly defers the large tiktoken encoding data until it's actually needed.

    Updated all callers to await these async functions:
    - packages/trpc/routers/bookmarks.ts
    - apps/workers/workers/inference/tagging.ts
    - apps/workers/workers/inference/summarize.ts
    - apps/web/components/settings/AISettings.tsx (converted to useEffect)

    * feat: add untruncated prompt builders for UI previews

    Added buildTextPromptUntruncated and buildSummaryPromptUntruncated functions that don't require token counting or truncation. These are synchronous and don't load tiktoken, making them perfect for UI previews where exact token limits aren't needed.

    Updated AISettings.tsx to use these untruncated versions, eliminating the need for useEffect/useState and avoiding unnecessary tiktoken loading in the browser.

    * fix
    * fix

    ---------

    Co-authored-by: Claude <noreply@anthropic.com>
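
The deferred-loading idea in the commit above can be illustrated with a small generic helper. This is a sketch under assumptions, not the code from `prompts.ts`: `lazyAsync`, `Encoder`, `getEncoder`, and `countTokens` are hypothetical names, and the placeholder loader stands in for where a dynamic `import()` of the tokenizer module would go.

```typescript
// Generic lazy async loader: `load` runs at most once, on first use,
// and every later call reuses the same promise.
function lazyAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// Stand-in for an expensive module. In real code the loader body would be a
// dynamic import() of the tokenizer instead of this placeholder.
interface Encoder {
  encode(text: string): number[];
}

const getEncoder = lazyAsync<Encoder>(async () => ({
  // Placeholder tokenizer: one "token" per whitespace-separated word.
  encode: (text: string): number[] =>
    text
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .map((word) => word.length),
}));

// Callers become async because the encoder may not be loaded yet.
async function countTokens(text: string): Promise<number> {
  const encoder = await getEncoder();
  return encoder.encode(text).length;
}
```

Processes that never call `countTokens` never pay for the loader. The cost is that every caller must `await`, which matches the commit's note that the prompt builders became async.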
* fix(inferance): skip token slicing when content is already witin max length | Mohamed Bassem | 2025-10-26 | 1 | -0/+3
* fix: Correct grammatical errors in prompts (#2020) | atsggx | 2025-10-11 | 1 | -2/+2
    Corrected "who's" to "whose" in buildImagePrompt and buildTextPrompt.
* fix: minor changes to the tagging prompts (#1474) | Olicorne | 2025-06-22 | 1 | -10/+11
    * feat: add optional `thinking` key to tagging response schema
    * prompt: fix indent
    * prompt: remove extra 'language' word
    * prompt: use xml as separator
    * revert: dont use a thinking tags

    Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>

    * prompt: don't ask to include website tags
    * prompt: aim for 5 tags
    * prompt: dont tell bot its a bot
    * prompt: propose a tag_error
    * Revert "prompt: propose a tag_error"

    This reverts commit 78c5099a187960cc3697b77f2b2bd687edb015f3.

    * minor prompt tweaks
    * minor prompt tweaks take 2

    ---------

    Signed-off-by: thiswillbeyourgithub
    Co-authored-by: Mohamed Bassem <me@mbassem.com>
* fix: Collapse long runs of repeated whitespaces before tokenization to avoid choking the tokenizer. Fixes #1622 | Mohamed Bassem | 2025-06-21 | 1 | -0/+9
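
As a rough illustration of the preprocessing step named in the commit above, long runs of one repeated whitespace character can be collapsed with a single regular expression before the text reaches the tokenizer. The helper name and the three-character threshold are assumptions for this sketch; the actual rule in `prompts.ts` may differ.

```typescript
// Hypothetical helper: the name and the 3-character threshold are assumptions.
function collapseWhitespaceRuns(text: string): string {
  // (\s) captures one whitespace character; \1{2,} requires at least two more
  // copies of that same character, so runs of three or more collapse to one.
  return text.replace(/(\s)\1{2,}/g, "$1");
}
```

Shorter runs, such as a double space or a single blank line, are left untouched, so ordinary formatting survives while pathological padding is removed.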
* fix: Use proper tokenizer when truncating for inference. Fixes #1405 | Mohamed Bassem | 2025-05-18 | 1 | -8/+8
* feat: Support customizing the summarization prompt. Fixes #731 | Mohamed Bassem | 2025-01-12 | 1 | -1/+5
* fix: Instruct the model to only respond with the summary when summarizing content | Mohamed Bassem | 2024-11-24 | 1 | -1/+1
* feature: Add a summarize with AI button for links | Mohamed Bassem | 2024-10-27 | 1 | -0/+14
* feature: Allow customizing the inference's context length | MohamedBassem | 2024-10-12 | 1 | -2/+21
* feature(web): Add the ability to customize the inference prompts. Fixes #170 | MohamedBassem | 2024-09-29 | 1 | -0/+33