path: root/apps/web
author: Mohamed Bassem <me@mbassem.com> 2025-11-09 09:02:28 +0000
committer: GitHub <noreply@github.com> 2025-11-09 09:02:28 +0000
commit: 1b8129a28191c7093818060e39e968fc16bf24b4 (patch)
tree: 4e05ffad503a7ddf0391883169917a79079894f4 /apps/web
parent: d9ef832e0b4fb04909a848ae948e22a01613c3b7 (diff)
feat: add failed_permanent metric for worker monitoring (#2107)

* feat: add last failure timestamp metric for worker monitoring

  Add a Prometheus Gauge metric to track the timestamp of the last failure
  for each worker. This complements the existing failed job counter by
  providing visibility into when failures last occurred for monitoring and
  alerting purposes.

  Changes:
  - Added workerLastFailureGauge metric in metrics.ts
  - Updated all 9 workers to set the gauge on failure:
    - crawler, feed, webhook, assetPreProcessing
    - inference, adminMaintenance, ruleEngine
    - video, search

* refactor: track both all failures and permanent failures with counter

  Remove the gauge metric and use the existing counter to track both:
  - All failures (including retry attempts): status="failed"
  - Permanent failures (retries exhausted): status="failed_permanent"

  This provides better visibility into retry behavior and permanent vs
  temporary failures without adding a separate metric.

  Changes:
  - Removed workerLastFailureGauge from metrics.ts
  - Updated all 9 workers to track failed_permanent when numRetriesLeft == 0
  - Maintained existing failed counter for all failure attempts

* style: format worker files with prettier

---------

Co-authored-by: Claude <noreply@anthropic.com>
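
The counting rule described in the refactor step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: `LabeledCounter` is a hypothetical in-memory stand-in for a labeled Prometheus counter, and `workerStatsCounter`, `recordFailure`, and the `worker` label name are invented for the example. Only the `status` values `failed` and `failed_permanent` and the `numRetriesLeft == 0` condition come from the message above.

```typescript
// Hypothetical in-memory stand-in for a labeled Prometheus counter.
// The real workers presumably use a metrics library instead.
class LabeledCounter {
  private readonly counts = new Map<string, number>();

  // Increment the series identified by the (worker, status) label pair.
  inc(labels: { worker: string; status: string }): void {
    const key = `${labels.worker}|${labels.status}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Read the current value of a series; unseen series read as 0.
  get(worker: string, status: string): number {
    return this.counts.get(`${worker}|${status}`) ?? 0;
  }
}

// Assumed name for the shared counter.
const workerStatsCounter = new LabeledCounter();

// Hypothetical failure hook: every failed attempt counts as "failed";
// the attempt that exhausts the retries also counts as "failed_permanent".
function recordFailure(worker: string, numRetriesLeft: number): void {
  workerStatsCounter.inc({ worker, status: "failed" });
  if (numRetriesLeft === 0) {
    workerStatsCounter.inc({ worker, status: "failed_permanent" });
  }
}

// A crawler job that fails three times, with 2, 1, then 0 retries left.
for (const retriesLeft of [2, 1, 0]) {
  recordFailure("crawler", retriesLeft);
}
```

With this shape, the gap between the `failed` and `failed_permanent` series for a worker reflects failures that were followed by a retry, which is the visibility into retry behavior that the message describes.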
Diffstat (limited to 'apps/web')
0 files changed, 0 insertions, 0 deletions