5 Commits

Author SHA1 Message Date
Mohamed Bassem
ddd4b578cd fix: preserve failure count when rescheduling rate limited domains (#2303)
* fix: preserve retry count when rate-limited jobs are rescheduled

Previously, when a domain was rate-limited in the crawler worker,
the job would be re-enqueued as a new job, which reset the failure
count. This meant rate-limited jobs could retry indefinitely without
respecting the max retry limit.

This commit introduces a RateLimitRetryError exception that signals
the queue system to retry the job after a delay without counting it
as a failed attempt. The job is retried within the same invocation,
preserving the original retry count.

Changes:
- Add RateLimitRetryError class to shared/queueing.ts
- Update crawler worker to throw RateLimitRetryError instead of re-enqueuing
- Update Restate queue service to handle RateLimitRetryError with delay
- Update Liteque queue wrapper to handle RateLimitRetryError with delay

This ensures that rate-limited jobs respect the configured retry limits
while still allowing for delayed retries when domains are rate-limited.
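
As an illustration of the change described above, a minimal sketch of such an error class and a worker that throws it. The `delayMs` field, the `RateLimitResult` shape, and `crawlOrDefer` are hypothetical names chosen for this example; the actual definitions in shared/queueing.ts and the crawler worker may differ.

```typescript
// Hypothetical sketch: signals "retry later" rather than "this attempt failed".
class RateLimitRetryError extends Error {
  readonly delayMs: number;

  constructor(delayMs: number) {
    super(`Rate limited, retry after ${delayMs}ms`);
    this.name = "RateLimitRetryError";
    this.delayMs = delayMs;
    // Keeps instanceof working when compiling to older JS targets.
    Object.setPrototypeOf(this, RateLimitRetryError.prototype);
  }
}

// Hypothetical result of a per-domain rate-limit check.
interface RateLimitResult {
  allowed: boolean;
  resetInMs: number;
}

// Instead of re-enqueuing a fresh job (which would reset the failure count),
// the worker throws and lets the queue reschedule the same job.
function crawlOrDefer(url: string, limit: RateLimitResult): string {
  if (!limit.allowed) {
    throw new RateLimitRetryError(limit.resetInMs);
  }
  return `crawled ${url}`;
}
```

Throwing keeps the decision about when to retry in the queue layer, so the worker does not need to know which queue backend is in use.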

* refactor: use liteque's native RetryAfterError for rate limiting

Instead of manually handling retries in a while loop, translate
RateLimitRetryError to liteque's native RetryAfterError. This is
cleaner and lets liteque handle the retry logic using its built-in
mechanism.
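
The translation step might look roughly like the sketch below. `NativeRetryAfterError` is a local stand-in defined only for this example: liteque's real RetryAfterError and its constructor signature are not shown in this log, so nothing here should be read as liteque's API. The wrapper is synchronous for brevity, whereas real runner functions are presumably async.

```typescript
// Application-level error (hypothetical shape, as in the first commit).
class RateLimitRetryError extends Error {
  readonly delayMs: number;
  constructor(delayMs: number) {
    super(`Rate limited, retry after ${delayMs}ms`);
    this.name = "RateLimitRetryError";
    this.delayMs = delayMs;
    Object.setPrototypeOf(this, RateLimitRetryError.prototype);
  }
}

// Local stand-in for the queue library's native "retry after" error.
// ASSUMPTION: the real class and its signature may look different.
class NativeRetryAfterError extends Error {
  readonly delayMs: number;
  constructor(delayMs: number) {
    super(`Retry after ${delayMs}ms`);
    this.name = "NativeRetryAfterError";
    this.delayMs = delayMs;
    Object.setPrototypeOf(this, NativeRetryAfterError.prototype);
  }
}

// Wraps a runner so that rate-limit errors are translated once at the
// boundary; every other error passes through unchanged.
function wrapRunner<T>(run: () => T): () => T {
  return () => {
    try {
      return run();
    } catch (e) {
      if (e instanceof RateLimitRetryError) {
        throw new NativeRetryAfterError(e.delayMs);
      }
      throw e;
    }
  };
}
```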

* test: add tests for RateLimitRetryError handling in restate queue

Added comprehensive tests to verify that:
1. RateLimitRetryError delays retry appropriately
2. Rate-limited retries don't count against the retry limit
3. Jobs can be rate-limited more times than the retry limit
4. Regular errors still respect the retry limit

These tests ensure the queue correctly handles rate limiting
without exhausting retry attempts.
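
The behaviour these tests check can be sketched with a small synchronous retry loop. `runWithRetries` and the `Outcome` shape are hypothetical and exist only for this example; the loop records delays instead of sleeping, and it is not the Restate queue implementation.

```typescript
class RateLimitRetryError extends Error {
  readonly delayMs: number;
  constructor(delayMs: number) {
    super(`Rate limited, retry after ${delayMs}ms`);
    this.name = "RateLimitRetryError";
    this.delayMs = delayMs;
    Object.setPrototypeOf(this, RateLimitRetryError.prototype);
  }
}

interface Outcome {
  status: "completed" | "failed";
  failedAttempts: number; // real failures, counted against the limit
  rateLimitedRuns: number; // deferred runs, not counted against the limit
  totalDelayMs: number; // delay the queue would have waited in total
}

// maxRetries is the number of retries allowed after the first real failure.
function runWithRetries(job: () => void, maxRetries: number): Outcome {
  let failedAttempts = 0;
  let rateLimitedRuns = 0;
  let totalDelayMs = 0;

  while (true) {
    try {
      job();
      return { status: "completed", failedAttempts, rateLimitedRuns, totalDelayMs };
    } catch (e) {
      if (e instanceof RateLimitRetryError) {
        // Deferred, not failed: the failure count is left untouched.
        rateLimitedRuns += 1;
        totalDelayMs += e.delayMs;
        continue;
      }
      failedAttempts += 1;
      if (failedAttempts > maxRetries) {
        return { status: "failed", failedAttempts, rateLimitedRuns, totalDelayMs };
      }
    }
  }
}
```

With `maxRetries` set to 2, a job that is rate-limited five times and then succeeds still completes, while a job that always throws a regular error fails after three attempts.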

* lint & format

* fix: prevent onError callback for RateLimitRetryError

Fixed two issues with RateLimitRetryError handling in restate queue:

1. RateLimitRetryError no longer triggers the onError callback, since
   it is not a real error but expected rate-limiting behavior

2. Check for RateLimitRetryError in runWorkerLogic before calling onError,
   ensuring the instanceof check works correctly before the error gets
   further wrapped by restate

Updated tests to verify onError is not called for rate limit retries.

* fix: catch RateLimitRetryError before ctx.run wraps it

Changed approach to return a discriminated union instead of letting
RateLimitRetryError propagate out of ctx.run. The error is now caught inside
the ctx.run callback, before restate wraps it in a TerminalError, and the
callback returns a RunResult type that indicates success, rate limit, or error.

This fixes the issue where instanceof checks would fail because
ctx.run wraps all errors in TerminalError.
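
A minimal sketch of the discriminated-union idea, independent of the Restate SDK. Only the `RunResult` name and its three outcomes come from the commit message; the field names, `runGuarded`, and `summarize` are assumptions made for illustration. In the real code the guarded function would be the callback passed to ctx.run, which is not called here.

```typescript
class RateLimitRetryError extends Error {
  readonly delayMs: number;
  constructor(delayMs: number) {
    super(`Rate limited, retry after ${delayMs}ms`);
    this.name = "RateLimitRetryError";
    this.delayMs = delayMs;
    Object.setPrototypeOf(this, RateLimitRetryError.prototype);
  }
}

// The outcome is returned as plain data, so no instanceof check is needed
// after the value has crossed a boundary that wraps thrown errors.
type RunResult<T> =
  | { type: "success"; value: T }
  | { type: "rate_limit"; delayMs: number }
  | { type: "error"; message: string };

// Catches inside the callback, while the original error class is still intact.
function runGuarded<T>(fn: () => T): RunResult<T> {
  try {
    return { type: "success", value: fn() };
  } catch (e) {
    if (e instanceof RateLimitRetryError) {
      return { type: "rate_limit", delayMs: e.delayMs };
    }
    return { type: "error", message: e instanceof Error ? e.message : String(e) };
  }
}

// The caller branches on the tag instead of on the error class.
function summarize<T>(result: RunResult<T>): string {
  switch (result.type) {
    case "success":
      return "done";
    case "rate_limit":
      return `retry in ${result.delayMs}ms without counting a failure`;
    case "error":
      return `failed: ${result.message}`;
  }
}
```

Returning a tagged value rather than throwing means the rate-limit signal survives even if the surrounding framework replaces or wraps thrown errors.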

* more fixes

* rename error name

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-25 12:46:45 +00:00
Mohamed Bassem
5426875949 feat: Introduce groupId in restate queue (#2168)
* feat: Introduce groupId in restate queue

* add group ids to the interface

* use last served timestamp
2025-11-24 01:23:06 +00:00
Mohamed Bassem
b28cd03a4a refactor: Allow runner functions to return results to onComplete
2025-11-09 20:13:39 +00:00
Mohamed Bassem
74a1f7b6b6 feat: Restate-based queue plugin (#2011)
* WIP: Initial restate integration

* add retry

* add delay + idempotency

* implement concurrency limits

* add admin stats

* add todos

* add id provider

* handle onComplete failures

* add tests

* add pub key and fix logging

* add priorities

* fail call after retries

* more fixes

* fix retries left

* some refactoring

* fix package.json

* upgrade sdk

* some test cleanups
2025-10-05 07:04:29 +01:00
Mohamed Bassem
8d32055485 refactor: Move callsites to liteque to be behind a plugin
2025-09-14 18:16:57 +00:00