
Commit 7bb606a

Cap retry seconds after applying jitter in case jitter brought it over limit (#1023)
Response to #1022: in the default retry policy, cap retry seconds to `maxDurationSeconds` after applying jitter, just in case applying jitter brought the value over the limit. This should never realistically happen given how the exponential schedule works: if attempt N would bring retry seconds over `maxDurationSeconds`, then attempt N-1 will likely be nowhere near `maxDurationSeconds`, and a 10% jitter won't be able to bring it over. Still, capping doesn't hurt, so do it as an additional safeguard. Fixes #1022.
1 parent 9f16f8f commit 7bb606a
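
To make the reasoning above concrete, here is a minimal runnable sketch of the capped, jittered backoff. It assumes the default policy's attempt^4 base schedule; the standalone retrySeconds function, the derivation of maxDurationSeconds, and the sample attempts in main are illustrative approximations, not River's exact source.

package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Roughly the maximum time.Duration expressed in seconds (about 292 years).
// Illustrative stand-in for the policy's real limit; the schedule stops
// growing past this point.
const maxDurationSeconds = float64(math.MaxInt64) / 1e9

func retrySeconds(attempt int) float64 {
	// Base schedule: attempt^4 seconds.
	retrySeconds := math.Pow(float64(attempt), 4)

	// First cap: keep the base schedule within the maximum duration.
	retrySeconds = min(retrySeconds, maxDurationSeconds)

	// Jitter number of seconds +/- 10%.
	retrySeconds += retrySeconds * (rand.Float64()*0.2 - 0.1)

	// This commit's fix: cap once more in case jitter pushed the value back
	// over the limit.
	return min(retrySeconds, maxDurationSeconds)
}

func main() {
	for _, attempt := range []int{1, 5, 24, 310} {
		fmt.Printf("attempt %3d: %.4g seconds\n", attempt, retrySeconds(attempt))
	}
}

Under this assumed schedule the base value first exceeds the cap at attempt 310 (310^4 ≈ 9.24e9 seconds versus a cap of ≈ 9.22e9), so the post-jitter cap can only ever matter for astronomically high attempt counts, which is why it's a cheap safeguard rather than a behavioral change.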

1 file changed

retry_policy.go: 9 additions & 8 deletions
@@ -47,13 +47,11 @@ type DefaultClientRetryPolicy struct {
 // equivalent of the maximum of time.Duration to each retry, about 292 years.
 // The schedule is no longer exponential past this point.
 func (p *DefaultClientRetryPolicy) NextRetry(job *rivertype.JobRow) time.Time {
-	// For the purposes of calculating the backoff, we can look solely at the
-	// number of errors. If we were to use the raw attempt count, this would be
-	// incremented and influenced by snoozes. However the use case for snoozing is
-	// to try again later *without* counting as an error.
-	//
-	// Note that we explicitly add 1 here, because the error hasn't been appended
-	// yet at the time this is called (that happens in the completer).
+	// In modern versions of River `len(job.Errors)` is the same number as
+	// `attempt`. However, in older versions snoozing a job wouldn't restore its
+	// attempt count to the pre-fetch value, and that would lead to incorrect
+	// retry durations when jobs are first snoozed, then retried. To avoid this
+	// and keep backward compatibility, the number of errors is used instead.
 	errorCount := len(job.Errors) + 1

 	return p.timeNowUTC().Add(timeutil.SecondsAsDuration(p.retrySeconds(errorCount)))
@@ -89,7 +87,10 @@ func (p *DefaultClientRetryPolicy) retrySeconds(attempt int) float64 {
 	// Jitter number of seconds +/- 10%.
 	retrySeconds += retrySeconds * (rand.Float64()*0.2 - 0.1)

-	return retrySeconds
+	// Cap retrySeconds once more in case adding random jitter pushed it over
+	// maxDurationSeconds. (This should never realistically happen, but protect
+	// against it just in case.)
+	return min(retrySeconds, maxDurationSeconds)
 }

 // Gets a base number of retry seconds for the given attempt, jitter excluded.
