Every Raise integration eventually encounters errors. Some are transient (gateway timeouts, network blips, brief rate-limit windows) and recover on retry. Some are permanent (validation failures, deleted records, revoked credentials) and require human intervention. The integrations that handle both cases well — distinguishing between them, retrying transient errors appropriately, surfacing permanent ones for review without spamming alerts — are the ones that stay up under production load. This page covers the classification framework, the retry patterns with backoff, circuit breakers for cascading failures, dead-letter queues for permanent failures, and the special considerations for POST /api/Raise/give (which charges payment methods and can’t be retried naively).

Classifying errors

The first and most important decision: is this error transient or permanent? The right response is wildly different.
Error class | Examples | Right response
Permanent client | 400 validation, 404 not found, 403 forbidden | Don’t retry. The request itself needs to change.
Persistent-recoverable | 401 unauthorized | Don’t retry. The credential needs to be refreshed.
Transient | 429, 500, 502, 503, 504, network errors, TLS errors | Retry with exponential backoff.
Ambiguous | Some 4xx codes, occasional weird responses | Treat conservatively: assume permanent unless evidence suggests otherwise.
The classifier is the foundation of all retry logic. Get it right, and everything else falls into place. Get it wrong, and you either hammer the API with futile retries or fail to retry transient errors that would succeed.

A reference classifier

JavaScript
function classifyError(status, problem) {
  // Network or TLS errors (status undefined) — always transient
  if (!status) return 'transient';

  // Success
  if (status >= 200 && status < 300) return 'success';

  // 3xx — typically not seen in API responses; treat as transient
  if (status >= 300 && status < 400) return 'transient';

  // 4xx — client errors
  if (status === 401) return 'persistent_recoverable'; // Credential
  if (status === 403) return 'permanent_client';
  if (status === 404) return 'permanent_client';
  if (status === 408) return 'transient'; // Request timeout
  if (status === 409) return 'permanent_client'; // Conflict
  if (status === 422) return 'permanent_client'; // Unprocessable entity
  if (status === 429) return 'transient'; // Rate limited
  if (status >= 400 && status < 500) {
    // 400 and other 4xx — typically validation failures
    return 'permanent_client';
  }

  // 5xx — server errors, all transient
  if (status >= 500) return 'transient';

  // Unknown — be cautious
  return 'permanent_client';
}
This is a starting point; tune it based on what your integration actually encounters. Log unexpected classifications so you can refine the rules.

Retry patterns for transient errors

Transient errors are the bread-and-butter case for retry logic. The patterns:

Exponential backoff

The basic pattern: each retry waits longer than the previous, eventually giving up.
JavaScript
async function callWithRetry(url, options, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetch(url, options);

    if (response.ok) return response;

    const classification = classifyError(response.status, await parseProblem(response));

    if (classification !== 'transient') {
      // Non-retryable — fail fast
      throw makeError(response, classification);
    }

    if (attempt === maxAttempts) {
      throw new Error(`Failed after ${maxAttempts} attempts`);
    }

    // Compute backoff: 1s, 2s, 4s, 8s (a delay follows every attempt except the last)
    const baseDelay = Math.pow(2, attempt - 1) * 1000;

    // Add jitter to avoid thundering-herd retries
    const jitter = Math.random() * 1000;

    await sleep(baseDelay + jitter);
  }
}
Three things this pattern gets right:
  • Exponential growth means early retries happen quickly, while persistent failures back off instead of looping tightly.
  • Jitter prevents synchronized retries across many integrations from hammering the API simultaneously.
  • Bounded attempts ensure the retry loop terminates rather than retrying forever.
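The snippets on this page call sleep, parseProblem, and makeError without defining them. A minimal sketch of what they might look like follows. The names come from the examples above, but the bodies are assumptions rather than part of any Raise SDK, and the sketch assumes error bodies are JSON problem objects when a body is present at all.

```javascript
// Hypothetical helpers assumed by the retry examples; not part of any Raise SDK.

// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Read the error body and parse it as JSON. Returns null when the body is
// empty or not JSON (for example, an HTML page from a gateway on a 502).
async function parseProblem(response) {
  try {
    const text = await response.text();
    return text ? JSON.parse(text) : null;
  } catch {
    return null;
  }
}

// Build an Error that carries the status and classification for callers.
function makeError(response, classification, problem = null) {
  const err = new Error(
    problem?.detail ?? problem?.title ?? `Request failed with status ${response.status}`
  );
  err.status = response.status;
  err.classification = classification;
  err.problem = problem;
  return err;
}
```

One caveat worth keeping in mind: fetch rejects on a network failure instead of returning a response, so a retry loop only retries network errors if the fetch call itself sits inside a try/catch. The complete pipeline later on this page does that.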

Honor Retry-After when present

For rate-limit responses, the server may include a Retry-After header indicating how long to wait. Honor it instead of computing a backoff:
JavaScript
async function callWithRetry(url, options, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetch(url, options);

    if (response.ok) return response;

    const classification = classifyError(response.status, await parseProblem(response));

    if (classification !== 'transient') {
      throw makeError(response, classification);
    }

    if (attempt === maxAttempts) {
      throw new Error(`Failed after ${maxAttempts} attempts`);
    }

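    // Honor Retry-After if the server provided it as a number of seconds
    const retryAfter = response.headers.get('Retry-After');
    const retryAfterSeconds = Number.parseInt(retryAfter ?? '', 10);
    let delayMs;
    if (Number.isFinite(retryAfterSeconds)) {
      delayMs = retryAfterSeconds * 1000;
    } else {
      // Header missing or not numeric (it may also be an HTTP date): fall back
      delayMs = Math.pow(2, attempt - 1) * 1000 + Math.random() * 1000;
    }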

    await sleep(delayMs);
  }
}
The Raise OpenAPI spec doesn’t explicitly document the Retry-After header on rate-limit responses. The pattern uses the header when it is present (following common HTTP convention) and falls back to exponential backoff when it is not. See Rate Limits for what’s known.
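HTTP allows Retry-After to carry either a number of seconds or an HTTP date, so a parser that only handles integers can miss valid values. The sketch below handles both forms. The function name and the 60-second cap are assumptions chosen for illustration, and nothing here is specific to how Raise populates the header.

```javascript
// Convert a Retry-After header value to a delay in milliseconds.
// Returns null when the header is absent or unparseable, so the caller can
// fall back to exponential backoff. The cap guards against very long waits.
function retryAfterToMs(headerValue, nowMs = Date.now(), maxMs = 60_000) {
  if (headerValue == null) return null;
  const trimmed = String(headerValue).trim();

  // Form 1: delay in whole seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) {
    return Math.min(Number(trimmed) * 1000, maxMs);
  }

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.min(Math.max(dateMs - nowMs, 0), maxMs);
}
```

A caller would use it as `retryAfterToMs(response.headers.get('Retry-After')) ?? backoffDelay`, which keeps the fallback decision in one place.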

Don’t retry forever

A 5xx that persists for hours isn’t transient — it’s a sustained issue worth surfacing for human review. Bound retries at a reasonable number (typically 5 attempts) and move to a different handling strategy after that. See Dead-letter queues below.
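To make "a reasonable number" concrete, the arithmetic for the backoff formula used on this page can be worked out directly. The helper below is a hypothetical illustration that ignores jitter.

```javascript
// Delays between attempts for the backoff formula used above, without jitter.
function backoffSchedule(maxAttempts, baseMs = 1000) {
  const delays = [];
  // A delay follows every attempt except the last one.
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(Math.pow(2, attempt - 1) * baseMs);
  }
  return delays;
}

const delays = backoffSchedule(5);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays);  // [ 1000, 2000, 4000, 8000 ]
console.log(totalMs); // 15000
```

With five attempts, the loop waits about 15 seconds in total, plus up to one second of jitter per delay and the time the requests themselves take. An outage longer than that exhausts the retries, which is the point at which the operation should move to the dead-letter queue.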

Errors that should never be retried

Retrying only makes these three classes of errors worse:

401 Unauthorized: the credential is bad

A 401 means the token is invalid, expired, or revoked. Retrying produces the same 401 indefinitely. The fix isn’t a retry — it’s a credential refresh.
JavaScript
if (response.status === 401) {
  await alertOps({
    severity: 'critical',
    customerId,
    message: 'Raise API token is invalid — credential needs refresh',
  });
  await pauseSyncForCustomer(customerId);
  throw new AuthError('Token invalid');
}
Pause the customer’s sync work until a human refreshes the token. Continuing to attempt requests with a bad token just generates noise.
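The pause itself can be as simple as a flag that every unit of sync work checks before running. The sketch below is a hypothetical in-memory version; pauseSyncForCustomer is the name used in the snippet above, while resumeSyncForCustomer and runIfNotPaused are assumed names, and production code would persist the flag.

```javascript
// Hypothetical in-memory pause registry; production code would persist this.
const pausedCustomers = new Map();

function pauseSyncForCustomer(customerId, reason = 'credential invalid') {
  pausedCustomers.set(customerId, { reason, pausedAt: new Date() });
}

// Called once a human has installed a working token.
function resumeSyncForCustomer(customerId) {
  pausedCustomers.delete(customerId);
}

// Wrap any unit of sync work so it is skipped while the customer is paused.
async function runIfNotPaused(customerId, work) {
  if (pausedCustomers.has(customerId)) {
    return { skipped: true, reason: pausedCustomers.get(customerId).reason };
  }
  return { skipped: false, result: await work() };
}
```

Skipping instead of throwing keeps paused customers out of error metrics, so the single 401 alert is not followed by a stream of secondary failures.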

400 validation failures: the request is bad

A 400 typically means the request body has a validation issue — a missing required field, an invalid value, a malformed structure. Retrying with the same body produces the same 400. The fix is to correct the request:
JavaScript
if (response.status === 400) {
  const problem = await response.json();
  if (problem.errors) {
    // Per-field validation errors
    throw new ValidationError('Request validation failed', problem.errors);
  }
  // Other 400 — payment failure, etc.
  throw new ClientError(problem.detail || problem.title);
}
For partner integrations submitting donations, a 400 from POST /api/Raise/give may also indicate a payment failure (card declined, gateway rejection). These also should not be retried with the same payment method — surface them to the donor for a different card.
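When surfacing a validation failure to a person, a flat list of messages is usually easier to read than the raw problem object. The sketch below assumes problem.errors maps field names to a message or an array of messages, a shape common in problem-details responses; confirm the actual shape against real Raise responses before relying on it. The function name is hypothetical.

```javascript
// Flatten { field: [messages] } into ["field: message", ...].
// Assumes the errors object shape described above; returns [] otherwise.
function flattenValidationErrors(problem) {
  const errors = problem?.errors;
  if (!errors || typeof errors !== 'object') return [];
  return Object.entries(errors).flatMap(([field, messages]) =>
    // [].concat accepts either a single message or an array of messages
    [].concat(messages).map((message) => `${field}: ${message}`)
  );
}
```

For example, `{ errors: { amount: ['Must be positive'] } }` becomes `['amount: Must be positive']`, which can go straight into a log line or an alert field.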

404 Not Found: the resource doesn’t exist

A 404 from GET /api/Donor/12345 means donor 12345 doesn’t exist (or was deleted). No retry will make it appear. The fix is to handle the absence gracefully:
JavaScript
if (response.status === 404) {
  return null; // Let the caller decide what to do
}
Sometimes a 404 is expected (the integration was checking for existence). Sometimes it indicates a deeper issue (the donor was deleted between the integration learning about them and the lookup). Don’t retry; the right path depends on context.

Special case: POST /api/Raise/give retries

Donation submissions deserve special attention because they charge payment methods. A naive retry on a network error can produce double charges if the original request succeeded but the response didn’t reach the integration.

The double-charge risk

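The failure sequence: the integration submits the donation, Raise charges the card, and the response is lost to a network error. The integration retries, and Raise charges the card again. The integration sees one successful response. The donor sees two charges. The customer has to issue a refund for one of them. Avoid this.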
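The risk is easy to reproduce with a toy model. The sketch below uses a hypothetical in-memory gateway that charges on every call and drops its first response; it does not model the Raise API, only the shape of the failure.

```javascript
// Toy in-memory "gateway": every call charges the card, whether or not the
// caller ever sees the response. Hypothetical; it does not model the Raise API.
function createFakeGateway({ dropFirstResponse }) {
  const charges = [];
  let calls = 0;
  return {
    charges,
    async give(donation) {
      calls++;
      charges.push({ ...donation, chargeNumber: calls }); // the charge succeeds
      if (dropFirstResponse && calls === 1) {
        throw new Error('socket hang up'); // the response is lost in transit
      }
      return { id: calls };
    },
  };
}

// The naive approach: retry on any thrown error.
async function naiveSubmit(gateway, donation, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await gateway.give(donation);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
    }
  }
}

async function demo() {
  const gateway = createFakeGateway({ dropFirstResponse: true });
  const gift = await naiveSubmit(gateway, { amount: 50 });
  return { gift, chargeCount: gateway.charges.length };
}
```

Running demo() resolves with one gift and a chargeCount of 2: the caller believes a single donation was made, while the gateway recorded two charges.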

The defensive pattern

For POST /api/Raise/give specifically, never retry network errors blindly. Instead:
JavaScript
async function submitDonationSafely(donationRequest, customerId) {
  // Generate a client-side tracking ID
  const trackingId = `${customerId}-${donationRequest.donor.email}-${Date.now()}`;

  // Record the intent before submitting
  await donationAttemptStore.recordIntent({
    trackingId,
    customerId,
    amount: donationRequest.amount,
    donorEmail: donationRequest.donor.email,
    submittedAt: new Date(),
    status: 'submitting',
  });

  try {
    const response = await fetch(
      'https://prod-api.raisedonors.com/api/Raise/give',
      {
        method: 'POST',
        headers: { /* ... */ },
        body: JSON.stringify(donationRequest),
      }
    );

    if (response.ok) {
      const gift = await response.json();
      await donationAttemptStore.recordSuccess(trackingId, gift.id);
      return gift;
    }

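    // Non-OK response: classify. The body may not be JSON (for example, a
    // gateway error page), so fall back to an empty problem object.
    const problem = await response.json().catch(() => ({}));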
    const classification = classifyError(response.status, problem);

    if (classification === 'permanent_client') {
      // Card declined, validation failed, etc. — don't retry
      await donationAttemptStore.recordFailure(trackingId, 'permanent', problem);
      throw new DonationError(problem.detail, response.status, problem);
    }

    // Transient — but don't retry blindly
    await donationAttemptStore.recordFailure(trackingId, 'transient', problem);
    throw new DonationError(`Donation submission failed: ${problem.detail}`, response.status);

  } catch (err) {
    if (err instanceof DonationError) throw err;

    // Network error — uncertain whether the donation went through
    await donationAttemptStore.recordFailure(trackingId, 'uncertain', { error: err.message });
    throw new UncertainDonationError(
      'Donation may or may not have been processed — requires reconciliation',
      err
    );
  }
}

Reconciling uncertain donations

When a network error leaves the outcome uncertain, the integration shouldn’t auto-retry. Instead, surface the uncertain donation for reconciliation:
JavaScript
async function reconcileUncertainDonations() {
  const uncertain = await donationAttemptStore.findUncertain();

  for (const attempt of uncertain) {
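    // Look for matching gifts in Raise from around the attempt time.
    // EQUALS, GT_OPERATOR, and AND_CONJUNCT are placeholders: substitute the
    // operator and conjunct values defined in the Raise query documentation.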
    const candidates = await fetch(
      'https://prod-api.raisedonors.com/api/Gift/query',
      {
        method: 'POST',
        headers: { /* ... */ },
        body: JSON.stringify({
          skip: 0,
          take: 10,
          groups: [
            {
              conditions: [
                { parameter: 'donorEmail', operator: EQUALS, value: attempt.donorEmail },
                { parameter: 'amount', operator: EQUALS, value: attempt.amount.toString() },
                { parameter: 'date', operator: GT_OPERATOR, value: attempt.submittedAt },
              ],
              conjunct: AND_CONJUNCT,
            },
          ],
        }),
      }
    ).then((r) => r.json());

    if (candidates.items.length > 0) {
      // Match found — the donation did go through
      await donationAttemptStore.recordSuccess(attempt.trackingId, candidates.items[0].id);
    } else {
      // No match — the donation did not go through; safe to retry
      await donationAttemptStore.markRetryable(attempt.trackingId);
    }
  }
}
Run this on a short cadence (every few minutes) to resolve uncertain attempts. Only after confirming the original attempt didn’t go through is it safe to retry.
This is the recommended pattern only because the Raise spec doesn’t currently document an idempotency-key header that would solve the problem more elegantly. When such a header becomes available, use it instead; it’s a more robust solution than client-side reconciliation.
⚠️ Spec gap: No Idempotency-Key header is documented for POST /api/Raise/give. Confirm whether the platform supports one before relying on the client-side reconciliation pattern.
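The donationAttemptStore used in the snippets above is never defined. A minimal in-memory sketch follows, using the method names that appear in those snippets. The status values, the get method, and the implementation are assumptions; a real store would be a durable table, since its purpose is to survive a crash between submitting and recording the outcome.

```javascript
// Hypothetical in-memory attempt store. A real one would be durable.
function createDonationAttemptStore() {
  const attempts = new Map(); // trackingId -> attempt record

  function update(trackingId, changes) {
    const existing = attempts.get(trackingId);
    if (!existing) throw new Error(`Unknown trackingId: ${trackingId}`);
    attempts.set(trackingId, { ...existing, ...changes });
  }

  return {
    async recordIntent(attempt) {
      attempts.set(attempt.trackingId, { ...attempt });
    },
    async recordSuccess(trackingId, giftId) {
      update(trackingId, { status: 'succeeded', giftId });
    },
    async recordFailure(trackingId, kind, detail) {
      // kind is 'permanent', 'transient', or 'uncertain'
      update(trackingId, { status: kind, detail });
    },
    async findUncertain() {
      return [...attempts.values()].filter((a) => a.status === 'uncertain');
    },
    async markRetryable(trackingId) {
      update(trackingId, { status: 'retryable' });
    },
    async get(trackingId) {
      return attempts.get(trackingId);
    },
  };
}
```

One design point worth considering: an attempt that stays in the 'submitting' status for a long time probably means the process died mid-request. Treating stale 'submitting' records as uncertain lets the reconciliation job pick those up too.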

Circuit breakers

For workloads that touch many records, a sustained failure can produce a cascade — many in-flight requests all hitting the same issue, all retrying, all eventually failing. A circuit breaker stops the cascade by short-circuiting requests after a threshold of failures.

A basic circuit breaker

JavaScript
class CircuitBreaker {
  constructor({ failureThreshold = 10, resetTimeoutMs = 60000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.state = 'closed';        // 'closed' | 'open' | 'half-open'
    this.failureCount = 0;
    this.openedAt = null;
  }

  async call(fn) {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt > this.resetTimeoutMs) {
        this.state = 'half-open';
      } else {
        throw new CircuitBreakerOpenError('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      // Any success closes the circuit and resets the count, so the threshold
      // applies to consecutive failures rather than a lifetime total.
      this.state = 'closed';
      this.failureCount = 0;
      return result;
    } catch (err) {
      this.failureCount++;
      if (this.failureCount >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
Use one circuit breaker per logical operation (per-endpoint, per-customer, or per-destination) so a failure in one doesn’t disrupt others:
JavaScript
const breakers = new Map();

function getBreaker(key) {
  if (!breakers.has(key)) {
    breakers.set(key, new CircuitBreaker({ failureThreshold: 10, resetTimeoutMs: 60000 }));
  }
  return breakers.get(key);
}

async function callRaise(url, options, breakerKey) {
  return getBreaker(breakerKey).call(() => callWithRetry(url, options));
}
When the breaker opens, new requests fail fast instead of adding more retries. After the reset timeout, the breaker lets requests through again on a trial basis (“half-open”). If a trial request succeeds, the breaker closes; if it fails, the breaker reopens and the timeout starts over.
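The state transitions are easier to follow with a clock that can be advanced by hand. The sketch below is a compact, hypothetical variant of the breaker (DemoBreaker and its now option are assumed names). It resets the failure count on every success, so the threshold applies to consecutive failures, and it reopens immediately when a half-open trial fails.

```javascript
// Compact breaker with an injectable clock so the transitions can be shown
// without waiting. A sketch, not a drop-in replacement for the class above.
class DemoBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 1000, now = Date.now } = {}) {
    Object.assign(this, { failureThreshold, resetTimeoutMs, now });
    this.state = 'closed';
    this.failureCount = 0;
    this.openedAt = null;
  }

  async call(fn) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt > this.resetTimeoutMs) {
        this.state = 'half-open'; // let a trial request through
      } else {
        throw new Error('Circuit breaker is open');
      }
    }
    try {
      const result = await fn();
      this.state = 'closed';
      this.failureCount = 0; // count consecutive failures only
      return result;
    } catch (err) {
      this.failureCount++;
      if (this.state === 'half-open' || this.failureCount >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

async function walkthrough() {
  let clock = 0;
  const breaker = new DemoBreaker({ failureThreshold: 3, resetTimeoutMs: 1000, now: () => clock });
  const failing = async () => { throw new Error('503'); };
  const observed = [];

  for (let i = 0; i < 3; i++) {
    await breaker.call(failing).catch(() => {});
  }
  observed.push(breaker.state); // 'open' after three consecutive failures

  let calledWhileOpen = false;
  await breaker.call(async () => { calledWhileOpen = true; }).catch(() => {});
  observed.push(calledWhileOpen); // false: rejected without calling fn

  clock += 1001; // advance past the reset timeout
  await breaker.call(async () => 'ok');
  observed.push(breaker.state); // 'closed' after a successful trial

  return observed;
}
```

The walkthrough resolves to `['open', false, 'closed']`: the breaker trips, rejects a call without invoking it, and recovers once a trial request succeeds.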

When to use circuit breakers

Use a breaker | Don’t bother
Operations that produce many concurrent requests (bulk syncs, parallel reads) | Single-request workflows (one-off API calls)
Operations that hit shared downstream resources | Operations against many independent endpoints
Operations that are expensive to retry (payment processing) | Operations that are cheap and idempotent
For partner integrations operating at scale (hundreds of customers, thousands of requests per minute), circuit breakers prevent localized issues from cascading into widespread degradation.

Dead-letter queues

When all retries fail, the operation can’t continue. Two options: drop it silently (bad — lost work) or move it to a dead-letter queue for human review (good).

A dead-letter pattern

SQL
CREATE TABLE dead_letter_queue (
  id BIGSERIAL PRIMARY KEY,
  customer_id TEXT NOT NULL,
  operation_type TEXT NOT NULL,
  operation_payload JSONB NOT NULL,
  last_error TEXT NOT NULL,
  last_error_status INTEGER,
  attempts INTEGER NOT NULL,
  first_attempted_at TIMESTAMPTZ NOT NULL,
  last_attempted_at TIMESTAMPTZ NOT NULL,
  resolved_at TIMESTAMPTZ,
  resolution TEXT
);
The flow:
1. Operation fails permanently or exhausts retries. A 400 validation error, a sustained 5xx, or a network error that doesn’t recover.
2. Move the operation to the dead-letter queue. Capture the full operation payload, the last error, and the attempt history.
3. Continue processing other operations. One bad operation doesn’t block the queue.
4. Surface the dead-letter entry for review. Alert or daily digest to ops; expose in a UI for support staff.
5. Investigate and resolve. Either fix the underlying issue and replay, or mark the operation as permanently lost.
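The first three steps of the flow can be sketched as a batch loop that parks failures and keeps going. Everything here is hypothetical: the in-memory queue stands in for the table above, and perform is assumed to be a function that already applies the retry logic and throws only on terminal failure.

```javascript
// Hypothetical in-memory dead-letter queue; field names mirror the table above.
function createInMemoryDlq() {
  const entries = [];
  return {
    entries,
    async add(entry) {
      entries.push({ id: entries.length + 1, resolved_at: null, ...entry });
    },
  };
}

// Process every operation; failures are parked in the DLQ instead of
// aborting the batch.
async function processBatch(operations, perform, dlq) {
  let succeeded = 0;
  for (const op of operations) {
    const firstAttemptedAt = new Date();
    try {
      await perform(op);
      succeeded++;
    } catch (err) {
      await dlq.add({
        customer_id: op.customerId,
        operation_type: op.type,
        operation_payload: op.payload,
        last_error: err.message,
        last_error_status: err.status ?? null,
        attempts: err.attempts ?? 1,
        first_attempted_at: err.firstAttemptedAt ?? firstAttemptedAt,
        last_attempted_at: new Date(),
      });
    }
  }
  return { succeeded, deadLettered: operations.length - succeeded };
}
```

Capturing the whole payload at this point is what makes replay possible later; an entry that records only the error message can be investigated but not re-run.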

Replaying from the dead-letter queue

For operations that failed due to a transient issue that’s now resolved, replay them:
JavaScript
async function replayDeadLetter(dlqId) {
  const entry = await dlq.findById(dlqId);

  try {
    await performOperation(entry.operation_payload);
    await dlq.markResolved(dlqId, 'replay_succeeded');
  } catch (err) {
    // Failed again — update the attempt count, leave in DLQ
    await dlq.recordAdditionalFailure(dlqId, err);
    throw err;
  }
}
A reasonable UI: an ops dashboard showing dead-letter entries with “replay” and “mark resolved” buttons. Most entries are resolved by replay once the underlying issue is fixed (credential renewed, downstream system back up, etc.).

Surfacing errors to humans

Not every error needs to wake someone up. A reasonable severity model:
Severity | What triggers it | Response
Page | Sustained failures affecting many customers; widespread integration outage | On-call engineer woken up
High alert | Single-customer issue blocking critical workflow (sync paused, donations failing) | Same-day investigation
Warning | Elevated error rate, dead-letter accumulation, expired credentials | Next-business-day review
Info / log only | Per-request retries, expected 404s, dedup decisions | No alert; available in logs
The right thresholds depend on the integration’s SLA. For a major-donor-focused integration, even a single failed POST /api/Raise/give may warrant a same-day investigation. For a low-priority analytics sync, the same failure might be a warning aggregated into a daily digest.

Useful alert content

A useful error alert includes:
  • The customer affected
  • The operation that failed
  • The error classification (transient, permanent, uncertain)
  • The number of attempts made
  • The last error message
  • A link to the dead-letter entry (or wherever the operation can be inspected and replayed)
  • Suggested next steps based on the error type
JavaScript
async function alertOnDeadLetter(entry) {
  await alerter.send({
    severity: classifyDeadLetterSeverity(entry),
    title: `Operation ${entry.operation_type} failed permanently`,
    fields: {
      customer: entry.customer_id,
      attempts: entry.attempts,
      lastError: entry.last_error,
      lastStatus: entry.last_error_status,
    },
    links: [
      { label: 'View in dashboard', url: `https://ops.example.com/dlq/${entry.id}` },
      { label: 'Replay', url: `https://ops.example.com/dlq/${entry.id}/replay` },
    ],
    suggestedActions: suggestActions(entry),
  });
}

function suggestActions(entry) {
  if (entry.last_error_status === 401) {
    return ['Verify the customer\'s API token', 'Contact customer to issue new token'];
  }
  if (entry.last_error_status === 400) {
    return ['Inspect the payload', 'Update integration logic if validation rule changed'];
  }
  if (!entry.last_error_status) {
    return ['Network issue — try replaying', 'Check Raise API status if recurring'];
  }
  return ['Investigate via dashboard'];
}
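The snippet above calls classifyDeadLetterSeverity without defining it. One possible shape is sketched below, mapping a single entry onto the severity table. The rules, the severity strings, and the 'give' operation type name are all assumptions to adapt to the integration's SLA; the Page level is left out on purpose, since it depends on aggregate failure rates that a single entry can't reveal.

```javascript
// Hypothetical mapping from one dead-letter entry to a severity level.
// The 'give' operation type name and the rules are illustrative only.
function classifyDeadLetterSeverity(entry) {
  // A failed donation submission blocks a critical workflow.
  if (entry.operation_type === 'give') {
    return 'high';
  }
  // Everything else accumulates for next-business-day review.
  return 'warning';
}
```

Keeping this function small and rule-based makes it easy to review when the thresholds change, which they usually do after the first few incidents.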

Idempotency: the underlying defense

Most error-recovery patterns rely on idempotency — the property that repeating an operation produces the same outcome as running it once. Build idempotency into operations from the start.

Webhook handlers

See Idempotency and Safe Reprocessing for the full pattern. Summary: every event has a unique key (typically contextId + eventType + modifiedDate), and the dedup store records processed events. Retried deliveries skip re-processing.
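As a rough illustration of that summary, the sketch below builds the dedup key from the three fields named above and skips events it has already seen. The event object shape, the function names, and the in-memory Set are assumptions; the linked page describes the full pattern.

```javascript
// Hypothetical dedup helpers. The key fields follow the summary above; the
// event object shape is an assumption.
function eventKey(event) {
  return [event.contextId, event.eventType, event.modifiedDate].join(':');
}

function createDedupStore() {
  const seen = new Set(); // production code would use a durable store
  return {
    // Returns true the first time a key is seen, false on repeats.
    markIfNew(key) {
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    },
    forget(key) {
      seen.delete(key);
    },
  };
}

async function handleWebhook(event, store, handler) {
  const key = eventKey(event);
  if (!store.markIfNew(key)) {
    return 'skipped'; // retried delivery, already processed
  }
  try {
    await handler(event);
    return 'processed';
  } catch (err) {
    store.forget(key); // allow a later retry to process the event
    throw err;
  }
}
```

Marking the key before running the handler keeps two concurrent deliveries of the same event from both being processed; forgetting it on failure keeps a failed attempt from permanently suppressing the retry.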

Downstream writes

For partner integrations writing to external systems, use the external system’s idempotency mechanisms:
  • Database upserts keyed by Raise resource IDs.
  • Idempotency keys on third-party API calls (Stripe, Slack, many modern APIs support them).
  • Conditional logic that checks for existing records before creating new ones.
The combination of webhook-level dedup and downstream-write idempotency ensures retries are safe to perform.
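The upsert idea in the first bullet can be illustrated with a Map standing in for a downstream table. The table, the donor shape, and the function name are hypothetical; in PostgreSQL the equivalent would typically be an INSERT with an ON CONFLICT clause keyed on the Raise ID.

```javascript
// Hypothetical downstream table keyed by Raise donor ID.
function createDonorTable() {
  const rows = new Map();
  return {
    rows,
    // Insert or overwrite; applying the same record twice leaves one row.
    upsert(donor) {
      const existing = rows.get(donor.id) ?? {};
      rows.set(donor.id, { ...existing, ...donor });
      return rows.get(donor.id);
    },
  };
}
```

Because the write is keyed by the Raise ID, replaying the same sync after a partial failure converges on the same rows instead of creating duplicates.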

POST /api/Raise/give — the special case

Donation submissions are the most challenging case because the operation is genuinely not idempotent at the API level (no documented idempotency-key header). The client-side reconciliation pattern (see The defensive pattern above) is the workaround. When an idempotency-key header becomes available, switch to it.

A complete error-handling pipeline

Putting the patterns together:
JavaScript
class RaiseClient {
  constructor({ token, customerId, breaker, dlq, attemptStore }) {
    this.token = token;
    this.customerId = customerId;
    this.breaker = breaker;
    this.dlq = dlq;
    this.attemptStore = attemptStore;
  }

  async call(url, options = {}, opName = 'unknown') {
    return this.breaker.call(async () => {
      try {
        return await this._callWithRetry(url, options);
      } catch (err) {
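        // Anything thrown here is terminal: a permanent failure, or a transient
        // one that exhausted its retries. Both go to the dead-letter queue.
        if (err.classification === 'permanent_client' || err.classification === 'transient') {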
          await this.dlq.add({
            customerId: this.customerId,
            operationType: opName,
            operationPayload: { url, options },
            lastError: err.message,
            lastErrorStatus: err.status,
            attempts: err.attempts,
            firstAttemptedAt: err.firstAttemptedAt,
            lastAttemptedAt: new Date(),
          });
          await alertOnError(err, this.customerId, opName);
        }
        throw err;
      }
    });
  }

  async _callWithRetry(url, options, maxAttempts = 5) {
    const firstAttemptedAt = new Date();
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      let response;
      try {
        response = await fetch(url, {
          ...options,
          headers: {
            Authorization: `Bearer ${this.token}`,
            'Content-Type': 'application/json',
            ...(options.headers ?? {}),
          },
        });
      } catch (err) {
        // Network error
        if (attempt === maxAttempts) {
          const finalErr = new Error(err.message);
          finalErr.classification = 'transient';
          finalErr.attempts = attempt;
          finalErr.firstAttemptedAt = firstAttemptedAt;
          throw finalErr;
        }
        await sleep(this._backoff(attempt));
        continue;
      }

      if (response.ok) return response.json();

      const problem = await parseProblem(response);
      const classification = classifyError(response.status, problem);

      if (classification !== 'transient') {
        const err = new Error(problem?.detail ?? problem?.title ?? 'Request failed');
        err.classification = classification;
        err.status = response.status;
        err.problem = problem;
        err.attempts = attempt;
        err.firstAttemptedAt = firstAttemptedAt;
        throw err;
      }

      if (attempt === maxAttempts) {
        const err = new Error(`Transient failure after ${maxAttempts} attempts`);
        err.classification = 'transient';
        err.status = response.status;
        err.attempts = attempt;
        err.firstAttemptedAt = firstAttemptedAt;
        throw err;
      }

      // Honor Retry-After if present and numeric; otherwise back off
      const retryAfterSeconds = Number.parseInt(response.headers.get('Retry-After') ?? '', 10);
      const delay = Number.isFinite(retryAfterSeconds) ? retryAfterSeconds * 1000 : this._backoff(attempt);
      await sleep(delay);
    }
  }

  _backoff(attempt) {
    return Math.pow(2, attempt - 1) * 1000 + Math.random() * 1000;
  }
}
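Use this pattern for every Raise API call except POST /api/Raise/give: the retry loop above retries network errors, which is exactly what donation submissions must not do. Route those through the defensive pattern instead. The cost of writing the pattern once is small; the cost of not having it is paid at every incident.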

A recovery checklist

When designing a Raise integration, confirm each of the following:
  • Every API call goes through a function that classifies and retries appropriately
  • POST /api/Raise/give uses the client-side reconciliation pattern, not naive retry
  • Webhook handlers are idempotent — retries don’t produce duplicate side effects
  • Bulk operations use circuit breakers to prevent cascade failures
  • Permanently-failed operations go to a dead-letter queue
  • Dead-letter entries are surfaced to ops with enough context to investigate
  • 401 responses pause the affected customer’s work and alert ops, rather than retrying
  • Network errors and 5xx responses retry with exponential backoff + jitter
  • Retry-After headers are honored when present
  • Rate-limit 429 responses are visible in metrics
Most of these are small individually. Together, they make the difference between an integration that requires constant manual intervention and one that recovers from most failures on its own.

Where to go next

Sync Architecture Patterns

The architectural patterns these error-recovery patterns plug into.

API Performance Tips

Performance patterns complementary to recovery — fewer requests means fewer chances to fail.

Rate Limits

The 429 patterns referenced throughout this page.

Idempotency and Safe Reprocessing

Webhook-specific idempotency that pairs with API error recovery.
Last modified on May 21, 2026