How Raise integrations should handle errors — classifying failures, retry strategies with backoff, circuit breakers, dead-letter queues, and the special considerations for payment-processing operations.
Every Raise integration eventually encounters errors. Some are transient (gateway timeouts, network blips, brief rate-limit windows) and recover on retry. Some are permanent (validation failures, deleted records, revoked credentials) and require human intervention. The integrations that handle both cases well (distinguishing between them, retrying transient errors appropriately, surfacing permanent ones for review without spamming alerts) are the ones that stay up under production load.
This page covers the classification framework, the retry patterns with backoff, circuit breakers for cascading failures, dead-letter queues for permanent failures, and the special considerations for POST /api/Raise/give (which charges payment methods and can’t be retried naively).
The classifier is the foundation of all retry logic. Get it right, and everything else falls into place. Get it wrong, and you either hammer the API with futile retries or fail to retry transient errors that would succeed.
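A classifier along these lines might look like the sketch below. The category names ('transient', 'auth', 'permanent') and the exact status-to-category mapping are assumptions based on the failure types described on this page, not something the Raise spec defines; the second parameter is accepted only so the signature matches a caller that passes a parsed problem body.

```javascript
// Sketch only: category names and the mapping are assumptions, not Raise-defined.
function classifyError(status, problem = null) {
  // No HTTP status means the request never completed (network error, timeout).
  if (status === undefined || status === null) return 'transient';

  // Rate limiting and server-side failures usually recover on retry.
  if (status === 429 || status >= 500) return 'transient';

  // Revoked or expired credentials: retrying with the same token cannot succeed,
  // so the caller should pause the affected work and alert instead.
  if (status === 401) return 'auth';

  // Remaining 4xx responses (400 validation, 404 missing record, ...) will
  // produce the same result on every retry. `problem` is unused in this sketch
  // but could refine the decision, for example by inspecting problem.errors.
  return 'permanent';
}
```

Anything other than 'transient' should skip the retry loop entirely; the separate 'auth' category exists so credential failures can be routed to an alert rather than a dead-letter entry.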
For rate-limit responses, the server may include a Retry-After header indicating how long to wait. Honor it instead of computing a backoff:
JavaScript
// classifyError, parseProblem, makeError, and sleep are helpers defined elsewhere in the integration.
async function callWithRetry(url, options, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetch(url, options);
    if (response.ok) return response;

    const classification = classifyError(response.status, await parseProblem(response));
    if (classification !== 'transient') {
      throw makeError(response, classification);
    }
    if (attempt === maxAttempts) {
      throw new Error(`Failed after ${maxAttempts} attempts`);
    }

    // Honor Retry-After if the server provided it as a number of seconds
    const retryAfter = response.headers.get('Retry-After');
    const retryAfterSeconds = retryAfter ? parseInt(retryAfter, 10) : NaN;
    let delayMs;
    if (Number.isFinite(retryAfterSeconds)) {
      delayMs = retryAfterSeconds * 1000;
    } else {
      // Header missing or not numeric (for example an HTTP-date): exponential backoff with jitter
      delayMs = Math.pow(2, attempt - 1) * 1000 + Math.random() * 1000;
    }
    await sleep(delayMs);
  }
}
The Raise OpenAPI spec doesn’t explicitly document the Retry-After header on rate-limit responses. The pattern assumes it’s present (following common HTTP convention) and falls back to exponential backoff if not. See Rate Limits for what’s known.
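The HTTP standard allows Retry-After to carry either a number of seconds or an HTTP-date, so a parser that accepts both forms is a little more defensive than a bare parseInt. The sketch below shows one way to do that; the function names retryAfterToMs and backoffMs are hypothetical, and whether Raise sends either form is not confirmed by the spec.

```javascript
// Sketch only: helper names are hypothetical. Returns null when the header is
// absent or unparseable so the caller can fall back to computed backoff.
function retryAfterToMs(headerValue, now = Date.now()) {
  if (headerValue === undefined || headerValue === null) return null;
  const value = String(headerValue).trim();
  if (value === '') return null;

  // delay-seconds form, e.g. "120"
  if (/^\d+$/.test(value)) return parseInt(value, 10) * 1000;

  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - now);
}

// Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to one base interval.
// `random` is injectable so the delay can be made deterministic in tests.
function backoffMs(attempt, baseMs = 1000, random = Math.random) {
  return Math.pow(2, attempt - 1) * baseMs + random() * baseMs;
}
```

A retry loop could then compute its delay as `retryAfterToMs(header) ?? backoffMs(attempt)`, which keeps the fallback explicit.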
A 5xx that persists for hours isn’t transient — it’s a sustained issue worth surfacing for human review. Bound retries at a reasonable number (typically 5 attempts) and move to a different handling strategy after that. See Dead-letter queues below.
A 400 typically means the request body has a validation issue — a missing required field, an invalid value, a malformed structure. Retrying with the same body produces the same 400. The fix is to correct the request:
JavaScript
if (response.status === 400) {
  const problem = await response.json();
  if (problem.errors) {
    // Per-field validation errors
    throw new ValidationError('Request validation failed', problem.errors);
  }
  // Other 400: payment failure, etc.
  throw new ClientError(problem.detail || problem.title);
}
For partner integrations submitting donations, a 400 from POST /api/Raise/give may also indicate a payment failure (card declined, gateway rejection). These should not be retried with the same payment method either; surface them to the donor so they can try a different card.
A 404 from GET /api/Donor/12345 means donor 12345 doesn’t exist (or was deleted). No retry will make it appear. The fix is to handle the absence gracefully:
JavaScript
if (response.status === 404) {
  return null; // Let the caller decide what to do
}
Sometimes a 404 is expected (the integration was checking for existence). Sometimes it indicates a deeper issue (the donor was deleted between the integration learning about them and the lookup). Don’t retry; the right path depends on context.
Donation submissions deserve special attention because they charge payment methods. A naive retry on a network error can produce double charges if the original request succeeded but the response didn’t reach the integration.
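One way to make that uncertainty visible is to record every attempt before sending it and to mark it explicitly when the outcome is unknown. The sketch below assumes a hypothetical attempt store with create, markSubmitted, markUncertain, and markFailed methods and an injected postGive function that wraps the call to POST /api/Raise/give; none of these names come from the Raise spec. Treating a 5xx response as uncertain is a deliberately conservative assumption.

```javascript
// Sketch only: the store methods and postGive are hypothetical helpers.
async function submitDonation(donation, trackingId, { store, postGive }) {
  // Persist the attempt first so a crash mid-request still leaves a trace.
  await store.create({
    trackingId,
    donorEmail: donation.donorEmail,
    amount: donation.amount,
    submittedAt: new Date().toISOString(),
  });

  let response;
  try {
    response = await postGive(donation);
  } catch (networkError) {
    // The request may or may not have reached Raise: do not retry here.
    await store.markUncertain(trackingId);
    return { status: 'uncertain', trackingId };
  }

  if (response.ok) {
    await store.markSubmitted(trackingId);
    return { status: 'submitted', trackingId };
  }
  if (response.status >= 500) {
    // Conservative assumption: the charge may have happened before the failure.
    await store.markUncertain(trackingId);
    return { status: 'uncertain', trackingId };
  }
  // A 4xx is a definitive rejection (validation error, declined card).
  await store.markFailed(trackingId, response.status);
  return { status: 'failed', trackingId };
}
```

Attempts left in the uncertain state are never resubmitted by this function; they wait for a separate reconciliation step to decide whether the charge happened.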
When a network error leaves the outcome uncertain, the integration shouldn’t auto-retry. Instead, surface the uncertain donation for reconciliation:
JavaScript
// EQUALS, GT_OPERATOR, and AND_CONJUNCT stand in for the operator and conjunct
// values defined by the Raise query API; substitute the documented values.
async function reconcileUncertainDonations() {
  const uncertain = await donationAttemptStore.findUncertain();
  for (const attempt of uncertain) {
    // Look for matching gifts in Raise from around the attempt time
    const candidates = await fetch('https://prod-api.raisedonors.com/api/Gift/query', {
      method: 'POST',
      headers: { /* ... */ },
      body: JSON.stringify({
        skip: 0,
        take: 10,
        groups: [
          {
            conditions: [
              { parameter: 'donorEmail', operator: EQUALS, value: attempt.donorEmail },
              { parameter: 'amount', operator: EQUALS, value: attempt.amount.toString() },
              { parameter: 'date', operator: GT_OPERATOR, value: attempt.submittedAt },
            ],
            conjunct: AND_CONJUNCT,
          },
        ],
      }),
    }).then((r) => r.json());

    if (candidates.items.length > 0) {
      // Match found: the donation did go through
      await donationAttemptStore.recordSuccess(attempt.trackingId, candidates.items[0].id);
    } else {
      // No match: the donation did not go through; safe to retry
      await donationAttemptStore.markRetryable(attempt.trackingId);
    }
  }
}
Run this on a short cadence (every few minutes) to resolve uncertain attempts. Only after confirming the original attempt didn’t go through is it safe to retry.
This is the recommended pattern only because the Raise spec doesn’t currently document an idempotency-key header that would solve the problem more elegantly. When such a header becomes available, use it instead; it’s a more robust solution than client-side reconciliation.
⚠️ Spec gap: No Idempotency-Key header is documented for POST /api/Raise/give. Confirm whether the platform supports one before relying on the client-side reconciliation pattern.
For workloads that touch many records, a sustained failure can produce a cascade — many in-flight requests all hitting the same issue, all retrying, all eventually failing. A circuit breaker stops the cascade by short-circuiting requests after a threshold of failures.
Use one circuit breaker per logical operation (per-endpoint, per-customer, or per-destination) so a failure in one doesn’t disrupt others:
JavaScript
const breakers = new Map();

function getBreaker(key) {
  if (!breakers.has(key)) {
    breakers.set(key, new CircuitBreaker({ failureThreshold: 10, resetTimeoutMs: 60000 }));
  }
  return breakers.get(key);
}

async function callRaise(url, options, breakerKey) {
  return getBreaker(breakerKey).call(() => callWithRetry(url, options));
}
When the breaker opens, in-flight requests fail fast rather than producing further retries. After the reset timeout, the breaker tentatively allows a few requests through (“half-open”). If they succeed, the breaker closes; if not, it stays open.
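The closed, open, and half-open states described above can be sketched as a small class. This is a minimal illustration, not a specific library's API: the constructor options mirror the failureThreshold and resetTimeoutMs names used on this page, the injectable `now` clock is an assumption added for testability, and the breaker counts consecutive failures rather than a failure rate.

```javascript
// Sketch only: a minimal consecutive-failure circuit breaker.
class CircuitBreaker {
  constructor({ failureThreshold = 10, resetTimeoutMs = 60000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Fail fast without touching the downstream service.
        throw new Error('Circuit open: failing fast');
      }
      // Reset timeout elapsed: let trial requests through.
      this.state = 'half-open';
    }
    try {
      const result = await fn();
      // Any success closes the breaker and clears the failure count.
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial request, or too many consecutive failures, (re)opens it.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

This version does not cap how many concurrent trial requests pass through while half-open; a production breaker would usually limit that to a small number.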
| Circuit breakers help with | Circuit breakers add little for |
| --- | --- |
| Operations that produce many concurrent requests (bulk syncs, parallel reads) | Single-request workflows (one-off API calls) |
| Operations that hit shared downstream resources | Operations against many independent endpoints |
| Operations that are expensive to retry (payment processing) | Operations that are cheap and idempotent |
For partner integrations operating at scale (hundreds of customers, thousands of requests per minute), circuit breakers prevent localized issues from cascading into widespread degradation.
When all retries fail, the operation can’t continue. Two options: drop it silently (bad — lost work) or move it to a dead-letter queue for human review (good).
CREATE TABLE dead_letter_queue (
  id BIGSERIAL PRIMARY KEY,
  customer_id TEXT NOT NULL,
  operation_type TEXT NOT NULL,
  operation_payload JSONB NOT NULL,
  last_error TEXT NOT NULL,
  last_error_status INTEGER,
  attempts INTEGER NOT NULL,
  first_attempted_at TIMESTAMPTZ NOT NULL,
  last_attempted_at TIMESTAMPTZ NOT NULL,
  resolved_at TIMESTAMPTZ,
  resolution TEXT
);
The flow:
1. Operation fails permanently or exhausts retries
   A 400 validation error, a sustained 5xx, or a network error that doesn’t recover.
2. Move the operation to the dead-letter queue
   Capture the full operation payload, the last error, and the attempt history.
3. Continue processing other operations
   One bad operation doesn’t block the queue.
4. Surface the dead-letter entry for review
   Alert or daily digest to ops; expose in a UI for support staff.
5. Investigate and resolve
   Either fix the underlying issue and replay, or mark the operation as permanently lost.
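The first three steps can be sketched as a processing loop that catches each failure, writes a dead-letter entry, and moves on. The names here are hypothetical: `perform` is assumed to run one operation with its own bounded retries and to throw an error carrying optional `status` and `attempts` properties, and `dlq.insert` is assumed to write a row shaped like the table above.

```javascript
// Sketch only: `perform`, `dlq.insert`, and the operation shape are assumptions.
async function processOperations(operations, { perform, dlq, now = () => new Date() }) {
  const summary = { succeeded: 0, deadLettered: 0 };
  for (const op of operations) {
    const firstAttemptedAt = now();
    try {
      await perform(op); // expected to classify and retry internally
      summary.succeeded += 1;
    } catch (err) {
      // Capture enough context for someone to investigate and replay later.
      await dlq.insert({
        customer_id: op.customerId,
        operation_type: op.type,
        operation_payload: op.payload,
        last_error: String(err.message || err),
        last_error_status: err.status ?? null,
        attempts: err.attempts ?? 1,
        first_attempted_at: firstAttemptedAt,
        last_attempted_at: now(),
      });
      summary.deadLettered += 1;
      // Fall through to the next operation: one failure does not block the rest.
    }
  }
  return summary;
}
```

The returned summary gives the caller a cheap signal for alerting, for example when the dead-lettered count for a run exceeds a threshold.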
For operations that failed due to a transient issue that’s now resolved, replay them:
JavaScript
async function replayDeadLetter(dlqId) {
  const entry = await dlq.findById(dlqId);
  try {
    await performOperation(entry.operation_payload);
    await dlq.markResolved(dlqId, 'replay_succeeded');
  } catch (err) {
    // Failed again: update the attempt count, leave in DLQ
    await dlq.recordAdditionalFailure(dlqId, err);
    throw err;
  }
}
A reasonable UI: an ops dashboard showing dead-letter entries with “replay” and “mark resolved” buttons. Most entries are resolved by replay once the underlying issue is fixed (credential renewed, downstream system back up, etc.).
The right thresholds depend on the integration’s SLA. For a major-donor-focused integration, even a single failed POST /api/Raise/give may warrant a same-day investigation. For a low-priority analytics sync, the same failure might be a warning aggregated into a daily digest.
Most error-recovery patterns rely on idempotency — the property that repeating an operation produces the same outcome as running it once. Build idempotency into operations from the start.
See Idempotency and Safe Reprocessing for the full pattern. Summary: every event has a unique key (typically contextId + eventType + modifiedDate), and the dedup store records processed events. Retried deliveries skip re-processing.
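As a rough illustration of that summary, the sketch below builds the dedup key from the three fields named above and skips events whose key has already been recorded. The function names and the dedup store interface (`has` and `add`, satisfied here by a plain Set) are assumptions; a real store would be persistent and shared across workers.

```javascript
// Sketch only: eventKey, handleOnce, and the store interface are hypothetical.
function eventKey(event) {
  return [event.contextId, event.eventType, event.modifiedDate].join(':');
}

// Returns true if the handler ran, false if the event was a duplicate.
async function handleOnce(event, handler, dedupStore) {
  const key = eventKey(event);
  if (await dedupStore.has(key)) return false;
  await handler(event);
  // Record only after the handler succeeds, so a failed attempt is retried.
  await dedupStore.add(key);
  return true;
}
```

Recording the key after the handler runs means a crash between the two steps can cause one re-run, so the handler's own side effects should still tolerate repetition.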
Donation submissions are the most challenging case because the operation is genuinely not idempotent at the API level (no documented idempotency-key header). The client-side reconciliation pattern (see The defensive pattern above) is the workaround. When an idempotency-key header becomes available, switch to it.
When designing a Raise integration, walk through this checklist:
Every API call goes through a function that classifies and retries appropriately
POST /api/Raise/give uses the client-side reconciliation pattern, not naive retry
Webhook handlers are idempotent — retries don’t produce duplicate side effects
Bulk operations use circuit breakers to prevent cascade failures
Permanently-failed operations go to a dead-letter queue
Dead-letter entries are surfaced to ops with enough context to investigate
401 responses pause the affected customer’s work and alert ops, rather than retrying
Network errors and 5xx responses retry with exponential backoff + jitter
Retry-After headers are honored when present
Rate-limit 429 responses are visible in metrics
Most of these are small individually. Together, they make the difference between an integration that requires constant manual intervention and one that recovers from most failures on its own.