A complete recipe for batched, scheduled sync between an external platform and Virtuous CRM+ — when nightly is the right choice, how to structure the job, and how to recover from interruption.
Event-driven sync (the architecture in Sync External Donations into Virtuous and most integration recipes) is the default recommendation for most partner integrations. But event-driven sync depends on the source platform supporting webhooks, the customer’s environment supporting persistent webhook receivers, and the data freshness requirement justifying always-on infrastructure. When any of those conditions doesn’t hold, nightly sync, a scheduled batch job that pulls changes from the source platform and pushes them to Virtuous, is the right architecture.
This recipe covers the full nightly sync pattern: when to choose it, how to structure the job, how to handle interruption and retry, and how to monitor it.
Some customers can’t run always-on infrastructure; a scheduled job is operationally simpler.
Data freshness requirement is tolerant
If “yesterday’s data, today” is acceptable (typical for reporting, accounting reconciliation, donor analytics), the latency of a nightly run is fine.
Source platform’s API quota is more constrained than Virtuous’s
Polling once nightly consumes less source-side quota than continuous polling or webhook ingestion.
Customer’s operations are already batch-oriented
Monthly accounting close, weekly BI reports — nightly sync aligns naturally with these.
Most importantly: nightly sync is not a worse architecture than event-driven. It’s a different tradeoff. The right pattern is the one that matches the customer’s actual operational needs.
Do not choose nightly sync because event-driven seems hard. Event-driven is the right choice for most integrations because the freshness benefit is substantial for the customer’s day-to-day operations. Choose nightly only when one of the signals above genuinely applies.
Job scheduler — cron, Kubernetes CronJob, AWS EventBridge, or whatever scheduling primitive your environment provides.
Sync job — a single binary or script that performs the full sync.
Source platform read — pull changes since the last checkpoint.
Virtuous write — apply changes via the appropriate write endpoints.
State store — persistent storage for the checkpoint timestamp and any per-record sync metadata.
A nightly sync is simpler than event-driven because it runs in one place at one time. It’s also more demanding because that one run needs to handle the full hour-long (or longer) window in which something might go wrong.
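The components above can be wired together roughly as follows. This is a minimal sketch, not a prescribed implementation: `runNightlySync` and the four injected functions (`loadCheckpoint`, `readSourceChanges`, `applyChanges`, `advanceCheckpoint`) are hypothetical names chosen for illustration, and the scheduler is assumed to invoke the function once per customer.

```javascript
// Hypothetical orchestration of one nightly run. Every dependency is injected,
// so nothing here assumes a particular scheduler, database, or HTTP client.
async function runNightlySync(customerId, deps) {
  const { loadCheckpoint, readSourceChanges, applyChanges, advanceCheckpoint } = deps;

  // 1. State store: find where the last successful run stopped.
  const checkpoint = await loadCheckpoint(customerId);

  // 2. Source platform read: pull everything modified after the checkpoint.
  const changes = await readSourceChanges(customerId, checkpoint);

  // 3. Virtuous write: apply the changes through the write endpoints.
  const results = await applyChanges(customerId, changes);

  // 4. State store again: commit progress only after the writes were attempted.
  await advanceCheckpoint(customerId, changes);

  return { processed: changes.length, ...results };
}
```

Keeping the four steps behind injected functions also makes the job easy to exercise in a test with in-memory fakes before pointing it at real credentials.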
The checkpoint is the heart of incremental nightly sync. It tracks where the last successful run stopped so the current run knows where to start.
CREATE TABLE nightly_sync_checkpoints (
  customer_id             TEXT PRIMARY KEY,
  last_run_id             TEXT,
  last_run_completed_at   TIMESTAMPTZ,
  last_source_timestamp   TIMESTAMPTZ,  -- the highest source-side timestamp processed
  last_virtuous_timestamp TIMESTAMPTZ,  -- if doing bidirectional, the highest Virtuous timestamp
  consecutive_failures    INTEGER NOT NULL DEFAULT 0,
  paused_until            TIMESTAMPTZ   -- circuit-breaker: skip runs while paused
);
Two things the checkpoint stores:
The highest source timestamp processed. Use this as the floor for the next run’s source query — pull everything modified after this value.
Failure metadata. A consecutive_failures counter and a paused_until field implement a simple circuit breaker: after N failed runs, pause the schedule until manual intervention.
The checkpoint should track the highest source-side modification timestamp processed, not the wall-clock time when the last run started. The latter would miss any source-side changes that happened during the run itself. The former guarantees a record modified during the run will be picked up by the next run.
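To make the rule concrete, here is one way the next checkpoint value could be derived. It is a sketch under assumptions: the function name `nextSourceCheckpoint` and the `modifiedAt` field (an ISO 8601 string on each processed record) are hypothetical, and a real source platform may name or format its modification timestamp differently.

```javascript
// Returns the checkpoint value to store after a run: the highest source-side
// modification time seen, never the wall-clock time of the run.
// Assumes each record carries a hypothetical `modifiedAt` ISO 8601 string.
function nextSourceCheckpoint(previousCheckpoint, processedRecords) {
  let highest = previousCheckpoint ? Date.parse(previousCheckpoint) : -Infinity;

  for (const record of processedRecords) {
    const modified = Date.parse(record.modifiedAt);
    if (Number.isNaN(modified)) {
      throw new Error(`Unparseable modifiedAt: ${record.modifiedAt}`);
    }
    if (modified > highest) highest = modified;
  }

  // No previous checkpoint and no records: there is nothing to store yet.
  return Number.isFinite(highest) ? new Date(highest).toISOString() : null;
}
```

Because the value never moves backwards and never jumps ahead of the data actually read, an empty run leaves the checkpoint where it was.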
The source platform’s pagination shape varies — some use cursors, some use page numbers, some use since + until ranges. Adapt the loop to the source’s API.
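As an illustration of one such loop, the sketch below pages through a cursor-style endpoint. The `fetchPage` function is a hypothetical stand-in for the source platform's list call, assumed to accept `{ modifiedAfter, cursor }` and resolve to `{ records, nextCursor }`; page-number or since/until APIs would need a different loop body.

```javascript
// Hypothetical cursor-style read loop. `fetchPage` stands in for the source
// platform's list endpoint and is assumed to resolve to { records, nextCursor }.
async function readChangesSince(fetchPage, checkpoint, maxPages = 1000) {
  const changes = [];
  let cursor = null;

  for (let page = 0; page < maxPages; page++) {
    const { records, nextCursor } = await fetchPage({ modifiedAfter: checkpoint, cursor });
    changes.push(...records);

    if (!nextCursor) return changes; // last page reached
    cursor = nextCursor;
  }

  // Guard against an API that keeps handing back a cursor forever.
  throw new Error(`Stopped after ${maxPages} pages without reaching the end`);
}
```

The page cap is a defensive choice rather than a requirement: without it, a misbehaving cursor would keep the job running until the scheduler kills it.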
The source-read pattern depends on the source platform’s API. Three common shapes:
Source API style | Pattern
Modification-timestamp filter | Query for records with modified_after > checkpoint. The most common case.
Cursor-based “changes since” | Pass a cursor from the last run; the API returns everything new.
Full snapshot + diff | Pull every record, diff against your local copy to find changes. Used when the source doesn’t expose modification timestamps.
For most modern APIs, the modification-timestamp filter is what’s available. The cursor pattern is more efficient when the API supports it. Full snapshot is the fallback when neither is available — it’s expensive but works.
async function readChangesViaFullSnapshot(customerId) {
  const sourceToken = await loadSourceToken(customerId);
  const currentSnapshot = await fetchFullSourceSnapshot(sourceToken);
  const previousSnapshot = await loadPreviousSnapshot(customerId);

  const changes = [];
  const previousById = new Map(previousSnapshot.map((r) => [r.id, r]));

  for (const current of currentSnapshot) {
    const previous = previousById.get(current.id);
    if (!previous || diffRecord(current, previous)) {
      changes.push({ type: previous ? 'update' : 'create', record: current });
    }
    previousById.delete(current.id);
  }

  // Anything left in previousById is a deletion
  for (const deleted of previousById.values()) {
    changes.push({ type: 'delete', record: deleted });
  }

  // Save the current snapshot for next run
  await persistSnapshot(customerId, currentSnapshot);

  return changes;
}
The cost is the storage of the previous snapshot and the comparison time. For sources with tens of thousands of records, this is acceptable nightly; for millions, it’s not.
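The snapshot function above calls `diffRecord` without showing it. One possible shape is sketched here, assuming flat records whose values are primitives; the `ignoredFields` parameter is an illustrative addition for volatile fields that should not count as changes, not something the source platform defines.

```javascript
// Hypothetical diffRecord: returns true when any compared field differs.
// Assumes flat records with primitive values; nested objects would need a
// deep comparison or a stable serialization instead of `!==`.
function diffRecord(current, previous, ignoredFields = []) {
  const ignored = new Set(ignoredFields);
  const fields = new Set([...Object.keys(current), ...Object.keys(previous)]);

  for (const field of fields) {
    if (ignored.has(field)) continue;
    if (current[field] !== previous[field]) return true;
  }
  return false;
}
```

Comparing the union of both key sets means a field that was added or removed on the source side is also reported as a change.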
The Virtuous rate limit is 1,500 requests per hour per organization — see Rate Limits. For a nightly sync with thousands of changes, throttle the submission rate:
JavaScript
// Small helper used for pacing
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function applyChangesWithThrottling(virtuousToken, changes) {
  const REQUESTS_PER_HOUR = 1200; // Conservative: leave 20% headroom
  const MS_BETWEEN_REQUESTS = (60 * 60 * 1000) / REQUESTS_PER_HOUR;

  let lastRequestAt = 0;
  const results = { successes: 0, failures: 0, failureDetails: [] };

  for (const change of changes) {
    // Pace the requests
    const elapsed = Date.now() - lastRequestAt;
    if (elapsed < MS_BETWEEN_REQUESTS) {
      await sleep(MS_BETWEEN_REQUESTS - elapsed);
    }
    lastRequestAt = Date.now();

    try {
      await applyChange(virtuousToken, change);
      results.successes++;
    } catch (err) {
      if (isRetryable(err)) {
        // Re-queue for the next run
        await persistDeferredChange(change, err);
      } else {
        results.failures++;
        results.failureDetails.push({ change, error: err.message });
      }
    }
  }

  return results;
}
At 1,200 requests/hour (20% headroom), a job processes 1,200 changes per hour. A 10,000-change run takes roughly 8.3 hours, which typically fits in the overnight window.
For larger workloads, raise the throttle closer to the limit (1,400/hour leaves 7% headroom). Don’t run at the cap: a single rate-limited request stops the run mid-stream until the limit resets.
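The arithmetic behind those figures can be checked with a few lines of code. The helper names below are hypothetical, and the sketch assumes one Virtuous request per change, which will not hold if a change needs a lookup before the write.

```javascript
const VIRTUOUS_LIMIT_PER_HOUR = 1500; // the hourly limit stated in this recipe

// Hours needed to apply `changeCount` changes at a given throttle,
// assuming exactly one request per change.
function estimateRunHours(changeCount, requestsPerHour) {
  return changeCount / requestsPerHour;
}

// Percentage of the hourly limit left unused at a given throttle.
function headroomPercent(requestsPerHour, limitPerHour = VIRTUOUS_LIMIT_PER_HOUR) {
  return (100 * (limitPerHour - requestsPerHour)) / limitPerHour;
}

// Milliseconds to wait between requests to hold a given throttle.
function pacingIntervalMs(requestsPerHour) {
  return (60 * 60 * 1000) / requestsPerHour;
}

console.log(estimateRunHours(10000, 1200).toFixed(2)); // about 8.33 hours
console.log(headroomPercent(1400).toFixed(1));         // about 6.7 percent
console.log(pacingIntervalMs(1200));                   // 3000 ms between requests
```

Running the same estimate against a customer's typical nightly change count is a quick way to see whether the job fits the overnight window before it is deployed.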
Retryable (5xx, 429, network error): re-queue for the next nightly run.
Permanent (400, 422): log and surface for human investigation. Do not retry on the next run.
The difference from event-driven sync is the retry cadence — nightly retries are 24 hours apart, not minutes apart. For genuinely transient failures this is usually fine; for failures that look transient but are actually permanent (a misconfigured field that produces 422 every time), the slower cadence makes the misdiagnosis cheaper.
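The two categories can be expressed as a small classifier. This sketch assumes the HTTP client attaches the response status to the thrown error as `err.status` and leaves it undefined for network-level failures; that convention, and the decision to treat every other 4xx status as permanent, are assumptions to adjust for your client and endpoints.

```javascript
// Hypothetical error classifier for the retry rules above.
// Assumes `err.status` holds the HTTP status, or is undefined when no
// response was received (DNS failure, timeout, connection reset).
function classifyFailure(err) {
  const status = err.status;
  if (status === undefined) return 'retryable';            // network error
  if (status === 429 || status >= 500) return 'retryable'; // rate limit or server error
  return 'permanent';                                       // 400, 422, and (by assumption) other 4xx
}

function isRetryable(err) {
  return classifyFailure(err) === 'retryable';
}
```

Keeping the classification in one function makes it easier to reclassify a status later, for example after discovering that a recurring 422 was misread as transient.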
The checkpoint update commits the run’s progress. If anything fails after the writes succeed but before the checkpoint is updated, the next run will re-process the same changes — your idempotency layer needs to handle this (see Idempotency and Safe Reprocessing).
Nightly jobs are vulnerable to interruption: the scheduler kills the job after a timeout, the host machine restarts, the network drops mid-run. Make the job resumable.
The pattern: persist progress within the job, not just at the end:
JavaScript
async function applyChangesResumable(virtuousToken, changes, runId) {
  // Check if a previous attempt for this run exists
  const prevProgress = await db.partial_run_progress.find({ run_id: runId });
  // Resume at the record after the last one known to be complete
  const startIndex = prevProgress ? prevProgress.last_completed_index + 1 : 0;

  for (let i = startIndex; i < changes.length; i++) {
    await applyChange(virtuousToken, changes[i]);

    // Persist progress every N records
    if (i % 100 === 0) {
      await db.partial_run_progress.upsert({
        run_id: runId,
        last_completed_index: i,
        last_persisted_at: new Date(),
      });
    }
  }

  // Clean up partial-progress record after successful completion
  await db.partial_run_progress.delete({ run_id: runId });
}
A killed job restarts and resumes from the last persisted index rather than starting over.
For very long-running jobs, persist progress more frequently. The tradeoff: more frequent persistence means lower replay cost after interruption but higher steady-state I/O. Every 100 records (or every 30 seconds) is a reasonable default.
After three consecutive failures, the sync pauses for 24 hours. An ops human must investigate, fix the root cause, and manually clear paused_until to resume. This prevents “sync has been failing for three weeks but nobody noticed” scenarios.
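A minimal sketch of that circuit breaker, written as pure functions over a checkpoint row, might look like the following. The field names `consecutive_failures` and `paused_until` come from the checkpoint table; the function names and the in-memory row shape are hypothetical, and persisting the returned row is left to the caller.

```javascript
const MAX_CONSECUTIVE_FAILURES = 3;
const PAUSE_MS = 24 * 60 * 60 * 1000; // 24 hours

// True while the breaker is open and the scheduled run should be skipped.
function shouldSkipRun(checkpoint, now = new Date()) {
  return checkpoint.paused_until != null && new Date(checkpoint.paused_until) > now;
}

// Returns the updated checkpoint row after a run finishes.
function recordRunOutcome(checkpoint, succeeded, now = new Date()) {
  if (succeeded) {
    return { ...checkpoint, consecutive_failures: 0, paused_until: null };
  }
  const failures = checkpoint.consecutive_failures + 1;
  const tripped = failures >= MAX_CONSECUTIVE_FAILURES;
  return {
    ...checkpoint,
    consecutive_failures: failures,
    paused_until: tripped ? new Date(now.getTime() + PAUSE_MS) : checkpoint.paused_until,
  };
}
```

In this sketch the failure counter is only reset by a successful run, so a job that resumes after the pause and fails again is paused again immediately; an operator clearing `paused_until` by hand would normally fix the root cause first.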
For partner integrations serving many customers, run a separate scheduled job per customer. The patterns to follow:
Stagger start times. Don’t run all customers’ syncs at midnight; spread them across the overnight window. This isolates rate-limit budgets and keeps any single Virtuous account from being hammered by your infrastructure.
Per-customer state. The checkpoint, credentials, and run report are scoped by customer_id.
Per-customer credentials. Each customer has their own Virtuous API token and their own source-platform credentials, loaded from secrets manager.
Per-customer alerts. A failure in one customer’s sync should alert ops about that customer specifically, not as part of a generic “sync failed” notification.
A typical setup: a cron expression that fires once per hour, each invocation processing the customers whose scheduled time slot has arrived. This naturally staggers the load.
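One way to assign those time slots is to derive each customer's hour from a stable hash of its ID, so no slot table has to be maintained. The sketch below is illustrative only: the window (22:00 through 05:00), the function names, and the FNV-1a style hash over UTF-16 code units are all assumptions rather than anything the platform prescribes.

```javascript
// FNV-1a style 32-bit hash over the string's UTF-16 code units.
function hashString(value) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

const WINDOW_START_HOUR = 22; // assumed start of the overnight window
const WINDOW_HOURS = 8;       // assumed window length: 22:00 through 05:59

// The hour of day (0-23) at which this customer's sync is scheduled.
function scheduledHourFor(customerId) {
  return (WINDOW_START_HOUR + (hashString(customerId) % WINDOW_HOURS)) % 24;
}

// Called by the hourly cron invocation: which customers run now?
function customersDueAt(customerIds, currentHour) {
  return customerIds.filter((id) => scheduledHourFor(id) === currentHour);
}
```

Hash-based assignment spreads customers roughly evenly but not perfectly; if a few very large customers land in the same hour, an explicit per-customer override is a reasonable addition.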
Nightly: marketing platform subscriber sync (no webhooks), data warehouse export, accounting reconciliation.
The two pipelines are independent — different schedules, different code paths, different alerting. Just make sure they share idempotency keys for any resource they both touch, so a nightly run that overlaps with an event-driven write doesn’t produce duplicates.
Most nightly sync issues show up first as a duration regression — the job is doing more work than expected and starts spilling out of its window. The second-most-common issue is checkpoint staleness, which a simple “has the sync run successfully in the last 24 hours?” check catches quickly.
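Both checks can be sketched as a single function over recent run history. The run record shape (`completedAt`, `durationMs`, `succeeded`, ordered oldest first), the alert names, and the 1.5x-of-median regression threshold are hypothetical choices for illustration; only the 24-hour staleness window comes from the text above.

```javascript
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// `runs` is assumed to be ordered oldest first.
function checkSyncHealth(runs, now = new Date(), options = {}) {
  const { maxAgeHours = 24, regressionFactor = 1.5 } = options;
  const alerts = [];

  const successes = runs.filter((run) => run.succeeded);
  const latest = successes[successes.length - 1];

  // Staleness: has the sync run successfully within the allowed window?
  const ageMs = latest ? now - new Date(latest.completedAt) : Infinity;
  if (ageMs > maxAgeHours * 60 * 60 * 1000) alerts.push('stale-checkpoint');

  // Duration regression: compare the latest run with the median of earlier ones.
  const baseline = successes.slice(0, -1).map((run) => run.durationMs);
  if (latest && baseline.length >= 3 && latest.durationMs > regressionFactor * median(baseline)) {
    alerts.push('duration-regression');
  }

  return alerts;
}
```

Using the median of earlier runs rather than the mean keeps one unusually long night from masking or triggering a regression alert on its own.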