The three-layer sync model
Most production Volunteer integrations have three distinct sync layers, running at different cadences and serving different purposes:

| Layer | Cadence | Purpose | Cost |
|---|---|---|---|
| Initial sync | Once per customer at onboarding | Full backfill of historical data | High but bounded |
| Steady-state polling | Every 15-30 minutes per resource | Incremental change detection | Low per cycle, high cumulative |
| Reconciliation | Daily + weekly | Catch what polling missed; detect deletions | Medium; bounded |
Why three layers, not one
It’s tempting to “just poll” and rely on polling to catch everything. That doesn’t work, for three reasons:

- Polling can miss deletions. Deleted records don’t appear in `updated_after` queries.
- Polling can miss failed processing. A record was returned and the checkpoint advanced, but processing failed silently.
- Polling can’t recover from a lost checkpoint. If the checkpoint is corrupted, you don’t know where to restart.
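
The deletion gap is the easiest of the three to illustrate. A minimal sketch of what a reconciliation pass might do, assuming the integration can obtain a complete list of source IDs for a resource (how that list is fetched is not covered here, and `findDeletedIds` is a hypothetical helper name):

```javascript
// Hypothetical helper. `sourceIds` is the complete ID list fetched from the
// source during a reconciliation pass; `localIds` is what the integration holds.
function findDeletedIds(sourceIds, localIds) {
  const live = new Set(sourceIds);
  // IDs we hold that the source no longer returns are deletion candidates.
  return localIds.filter((id) => !live.has(id));
}

// Example: the source no longer returns "b" and "d".
console.log(findDeletedIds(["a", "c"], ["a", "b", "c", "d"])); // [ 'b', 'd' ]
```

Treating the result as candidates rather than confirmed deletions leaves room to double-check before removing anything downstream.
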
Pattern 1: pull architecture (the Volunteer default)
Volunteer has no webhooks. Every integration is pull-based, so partner integrations query the API on a schedule. This shapes everything else:

| Pull pros | Pull cons |
|---|---|
| Simpler infrastructure (no public-facing endpoints) | Higher latency (depends on poll cadence) |
| Reliable through outages (next poll catches up) | Higher continuous API cost (every cycle = requests) |
| Easy to test (synchronous request/response) | Doesn’t naturally detect deletions |
| Failure handling localized to the polling worker | Hard to support sub-minute freshness |
When pull is the wrong choice
Pull architecture works well when:

- The data doesn’t change second-to-second
- Sub-minute freshness is a nice-to-have, not a hard requirement
- The integration can tolerate brief data lag

Pull is the wrong choice when:

- Real-time signal is essential (for example, live check-in dashboards)
- The data volume is too high for full polling at the needed cadence
Pattern 2: hub-and-spoke architecture
For partner integrations serving many customers, the typical architecture is hub-and-spoke. The hub holds:

- Per-customer tokens (with proper isolation)
- Shared infrastructure (workers, queues, DBs)
- Per-customer state (checkpoints, mappings, DLQs)
- Per-customer cadence configuration
Why hub-and-spoke beats per-customer-deployment
Some partners deploy a separate instance per customer. This is conceptually simpler but operationally heavier:

| Concern | Per-customer deployment | Hub-and-spoke |
|---|---|---|
| Onboarding cost | High (new infrastructure per customer) | Low (just configuration) |
| Operational burden | High (N deployments to monitor) | Low (one system to monitor) |
| Cost efficiency | Poor (each customer pays for unused capacity) | Good (shared capacity) |
| Customer isolation | Excellent (physical separation) | Requires careful design (composite keys) |
| Onboarding speed | Days to weeks | Minutes to hours |
Shared vs per-customer infrastructure
Within the hub, decide what’s shared and what’s per-customer:

| Component | Typical sharing |
|---|---|
| Compute (workers) | Shared with per-customer queues |
| Database | Shared with customer_id in every key |
| Caches | Shared with customer-prefixed keys |
| Credentials store | Per-customer encryption keys; shared service |
| Logging / observability | Shared with customer_id tag on every event |
| Rate-limit budgets | Per-customer (so one customer can’t exhaust the shared rate) |
| Background job queues | Shared workers, per-customer queues |
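
For the shared database and cache rows above, one way to keep tenants apart is to build every key through a single helper that refuses to work without a customer ID. This is a sketch only; `tenantKey` and the key layout are assumptions, not a prescribed format:

```javascript
// Hypothetical helper: every cache or state key carries the customer ID first.
function tenantKey(customerId, resource, recordId) {
  if (!customerId) throw new Error("customerId is required for every key");
  // Encode each part so a ":" inside an ID cannot collide with the separator.
  return [customerId, resource, recordId]
    .map((part) => encodeURIComponent(String(part)))
    .join(":");
}

console.log(tenantKey("cust_1", "volunteers", 42)); // cust_1:volunteers:42
```

Routing all key construction through one function makes a missing customer ID fail loudly instead of silently mixing tenants.
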
Pattern 3: per-resource worker decomposition
Within the polling layer, there are two main organizational shapes.
Shape A: monolithic poll cycle
One worker handles all resources for a customer in sequence.
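
A minimal sketch of this shape follows. The resource names and the injected `pollResource` function are assumptions for illustration; the real fetch-and-process step would sit behind that function:

```javascript
// Hypothetical resource names; replace with the resources you actually sync.
const RESOURCES = ["volunteers", "events", "form_completions"];

// Poll every resource for one customer, one after another.
// `pollResource(customerId, resource)` is injected and returns a record count.
async function runPollCycle(customerId, pollResource, resources = RESOURCES) {
  const results = [];
  for (const resource of resources) {
    try {
      const count = await pollResource(customerId, resource);
      results.push({ resource, ok: true, count });
    } catch (err) {
      // Record the failure and move on. A stricter design might stop here
      // if later resources depend on earlier ones.
      results.push({ resource, ok: false, error: err.message });
    }
  }
  return results;
}
```

Because the loop is sequential, one slow resource delays every resource after it, which is the main pressure toward Shape B.
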
Shape B: per-resource workers
Each resource has its own worker with its own cadence.

Pros: Per-resource cadence tuning; isolated failures; scales independently.

Cons: More schedules to manage; more state per customer.

Choosing between them
For most partner integrations, start with Shape A (monolithic) and decompose to Shape B when you have evidence that one resource needs different treatment. Premature decomposition adds complexity for hypothetical needs. Signals to decompose:

- One resource’s poll takes much longer than others
- One resource genuinely needs a different cadence (e.g., Form Completions every 5 min, everything else hourly)
- One resource’s failures shouldn’t block the rest
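
Once a resource needs its own cadence, the scheduling decision can stay small. The sketch below assumes a per-resource cadence table (the values mirror the example in the list above and are placeholders) and a `lastPolledAt` map of epoch-millisecond timestamps maintained by the caller:

```javascript
// Hypothetical cadences in minutes; anything not listed uses `default`.
const CADENCE_MINUTES = { form_completions: 5, default: 60 };

// Return the resources whose cadence has elapsed since their last poll.
function dueResources(lastPolledAt, nowMs, resources) {
  return resources.filter((resource) => {
    const minutes = CADENCE_MINUTES[resource] ?? CADENCE_MINUTES.default;
    const last = lastPolledAt[resource] ?? 0; // never polled: due immediately
    return nowMs - last >= minutes * 60_000;
  });
}
```

Each per-resource worker can then run on a short tick and poll only what `dueResources` returns.
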
Pattern 4: queue-based decoupling
For higher-scale or higher-reliability integrations, decouple polling from processing via a queue. The polling worker’s only job is to detect changes and publish them to the queue. The processing workers consume the queue and do the actual destination writes.
Why decouple
| Concern | Without decoupling | With queue |
|---|---|---|
| Slow destination | Blocks polling | Doesn’t block (queue absorbs) |
| Destination outage | Polling fails or accumulates lag | Polling continues; queue grows |
| Backpressure | Hard to throttle destination writes | Queue rate-limits naturally |
| Retry isolation | Tied to polling cycle | Independent retry topology |
| Horizontal scale | Hard (sequential polling) | Easy (more consumer workers) |
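
The poller/consumer split can be sketched with an in-memory queue standing in for a real broker. Everything here is an assumption for illustration: `ChangeQueue`, `pollOnce`, and `consumeBatch` are hypothetical names, and `fetchChanges` and `writeToDestination` are injected by the caller:

```javascript
// In-memory stand-in for a durable queue; for illustration only.
class ChangeQueue {
  constructor() { this.items = []; }
  publish(change) { this.items.push(change); }
  take(max) { return this.items.splice(0, max); }
  get depth() { return this.items.length; }
}

// Poller: detects changes and publishes them. It never touches the destination.
async function pollOnce(fetchChanges, queue) {
  const changes = await fetchChanges();
  for (const change of changes) queue.publish(change);
  return changes.length;
}

// Consumer: drains one batch and writes each change to the destination.
async function consumeBatch(queue, writeToDestination, batchSize = 10) {
  let written = 0;
  for (const change of queue.take(batchSize)) {
    try {
      await writeToDestination(change);
      written += 1;
    } catch {
      // Naive retry: put the change back. A real system would cap attempts
      // and move repeated failures to a DLQ.
      queue.publish(change);
    }
  }
  return written;
}
```

When the destination is down, `pollOnce` keeps succeeding and `queue.depth` grows, which matches the "queue absorbs" behavior in the table above.
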
When the queue is worth it
| Customer scale | Queue value |
|---|---|
| Few customers, small data | Not worth it — monolith is fine |
| Many customers, modest data | Helpful for resilience |
| Few customers, large data | Helpful for backpressure |
| Many customers, large data | Essentially required |
Pattern 5: bidirectional sync (when needed)
Most Volunteer integrations are one-way (VOMO → external). Some need bidirectional sync, where the external system pushes changes back to VOMO.
Bidirectional brings complications
| Concern | Why it matters |
|---|---|
| Loop detection | A change pushed in one direction may trigger a re-push in the other (infinite loop) |
| Conflict resolution | Both sides change the same field — which wins? |
| Email-as-key fragility | An email change in either direction breaks the join (see the email-change problem) |
| Authority modeling | Each field has a “source of truth” — must be explicit |
Loop detection pattern
Track the direction of recent changes and skip writes that would re-trigger a change you just processed.
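
One possible shape for this, sketched under assumptions: the integration remembers what it recently wrote in each direction and treats an identical incoming change from that side as its own echo. `EchoGuard`, the direction labels, and the ten-minute window are all hypothetical:

```javascript
class EchoGuard {
  constructor(ttlMs = 10 * 60_000) {
    this.ttlMs = ttlMs;
    this.recent = new Map(); // fingerprint -> expiry timestamp (ms)
  }

  // Note: JSON.stringify is key-order sensitive, so payloads should be
  // normalized to a stable field order before being passed in.
  static key(direction, recordId, payload) {
    return `${direction}:${recordId}:${JSON.stringify(payload)}`;
  }

  // Call right after pushing a change in `direction` (e.g. "to_external").
  recordWrite(direction, recordId, payload, now = Date.now()) {
    this.recent.set(EchoGuard.key(direction, recordId, payload), now + this.ttlMs);
  }

  // Call when a change arrives from the side we wrote to.
  // Returns true when the change matches something we wrote recently.
  isEcho(writtenDirection, recordId, payload, now = Date.now()) {
    const k = EchoGuard.key(writtenDirection, recordId, payload);
    const expiry = this.recent.get(k);
    if (expiry === undefined) return false;
    if (expiry < now) { this.recent.delete(k); return false; }
    return true;
  }
}
```

A change that matches only on record ID but carries different values is not an echo; it is a genuine edit and should flow through conflict resolution.
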
Authority configuration
Declare authority explicitly for each field.
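
A sketch of what such a declaration might look like, with a filter that drops changes from the non-authoritative side. The field names and owner labels are invented for illustration:

```javascript
// Hypothetical field names and owners.
const FIELD_AUTHORITY = {
  email: "vomo",
  phone: "vomo",
  donor_tier: "external",
};

// Keep only the fields that `source` is allowed to change.
// Fields missing from the authority map are rejected, so authority stays explicit.
function filterByAuthority(changes, source, authority = FIELD_AUTHORITY) {
  const accepted = {};
  const rejected = [];
  for (const [field, value] of Object.entries(changes)) {
    if (authority[field] === source) accepted[field] = value;
    else rejected.push(field);
  }
  return { accepted, rejected };
}
```

Logging the `rejected` list is a cheap way to spot fields that nobody has assigned an owner to yet.
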
Pattern 6: multi-tenant scheduling
For hub-and-spoke integrations, scheduling polling across many customers needs to avoid:

- All customers polling at the same minute (thundering herd against VOMO)
- All customers using rate budget on the same worker
- One customer’s heavy polling starving others
Staggered scheduling
Spread polls across the polling interval.
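
One common way to do this is to derive a stable offset from the customer ID, so each customer always polls at the same point within the interval. The sketch below uses a simple string hash; the function names are hypothetical and any stable hash would serve:

```javascript
// Deterministic offset: the same customer always lands on the same slot.
function pollOffsetMs(customerId, intervalMs) {
  let hash = 0;
  for (const ch of String(customerId)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash % intervalMs;
}

// Next time (epoch ms) this customer should be polled, strictly after `nowMs`.
function nextPollAt(customerId, intervalMs, nowMs) {
  const offset = pollOffsetMs(customerId, intervalMs);
  const windowStart = Math.floor(nowMs / intervalMs) * intervalMs;
  const candidate = windowStart + offset;
  return candidate > nowMs ? candidate : candidate + intervalMs;
}
```

Because the offset depends only on the ID and the interval, it survives restarts without any stored schedule.
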
Per-customer rate budget
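
A per-customer budget is often implemented as a token bucket. The sketch below is one such implementation under assumptions: the capacity and refill numbers are placeholders rather than actual VOMO limits, and `RateBudget` and `budgetFor` are hypothetical names:

```javascript
class RateBudget {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.updatedAt = now;
  }

  // Returns true and spends tokens if the budget allows `count` requests now.
  tryConsume(count = 1, now = Date.now()) {
    const elapsedSec = Math.max(0, now - this.updatedAt) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.updatedAt = now;
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}

// One bucket per customer, so a busy customer only exhausts its own budget.
const budgets = new Map();
function budgetFor(customerId, capacity = 60, refillPerSecond = 1) {
  if (!budgets.has(customerId)) {
    budgets.set(customerId, new RateBudget(capacity, refillPerSecond));
  }
  return budgets.get(customerId);
}
```

A worker would call `budgetFor(customerId).tryConsume()` before each request and defer that customer's work when it returns false.
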
Priority lanes
Use separate lanes when premium-tier customers should be served ahead of standard-tier customers.
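
A sketch of one way to combine two lanes without starving the lower one: premium jobs go first, but a share of each scheduling tick is reserved for standard jobs. The function name and the 25% share are assumptions:

```javascript
// Pick up to `slots` jobs for this tick. Both input arrays are treated as
// FIFO queues and are mutated: picked jobs are removed from them.
function pickJobs(premium, standard, slots, standardShare = 0.25) {
  // Reserve some slots for standard jobs, but only if standard jobs exist.
  const reserved = Math.min(standard.length, Math.ceil(slots * standardShare));
  const fromPremium = premium.splice(0, slots - reserved);
  // Standard jobs fill the reserved slots plus anything premium left unused.
  const fromStandard = standard.splice(0, slots - fromPremium.length);
  return [...fromPremium, ...fromStandard];
}
```

With four slots, five premium jobs, and three standard jobs waiting, this picks three premium jobs and one standard job.
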
Pattern 7: regional and geographic considerations
For partner integrations serving customers in multiple regions:

| Concern | Pattern |
|---|---|
| Data residency requirements (EU, etc.) | Per-region deployments with isolated databases |
| Latency to VOMO API | Less of an issue (API is hosted; client location matters less) |
| Latency to destination | Co-locate workers with the destination when possible |
| Time-of-day scheduling | Schedule polling in customer’s local time (3 AM customer-local for reconciliation) |
| Disaster recovery | Multi-region active-passive or active-active |
Pattern 8: progressive delivery and feature flags
When changing a sync integration in production, put the change behind a feature flag and roll it out in stages:

- Internal test customer first
- One or two friendly customers (with notification)
- 10% of customers
- 50% of customers
- All customers
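
The staged rollout above can be driven by a flag that combines an explicit allowlist (for the internal and friendly customers) with a percentage (for the later stages). This is a sketch; the flag shape and function names are assumptions rather than any particular flag service's API:

```javascript
// Stable bucket in 0..99 for a (flag, customer) pair.
function bucketOf(customerId, flagName) {
  let hash = 0;
  for (const ch of `${flagName}:${customerId}`) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // unsigned 32-bit
  }
  return hash % 100;
}

// Hypothetical flag shape: { name, allowlist: string[], percent: 0..100 }.
function isEnabled(flag, customerId) {
  if (flag.allowlist.includes(customerId)) return true;
  return bucketOf(customerId, flag.name) < flag.percent;
}

// Example stage: internal test customer only.
const stageOne = { name: "new_poller", allowlist: ["internal_test"], percent: 0 };
console.log(isEnabled(stageOne, "internal_test")); // true
```

Because buckets are stable, raising `percent` from 10 to 50 only adds customers; nobody who already had the change loses it mid-rollout.
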
Why feature flags matter for sync
Sync changes are subtle. A “small change” to the polling logic might:

- Skip records that should be processed
- Re-process records that already were processed (wasted work)
- Advance the checkpoint incorrectly
- Trigger spurious side effects (welcome emails, notifications)
Pattern 9: separation of “infrastructure work” from “business work”
The polling worker shouldn’t know about your business logic; the business processor shouldn’t know about polling mechanics. Separate them cleanly.
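
A sketch of the boundary, under assumptions: the polling side receives its dependencies as injected functions, checkpoints are ISO 8601 UTC strings (so string comparison orders them correctly), and the record fields are invented for illustration:

```javascript
// Infrastructure: owns fetching, ordering, and checkpointing.
// It knows nothing about what a record means.
async function pollAndDispatch({ fetchSince, loadCheckpoint, saveCheckpoint, handleRecord }) {
  const checkpoint = await loadCheckpoint();
  const records = await fetchSince(checkpoint);
  let latest = checkpoint;
  for (const record of records) {
    await handleRecord(record); // business logic lives behind this call
    if (record.updatedAt > latest) latest = record.updatedAt;
  }
  // Advance only after every record was handled, so a thrown error
  // leaves the checkpoint in place and the next poll retries the window.
  if (latest !== checkpoint) await saveCheckpoint(latest);
  return records.length;
}

// Business: a pure mapping with no knowledge of polling or checkpoints.
// Field names here are hypothetical.
function toCrmContact(record) {
  return {
    externalId: record.id,
    fullName: `${record.firstName} ${record.lastName}`.trim(),
  };
}
```

The business function can be unit-tested with plain objects, and the polling function can be tested with stubs, without either test touching the other concern.
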
Pattern 10: operations playbook
Production integrations need a documented playbook for common operational scenarios. A starter checklist:

| Scenario | Playbook |
|---|---|
| Customer’s token expired | (1) Disable polling for that customer; (2) Email customer to refresh; (3) Resume on confirmation |
| One customer’s sync is stuck | (1) Check checkpoint freshness; (2) Check DLQ; (3) Check error logs for trace IDs; (4) If DLQ has failures, investigate; otherwise restart worker |
| Many customers’ polling slowed | (1) Check VOMO API status; (2) Check rate-limit metrics; (3) If sustained, throttle down system-wide |
| New customer onboarding | (1) Verify token; (2) Run estimate; (3) Schedule backfill for low-traffic time; (4) Monitor backfill; (5) Enable polling after completion |
| Customer offboarding | (1) Disable polling and reconciliation; (2) Stop scheduled jobs; (3) Delete credentials; (4) Schedule data retention/deletion per agreement |
| Suspected data quality issue | (1) Capture sample; (2) Compare VOMO vs partner state; (3) Compare partner state vs destination; (4) Run targeted reconciliation; (5) Investigate root cause |
| Suspected security incident | (1) Disable affected customer’s integration immediately; (2) Rotate credentials; (3) Audit recent activity; (4) Document and report per policy |
Architecture maturity model
Where does your integration sit?

| Level | Characteristics |
|---|---|
| 1: Working | Single polling cycle; basic error handling; per-customer state in memory |
| 2: Multi-tenant | Per-customer state in persistent store; staggered scheduling; shared workers |
| 3: Resilient | Queue-based decoupling; DLQ; circuit breakers; reconciliation; per-customer rate budgets |
| 4: Observable | Per-customer dashboards; structured logs with trace IDs; alerting on patterns |
| 5: Operationally mature | Feature-flagged rollouts; documented playbooks; on-call rotation; error budgets |
Decision framework
When designing a new sync integration, walk through these questions:

| Question | Implication |
|---|---|
| What’s the freshness requirement? | Determines polling cadence |
| How many customers will it serve? | Determines hub-vs-deployment |
| What’s the scale per customer? | Determines queue-vs-monolith |
| Is bidirectional sync needed? | Determines authority + loop detection |
| What’s the SLA? | Determines retry/recovery/playbook depth |
| What’s the data sensitivity? | Determines security and audit requirements |
| What’s the team’s operational maturity? | Determines how many patterns to adopt vs defer |
Where to go next
Security and Credential Management
The security patterns that protect this architecture.
Versioning and Backward Compatibility
The patterns for surviving API changes.
Data Modeling
The data model that supports this architecture.
Error Recovery Patterns
The error-handling patterns this architecture relies on.