Sync Architecture Patterns - Virtuous API Docs

By this point in the docs, you’ve seen the individual building blocks — polling, reconciliation, data modeling, error recovery, performance. This page is about how they fit together at the architecture level: the layered sync model, the push-vs-pull decision, the multi-tenant orchestration, and the choices that determine whether your integration scales to dozens of customers or hundreds. The patterns here are general — they apply to any read-heavy API integration — but the specific shapes are tuned for Volunteer’s quirks (no webhooks, small page size, no idempotency keys, the participation gap).

The three-layer sync model

Most production Volunteer integrations have three distinct sync layers, running at different cadences and serving different purposes:

Layer	Cadence	Purpose	Cost
Initial sync	Once per customer at onboarding	Full backfill of historical data	High but bounded
Steady-state polling	Every 15-30 minutes per resource	Incremental change detection	Low per cycle, high cumulative
Reconciliation	Daily + weekly	Catch what polling missed; detect deletions	Medium; bounded

Each layer has a different failure mode and recovery story. The initial sync can be restarted from scratch (it’s idempotent). Polling can lose ground (which reconciliation catches). Reconciliation can run late (no immediate user-visible impact).

Why three layers, not one

It’s tempting to “just poll” and rely on polling to catch everything. That doesn’t work, for three reasons:

Polling can miss deletions. Deleted records don’t appear in updated_after queries.
Polling can miss failed processing. A record was returned and the checkpoint advanced, but processing failed silently.
Polling can’t recover from a lost checkpoint. If the checkpoint is corrupted, you don’t know where to restart.

Reconciliation answers all three. Initial sync handles the cold-start case where there’s no checkpoint at all.
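As a minimal sketch of why a full comparison catches what incremental polling cannot, the helper below diffs the IDs a full listing returns against the IDs stored on the partner side. The function name and input shapes are assumptions for illustration, not part of the VOMO API.

```javascript
// Hypothetical reconciliation helper; name and input shapes are assumptions.
// remoteIds: every ID returned by a full listing of one resource.
// localIds: every ID currently stored on the partner side.
function diffForReconciliation(remoteIds, localIds) {
  const remote = new Set(remoteIds);
  const local = new Set(localIds);

  return {
    // Stored locally but gone remotely: deleted upstream, invisible to polling.
    deleted: [...local].filter((id) => !remote.has(id)),
    // Present remotely but never stored: returned by a poll, then lost in processing.
    missing: [...remote].filter((id) => !local.has(id)),
  };
}

const result = diffForReconciliation([1, 2, 4], [1, 2, 3]);
console.log(result); // { deleted: [ 3 ], missing: [ 4 ] }
```

Because the comparison never consults a checkpoint, it still works when the checkpoint is lost or corrupted.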

Pattern 1: pull architecture (the Volunteer default)

Volunteer has no webhooks. Every integration is pull-based — partner integrations query the API on a schedule. This shapes everything else:

Pull pros	Pull cons
Simpler infrastructure (no public-facing endpoints)	Higher latency (depends on poll cadence)
Reliable through outages (next poll catches up)	Higher continuous API cost (every cycle = requests)
Easy to test (synchronous request/response)	Doesn’t naturally detect deletions
Failure handling localized to the polling worker	Hard to support sub-minute freshness
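The latency and cost trade-offs in the table can be estimated with simple arithmetic. The sketch below is a back-of-envelope calculation; the function name and every input value are illustrative assumptions, not measured figures for any real integration.

```javascript
// Back-of-envelope polling cost. All inputs are illustrative assumptions.
function estimatePollingCost({ intervalMinutes, resources, pagesPerResource }) {
  const cyclesPerDay = (24 * 60) / intervalMinutes;
  const requestsPerCycle = resources * pagesPerResource;
  return {
    cyclesPerDay,
    requestsPerDay: cyclesPerDay * requestsPerCycle,
    // A change made just after a poll waits a full interval to be seen
    // (ignoring the time the poll itself takes).
    worstCaseLagMinutes: intervalMinutes,
  };
}

console.log(
  estimatePollingCost({ intervalMinutes: 15, resources: 5, pagesPerResource: 2 })
);
// { cyclesPerDay: 96, requestsPerDay: 960, worstCaseLagMinutes: 15 }
```

Halving the interval halves the worst-case lag but doubles the daily request count, which is the core tension of a pull design.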

When pull is the wrong choice

Pull architecture works well when:

The data doesn’t change second-to-second
Sub-minute freshness is a nice-to-have, not a hard requirement
The integration can tolerate brief data lag

It’s a poor fit when:

Real-time signal is essential (for example, live check-in dashboards)
The data volume is too high for full polling at the needed cadence

For most Volunteer integrations, pull is fine — Volunteer’s data doesn’t change at the rates that make pull problematic. Customers asking for “real-time” usually mean “fresh enough for my workflow,” which polling delivers.

Pattern 2: hub-and-spoke architecture

For partner integrations serving many customers, the typical architecture is hub-and-spoke. The hub holds:

Per-customer tokens (with proper isolation)
Shared infrastructure (workers, queues, DBs)
Per-customer state (checkpoints, mappings, DLQs)
Per-customer cadence configuration

The hub is the partner’s product; each customer’s integration is a per-customer-scoped slice of it.

Why hub-and-spoke beats per-customer-deployment

Some partners deploy a separate instance per customer. This is conceptually simpler but operationally heavier:

Concern	Per-customer deployment	Hub-and-spoke
Onboarding cost	High (new infrastructure per customer)	Low (just configuration)
Operational burden	High (N deployments to monitor)	Low (one system to monitor)
Cost efficiency	Poor (each customer pays for unused capacity)	Good (shared capacity)
Customer isolation	Excellent (physical separation)	Requires careful design (composite keys)
Onboarding speed	Days to weeks	Minutes to hours

For most B2B partner integrations, hub-and-spoke is the right shape. Per-customer-deployment is reserved for high-stakes integrations with strict isolation requirements (compliance, security).

Shared vs per-customer infrastructure

Within the hub, decide what’s shared and what’s per-customer:

Component	Typical sharing
Compute (workers)	Shared with per-customer queues
Database	Shared with `customer_id` in every key
Caches	Shared with customer-prefixed keys
Credentials store	Per-customer encryption keys; shared service
Logging / observability	Shared with `customer_id` tag on every event
Rate-limit budgets	Per-customer (so one customer can’t exhaust the shared rate)
Background job queues	Shared workers, per-customer queues
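One way to make "customer_id in every key" hard to forget is to route all key construction through a single helper. This is a sketch only; the helper name and the key layout are assumptions.

```javascript
// Hypothetical key builder; the key layout is an assumption for illustration.
function scopedKey(customerId, resource, recordId) {
  if (!customerId) {
    // Refuse to build an unscoped key: that is how cross-tenant leaks start.
    throw new Error('customerId is required for every key');
  }
  return `${customerId}:${resource}:${recordId}`;
}

// Same record ID, two customers, two distinct keys.
console.log(scopedKey('cust_a', 'users', 42)); // cust_a:users:42
console.log(scopedKey('cust_b', 'users', 42)); // cust_b:users:42
```

The same helper can back database composite keys, cache keys, and queue names, so isolation depends on one function rather than on every call site.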

Pattern 3: per-resource worker decomposition

Within the polling layer, two main organizational shapes:

Shape A: monolithic poll cycle

One worker handles all resources for a customer in sequence:

JavaScript

async function pollAllResources(customerId) {
  await pollUsers(customerId);
  await pollProjects(customerId);
  await pollGroups(customerId);
  await pollForms(customerId);
  await pollFormCompletions(customerId);
  // ... etc
}

Pros: Simple scheduling; easy to monitor as a single workflow; one failure mode.

Cons: All resources poll at the same cadence; one resource’s failure can block the rest.

Shape B: per-resource workers

Each resource has its own worker with its own cadence.

Pros: Per-resource cadence tuning; isolated failures; scales independently.

Cons: More schedules to manage; more state per customer.
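A minimal sketch of per-resource cadences follows. The cadence values, the resource names, and the `resourcesDue` helper are assumptions for illustration; a real deployment would likely register one schedule per resource with its job scheduler instead.

```javascript
// Hypothetical per-resource cadences in minutes; values are illustrative.
const RESOURCE_CADENCE_MINUTES = {
  users: 60,
  projects: 60,
  groups: 60,
  forms: 60,
  formCompletions: 5,
};

// Returns the resources whose last poll is at least one cadence old.
// lastPolledAt maps resource name -> epoch ms; a missing entry means "never polled".
function resourcesDue(lastPolledAt, nowMs) {
  return Object.entries(RESOURCE_CADENCE_MINUTES)
    .filter(([resource, minutes]) => {
      const last = lastPolledAt[resource] ?? 0;
      return nowMs - last >= minutes * 60 * 1000;
    })
    .map(([resource]) => resource);
}

const now = Date.UTC(2024, 0, 1, 12, 0, 0);
const tenMinutesAgo = now - 10 * 60 * 1000;
const lastPolledAt = {
  users: tenMinutesAgo,
  projects: tenMinutesAgo,
  groups: tenMinutesAgo,
  forms: tenMinutesAgo,
  formCompletions: tenMinutesAgo,
};
console.log(resourcesDue(lastPolledAt, now)); // [ 'formCompletions' ]
```

Each due resource can then be dispatched to its own worker, so a failure in one does not delay the others.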

Choosing between them

For most partner integrations, start with Shape A (monolithic) and decompose to Shape B when you have evidence one resource needs different treatment. Premature decomposition adds complexity for hypothetical needs. Signals to decompose:

One resource’s poll takes much longer than others
One resource genuinely needs a different cadence (e.g., Form Completions every 5 min, everything else hourly)
One resource’s failures shouldn’t block the rest

Pattern 4: queue-based decoupling

For higher-scale or higher-reliability integrations, decouple polling from processing via a queue. The polling worker’s only job is to detect changes and publish them to the queue. The processing worker(s) consume the queue and do the actual destination writes.
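The split can be sketched with an in-memory stand-in for the queue. Everything here (the `InMemoryQueue` class, `publishChanges`, `drain`) is hypothetical and exists only to show the shape; a production integration would use a durable queue service.

```javascript
// In-memory stand-in for a durable queue. Illustrative only.
class InMemoryQueue {
  constructor() {
    this.messages = [];
  }
  publish(message) {
    this.messages.push(message);
  }
  // Returns the next message, or undefined when the queue is empty.
  take() {
    return this.messages.shift();
  }
  get depth() {
    return this.messages.length;
  }
}

// Polling side: detect changes and publish. No destination writes here.
function publishChanges(queue, customerId, changedRecords) {
  for (const record of changedRecords) {
    queue.publish({ customerId, record });
  }
}

// Processing side: drain up to `max` messages and write each to the destination.
function drain(queue, writeToDestination, max) {
  let processed = 0;
  while (processed < max && queue.depth > 0) {
    writeToDestination(queue.take());
    processed += 1;
  }
  return processed;
}

const queue = new InMemoryQueue();
publishChanges(queue, 'cust_a', [{ id: 1 }, { id: 2 }, { id: 3 }]);

const written = [];
drain(queue, (msg) => written.push(msg.record.id), 2);
console.log(written, queue.depth); // [ 1, 2 ] 1
```

The consumer took only two messages, yet polling was never blocked: the third change simply waits in the queue, and the queue depth is the lag metric to watch.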

Why decouple

Concern	Without decoupling	With queue
Slow destination	Blocks polling	Doesn’t block (queue absorbs)
Destination outage	Polling fails or accumulates lag	Polling continues; queue grows
Backpressure	Hard to throttle destination writes	Queue rate-limits naturally
Retry isolation	Tied to polling cycle	Independent retry topology
Horizontal scale	Hard (sequential polling)	Easy (more consumer workers)

The cost is operational complexity (you now have a queue to monitor and reason about).

When the queue is worth it

Customer scale	Queue value
Few customers, small data	Not worth it — monolith is fine
Many customers, modest data	Helpful for resilience
Few customers, large data	Helpful for backpressure
Many customers, large data	Essentially required

For partner integrations serving 50+ customers with high-volume sync needs, the queue is worth the added operational burden.

Pattern 5: bidirectional sync (when needed)

Most Volunteer integrations are one-way (VOMO → external). Some, however, need bidirectional sync, where the external system also pushes changes back to VOMO.

Bidirectional brings complications

Concern	Why it matters
Loop detection	A change pushed in one direction may trigger a re-push in the other (infinite loop)
Conflict resolution	Both sides change the same field — which wins?
Email-as-key fragility	An email change in either direction breaks the join (see the email-change problem)
Authority modeling	Each field has a “source of truth” — must be explicit

Loop detection pattern

Track the direction of recent changes and skip writes that would re-trigger a change you just processed:

JavaScript

async function pushFromExternalToVomo(customerId, externalUpdate) {
  // 1. Check whether we just received this change from the VOMO direction
  const recentInbound = await db.getRecentInboundChange(
    customerId,
    externalUpdate.vomoUserId,
    /* withinSeconds */ 60
  );

  if (recentInbound) {
    return { skipped: true, reason: 'recent_inbound_from_vomo' };
  }

  // 2. Push to VOMO
  await pushToVomo(customerId, externalUpdate);

  // 3. Track that this was an outbound change
  await db.recordOutboundChange(customerId, externalUpdate.vomoUserId);
}

The two-direction-tracking pattern: each direction logs “I just changed this record” with a TTL, and the other direction checks the log before re-pushing.
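A minimal in-memory sketch of such a TTL log is shown below. The class, its method names, and the direction labels are assumptions; a multi-worker deployment would keep this in a shared store rather than in process memory.

```javascript
// In-memory change log with a TTL. Names are assumptions for illustration.
class RecentChangeLog {
  // `now` is injectable so the TTL behaviour can be tested without waiting.
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> timestamp of the last recorded change
  }

  keyFor(direction, customerId, recordId) {
    return `${direction}:${customerId}:${recordId}`;
  }

  // Called by a direction right after it changes a record.
  record(direction, customerId, recordId) {
    this.entries.set(this.keyFor(direction, customerId, recordId), this.now());
  }

  // Called by the opposite direction before it re-pushes a record.
  wasRecent(direction, customerId, recordId) {
    const key = this.keyFor(direction, customerId, recordId);
    const at = this.entries.get(key);
    if (at === undefined) return false;
    if (this.now() - at > this.ttlMs) {
      this.entries.delete(key); // expired: forget it
      return false;
    }
    return true;
  }
}

let clock = 0;
const log = new RecentChangeLog(60000, () => clock);
log.record('inbound_from_vomo', 'cust_a', 'user_1');

clock = 30000;
console.log(log.wasRecent('inbound_from_vomo', 'cust_a', 'user_1')); // true

clock = 61000;
console.log(log.wasRecent('inbound_from_vomo', 'cust_a', 'user_1')); // false
```

The TTL is a trade-off: too short and loops slip through, too long and a genuine follow-up edit to the same record is wrongly skipped.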

Authority configuration

Per-field authority declared explicitly:

JavaScript

const FIELD_AUTHORITY = {
  // VOMO is canonical for these
  first_name: 'vomo',
  last_name: 'vomo',
  email: 'vomo',

  // External (e.g., HR system) is canonical for these
  phone: 'external',
  address: 'external',
  birthday: 'external',
};

function shouldPushFieldToVomo(fieldName) {
  return FIELD_AUTHORITY[fieldName] === 'external';
}

When external sends an update, only push the fields where external has authority.

For most integrations, keep things one-way unless bidirectional is a genuine business need. The complexity isn’t worth the marginal benefit for most use cases.
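To illustrate the field-level filtering described above, the sketch below applies an authority map to a whole update payload. The map is repeated so the block runs on its own, and `pickPushableFields` is a hypothetical helper name.

```javascript
// Mirrors the authority map above so this block runs on its own.
const AUTHORITY = {
  first_name: 'vomo',
  last_name: 'vomo',
  email: 'vomo',
  phone: 'external',
  address: 'external',
  birthday: 'external',
};

// Keeps only the fields the external system is authoritative for.
// Fields with no declared authority are dropped rather than pushed by default.
function pickPushableFields(update) {
  return Object.fromEntries(
    Object.entries(update).filter(([field]) => AUTHORITY[field] === 'external')
  );
}

console.log(
  pickPushableFields({ email: 'new@example.org', phone: '555-0100', nickname: 'Sam' })
);
// { phone: '555-0100' }
```

Dropping undeclared fields is a deliberate choice in this sketch: a new field has to be added to the map before it can flow back to VOMO.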

Pattern 6: multi-tenant scheduling

For hub-and-spoke integrations, scheduling polling across many customers needs to avoid:

All customers polling at the same minute (thundering herd against VOMO)
All customers drawing down their rate budget through the same worker
One customer’s heavy polling starving others

Staggered scheduling

Spread polls across the polling interval:

JavaScript

async function scheduleAllCustomerPolls() {
  const customers = await db.getActiveCustomers();
  const pollIntervalMs = 15 * 60 * 1000; // 15 min

  for (let i = 0; i < customers.length; i++) {
    const offsetMs = (i / customers.length) * pollIntervalMs;
    scheduler.cron(
      `poll:${customers[i].id}`,
      `*/15 * * * *`, // every 15 min
      () => pollCustomer(customers[i].id),
      { initialDelayMs: offsetMs } // stagger initial start
    );
  }
}

For 60 customers on a 15-minute cadence, this means a poll starting every 15 seconds — much smoother than 60 polls all starting at minute 0.
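The arithmetic can be checked directly. `staggerOffsetsMs` below is a hypothetical helper that computes only the offsets, independent of any scheduler.

```javascript
// Evenly spaced start offsets across one polling interval.
// Multiplying before dividing keeps the result exact for whole-number inputs.
function staggerOffsetsMs(customerCount, pollIntervalMs) {
  return Array.from({ length: customerCount }, (_, i) =>
    Math.floor((i * pollIntervalMs) / customerCount)
  );
}

const offsets = staggerOffsetsMs(60, 15 * 60 * 1000);
console.log(offsets.slice(0, 4)); // [ 0, 15000, 30000, 45000 ]
console.log(offsets[59]); // 885000
```

The last customer starts at 14 minutes 45 seconds, so the next cycle's first poll follows 15 seconds later and the spacing stays uniform across cycles.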

Per-customer rate budget

JavaScript

class PerCustomerBudget {
  async checkAndDeduct(customerId, requestCount) {
    const minute = Math.floor(Date.now() / 60000);
    const key = `budget:${customerId}:${minute}`;
    const current = await redis.incrby(key, requestCount);
    await redis.expire(key, 120);

    const limit = 100; // per minute per customer
    return current <= limit;
  }
}

Per-customer budgets enforce fair use even when one customer’s worker has an unusually heavy load.
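For local testing without Redis, the same fixed-window idea can be sketched in memory. This variant is an assumption-laden stand-in, not the class above: it checks before deducting, so a rejected request does not consume budget, and it is not safe across processes.

```javascript
// In-memory fixed-window budget. Illustrative; single process only.
class InMemoryBudget {
  // `now` is injectable so window rollover can be tested without waiting.
  constructor(limitPerMinute, now = () => Date.now()) {
    this.limit = limitPerMinute;
    this.now = now;
    // `${customerId}:${minute}` -> requests used in that window.
    // Old windows are never evicted here; a real store would expire them.
    this.windows = new Map();
  }

  // Deducts only when the whole request fits in the current window.
  tryDeduct(customerId, requestCount) {
    const minute = Math.floor(this.now() / 60000);
    const key = `${customerId}:${minute}`;
    const used = this.windows.get(key) ?? 0;
    if (used + requestCount > this.limit) return false;
    this.windows.set(key, used + requestCount);
    return true;
  }
}

let clock = 0;
const budget = new InMemoryBudget(100, () => clock);
console.log(budget.tryDeduct('cust_a', 80)); // true
console.log(budget.tryDeduct('cust_a', 30)); // false (would exceed 100)
console.log(budget.tryDeduct('cust_b', 100)); // true (separate customer)

clock = 60000; // next minute, fresh window
console.log(budget.tryDeduct('cust_a', 30)); // true
```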

Priority lanes

When customers pay for different tiers (premium, standard, basic), give each tier its own cadence and lane:

JavaScript

class TieredScheduler {
  async scheduleCustomer(customerId) {
    const tier = await getCustomerTier(customerId);

    switch (tier) {
      case 'premium':
        return scheduler.cron(`poll:${customerId}`, '*/5 * * * *',
          () => pollCustomer(customerId), { lane: 'priority' });
      case 'standard':
        return scheduler.cron(`poll:${customerId}`, '*/15 * * * *',
          () => pollCustomer(customerId), { lane: 'standard' });
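      case 'basic':
        return scheduler.cron(`poll:${customerId}`, '0 * * * *',
          () => pollCustomer(customerId), { lane: 'background' });
      default:
        // Unknown tier: fail loudly rather than silently never polling
        throw new Error(`Unknown tier "${tier}" for customer ${customerId}`);
    }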
  }
}

Different lanes get different worker pools, different rate budgets, and different SLAs.

Pattern 7: regional and geographic considerations

For partner integrations serving customers in multiple regions:

Concern	Pattern
Data residency requirements (EU, etc.)	Per-region deployments with isolated databases
Latency to VOMO API	Less of an issue (API is hosted; client location matters less)
Latency to destination	Co-locate workers with the destination when possible
Time-of-day scheduling	Schedule polling in customer’s local time (3 AM customer-local for reconciliation)
Disaster recovery	Multi-region active-passive or active-active

For most partner integrations, single-region is fine. Multi-region is a complication worth taking on only when data residency or DR requirements demand it.
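For the time-of-day row, a sketch of checking a customer's local hour with the standard `Intl.DateTimeFormat` API follows. The helper names and the idea of storing an IANA time zone per customer are assumptions.

```javascript
// Returns the hour (0-23) that `date` falls on in the given IANA time zone.
function localHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: 'numeric',
    hourCycle: 'h23',
  }).formatToParts(date);
  return Number(parts.find((part) => part.type === 'hour').value);
}

// True when `date` falls in the customer's reconciliation hour (3 AM by default).
function isReconciliationHour(date, timeZone, targetHour = 3) {
  return localHour(date, timeZone) === targetHour;
}

const instant = new Date('2024-01-15T08:00:00Z');
console.log(isReconciliationHour(instant, 'America/New_York')); // true (03:00 local)
console.log(isReconciliationHour(instant, 'Europe/Berlin')); // false (09:00 local)
```

An hourly job can run this check per customer, which avoids hand-computing UTC offsets and handles daylight-saving shifts through the time zone database.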

Pattern 8: progressive delivery and feature flags

When changing a sync integration in production, feature-flag the change:

JavaScript

async function pollUsers(customerId) {
  const useNewLogic = await featureFlags.isEnabled(
    'new_user_polling_logic',
    { customerId }
  );

  if (useNewLogic) {
    return pollUsersV2(customerId);
  }
  return pollUsersV1(customerId);
}

Roll out to:

Internal test customer first
One or two friendly customers (with notification)
10% of customers
50% of customers
All customers

At each step, monitor: error rates, processed counts, reconciliation gap rates. If any metric degrades, roll back.
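The percentage steps need a stable way to decide which customers are in the cohort. One common approach, sketched here as an assumption rather than as how any particular flag service works, is to hash the flag name and customer ID into a bucket from 0 to 99. The hash is FNV-1a computed over UTF-16 code units.

```javascript
// FNV-1a 32-bit hash over UTF-16 code units: small, stable, non-cryptographic.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Hypothetical rollout check: a customer is enabled when its bucket (0-99)
// falls below the rollout percentage.
function isInRollout(flagName, customerId, percent) {
  const bucket = fnv1a(`${flagName}:${customerId}`) % 100;
  return bucket < percent;
}

const customers = Array.from({ length: 1000 }, (_, i) => `cust_${i}`);
const enabled = customers.filter((id) =>
  isInRollout('new_user_polling_logic', id, 10)
);
console.log(`${enabled.length} of ${customers.length} customers enabled`);
```

Because each customer's bucket is fixed, raising the percentage from 10 to 50 only adds customers and never removes one that was already enabled, and rolling back is a matter of lowering the number.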

Why feature flags matter for sync

Sync changes are subtle. A “small change” to the polling logic might:

Skip records that should be processed
Re-process records that already were processed (wasted work)
Advance the checkpoint incorrectly
Trigger spurious side effects (welcome emails, notifications)

Feature-flagged rollout means production exposure is bounded — only the flagged percentage of customers is affected if the change has bugs.

Pattern 9: separation of “infrastructure work” from “business work”

The polling worker shouldn’t know about your business logic; the business processor shouldn’t know about polling mechanics. Separate them cleanly:

JavaScript

// Polling layer — knows about VOMO, checkpoints, pagination
class UserPoller {
  async poll(customerId) {
    const checkpoint = await checkpoints.get(customerId, 'users');
    const changes = await listChangedUsers(customerId, checkpoint);

    for (const change of changes) {
      await queue.publish({
        type: 'user_changed',
        customerId,
        userData: change,
      });
    }

    const latest = latestUpdatedAt(changes) ?? checkpoint;
    await checkpoints.set(customerId, 'users', latest);
  }
}

// Business layer — processes events, knows about destination
class UserChangeHandler {
  async handle(event) {
    const customerId = event.customerId;
    const user = event.userData;
    // Business decisions — mapping, transformation, side effects
    await externalSystem.upsertUser(transformToExternal(user));
    await fireBusinessEvents(customerId, user);
  }
}

The polling layer is reusable across business contexts (different external destinations, different business workflows). The business layer is replaceable without touching polling mechanics. This is the basis of clean evolution — when the business need changes, you change the business layer. When VOMO changes (e.g., a v2 overhaul), you change the polling/API layer.

Pattern 10: operations playbook

Production integrations need a documented playbook for common operational scenarios. A starter checklist:

Scenario	Playbook
Customer’s token expired	(1) Disable polling for that customer; (2) Email customer to refresh; (3) Resume on confirmation
One customer’s sync is stuck	(1) Check checkpoint freshness; (2) Check DLQ; (3) Check error logs for trace IDs; (4) If DLQ has failures, investigate; otherwise restart worker
Many customers’ polling slowed	(1) Check VOMO API status; (2) Check rate-limit metrics; (3) If sustained, throttle down system-wide
New customer onboarding	(1) Verify token; (2) Run estimate; (3) Schedule backfill for low-traffic time; (4) Monitor backfill; (5) Enable polling after completion
Customer offboarding	(1) Disable polling and reconciliation; (2) Stop scheduled jobs; (3) Delete credentials; (4) Schedule data retention/deletion per agreement
Suspected data quality issue	(1) Capture sample; (2) Compare VOMO vs partner state; (3) Compare partner state vs destination; (4) Run targeted reconciliation; (5) Investigate root cause
Suspected security incident	(1) Disable affected customer’s integration immediately; (2) Rotate credentials; (3) Audit recent activity; (4) Document and report per policy

Even a basic playbook ensures consistent response when problems arise. The detail matters less than having one.

Architecture maturity model

Where does your integration sit?

Level	Characteristics
1: Working	Single polling cycle; basic error handling; per-customer state in memory
2: Multi-tenant	Per-customer state in persistent store; staggered scheduling; shared workers
3: Resilient	Queue-based decoupling; DLQ; circuit breakers; reconciliation; per-customer rate budgets
4: Observable	Per-customer dashboards; structured logs with trace IDs; alerting on patterns
5: Operationally mature	Feature-flagged rollouts; documented playbooks; on-call rotation; error budgets

Most partner integrations land between Level 2 and Level 3. Level 5 is reserved for the most operationally critical integrations. Don’t try to skip levels — each level’s practices build on the previous. A Level 1 integration that adds feature flags before adding multi-tenant scoping is over-engineered for its actual maturity.

Decision framework

When designing a new sync integration, walk through these questions:

Question	Implication
What’s the freshness requirement?	Determines polling cadence
How many customers will it serve?	Determines hub-vs-deployment
What’s the scale per customer?	Determines queue-vs-monolith
Is bidirectional sync needed?	Determines authority + loop detection
What’s the SLA?	Determines retry/recovery/playbook depth
What’s the data sensitivity?	Determines security and audit requirements
What’s the team’s operational maturity?	Determines how many patterns to adopt vs defer

Answering these explicitly produces clearer architectural choices than improvising as you build. Document the answers; revisit them when scale changes.

Where to go next

Security and Credential Management

The security patterns that protect this architecture.

Versioning and Backward Compatibility

The patterns for surviving API changes.

Data Modeling

The data model that supports this architecture.

Error Recovery Patterns

The error-handling patterns this architecture relies on.


