Data Modeling - Virtuous API Docs

Every Volunteer integration eventually faces the same architectural question: how do you represent VOMO data inside your own system? It’s tempting to mirror VOMO’s structure exactly — but that couples your schema to API quirks, makes cross-API joins harder, and complicates evolution when VOMO changes. Better-designed integrations introduce a partner-side data model that’s inspired by VOMO’s shape but adapted to your use case. This page covers the modeling decisions that matter: identifier strategy, mapping tables, denormalization, the participation challenge, and the schema patterns that age well.

When this matters

Decision	Why it matters
What’s your stable identifier for a person?	Survives email changes, VOMO ID changes, system migrations
How do you store VOMO IDs?	Determines join cost, lookup performance, deletion handling
How do you model participations?	The most-data-intensive part of any reporting integration
How do you handle the schema differences (snake_case, type oddities)?	Determines whether your code needs to know about API quirks
What’s denormalized vs. computed?	Trade-off between storage and query cost

These decisions are hard to reverse. A partner integration that uses VOMO User IDs as primary keys throughout will struggle when the customer migrates to a different platform; one that uses partner-assigned IDs survives the migration.

Principle 1: stable partner identifiers

Don’t make VOMO IDs your primary keys. Use partner-assigned identifiers as the stable reference, and link them to VOMO IDs via a mapping table.

The pattern

-- Partner-side person records
CREATE TABLE persons (
  partner_person_id  VARCHAR PRIMARY KEY,  -- Stable; never changes
  customer_id        VARCHAR NOT NULL,
  email              VARCHAR NOT NULL,
  first_name         VARCHAR,
  last_name          VARCHAR,
  -- ... your business fields ...
  created_at         TIMESTAMP DEFAULT NOW()
);

-- Mapping to VOMO IDs
CREATE TABLE person_vomo_mapping (
  customer_id        VARCHAR NOT NULL,
  partner_person_id  VARCHAR NOT NULL,
  vomo_user_id       INTEGER,
  last_known_email   VARCHAR,
  last_synced_at     TIMESTAMP,

  PRIMARY KEY (customer_id, partner_person_id),
  UNIQUE (customer_id, vomo_user_id)
);

CREATE INDEX person_vomo_by_vomo_id ON person_vomo_mapping (customer_id, vomo_user_id);

The partner_person_id is your stable key. The mapping table is the bridge to VOMO. The same pattern extends to Projects, Groups, Forms, etc.

Why not just use VOMO IDs

Concern	What happens if you use VOMO IDs as primary keys
Customer migrates to a different platform	All your foreign keys break
You join with CRM+ data	CRM+ has its own IDs; you need a translation layer anyway
A VOMO record is deleted	You lose the join target; orphaned references everywhere
You add a new data source later	Schema is already locked to VOMO
The customer’s identity system changes their email	If you stored by email, joins break

The partner-assigned ID isolates you from all of these.

Identifier generation

A few options for partner_person_id:

Format	Pros	Cons
UUID	Universal, no coordination	Long; opaque
Customer-prefixed (e.g., `cust001-prs-12345`)	Self-describing; debug-friendly	Coupled to customer ID format
Slug-based (e.g., `bruce-wayne-xj9k`)	Human-readable	Collisions; profanity risk
Sequential	Compact; sortable	Reveals scale; bad for multi-tenant

For most integrations, UUID is the safe default. The opacity is a feature — it prevents IDs from being misused as ordering or identifying information.

Principle 2: mapping tables, not foreign keys

When linking partner-side records to VOMO records, use a mapping table rather than embedding vomo_user_id as a foreign key everywhere.

The pattern

-- ❌ Anti-pattern: VOMO ID embedded in business tables
CREATE TABLE participations (
  id INTEGER PRIMARY KEY,
  vomo_user_id INTEGER REFERENCES vomo_users(id),  -- Tight coupling
  ...
);

-- ✅ Better: business table uses partner ID; mapping resolves to VOMO
CREATE TABLE participations (
  id INTEGER PRIMARY KEY,
  partner_person_id VARCHAR REFERENCES persons(partner_person_id),
  ...
);

The mapping table changes infrequently; the business tables don’t have to.

When mapping tables get complex

For multi-API integrations (Volunteer + CRM+ + Raise), the mapping pattern scales:

CREATE TABLE person_external_mapping (
  customer_id        VARCHAR NOT NULL,
  partner_person_id  VARCHAR NOT NULL,
  source_system      VARCHAR NOT NULL,  -- 'vomo', 'crm', 'raise', 'workday'
  source_id          VARCHAR NOT NULL,
  last_synced_at     TIMESTAMP,

  PRIMARY KEY (customer_id, partner_person_id, source_system),
  UNIQUE (customer_id, source_system, source_id)
);

-- Find the VOMO ID for a given partner person
SELECT source_id FROM person_external_mapping
WHERE customer_id = ? AND partner_person_id = ? AND source_system = 'vomo';

-- Find all systems linked to a given partner person
SELECT source_system, source_id FROM person_external_mapping
WHERE customer_id = ? AND partner_person_id = ?;

A single table maps partner IDs to any number of source-system IDs. Adding a new system (e.g., a future “Raise” integration) means adding rows, not new columns or tables.

Principle 3: snake_case → camelCase translation at the boundary

VOMO uses snake_case for all fields. Your application code probably uses something else (camelCase in JavaScript, PascalCase in C#, etc.). Translate at the API client boundary, not throughout your code.

The pattern

JavaScript

class VomoUser {
  static fromVomo(raw) {
    return new VomoUser({
      id: raw.id,
      firstName: raw.first_name,
      lastName: raw.last_name,
      fullName: raw.full_name,
      email: raw.email,
      phone: raw.phone,
      birthday: raw.birthday,
      gender: raw.gender,
      userStatus: raw.user_status,
      membershipStatus: raw.membership_status,
      membershipRole: raw.membership_role,
      createdAt: raw.created_at ? new Date(raw.created_at) : null,
      updatedAt: raw.updated_at ? new Date(raw.updated_at) : null,
      // Defensive type coercion (audit-flagged quirks)
      hours: raw.hours !== undefined ? parseFloat(raw.hours) : null,
      verified: raw.verified === 1 || raw.verified === true,
    });
  }

  toVomo() {
    return {
      id: this.id,
      first_name: this.firstName,
      last_name: this.lastName,
      // ... reverse translation ...
    };
  }
}

Two big wins:

Benefit	Detail
Application code uses one convention	No mixed snake_case/camelCase across the codebase
API quirks are quarantined	Audit-flagged issues (hours as string, verified as 0/1) are handled in one place

Defensive translation for audit-flagged quirks

The Volunteer API has documented quirks (see audits) that production code should handle defensively:

JavaScript

function defensiveParseHours(raw) {
  // Spec says integer; live returns "4.00" string per audit #40
  if (raw === null || raw === undefined) return null;
  if (typeof raw === 'number') return raw;
  if (typeof raw === 'string') return parseFloat(raw) || 0;
  return null;
}

function defensiveParseVerified(raw) {
  // Live returns 0 or 1 integer; defensive parsing
  if (raw === null || raw === undefined) return null;
  return raw === 1 || raw === true || raw === '1';
}

function defensiveParseProjectId(raw) {
  // Audit #41: project_id typed string but should be integer
  if (raw === null || raw === undefined) return null;
  if (typeof raw === 'number') return raw;
  return parseInt(raw, 10) || null;
}

These helpers belong in your API client layer, not scattered through application code. When the API is eventually fixed (post-v2 overhaul), the fix is one file, not hundreds.

Principle 4: model the participation challenge explicitly

Participations are the most operationally challenging part of any Volunteer integration. They have no direct endpoint, they live embedded in two different resources (User detail and Project Date detail), and they change post-event in ways that aren’t always polled correctly.

The pattern: dedicated participations table keyed by (project_date_id, user_id)

CREATE TABLE participations (
  customer_id        VARCHAR NOT NULL,
  vomo_project_date_id  INTEGER NOT NULL,
  vomo_user_id       INTEGER NOT NULL,

  vomo_project_id    INTEGER NOT NULL,
  partner_person_id  VARCHAR,  -- joined back to person via mapping

  starts_at          TIMESTAMP NOT NULL,
  ends_at            TIMESTAMP NOT NULL,
  signed_up_at       TIMESTAMP,
  checked_in_at     TIMESTAMP,
  checked_out_at    TIMESTAMP,

  hours              DECIMAL(10, 2),  -- Parsed from VOMO's string
  verified           BOOLEAN,
  role               VARCHAR,
  guests             INTEGER DEFAULT 0,

  first_seen_at      TIMESTAMP DEFAULT NOW(),
  last_synced_at     TIMESTAMP DEFAULT NOW(),

  -- Soft-delete marker (record disappears from VOMO but we keep history)
  vomo_disappeared_at  TIMESTAMP,

  PRIMARY KEY (customer_id, vomo_project_date_id, vomo_user_id)
);

CREATE INDEX participations_by_user ON participations (customer_id, vomo_user_id, signed_up_at);
CREATE INDEX participations_by_project ON participations (customer_id, vomo_project_id);
CREATE INDEX participations_by_partner_person ON participations (customer_id, partner_person_id);
CREATE INDEX participations_active ON participations (customer_id) WHERE vomo_disappeared_at IS NULL;

The composite primary key is the natural unique identifier. The vomo_disappeared_at column lets you keep history when VOMO removes a participation (organizer correction, Project deletion) — historical reports remain accurate.

Tracking changes to participation fields

Participation fields change over time — checked_in_at is set when the volunteer arrives; hours may be adjusted post-event. For audit-quality reporting, track changes:

CREATE TABLE participation_changes (
  customer_id           VARCHAR NOT NULL,
  vomo_project_date_id  INTEGER NOT NULL,
  vomo_user_id          INTEGER NOT NULL,
  changed_at            TIMESTAMP NOT NULL,
  field_name            VARCHAR NOT NULL,
  previous_value        TEXT,
  new_value             TEXT,

  PRIMARY KEY (customer_id, vomo_project_date_id, vomo_user_id, changed_at, field_name)
);

Adding entries on every change produces a complete event log of “Bruce’s hours were initially 4, changed to 3.75 a week later.” For compliance use cases this is essential.

Principle 5: denormalize for query, normalize for truth

VOMO’s API returns denormalized data (participations array embedded on UserDetailResource, participants embedded on ProjectDateResource). Don’t blindly mirror this — model normalized canonical data, then denormalize for query-specific views.

Canonical layer (normalized)

-- One row per person
CREATE TABLE persons (partner_person_id PRIMARY KEY, ...);

-- One row per Project
CREATE TABLE projects (vomo_project_id PRIMARY KEY, ...);

-- One row per Project Date
CREATE TABLE project_dates (vomo_project_date_id PRIMARY KEY, vomo_project_id, ...);

-- One row per participation
CREATE TABLE participations (
  vomo_project_date_id, vomo_user_id, ...,
  PRIMARY KEY (vomo_project_date_id, vomo_user_id)
);

This is the truth. Updates happen here. Other tables derive from this.

Query layer (denormalized)

-- For fast "lifetime hours per user" lookups
CREATE MATERIALIZED VIEW user_lifetime_hours AS
SELECT
  customer_id,
  vomo_user_id,
  SUM(hours) AS lifetime_hours,
  COUNT(*) AS lifetime_participations,
  COUNT(DISTINCT vomo_project_id) AS unique_projects,
  MAX(signed_up_at) AS last_participated_at
FROM participations
WHERE verified = true
GROUP BY customer_id, vomo_user_id;

-- For fast "Project performance" lookups
CREATE MATERIALIZED VIEW project_summary AS
SELECT
  customer_id,
  vomo_project_id,
  COUNT(*) AS total_participations,
  COUNT(DISTINCT vomo_user_id) AS unique_volunteers,
  SUM(hours) AS total_hours
FROM participations
WHERE verified = true
GROUP BY customer_id, vomo_project_id;

Materialized views refresh on a schedule (every hour or daily). Dashboards query the views; the canonical tables stay clean.

When to denormalize

Workload	Strategy
One-off ad-hoc queries	Query the normalized tables directly
Dashboard with sub-second response needs	Materialized view
Per-customer or per-user dashboards (high cardinality, fast response)	Pre-computed per-key summary table
Aggregations refreshed many times per day	Incremental materialized view if database supports it
Cross-API joins	Joined-store pattern from Combine Volunteer Data with CRM+ Data

The general rule: normalize first, denormalize when a specific workload demands it.

Principle 6: store snapshots for change detection

For change-detection patterns (schedule diffs, member-list diffs), store the previous snapshot so you can compute deltas:

CREATE TABLE project_schedule_snapshots (
  customer_id      VARCHAR NOT NULL,
  vomo_project_id  INTEGER NOT NULL,
  snapshot_at      TIMESTAMP DEFAULT NOW(),
  dates            JSONB NOT NULL,  -- The all_dates array as observed
  PRIMARY KEY (customer_id, vomo_project_id, snapshot_at)
);

-- Keep only the most recent N snapshots per project
CREATE INDEX project_schedule_snapshots_recent
  ON project_schedule_snapshots (customer_id, vomo_project_id, snapshot_at DESC);

On each polling cycle: fetch current, compare to most-recent snapshot, emit events for the diff, write the new snapshot. See Detecting Project Changes.

Why JSONB for the snapshot

The snapshot isn’t queried — it’s only used for diff comparison. JSONB (or your DB’s JSON type) lets you store the entire schedule structure without designing a normalized schema for it. The diff code parses both old and new JSON into in-memory objects and compares. For PostgreSQL specifically, JSONB has good compression and is reasonably fast. For other databases, consider a serialized format that your application can deserialize cheaply.

Principle 7: customer is a top-level dimension

Almost every table should have customer_id as part of its primary key or composite key:

-- ✅ Multi-tenant aware
CREATE TABLE persons (
  partner_person_id  VARCHAR NOT NULL,
  customer_id        VARCHAR NOT NULL,
  ...,
  PRIMARY KEY (customer_id, partner_person_id)
);

-- ❌ Anti-pattern: customer_id is a regular column
CREATE TABLE persons (
  partner_person_id  VARCHAR PRIMARY KEY,
  customer_id        VARCHAR NOT NULL,  -- Just a column; risk of cross-customer access
  ...
);

The composite key approach makes it impossible to accidentally write a query that crosses customer boundaries. Every WHERE clause must specify the customer.

Customer-scoped queries everywhere

JavaScript

// Helper that enforces customer scope
async function findPerson(customerId, partnerPersonId) {
  return db.query(`
    SELECT * FROM persons
    WHERE customer_id = $1 AND partner_person_id = $2
  `, [customerId, partnerPersonId]);
}

// Anti-pattern: query without customer scope
async function findPerson(partnerPersonId) {  // ❌ Where's the customer scope?
  return db.query(`SELECT * FROM persons WHERE partner_person_id = $1`, [partnerPersonId]);
}

Every data-access function takes customerId as the first argument. This makes accidental cross-customer access syntactically harder.

Principle 8: model what VOMO doesn’t expose

Sometimes the most valuable data in your model is data VOMO doesn’t have but the customer needs. Partner-side fields you might add:

Field	Why
`vomo_disappeared_at` (per record)	Track when a VOMO record was removed; historical accuracy
`last_communication_at` (per person)	Track when the partner integration last reached out
`outreach_status` (per person)	The partner-side status: contacted, lapsed, do-not-contact
`partner_notes` (per person)	Annotations the customer adds in the partner UI
`derived_segment` (per person)	“Donor-volunteer,” “first-timer,” etc. — partner-computed labels
`last_sync_outcome` (per record)	Did the most recent sync succeed?

These are partner-side facts; they don’t belong in VOMO. Modeling them explicitly lets the partner integration provide value beyond a pure mirror.

Anti-patterns to avoid

A few common modeling mistakes:

Anti-pattern: storing VOMO’s raw response

Some integrations dump the entire VOMO response as JSON into a single column. This is appealingly simple but creates problems:

Queries on specific fields require JSON path expressions (slow and verbose)
Schema evolution is invisible — when VOMO adds a field, you don’t know unless you re-process
Translating snake_case to your conventions has to happen at every read site

Better: extract the fields you care about into proper columns; keep the raw JSON only if you have a specific reason (e.g., debugging integration issues).

Anti-pattern: assuming VOMO IDs are stable

VOMO IDs are stable in normal operation, but partner integrations should defensively handle the possibility that they aren’t:

JavaScript

// ❌ Assumes ID is forever
async function getPersonHistory(vomoUserId) {
  return db.query(`SELECT * FROM participations WHERE vomo_user_id = $1`, [vomoUserId]);
}

// ✅ Resolves through partner mapping
async function getPersonHistory(partnerPersonId) {
  const mapping = await getMapping(partnerPersonId);
  return db.query(`SELECT * FROM participations WHERE vomo_user_id = $1`,
    [mapping.vomo_user_id]);
}

If VOMO IDs change (rare but possible during customer migrations), the mapping is the single source of truth — business records stay untouched.

Anti-pattern: not modeling deletions

Many integrations don’t handle deletions explicitly — when a VOMO record disappears, the partner record just sits orphaned. Better:

Detect deletions via reconciliation
Mark partner records with vomo_disappeared_at rather than deleting them
Surface “this record no longer exists in VOMO” in the partner UI
Decide per-record whether to cascade or preserve

Anti-pattern: shared tables across customers

A single users table containing all customers’ users sounds efficient but introduces:

Risk of cross-customer access bugs
Hard-to-isolate performance issues (one customer’s bulk operation slows everyone)
Complex deletion when a customer offboards
Compliance complications (data sovereignty, retention policies)

Better: per-customer keys throughout (the Principle 7 pattern). For true multi-tenant isolation, consider per-customer schemas or databases.

A reference schema for a typical integration

Putting the principles together:

-- Customers (managed by partner-side admin)
CREATE TABLE customers (
  customer_id       VARCHAR PRIMARY KEY,
  name              VARCHAR NOT NULL,
  vomo_token_id     VARCHAR REFERENCES credentials_store(id),
  enabled           BOOLEAN DEFAULT true,
  onboarded_at      TIMESTAMP DEFAULT NOW()
);

-- People (the partner-side canonical person model)
CREATE TABLE persons (
  customer_id       VARCHAR NOT NULL,
  partner_person_id VARCHAR NOT NULL,
  email             VARCHAR NOT NULL,
  first_name        VARCHAR,
  last_name         VARCHAR,
  created_at        TIMESTAMP DEFAULT NOW(),
  PRIMARY KEY (customer_id, partner_person_id),
  UNIQUE (customer_id, email)
);

-- VOMO-side mapping
CREATE TABLE person_vomo_mapping (
  customer_id        VARCHAR NOT NULL,
  partner_person_id  VARCHAR NOT NULL,
  vomo_user_id       INTEGER NOT NULL,
  last_known_email   VARCHAR,
  last_synced_at     TIMESTAMP,
  PRIMARY KEY (customer_id, partner_person_id),
  UNIQUE (customer_id, vomo_user_id),
  FOREIGN KEY (customer_id, partner_person_id) REFERENCES persons (customer_id, partner_person_id)
);

-- Projects (VOMO-side mirror)
CREATE TABLE projects (
  customer_id       VARCHAR NOT NULL,
  vomo_project_id   INTEGER NOT NULL,
  name              VARCHAR,
  description       TEXT,
  org_slug          VARCHAR,
  draft             BOOLEAN,
  last_synced_at    TIMESTAMP,
  vomo_disappeared_at TIMESTAMP,
  PRIMARY KEY (customer_id, vomo_project_id)
);

-- Participations (the central data table)
CREATE TABLE participations (
  customer_id          VARCHAR NOT NULL,
  vomo_project_date_id INTEGER NOT NULL,
  vomo_user_id         INTEGER NOT NULL,
  vomo_project_id      INTEGER NOT NULL,
  partner_person_id    VARCHAR,

  starts_at            TIMESTAMP,
  ends_at              TIMESTAMP,
  signed_up_at         TIMESTAMP,
  checked_in_at        TIMESTAMP,
  checked_out_at       TIMESTAMP,
  hours                DECIMAL(10, 2),
  verified             BOOLEAN,

  last_synced_at       TIMESTAMP,
  vomo_disappeared_at  TIMESTAMP,
  PRIMARY KEY (customer_id, vomo_project_date_id, vomo_user_id)
);

-- Sync checkpoints
CREATE TABLE sync_checkpoints (
  customer_id  VARCHAR NOT NULL,
  resource     VARCHAR NOT NULL,
  checkpoint   TIMESTAMP NOT NULL,
  PRIMARY KEY (customer_id, resource)
);

-- Audit log
CREATE TABLE sync_audit (
  customer_id    VARCHAR NOT NULL,
  resource_type  VARCHAR NOT NULL,
  record_id      VARCHAR NOT NULL,
  operation      VARCHAR NOT NULL,
  trace_id       VARCHAR,
  succeeded      BOOLEAN,
  error          TEXT,
  processed_at   TIMESTAMP DEFAULT NOW()
);
CREATE INDEX sync_audit_recent ON sync_audit (customer_id, processed_at DESC);

Modify for your specific needs, but the shapes — partner-assigned IDs, mapping tables, customer-scoped composite keys, vomo_disappeared_at for soft deletes, audit logging — apply broadly.

Where to go next

API Performance Tips

The caching and performance patterns that make this data model fast.

Error Recovery Patterns

The resilience patterns that protect this data model from API issues.

Sync Architecture Patterns

The broader architectural patterns this modeling fits into.

The Volunteer Data Model

VOMO’s own data model — the source these patterns map from.

​When this matters

​Principle 1: stable partner identifiers

​The pattern

​Why not just use VOMO IDs

​Identifier generation

​Principle 2: mapping tables, not foreign keys

​The pattern

​When mapping tables get complex

​Principle 3: snake_case → camelCase translation at the boundary

​The pattern

​Defensive translation for audit-flagged quirks

​Principle 4: model the participation challenge explicitly

​The pattern: dedicated participations table keyed by (project_date_id, user_id)

​Tracking changes to participation fields

​Principle 5: denormalize for query, normalize for truth

​Canonical layer (normalized)

​Query layer (denormalized)

​When to denormalize

​Principle 6: store snapshots for change detection

​Why JSONB for the snapshot

​Principle 7: customer is a top-level dimension

​Customer-scoped queries everywhere

​Principle 8: model what VOMO doesn’t expose

​Anti-patterns to avoid

​Anti-pattern: storing VOMO’s raw response

​Anti-pattern: assuming VOMO IDs are stable

​Anti-pattern: not modeling deletions

​Anti-pattern: shared tables across customers

​A reference schema for a typical integration

​Where to go next

API Performance Tips

Error Recovery Patterns

Sync Architecture Patterns

The Volunteer Data Model

When this matters

Principle 1: stable partner identifiers

The pattern

Why not just use VOMO IDs

Identifier generation

Principle 2: mapping tables, not foreign keys

The pattern

When mapping tables get complex

Principle 3: snake_case → camelCase translation at the boundary

The pattern

Defensive translation for audit-flagged quirks

Principle 4: model the participation challenge explicitly

The pattern: dedicated participations table keyed by (project_date_id, user_id)

Tracking changes to participation fields

Principle 5: denormalize for query, normalize for truth

Canonical layer (normalized)

Query layer (denormalized)

When to denormalize

Principle 6: store snapshots for change detection

Why JSONB for the snapshot

Principle 7: customer is a top-level dimension

Customer-scoped queries everywhere

Principle 8: model what VOMO doesn’t expose

Anti-patterns to avoid

Anti-pattern: storing VOMO’s raw response

Anti-pattern: assuming VOMO IDs are stable

Anti-pattern: not modeling deletions

Anti-pattern: shared tables across customers

A reference schema for a typical integration

Where to go next