Rate Limits - Virtuous API Docs

The Volunteer API’s rate-limit behavior is entirely undocumented in the OpenAPI spec. No endpoint documents a 429 Too Many Requests response, no response header describes a budget or remaining allowance, and no overall policy is published. Rate limits almost certainly exist anyway, since most production APIs enforce them. This page covers what is formally known (very little), what to expect in practice, and the defensive patterns that keep partner integrations inside whatever budget the platform applies.

⚠️ Spec gap: The Volunteer OpenAPI spec is silent on rate limits. The thresholds, retry policies, and tier-based limits (if any) are not published. The patterns on this page assume standard rate-limit conventions (token bucket per API token, 429 response with Retry-After header). Confirm against actual API behavior for production-critical workloads.

What the spec confirms

Nothing. The OpenAPI spec contains no rate-limit information at all. For comparison, the CRM+ and Raise specs are also silent on rate limits, but CRM+ at least has rate-limit information available through partner channels. Volunteer is in the same position, with less external documentation to fall back on.

What to expect in practice

Based on common API conventions and the platform’s likely Laravel architecture, partner integrations should reasonably expect:

Concern	Likely behavior
Per-token rate limit	Yes — typical for any production API
`429 Too Many Requests` response when exceeded	Yes — standard HTTP convention
`Retry-After` header on `429` responses	Likely — Laravel’s throttle middleware emits this by default
Per-endpoint limits	Possibly — different endpoints may have different budgets
Burst allowance	Possibly — short bursts often allowed even within sustained limits
Daily or hourly caps	Possibly — some APIs enforce these in addition to per-second rates

None of this is in the spec. The patterns on this page work even if the specific values differ.
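Because the header format is not documented for Volunteer, it helps to parse `Retry-After` defensively. The HTTP specification allows two forms: a number of seconds or an HTTP-date. The sketch below handles both; the function name `parseRetryAfterMs` is illustrative, not part of any Volunteer SDK.

```javascript
// Sketch: convert a Retry-After header value into a delay in milliseconds.
// Returns null when the header is absent or unparseable, so the caller can
// fall back to its own default backoff.
function parseRetryAfterMs(headerValue, nowMs = Date.now()) {
  if (headerValue == null) return null;
  const value = String(headerValue).trim();
  if (value === '') return null;

  // Form 1: delay-seconds, a non-negative integer such as "120".
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // Form 2: HTTP-date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;

  // A date in the past means "retry now", never a negative delay.
  return Math.max(0, dateMs - nowMs);
}
```

Returning `null` rather than a default keeps the fallback policy in one place: the caller decides what to do when the platform gives no guidance.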

The defensive baseline

Build the integration to behave well within any reasonable rate budget. Three principles:

Principle	Description
Pace requests appropriately for the workload	Bulk reads space out requests; real-time reads can be tighter
Honor `429` responses with backoff	When the API says “slow down,” slow down. Use `Retry-After` if present.
Monitor request volume per customer	A customer’s usage shouldn’t grow uncontrolled without alerting

These principles produce well-behaved integrations regardless of the specific rate limit. An integration built this way won’t be the one that degrades the API for other partners.
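One way to make "honor 429 responses with backoff" concrete is capped exponential backoff with jitter, bounded by a maximum attempt count. This is a minimal sketch under assumed defaults (1-second base, 60-second cap, 5 attempts); none of these values come from the Volunteer spec, and the function names are illustrative.

```javascript
// Sketch: capped exponential backoff with full jitter.
// attempt is 0 for the first retry, 1 for the second, and so on.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceilingMs = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick a random delay in [0, ceiling) so that many clients
  // that were throttled together do not all retry at the same instant.
  return Math.floor(random() * ceilingMs);
}

// Choose the wait before the next retry, or null when attempts are exhausted.
// retryAfterMs is the server's hint in milliseconds, or null if none was sent.
function nextRetryDelayMs(attempt, retryAfterMs, { maxAttempts = 5, ...backoff } = {}) {
  if (attempt >= maxAttempts) return null;
  // Never wait less than the server asked for.
  return Math.max(retryAfterMs ?? 0, backoffDelayMs(attempt, backoff));
}
```

The `random` parameter exists only so the delay can be made deterministic in tests; production callers can omit it.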

A throttled HTTP client

For partner integrations operating at scale, build a throttled client that paces requests automatically:

class ThrottledVomoClient {
  constructor({ token, requestsPerSecond = 5, maxRetries = 3 }) {
    this.token = token;
    this.minIntervalMs = 1000 / requestsPerSecond;
    this.maxRetries = maxRetries;
    this.nextSlotAt = 0; // Earliest time the next request may start
  }

  async request(url, options = {}, attempt = 0) {
    await this._throttle();

    const response = await fetch(url, {
      ...options,
      headers: {
        Authorization: `Bearer ${this.token}`,
        Accept: 'application/json',
        ...options.headers,
      },
    });

    if (response.status === 429) {
      if (attempt >= this.maxRetries) {
        throw new Error(`Rate limited after ${attempt + 1} attempts: ${url}`);
      }
      // Retry-After may be absent or an HTTP-date; fall back to 30s unless it is a number of seconds
      const seconds = Number.parseInt(response.headers.get('Retry-After') ?? '', 10);
      const delayMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 30 * 1000;
      await sleep(delayMs);
      return this.request(url, options, attempt + 1); // Bounded retry after backoff
    }

    return response;
  }

  // Reserve the next start slot synchronously so concurrent callers are spaced out
  async _throttle() {
    const now = Date.now();
    const startAt = Math.max(now, this.nextSlotAt);
    this.nextSlotAt = startAt + this.minIntervalMs;
    if (startAt > now) {
      await sleep(startAt - now);
    }
  }
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

A few design decisions worth understanding:

Decision	Why
Default to 5 requests/second	Conservative for any plausible limit; rarely too slow for real workloads
Sequential throttle (one request at a time)	Simpler than token-bucket; sufficient for most integrations
Automatic retry on `429`	Handles the rare hits without caller intervention
Honor `Retry-After` when present	Respects platform guidance even when undocumented
Default to 30s backoff if no header	Reasonable conservative fallback

For higher-throughput integrations, replace the sequential throttle with a token-bucket implementation that allows controlled bursts.

A token-bucket for higher throughput

When you need concurrent requests but still want to stay inside a budget:

class TokenBucket {
  constructor({ capacity, refillPerSecond }) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefillAt = Date.now();
  }
 
  async acquire() {
    while (true) {
      this._refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      const waitMs = ((1 - this.tokens) / this.refillPerSecond) * 1000;
      await sleep(waitMs);
    }
  }
 
  _refill() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefillAt) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillAt = now;
  }
}
 
const bucket = new TokenBucket({ capacity: 20, refillPerSecond: 10 });
 
async function rateLimitedRequest(url, options) {
  await bucket.acquire();
  return fetch(url, options);
}

This allows short bursts up to 20 requests but sustains an average of 10 requests per second. Tune capacity (burst size) and refillPerSecond (sustained rate) based on the workload. For most partner integrations, the simpler sequential throttle is sufficient. Use token bucket when you need parallelism (e.g., concurrent calls across multiple customers’ workers).
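When sizing `capacity` and `refillPerSecond`, a quick estimate of how long a batch will take is useful. Starting from a full bucket, the first `capacity` requests go out immediately and the rest are released at the refill rate. The helper below is a sketch of that arithmetic only; it ignores network latency and any 429 backoff.

```javascript
// Sketch: time (in seconds) until the last of `requestCount` requests can be
// issued, assuming the bucket starts full and callers send as fast as allowed.
function estimateDrainSeconds(requestCount, { capacity, refillPerSecond }) {
  const queued = Math.max(0, requestCount - capacity);
  return queued / refillPerSecond;
}

// With the bucket configured above (capacity 20, refill 10/sec):
// 500 requests -> (500 - 20) / 10 = 48 seconds
console.log(estimateDrainSeconds(500, { capacity: 20, refillPerSecond: 10 }));
```

For large batches the burst allowance barely matters; the sustained rate dominates, so `refillPerSecond` is the value to tune first.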

Per-workload rate-limit budgets

Different workloads have different needs. A reasonable starting allocation per customer:

Workload	Suggested rate	Rationale
Interactive UI lookups	10–20 req/sec burst	User waiting for response — fast matters
Steady-state sync (polling)	1–5 req/sec	Lots of background work; latency tolerant
Backfill (one-time bulk read)	2–5 req/sec	Throttle harder during backfills to avoid sustained pressure
Daily reconciliation	1–2 req/sec	Off-hours work; no need to rush

These are starting points. Tune them based on what works against the actual API.

For partner integrations serving many customers, the per-customer rate matters less than the total across all customers. If 100 customers’ integrations each run at 5 req/sec, the partner’s total rate is 500 req/sec, which may itself trip a platform-side per-source limit.
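One way to keep the aggregate under control is to derive each customer's rate from a partner-wide ceiling instead of fixing it per customer. The sketch below assumes you pick that ceiling yourself (the value 50 req/sec in the example is hypothetical, since no platform limit is published) and caps each customer at the per-workload rate.

```javascript
// Sketch: split one partner-wide budget across the customers that are
// currently active, never exceeding the per-customer cap.
function perCustomerRate(totalBudgetPerSecond, activeCustomers, maxPerCustomer = 5) {
  if (activeCustomers <= 0) return 0;
  const fairShare = totalBudgetPerSecond / activeCustomers;
  return Math.min(maxPerCustomer, fairShare);
}

// Hypothetical 50 req/sec partner-wide ceiling:
console.log(perCustomerRate(50, 100)); // 0.5 req/sec each
console.log(perCustomerRate(50, 4));   // 5 req/sec each (fair share of 12.5 is capped)
```

The result can be passed as `requestsPerSecond` when constructing a per-customer throttled client.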

Monitoring rate-limit pressure

Track metrics that surface rate-limit issues before they become customer-facing:

Metric	Healthy baseline	Alert threshold
`429` rate per customer	0	Any non-zero sustained
Average request latency per customer	Stable	Sudden spike (often a precursor to `429`)
Time between requests per customer	At configured pacing	Interval shorter than the configured pacing (indicates a throttle bug)
Total requests per customer per day	Steady	Sudden growth without traffic change

A minimal instrumentation wrapper that records latency and counts 429 responses per customer:

async function request(url, options, customerId) {
  const start = Date.now();
  const response = await fetch(url, options);
  const latencyMs = Date.now() - start;

  // metrics and simplifyUrl stand in for your own metrics client and URL normalizer
  metrics.timing('vomo.request.latency', latencyMs, { endpoint: simplifyUrl(url), customerId });

  if (response.status === 429) {
    metrics.increment('vomo.request.rate_limited', { customerId });
    // Alert if this is sustained
  }

  return response;
}

Per-customer 429 rate is the canary. If one customer’s integration starts seeing rate limits, investigate before it affects others.
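"Sustained" needs a working definition before it can drive an alert. A minimal sketch, assuming a sliding window and a hit threshold that you choose (the 5-minute window and threshold of 3 below are illustrative defaults, not platform guidance):

```javascript
// Sketch: track recent 429 responses per customer in a sliding time window.
class RateLimitMonitor {
  constructor({ windowMs = 5 * 60 * 1000, threshold = 3 } = {}) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.hits = new Map(); // customerId -> timestamps (ms) of recent 429s
  }

  // Call once per response. Returns true when the customer has reached the
  // threshold of 429s inside the window and should trigger an alert.
  record(customerId, status, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(customerId) ?? []).filter((t) => t > cutoff);
    if (status === 429) recent.push(nowMs);
    this.hits.set(customerId, recent);
    return recent.length >= this.threshold;
  }
}
```

A single 429 stays below the threshold and is handled by normal backoff; repeated hits inside the window are what warrant investigation.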

When the integration is the culprit

A 429 response means your integration’s traffic exceeded what the platform allows. Common causes:

Cause	Solution
Workload changed scale (new customer, new feature)	Spread the new load over more time
Inefficient pagination (re-reading pages, not using `links.next`)	Use `links.next` and stop at `null`
Polling too frequently	Increase the poll interval; use longer windows per poll
Lookups in a tight loop without batching	Batch or cache the lookups
Reference data re-fetched on every request	Cache reference data with appropriate TTL

The fixes are often more substantial than “just slow down.” Address the underlying inefficiency rather than papering over it with throttling. See API Performance Tips for the broader performance patterns.
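As one example of fixing the inefficiency rather than throttling around it, reference data can be served from a small TTL cache. This is a sketch, not a prescribed implementation: `TtlCache` and its options are illustrative names, and the right TTL depends on how often the reference data actually changes.

```javascript
// Sketch: cache async lookups for a fixed time-to-live.
class TtlCache {
  constructor({ ttlMs, now = Date.now }) {
    this.ttlMs = ttlMs;
    this.now = now; // Injectable clock, mainly for tests
    this.entries = new Map(); // key -> { promise, expiresAt }
  }

  getOrFetch(key, fetcher) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.promise;

    // Store the promise itself so concurrent callers share one request.
    const promise = Promise.resolve().then(fetcher);
    this.entries.set(key, { promise, expiresAt: this.now() + this.ttlMs });

    // Do not keep failures in the cache; the next caller should retry.
    promise.catch(() => {
      if (this.entries.get(key)?.promise === promise) this.entries.delete(key);
    });
    return promise;
  }
}
```

Caching the in-flight promise, rather than the resolved value, also collapses a burst of identical lookups into a single API call.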

When the platform is the culprit

Sometimes the rate limit feels too strict for the workload — typically because the customer has growth needs that exceed what the default budget supports. The path forward:

Step	Action
Document the workload requirements	How many records, how often, by which integration path
Confirm the limit is being hit consistently	Not just occasional; show the data
Coordinate with the customer’s VOMO concierge	Request a higher rate tier for this customer’s account
Confirm the new limit works	After the upgrade, monitor to confirm

Rate-limit tiers (if they exist) are administered per-customer, not per-integration. The customer’s relationship with VOMO is the path to a higher tier.

Polling-specific rate considerations

Volunteer has no webhooks — partner integrations that need change detection must poll. This creates a specific rate-limit consideration: don’t poll faster than your business need actually requires.

Workload type	Reasonable poll frequency
Daily reporting refresh	Once daily
Same-day data freshness	Once every 1–4 hours
Near-real-time sync	Once every 5–15 minutes
Real-time required	Reconsider the design — polling at <1 minute intervals is rarely sustainable

For partner integrations that need real-time updates, the architecture, not the rate limit, is usually the issue. Polling every 30 seconds wastes rate-limit budget and still has a 30-second lag. Consider whether your customer’s workflow genuinely needs sub-minute freshness or whether the perceived need is a constraint that could be relaxed. See Polling and Sync for the broader polling architecture.
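The cost of a tighter poll interval is easy to quantify. The sketch below counts requests per day for one customer and one resource; `pagesPerPoll` is an assumed figure you would measure for your own data.

```javascript
// Sketch: requests per day generated by polling one resource on a fixed interval.
function dailyPollRequests(intervalMinutes, pagesPerPoll = 1) {
  const pollsPerDay = Math.ceil((24 * 60) / intervalMinutes);
  return pollsPerDay * pagesPerPoll;
}

console.log(dailyPollRequests(15));  // 96 requests/day at a 15-minute interval
console.log(dailyPollRequests(0.5)); // 2880 requests/day at a 30-second interval
```

Moving from 15 minutes to 30 seconds multiplies the request volume by 30, per customer and per polled resource, before any data has changed.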

A common-cases reference

Quick examples of what reasonable rate-limit handling looks like for typical workloads:

Interactive lookup (user-waiting workflow)

// Direct request without throttling: fast for users
async function lookupUser(userId, retried = false) {
  const response = await fetch(`https://api.vomo.org/v1/users/${userId}`, {
    headers: { Authorization: `Bearer ${token}` },
  });

  if (response.status === 429 && !retried) {
    // Fall back to 5s if Retry-After is missing or not a number of seconds
    const seconds = Number.parseInt(response.headers.get('Retry-After') ?? '', 10);
    await sleep((Number.isFinite(seconds) ? seconds : 5) * 1000);
    return lookupUser(userId, true); // Single retry
  }

  if (!response.ok) throw new Error(`Lookup failed: ${response.status}`);
  return response.json();
}

Steady-state polling (every 15 minutes)

async function pollUsersSync(customerId) {
  const client = new ThrottledVomoClient({ token, requestsPerSecond: 3 });
  const lastSync = await getCheckpoint(customerId);
 
  let url = `https://api.vomo.org/v1/users?updated_after=${encodeURIComponent(lastSync)}`;
 
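  while (url) {
    const response = await client.request(url);
    // Stop on non-2xx responses instead of parsing an error body as a page
    if (!response.ok) throw new Error(`Poll failed: ${response.status}`);
    const page = await response.json();
    await processUsers(customerId, page.data);
    url = page.links.next;
  }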
 
  await advanceCheckpoint(customerId);
}
 
// Run on 15-minute interval
setInterval(() => pollUsersSync(customerId).catch(console.error), 15 * 60 * 1000);
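One caveat with a fixed timer: `setInterval` fires on schedule even if the previous poll is still running, so a slow sync can overlap with the next one and double the request rate. A minimal guard, sketched here with an illustrative helper name, skips a tick while a run is in flight.

```javascript
// Sketch: wrap an async task so that a new call is skipped while a previous
// call is still running.
function skipIfRunning(task) {
  let inFlight = false;
  return async (...args) => {
    if (inFlight) return { skipped: true };
    inFlight = true;
    try {
      return { skipped: false, value: await task(...args) };
    } finally {
      inFlight = false; // Released on success and on failure
    }
  };
}

// Usage with the polling function above (not executed here):
// const guardedPoll = skipIfRunning(pollUsersSync);
// setInterval(() => guardedPoll(customerId).catch(console.error), 15 * 60 * 1000);
```

Skipped ticks are worth counting as a metric: frequent skips mean the sync takes longer than the interval and the schedule or pacing needs revisiting.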

Backfill (one-time bulk read)

async function backfillUsers(customerId) {
  const client = new ThrottledVomoClient({ token, requestsPerSecond: 2 });
  let url = 'https://api.vomo.org/v1/users';
 
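  while (url) {
    const response = await client.request(url);
    // Stop on non-2xx responses instead of parsing an error body as a page
    if (!response.ok) throw new Error(`Backfill failed: ${response.status}`);
    const page = await response.json();
    await processUsers(customerId, page.data);

    console.log(`Backfill: ${page.meta.to}/${page.meta.total}`);
    url = page.links.next;
  }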
}

Backfills should be throttled harder than steady-state syncs. They are a sustained bulk read, so slower pacing keeps long-running pressure off the API.
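Because a slow backfill can run for a long time, it is worth making it resumable so that an interruption does not restart from the first page and spend the same requests twice. The sketch below persists the next page URL after each page; `fetchPage`, `handlePage`, `loadCursor`, and `saveCursor` are hypothetical callbacks you would supply, and the `data` / `links.next` page shape follows the examples above.

```javascript
// Sketch: a backfill loop that saves its position after every page.
async function resumableBackfill({ startUrl, fetchPage, handlePage, loadCursor, saveCursor }) {
  // Resume from the saved cursor if a previous run was interrupted.
  let url = (await loadCursor()) ?? startUrl;
  let pages = 0;

  while (url) {
    const page = await fetchPage(url);
    await handlePage(page.data);
    pages += 1;

    url = page.links?.next ?? null;
    // Persist only after the page is handled, so a crash re-reads at most one page.
    await saveCursor(url);
  }
  return pages; // Number of pages processed in this run
}
```

In this sketch a saved cursor of `null` means either "not started" or "finished", so record completion separately if the backfill must never run twice. Page handling should also be idempotent, since the page in flight during a crash is processed again on resume.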

A rate-limit checklist

Walk through this when designing or auditing the integration:

All API requests go through a throttled client (sequential or token-bucket)
429 responses honor Retry-After when present
Default backoff (when no Retry-After) is at least 15 seconds
Per-customer request volume is monitored
429 rate is alerted on (any sustained non-zero)
Polling intervals match the actual business need (not “as fast as possible”)
Reference data is cached to avoid repeated lookups
Bulk operations use throttling sized for sustained work
Interactive operations have a separate (higher) throttle
No retry-forever loops: bounded attempts only

These practices keep the integration well-behaved even though the specific rate limit isn’t documented.

Where to go next

The pagination pattern that’s tightly coupled with rate-limit-aware design.
The classification framework for 429 and other error responses.
The broader patterns for keeping request volume manageable.
The polling-specific architecture that depends on rate-limit-aware design.
