StatsAI docs

Sync contract

The sync_batch.v1 contract between the StatsAI CLI and the hosted API, including privacy defaults and response shape.

Last updated: June 9, 2026

Who this is for

This page documents the boundary between the open-source collector and the hosted StatsAI API. If you are integrating a compatible backend or auditing what leaves your machine, start here. For a user-facing summary of what syncs, see the privacy model.

Contract boundary

sync_batch.v1 is the first backend-facing contract for StatsAI. The collector owns local scanning, normalization, idempotent local storage, and privacy scrubbing. The hosted API owns authentication, validation, deduplication, rollups, and dashboard queries.

Producing a batch

Shell
statsai sync --dry-run
statsai sync --sink stdout
statsai sync --sink file --output ./statsai-sync-batch.json
statsai sync --sink http --since-last
statsai sync --sink http --verify
statsai schema sync-batch

statsai schema sync-batch prints the JSON Schema that the CLI validates each batch against before sending it over HTTP. Use --dry-run for a quick count summary, or the file or stdout sink when you need the full payload.

Privacy defaults

The production sync path strips record-level local evidence before sending:

Field                                   Why it is removed
SourceLocation.path_label               Hides local directory names
ProviderAccount.plan_name               Redundant with subscription records
UsageEvent.source.source_record_id      Raw local record pointer
UsageEvent.parse_evidence.*             Line numbers and record IDs from parsing
UsageSummary.source.source_record_id    Raw local record pointer
UsageSummary.parse_evidence.*           Line numbers and record IDs from parsing
Subscription.notes                      User-entered private commentary

Hashed path, source, event, and summary identifiers remain so the server can deduplicate records without seeing local file names directly. ProjectInfo.path_label is retained for owner-facing project location displays and manual project linking.
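
The removal rules in the table above can be sketched in a few lines of Python. This is an illustrative sketch, not the collector's actual implementation: the record layout (nested dicts), the helper name scrub, the sample field names event_id and source_id, and the reading of ".*" as "drop the whole sub-object" are all assumptions.

```python
from copy import deepcopy

# Dotted field paths removed per record type, copied from the table above.
SCRUB_RULES = {
    "SourceLocation": ["path_label"],
    "ProviderAccount": ["plan_name"],
    "UsageEvent": ["source.source_record_id", "parse_evidence.*"],
    "UsageSummary": ["source.source_record_id", "parse_evidence.*"],
    "Subscription": ["notes"],
}


def scrub(record_type: str, record: dict) -> dict:
    """Return a copy of record with the listed fields removed (hypothetical helper)."""
    cleaned = deepcopy(record)
    for path in SCRUB_RULES.get(record_type, []):
        # Assumption: "x.*" means the whole "x" sub-object is dropped.
        if path.endswith(".*"):
            path = path[:-2]
        parts = path.split(".")
        parent = cleaned
        for key in parts[:-1]:
            parent = parent.get(key)
            if not isinstance(parent, dict):
                break  # nothing to remove on this path
        else:
            parent.pop(parts[-1], None)
    return cleaned
```

Record types without a rule, such as ProjectInfo, pass through unchanged, which matches the retention of ProjectInfo.path_label described above.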

Canonical provider account identity may sync through ProviderAccount.provider_user_id and ProviderAccount.email. User-defined aliases remain in ProviderAccount.account_label for display, but they are not the primary account key.

HTTP endpoint

Production sync posts to:

Text
POST /api/sync/batches

A compatible backend should:

  • require an authenticated device access token
  • accept Authorization: Bearer <device_access_token>
  • validate the request body against sync_batch.v1
  • reject unsupported schema_version values
  • deduplicate sources, accounts, assignments, subscriptions, and summaries by stable IDs
  • compute daily, monthly, and dashboard rollups server-side from accepted summaries
  • return accepted, updated, duplicate, and rejected counts

The loopback daemon still supports /v1/sync/batches for local diagnostics, but /api/sync/batches is the production contract.
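
For backend implementers, the first checks in the list above (bearer token present, supported schema_version) might look like the following sketch. The function name precheck, the HTTP status codes, and the assumption that the batch carries a top-level schema_version field are illustrative and not part of the contract. A real backend would also verify the token against its device store and validate the full body against the JSON Schema.

```python
SUPPORTED_SCHEMA_VERSIONS = {"sync_batch.v1"}


def precheck(headers: dict, body: dict) -> tuple:
    """Return (status, message) for the cheap checks done before full validation.

    Hypothetical helper; the status codes are an assumption, not part of the contract.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token.strip():
        return 401, "missing or malformed bearer token"
    # Assumption: the batch declares its version in a top-level schema_version field.
    if body.get("schema_version") not in SUPPORTED_SCHEMA_VERSIONS:
        return 400, "unsupported schema_version"
    return 200, "ok"
```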

Response shape

Successful HTTP sync returns sync_ack.v1:

JSON
{
  "schema_version": "sync_ack.v1",
  "batch_id": "batch_1710000000000",
  "accepted": {
    "sources": 1,
    "accounts": 1,
    "source_account_assignments": 1,
    "subscriptions": 0,
    "events": 1,
    "summaries": 0
  },
  "duplicates": {
    "sources": 0,
    "accounts": 0,
    "source_account_assignments": 0,
    "subscriptions": 0,
    "events": 0,
    "summaries": 0
  },
  "rejected": []
}

The HTTP sink parses sync_ack.v1 before updating local sync state.
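
A client that wants the same "parse before updating state" behavior could check the acknowledgement along these lines. This is a minimal sketch: the function name parse_sync_ack and the strictness (requiring every count key shown in the example response) are assumptions, not documented CLI behavior.

```python
import json

# Count keys as they appear in the example response above.
ACK_COUNT_KEYS = (
    "sources",
    "accounts",
    "source_account_assignments",
    "subscriptions",
    "events",
    "summaries",
)


def parse_sync_ack(raw: str) -> dict:
    """Parse a response body and raise ValueError unless it looks like sync_ack.v1."""
    ack = json.loads(raw)
    if ack.get("schema_version") != "sync_ack.v1":
        raise ValueError("unexpected schema_version: %r" % ack.get("schema_version"))
    for section in ("accepted", "duplicates"):
        counts = ack.get(section)
        if not isinstance(counts, dict):
            raise ValueError("missing section: " + section)
        for key in ACK_COUNT_KEYS:
            if not isinstance(counts.get(key), int):
                raise ValueError("missing count: %s.%s" % (section, key))
    if not isinstance(ack.get("rejected"), list):
        raise ValueError("rejected must be a list")
    return ack
```

Only after this function returns without raising would the caller advance its local sync state.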

Incremental sync

After a successful sync, the collector records local sync state keyed by sink and target. Passing --since-last sends only events and summaries after the recorded cursor while still including current source, account, assignment, and subscription metadata.
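
The selection rule can be illustrated with a short sketch. The cursor format is not specified on this page, so the example assumes an ISO-8601 timestamp cursor and a hypothetical recorded_at field on each event and summary. The top-level keys and the function name select_since_last are also assumptions.

```python
from datetime import datetime
from typing import Optional

# Only these collections are filtered by the cursor; metadata is always sent.
INCREMENTAL_KEYS = ("events", "summaries")


def select_since_last(records: dict, cursor: Optional[str]) -> dict:
    """Keep events and summaries newer than cursor; pass everything else through."""
    selected = dict(records)
    if cursor is None:
        return selected  # no prior sync recorded: send everything
    cutoff = datetime.fromisoformat(cursor)
    for key in INCREMENTAL_KEYS:
        selected[key] = [
            r for r in records.get(key, [])
            if datetime.fromisoformat(r["recorded_at"]) > cutoff
        ]
    return selected
```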

Auth token precedence, highest first:

Text
--auth-token > STATSAI_SYNC_TOKEN > stored device access token
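
Expressed as code, the precedence order reads as follows. This is a sketch under assumptions: the function name resolve_auth_token is hypothetical, and treating an empty string as "not set" is a guess rather than documented CLI behavior.

```python
import os
from typing import Mapping, Optional


def resolve_auth_token(
    cli_token: Optional[str],
    stored_token: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the first available token: --auth-token, STATSAI_SYNC_TOKEN, stored token."""
    for candidate in (cli_token, env.get("STATSAI_SYNC_TOKEN"), stored_token):
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return None
```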

Repeated batches are idempotent by stable IDs. The dashboard reads compact API responses backed by server-side rollups instead of scanning all synced records in the browser.
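
To make the idempotency guarantee concrete, here is a toy in-memory store that counts a record as accepted the first time its stable ID is seen and as a duplicate afterwards. It is a sketch only: the batch layout (top-level lists keyed by record kind, each record carrying an id field) and the class name BatchStore are assumptions, and a real backend would persist IDs in a database.

```python
from collections import Counter

KINDS = (
    "sources",
    "accounts",
    "source_account_assignments",
    "subscriptions",
    "events",
    "summaries",
)


class BatchStore:
    """Hypothetical in-memory dedupe store keyed by (kind, stable ID)."""

    def __init__(self) -> None:
        self._seen = set()

    def ingest(self, batch: dict) -> dict:
        accepted, duplicates = Counter(), Counter()
        for kind in KINDS:
            for record in batch.get(kind, []):
                key = (kind, record["id"])
                if key in self._seen:
                    duplicates[kind] += 1
                else:
                    self._seen.add(key)
                    accepted[kind] += 1
        return {
            "accepted": {k: accepted[k] for k in KINDS},
            "duplicates": {k: duplicates[k] for k in KINDS},
        }
```

Sending the same batch twice leaves the store unchanged on the second call and moves every count from accepted to duplicates.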