StatsAI docs
Sync contract
The sync_batch.v1 contract between the StatsAI CLI and the hosted API, including privacy defaults and response shape.
Last updated: June 9, 2026
Who this is for
This page documents the boundary between the open-source collector and the hosted StatsAI API. If you are integrating a compatible backend or auditing what leaves your machine, start here. For a user-facing summary of what syncs, see the privacy model.
Contract boundary
sync_batch.v1 is the first backend-facing contract for StatsAI. The collector
owns local scanning, normalization, idempotent local storage, and privacy
scrubbing. The hosted API owns authentication, validation, deduplication,
rollups, and dashboard queries.
Producing a batch
statsai sync --dry-run
statsai sync --sink stdout
statsai sync --sink file --output ./statsai-sync-batch.json
statsai sync --sink http --since-last
statsai sync --sink http --verify
statsai schema sync-batch
Use schema sync-batch to print the JSON Schema the CLI validates against
before sending HTTP batches. Use --dry-run for a quick count summary, or a
file/stdout sink when you need the full payload.
Privacy defaults
The production sync path strips record-level local evidence before sending:
| Field | Why it is removed |
|---|---|
| SourceLocation.path_label | Hides local directory names |
| ProviderAccount.plan_name | Redundant with subscription records |
| UsageEvent.source.source_record_id | Raw local record pointer |
| UsageEvent.parse_evidence.* | Line numbers and record IDs from parsing |
| UsageSummary.source.source_record_id | Raw local record pointer |
| UsageSummary.parse_evidence.* | Line numbers and record IDs from parsing |
| Subscription.notes | User-entered private commentary |
Hashed path, source, event, and summary identifiers remain so the server can
deduplicate records without seeing local file names directly.
ProjectInfo.path_label is retained for owner-facing project location displays
and manual project linking.
Canonical provider account identity may sync through ProviderAccount.provider_user_id
and ProviderAccount.email. User-defined aliases remain in
ProviderAccount.account_label for display, but they are not the primary
account key.
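As a rough illustration of the scrubbing step, the sketch below removes the fields listed in the table from a single record. The field names come from the table; the nested-dict record layout and the helper name scrub_record are assumptions made for this example, not the collector's actual implementation.

```python
import copy

# Field paths from the privacy table. Treating records as nested dicts
# is an assumption made for this sketch.
STRIPPED_FIELDS = {
    "SourceLocation": ["path_label"],
    "ProviderAccount": ["plan_name"],
    "UsageEvent": ["source.source_record_id", "parse_evidence"],
    "UsageSummary": ["source.source_record_id", "parse_evidence"],
    "Subscription": ["notes"],
}


def scrub_record(record_type, record):
    """Return a copy of `record` without the fields stripped for its type."""
    cleaned = copy.deepcopy(record)
    for dotted in STRIPPED_FIELDS.get(record_type, []):
        *parents, leaf = dotted.split(".")
        node = cleaned
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent missing, nothing to strip
        else:
            node.pop(leaf, None)
    return cleaned
```

Record types that are not in the mapping, such as ProjectInfo, pass through unchanged, which matches the retained ProjectInfo.path_label described above.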
HTTP endpoint
Production sync posts to:
POST /api/sync/batches
A compatible backend should:
- require an authenticated device access token
- accept Authorization: Bearer <device_access_token>
- validate the request body against sync_batch.v1
- reject unsupported schema_version values
- deduplicate sources, accounts, assignments, subscriptions, and summaries by stable IDs
- compute daily, monthly, and dashboard rollups server-side from accepted summaries
- return accepted, updated, duplicate, and rejected counts
The loopback daemon still supports /v1/sync/batches for local diagnostics,
but /api/sync/batches is the production contract.
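The checklist above could be sketched as a framework-free handler along these lines. Several details are assumptions for illustration only: the top-level batch keys, a per-record "id" field, the 401/400 status codes, and the in-memory seen_ids store. Full JSON Schema validation and server-side rollups are left out, and updated counts are omitted because the example acknowledgement does not show how they are keyed.

```python
SUPPORTED_SCHEMA = "sync_batch.v1"

# Collection names mirror the count keys of the sync_ack.v1 example.
# Using them as top-level batch keys is an assumption.
COLLECTIONS = (
    "sources",
    "accounts",
    "source_account_assignments",
    "subscriptions",
    "events",
    "summaries",
)


def handle_sync_batch(headers, batch, valid_tokens, seen_ids):
    """Return (http_status, response_body) for one posted batch."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or token not in valid_tokens:
        return 401, {"error": "unauthorized"}  # hypothetical error shape
    if batch.get("schema_version") != SUPPORTED_SCHEMA:
        return 400, {"error": "unsupported schema_version"}

    accepted = dict.fromkeys(COLLECTIONS, 0)
    duplicates = dict.fromkeys(COLLECTIONS, 0)
    for name in COLLECTIONS:
        known = seen_ids.setdefault(name, set())
        for record in batch.get(name, []):
            # "id" stands in for whatever stable ID the contract defines.
            if record["id"] in known:
                duplicates[name] += 1
            else:
                known.add(record["id"])
                accepted[name] += 1

    return 200, {
        "schema_version": "sync_ack.v1",
        "batch_id": batch.get("batch_id"),
        "accepted": accepted,
        "duplicates": duplicates,
        "rejected": [],
    }
```

Posting the same batch twice against the same seen_ids store moves every record from accepted to duplicates on the second call, which is the idempotent behavior the contract asks for.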
Response shape
Successful HTTP sync returns sync_ack.v1:
{
  "schema_version": "sync_ack.v1",
  "batch_id": "batch_1710000000000",
  "accepted": {
    "sources": 1,
    "accounts": 1,
    "source_account_assignments": 1,
    "subscriptions": 0,
    "events": 1,
    "summaries": 0
  },
  "duplicates": {
    "sources": 0,
    "accounts": 0,
    "source_account_assignments": 0,
    "events": 0,
    "summaries": 0,
    "subscriptions": 0
  },
  "rejected": []
}
The HTTP sink parses sync_ack.v1 before updating local sync state.
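A client that wants to mirror this check might parse the acknowledgement roughly as follows. This is a minimal sketch based only on the example body above; the function name parse_sync_ack and its return shape are hypothetical.

```python
import json


def parse_sync_ack(raw):
    """Parse a sync_ack.v1 body.

    Returns (batch_id, accepted_total, duplicate_total, rejected).
    Raises ValueError if the body is not a sync_ack.v1 document.
    """
    ack = json.loads(raw)
    version = ack.get("schema_version")
    if version != "sync_ack.v1":
        raise ValueError(f"unexpected schema_version: {version!r}")
    accepted_total = sum(ack["accepted"].values())
    duplicate_total = sum(ack["duplicates"].values())
    return ack["batch_id"], accepted_total, duplicate_total, ack["rejected"]
```

Refusing to continue on an unexpected schema_version means local sync state is only advanced for responses the client actually understands.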
Incremental sync
After a successful sync, the collector records local sync state keyed by sink
and target. Passing --since-last sends only events and summaries after the
recorded cursor while still including current source, account, assignment, and
subscription metadata.
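The selection rule for --since-last can be pictured with the sketch below. It assumes sync state is a mapping keyed by (sink, target) and that each event or summary carries a comparable "cursor" value; both the key layout and the field name are hypothetical and chosen only to illustrate the behavior.

```python
def select_since_last(batch, sync_state, sink, target):
    """Return a copy of `batch` trimmed to records after the stored cursor.

    Only events and summaries are filtered; source, account, assignment,
    and subscription metadata pass through unchanged.
    """
    cursor = sync_state.get((sink, target))  # None means no prior sync
    trimmed = dict(batch)
    for name in ("events", "summaries"):
        trimmed[name] = [
            record
            for record in batch.get(name, [])
            # "cursor" is a hypothetical per-record ordering field.
            if cursor is None or record["cursor"] > cursor
        ]
    return trimmed
```

With no recorded state for a sink and target pair, the function returns everything, so a first sync and a full sync behave the same way.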
Auth token precedence:
--auth-token > STATSAI_SYNC_TOKEN > stored device access token
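The precedence order can be expressed as a small resolver. The function below is a sketch, not the CLI's code: resolve_auth_token and its parameters are hypothetical, and treating an empty string as "not set" is an assumption.

```python
import os


def resolve_auth_token(cli_token=None, stored_token=None, env=None):
    """Pick the token to send, following the documented precedence.

    Order: --auth-token value, then STATSAI_SYNC_TOKEN, then the stored
    device access token. Returns None if none of them is set.
    """
    env = os.environ if env is None else env
    candidates = (cli_token, env.get("STATSAI_SYNC_TOKEN"), stored_token)
    for candidate in candidates:
        if candidate:  # empty strings count as unset (assumption)
            return candidate
    return None
```

Passing env explicitly keeps the function easy to test without touching the real process environment.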
Repeated batches are idempotent by stable IDs. The dashboard reads compact API responses backed by server-side rollups instead of scanning all synced records in the browser.