Audit methodology

A transparent, reproducible methodology for auditing a merchant's agent-readiness — so any team can run it internally and any auditor can defend its findings.

Updated: April 2026 · Primary query: agent commerce audit methodology

This page documents the audit methodology that backs the readiness checklist. It is intentionally reproducible: a merchant team, a consultant, and a platform partner should land on similar scores when they run it against the same catalog.

Principles

  • Sample-based. You never need to audit every SKU; you need a statistically meaningful sample per category.
  • Weighted. P0 items are blocking; P1 and P2 weight the score progressively.
  • Observable. Every check produces a concrete pass/fail with evidence (URL, feed row, screenshot, API payload).
  • Reproducible. Two auditors running the method on the same data land within 5 percentage points.

Inputs

  1. Access to catalog feed (URL and credentials).
  2. Public PDP URLs.
  3. Public returns and shipping policy pages.
  4. Read-only access to PSP configuration (optional but improves transactional scoring).
  5. Server logs, 90 days (optional but enables observability scoring).

Sampling

  1. Identify top 5 categories by revenue.
  2. For each category, take a random sample of 10 SKUs (stratified by in-stock / out-of-stock if possible).
  3. For catalogs with >10,000 SKUs, increase sample to 20 per category.
  4. Exclude end-of-life and draft SKUs.
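The sampling steps above can be sketched in code. This is a minimal illustration, not a prescribed implementation: the catalog is assumed to be a list of dicts with `category`, `status`, and `in_stock` fields (field names are hypothetical), and a fixed random seed keeps the draw reproducible across auditors.

```python
import random

def sample_skus(catalog, categories, per_category=10, large_per_category=20,
                large_threshold=10_000, seed=42):
    """Draw a stratified random sample of SKUs for each audited category.

    Assumes each SKU is a dict with "category", "status" and "in_stock"
    keys -- illustrative field names, adapt to your feed schema.
    """
    rng = random.Random(seed)  # fixed seed => two auditors draw the same sample
    # Catalogs above the threshold get the larger per-category sample.
    n = large_per_category if len(catalog) > large_threshold else per_category
    sample = {}
    for cat in categories:
        # Exclude end-of-life and draft SKUs before sampling.
        eligible = [s for s in catalog
                    if s["category"] == cat and s["status"] == "active"]
        # Stratify by stock state where possible.
        in_stock = [s for s in eligible if s["in_stock"]]
        out_stock = [s for s in eligible if not s["in_stock"]]
        half = n // 2
        picked = (rng.sample(in_stock, min(half, len(in_stock))) +
                  rng.sample(out_stock, min(n - half, len(out_stock))))
        # Top up from the full eligible pool if one stratum was too small.
        remaining = [s for s in eligible if s not in picked]
        if len(picked) < n and remaining:
            picked += rng.sample(remaining, min(n - len(picked), len(remaining)))
        sample[cat] = picked
    return sample
```

Pinning the seed is what makes the "two auditors land within 5 points" principle testable: both runs audit the exact same SKUs.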

Check categories (six)

Category       Weight   What it measures
Identity       15%      GTIN / MPN / Brand coverage and consistency.
Semantics      25%      Structured data, typed attributes, taxonomy.
Freshness      20%      Price/stock parity, feed cadence, updated_at accuracy.
Policy         15%      Returns/shipping/warranty as data.
Discovery      15%      Feed compliance, canonical URLs, sitemap, AI-crawler access.
Transaction    10%      Agent-pay readiness, lifecycle events.

Scoring rules

  • Each check returns 0 (fail), 0.5 (partial) or 1 (pass).
  • Category score = average of checks in that category.
  • Overall score = weighted sum across categories.
  • If any P0 check fails, the category score is capped at 0.6.
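The scoring rules above reduce to a few lines of arithmetic. A minimal sketch, using the category weights from the table; the check representation (a list of `(result, is_p0)` pairs) is an assumption for illustration:

```python
# Category weights from the methodology table; they must sum to 1.0.
WEIGHTS = {"identity": 0.15, "semantics": 0.25, "freshness": 0.20,
           "policy": 0.15, "discovery": 0.15, "transaction": 0.10}

def category_score(checks):
    """checks: list of (result, is_p0) pairs, result in {0, 0.5, 1}."""
    score = sum(result for result, _ in checks) / len(checks)
    # A failed P0 check caps the category score at 0.6.
    if any(result == 0 and is_p0 for result, is_p0 in checks):
        score = min(score, 0.6)
    return score

def overall_score(checks_by_category):
    """Weighted sum of category scores across the six categories."""
    return sum(WEIGHTS[cat] * category_score(checks)
               for cat, checks in checks_by_category.items())
```

For example, a category with checks `[(1, False), (0, True), (1, False)]` averages 0.67 but is capped at 0.6 because the failed check is a P0.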

Evidence requirements

Each check must store one of:

  • A URL (for PDP/policy checks), screenshotted at the audit date.
  • A feed row (anonymized, stored as JSON).
  • An API response (headers + body, timestamped).
  • A tool output (Rich Results Test, Merchant Center diagnostic, log query).

Audits without evidence are not reproducible and not defensible.
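One way to keep evidence defensible is to store every check result as a small, timestamped record. This is a sketch under stated assumptions: the field names (`check_id`, `kind`, `payload`) are hypothetical, and the payload here is an anonymized feed row as the list above requires.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class Evidence:
    """One piece of audit evidence: a URL, feed row, API response or tool output."""
    check_id: str   # e.g. "freshness.price_parity" -- naming scheme is illustrative
    kind: str       # "url" | "feed_row" | "api_response" | "tool_output"
    payload: dict   # anonymized feed row, headers+body, tool result, etc.
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = Evidence(
    check_id="freshness.price_parity",
    kind="feed_row",
    payload={"sku": "SKU-REDACTED", "price": "49.90",
             "updated_at": "2026-04-01T08:00:00Z"},
)
print(json.dumps(asdict(record), indent=2))
```

Serializing each record to JSON at capture time means a re-audit can diff evidence line by line rather than argue about memory.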

Deliverables of an audit

  1. Score card — overall, by category, per SKU sampled.
  2. Top 10 remediation items — concrete, ranked by impact and effort.
  3. 90-day remediation plan — who owns what, expected score lift.
  4. Re-audit schedule — quarterly recommended.

Common audit pitfalls

  • Auditing marketing promises instead of system behaviour. Beautiful returns-page prose means nothing if the JSON-LD is missing.
  • Sampling only hero SKUs. Long-tail SKUs fail checks far more often.
  • Trusting the feed alone. Feed and PDP must be cross-checked.
  • Ignoring region variants — a catalog can be agent-ready in one region and not in another.
  • Grading schema.org markup on presence instead of correctness.

Tools we recommend

  • Google Rich Results Test and Schema Markup Validator.
  • Google Merchant Center diagnostics.
  • A feed linter (Channable, Feedonomics, or custom scripts).
  • A headless browser for consent-wall / rendering checks.
  • A log analyzer for crawler-traffic segmentation.
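For the last item, even a simple script can segment crawler traffic by user agent. A minimal sketch: the crawler list below is an assumption (the relevant bots change over time), and real analysis should verify claimed bots via reverse DNS rather than trust the user-agent string.

```python
import re
from collections import Counter

# Illustrative user-agent patterns; extend with whichever AI crawlers matter to you.
AGENT_PATTERNS = {
    "googlebot": re.compile(r"Googlebot", re.I),
    "gptbot": re.compile(r"GPTBot", re.I),
    "claudebot": re.compile(r"ClaudeBot", re.I),
    "perplexitybot": re.compile(r"PerplexityBot", re.I),
}

def segment_crawler_traffic(log_lines):
    """Count requests per known crawler across raw access-log lines."""
    counts = Counter()
    for line in log_lines:
        for name, pattern in AGENT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
        else:
            counts["other"] += 1  # human traffic or unrecognized bots
    return counts
```

Run over the 90 days of logs listed under Inputs, this gives the observability evidence for the Discovery category: which agents are actually fetching your PDPs, and how often.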

Where to go next