Audit methodology

A transparent, reproducible methodology for auditing a merchant's agent-readiness — so any team can run it internally and any auditor can defend its findings.

Updated: April 2026 · Primary query: agent commerce audit methodology

This page documents the audit methodology that backs the readiness checklist. It is intentionally reproducible: a merchant team, a consultant, and a platform partner should land on similar scores when they run it against the same catalog.

Principles

  • Sample-based. You never need to audit every SKU; you need a statistically meaningful sample per category.
  • Weighted. P0 items are blocking; P1 and P2 weight the score progressively.
  • Observable. Every check produces a concrete pass/fail with evidence (URL, feed row, screenshot, API payload).
  • Reproducible. Two auditors running the method on the same data land within 5 percentage points.

Inputs

  1. Access to catalog feed (URL and credentials).
  2. Public PDP URLs.
  3. Public returns and shipping policy pages.
  4. Read-only access to PSP configuration (optional but improves transactional scoring).
  5. Server logs, 90 days (optional but enables observability scoring).

Sampling

  1. Identify top 5 categories by revenue.
  2. For each category, take a random sample of 10 SKUs (stratified by in-stock / out-of-stock if possible).
  3. For catalogs with >10,000 SKUs, increase sample to 20 per category.
  4. Exclude end-of-life and draft SKUs.
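The sampling steps above can be sketched in code. This is a minimal illustration, not a prescribed implementation: the catalog is assumed to be a list of dicts with `category`, `status`, and `in_stock` fields (field names are hypothetical), and a fixed random seed keeps the draw reproducible across auditors.

```python
import random

def sample_skus(catalog, categories, per_category=10, large_per_category=20,
                large_threshold=10_000, seed=42):
    """Draw a stratified random sample of SKUs for each audited category.

    Assumes each SKU is a dict with "category", "status" and "in_stock"
    keys -- illustrative field names, adapt to your feed schema.
    """
    rng = random.Random(seed)  # fixed seed => two auditors draw the same sample
    # Catalogs above the threshold get the larger per-category sample.
    n = large_per_category if len(catalog) > large_threshold else per_category
    sample = {}
    for cat in categories:
        # Exclude end-of-life and draft SKUs before sampling.
        eligible = [s for s in catalog
                    if s["category"] == cat and s["status"] == "active"]
        # Stratify by stock state where possible.
        in_stock = [s for s in eligible if s["in_stock"]]
        out_stock = [s for s in eligible if not s["in_stock"]]
        half = n // 2
        picked = (rng.sample(in_stock, min(half, len(in_stock))) +
                  rng.sample(out_stock, min(n - half, len(out_stock))))
        # Top up from the full eligible pool if one stratum was too small.
        remaining = [s for s in eligible if s not in picked]
        if len(picked) < n and remaining:
            picked += rng.sample(remaining, min(n - len(picked), len(remaining)))
        sample[cat] = picked
    return sample
```

Pinning the seed is what makes the "two auditors land within 5 points" principle testable: both runs audit the exact same SKUs.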

Check categories (six)

Category       Weight   What it measures
Identity       15%      GTIN / MPN / Brand coverage and consistency.
Semantics      25%      Structured data, typed attributes, taxonomy.
Freshness      20%      Price/stock parity, feed cadence, updated_at accuracy.
Policy         15%      Returns/shipping/warranty as data.
Discovery      15%      Feed compliance, canonical URLs, sitemap, AI-crawler access.
Transaction    10%      Agent-pay readiness, lifecycle events.

Scoring rules

  • Each check returns 0 (fail), 0.5 (partial) or 1 (pass).
  • Category score = average of checks in that category.
  • Overall score = weighted sum across categories.
  • If any P0 check fails, the category score is capped at 0.6.
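The scoring rules above reduce to a few lines of arithmetic. A minimal sketch, using the category weights from the table; the check representation (a list of `(result, is_p0)` pairs) is an assumption for illustration:

```python
# Category weights from the methodology table; they must sum to 1.0.
WEIGHTS = {"identity": 0.15, "semantics": 0.25, "freshness": 0.20,
           "policy": 0.15, "discovery": 0.15, "transaction": 0.10}

def category_score(checks):
    """checks: list of (result, is_p0) pairs, result in {0, 0.5, 1}."""
    score = sum(result for result, _ in checks) / len(checks)
    # A failed P0 check caps the category score at 0.6.
    if any(result == 0 and is_p0 for result, is_p0 in checks):
        score = min(score, 0.6)
    return score

def overall_score(checks_by_category):
    """Weighted sum of category scores across the six categories."""
    return sum(WEIGHTS[cat] * category_score(checks)
               for cat, checks in checks_by_category.items())
```

For example, a category with checks `[(1, False), (0, True), (1, False)]` averages 0.67 but is capped at 0.6 because the failed check is a P0.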

Evidence requirements

Each check must store one of:

  • A URL (for PDP/policy checks), screenshotted at the audit date.
  • A feed row (anonymized, stored as JSON).
  • An API response (headers + body, timestamped).
  • A tool output (Rich Results Test, Merchant Center diagnostic, log query).

Audits without evidence are not reproducible and not defensible.
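One way to keep evidence defensible is to store every check result as a small, timestamped record. This is a sketch under stated assumptions: the field names (`check_id`, `kind`, `payload`) are hypothetical, and the payload here is an anonymized feed row as the list above requires.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class Evidence:
    """One piece of audit evidence: a URL, feed row, API response or tool output."""
    check_id: str   # e.g. "freshness.price_parity" -- naming scheme is illustrative
    kind: str       # "url" | "feed_row" | "api_response" | "tool_output"
    payload: dict   # anonymized feed row, headers+body, tool result, etc.
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = Evidence(
    check_id="freshness.price_parity",
    kind="feed_row",
    payload={"sku": "SKU-REDACTED", "price": "49.90",
             "updated_at": "2026-04-01T08:00:00Z"},
)
print(json.dumps(asdict(record), indent=2))
```

Serializing each record to JSON at capture time means a re-audit can diff evidence line by line rather than argue about memory.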

Deliverables of an audit

  1. Score card — overall, by category, per SKU sampled.
  2. Top 10 remediation items — concrete, ranked by impact and effort.
  3. 90-day remediation plan — who owns what, expected score lift.
  4. Re-audit schedule — quarterly recommended.

Common audit pitfalls

  • Auditing marketing promises instead of system behaviour. Beautiful returns-page prose means nothing if the JSON-LD is missing.
  • Sampling only hero SKUs. Long-tail SKUs fail checks far more often.
  • Trusting the feed alone. Feed and PDP must be cross-checked.
  • Ignoring region variants — a catalog can be agent-ready in one region and not in another.
  • Grading schema.org markup on presence instead of correctness.

Tools we recommend

  • Google Rich Results Test and Schema Markup Validator.
  • Google Merchant Center diagnostics.
  • A feed linter (Channable, Feedonomics, or custom scripts).
  • A headless browser for consent-wall / rendering checks.
  • A log analyzer for crawler-traffic segmentation.
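For the last item, even a simple script can segment crawler traffic by user agent. A minimal sketch: the crawler list below is an assumption (the relevant bots change over time), and real analysis should verify claimed bots via reverse DNS rather than trust the user-agent string.

```python
import re
from collections import Counter

# Illustrative user-agent patterns; extend with whichever AI crawlers matter to you.
AGENT_PATTERNS = {
    "googlebot": re.compile(r"Googlebot", re.I),
    "gptbot": re.compile(r"GPTBot", re.I),
    "claudebot": re.compile(r"ClaudeBot", re.I),
    "perplexitybot": re.compile(r"PerplexityBot", re.I),
}

def segment_crawler_traffic(log_lines):
    """Count requests per known crawler across raw access-log lines."""
    counts = Counter()
    for line in log_lines:
        for name, pattern in AGENT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
        else:
            counts["other"] += 1  # human traffic or unrecognized bots
    return counts
```

Run over the 90 days of logs listed under Inputs, this gives the observability evidence for the Discovery category: which agents are actually fetching your PDPs, and how often.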

Where to go next