Provenance & Attribution Schema: Practical Patterns for ClaimReview, Source Chains & AI Citations
Introduction — Why provenance and attribution matter now
As generative engines and retrieval-augmented systems surface condensed answers, the ability to trace an assertion back to verifiable sources has become a core trust and compliance requirement. This article gives implementable patterns for publishers and platform teams to add ClaimReview markup, expose source chains and publish machine-readable AI citations that support both human verification and automated audits.
Key standards and references used in this guide include the ClaimReview type in schema.org, the schema.org citation property, and the W3C PROV model for durable provenance records. Each of these is discussed with practical JSON-LD examples and operational guidance below.
Pattern 1 — ClaimReview: practical rules and a minimal JSON-LD template
ClaimReview remains the canonical schema.org type for recording fact-checks and verified assessments of discrete claims. Google Search Central documents the eligibility and technical rules for ClaimReview, and also states that support for ClaimReview in Search is being phased out while it remains supported in the Fact Check Explorer. Publishers should still implement correct markup, both for interoperability and for other surfaces that consume ClaimReview.
Minimal JSON-LD ClaimReview (embed on the fact-check page):
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2026-01-15",
  "url": "https://example.com/factcheck/claim-123",
  "author": {
    "@type": "Organization",
    "name": "Example FactCheck",
    "url": "https://example.com"
  },
  "claimReviewed": "The product reduces energy usage by 50%.",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": 2,
    "bestRating": 5,
    "alternateName": "Mostly false"
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "name": "Press release from Acme Corp",
    "url": "https://acme.example/press/energy"
  }
}
Implementation notes:
- Keep claimReviewed concise (under ~75 characters) and place the numeric/text rating in reviewRating.
- Only one ClaimReview per page is eligible for the single fact-check rich result; avoid duplicating the same ClaimReview across multiple pages unless they are true variants.
- Document your rating scale and provide links to primary-source evidence inside the article body (structured data must reflect page content).
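The notes above can be turned into a small pre-publish check. The Python sketch below is one possible linter, not an official validator: the required-field list and the name lint_claim_review are assumptions made for illustration, and the 75-character limit is treated as a soft warning rather than a hard rule.

```python
# Assumption: this required-field list is a conservative reading of the
# template above, not an official schema.org or Google rule set.
REQUIRED_FIELDS = ("claimReviewed", "reviewRating", "url", "author")
MAX_CLAIM_LENGTH = 75  # soft limit taken from the implementation notes


def lint_claim_review(doc: dict) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    problems = []
    if doc.get("@type") != "ClaimReview":
        problems.append("@type must be 'ClaimReview'")
    for field in REQUIRED_FIELDS:
        if not doc.get(field):
            problems.append(f"missing field: {field}")

    claim = doc.get("claimReviewed") or ""
    if len(claim) > MAX_CLAIM_LENGTH:
        problems.append(
            f"claimReviewed is {len(claim)} chars (soft limit {MAX_CLAIM_LENGTH})"
        )

    rating = doc.get("reviewRating")
    if isinstance(rating, dict):
        # A rating needs either a numeric value or a textual label.
        if "ratingValue" not in rating and "alternateName" not in rating:
            problems.append("reviewRating needs ratingValue or alternateName")
        value, best = rating.get("ratingValue"), rating.get("bestRating")
        if isinstance(value, (int, float)) and isinstance(best, (int, float)):
            if value > best:
                problems.append("ratingValue exceeds bestRating")
    return problems


# Usage: lint a trimmed copy of the template before publishing.
sample = {
    "@type": "ClaimReview",
    "url": "https://example.com/factcheck/claim-123",
    "author": {"@type": "Organization", "name": "Example FactCheck"},
    "claimReviewed": "The product reduces energy usage by 50%.",
    "reviewRating": {"@type": "Rating", "ratingValue": 2, "bestRating": 5},
}
print(lint_claim_review(sample))
```

A check like this only catches structural slips. It cannot confirm that the markup reflects the visible page content, which still needs human review.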
Pattern 2 — Source chains, AI citation arrays and PROV-backed bundles
Single-line citations are often insufficient for audit. Two complementary approaches improve traceability:
- Publish a concise public citation block (titles, URLs, short excerpt, author, date) that is rendered for users and indexed by crawlers.
- Publish a machine-readable, append-only provenance bundle that records the full source chain (chunk IDs, ingestion job IDs, retriever and reranker versions, timestamps)—use W3C PROV concepts (Entity, Activity, Agent) to model the events and relationships.
In HTML/JSON-LD you can glue these together: add a citation array or isBasedOn links on the CreativeWork/Answer object for browser-friendly citations, and include a provenance pointer (URL) to a PROV bundle for auditors. Schema.org has a citation property designed for references.
Example JSON-LD for an AI-generated answer with citations and a provenance bundle pointer:
{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "name": "Answer: How to reduce energy consumption",
  "text": "Summary answer...",
  "citation": [
    {
      "@type": "CreativeWork",
      "name": "Acme energy report 2025",
      "url": "https://acme.example/report-2025",
      "datePublished": "2025-11-10",
      "author": { "@type": "Organization", "name": "Acme Research" }
    }
  ],
  "subjectOf": {
    "@type": "WebPage",
    "url": "https://example.com/answer/123",
    "sameAs": "https://example.com/provenance/response-123.jsonld"
  }
}
Operational pattern:
- Keep a stable response_id and expose a provenance bundle at a resolvable URL (for example /provenance/response-123.jsonld) that follows PROV conventions (entities = chunks/docs, activities = ingestion/retrieval/rerank, agents = retriever model and human reviewer). This makes the public citation a navigable entry point to the full audit trail.
- For span-level or claim-level attribution (map a generated claim to the exact source span), adopt localized attribution patterns such as those described in LAQuer research. These reduce verification overhead by pointing users to the exact supporting paragraph or sentence. Implementations may return claim_to_source_map entries that map claim IDs to {source_url, chunk_id, excerpt}.
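As a rough illustration of the operational pattern above, the Python sketch below assembles a small provenance record and a claim_to_source_map. Everything named here is an assumption for illustration: the Chunk class, the two builder functions, the "ex" prefix, the identifier formats, and the sample excerpt are hypothetical, and the key names only loosely follow the PROV-JSON layout. A production bundle should follow whichever PROV serialization your team standardizes on, and would add ingestion, rerank, and human-review activities in the same way.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One retrieved source chunk (hypothetical shape)."""
    chunk_id: str
    source_url: str
    excerpt: str


def build_bundle(response_id: str, chunks: list[Chunk],
                 retriever_version: str, retrieved_at: str) -> dict:
    """Model the answer and chunks as entities, retrieval as an activity,
    and the retriever as an agent."""
    answer_id = f"ex:response-{response_id}"
    activity_id = f"ex:retrieval-{response_id}"
    agent_id = f"ex:retriever-{retriever_version}"
    bundle = {
        "prefix": {"ex": "https://example.com/provenance/"},
        "entity": {answer_id: {"prov:type": "ex:GeneratedAnswer"}},
        "activity": {activity_id: {"prov:endTime": retrieved_at}},
        "agent": {agent_id: {"prov:type": "prov:SoftwareAgent"}},
        "used": {},
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": answer_id, "prov:activity": activity_id}
        },
        "wasAssociatedWith": {
            "_:assoc1": {"prov:activity": activity_id, "prov:agent": agent_id}
        },
    }
    for index, chunk in enumerate(chunks, start=1):
        entity_id = f"ex:chunk-{chunk.chunk_id}"
        bundle["entity"][entity_id] = {"ex:sourceUrl": chunk.source_url}
        bundle["used"][f"_:use{index}"] = {
            "prov:activity": activity_id,
            "prov:entity": entity_id,
        }
    return bundle


def build_claim_to_source_map(claim_to_chunk: dict[str, str],
                              chunks: list[Chunk]) -> dict:
    """Map each claim ID to {source_url, chunk_id, excerpt}."""
    by_id = {chunk.chunk_id: chunk for chunk in chunks}
    return {
        claim_id: {
            "source_url": by_id[chunk_id].source_url,
            "chunk_id": chunk_id,
            "excerpt": by_id[chunk_id].excerpt,
        }
        for claim_id, chunk_id in claim_to_chunk.items()
    }


# Usage with made-up data; the excerpt text is a placeholder.
chunks = [Chunk("c1", "https://acme.example/report-2025", "Placeholder excerpt.")]
bundle = build_bundle("123", chunks, "v2", "2026-01-15T10:00:00Z")
claim_map = build_claim_to_source_map({"claim-1": "c1"}, chunks)
print(json.dumps({"bundle": bundle, "claim_to_source_map": claim_map}, indent=2))
```

Keeping the claim map separate from the bundle lets the public page ship only the excerpt-level pointers while the fuller record stays behind access controls.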
Governance, rollout and testing checklist
Follow a staged rollout with automated validators and human review gates:
- Schema validation: Add JSON-LD linting into your CI and verify ClaimReview and citation properties against schema.org examples. Use Search Console or equivalent tools where applicable.
- Provenance persistence: Store full provenance logs in an append-only store (index by response_id and chunk_id) and provide a signed bundle when auditors request a trace—map your logs to PROV primitives for consistency.
- UI design: Surface concise citations (title + author + date + 1-2 line excerpt) with a one-click expansion to the provenance bundle. Include feedback affordances for users to report bad or stale sources.
- Privacy & retention: Separate public citations (safe to display) from private provenance data that may contain PII. Define retention windows and access controls for audit data.
- Monitoring: Track citation fidelity (claims where the cited source no longer contains the excerpt), source-staleness rates, and reviewer override frequency. If the hash of a stored excerpt no longer matches the corresponding text at the live URL, surface a "source drift" indicator and queue a re-check.
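The source-drift check in the monitoring item can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record layout, the function names, and the three status strings are hypothetical, fetching the live page is left to the caller, and matching is a simple normalized substring test rather than a fuzzy comparison.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so cosmetic edits do not count as drift."""
    return re.sub(r"\s+", " ", text).strip().lower()


def excerpt_hash(excerpt: str) -> str:
    """SHA-256 of the normalized excerpt, stored at citation time."""
    return hashlib.sha256(normalize(excerpt).encode("utf-8")).hexdigest()


def make_record(source_url: str, excerpt: str) -> dict:
    """Citation record written when the answer is generated (hypothetical shape)."""
    return {
        "source_url": source_url,
        "excerpt": excerpt,
        "excerpt_sha256": excerpt_hash(excerpt),
    }


def check_citation(record: dict, live_text: str) -> str:
    """Compare a stored citation against text fetched from the live URL.

    Returns "record_mismatch" if the stored excerpt no longer matches its own
    hash, "source_drift" if the live page no longer contains the excerpt,
    and "ok" otherwise.
    """
    if excerpt_hash(record["excerpt"]) != record["excerpt_sha256"]:
        return "record_mismatch"
    if normalize(record["excerpt"]) not in normalize(live_text):
        return "source_drift"
    return "ok"


# Usage with made-up page text; a real job would fetch record["source_url"].
record = make_record("https://acme.example/report-2025", "Energy use fell by 12%")
print(check_citation(record, "Intro.  Energy use   fell by 12% last year."))
print(check_citation(record, "Intro. Energy use fell by 15% last year."))
```

A "source_drift" result would feed the indicator and re-check queue described above. Exact substring matching is deliberately strict, so teams that expect frequent light edits to sources may prefer a similarity threshold instead.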
Final considerations: consistent machine-readable citations and an accessible provenance bundle make grounding, retractions, and legal reviews far faster. While some search engines are adjusting how they surface ClaimReview markup in results, structured provenance and explicit citation arrays remain critical for publisher interoperability and for internal auditability.
Further reading & references — schema.org ClaimReview and citation property, Google Search Central fact check guidelines, W3C PROV primer, and LAQuer (localized attribution research).