Automated
Highlight overlap between answer spans and retrieved chunks.
Human spot-check
Sample 20 weekly; track drift when documents update.
Versioning
Pin corpus versions next to answers in audit logs.
Citations are liability surface area
For regulated or customer-facing answers, citation checks must verify span overlap, not URL wallpaper. A link to the wrong section is worse than no link—it trains false confidence.
Automate highlight comparison between model answer sentences and retrieved chunks; flag paraphrases that drift from source meaning.
Quality sampling under load
Stratify samples by risk tier and document freshness. A weekly twenty-ticket review beats a monthly two-hundred-ticket binge that nobody finishes honestly.
Record disagreements in a shared taxonomy (stale doc, ambiguous chunk, model overreach) so engineering and knowledge teams fix the right layer.
Corpus drift alarms
When upstream PDFs or policies change, trigger re-validation of affected answers. Tie corpus versions to retrieval metadata displayed in internal UIs.
Legal prefers boring logs with hashes over clever prose explaining “trust us.”
Closing the loop
Feed citation failures back into chunking rules and ingestion priorities—not only prompt tweaks.
SignalSpring’s compliance memo stance: if you cannot explain why a citation appeared, do not ship the feature externally yet.