Open Standard · Draft for Public Comment
The Legal AI Verification Standard
A Framework for Trustworthy AI Output in Legal Document Work
Abstract
Legal AI tools have achieved impressive results in generating plausible outputs across a wide range of document tasks — summarisation, clause extraction, contract drafting, and compliance review. Yet adoption among practising lawyers remains limited for high-stakes work. The barrier is not capability; it is verifiability. A tool that produces a correct answer most of the time is insufficient when professional responsibility requires knowing which answers are correct. This standard defines the minimum requirements for a legal AI output to be considered verified: every claim must be sourced, every mandatory element must be accounted for, and no gap may be silently omitted. It applies to document review, compliance checking, and AI-assisted drafting. It does not prescribe implementation. It prescribes the bar any implementation must clear.
1. Introduction
The legal profession has had access to capable AI tools for several years. The adoption curve has nonetheless remained shallow for the category of work that matters most: high-stakes document review, regulatory compliance checking, and contract drafting where errors carry professional and financial consequences.
The common explanation attributes this gap to lawyer conservatism or institutional inertia. This explanation is insufficient. Lawyers adopt technology that reduces risk. They resist technology that introduces it. The correct diagnosis is that current legal AI tools transfer risk from the document task to the verification task: the lawyer must now verify the AI's output, a task that is often as time-consuming as the original work and for which the tool provides no support.
The underlying problem is that most legal AI systems are optimised for generation — producing outputs that are accurate on average, plausible in form, and responsive to the query. They are not optimised for verification — proving that what they produced is correct, complete, and traceable to source material. These are different problems. Solving the first does not solve the second.
A lawyer who receives an AI-generated review summary of a limited partnership agreement (LPA) faces two questions the tool typically cannot answer: Is everything here correct? Is everything relevant included? The first is a soundness question. The second is a completeness question. Both must be answered before the output can be used without independent re-review.
This standard defines what it means to answer both questions rigorously. It is not a product specification. It is a verification standard — a description of the properties a legal AI output must possess for a practitioner to rely on it without repeating the underlying work. Any tool, built by any organisation, may be assessed against it.
2. The Verification Problem in Legal AI
2.1 Two Failure Modes
Legal AI systems fail in two distinct ways. Both must be addressed; addressing either one alone is not sufficient.
Hallucination occurs when a tool outputs a value, fact, or clause that has no basis in the source material. The tool asserts something that is not there. In legal document work, hallucination can take many forms: a clause number that does not exist, a defined term that was never introduced, a regulatory threshold that appears in no source, a party name that is subtly wrong. Hallucination is the failure mode most discussed in the literature and most commonly tested in product evaluations. It is detectable in principle: if every output is accompanied by a source reference, hallucinations can be identified by checking that the reference is real and that it supports the stated value.
Incompleteness occurs when a tool fails to surface a relevant element that exists in the source material or that is mandated by the applicable legal framework. The tool does not assert something false — it simply does not assert something true. In legal document work, incompleteness is at least as consequential as hallucination and considerably harder to detect. A hallucinated clause is visible in the output; an omitted clause is not. A lawyer reviewing a checklist output that lists forty correctly sourced findings has no indication whether the list is complete unless the tool explicitly accounts for everything it was required to find.
2.2 The Asymmetry of the Two Problems
Hallucination and incompleteness are not symmetric in difficulty or in how they are currently addressed.
Tools that produce source citations for every output make hallucination detectable: a reviewer can spot-check references. The effort required scales with output size, not document size. A 200-field checklist with 200 source citations can be sampled. Tools in this category exist; the practice of citing sources in AI legal output is established.
Incompleteness is harder because it requires reasoning over what is not in the output. A tool can have a hallucination rate of zero, with every finding it produces correctly sourced, and still miss forty percent of what a competent reviewer would have found. Detecting this requires knowing what should have been found, which in turn requires a complete specification of the task. For regulatory documents, that specification is partially derivable from the regulatory framework. For contract review, it is partially derivable from standard market practice and the firm's own precedent corpus. For drafting, it is derivable from the instructions and the precedents used.
The consequence is that a soundness-only approach — one that ensures every output is correctly sourced — is necessary but not sufficient for professional reliance. A completeness-only approach — one that flags everything that might be relevant — produces unverifiable noise. Both requirements must be satisfied simultaneously, and the output must make clear which has been achieved for every element.
3. The Two Verification Requirements
3.1 Soundness
A legal AI output is sound with respect to an element if, and only if, there exists a verifiable witness in the source material for the value or classification attributed to that element.
A verifiable witness is a specific, locatable position in the source — a page number and clause identifier, a section heading and paragraph, a table row and column — at which the stated value appears in a form that supports the tool's output. The witness must be independently checkable: a third party with access to the source document must be able to locate the cited position and confirm that it contains what the tool claims.
Three clarifications follow from this definition:
First, a witness must be specific. A citation to a document section that spans multiple pages and contains many provisions does not constitute a verifiable witness for a specific value. The citation must be precise enough that a reviewer can locate the supporting text without reading the entire section.
Second, a witness must support the stated value. A citation to a clause that discusses a related provision, but does not contain the stated value, is not a valid witness. The tool must not conflate proximity with support.
Third, inference without a witness is not sound. If a tool derives a value through reasoning — even valid reasoning — from other stated values, and no document position directly states that value, the output is not sound with respect to that element. It may be correct; it is not verified. The correct classification in this case is UNVERIFIABLE, not FOUND.
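The witness rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the standard: it assumes a hypothetical source model in which a document is a mapping from locator strings to passage text, and it treats "support" as simple substring containment, which a real tool would need to replace with something more robust. The names `Witness`, `witness_supports`, and `classify` are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Witness:
    """A specific, locatable position in the source (hypothetical shape)."""
    locator: str      # e.g. "p.12 cl.4.2"
    quoted_text: str  # the text the tool claims appears at that position


def witness_supports(source: dict[str, str], witness: Witness, value: str) -> bool:
    """Check that the cited position exists, contains the quoted text,
    and that the quoted text contains the stated value.
    Substring matching is an assumption made for this sketch."""
    passage = source.get(witness.locator)
    if passage is None:
        return False  # the cited position does not exist
    return witness.quoted_text in passage and value in witness.quoted_text


def classify(source: dict[str, str], value: str, witness: Optional[Witness]) -> str:
    """FOUND only with a checkable witness; otherwise UNVERIFIABLE.
    Assumption: a failed witness check is downgraded to UNVERIFIABLE here;
    a real tool might instead flag it as a suspected hallucination."""
    if witness is not None and witness_supports(source, witness, value):
        return "FOUND"
    return "UNVERIFIABLE"
```

Under these assumptions, a value derived by reasoning and passed with `witness=None` is never reported as FOUND, however plausible it is.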
3.2 Completeness
A legal AI output is complete with respect to a task if, and only if, every element that the task requires to be addressed is accounted for in the output in one of three states: FOUND (present and sourced), ABSENT (not present in the source, with an explanation), or UNVERIFIABLE (present or inferred, but not confirmable by a witness in the source material alone).
The key requirement is that silence is not a valid response. If a task requires the tool to address a mandatory element and the tool does not mention that element, the output is incomplete. The tool may have found the element and not reported it. It may have failed to look for it. It may have looked and found nothing. The output must distinguish between these cases. An output that reports forty findings without indicating that forty-three elements were expected, and that three are missing or unverifiable, does not satisfy the completeness requirement.
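The "silence is not a valid response" rule amounts to a set comparison between what the task requires and what the output accounts for. The sketch below assumes the required elements are available as a set of identifiers; the element names and the helper functions are hypothetical and chosen only for illustration.

```python
from enum import Enum


class State(Enum):
    FOUND = "FOUND"
    ABSENT = "ABSENT"
    UNVERIFIABLE = "UNVERIFIABLE"


def unaccounted_elements(required: set[str], output: dict[str, State]) -> set[str]:
    """Required elements about which the output says nothing at all."""
    return required - set(output)


def is_complete(required: set[str], output: dict[str, State]) -> bool:
    """Complete means every required element carries one of the three states."""
    return not unaccounted_elements(required, output)
```

An output that reports only some of the required elements fails `is_complete` regardless of how accurate the reported elements are; it passes once the remaining elements are listed as ABSENT or UNVERIFIABLE.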
3.3 The Relationship Between Soundness and Completeness
These two properties are logically independent. A tool can be sound without being complete: every output it produces is correctly sourced, but the set of outputs is a strict subset of what was required. A tool can be complete without being sound: it accounts for every required element, but some of its accountings are unsourced or incorrectly cited.
For legal AI output to be professionally reliable, both properties must be achieved simultaneously. The output must be sound: every FOUND classification must have a valid witness. And it must be complete: every required element must appear in the output as FOUND, ABSENT, or UNVERIFIABLE.
The combined requirement is the verification standard. Either property alone is insufficient for professional reliance.
4. Categories of Legal Verification
Legal documents contain elements of different types, and the verification requirements differ by type. Any legal AI system that claims compliance with this standard must be able to handle all of the following categories.
4.1 Deterministic Identifiers
Some legal document elements have a formal, independently checkable structure. Entity and instrument identifiers (LEIs, ISINs, company registration numbers), dates, currency codes, and jurisdiction codes fall into this category. For these elements, verification is mechanical: the stated value either matches the expected format and, where applicable, resolves to a known record in an external registry, or it does not.
Hallucination in this category is detectable without legal expertise. An LEI code that does not resolve to a known entity is a verifiable hallucination. A date expressed in an invalid format is immediately flagged. The verification standard requires that tools check deterministic identifiers against their applicable specifications and report mismatches as findings, not silently accept them.
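As an illustration of a mechanical check, the sketch below validates the shape and check digits of an LEI. It relies on the LEI structure defined in ISO 17442 (20 characters, the last two being check digits computed with ISO 7064 MOD 97-10). It covers the format check only; resolving the code against an external registry is out of scope here. The function names are hypothetical.

```python
import re

# 18 alphanumeric characters followed by 2 numeric check digits.
_LEI_SHAPE = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def _to_digits(text: str) -> str:
    """Map each character to its numeric value (0-9 unchanged, A=10 ... Z=35)."""
    return "".join(str(int(ch, 36)) for ch in text)


def lei_check_digits(prefix18: str) -> str:
    """Compute the two MOD 97-10 check digits for an 18-character prefix."""
    remainder = int(_to_digits(prefix18 + "00")) % 97
    return f"{98 - remainder:02d}"


def lei_format_is_valid(lei: str) -> bool:
    """True if the string has the LEI shape and its checksum is consistent."""
    if not _LEI_SHAPE.fullmatch(lei):
        return False
    return int(_to_digits(lei)) % 97 == 1
```

A code that fails this check can be reported as a finding without any legal judgement. A code that passes still needs a registry lookup before it can be treated as resolving to a known entity.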
4.2 Enumerated Values
Many elements in legal and regulatory documents must take a value from a closed set defined by regulation or market convention. Asset categories under AIFMD, share class types under UCITS, risk profile classifications under MiFID II, and jurisdiction codes under applicable tax treaties are examples. The correct value is not one that sounds plausible — it is one that appears in the applicable enumeration.
Verification in this category requires that the tool maintain or access the relevant enumerations and check stated values against them. A value that does not appear in the applicable enumeration is either incorrectly extracted or reflects a non-standard document that requires flagging.
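A closed-set check might look like the following sketch. The enumeration shown is a placeholder invented for the example, not a transcription of any regulatory text; a real tool would load the applicable enumerations from an authoritative source. The field name and the `check_enumerated` helper are likewise hypothetical.

```python
from typing import Optional

# Placeholder enumeration for illustration only (not taken from any regulation).
ENUMERATIONS: dict[str, frozenset[str]] = {
    "distribution_policy": frozenset({"ACCUMULATING", "DISTRIBUTING"}),
}


def check_enumerated(field: str, value: str) -> Optional[str]:
    """Return a finding if the value is outside the closed set, else None."""
    allowed = ENUMERATIONS.get(field)
    if allowed is None:
        # No enumeration available: say so rather than silently accept.
        return f"{field}: no enumeration loaded; cannot check"
    if value not in allowed:
        return f"{field}: {value!r} is not in the applicable enumeration"
    return None
```

The design choice worth noting is that a missing enumeration produces a finding of its own, so that an unchecked value is never mistaken for a checked one.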
4.3 Structural Dependencies
Legal documents contain elements that are logically dependent on each other. The existence of a share class entails the existence of associated fee structures, currency designations, and distribution policies. The definition of a term in one section implies a consistent use of that term throughout the document. The appointment of a management company implies the presence of a delegation agreement or an explanation of its absence.
Verification in this category requires relational reasoning: the tool must track what the document has established and identify where structural dependencies are satisfied or violated. A document that introduces a share class without specifying a currency is not malformed in a syntactic sense; it is incomplete in a structural sense. The verification standard requires that structural dependencies be checked and violations reported.
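The share class example can be sketched as a dependency check over extracted attributes. The attribute names and the data shape (share class name mapped to its attributes) are assumptions made for illustration; the dependency list simply mirrors the three items named in the text.

```python
# Attributes the text says a share class entails (names are hypothetical).
REQUIRED_WITH_SHARE_CLASS = ("currency", "fee_structure", "distribution_policy")


def share_class_violations(share_classes: dict[str, dict[str, str]]) -> list[str]:
    """List every structural dependency that a share class fails to satisfy."""
    findings = []
    for name, attrs in share_classes.items():
        for attr in REQUIRED_WITH_SHARE_CLASS:
            if not attrs.get(attr):  # missing or empty counts as unsatisfied
                findings.append(f"share class {name!r}: missing {attr}")
    return findings
```

A share class with no currency is syntactically fine but produces a finding here, which is the structural sense of incompleteness the section describes.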
4.4 Cross-Document References
Legal work frequently involves multiple documents that must be consistent with each other — a fund prospectus and its KIID, an LPA and its side letters, a credit agreement and its security documents. Elements defined in one document may be referenced in another, and the references must be consistent.
Verification in this category requires tracking defined terms, values, and obligations across the document set and identifying inconsistencies. A party name that appears differently in two documents, a defined term used in one document that is not defined in either, or a threshold that differs between a main agreement and its schedule — all constitute cross-document verification failures.
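Two of the failures listed above, undefined terms and conflicting values, can be sketched as simple comparisons across a document set. The sketch assumes that defined terms, used terms, and field values have already been extracted per document; the function names and data shapes are hypothetical.

```python
def undefined_terms(defined: dict[str, set[str]],
                    used: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per document, terms used there that no document in the set defines."""
    all_defined = set().union(*defined.values())
    result = {}
    for doc, terms in used.items():
        missing = terms - all_defined
        if missing:
            result[doc] = missing
    return result


def conflicting_values(values: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Fields whose stated value differs between documents.
    Input maps document -> field -> value; output maps field -> document -> value."""
    by_field: dict[str, dict[str, str]] = {}
    for doc, fields in values.items():
        for field, value in fields.items():
            by_field.setdefault(field, {})[doc] = value
    return {f: per_doc for f, per_doc in by_field.items()
            if len(set(per_doc.values())) > 1}
```

Exact string comparison will flag differences that a practitioner may judge immaterial, such as spelling variants of a party name. Under this standard that is the safer default, since the finding is surfaced rather than silently reconciled.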
4.5 Arithmetic Constraints
Legal and regulatory documents contain numerical relationships that must hold. Percentage allocations must sum to appropriate totals. Leverage ratios must be consistent across the document. Fee calculations must follow the stated method. Capital contribution schedules must be internally consistent.
Verification in this category requires that the tool identify stated numerical values, extract the applicable constraints, and check them. Arithmetic errors in legal documents are not rare; they are a known source of dispute and liability. A verification standard that does not include arithmetic checking is incomplete.
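A percentage-sum check is the simplest instance of an arithmetic constraint. The sketch below uses Python's `decimal` module so that values such as 33.3 are compared exactly rather than through binary floating point. The function name and the default total of 100 are assumptions for illustration.

```python
from decimal import Decimal
from typing import Optional


def allocation_finding(allocations: dict[str, str],
                       expected_total: str = "100") -> Optional[str]:
    """Return a finding if the stated allocations do not sum to the expected total.
    Values are passed as strings so they are parsed exactly as written."""
    total = sum((Decimal(v) for v in allocations.values()), Decimal("0"))
    if total != Decimal(expected_total):
        return f"allocations sum to {total}, expected {expected_total}"
    return None
```

For example, allocations of 60.5, 30 and 10 yield a finding, while 33.3, 33.3 and 33.4 do not.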
4.6 Regulatory Mandates
Some elements must appear in a document because the applicable regulatory framework requires them — regardless of whether the drafter has included them. An AIFMD-compliant fund prospectus must contain specific disclosures. A UCITS KIID must include a prescribed risk indicator. A CSSF-regulated fund agreement must address certain governance requirements.
Verification in this category requires that the tool know what the applicable regulatory framework mandates and check the document against those mandates. An element that is required by regulation and absent from the document is a compliance finding, not a neutral observation. The verification standard requires that regulatory mandates be checked and absences surfaced as explicit findings.
4.7 Drafting Completeness
For AI-generated legal text, the verification problem takes a different but analogous form. A generated clause is verified if it traces to a precedent in the applicable corpus or to an explicit instruction from the drafter. Generation that cannot be traced to either source — a clause that is plausible but has no precedent in the firm's archive and was not requested — is unverified generation.
The verification standard for drafting requires that every material clause in an AI-generated document be attributed to its source: a specific precedent document and clause, or a specific instruction. Clauses without attribution are flagged as GENERATED-WITHOUT-PRECEDENT, not silently included in the output.
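The attribution rule for generated clauses can be sketched as a labelling step. The `DraftClause` shape and the label prefixes `PRECEDENT:` and `INSTRUCTION:` are hypothetical; only the GENERATED-WITHOUT-PRECEDENT label comes from the standard itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftClause:
    """A generated clause with optional attribution (hypothetical shape)."""
    heading: str
    precedent_ref: Optional[str] = None    # e.g. a precedent document and clause
    instruction_ref: Optional[str] = None  # e.g. an identified drafter instruction


def provenance_label(clause: DraftClause) -> str:
    """Label a clause by its source, or flag it when it has none."""
    if clause.precedent_ref:
        return f"PRECEDENT: {clause.precedent_ref}"
    if clause.instruction_ref:
        return f"INSTRUCTION: {clause.instruction_ref}"
    return "GENERATED-WITHOUT-PRECEDENT"
```

Because the flag is the fallback rather than an opt-in, a clause with no recorded source cannot pass through unlabelled.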
5. The Standard: Minimum Requirements for Verified Legal AI Output
A legal AI output satisfies this standard if and only if it meets the following eight requirements in their entirety.
Requirement 1 — Source attribution: Every extracted or generated value in the output must be attributable to a specific, locatable source: a page number and clause identifier in the source document, or, for generated text, an identified precedent or drafting instruction. Attribution must be specific enough for independent verification.
Requirement 2 — Mandatory element coverage: Every element that is mandatory for the applicable document type, task, and jurisdiction must appear in the output. An output that addresses a subset of mandatory elements without accounting for the remainder does not satisfy this standard, regardless of the accuracy of what it does address.
Requirement 3 — Three-state classification: Every mandatory element must be assigned exactly one of three states: FOUND (the value is present in the source and has been sourced), ABSENT (the element is not present in the source, with an explanation of what is missing and why its absence is noted), or UNVERIFIABLE (the element is present in the source, or a value for it has been inferred, but it cannot be verified to the applicable standard without information not available to the tool). No other classification is permitted for mandatory elements.
Requirement 4 — No inference as FOUND: A value derived through inference, even valid inference, may not be classified as FOUND unless a direct verifiable witness exists in the source material. Inference without a witness is classified as UNVERIFIABLE. This requirement prevents tools from presenting correct conclusions as verified findings when the underlying evidence is indirect.
Requirement 5 — Drafting provenance: For AI-generated text, every material clause must be attributed to a source: a precedent document in the applicable corpus with a specific clause reference, or an explicit instruction from the drafter. Clauses without attribution to either source are classified as GENERATED-WITHOUT-PRECEDENT and must be clearly distinguished from sourced clauses in the output.
Requirement 6 — Third-party auditability: The output must be auditable without access to the AI system. A qualified reviewer with access to the source documents and the output must be able to verify every FOUND classification by locating the cited source and confirming that it supports the stated value. An output that requires access to the tool's internal processing to verify is not auditable and does not satisfy this standard.
Requirement 7 — Explicit omission reporting: If a mandatory element is absent from the source document, this absence must be reported as an explicit finding in the output. Mandatory elements that are absent from the source are not failures of the document review — they are findings of potential compliance or drafting gaps. Omitting them from the output is a failure of the tool.
Requirement 8 — No confidence by silence: If a tool cannot verify a field, it must report the field as UNVERIFIABLE with an explanation. Omitting an unverifiable field from the output — thereby implying by absence that all reported fields are verified — does not satisfy this standard. Every field in scope for the task must appear in the output in one of the three states defined in Requirement 3.
6. Compliance and Testing
6.1 The Benchmark Approach
A legal AI tool demonstrates compliance with this standard through public benchmarking: publishing source documents together with complete outputs, and making both available for independent verification by any qualified reviewer.
A compliant benchmark consists of: (i) a source document or document set, redacted as necessary for confidentiality; (ii) the tool's full output in the applicable classification format (FOUND, ABSENT, UNVERIFIABLE, or GENERATED-WITHOUT-PRECEDENT for each element); (iii) the source attributions for all FOUND classifications; and (iv) a specification of the mandatory elements applicable to the document type and jurisdiction.
Any reviewer with access to the source documents can verify the benchmark by checking source attributions, confirming ABSENT and UNVERIFIABLE classifications against the source, and assessing whether the mandatory element coverage is complete.
6.2 Dispute Arbitration
Disagreements about whether a given output satisfies this standard — whether a source citation adequately supports a FOUND classification, whether an element is genuinely mandatory for the applicable document type, or whether a classification is correct — should be resolved by a panel comprising at least one legal practitioner with expertise in the relevant document type, one independent legal technology specialist, and one academic with relevant research expertise.
The panel's role is not to evaluate the tool's general performance but to assess specific disputed classifications against the requirements defined in Section 5. Disputes should be resolved on the basis of the standard's text, the source documents, and the output in question.
6.3 Enidia AI's Commitment
Enidia AI commits to publishing benchmark documents, full outputs, and source attributions for public verification at www.enidia.ai. Enidia AI invites independent reviewers to identify cases where its outputs do not satisfy the requirements of this standard, and commits to publishing the results of verified disputes together with any corrections made.
7. Implications for Legal Practice
7.1 Verification as a Professional Requirement
The requirements defined in this standard are not arbitrary technical criteria. They are the logical minimum for a legal AI output to be usable without independent re-review of the source material.
A lawyer who relies on a legal AI output that does not satisfy Requirement 6 — third-party auditability — cannot discharge their professional responsibility by citing the tool. If the output cannot be verified by examining the source, the lawyer cannot verify it either. The professional obligation to check the work falls back to them in full.
A lawyer who relies on an output that does not satisfy Requirement 7 — explicit omission reporting — cannot know whether the output is complete. The tool may have omitted mandatory findings without notice. The professional obligation to check for omissions falls back to them in full.
In both cases, the tool has transferred risk rather than reduced it. The purpose of a verification standard is to specify what a tool must do to genuinely reduce the verification burden on the practitioner rather than merely displacing it.
7.2 Verification as a Procurement Criterion
Law firms and in-house legal teams evaluating AI tools should assess compliance with this standard as a procurement criterion alongside capability and cost. A tool that generates accurate outputs but cannot demonstrate compliance with Sections 4 and 5 of this standard requires the same verification effort after its use as before. The efficiency gains are partially or wholly offset by the verification cost.
The benchmark approach in Section 6 provides a practical mechanism for procurement evaluation: request that a vendor publish a benchmark on a document type relevant to the firm's work, and assess the output against the requirements in Section 5 independently.
8. Conclusion
The adoption gap in legal AI is not primarily a capability problem. Tools that can extract, classify, and generate legal content with high average accuracy have existed for several years. The gap is a verification problem: practitioners have no principled basis for relying on outputs they cannot verify, and current tools provide inadequate verification support.
The requirements defined in this standard (source attribution, mandatory element coverage, three-state classification, no inference as FOUND, drafting provenance, third-party auditability, explicit omission reporting, and no confidence by silence) constitute the minimum bar for a legal AI output to be professionally reliable without independent re-review.
These requirements are achievable. Production deployment of systems that meet them has demonstrated significant time savings on document review tasks while maintaining complete verifiability of every output — a combination that was not previously available and that demonstrates the commercial and professional viability of the verification-first approach.
This standard is published as a draft for public comment. Responses, disputes, and proposed amendments are invited. Enidia AI commits to publishing benchmark results against this standard and to testing any legal AI tool that chooses to submit its outputs for evaluation.