Quick reference · AI & emerging risk
What AI Is Actually Good At — and What It Isn’t
Most AI buying decisions turn on one question the demo is designed to keep you from asking: can the output be cheaply tested against an objective standard? Where it can, AI is genuinely strong. Where it can’t, a confident wrong answer can cost more than the tool ever saved. This is a quick reference for telling the two apart before you sign.
The one test that sorts the good fits from the bad
Strip away the category labels — “copilot,” “agent,” “autonomous” — and a single question separates the tasks where current AI earns its keep from the ones where it quietly creates liability: can the output be tested cheaply, by a human, against an objective standard?
When the answer is yes, the technology is genuinely useful. The work is rules-driven, structured, and bounded, and a mistake is caught the moment someone checks it against a known reference: a required field is missing or it isn’t; the clause matches the template or it doesn’t; the document satisfies the filing rules or it gets rejected. The model does the first pass at speed, and verification is fast and certain.
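To make "cheaply verifiable" concrete, here is a minimal sketch of the kind of objective check described above: confirming that an AI-prepared form has every required field filled in. The field names and the `check_required_fields` helper are hypothetical, chosen only for illustration; real filing rules come from the relevant court or agency.

```python
from dataclasses import dataclass

# Hypothetical required fields, for illustration only.
REQUIRED_FIELDS = ("case_number", "party_name", "filing_date", "signature")


@dataclass
class CheckResult:
    passed: bool
    missing: list[str]


def check_required_fields(form: dict[str, str]) -> CheckResult:
    """Report which required fields are absent or blank in a prepared form."""
    missing = [
        name for name in REQUIRED_FIELDS
        if not (form.get(name) or "").strip()
    ]
    return CheckResult(passed=not missing, missing=missing)


# An AI-prepared draft with one blank field (made-up sample data).
draft = {
    "case_number": "24-001",
    "party_name": "Acme LLC",
    "filing_date": "",
    "signature": "J. Doe",
}
result = check_required_fields(draft)
print(result.passed, result.missing)  # prints: False ['filing_date']
```

The check has a yes-or-no answer and takes seconds to run, which is what makes the task a strong fit. No comparable test exists for a question like "is this filing strategically sound?"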
When the answer is no, the same fluency becomes a hazard. The task demands autonomous legal or factual judgment, or it produces assertions that are expensive to verify and easy to wave through — what the law actually requires, how a specific dispute will come out, whether a given representation is true. Here the model can be confidently, articulately wrong, and the cost of missing it is high.
The through-line of this series holds: AI changes the workflow, not the legal duties. Competence, verification, candor, and non-discrimination remain the company’s responsibility. “The vendor’s algorithm did it” has not been a defense in any setting where it has been tested.
Strong fit vs. poor or high-risk fit
The split below is about the shape of the task, not the vendor. The same underlying model can sit on the green (strong-fit) side of this table for one job and the red (poor or high-risk) side for another. Match the tool to the bounded, testable work; keep a human verifier wherever an error is costly.
| Task type | Strong fit (bounded, cheaply verifiable) | Poor or high-risk fit (judgment, hard to verify) |
|---|---|---|
| Document validation | Checking a filing or form against known formatting and required-field rules, where the standard is objective | Judging whether a document is legally sufficient or strategically sound |
| Drafting | First drafts from approved templates and clause libraries, reviewed before use | Bespoke drafting on novel facts, or anything sent out unread |
| Review and summarization | First-pass review, summarization, and triage of documents, with a human check before reliance | Final substantive review treated as complete because the summary reads well |
| Classification and routing | Tagging, sorting, and routing items into defined categories against clear criteria | Adverse or consequential decisions about people made without human review |
| Search and retrieval | Surfacing the right form, clause, precedent, or policy from a known corpus | Stating what the law is without verification — the hallucinated-citation problem |
| Process and intake | Encoding clear procedural requirements and guiding users through structured steps | Case-specific legal advice, or predicting outcomes as if they were facts |
Two patterns sit on the red (poor or high-risk) side of nearly every row. The first is any output that asserts what the law requires or how a matter will resolve. Such assertions sound authoritative and are costly to check, which is exactly the hallucinated-citation problem that has already drawn judicial sanctions against lawyers who filed AI-generated briefs without reading the cases. The second is anything sold as “no human needed.” That phrase does not describe a capability; it describes who absorbs the loss when the tool is wrong.
The cognitive trap
The most dangerous failure here is not technical. It is psychological, and it operates on sophisticated readers as readily as on anyone else. Fluent, well-formatted output produces a feeling of correctness. A clean answer in confident prose feels checked even when nothing checked it. That sense of ease is precisely what invites skipping the verification the output most needs.
The discipline that has served clients well is to treat polish as a reason for more scrutiny, not less, on any task where an error is costly. The better an unverifiable answer looks, the more it deserves a second pass against the source — because its fluency is doing the work that evidence should be doing.
Questions to ask any AI vendor
Run any pitch through these before the procurement conversation gets serious. The pattern of the answers — specific and testable, or hand-waving and brand-driven — usually tells you more than any single response. Vague answers on verification, error rate, or liability are themselves the finding.
Frameworks worth tracking as you evaluate tools — the EU AI Act, the Colorado AI Act, NYC Local Law 144 on automated employment decision tools, Illinois’s BIPA and Artificial Intelligence Video Interview Act, and EEOC and FTC guidance, among others — are moving fast and differ sharply by jurisdiction. Treat any reference to them here as a prompt to confirm current status, effective dates, scope, and applicability to your situation with counsel, not as a statement of settled law. A standing inventory and governance process is what lets you answer the audit question above when it is asked.
The most expensive AI mistakes come from tasks that look automatable but are not cheaply verifiable. If you cannot test the output against an objective standard, a human still owns the judgment — no matter how good the demo looked, and no matter what the contract says about whose algorithm did it.