What AI Is Actually Good At — and What It Isn’t

The one test that sorts the good fits from the bad

Strip away the category labels — “copilot,” “agent,” “autonomous” — and a single question separates the tasks where current AI earns its keep from the ones where it quietly creates liability: can the output be tested cheaply, by a human, against an objective standard?

When the answer is yes, the technology is genuinely useful. The work is rules-driven, structured, and bounded, and a mistake is caught the moment someone checks it against a known reference: a required field is missing or it isn’t; the clause matches the template or it doesn’t; the document satisfies the filing rules or it gets rejected. The model does the first pass at speed, and verification is fast and certain.
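To make "cheaply testable against an objective standard" concrete, here is a minimal sketch of a required-field check in Python. The field names and the `check_required_fields` helper are hypothetical, chosen only for illustration; a real filing standard would supply its own rule set.

```python
from dataclasses import dataclass

# Hypothetical rule set: these field names are illustrative only and are
# not taken from any real court or agency filing standard.
REQUIRED_FIELDS = ("case_number", "party_name", "filing_date", "signature")


@dataclass
class CheckResult:
    passed: bool
    missing: list[str]


def check_required_fields(document: dict[str, str]) -> CheckResult:
    """Report which required fields are absent or blank in the document."""
    missing = [
        name
        for name in REQUIRED_FIELDS
        if not str(document.get(name, "")).strip()
    ]
    return CheckResult(passed=not missing, missing=missing)


# A draft with one blank field and one field missing entirely.
draft = {"case_number": "24-0001", "party_name": "Acme Co.", "filing_date": ""}
result = check_required_fields(draft)
print(result.passed, result.missing)
```

The answer is binary and the reviewer can confirm it at a glance, which is what makes this kind of task a strong fit. Nothing in the check says whether the filing is legally sufficient; that judgment stays with a person.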

When the answer is no, the same fluency becomes a hazard. The task demands autonomous legal or factual judgment, or it produces assertions that are expensive to verify and easy to wave through — what the law actually requires, how a specific dispute will come out, whether a given representation is true. Here the model can be confidently, articulately wrong, and the cost of missing it is high.

The through-line of this series holds: AI changes the workflow, not the legal duties. Competence, verification, candor, and non-discrimination remain the company’s responsibility. “The vendor’s algorithm did it” has not been a defense in any setting where it has been tested.

Strong fit vs. poor or high-risk fit

The split below is about the shape of the task, not the vendor. The same underlying model can sit in the strong-fit (green) column of this table for one job and in the poor-fit (red) column for another. Match the tool to the bounded, testable work; keep a human verifier wherever an error is costly.

Task type	Strong fit (bounded, cheaply verifiable)	Poor or high-risk fit (judgment, hard to verify)
Document validation	Checking a filing or form against known formatting and required-field rules, where the standard is objective	Judging whether a document is legally sufficient or strategically sound
Drafting	First drafts from approved templates and clause libraries, reviewed before use	Bespoke drafting on novel facts, or anything sent out unread
Review and summarization	First-pass review, summarization, and triage of documents, with a human check before reliance	Final substantive review treated as complete because the summary reads well
Classification and routing	Tagging, sorting, and routing items into defined categories against clear criteria	Adverse or consequential decisions about people made without human review
Search and retrieval	Surfacing the right form, clause, precedent, or policy from a known corpus	Stating what the law is without verification — the hallucinated-citation problem
Process and intake	Encoding clear procedural requirements and guiding users through structured steps	Case-specific legal advice, or predicting outcomes as if they were facts

Two patterns sit on the red side of nearly every row. The first is any output that asserts what the law requires or how a matter will resolve — assertions that sound authoritative and are costly to check, which is exactly the hallucinated-citation problem that has already drawn judicial sanctions against lawyers who filed AI-generated briefs without reading the cases. The second is anything sold as “no human needed.” That phrase does not describe a capability; it describes who absorbs the loss when the tool is wrong.

The cognitive trap

The most dangerous failure here is not technical. It is psychological, and it works on sophisticated readers as readily as on anyone else. Fluent, well-formatted output produces a feeling of correctness. A clean answer in confident prose feels checked even when nothing checked it. That sense of ease is precisely what invites skipping the verification the output most needs.

The discipline that has served clients well is to treat polish as a reason for more scrutiny, not less, on any task where an error is costly. The better an unverifiable answer looks, the more it deserves a second pass against the source — because its fluency is doing the work that evidence should be doing.

Questions to ask any AI vendor

Run any pitch through these before the procurement conversation gets serious. The pattern of the answers — specific and testable, or hand-waving and brand-driven — usually tells you more than any single response. Vague answers on verification, error rate, or liability are themselves the finding.

What exactly is the task, and how is success measured? A precise, bounded task with an objective success metric is a good sign. “It handles your legal work” is not a task description.
How is each output verified, and by whom? Ask to see the actual checking step, not the marketing claim. If verification depends on a human, confirm who that is and that they have the time and the standard to do it.
What is the error rate, and what does a wrong answer look like? A vendor who cannot describe the failure mode has not characterized it. The shape of a wrong answer — loud and obvious, or quiet and plausible — determines how much it can hurt you.
What data trains the model, and where does our data go? Confirm whether your inputs train the vendor’s model, who else can see them, and where they are stored. Privilege, trade secrets, and regulated data all turn on this answer, and the rules vary by jurisdiction and are in active flux.
Who is liable when it is wrong? Read the contract, not the slide. Many AI agreements disclaim accuracy and cap liability at a token amount, leaving the loss with you. See the AI vendor contract terms to negotiate.
Can we audit or explain a given output? If you cannot reconstruct why the tool produced a particular result, you cannot defend it to a regulator, a court, or an employee who was adversely affected — a recurring theme in emerging AI rules.
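On the error-rate question, one way to pressure-test a vendor's number is to have a human verify a random sample of outputs and compute the rate yourself. The sketch below is an illustration under stated assumptions, not a method this series prescribes: it uses the Wilson score interval (a standard statistical technique chosen here for the example) to show how wide the uncertainty is on a small sample. The function name and the sample figures are hypothetical.

```python
import math


def wilson_interval(errors: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an observed error rate.

    errors: number of outputs a human reviewer found to be wrong
    sample_size: number of outputs the reviewer checked
    z: normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= errors <= sample_size:
        raise ValueError("errors must be between 0 and sample_size")

    p = errors / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical review: 3 wrong outputs found in a sample of 200.
low, high = wilson_interval(errors=3, sample_size=200)
print(f"observed rate: {3 / 200:.1%}, plausible range: {low:.1%} to {high:.1%}")
```

The point of the range is that a small pilot can look clean while still being consistent with an error rate several times higher than the one observed. The count also says nothing about the shape of the errors; whether they are loud or quiet still has to be read from the wrong outputs themselves.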

Frameworks worth tracking as you evaluate tools — the EU AI Act, the Colorado AI Act, NYC Local Law 144 on automated employment decision tools, Illinois’s BIPA and Artificial Intelligence Video Interview Act, and EEOC and FTC guidance, among others — are moving fast and differ sharply by jurisdiction. Treat any reference to them here as a prompt to confirm current status, effective dates, scope, and applicability to your situation with counsel, not as a statement of settled law. A standing inventory and governance process is what lets you answer the audit question above when it is asked.

The thing behind the thing

The most expensive AI mistakes come from tasks that look automatable but are not cheaply verifiable. If you cannot test the output against an objective standard, a human still owns the judgment — no matter how good the demo looked, and no matter what the contract says about whose algorithm did it.