AI Governance

Your AI Governance Policy Requires "Human Oversight." Can You Prove Your Humans Are Capable of Providing It?

By Leon Kopelev · Apr 27, 2026 · 9 min read

The Audit Question You Cannot Answer With a Completion Dashboard

Picture the conference room. The audit team has spent two days reviewing your AI program. They have your governance policy, your approved-tools list, your usage logs, your training completion dashboard. The room has been polite. The questions have been procedural.

Then the lead auditor closes her laptop and asks one more thing.

"Show us how you verify your employees can actually evaluate what the AI produces."

You have training completion records. You have attestation forms. You have a policy that says "human oversight required" at every step. You have a vendor who emailed last quarter's "98% completion" stat in a triumphant subject line. What you don't have is a single piece of evidence that your team can do the thinking the policy requires.

The question itself is the finding. Not because anyone broke a rule. Because the rule is built on an assumption nobody tested.

This is the AI governance gap. Most organizations have closed the documentation half. Almost none have closed the cognitive half. The policies require people to evaluate, judge, override, and verify. The training programs measure none of those things.

That gap is where the audit lives. It is also where the liability lives.

The Doom Loop Comes to the Workforce

The gap is not a hypothetical. Two recent findings show it is large and probably growing.

A 2025 RAND survey of higher-education students found that 67% recognize that using AI is harming their critical thinking. They keep using it anyway. Brookings has called the same dynamic the "doom loop of dependence": the more capable the AI gets, the less the user practices the underlying judgment, the more dependent on the AI they become, and the harder it gets to evaluate what the AI says when it matters.

Now substitute "your employees" for "students." The mechanics are identical. Your underwriters are using AI for first-pass risk assessment. Your analysts are using AI to summarize 10-K filings. Your claims handlers are using AI to draft customer letters. Your researchers are using AI to scaffold legal arguments. Your compliance officers are using AI to draft procedure documents. Each one of them is subject to the same loop.

Most of them, if asked, would tell you they know they should verify the output. Most of them, on a Tuesday afternoon with a deadline, do not. The training said "human oversight required." Nothing in the training built the capability the oversight requires.

This is the part that doesn't show up on your governance dashboard. Adoption rate is high. Completion rate is high. Policy attestation rate is high. The capacity to spot a hallucinated citation, a missing evidentiary chain, an unsupported correlation, or a confidence-laundered guess: nobody measured that, because measuring it is hard.

What the Cases That Already Went to Court Cost

The cases that have already gone to court tell you what the gap costs.

Air Canada (2024). The airline's chatbot promised a passenger a bereavement-fare refund the airline did not actually offer. When the passenger sued, Air Canada argued the chatbot was a separate legal entity. The British Columbia Civil Resolution Tribunal disagreed and held the airline responsible. The case was decided on a small dollar amount. The precedent is not small. Companies own the output of their AI systems. The policy that said "review the chatbot's responses" did not survive contact with the actual workflow.

Mata v. Avianca (2023). Two attorneys filed a brief that cited six judicial opinions. The opinions did not exist. ChatGPT had fabricated them. The court fined the attorneys and their firm $5,000. The bar associations followed. Every law firm in the country read the sanction order. Every law firm in the country has a written policy requiring associates to verify AI-assisted research. Read the hearing transcript. The associate had a policy. He knew the policy. He did not know how to operationalize the policy on a Friday night with a Monday filing deadline. He confirmed the cases existed by asking the same AI that had invented them.

Samsung (2023). Engineers pasted proprietary semiconductor source code into ChatGPT to debug it. The data left the building. Samsung did not learn about it from a security tool. They learned from the news. The AI usage policy at the time prohibited sharing confidential information with external services. Three engineers, three workflows, three independent decisions to ignore the policy under workload pressure. The policy required judgment the engineers did not exercise.

These are not corner cases. They are the cases auditors and regulators study to write the next round of compliance expectations. The EU AI Act, NIST AI RMF, SR 11-7, and the Colorado AI Act all require some version of "human oversight" or "competent human review." Each one of them is built on a load-bearing assumption that the human in the loop is, in fact, capable of doing the thinking. None of them tells you how to know.

Completion Is Not Competency

Completion rates measure training consumption. They do not measure cognitive capability.

Watching a video about prompt engineering does not produce a person who can recognize when an AI's confident-sounding answer rests on an unsupported claim. Reading a policy document does not produce a person who can spot a correlation-causation error in a generated risk memo. Clicking through a multiple-choice quiz on "responsible AI" does not produce a person who can identify what is missing from an AI's analysis.

What the policies require is a specific class of skill. The skill is not generic critical thinking and it is not generic AI literacy. It is the operational ability to do five things on demand:

  1. Spot unsupported claims in AI output. The AI says "studies show." Which studies?
  2. Recognize correlation presented as causation. The AI says "X is associated with Y, suggesting X causes Y." It might. It might not.
  3. Identify what evidence is missing. The AI says "the company is well-positioned." Compared to what? On what metric? Over what period?
  4. Notice when the framing is loaded. The AI presents three options. Are those the only three? Who chose them?
  5. Override AI confidence with human judgment when the costs of being wrong are asymmetric. Underwriting a $50M policy is not symmetric with rejecting it.

None of these are checkboxes. Each one is a thinking move that has to be practiced under conditions that resemble the actual job. Watch a video, you do not get the skill. Take a quiz, you do not get the skill. Sign an attestation, you definitely do not get the skill.

The audit question is asking whether your training program produced the skill. The completion dashboard cannot answer that question.

Shift From Attestation to Assessment

That heading is short and the work behind it is not. But the principle is plain. Every dollar you currently spend on "we trained our team" should be paired with a dollar that produces "and here is the evidence they can do the thinking the policy requires." The evidence is the artifact your audit team needs.

Three concrete moves.

Replace completion gates with capability gates. Today: an employee completes a course, gets the certificate, the ticket closes. Tomorrow: an employee completes a course AND produces a session report that shows them practicing the five thinking moves listed above on industry-realistic scenarios. The certificate is a byproduct. The session report is the artifact. When the auditor asks how you verify capability, you hand them the report.
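
To make the gate concrete, here is a minimal sketch in Python. Every name in it (SessionReport, REQUIRED_MOVES, capability_gate, the move identifiers) is illustrative, not a real product or framework schema; the point is the shape of the check, not the API.

```python
# A minimal sketch of a capability gate, under assumed names; swap in
# whatever your LMS or assessment tool actually emits.
from dataclasses import dataclass, field

# The five thinking moves the oversight policy actually depends on.
REQUIRED_MOVES = {
    "spot_unsupported_claims",
    "catch_correlation_as_causation",
    "identify_missing_evidence",
    "notice_loaded_framing",
    "override_ai_confidence",
}

@dataclass
class SessionReport:
    """Evidence from one coached practice session on a realistic scenario."""
    employee_id: str
    scenario_id: str
    moves_demonstrated: set = field(default_factory=set)

def capability_gate(completed_course: bool, report: SessionReport | None) -> bool:
    """Completion gate asks: course done? Capability gate asks: course done
    AND all five thinking moves shown in the session report?"""
    if not completed_course or report is None:
        return False  # a certificate alone never opens the gate
    return REQUIRED_MOVES <= report.moves_demonstrated

# The ticket closes only when the gate passes, not when the video ends.
report = SessionReport("emp-0042", "claims-fnol-017",
                       moves_demonstrated=set(REQUIRED_MOVES))
assert capability_gate(completed_course=True, report=report)
```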

Make the practice scenarios match the actual workflow. A claims handler does not need to evaluate a generic AI summary of a generic news article. They need to evaluate an AI-generated summary of a first notice of loss with specific contradictions, missing dates, and confidence-laundered conclusions. The training has to put the specific thinking move under the specific operational conditions. Industry-specific scenarios are not a nice-to-have. They are the load-bearing piece of the assessment.

Create a competency record per employee, not just a completion record. The completion record is one column. The competency record has rows: which thinking moves the employee performed, on what scenarios, with what feedback, with what evidence of growth over time. This is the artifact the EU AI Act will eventually require by name, the artifact the NIST AI RMF Govern function already implies, and the artifact a plaintiff's attorney will subpoena the moment your AI workflow produces a customer harm.
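
A sketch of the contrast, again with illustrative field names rather than a mandated schema: the completion record collapses to one bit per employee, while the competency record keeps one row per thinking move, per scenario.

```python
from dataclasses import dataclass
from datetime import date

# A completion record is one column: did the course get finished?
completion_record = {"emp-0042": True}

# A competency record is rows: one per thinking move, per scenario,
# with feedback and a score, so growth over time is visible.
@dataclass
class CompetencyRow:
    employee_id: str
    thinking_move: str   # e.g. "identify_missing_evidence"
    scenario_id: str     # the industry-specific scenario practiced
    score: float         # 0.0 to 1.0: how well the move was performed
    feedback: str        # the coach's note an auditor can actually read
    session_date: date

competency_record = [
    CompetencyRow("emp-0042", "spot_unsupported_claims", "claims-fnol-017",
                  0.6, "Flagged the vague 'studies show' but missed the gap "
                       "in the date chain.", date(2026, 3, 3)),
    CompetencyRow("emp-0042", "spot_unsupported_claims", "claims-fnol-021",
                  0.9, "Caught both unsupported claims and asked for sources.",
                  date(2026, 4, 14)),
]

# Evidence of growth over time: the thing an attestation form never holds.
history = sorted((row.session_date, row.score) for row in competency_record
                 if row.thinking_move == "spot_unsupported_claims")
print(history)
```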

Three implementation details that decide whether this works. The scenarios have to be written by people who understand the workflow, not by a generic AI literacy vendor. The assessment has to be conducted in a way that surfaces the thinking, not just the answer. The output has to be structured so a non-technical audit team can read it without translation.

The good news: every component of this exists. Industry-specific scenarios. Coaching that elicits and scores the thinking moves. A report that shows what the employee got right, where they got stuck, and what to practice next. Not as a future product. As an artifact you can hand the auditor in two months, not two years.

The Five-Audience Test

Run a small thought experiment before you close this tab.

Pick the AI workflow your organization is most exposed on. The one that, if it produced a customer harm, would land in front of a regulator, a court, or a journalist. Underwriting recommendations. Claims handling. Clinical documentation. Legal research. Financial advice. Customer-facing chat. KYC review. Pick one.

Now ask: does your current AI training program produce a competency score for the people running that workflow that you would show, without flinching, to:

  1. Your lead auditor.
  2. Your regulator.
  3. A judge.
  4. A plaintiff's attorney.
  5. A journalist.

If the answer is yes, you are ahead of most organizations. If the answer is "we have completion records," you have what most organizations have, which is documentation theater. The completion record is real. The thing it documents is not.

The audit question is not "did your people watch the video." It is "can your people do the thinking the policy requires." Today, you probably cannot answer that question with anything that holds up under scrutiny. A year from now, "we did not measure that" will not be a defensible answer. The frameworks are explicit. The cases are public. The pattern is repeating.

The gap is closeable. It just is not closeable with a completion dashboard. It is closeable with a competency record.

What does your competency record look like?

Leon Kopelev, founder of Cogito Coach, builds the AI Fluency course used by L&D and risk teams in regulated industries to turn AI consumers into AI evaluators.

Show your auditor a competency report, not a completion certificate.

Cogito's AI Fluency course puts your team through industry-specific scenarios and produces a Coaching Session Report after every session. The artifact your audit team can read.

Schedule an enterprise demo
