Every tax leader believes their processes are solid. You have controls. You have review procedures. Your team is experienced. When someone pitches AI that's "only" 93% accurate, you probably think: "We're already doing better than that."
Are you?
Field studies of spreadsheet-based tax calculations found material errors in 24% to 90% of complex spreadsheets. Not typos. Material errors that affect outcomes. Two experienced tax experts reviewing the same ambiguous scenario regularly reach different conclusions. Manual data entry tasks consistently show high error rates. And that's before accounting for what you never check at all.
The uncomfortable truth is that most tax functions have no idea what their actual error rate is. They just know they haven't been caught yet.
This is the final piece in our series on preparing indirect tax leaders for AI. We've explored why tax systems can't be both perfectly certain and infinitely adaptable, why AI's occasional mistakes might improve overall compliance, and where AI belongs in your tax stack. Now we close with the question that matters most: when you actually measure error rates, does AI make things better or worse?
The data might surprise you.
Human tax processes fail more often than anyone wants to admit
A practical way to assess the promise of AI in indirect tax is to compare error rates: how often do humans err versus AI, and in what ways?
Human error in tax compliance is non-trivial. In complex in-house tax functions, mistakes happen regularly. Coding an item wrong. Missing a filing deadline. Misinterpreting an exemption. These aren't deliberate evasion. They're just human slip-ups.
Studies in accounting have shown that manual data tasks have high error rates. Field audits of operational spreadsheets (common in tax workflows for calculations) found errors in 24% of spreadsheets in one study. Other research indicates over half of large spreadsheets contain material mistakes. These errors range from simple typos to logic flaws, and they can lead to incorrect tax calculations.
Human reviewers in audit or compliance also exhibit fatigue and inconsistency. A person might catch an issue on page 1 of an invoice stack but overlook a similar issue on page 101 after hours of work.
Common tax-specific errors include misclassified product taxability (especially when dealing with thousands of products), mistakes in aggregating data for returns, or accidentally leaving a data file out when compiling a VAT return. With e-invoicing and digital transaction reporting becoming the norm worldwide, discrepancies between transaction-level reports and the figures in periodic VAT returns are increasingly visible, and many large enterprise tax teams are questioning the quality of their data reconciliation processes.
None of these represent poor performers. They represent normal humans doing complex work under time pressure.
AI catches different errors but introduces its own
AI tends to be "wrong" in different ways than humans.
In the auditing field, a survey of auditors highlighted that AI is perceived to improve accuracy and reduce the chance of human mistakes in auditing processes. AI can check every entry and cross-verify patterns, something humans can't do at scale. Another study noted that AI-based analysis led to a "considerable fall in the number of misstatements" in audits.
In tax, we see similar patterns. AI can cross-check every transaction against tax rules and historical patterns, potentially catching things a human would miss. An AI might notice that one invoice out of 5,000 from a vendor lacked VAT, whereas a human might overlook that single case without an obvious cue.
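To make that concrete, here is a minimal sketch in Python of the kind of full-population check this implies. The vendor name, field names, and the 95% threshold are hypothetical, and a simple ratio rule stands in for whatever model or rules engine actually does the work:

```python
from collections import defaultdict

# Synthetic data mirroring the example above: 4,999 invoices with VAT, one without.
invoices = [{"vendor": "Acme GmbH", "invoice_id": f"INV-{i:05d}", "vat_amount": 19.0}
            for i in range(4_999)]
invoices.append({"vendor": "Acme GmbH", "invoice_id": "INV-04999", "vat_amount": 0.0})

# Historical pattern: what share of each vendor's invoices carries VAT?
total, with_vat = defaultdict(int), defaultdict(int)
for inv in invoices:
    total[inv["vendor"]] += 1
    if inv["vat_amount"] > 0:
        with_vat[inv["vendor"]] += 1

# Flag the outliers: zero-VAT invoices from vendors that almost always charge VAT.
flagged = [inv for inv in invoices
           if inv["vat_amount"] == 0
           and with_vat[inv["vendor"]] / total[inv["vendor"]] > 0.95]

print(flagged)  # the single zero-VAT invoice, found without anyone reading 5,000 lines
```

A human reviewer working through the same stack has no obvious cue to stop at that one line; a script or model applies the same attention to row 4,999 as to row 1.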
On the flip side, AI introduces errors of its own, primarily false positives (flagging something as wrong when it isn't) or false negatives (missing something). If configured well, AI false positives can be managed by filters and thresholds so you're not chasing trivial issues.
We also have to consider qualitative differences. Humans might not flag an issue because they don't see it or they assume it's fine. AI might flag it because it doesn't make assumptions, which is good, except when the issue truly is fine. But false alarms are arguably preferable to misses in compliance, as long as you can review them efficiently.
It's like a smoke detector. Better it occasionally goes off when there's just toast burning than it never goes off at all. The key is tuning AI so its sensitivity is appropriate and doesn't create alert fatigue.
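What that tuning looks like in practice, sketched with made-up anomaly scores: every choice of alert threshold trades false alarms against misses.

```python
# Made-up anomaly scores for ten transactions; two of them are real issues.
scores   = [0.10, 0.92, 0.35, 0.20, 0.55, 0.15, 0.81, 0.05, 0.60, 0.25]
is_issue = [False, True, False, False, True, False, False, False, False, False]

for threshold in (0.3, 0.5, 0.8):
    alerts = [i for i, s in enumerate(scores) if s >= threshold]
    false_alarms = sum(1 for i in alerts if not is_issue[i])
    missed = sum(1 for i, real in enumerate(is_issue) if real and i not in alerts)
    print(f"threshold {threshold}: {len(alerts)} alerts, "
          f"{false_alarms} false alarms, {missed} missed issues")
```

A low threshold catches both real issues but generates noise; a high threshold is quiet but starts missing the very things you care about. Where to set it depends on what a miss costs versus what an hour of review costs.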
Coverage matters as much as accuracy per transaction
Let's make this concrete with a comparison.
Suppose a human tax analyst reviewing transactions manually has a 97% accuracy rate on each transaction they review (pretty high), but due to time constraints they only review 10% of all transactions. That means only about 9.7% of all transactions get checked and handled correctly, while 90% go unchecked, potentially containing undetected errors.
An AI system might correctly auto-classify 93% of all transactions, review 100% of them, and flag the remaining 7% as uncertain. If humans then review and correct those 7%, net accuracy approaches 100% across 100% of the data, assuming the uncertainty flags capture most of what the AI gets wrong. Far better coverage.
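A back-of-the-envelope sketch of that comparison, using the same illustrative rates (and making that assumption about the flagged 7% explicit):

```python
# All rates are illustrative, carried over from the paragraph above.
transactions = 100_000

# Manual process: 10% of transactions reviewed, at 97% per-item accuracy.
manually_verified = transactions * 0.10 * 0.97   # 9,700 checked and handled correctly
unchecked = transactions * 0.90                  # 90,000 never looked at

# AI-assisted process: 100% screened, 93% handled correctly by the model,
# the remaining 7% flagged as uncertain and routed to a human.
# Assumption: human review resolves (nearly) all of the flagged items correctly.
ai_correct = transactions * 0.93
human_corrected = transactions * 0.07

print(f"Manual:     {manually_verified:,.0f} verified correct, {unchecked:,.0f} unchecked")
print(f"AI + human: {ai_correct + human_corrected:,.0f} handled correctly, 0 unchecked")
print(f"Human workload: {transactions * 0.10:,.0f} vs {human_corrected:,.0f} items reviewed")
```

Note that the human effort actually goes down (7,000 items reviewed instead of 10,000) while coverage goes from a 10% sample to the full population.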
Even if those numbers vary, the concept holds. AI can dramatically increase coverage (scope of analysis) even if per-item accuracy is slightly lower initially. The total errors left in the system can be lower than in a traditional process.
This is essentially a precision versus recall trade-off. Rule-based and human methods deliver high precision but low recall. AI methods offer lower precision but high recall. In compliance, high recall (catching all potential issues) is very valuable, as long as you have review workflows to handle the lower precision.
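In those terms, with hypothetical counts from a single flagging run:

```python
# Hypothetical counts from one review cycle, purely for illustration.
true_positives  = 180   # flagged items that really were issues
false_positives = 70    # flagged items that turned out to be fine (costs review time, not risk)
false_negatives = 20    # real issues that were never flagged (the dangerous category)

precision = true_positives / (true_positives + false_positives)   # 0.72: some noise to clear
recall    = true_positives / (true_positives + false_negatives)   # 0.90: most issues surfaced
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A sampling-based manual process tends to be the mirror image: the items it does examine are handled carefully, but most of the issues sitting in the unexamined 90% never surface at all.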
Real benchmarks show AI reaching human-level performance
There have been some real benchmarks in related domains. In tax document classification, a well-trained ML model might achieve 95% accuracy where humans were at 90%. In one experiment on reading legal texts to answer tax questions, an LLM answered 85% of questions correctly without help, whereas a junior tax professional scored 80%.
Interestingly, the mistakes each made were in different areas, meaning a human plus AI team got nearly 100%.
An EY study noted that new generative AI models could produce summaries of tax law changes in seconds that were accurate enough to act on, though they still required verification. The errors were typically minor omissions or points needing clarification, not wholly wrong interpretations.
This suggests AI's quality is reaching a point where it's comparable to a knowledgeable human in many tasks, and often faster. But collaboration yields the best result.
The Journal of Accountancy reported that AI tools allowed auditors to analyze entire populations of transactions and found that doing so increased audit quality and confidence. One Big Four firm found that using AI in contract review cut the error rate in identifying relevant tax clauses significantly, as the AI didn't overlook items due to boredom or time constraints. The human auditors provided the final judgment on those clauses.
AI might initially appear to increase your error rate (that's actually good)
From a leadership perspective, this means tax leaders should adjust their expectations.
Historically, a "perfect" tax compliance process meant no known errors, with the tacit acceptance that we might not be catching everything. In the AI paradigm, a "better" process might surface a handful of errors (the AI finds them, so we become aware of them), which can be unsettling. "Our new system keeps finding issues!"
But in reality this is an improvement because those issues always existed. You just didn't see them before.
So initially, AI might make it seem like error rates went up, because it's finding more errors. That's a positive outcome for compliance health. Over time, as those errors are fixed and fed back into the system, quality improves further.
Benchmarks in continuous learning show that AI models can improve incrementally with retraining, whereas human error rates tend to stay flat, or even worsen with fatigue.
Humans and AI err differently, together they reduce total errors
Studies on human versus AI decision quality often conclude that a well-designed AI plus human oversight outperforms either alone. The key is to get that synergy.
Humans still outperform AI in understanding context, handling novel one-off scenarios, and exercising judgment on qualitative factors. AI outperforms humans in speed, consistency, and data breadth.
Therefore, error rates alone don't tell the full story. It's about error types. AI might never mis-key a number (a common human error), but it might misinterpret an unprecedented scenario. A human would catch a glaringly absurd output (like an AI claiming a VAT rate is 42% when that doesn't exist) but might fail to notice a subtle pattern that AI would catch.
The Big Four contract review example mentioned earlier illustrates this perfectly. The AI caught tax clauses that human reviewers missed due to volume and fatigue, while the humans provided the final judgment on whether those clauses actually mattered. Neither could have achieved the result alone.
Measure whether your tax function is making fewer mistakes with AI in the mix
Tax leaders should pivot their mindset from "can the AI get each transaction exactly right?" to "is our tax function making fewer mistakes and catching more issues with AI in the mix?"
If the latter is true (and evidence so far suggests it is, when implemented carefully), then the probabilistic approach is proving its worth.
It requires humility to accept that no system (human or AI) will be flawless, and wisdom to design processes that achieve continuous improvement. Over time, error rates, whether human or AI, should decline as the system learns and as controls tighten.
If you track metrics like "post-filing adjustments due to errors" or "tax audit findings," an AI-augmented tax function should aim to reduce those, even if internally it had to catch and fix more issues pre-filing.
For example, an indirect tax review aided by AI anomaly detection spotted patterns of underreported VAT in certain branches that the rules-based monitoring failed to notice. This resulted in a fix that saved penalties down the line, even though the AI also raised some false alarms that took time to clear.
Key takeaway: Stop chasing perfect transactions, start building better systems
Perfect accuracy on every transaction is a false goal. It's not achievable with humans, rules, or AI.
The question isn't whether AI makes occasional mistakes. Of course it does. So do humans. So do rule-based systems.
The question is whether your overall compliance posture improves. Whether you catch more errors, cover more ground, and free your team to focus on judgment calls instead of data entry. Whether you can analyze 100% of transactions instead of sampling 10%. Whether the errors that do occur get found and fixed faster.
Humans and AI err differently, but together they can push error rates down to levels neither could achieve alone. That's not a compromise. That's the point.