Most sanctions screening evaluations are built on the wrong criteria.
They focus on list coverage, matching accuracy, false positive rate, customer support, and price. These are reasonable criteria. They are also the reason most companies end up with a screening tool that works and a compliance process that does not.
The evaluation criteria that dominate vendor conversations address detection. They do not address what happens after detection. And what happens after detection is where the actual compliance cost, risk, and audit exposure live.
This article covers the standard criteria, explains why they are necessary but insufficient, and identifies the questions most buyers do not ask until after they have signed the contract.
The Standard Criteria
These dimensions appear in nearly every RFP and vendor comparison. They matter. Treat them as the foundation, not the ceiling.
List coverage
Which sanctions lists does the tool screen against? Most mainstream vendors cover 300 to 500 lists. The marginal value of each additional list depends on where the organisation trades. A company with significant Chinese supply chain exposure needs different coverage than one operating primarily within the EU.
Matching quality
How does the tool handle fuzzy matching, phonetic matching, and transliteration? Can it match across Latin and non-Latin scripts? Does it explain why a specific match was triggered, or does it present a name pair and a percentage with no context?
If you cannot see why a match was triggered, you cannot scale investigation. Matching transparency matters more than most evaluations acknowledge. When the analyst cannot understand the basis for an alert, they start every investigation from zero. Over thousands of alerts, that opacity compounds into significant wasted time.
Ask to see the matching output for a complex test case. Submit a transliterated Arabic or Chinese name and evaluate what the tool returns. The difference between vendors is most visible at the edges, not in straightforward Latin-script matches.
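To make "matching transparency" concrete, here is a minimal sketch of explainable match output, using the open-source rapidfuzz and jellyfish libraries as stand-ins for a vendor's engine. The score components shown are illustrative, not a standard.

```python
# Minimal sketch: return the evidence behind a match, not just a score.
# rapidfuzz and jellyfish are open-source stand-ins for a vendor engine.
from rapidfuzz import fuzz
import jellyfish

def explain_match(query: str, list_entry: str) -> dict:
    """Score a name pair and keep the signals that produced the score."""
    return {
        "pair": (query, list_entry),
        # Edit-distance similarity, tolerant of token order ("Li Wei" vs "Wei Li").
        "token_sort_score": fuzz.token_sort_ratio(query, list_entry),
        # Phonetic codes catch transliteration variants that spelling misses.
        "phonetic_codes": (jellyfish.metaphone(query), jellyfish.metaphone(list_entry)),
        # Character-level similarity weighted toward matching prefixes.
        "jaro_winkler": round(jellyfish.jaro_winkler_similarity(query, list_entry), 3),
    }

# Two transliterations of the same Arabic name: an analyst who can see
# that the phonetic signal fired does not start the investigation from zero.
print(explain_match("Abdul Rasheed", "Abd al-Rashid"))
```

The specific algorithms are beside the point. What matters is that each alert arrives with its evidence attached.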
False positive rate
Every vendor will claim a low false positive rate. The number is meaningless without context.
False positive rates depend on the entity data being screened, the matching sensitivity configuration, the sanctions lists in scope, and the organisation's trade footprint. A vendor demonstrating a 5% rate on clean test data tells you nothing about what the rate will be on your production data.
False positive rate is not just a detection metric. It is a workload multiplier. Every false positive the tool generates is an alert that an analyst must investigate, decide on, and document. The more useful question is not "what is your false positive rate" but "what tools do you provide to reduce false positives once they are generated?" Threshold tuning, exclusion list management, and the ability to suppress previously cleared combinations are operational necessities.
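The mechanics are not exotic. A minimal sketch, assuming a hypothetical pipeline in which the compliance team owns both the threshold and the exclusion store:

```python
# Minimal sketch of false positive reduction: a tunable threshold plus
# suppression of previously cleared (entity, list entry) combinations.
# The data model is hypothetical, not any vendor's schema.
MATCH_THRESHOLD = 85  # adjustable by the compliance team, not via support ticket

cleared_pairs: set[tuple[str, str]] = set()  # (entity_id, list_entry_id) already cleared

def should_alert(entity_id: str, list_entry_id: str, score: int) -> bool:
    if score < MATCH_THRESHOLD:
        return False  # below the tuned sensitivity threshold
    if (entity_id, list_entry_id) in cleared_pairs:
        return False  # investigated before and cleared; do not re-raise
    return True

def record_clearance(entity_id: str, list_entry_id: str) -> None:
    """Called when an analyst documents a false-positive decision."""
    cleared_pairs.add((entity_id, list_entry_id))
```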
One consistent finding across compliance teams: vendors that require a support ticket for every configuration change create a dependency that slows response times and increases cost. If the compliance team cannot adjust sensitivity or manage exclusion lists independently, that is a constraint worth understanding before choosing a provider.
Integration
Does the tool connect to the organisation's ERP, trade management, or CRM systems? Via API, batch upload, or both? For ERP environments, is there native integration, or does it require middleware?
Integration quality is more important than integration existence. A tool that offers an API but requires six months of custom development is not meaningfully integrated. A tool that connects natively but does not return results back into the source system, requiring manual status updates in the ERP, solves half the problem.
Ask specifically: does the screening result flow back into the originating system automatically? In multiple organisations, this manual return feed is where compliance gaps appear. An entity is cleared in the screening tool but not updated in SAP. The systems disagree. Nobody notices until an audit.
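What a working return feed looks like, sketched with a hypothetical erp_client whose two methods stand in for whatever connector or middleware the vendor actually provides:

```python
# The erp_client and both of its methods are hypothetical placeholders.
def on_screening_decision(entity_id: str, decision: str, erp_client) -> None:
    """Push a screening outcome back into the system that originated the entity."""
    assert decision in {"cleared", "blocked", "under_review"}
    erp_client.update_business_partner_status(
        partner_id=entity_id,
        screening_status=decision,
    )
    # Read back and verify, so a silent write failure surfaces now,
    # not in next year's audit.
    current = erp_client.get_business_partner_status(entity_id)
    if current != decision:
        raise RuntimeError(
            f"Status mismatch for {entity_id}: screening says {decision!r}, "
            f"source system says {current!r}"
        )
```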
Screening workflow
Does the tool support onboarding screening, transactional screening (at the point of order or shipment), and ongoing monitoring (automatic rescreening whenever lists are updated, typically daily)?
Onboarding-only screening is insufficient for most regulatory expectations. If the tool does not support transactional screening or continuous monitoring, the organisation will need to build those workflows separately. That typically means manual processes with coverage gaps.
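Ongoing monitoring reduces to a simple loop. A sketch, with fetch_list_updates, fetch_active_entities, and screen as hypothetical placeholders:

```python
import datetime

def daily_rescreen(fetch_list_updates, fetch_active_entities, screen,
                   since: datetime.date) -> list:
    """Rescreen the existing entity base against list entries changed since `since`."""
    updated_entries = fetch_list_updates(since=since)  # the delta, not all lists
    if not updated_entries:
        return []
    alerts = []
    for entity in fetch_active_entities():             # every onboarded counterparty,
        alerts.extend(screen(entity, updated_entries)) # not just new onboardings
    return alerts
```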
Price
Screening tools are priced per user, per entity screened, or as platform subscriptions with volume tiers. Get clarity on what is included. List updates, API access, ongoing monitoring, and support are sometimes bundled and sometimes billed separately.
Price is also the least useful differentiator. The software license is typically the smaller cost. The investigation burden behind it is the larger one. A tool that costs 20% less but generates 30% more false positives is not cheaper. It is more expensive in analyst time.
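The arithmetic is worth running. Every figure below is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope arithmetic behind that claim, on assumed figures.
ALERTS_PER_YEAR = 10_000
MINUTES_PER_ALERT = 15
ANALYST_COST_PER_HOUR = 60  # fully loaded

def annual_cost(license_fee: float, alert_multiplier: float) -> float:
    hours = ALERTS_PER_YEAR * alert_multiplier * MINUTES_PER_ALERT / 60
    return license_fee + hours * ANALYST_COST_PER_HOUR

cheaper = annual_cost(license_fee=40_000, alert_multiplier=1.3)  # 20% lower fee, 30% more alerts
pricier = annual_cost(license_fee=50_000, alert_multiplier=1.0)
print(f"cheaper license: {cheaper:,.0f}   pricier license: {pricier:,.0f}")
# cheaper license: 235,000   pricier license: 200,000
```

On these assumptions, the tool with the lower license fee costs 35,000 more per year once analyst time is counted.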
The Questions Most Buyers Do Not Ask
The criteria above will narrow the field to a shortlist of capable vendors. Most of them will perform adequately on detection. The differences that matter most in practice sit in a category that rarely appears in the selection process.
What happens after the alert?
This is the single most important question in the evaluation, and it is the one most often skipped.
When the screening tool generates an alert, what does the analyst see? Is there enough information to begin an investigation, or does the analyst have to leave the tool immediately to gather context from external sources?
Does the tool surface designation details, the relevant sanctions programme, and the scope of the restriction? Does it assemble external context (corporate registry data, ownership structures, adverse media) or leave that entirely to manual research?
Most screening tools stop at the alert. The analyst opens it, reads a name pair and a match score, and spends five to thirty minutes toggling between the tool, web browsers, corporate registries, and internal systems to assemble the information needed to make a decision. That is the sanctions alert investigation process in practice: the analyst becomes the integration layer between systems.
If the answer is "the analyst investigates manually," then you are not buying a system. You are buying a workload.
How are decisions documented?
Ask to see the case management interface. Where does the analyst record their decision? Is there a structured format with required fields, or a free-text box?
Most tools provide a place to write a note, not a system to produce evidence. A text field attached to an alert record with no required structure, no source citations, and no standardised format is not documentation infrastructure. It is a box.
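The difference is visible in the data model. A sketch of a structured decision record follows; the field set is an illustrative assumption, not a regulatory template:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AlertDecision:
    alert_id: str
    analyst: str
    decision: str               # "false_positive" | "true_match" | "escalated"
    rationale: str              # required narrative, not an optional note
    sources_checked: list[str]  # e.g. registry extract, ownership data, adverse media
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.decision not in {"false_positive", "true_match", "escalated"}:
            raise ValueError(f"unknown decision {self.decision!r}")
        if not self.rationale.strip() or not self.sources_checked:
            raise ValueError("a decision without reasoning and sources is a note, not evidence")
```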
This matters because documentation quality is what auditors actually test. A screening tool with excellent matching and poor case management produces a program that detects well and documents badly. That is a program that can be questioned in audits.
Can the audit trail be retrieved without reconstruction?
Ask the vendor to demonstrate how you would produce the complete decision trail for a specific counterparty screened six months ago. Every alert generated, every investigation conducted, every decision made, and the reasoning behind each one.
If you have to assemble the audit trail, you do not have one.
If the answer involves exporting CSVs, searching shared drives, or pulling records from multiple systems, the tool provides alert records, not an audit trail. Under audit pressure, that reconstruction is slow, incomplete, and unconvincing.
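The standard to hold vendors to is a single query over an append-only record, not a reconstruction project. A sketch against a hypothetical screening_events table:

```python
import sqlite3

def decision_trail(db: sqlite3.Connection, counterparty_id: str) -> list:
    """Every screening, alert, and decision for one counterparty, in order."""
    return db.execute(
        """
        SELECT occurred_at, event_type, list_version, analyst, detail
        FROM screening_events
        WHERE counterparty_id = ?
        ORDER BY occurred_at
        """,
        (counterparty_id,),
    ).fetchall()
```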
What does the tool require from your team?
This is the total cost of ownership question, and it extends well beyond the license fee.
How much configuration does the matching logic require, and who maintains it? Can the compliance team adjust thresholds independently, or does every change require vendor support? What happens to the configuration during an ERP migration?
How much training do analysts need? For organisations where screening is handled by non-specialists (regional managers, procurement staff, sales operations), the usability threshold matters more than the feature list.
What is the vendor's support model? Response times, escalation paths, and whether configuration changes are included or billed separately are details that compound over the life of the relationship.
Does the tool improve or just maintain?
Most tools forget everything.
They perform the same function on day one as they do on day one thousand. The matching logic does not learn. Previous decisions do not inform future ones. An entity cleared as a false positive last month generates the same alert this month, requiring the same investigation.
Ask whether the tool captures structured data from previous decisions and applies it to future screening. Does it recognise recurring false positives? Does it learn from the patterns in the organisation's screening history? Or does every alert arrive as if it has never been seen before?
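One way to picture the capability: clearances that are remembered, but expire when the designation they were based on changes. A sketch; the fingerprinting scheme is an assumption, not an established product feature:

```python
import hashlib
import json

# (entity_id, list_entry_id) -> fingerprint of the entry at clearance time
decision_memory: dict[tuple[str, str], str] = {}

def fingerprint(list_entry: dict) -> str:
    """Stable hash of the designation data a clearance was based on."""
    return hashlib.sha256(
        json.dumps(list_entry, sort_keys=True).encode()
    ).hexdigest()

def remember_clearance(entity_id: str, list_entry: dict) -> None:
    decision_memory[(entity_id, list_entry["id"])] = fingerprint(list_entry)

def is_known_false_positive(entity_id: str, list_entry: dict) -> bool:
    prior = decision_memory.get((entity_id, list_entry["id"]))
    # If the designation itself changed since the clearance, alert again.
    return prior is not None and prior == fingerprint(list_entry)
```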
This is not a standard requirement today. It will be. The organisations that evaluate for it now will be ahead of the ones that discover the need later.
How to Structure the Evaluation
Most evaluations fail not because the wrong criteria are used but because the criteria are weighted incorrectly.
Detection capability is the qualifier
List coverage, matching quality, integration, and monitoring are requirements. If a tool does not meet the baseline, it is eliminated. But above the baseline, the differences between mainstream vendors on detection are incremental. Choosing between 350 lists and 400 lists, or between two mature fuzzy matching algorithms, rarely determines whether the compliance program works.
Operational effectiveness is the differentiator
What happens after the alert, how investigations are supported, how decisions are documented, and how the audit trail is maintained are the factors that determine the total cost and risk profile of the screening program. These are also the factors where vendor capabilities diverge most sharply.
Total cost of ownership is the tiebreaker
The license fee is one input. Analyst time per alert, configuration burden, vendor dependency, training requirements, and audit preparation cost are the others. A tool that costs more per seat but reduces investigation time by 50% is cheaper in practice than one with a lower license fee and no investigation support.
The evaluation should weight accordingly. Detection is pass/fail above a threshold. Operational effectiveness and total cost of ownership determine the winner.
The Evaluation Most Companies Actually Run
In practice, most evaluations are dominated by detection criteria and price. The vendor with the best matching demo and the most competitive quote wins.
This produces a predictable outcome. The company buys a capable screening tool. Alerts are generated. Analysts investigate them manually, toggling between systems, documenting decisions in spreadsheets and shared drives. The compliance team absorbs the operational burden because no one evaluated whether the tool would reduce it.
Two years later, alert volumes have grown. The team is under pressure. An audit surfaces documentation gaps.
The tool worked exactly as evaluated. The process did not.
What You Are Actually Buying
A sanctions screening tool is not a compliance program. It is one component of one. The tool handles detection. The compliance program requires detection, investigation, decision-making, documentation, and audit readiness.
When you evaluate screening software, you are not choosing a matching algorithm. You are choosing the foundation of an operational workflow that will run thousands of times per month, across multiple analysts, for years.
The best screening tool is not the one with the most lists or the lowest false positive rate on test data. It is the one that leaves your compliance team in the strongest position to investigate, decide, and defend every alert that comes through.
Detection is what you buy. Resolution is what you operate. Evaluate both.