DORA, AI, and the Black Box Problem

A DORA audit of an AI-enabled financial system does not turn on model weights, training data distributions, or inference accuracy metrics. It turns on system topology, data flows, failure modes, and recovery procedures. An architecture that cannot be described to an auditor (one where the behavior of an AI component under failure conditions is unknown, undocumented, or untestable) fails the audit regardless of how accurate the model is in production.

Money20/20 Europe 2026's Policy20 roundtable convened more than 40 senior policymakers, regulators, and central bankers. The message was consistent: AI systems touching financial state will be audited as ICT systems, not as software products. The applicable frameworks are DORA (for operational resilience) and the EU AI Act (for risk-based AI governance). Both were in force before the roundtable. Neither is waiting for an industry readiness assessment.

Two Frameworks, One Audit Target

DORA, Regulation (EU) 2022/2554, has been enforceable since January 17, 2025. It applies to financial entities and their critical ICT third-party providers. Its core requirement is operational resilience: the ability to withstand, adapt to, and recover from ICT-related disruptions without material impact on financial services. The audit instrument is a documentation and evidence trail, not a performance test. Auditors do not ask "does your system work?" They ask "can you prove that your system behaves predictably under failure, and can you demonstrate how it recovers?"

The EU AI Act, Regulation (EU) 2024/1689, entered into force on August 1, 2024, with its obligations applying in stages after that date. It classifies AI systems used in financial services (credit scoring, fraud detection, insurance pricing, AML screening) as high-risk under Annex III. High-risk AI systems are subject to a conformity assessment before deployment, ongoing monitoring requirements, logging obligations, and an Art. 11 requirement for technical documentation that describes the system's design, development methodology, performance characteristics, and limitations.

The two frameworks converge on one system when an AI component is part of the ICT infrastructure of a financial institution. An AI model used for AML transaction screening is both a high-risk AI system under the EU AI Act and an ICT component under DORA. The institution must satisfy both frameworks at once, and they use overlapping but distinct documentation standards, audit methods, and governance requirements.

What the Audit Actually Covers

DORA Art. 5 requires financial entities to maintain a "digital operational resilience strategy" supported by documented policies and an ICT risk management framework. Art. 8 requires a "complete and up-to-date mapping of all ICT assets, including legacy systems and physical components." Art. 11 requires ICT-related incident management with full traceability.

For an AI-enabled system, these requirements translate to:

System topology documentation. Where does the AI component sit in the data flow? What inputs does it receive, from which systems? What decisions does it produce, and which systems consume those decisions? If the AI model is a black box in the topology, existing as a service call without documented data contracts or dependency mapping, the topology is incomplete. DORA Art. 8 requires that every ICT asset is mapped. A service call to an undocumented model endpoint is an unmapped ICT asset.
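
One way to make the "unmapped asset" check concrete is to compare the endpoints that services actually call against a register of documented assets. The sketch below is illustrative only: `IctAsset`, `find_unmapped_calls`, the contract labels, and the endpoint names are hypothetical and do not come from DORA or from any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IctAsset:
    """One documented entry in a (hypothetical) ICT asset register."""
    endpoint: str
    owner: str
    input_contract: str   # name/version of the documented input schema
    output_contract: str  # name/version of the documented output schema
    consumers: tuple[str, ...] = ()


def find_unmapped_calls(observed_calls, register):
    """Return observed endpoints that have no entry in the asset register."""
    mapped = {asset.endpoint for asset in register}
    return sorted(set(observed_calls) - mapped)


register = [
    IctAsset(
        endpoint="aml-screening/v3/score",
        owner="financial-crime-platform",
        input_contract="ScreeningRequest@2",
        output_contract="ScreeningResult@2",
        consumers=("payments-orchestrator",),
    ),
]

# Endpoints seen in service call logs (illustrative values).
observed = ["aml-screening/v3/score", "fraud-model/beta/predict"]

print(find_unmapped_calls(observed, register))  # ['fraud-model/beta/predict']
```

Run periodically against real call logs, a check like this would surface a model endpoint that went live without a register entry.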

Failure mode documentation. What happens when the AI component is unavailable? Is the system designed to fail open (proceed without the AI decision) or fail closed (block the operation until the component is available)? For an AML screening AI, fail-open means transactions proceed without screening, a potential regulatory violation. Fail-closed means the institution cannot process transactions during model downtime, a potential DORA operational resilience violation. Neither default is automatically correct; whichever is chosen must be a documented decision with a documented justification.
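
A documented failure-mode decision can also be carried as data, so that the runtime behavior and the documentation cannot drift apart. The following is a minimal sketch under assumed names (`FailurePolicy`, `action_when_unavailable`, and the action strings are hypothetical); it refuses to construct a policy that has no justification or approver.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    FAIL_OPEN = "fail_open"      # proceed without the AI decision
    FAIL_CLOSED = "fail_closed"  # block until the component is available


@dataclass(frozen=True)
class FailurePolicy:
    component: str
    mode: FailureMode
    justification: str
    approved_by: str

    def __post_init__(self):
        # A default without a recorded rationale is not a documented decision.
        if not self.justification.strip() or not self.approved_by.strip():
            raise ValueError("failure policy needs a justification and an approver")


def action_when_unavailable(policy: FailurePolicy) -> str:
    """Translate the documented policy into the runtime action."""
    if policy.mode is FailureMode.FAIL_OPEN:
        return "proceed_unscreened"
    return "hold_transaction"


policy = FailurePolicy(
    component="aml-screening",
    mode=FailureMode.FAIL_CLOSED,
    justification="Unscreened payments are not acceptable; held payments are queued.",
    approved_by="model-risk-committee",
)
print(action_when_unavailable(policy))  # hold_transaction
```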

Recovery procedure documentation. DORA Art. 12 requires documented and tested recovery procedures for every critical ICT component. For an AI component, recovery includes: detecting model degradation (accuracy drop, latency increase, input distribution shift), rolling back to a previous model version, falling back to a rule-based system, and restoring the AI component without reprocessing events that were handled by the fallback. Each step must be documented, assigned to an owner, and tested at least annually.
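
The last recovery step, restoring the AI component without reprocessing events the fallback already handled, is essentially an idempotency check. A minimal sketch, assuming a hypothetical journal keyed by event ID (the field names and data are illustrative):

```python
def resume_after_recovery(pending_events, decided_by_fallback):
    """Return the events the restored AI component should still process.

    Events that the fallback already decided are skipped so that no event
    is decided twice.
    """
    return [e for e in pending_events if e["event_id"] not in decided_by_fallback]


# Decisions journaled while the fallback was active (illustrative data).
fallback_journal = {
    "evt-101": {"decided_by": "rule-based-fallback", "action": "flag"},
    "evt-102": {"decided_by": "rule-based-fallback", "action": "pass"},
}

backlog = [
    {"event_id": "evt-101", "amount": 9_500},
    {"event_id": "evt-102", "amount": 120},
    {"event_id": "evt-103", "amount": 48_000},
]

to_process = resume_after_recovery(backlog, fallback_journal)
print([e["event_id"] for e in to_process])  # ['evt-103']
```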

Audit trail for AI decisions. DORA Art. 11 requires full traceability for ICT-related incidents. If an AI model makes a decision (flag this transaction, approve this credit application, classify this customer as high-risk) and that decision is later disputed, the institution must be able to reconstruct the exact decision: which model version was running, what input it received, what output it produced, and what action followed. An audit trail that says "the model flagged it" without the input data, the model version, and the output score fails the DORA traceability requirement.

The EU AI Act Technical Documentation Requirement

EU AI Act Art. 11 requires that providers of high-risk AI systems maintain technical documentation containing:

A general description of the AI system and its intended purpose
A description of the development methodology, including training data provenance and validation methodology
Performance metrics, including accuracy, precision, recall, and the conditions under which performance degrades
Known limitations, failure modes, and foreseeable misuse scenarios
A description of all technical measures taken to ensure robustness, cybersecurity, and accuracy across the lifecycle

This documentation must be kept up to date throughout the system's operational life. Every model update, whether retraining on new data, parameter adjustment, or architectural change, requires updating the technical documentation before the updated model is deployed.
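
The "documentation before deployment" rule can be enforced as a release gate. The sketch below is an assumption about how such a gate might look: the section keys paraphrase the list above, and `TechnicalDocumentation` and `may_deploy` are hypothetical names, not terms from the regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalDocumentation:
    model_version: str        # the model version this documentation describes
    sections: frozenset[str]  # sections present in the documentation


# Hypothetical keys, one per documentation item listed above.
REQUIRED_SECTIONS = frozenset({
    "general_description",
    "development_methodology",
    "performance_metrics",
    "known_limitations",
    "robustness_measures",
})


def may_deploy(candidate_version: str, doc: TechnicalDocumentation) -> bool:
    """Allow deployment only if the documentation covers this exact version."""
    return (
        doc.model_version == candidate_version
        and REQUIRED_SECTIONS <= doc.sections
    )


doc = TechnicalDocumentation("2.4.0", REQUIRED_SECTIONS)
print(may_deploy("2.4.0", doc))  # True
print(may_deploy("2.5.0", doc))  # False: documentation describes an older version
```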

For institutions using third-party AI models (vendor-supplied fraud detection, third-party AML screening), Art. 25 establishes shared responsibility: the deploying institution cannot simply point to the vendor's documentation and consider its obligations met. The institution must verify that the vendor's documentation is accurate, complete, and specific to the institution's deployment context. Generic vendor documentation that does not describe how the model behaves on the institution's transaction data does not satisfy the Art. 11 requirement.

BBVA and ABN AMRO: What Production AI Deployment Looks Like

BBVA and ABN AMRO were among the institutions cited at Money20/20 Europe 2026 as having deployed AI in production financial operations. Both institutions have published descriptions of their AI governance frameworks that illustrate what compliant deployment looks like.

BBVA's AI deployment framework includes an internal AI ethics and governance committee, mandatory model risk assessments before production deployment, and model cards (structured documentation describing each model's intended use, known limitations, and performance benchmarks) for all models used in customer-facing decisions. The model card is not a regulatory document. It is an internal governance artifact from which DORA and EU AI Act documentation can be drawn.

ABN AMRO's transaction monitoring AI systems operate with documented fallback procedures: when the AI system's confidence score falls below a defined threshold, the transaction is routed to human review rather than to an automated decision. The threshold is a governance decision, documented in the model risk assessment, and reviewed quarterly. The human review step is the fail-safe mechanism that prevents the AI from making unreviewed high-risk decisions, and it is part of the system topology documentation required by DORA Art. 8.

The pattern in both cases: the AI component is explicitly in the topology, its failure modes are documented, its decisions are loggable and auditable, and there is a defined fallback that does not require the AI to function correctly.

The Black Box Problem

A "black box" AI system, in the regulatory sense, is not necessarily one where the model internals are opaque. It is one where the system cannot answer the questions that DORA and the EU AI Act require it to answer.

A model may be internally interpretable (for example, a gradient-boosted tree with published feature importance scores) and still constitute a black box if it is deployed in infrastructure that does not log its inputs and outputs, does not version its deployments, does not have documented failure modes, and does not have a tested recovery procedure. The interpretability of the model is irrelevant to a DORA audit. The topology, the logging, the failure documentation, and the recovery procedure are what the audit examines.

Conversely, a large language model, whose internal computations are not human-interpretable, can satisfy both DORA and the EU AI Act requirements if it is deployed with complete input/output logging, versioned deployment records, documented failure modes (including a description of how the model behaves on out-of-distribution inputs), and a tested fallback to a rule-based system when the model is unavailable.

The common failure pattern is not the use of an AI model with opaque internals. It is the deployment of any AI model, interpretable or not, in infrastructure that was built to process transactions, not to satisfy audit requirements. Transaction processing infrastructure tracks what happened to transactions. Audit-ready AI infrastructure tracks what the AI component decided, why it decided it, and what happened as a result.

What Transparent Architecture Requires

For an AI component embedded in financial infrastructure, transparent architecture means five specific properties.

Explicit data contracts at every interface. The input schema that the AI model receives is documented, versioned, and validated before the model is invoked. The output schema is documented, versioned, and validated before downstream systems consume it. Schema drift, where the model receives a field that changed format or emits a score field that changed scale, is detected before it produces incorrect decisions.
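
A minimal sketch of contract validation on both sides of the model call, using only the standard library. The field names, the `[0, 1]` score range, and the function names are assumptions for illustration; a production system would more likely use a schema library and versioned schema files.

```python
# Hypothetical contracts: field name -> required Python type.
INPUT_CONTRACT = {"transaction_id": str, "amount_minor": int, "currency": str}
OUTPUT_CONTRACT = {"score": float}


def validate(payload: dict, contract: dict) -> list[str]:
    """Return a list of contract violations (an empty list means valid)."""
    errors = []
    for field, expected_type in contract.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    for field in sorted(payload.keys() - contract.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


def validate_score(output: dict) -> list[str]:
    """Check the output contract, including the documented score scale."""
    errors = validate(output, OUTPUT_CONTRACT)
    if not errors and not 0.0 <= output["score"] <= 1.0:
        errors.append("score outside documented range [0, 1]")
    return errors


# Format drift on input: the amount arrives as a string.
print(validate(
    {"transaction_id": "t1", "amount_minor": "1050", "currency": "EUR"},
    INPUT_CONTRACT,
))
# Scale drift on output: the model starts emitting 0-100 instead of 0-1.
print(validate_score({"score": 87.0}))
```

Both drift cases named above are caught before a decision is made or consumed: the first call reports a type mismatch, the second a score outside the documented range.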

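Decision logging with model version and input hash. Every AI decision is logged with: the model version identifier, a hash of the input data (not the input data itself, for data minimization), the output score, and the action taken. Because only the hash is logged, the input itself must remain retrievable from the system of record for the retention period, so that a disputed decision can be reconstructed and the retrieved input checked against the logged hash. The log is append-only and cannot be retroactively modified. An auditor can query any decision by transaction ID and receive the exact model version and output score, even for decisions made years ago.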
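
The logging property can be sketched with a hash-chained log, where each entry commits to the one before it. This is one possible technique, not something the frameworks prescribe: `DecisionLog` and its fields are hypothetical, the store is in-memory, and chaining makes retroactive edits detectable rather than impossible, so real deployments would pair it with write-once storage.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Stable JSON encoding so the same data always hashes the same way."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


class DecisionLog:
    """In-memory, hash-chained log; each entry commits to the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, transaction_id, model_version, model_input, score, action):
        prev_hash = self._entries[-1]["entry_hash"] if self._entries else "0" * 64
        body = {
            "transaction_id": transaction_id,
            "model_version": model_version,
            "input_hash": sha256_hex(canonical(model_input)),  # hash only, not the input
            "score": score,
            "action": action,
            "prev_hash": prev_hash,
        }
        entry = dict(body, entry_hash=sha256_hex(canonical(body)))
        self._entries.append(entry)
        return entry

    def find(self, transaction_id):
        return [e for e in self._entries if e["transaction_id"] == transaction_id]

    def verify(self) -> bool:
        """Recompute the chain; any retroactive edit breaks it."""
        prev_hash = "0" * 64
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash:
                return False
            if sha256_hex(canonical(body)) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


log = DecisionLog()
log.append("txn-1", "aml-2.4.0", {"amount_minor": 125000, "currency": "EUR"}, 0.91, "flag")
log.append("txn-2", "aml-2.4.0", {"amount_minor": 900, "currency": "EUR"}, 0.03, "pass")
print(log.verify())                          # True
print(log.find("txn-1")[0]["model_version"])  # aml-2.4.0
```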

Defined and tested fallback procedures. The fallback is not "the system tries the AI first and handles errors." The fallback is a specific alternative decision path (a rule-based threshold, a human review queue, a conservative default action) that activates when the AI component is unavailable, when its confidence score falls below a defined threshold, or when its input fails validation. The fallback is tested quarterly, not just described in a runbook.
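
The three activation conditions above can be expressed as one explicit decision function, so that each fallback path is a named, testable branch. The threshold value, the stand-in rule, the use of `ConnectionError` as the unavailability signal, and all names below are assumptions for illustration.

```python
CONFIDENCE_THRESHOLD = 0.80  # hypothetical value; a governance decision in practice


def rule_based_decision(txn: dict) -> str:
    """Conservative stand-in rule: flag large amounts, pass the rest."""
    return "flag" if txn["amount_minor"] >= 1_000_000 else "pass"


def decide(txn: dict, call_model) -> dict:
    """Return the action plus which path produced it and why."""
    # Trigger 1: the input fails validation.
    if not isinstance(txn.get("amount_minor"), int):
        return {"action": "human_review", "path": "fallback", "reason": "invalid_input"}
    # Trigger 2: the AI component is unavailable.
    try:
        score, confidence = call_model(txn)
    except ConnectionError:
        return {"action": rule_based_decision(txn), "path": "fallback",
                "reason": "model_unavailable"}
    # Trigger 3: confidence falls below the defined threshold.
    if confidence < CONFIDENCE_THRESHOLD:
        return {"action": "human_review", "path": "fallback", "reason": "low_confidence"}
    return {"action": "flag" if score >= 0.5 else "pass", "path": "model", "reason": "ok"}


def model_down(txn):
    """Stub that simulates an unavailable model endpoint."""
    raise ConnectionError("model endpoint unreachable")


print(decide({"amount_minor": 5_000}, model_down))
# {'action': 'pass', 'path': 'fallback', 'reason': 'model_unavailable'}
```

Because the model call is injected, a quarterly fallback test can exercise every branch with stubs like `model_down` without taking the real component offline.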

Model update governance. Every model update is documented before deployment: what changed, why, what performance benchmarks were run, and what the expected impact on decision distributions is. The update is deployed to a shadow environment first, where it runs against production traffic without affecting decisions. The shadow results are compared against the current model's decisions. The comparison is reviewed by the model risk function before the update goes live.
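
The shadow comparison reduces to a disagreement report between the live model's decisions and the candidate's decisions on the same traffic. A minimal sketch with hypothetical names and illustrative data; what disagreement rate is acceptable is for the model risk function to decide and is not encoded here.

```python
def compare_shadow(current: dict, candidate: dict) -> dict:
    """Compare decisions keyed by transaction ID from the live and shadow runs."""
    shared = current.keys() & candidate.keys()
    if not shared:
        raise ValueError("no overlapping transactions to compare")
    changed = sorted(t for t in shared if current[t] != candidate[t])
    return {
        "compared": len(shared),
        "disagreement_rate": len(changed) / len(shared),
        "changed_transactions": changed,
    }


# Decisions on the same production traffic (illustrative data).
current = {"t1": "pass", "t2": "flag", "t3": "pass", "t4": "pass"}
candidate = {"t1": "pass", "t2": "flag", "t3": "flag", "t4": "pass"}

report = compare_shadow(current, candidate)
print(report)
# {'compared': 4, 'disagreement_rate': 0.25, 'changed_transactions': ['t3']}
```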

Incident response for AI-specific failure modes. AI systems fail in ways that general ICT incident response procedures do not cover: silent accuracy degradation (the model is running but producing increasingly wrong decisions), input distribution shift (production data drifts from training data, degrading performance without raising an error), and adversarial inputs (inputs deliberately crafted to produce incorrect decisions). The incident response procedure must define how each failure mode is detected and who is responsible for remediation.
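
Input distribution shift is the failure mode that most needs an explicit detector, because nothing raises an error. One common choice, used here as an assumption rather than anything the frameworks mandate, is the Population Stability Index (PSI) over pre-binned feature counts. The bin counts and the alert threshold below are illustrative; the threshold is a rule-of-thumb value that the model risk function would need to set.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over pre-binned counts of one feature."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both histograms must use the same bins")
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p = max(e / exp_total, eps)  # training-time share of this bin
        q = max(a / act_total, eps)  # production share of this bin
        total += (q - p) * math.log(q / p)
    return total


PSI_ALERT_THRESHOLD = 0.25  # hypothetical alerting threshold

training_bins = [50, 30, 20]    # e.g. share of small / medium / large amounts
production_bins = [20, 30, 50]  # production traffic has shifted toward large amounts

drift = psi(training_bins, production_bins)
print(round(drift, 3))              # 0.55
print(drift > PSI_ALERT_THRESHOLD)  # True: open an incident for the named owner
```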

Trade-offs

Transparent AI architecture as described above imposes costs.

Development overhead. Instrumenting an AI pipeline with decision logging, schema validation, fallback procedures, and model versioning adds development time. For institutions with existing AI systems not built with these properties, retrofitting is more expensive than building them in from the start.

Latency from logging and validation. Decision logging, input hashing, and output schema validation add latency to every AI invocation. For AML screening that must complete within payment processing time (sub-second for SCT Inst), this latency budget is tight. Asynchronous logging (log the decision after responding) reduces latency but introduces a window where the decision exists without a log record, a period that must be bounded and documented.
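
The asynchronous variant can be sketched with a background writer that also measures how long each decision waited before it was logged. This is a simplified illustration with hypothetical names and an assumed bound: the queue is in-memory, so a process crash would lose unwritten records, which is exactly the exposure the documented window has to account for.

```python
import queue
import threading
import time

MAX_UNLOGGED_SECONDS = 0.5  # hypothetical documented bound on the unlogged window

log_records = []
lags = []
pending = queue.Queue()


def log_writer():
    """Drain decisions in the background and record how long each waited."""
    while True:
        item = pending.get()
        if item is None:  # sentinel: stop the writer
            pending.task_done()
            break
        decided_at, record = item
        log_records.append(record)
        lags.append(time.monotonic() - decided_at)
        pending.task_done()


def respond_then_log(transaction_id: str, action: str) -> str:
    """Return the decision immediately; the log write happens asynchronously."""
    pending.put((time.monotonic(), {"transaction_id": transaction_id, "action": action}))
    return action


writer = threading.Thread(target=log_writer, daemon=True)
writer.start()

for i in range(3):
    respond_then_log(f"txn-{i}", "pass")

pending.put(None)
pending.join()  # wait until every queued decision has been written

# Evidence for the audit file: the longest observed unlogged window vs. the bound.
window_respected = max(lags) <= MAX_UNLOGGED_SECONDS
print(len(log_records), window_respected)
```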

Fallback accuracy. A rule-based fallback for an AI-powered fraud detection system will produce more false positives and more false negatives than the AI system under normal conditions. That is why the AI system was built. Activating the fallback during AI downtime means accepting degraded detection accuracy for the duration. The institution must have assessed this accuracy degradation and determined it is acceptable, documented that assessment, and ensured that AML compliance obligations are still met at the fallback accuracy level.
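
The accuracy assessment can be made quantitative by scoring both decision paths against the same labeled sample. The data below is invented purely to show the arithmetic, and `error_rates` is a hypothetical helper.

```python
def error_rates(predictions, labels):
    """Return (false_positive_rate, false_negative_rate) for boolean flags."""
    positives = sum(bool(y) for y in labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("sample needs at least one positive and one negative case")
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum(y and not p for p, y in zip(predictions, labels))
    return fp / negatives, fn / positives


T, F = True, False
# Ground truth for 12 reviewed transactions: 4 suspicious, 8 legitimate.
labels         = [T, T, T, T, F, F, F, F, F, F, F, F]
model_flags    = [T, T, T, F, F, F, F, F, F, F, F, T]
fallback_flags = [T, T, F, F, T, T, F, F, F, F, F, T]

print(error_rates(model_flags, labels))     # (0.125, 0.25)
print(error_rates(fallback_flags, labels))  # (0.375, 0.5)
```

In this invented sample the fallback triples the false positive rate and doubles the false negative rate. The gap between the two results is the figure the documented assessment has to justify.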

Fernel Context

Fernel's architecture exposes explicit, documented interfaces at every layer. The orchestration engine logs every workflow step, including calls to external AI or screening components, as part of the durable execution journal. Each external call is logged with its input parameters, the response received, and the timing. A call to an AML screening service is a journaled workflow step: the input transaction data, the screening result, and the decision to proceed or flag are all in the workflow journal with a single correlation ID. Model version tracking and fallback routing, to a rule-based screener or to a human review queue, are workflow configuration, not code changes.

Sources:

Money20/20 Europe 2026, Policy20 Roundtable: 40+ senior policymakers, regulators, central bankers; AI governance in financial services
DORA, Regulation (EU) 2022/2554, Art. 5 (ICT Risk Management Strategy), Art. 8 (ICT Asset Mapping), Art. 11 (ICT Incident Management and Traceability), Art. 12 (Recovery Procedures)
EU AI Act, Regulation (EU) 2024/1689, Annex III (High-Risk AI Systems in Financial Services), Art. 9 (Risk Management), Art. 11 (Technical Documentation), Art. 25 (Shared Responsibilities), in force since August 1, 2024
BBVA AI Governance Framework: model cards, ethics committee, production deployment requirements (public disclosures 2024-2025)
ABN AMRO Transaction Monitoring AI: confidence threshold routing, human review fallback (public disclosures 2025)
DORA Regulatory Technical Standards (RTS), ESA joint publication 2024, on ICT risk management tools, methods, processes, and policies