Architecture · 12 min read

Anatomy of a Financial Operating System

Three layers, three failure domains. How the separation of ledger engine, orchestration, and domain logic creates a system that regulators can audit.



A typical core banking system handles accounts, payments, compliance, reporting, and product configuration in a single application. One codebase. One database. One deployment unit. When the rewards engine has a bug, the deployment that fixes it also touches the payment processing path. When the compliance team needs a new screening rule, the release includes unrelated changes to the fee calculator.

This is not a hypothetical. It is the architecture of most systems built in the last two decades, and the reason banks routinely spend three to five years on core transformation projects that frequently stall, overrun, or get shelved entirely.

The alternative is not a microservice explosion. It is a deliberate separation into three layers, each with a distinct failure domain, a distinct change frequency, and a distinct correctness requirement.

Three Layers, Three Failure Domains

The insight behind what ThoughtWorks calls "Coreless Banking" is straightforward: not everything in a banking system carries the same risk. The ledger must never be wrong. The workflow engine must never lose state. The product logic changes every sprint. Treating all three with the same deployment cadence, the same technology choices, and the same testing rigor is either wasteful (over-engineering the product layer) or dangerous (under-engineering the ledger).

Architect's Note: Direct integration with clearing houses requires exact message sequences. Fernel manages pacs.008 origination and camt.054 settlement autonomously.

Layer 1: The Ledger Engine

Responsibility: double-entry invariants, balance integrity, immutability. This is the accounting record. It answers one question: what is the authoritative state of every account?

Properties:

  • Strict serializability. Concurrent transfers to the same account behave as if they had executed one at a time, in an order consistent with real time. No anomalies. No caveats.
  • Immutable append-only. No UPDATE, no DELETE. Corrections are new entries referencing the original. The complete history is always preserved.
  • Engine-enforced invariants. A transfer that doesn't balance (sum of debits ≠ sum of credits) is rejected by the engine, not by application code. This eliminates money creation and destruction as a category of bug.
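
A minimal sketch of the last two properties, assuming a toy in-memory ledger. The names `Entry`, `Transfer`, `Ledger`, and `UnbalancedTransfer` are hypothetical and not part of any real engine; strict serializability is a concurrency property and is not modelled here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Entry:
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


@dataclass(frozen=True)
class Transfer:
    transfer_id: str
    entries: tuple[Entry, ...]
    corrects: Optional[str] = None  # id of the transfer this one reverses, if any


class UnbalancedTransfer(ValueError):
    """Raised when the sum of debits differs from the sum of credits."""


class Ledger:
    """Toy append-only ledger: the only write operation is post()."""

    def __init__(self) -> None:
        self._log: list[Transfer] = []

    def post(self, transfer: Transfer) -> None:
        # The invariant is checked here, inside the ledger, not by callers.
        debits = sum((e.debit for e in transfer.entries), Decimal("0"))
        credits = sum((e.credit for e in transfer.entries), Decimal("0"))
        if debits != credits:
            raise UnbalancedTransfer(f"{transfer.transfer_id}: {debits} != {credits}")
        self._log.append(transfer)

    def reverse(self, original_id: str, reversal_id: str) -> None:
        # A correction is a new transfer with debits and credits swapped,
        # referencing the original. Nothing is updated or deleted.
        original = next(t for t in self._log if t.transfer_id == original_id)
        flipped = tuple(
            Entry(e.account, debit=e.credit, credit=e.debit) for e in original.entries
        )
        self.post(Transfer(reversal_id, flipped, corrects=original_id))

    def balance(self, account: str) -> Decimal:
        # Balance is derived from the log as credits minus debits.
        total = Decimal("0")
        for transfer in self._log:
            for entry in transfer.entries:
                if entry.account == account:
                    total += entry.credit - entry.debit
        return total

    def history(self) -> tuple[Transfer, ...]:
        return tuple(self._log)
```

Because `post()` is the only way to write, an unbalanced transfer cannot reach the log regardless of which caller produced it, and a reversal leaves both the original and the correcting entry in the history.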

Change frequency: almost never. The ledger engine is infrastructure. You upgrade it for performance or security patches, not for business logic changes.

Failure impact: catastrophic. If the ledger is corrupted, the financial record is unreliable. Every downstream system (reconciliation, reporting, compliance) inherits the corruption.

Layer 2: The Orchestration Engine

Responsibility: coordinating multi-step processes. Account opening (create entity → provision accounts → assign IBAN → trigger KYC). Payment execution (validate → screen → debit → submit to clearing → track settlement). These are not single operations; they are workflows that span multiple services and may take seconds, minutes, or days to complete.

Properties:

  • Durable execution. Every step is journaled before execution. If the process is interrupted (crash, timeout, provider outage), the engine replays the journal and resumes at the exact point of failure. No duplicate side effects. No skipped steps.
  • Exactly-once semantics. The journal guarantees that each step executes once, even across restarts. This is a property of the execution model, not of application-level idempotency.
  • Built-in audit trail. The journal IS the audit trail. Every workflow has a correlation ID. Every step is logged with inputs, outputs, and timing. An auditor can reconstruct the complete lifecycle of any operation from a single identifier.
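
The replay behaviour can be sketched in a few lines, assuming a JSON-lines file as the journal. `Journal`, `run_step`, and `open_account` are hypothetical names, and the account-opening steps return placeholder values. This sketch simply re-runs a step that was started but never completed; a production engine has its own handling for that window, which is not modelled here.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional


class Journal:
    """Append-only journal for one workflow, keyed by a correlation ID."""

    def __init__(self, path: Path, correlation_id: str) -> None:
        self.path = path
        self.correlation_id = correlation_id
        self.completed: dict[str, Any] = {}
        if path.exists():
            # Replay: load the recorded result of every completed step.
            for line in path.read_text().splitlines():
                record = json.loads(line)
                if record["status"] == "completed":
                    self.completed[record["step"]] = record["result"]

    def _append(self, record: dict) -> None:
        record = {"correlation_id": self.correlation_id, **record}
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def run_step(self, step: str, action: Callable[[], Any]) -> Any:
        if step in self.completed:
            return self.completed[step]  # already done: reuse the recorded result
        self._append({"step": step, "status": "started"})  # journal before executing
        result = action()
        self._append({"step": step, "status": "completed", "result": result})
        self.completed[step] = result
        return result


def open_account(journal: Journal, executed: list, crash_at: Optional[str] = None) -> str:
    """Account opening as four journaled steps; crash_at simulates an interruption."""

    def step(name: str, result: str) -> str:
        def action() -> str:
            if name == crash_at:
                raise RuntimeError(f"simulated crash during {name}")
            executed.append(name)  # stands in for the real side effect
            return result

        return journal.run_step(name, action)

    step("create_entity", "entity-1")
    step("provision_accounts", "acct-1")
    iban = step("assign_iban", "IBAN-PLACEHOLDER")
    step("trigger_kyc", "kyc-requested")
    return iban
```

Running `open_account` once with `crash_at="assign_iban"` and then again against the same journal file executes only `assign_iban` and `trigger_kyc` the second time. The file itself doubles as the audit trail: every line carries the correlation ID, the step name, and its status.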

Change frequency: rarely. New workflow types are added when new capabilities are introduced (a new payment scheme, a new compliance check), but the engine itself is stable.

Failure impact: degraded. If the orchestration engine goes down, in-flight processes stall, but the data is safe. The ledger is unaffected. When the engine recovers, it replays from the journal and completes the stalled processes.

Layer 3: The Domain Layer

Responsibility: everything that makes the product a product. Fee structures. Compliance rules. KYC provider integration. Onboarding flows. Partner API adapters. Reporting templates. This is where the business differentiates.

Properties:

  • Configuration as Code. Product rules expressed as versioned configuration, not embedded in application code. A new fee structure is a config deployment, not a code release.
  • Provider abstraction. The KYC provider, payment scheme, and AML screening service all sit behind standardized interfaces. The domain layer knows it needs a KYC check; it does not know (or care) whether Identomat, Onfido, or a manual process fulfills it. Switching providers is a configuration change.
  • Frequent releases. This layer changes every sprint. New features, adjusted rules, new integrations. The deployment cadence is fast because failures here are contained.
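
As an illustration of provider abstraction, here is a small sketch in which the onboarding flow depends only on an interface and configuration selects the implementation. `KycProvider`, `StubVendorKyc`, `ManualReviewKyc`, and the `kyc_provider` config key are all hypothetical; no real vendor API is shown.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class KycResult:
    applicant_id: str
    approved: bool
    provider: str


class KycProvider(Protocol):
    def check(self, applicant_id: str) -> KycResult: ...


class StubVendorKyc:
    """Stand-in for an automated third-party provider."""

    def check(self, applicant_id: str) -> KycResult:
        return KycResult(applicant_id, approved=True, provider="stub_vendor")


class ManualReviewKyc:
    """Stand-in for a manual process: nothing is approved automatically."""

    def check(self, applicant_id: str) -> KycResult:
        return KycResult(applicant_id, approved=False, provider="manual")


# Registry of available implementations, keyed by the name used in configuration.
PROVIDERS = {"stub_vendor": StubVendorKyc, "manual": ManualReviewKyc}


def kyc_provider_from_config(config: dict) -> KycProvider:
    return PROVIDERS[config["kyc_provider"]]()


def onboard(applicant_id: str, kyc: KycProvider) -> str:
    # The flow only knows the interface, never the concrete provider.
    result = kyc.check(applicant_id)
    return "active" if result.approved else "pending_review"
```

Switching from `{"kyc_provider": "stub_vendor"}` to `{"kyc_provider": "manual"}` changes which class is instantiated while `onboard` stays untouched.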

Change frequency: high. This is the product. It evolves constantly.

Failure impact: contained. A bug in the reward calculation breaks rewards. It does not corrupt the ledger. It does not stall payments. The blast radius is bounded by the service boundary.

| Layer | Responsibility | Failure impact | Change frequency |
|---|---|---|---|
| Ledger Engine | Balance integrity, double-entry, immutability | Catastrophic: data corruption | Almost never |
| Orchestration | Multi-step coordination, compensation, journal | Degraded: processes stall, data safe | Rarely |
| Domain Layer | Product rules, compliance, integrations, API | Contained: specific feature breaks | Every sprint |

Why "Just Use a Queue" Fails for Financial Processes

The instinct, when coordinating multi-step processes, is to reach for a message queue. Step 1 publishes an event. Step 2 subscribes, processes, publishes the next event. Compensation is another event published on failure.

This works for notification systems, analytics pipelines, and cache invalidation. It does not work well for financial processes, for a specific reason: the "process" is implicit. No single component knows the current state of the end-to-end flow. Debugging requires forensic analysis across multiple consumer logs. And the coordination itself (the ordering of steps, the handling of failures, the decision to compensate) is scattered across independent consumers.

Consider a SEPA Credit Transfer:

  1. Validate IBAN and amount
  2. AML screening (external provider call)
  3. Debit sender account (ledger operation)
  4. Submit to clearing network
  5. Track settlement status

Step 3 debits the sender's account. Step 4 fails: the clearing network rejects the payment. The system must reverse the debit from step 3.

With a queue-based approach: publish a compensation event. A consumer picks it up and reverses the debit. But what if the compensation message is lost? What if the consumer crashes before processing it? What if the reversal itself fails? Each failure mode requires its own handling, and the handling is distributed across the system without a central record of the process state.

With a durable execution engine: step 4 fails. The engine records the failure in the journal. It executes the compensation (reverse the debit) as a journaled step. If the compensation is interrupted, it is replayed from the journal. The entire process, including the failure and the compensation, is visible in a single journal with a single correlation ID.
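
The SEPA scenario can be sketched as follows, assuming an in-memory list stands in for the persistent journal and amounts are plain integers in minor units. `Step`, `run_payment`, and the step functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Optional[Callable[[], None]] = None


def run_payment(correlation_id: str, steps: list, journal: list) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            journal.append((correlation_id, step.name, f"failed: {exc}"))
            for prior in reversed(completed):
                if prior.compensate is not None:
                    prior.compensate()
                    journal.append((correlation_id, prior.name, "compensated"))
            return "compensated"
        journal.append((correlation_id, step.name, "completed"))
        completed.append(step)
    return "completed"


balances = {"sender": 100}


def debit_sender() -> None:
    balances["sender"] -= 40


def refund_sender() -> None:
    balances["sender"] += 40


def submit_to_clearing() -> None:
    raise RuntimeError("rejected by clearing network")


journal: list = []
outcome = run_payment(
    "pay-001",
    [
        Step("validate", lambda: None),
        Step("aml_screening", lambda: None),
        Step("debit_sender", debit_sender, compensate=refund_sender),
        Step("submit_to_clearing", submit_to_clearing),
        Step("track_settlement", lambda: None),
    ],
    journal,
)
```

After this run the journal holds five records under `pay-001`: three completed steps, the clearing failure, and the compensation of `debit_sender`. The failure and its reversal sit in the same record as the steps that preceded them, which is the property the queue-based design lacks.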

The difference is operational: "we think we reversed it" versus "the journal proves we reversed it, and here is the exact sequence of events."

The Write Gateway Pattern

A consequence of this architecture: all state-changing operations route through the orchestration layer. Reads bypass it.

Reads (GET): Client / Admin UI → Finance Service. No orchestration overhead. Fast.

Writes (POST/PUT): Client / Admin UI → Orchestration Engine → Finance Service. Every mutation journaled. Durable. Auditable.

The Finance Service (Zig API) holds the ledger and domain data. If the orchestration engine is down, writes queue until it recovers, so data integrity is never compromised.

This pattern has a security benefit: the finance service is never exposed to the public internet, and the orchestration engine is the single controlled entry point for all mutations. Attack surface is minimized by architecture, not by firewall rules alone.
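
A rough sketch of the routing rule, assuming three hypothetical classes (`FinanceService`, `OrchestrationEngine`, `Gateway`) and a single `credit` mutation. It shows only the split between the read path and the write path, not networking or authentication.

```python
class FinanceService:
    """Holds the data. In this sketch only the gateway and engine reference it."""

    def __init__(self) -> None:
        self._balances = {"acct-1": 100}

    def get_balance(self, account: str) -> int:
        return self._balances[account]

    def credit(self, account: str, amount: int) -> None:
        self._balances[account] += amount


class OrchestrationEngine:
    """Single entry point for mutations: journal first, then apply."""

    def __init__(self, finance: FinanceService) -> None:
        self._finance = finance
        self.journal: list = []

    def submit_credit(self, account: str, amount: int) -> None:
        self.journal.append({"op": "credit", "account": account, "amount": amount, "status": "accepted"})
        self._finance.credit(account, amount)
        self.journal.append({"op": "credit", "account": account, "amount": amount, "status": "applied"})


class Gateway:
    def __init__(self, finance: FinanceService, engine: OrchestrationEngine) -> None:
        self._finance = finance
        self._engine = engine

    def handle(self, method: str, account: str, amount: int = 0) -> int:
        if method == "GET":
            # Read path: straight to the finance service, nothing journaled.
            return self._finance.get_balance(account)
        if method in ("POST", "PUT"):
            # Write path: always through the orchestration engine.
            self._engine.submit_credit(account, amount)
            return self._finance.get_balance(account)
        raise ValueError(f"unsupported method: {method}")
```

A `GET` leaves the engine's journal untouched, while a `POST` adds an accepted and an applied record around the mutation.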

Where You Differentiate

The ledger and orchestration layers are infrastructure. They are necessary, but they are not what makes your product valuable to customers. Customers choose you for:

  • The fee structure that fits their business model (percentage-based, tiered, subscription, or hybrid).
  • The compliance flow that satisfies their regulator (jurisdiction-specific KYC depth, CDD policies, AML screening providers).
  • The integration with their existing systems (ERP reconciliation, partner bank connectivity, payment scheme support).
  • The onboarding experience that converts prospects into active accounts.

All of this lives in the domain layer. It changes frequently. It is configured, not hard-coded. A new market entry requires new jurisdiction profiles and compliance policies, not a new ledger or a new workflow engine.
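
To make "configured, not hard-coded" concrete, here is a sketch of a tiered fee structure expressed as versioned data. The config shape (`version`, `model`, `tiers`, `up_to`, `rate`), the rates, and the marginal-tier interpretation are all assumptions chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fee configuration. Deploying a new structure means shipping a
# new version of this data, not changing fee_for().
FEE_CONFIG_V2 = {
    "version": "2",
    "model": "tiered",
    "tiers": [
        {"up_to": "1000.00", "rate": "0.010"},
        {"up_to": "10000.00", "rate": "0.005"},
        {"up_to": None, "rate": "0.002"},  # no upper bound on the last tier
    ],
}


def fee_for(amount: Decimal, config: dict) -> Decimal:
    """Marginal tiered fee: each slice of the amount is charged at its tier's rate."""
    fee = Decimal("0")
    lower = Decimal("0")
    for tier in config["tiers"]:
        upper = Decimal(tier["up_to"]) if tier["up_to"] is not None else amount
        taxable = max(Decimal("0"), min(amount, upper) - lower)
        fee += taxable * Decimal(tier["rate"])
        lower = upper
        if amount <= upper:
            break
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With this configuration, a 2,500.00 payment costs 1,000 × 1% plus 1,500 × 0.5%, which is 17.50. Entering a market with different pricing would mean adding a new config version; the calculation code stays the same.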

The architecture enables this: the infrastructure layers provide correctness guarantees and operational resilience. The domain layer provides product flexibility. The two evolve independently.

Regulatory Alignment by Design

The three-layer architecture maps directly to specific regulatory requirements. Compliance is a property of the design, not a layer added afterward.

| Regulation | Article | Requirement | Architectural Response |
|---|---|---|---|
| DORA | Art. 11-12 | ICT traceability, incident management, recovery plans | Orchestration journal: every mutation journaled with correlation ID. Recovery = journal replay. |
| DORA | Art. 15 | Third-party risk assessment, auditable supply chain | Minimal dependency chain in financial core (~30 transitive). Dependency inventory publishable. |
| PSD2 | Art. 87 | Value date and availability of funds | Ledger engine tracks value dates natively. Settlement lifecycle (pending → available) enforced at engine level. |
| AMLD5/6 | Art. 13, 16 | Customer due diligence, audit trail for compliance decisions | Domain layer: CDD policy engine with versioned records. Ledger: immutable, hash-chained audit events. |
| HGB | §239 | Corrections as new entries, not modifications | Ledger: append-only. Corrections are Stornobuchungen referencing the original entry via foreign key. |

Each regulatory requirement is satisfied by the layer designed for it. The ledger handles accounting integrity (PSD2, HGB). The orchestration engine handles traceability and recovery (DORA Art. 11-12). The domain layer handles compliance logic (AMLD). No single layer carries the full regulatory burden.
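
The "hash-chained audit events" in the table can be illustrated with a short sketch, assuming SHA-256 over the previous hash plus a canonical JSON encoding of the payload. The event layout and the function names `append_event` and `verify_chain` are hypothetical, not a description of any particular ledger's format.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for an empty chain


def _digest(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True)  # canonical encoding of the payload
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> dict:
    """Append an audit event whose hash commits to the previous event."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {"prev_hash": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    chain.append(event)
    return event


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered event breaks the chain."""
    prev_hash = GENESIS_HASH
    for event in chain:
        if event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != _digest(prev_hash, event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

Because each hash covers the one before it, changing the payload of an early event invalidates that event and every later one, which is what lets an auditor detect after-the-fact modification.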

Three Questions to Evaluate Any Financial Platform

If you are evaluating a core banking platform, whether you're building one, buying one, or modernizing an existing one, three questions reveal whether the architecture is sound:

1. Can a bug in your product logic corrupt your ledger?

If the ledger and the domain layer share a database, or if the application code is responsible for enforcing double-entry invariants, the answer is yes. A single unguarded UPDATE, a missing WHERE clause in a refund handler, or a race condition in a fee calculator can each silently corrupt the financial record. If the layers are isolated and the ledger enforces its own invariants, the answer is no.

2. If your orchestration engine restarts mid-payment, what happens to the payment?

If the answer is "it depends on the consumer's retry logic" or "we'd need to check the dead-letter queue," you don't have durable execution. If the answer is "the engine replays the journal and resumes at step 4," you do.

3. Can you switch your KYC provider without changing your onboarding code?

If the provider's API is called directly from your onboarding flow, provider and flow are coupled. A provider change is a code change, a test cycle, and a deployment. If the onboarding flow calls a provider interface and the configuration determines which implementation fulfills it, switching is a config change.

These are not trick questions. They have binary answers. And they predict the operational cost of the system over the next five years more accurately than any feature comparison matrix.




Sources:

  • ThoughtWorks, "Kill your core: The banking revolution you didn't see coming"
  • ThoughtWorks, "Cloud-native composable banking" (AWS whitepaper)
  • DORA, Regulation (EU) 2022/2554, Art. 11-12 (ICT traceability and recovery)
  • Industry adoption of durable execution for payment-critical paths: Stripe (Temporal), Wise (Saga → Temporal migration), N26 (Kafka + Sagas), Revolut (Temporal)