How to Build an Exchange: Sub Millisecond Response Times and 24/7 Uptimes in the Cloud

Frank Yu, Director of Engineering at Coinbase specializing in exchange platforms, recently detailed the architectural and operational philosophy behind building and running a high-performance, fault-tolerant financial exchange in the cloud. Speaking at InfoQ, Yu underscored the pressures and unique requirements of exchange infrastructure, where the margin for error is virtually nonexistent and the potential financial loss from a failure far outweighs transaction revenue. Drawing on years of experience building such systems from inception and then scaling them to high volumes, he presented Coinbase’s current thinking on achieving sub-millisecond response times and continuous availability in a demanding global market.
The Foundational Role of Financial Exchanges
At its core, an exchange serves as critical financial infrastructure, a marketplace where participants can submit orders to buy or sell assets, discover real-time prices, and execute trades. This seemingly simple function masks a profound complexity, as exchanges act as trusted third parties, maintaining the integrity of markets and ensuring fair, orderly, and transparent price formation. For countless individuals and institutions engaged in finance, exchanges are the indispensable nexus for valuing assets and facilitating transactions.
The stakes are extremely high. Yu noted that potential losses from a system failure could be "multiple orders of magnitude more than any revenue we might get from any transaction." Market participants effectively outsource their reliability concerns to the exchange and assume it will keep running, so the exchange must make reliability an obsessive focus. It is not merely maintaining its own internal state; it is safeguarding the financial positions of all its clients, often across vast, interconnected global markets.
Imperatives: Correctness, Fairness, Availability, Regulatory Compliance, and Predictability
Building an exchange that withstands the rigors of modern finance necessitates adherence to several non-negotiable principles:
- Correctness: This is paramount. An exchange must accurately record every order, every trade, and every account balance without error. Data integrity is the bedrock upon which trust is built.
- Fairness: All participants must compete on equal terms. Any systemic bias that advantages one group over another will drive away the disadvantaged traders, eroding market liquidity and efficiency. The goal is a level playing field in which order execution is determined by price-time priority: the best price first and, at the same price, the earliest order.
- Availability: Digital markets, and cryptocurrency markets in particular, operate 24/7. Being unable to trade during a volatile swing (for instance, trying to sell a falling asset to manage risk) can lead directly to financial ruin for users. Unplanned downtime is not merely an inconvenience; it is a catastrophe.
- Regulatory Compliance: Exchanges are under intense regulatory scrutiny. Authorities frequently demand the precise historical state of the market, down to the microsecond, for periods spanning years. Exchanges therefore need robust data persistence and the ability to reconstruct tick-by-tick market activity for audits and forensic analysis, which often means storing vast quantities of immutable, precisely timestamped data.
- Consistency and Predictability: Market participants, especially algorithmic traders, build sophisticated models that account for an exchange's latency and its variance. High tail latencies (the P99 and P99.9 percentiles) cause unexpected delays that can translate into significant financial losses, so an exchange must strive for a flat latency profile across all percentiles.
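To see why a flat profile matters, here is a minimal Python sketch that computes nearest-rank percentiles for two hypothetical workloads with the same median. The latency samples are invented for illustration and are not Coinbase measurements; only the tail percentiles reveal that one workload stalls on 2% of requests.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Fraction(str(p)) keeps values like 99.9 exact, so the rank is not skewed by float rounding.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_profile(samples):
    return {f"p{p}": percentile(samples, p) for p in (50, 99, 99.9)}


# Invented latencies in microseconds, 1,000 requests each.
flat = [40] * 980 + [55] * 20
spiky = [40] * 980 + [5_000] * 20  # 2% of requests hit a 5 ms stall

print(latency_profile(flat))   # {'p50': 40, 'p99': 55, 'p99.9': 55}
print(latency_profile(spiky))  # {'p50': 40, 'p99': 5000, 'p99.9': 5000}
```

A dashboard that reports only the median would show these two systems as identical.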
These demands necessitate an architectural approach that is both robust and highly optimized. Yu’s presentation highlighted that traditional financial institutions often update their core exchange technology on quarterly or even slower cycles, contrasting sharply with Coinbase’s ability to deploy updates weekly or even more frequently. This agility is a significant competitive advantage in rapidly evolving markets like cryptocurrency.
Architectural Philosophy: Embracing Determinism and Simplicity
Coinbase’s strategy for building a resilient and high-performance exchange centers on an engineering ideal: simplicity, particularly through determinism. Unlike many distributed systems that rely on complex parallelism to scale, Coinbase’s core matching engine adopts a single-threaded approach.
The rationale is straightforward: a system is deterministic if the same sequence of inputs always produces the same outputs. This property is invaluable for debugging, auditing, and ensuring correctness. By processing all orders sequentially on a single thread, the system avoids the complexity and non-determinism inherent in concurrent execution, which would make reconstructing the market's state from years past a "nightmare."
With a single-threaded model, throughput is bounded by the speed of this "hot path": the faster the single thread runs, the more orders per second the system can process. Concentrating all optimization effort on one critical component also streamlines development and performance tuning.
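As a minimal sketch of why a single-threaded design is deterministic, the Python toy below processes orders strictly one at a time with price-time priority. The names `Order`, `MatchingEngine`, and `run` are hypothetical, and the order book is deliberately simplified; it is not Coinbase's engine. Running the same input log twice produces identical trades.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    side: str   # "buy" or "sell"
    price: int  # integer ticks, so no floating-point rounding
    qty: int


class MatchingEngine:
    """Toy single-threaded limit order book with price-time priority (hypothetical)."""

    def __init__(self):
        # price -> FIFO queue of [order_id, remaining_qty]; FIFO gives time priority
        self.bids = {}
        self.asks = {}

    def submit(self, order):
        """Process one order to completion and return the trades it produced."""
        if order.side not in ("buy", "sell"):
            raise ValueError(f"unknown side {order.side!r}")
        is_buy = order.side == "buy"
        own, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        remaining, trades = order.qty, []
        while remaining > 0 and opposite:
            # Best opposite price (price priority). A linear scan keeps the sketch short;
            # a real engine would keep price levels in a sorted index.
            best = min(opposite) if is_buy else max(opposite)
            crosses = best <= order.price if is_buy else best >= order.price
            if not crosses:
                break
            level = opposite[best]
            maker = level[0]  # oldest resting order at this price (time priority)
            fill = min(remaining, maker[1])
            trades.append((order.order_id, maker[0], best, fill))
            remaining -= fill
            maker[1] -= fill
            if maker[1] == 0:
                level.popleft()
                if not level:
                    del opposite[best]
        if remaining > 0:
            own.setdefault(order.price, deque()).append([order.order_id, remaining])
        return trades


def run(log):
    """Replay an input log on a fresh engine: same log in, same trades out."""
    engine = MatchingEngine()
    return [trade for order in log for trade in engine.submit(order)]


log = [Order(1, "sell", 101, 5), Order(2, "sell", 100, 3), Order(3, "buy", 101, 6)]
print(run(log))  # [(3, 2, 100, 3), (3, 1, 101, 3)]
```

Because nothing else touches the book while `submit` runs, there are no interleavings to reason about, and the trade list is a pure function of the input log.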
Achieving High Availability and Durability with Consensus
While a single-threaded core ensures determinism, it introduces a potential vulnerability: what happens if that single machine fails? Coinbase addresses this with a sophisticated, yet streamlined, approach to durability and availability, primarily leveraging Raft consensus.
Instead of placing a traditional database in the hot path, which would introduce blocking operations, network overhead, and potential jitter, Coinbase runs its matching engine as a Raft cluster. Raft is a consensus algorithm that keeps every node in agreement on the same ordered log of operations; each node applies that log to its own copy of the state machine, so the replicas stay identical.
Coinbase prefers a five-node Raft cluster over a three-node one. A three-node cluster tolerates only one failure: once a single machine is down there is no margin left, and a second failure loses quorum and with it the ability to replicate. A five-node cluster can lose two machines and still maintain quorum, which significantly improves resilience. The choice reflects the operational reality of cloud environments, where hardware failures, though individually rare, are inevitable.
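The arithmetic behind this choice is Raft's majority rule: a cluster of n nodes needs floor(n/2) + 1 votes, so it survives n - (floor(n/2) + 1) failures. A short Python check (an illustration, not Coinbase code) shows that a fourth node buys no extra tolerance, while a fifth buys tolerance for a second failure.

```python
def quorum_size(nodes: int) -> int:
    """Votes needed for a Raft majority: floor(n / 2) + 1."""
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """Nodes that can be lost while a majority can still be formed."""
    return nodes - quorum_size(nodes)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerable_failures(n)} failure(s)")
# 3 nodes: quorum=2, tolerates 1 failure(s)
# 4 nodes: quorum=3, tolerates 1 failure(s)
# 5 nodes: quorum=3, tolerates 2 failure(s)
```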
When a request comes in, the system ensures that a quorum (e.g., three out of five nodes) has received and logged the request before processing it. This "replicate then process" model ensures that even if the primary (leader) node fails, the data is not lost, and a new leader can be elected to continue processing from the last agreed-upon state. While a client might experience a brief outage (e.g., a 500 error) during a leader election, no data is lost, upholding the critical persistence guarantee.
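The "replicate then process" ordering can be sketched in Python as follows. This is a deliberately simplified model under stated assumptions: followers are in-process objects, the class names `Leader` and `Follower` are hypothetical, and Raft's terms, elections, retries, and log repair are omitted. The point is only the ordering: the business logic runs after a majority has recorded the request, never before.

```python
class Follower:
    """In-process stand-in for a Raft follower (hypothetical)."""

    def __init__(self, up=True):
        self.up = up
        self.log = []

    def append(self, entry):
        """Record the entry and acknowledge; a node that is down never acknowledges."""
        if not self.up:
            return False
        self.log.append(entry)
        return True


class Leader:
    """Replicate-then-process ordering only; terms, elections, retries and log repair omitted."""

    def __init__(self, followers, apply):
        self.followers = followers
        self.apply = apply        # deterministic state-machine transition
        self.log = []
        self.commit_index = 0     # entries known to be stored on a majority

    def handle(self, request):
        self.log.append(request)  # the leader's own append counts as one acknowledgment
        acks = 1 + sum(f.append(request) for f in self.followers)
        cluster_size = 1 + len(self.followers)
        if acks < cluster_size // 2 + 1:
            # Not committed, so it must not be processed. (Real Raft would later
            # reconcile the uncommitted copies some followers may hold.)
            self.log.pop()
            raise RuntimeError("no quorum: request rejected, client may retry")
        self.commit_index = len(self.log)
        return self.apply(request)  # business logic runs only after replication


applied = []


def apply(request):
    applied.append(request)
    return f"accepted {request}"


# Five-node cluster with two nodes down: 3 of 5 acknowledgments is still a quorum.
leader = Leader([Follower(), Follower(), Follower(up=False), Follower(up=False)], apply)
print(leader.handle("buy 1 BTC"))  # accepted buy 1 BTC
```

With a third node down, `handle` raises instead of applying the request, which corresponds to the brief client-visible error described above rather than a silent loss of data.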
This architecture enables seamless rolling deployments. Code updates are pushed by gracefully shutting down and restarting individual nodes one at a time, and leadership can be transferred between nodes, allowing blue-green deployments with effectively zero downtime. This continuous deployment capability is particularly vital for crypto markets, which never close and so cannot fall back on the disruptive maintenance windows of traditional exchanges. Markets stay liquid and accessible around the clock, avoiding the discontinuities a trading halt creates and letting users manage their risk at any time.
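One way to picture such a rolling deployment is as an ordered plan that never takes more than one node down at a time and restarts the current leader last, after handing leadership to an already-upgraded follower. The Python sketch below is hypothetical: the function name and step names are illustrative rather than a Coinbase or Raft-library API, and the exact sequence Coinbase uses is an assumption.

```python
def rolling_deploy_plan(nodes, leader):
    """Return restart steps for a hypothetical rolling upgrade of a Raft cluster."""
    quorum = len(nodes) // 2 + 1
    if len(nodes) - 1 < quorum:
        raise ValueError("cluster too small to restart a node without losing quorum")
    if leader not in nodes:
        raise ValueError("leader must be a cluster member")
    followers = [n for n in nodes if n != leader]
    # Upgrade followers one at a time; the remaining nodes keep a quorum throughout.
    # A real rollout would wait for each node to rejoin and catch up before the next step.
    plan = [("restart", node) for node in followers]
    # Hand leadership to an upgraded follower so the old leader restarts as a follower.
    plan.append(("transfer_leadership", leader, followers[0]))
    plan.append(("restart", leader))
    return plan


for step in rolling_deploy_plan(["n1", "n2", "n3", "n4", "n5"], leader="n1"):
    print(step)
```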
System Layout: The Coinbase International Exchange Example
The Coinbase International Exchange, for instance, operates its matching engine in a single cloud region (e.g., Tokyo), with the Raft cluster forming the core. API gateways, designed to be stateless with respect to business logic (though handling concerns like rate limiting), funnel client requests into this core.
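Gateway rate limiting is often implemented with a token bucket. The talk does not say which algorithm Coinbase uses, so treat the following Python sketch as one plausible option; the class name and the rate and burst values are hypothetical. Time is passed in explicitly so the behavior is deterministic and easy to test.

```python
class TokenBucket:
    """Per-client token bucket (hypothetical gateway rate limiter)."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens, i.e. the largest allowed burst
        self.tokens = float(burst)
        self.last = None              # time of the previous request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` may be forwarded to the core."""
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_sec=10, burst=3)
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.0, 0.1)])  # [True, True, True, False, True]
```

Rejected requests never reach the matching engine, so a misbehaving client costs the gateway a few instructions rather than consuming cycles on the single hot thread.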
The system relies on asynchronous, message-passing communication. Input messages (e.g., an order to buy Bitcoin) are small, while the resulting output messages (e.g., order confirmations, multiple trade events to various participants) can be voluminous. This disparity poses a challenge for cross-region data replication, as egress costs and network latency for large output streams can be prohibitive.
Coinbase exploits determinism to avoid this cost: instead of replicating the output events downstream, it replicates the much smaller input request log. Downstream systems (e.g., for analytics, risk management, or client query services) then deterministically re-run the matching logic against the replicated input log. This "replay" mechanism removes the need to ship large event streams across the network, drastically reducing egress costs. It is akin to replicating a SQL statement and re-executing it locally, rather than shipping the thousands of write-ahead log entries it might generate.
Furthermore, copies of the matching engine logic can run alongside the gateways or in other regions, acting as strongly consistent replicas that stream updates from the primary Raft cluster. These replicas can answer low-latency queries directly from memory, often within microseconds, without burdening the leader or incurring a round trip to it. The same approach extends to edge deployments for less latency-sensitive analytics and back-office functions.
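A minimal Python sketch of input-log replication follows. The state machine is a hypothetical balance ledger standing in for the matching engine, and the names `process` and `replay` are illustrative. What matters is that `process` is deterministic, so a downstream replica holding only the input log reconstructs exactly the same state and the same, larger, output stream.

```python
import json


def process(state, request):
    """Deterministic transition: same state and request always yield the same events."""
    src, dst, amount = request["from"], request["to"], request["amount"]
    if state.get(src, 0) < amount:
        return [{"type": "rejected", "request": request, "reason": "insufficient funds"}]
    state[src] = state.get(src, 0) - amount
    state[dst] = state.get(dst, 0) + amount
    return [
        {"type": "accepted", "request": request},
        {"type": "balance", "account": src, "balance": state[src]},
        {"type": "balance", "account": dst, "balance": state[dst]},
    ]


def replay(genesis, input_log):
    """Rebuild state and the full output stream from the input log alone."""
    state = dict(genesis)
    outputs = [event for request in input_log for event in process(state, request)]
    return state, outputs


genesis = {"alice": 100, "bob": 0}
input_log = [
    {"from": "alice", "to": "bob", "amount": 30},
    {"from": "bob", "to": "carol", "amount": 50},  # rejected: bob only has 30
    {"from": "bob", "to": "carol", "amount": 10},
]

primary_state, primary_outputs = replay(genesis, input_log)  # the primary
replica_state, replica_outputs = replay(genesis, input_log)  # a downstream replica
assert (replica_state, replica_outputs) == (primary_state, primary_outputs)
print(len(json.dumps(input_log)), "bytes shipped vs", len(json.dumps(primary_outputs)), "bytes of events")
```

Only `input_log` has to cross the network; each replica regenerates the event stream locally.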
Disaster recovery (DR) is "by construction" in this architecture. Should an entire region fail, the core logic can be promoted in a pre-provisioned secondary region by cutting the replication link and establishing a new leader. While an automated DR switchover is technically feasible, the human element of verification and approval often means that the operational decision is the slowest part of the process.
Optimizing the Hot Path: Engineering for Microseconds
Achieving sub-millisecond response times requires relentless optimization of the hot path, focusing on eliminating any source of blocking, latency, or unpredictable behavior.
- Proximity of Core Components: Counterintuitively, Coinbase advocates placing all nodes of the core Raft cluster within the same Availability Zone (AZ). Multi-AZ deployments are the usual recipe for high availability, but spreading the Raft cluster across AZs adds inter-AZ network latency (often 3-4 milliseconds RTT). Because every message needs a quorum, every transaction would pay that latency; in Yu’s words, it would be "an outage every message." The strategy instead prioritizes ultra-low latency within the core cluster and relies on regional DR for broader outages. Performance-sensitive clients are expected to co-locate their systems nearby, and DR exercises are conducted jointly with them.
- Efficient Data Ingress and Egress: Arbitrarily nested, variable-length data structures are avoided in favor of simple binary encoding with fixed-length fields for internal messages. The CPU can then read any field at a known offset (e.g., "6 bytes in is the message type") without complex (re)marshaling. For the same reason, 64-bit Snowflake IDs (globally unique and time-sortable) are preferred over 128-bit UUIDs.
- JVM Optimization: Despite Java’s reputation for "slowness" due to garbage collection (GC), the JVM can be highly performant. The key is to eliminate memory allocation in the hot path. By pre-allocating every data structure it needs and using off-heap memory (e.g., Simple Binary Encoding (SBE) structs in off-heap maps), the system never triggers GC from the hot path and so avoids the pauses that can occur even with concurrent collectors. The result is consistent, low-latency execution.
- Operating System and Hardware Control: To prevent the operating system from unpredictably pausing the critical single thread, it is "pinned" to a dedicated CPU core. All interrupts are moved off this core, allowing the thread to busy-spin in a tight loop. If there’s new work, it’s processed immediately; otherwise, it continues spinning, never relinquishing its compute power. This guarantees maximum responsiveness.
- Algorithmic Efficiency: Unbounded loops, which can lead to pathological tail latencies, are strictly avoided. Instead, patterns common in database systems are adopted: indexed access (HashMaps) rather than full scans, and pagination for any large operations to break them into smaller, manageable chunks. This prevents any single operation from monopolizing the thread.
- Client Management: To protect the core logic, the gateways enforce aggressive rate limiting. Users are discouraged from sending economically irrelevant transactions, reducing the "junk" traffic that consumes valuable processing cycles. As Yu succinctly put it, "The best optimization for a transaction is one that doesn’t need to happen."
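The fixed-offset encoding and Snowflake IDs can be sketched with Python's standard `struct` module. The message layout below is hypothetical, chosen so the message type sits 6 bytes in as in Yu's example; it is not SBE's actual header or Coinbase's schema. The Snowflake generator follows the widely used Twitter layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) with an arbitrary custom epoch. Writing into one preallocated `bytearray` echoes the allocation-free idea, although Python itself still allocates objects, so this illustrates the data layout rather than the JVM technique.

```python
import struct

# Hypothetical fixed-width layout, little-endian, no padding (32 bytes total):
#   offset 0   uint16  block length
#   offset 2   uint16  schema id
#   offset 4   uint16  schema version
#   offset 6   uint16  message type
#   offset 8   int64   order id (Snowflake)
#   offset 16  int64   price in integer ticks
#   offset 24  int64   quantity
NEW_ORDER = struct.Struct("<HHHHqqq")
MSG_TYPE_OFFSET = 6
ORDER_ID_OFFSET = 8
NEW_ORDER_TYPE = 1
EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch for the Snowflake timestamp


class SnowflakeGenerator:
    """64-bit IDs: 41-bit ms since EPOCH_MS | 10-bit worker id | 12-bit sequence."""

    def __init__(self, worker_id):
        if not 0 <= worker_id < 1 << 10:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self, now_ms):
        if now_ms < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:
                # Production generators usually wait for the next millisecond instead.
                raise RuntimeError("4096 IDs already issued this millisecond")
        else:
            self.sequence = 0
        self.last_ms = now_ms
        return ((now_ms - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


def encode_new_order(buf, order_id, price, qty):
    """Write a NewOrder message into a caller-owned, reusable buffer."""
    NEW_ORDER.pack_into(buf, 0, NEW_ORDER.size, 1, 0, NEW_ORDER_TYPE, order_id, price, qty)


def message_type(buf):
    """Read a single field at its fixed offset without decoding the rest."""
    return struct.unpack_from("<H", buf, MSG_TYPE_OFFSET)[0]


ids = SnowflakeGenerator(worker_id=7)
buf = bytearray(NEW_ORDER.size)  # allocated once, reused for every message
order_id = ids.next_id(now_ms=EPOCH_MS + 5)
encode_new_order(buf, order_id, price=10_150, qty=3)
print(message_type(buf), struct.unpack_from("<q", buf, ORDER_ID_OFFSET)[0] == order_id)  # 1 True
```

Because the timestamp occupies the high bits, sorting Snowflake IDs numerically also sorts them by creation time, and each ID fits in a single 64-bit field.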
Production Outcomes and Strategic Advantages
The rigorous application of these principles has paid off for Coinbase. The system handles hundreds of thousands of transactions per second while keeping P99 response times for customers under a millisecond. Deploying code changes with minimal downtime, far more often than traditional waterfall release cycles allow, gives the exchange considerable agility in a dynamic market.
One of the most powerful benefits of this deterministic, log-based architecture is the ability to reproduce functional issues exactly. Because the entire exchange state fits in a single machine’s memory, engineers can download production request logs, replay them on a laptop, and step through the production logic in a debugger. This is invaluable for quickly diagnosing strange behavior or pathological performance issues, dramatically reducing time to resolution.
Moreover, the deterministic request log can be streamed to other environments, allowing for "pre-production testing of rolling deployments" and "configuration changes" using live production streaming load. This "superpower" enables complex topology changes and experiments to be run with real-world data and scale, a crucial advantage in the evolving landscape of cloud infrastructure.
Broader Implications and the Engineering Ideal
Coinbase’s approach to building its exchange in the cloud demonstrates the increasing maturity of cloud platforms for even the most demanding financial workloads. It challenges traditional notions of high-performance computing, showing that with meticulous engineering, cloud environments can match or even exceed on-premise solutions, particularly in agility and resilience.
The core message remains one of simplicity. By distilling the complex requirements of an exchange into a simple, deterministic, single-threaded core, and then applying rigorous optimization and robust consensus mechanisms, Coinbase achieves superior stability, performance, and deployability. This philosophy allows for the removal of unnecessary complexity from the architecture, often revealing "easy 10X opportunities" for improvement. In an industry defined by speed, precision, and trust, simplicity, paradoxically, becomes the most sophisticated engineering strategy.