
Origin Story: From NYSE Pillar to a Streaming Operating System

In the first episode of our podcast series, George Levin sits down with Alexei Lebedev to trace the path from high-frequency trading to NYSE Pillar to the streaming operating system we are building at AlgoX2.

Introductions

George: Hi everyone, I'm George Levin, co-founder and Chief Business Officer at AlgoX2, a data streaming operating system. In this podcast series, I'm talking with our CEO and the creator of AlgoX2, Alexei Lebedev. My goal is to understand the technology we're building and the thinking behind it. Hi, Alexei.

Alexei: How you doing?

George: I'm good. What's up in New Jersey?

Alexei: We're digging ourselves out of all the snow that has been bestowed upon us.

George: I feel you — same here in New York. We're just a few miles away from each other. I'm in our New York office, and Alexei is at home in New Jersey, looking at squirrels. Let's start with the story before AlgoX2. I know that you and Vladimir Parizhsky — our CTO and third co-founder — built the trading technology behind the New York Stock Exchange. Tell us that story.

Before the Exchange: Ten Years of High-Frequency Trading

Alexei: The story actually goes further back than that. For about 10 years before I decided to write a financial exchange platform, I was doing high-frequency trading. Those were the innocent years — before HFT even had a name. You'd basically just connect your computer and try to get things flowing faster.

That's where I first experienced the mess and chaos of distributed computing. If you have servers in several data centers, the state information lives on those servers, and you want it in one place. How many shares of a stock do you own? You may have bought it in a data center in Carteret and sold it in a data center in Mahwah, but you're not aware of it yet — the information arrives at different servers at different times.

You end up seeing events out of order. It's almost like relativity, where two observers see simultaneous events occurring in different sequences. That was traumatic for someone trying to build a coherent system. You're forced to use many small components — they need to be colocated with exchanges, far apart from each other, to be fast — but you want a unified whole. There's constant tension between the parts and the single narrative around them.

Why Build an Exchange?

Alexei: An exchange is fundamentally different from HFT. The thing that's unique about it is that it has to be very good at not losing information.

One of the original reasons we even decided to build an exchange was that there was an outage on one of the exchanges, and when we reconnected we couldn't get the list of trades that had actually happened. Later we received an Excel spreadsheet from the company containing that list of trades. That got us thinking: how is it possible that they have those trades but can't give them to us through the FIX gateway?

We realized they must have an architecture where data becomes inaccessible when certain failures happen. And we said: if this is running in production somewhere in the US, then we have to fix this. We have to write an exchange.

Journals: Architecture for Not Losing Data

George: That's a bold idea.

Alexei: The New York Stock Exchange runs on an architecture based on sequenced journals. People don't really publish books on how to build exchange architectures, but in the file systems world — where data loss has long been a dragon that everybody tries to slay — it's been known for decades that if you don't want to lose something, you use a journal. NTFS, which Microsoft introduced with Windows NT, is journal-based. ZFS, arguably the best file system today, uses an intent log that serves the same purpose. The transaction engine we wrote for the New York Stock Exchange is also based on journals.

Correctness at the Speed of the Exchange

George: It sounds crazy that the New York Stock Exchange — a very established organization — is buying new technology. How did you convince them, and how did it feel?

Alexei: If you're not honest with yourself and you introduce a bug hoping it'll never trigger, and it does — that feeling is not very good. You really have to get to a level of confidence where you can go to sleep knowing you're not going to flip a bit from zero to one in the wrong place. Depending on where that bit is, it could add a penny — or a billion dollars — to an account.

We're engineers. We understand our limitations, and no one is infallible. So we used a formal method. We know how to write fast code, but the more you try to make it fast, the more error-prone it becomes — by definition you're taking shortcuts. How do you reconcile that tension between wanting to write high-performance code and having to be correct?

You have to be the fastest exchange, because only the fastest one gets the orders. The unambiguous answer in our minds was: we have to build a code generator that emits code that is correct by construction. We did that. We open-sourced it. When we sold our technology to NYSE, we left a copy in open source.

Later, when I was leaving the New York Stock Exchange, their CTO Mayur Kapani — a very nice guy — said: let's open-source the improvements NYSE made to this code generator, so the two lines don't diverge and can be easily supported. And we did that.

From OpenACR to AlgoX2

Alexei: Today's startup — our X2 platform — is being built with the help of the same code generator. It's an open-source system that takes relational tables and translates them into C++ that doesn't have errors. I checked this morning: it generates about 1.6 million lines of code in the X2 codebase. In a couple of years, it will be several million. The further you go, the more dividends using a formal method like that pays.

George: There will definitely be another episode just about OpenACR — I'm fascinated and want to understand better how it works. Let's get back to the origin story. You had one of the most demanding data challenges running on your architecture. What inspired you to leave and start AlgoX2?

Alexei: There comes a time in a software architect's life when you look at all the code you've written, see how much of it is getting into production, and realize that your job is done. Your job is done when you're not adding to the production codebase anymore — because you've really done it. Once you spend a year or a year and a half in that mode, you realize: either I sit there and watch my 401k grow, or I go and do something more useful.

George: We know a lot of people who prioritize the 401k — no judgment. Let's talk about AlgoX2. You often describe it as a data streaming operating system, as message-oriented middleware with streaming platform APIs, as a transactional platform, as an append-only file system. So — what is it, big picture?

What is AlgoX2?

Alexei: It's a data streaming platform. It takes connections from the outside — producers and consumers. Producers send messages that are saved to stream files inside. Consumers subscribe to messages by indicating which range they want to read, and the system serves the messages out. That's simple to explain.

Where does the complexity come in? First, you have to do this at very high message rates. If producers are sending several gigabytes per second and consumers can't keep up and are reading older messages, your system should not backpressure the producers. It should be the consumer's problem to keep up.

Similarly, consumers can be very fast. One important concept is fan-out: how many times a message leaves the system. You could publish a message once and have a fan-out of ten — ten people reading it. If the system can't properly load-balance consumers and deliver that message to all ten of them, all of them fall behind, even though they're actually really fast. There are a lot of data-flow challenges packed into the simple formulation: publish messages, decouple producers and consumers, let them independently come and go.
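To make the producer/consumer decoupling concrete, here is a toy Python sketch. It is illustrative only: `Stream` and `Consumer` are made-up names, not AlgoX2 APIs. Each consumer tracks its own read offset, so a slow reader lags without ever backpressuring the producer:

```python
class Stream:
    """Append-only log; producers never wait for consumers."""
    def __init__(self):
        self._log = []

    def publish(self, msg):
        self._log.append(msg)          # producer side: O(1), no backpressure
        return len(self._log) - 1      # offset where the message was stored

    def read(self, start, count):
        return self._log[start:start + count]  # consumer picks its own range

class Consumer:
    """Each consumer holds its own offset, independent of all others."""
    def __init__(self, stream):
        self.stream, self.offset = stream, 0

    def poll(self, count=10):
        msgs = self.stream.read(self.offset, count)
        self.offset += len(msgs)       # a slow consumer simply lags behind;
        return msgs                    # the producer is never slowed down

s = Stream()
for i in range(5):
    s.publish(f"msg-{i}")

fast, slow = Consumer(s), Consumer(s)  # fan-out of 2: same log, two readers
print(fast.poll())                     # reads all five messages at once
print(slow.poll(2))                    # reads older messages at its own pace
```

A fan-out of ten would simply be ten such consumers holding ten independent offsets against the same log.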

Why "Operating System"?

Alexei: The reason we call it an operating system is that essentially all problems involving more than one computer can be separated into two layers.

  1. Layer one takes unordered events and produces some ordered sequence out of them.
  2. Layer two takes that ordered sequence and deterministically interprets it to produce a result.

This has been done very successfully — in exchanges, and in "run-to-completion" databases, which are very stable and have nice properties. First they collect all the inputs and order them. They form a journal out of the inputs. That journal determines how the transactions will run, and the transactions are executed to completion.

Once you form that journal, you can spread it across any number of nodes and have them execute in parallel without ever communicating. They've been externally synchronized. You can predict what conflicts or rollbacks will occur just by looking at the journal.
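The two layers can be sketched in a few lines of Python. This is a simplification under assumed event shapes, not NYSE or AlgoX2 code: layer one imposes a total order to form the journal, and layer two is a pure function of that journal, so independent replicas agree without communicating:

```python
def sequence(unordered_events):
    """Layer 1: impose a single total order (here: by timestamp,
    ties broken by source id). The result is the journal."""
    return sorted(unordered_events, key=lambda e: (e["ts"], e["src"]))

def replay(journal):
    """Layer 2: deterministic interpretation, a pure function of the journal."""
    balances = {}
    for e in journal:
        balances[e["acct"]] = balances.get(e["acct"], 0) + e["delta"]
    return balances

# Events arriving out of order from two data centers (hypothetical data):
events = [
    {"ts": 2, "src": "mahwah",   "acct": "A", "delta": -100},
    {"ts": 1, "src": "carteret", "acct": "A", "delta": +100},
    {"ts": 1, "src": "mahwah",   "acct": "B", "delta": +50},
]

journal = sequence(events)
# Two "nodes" replaying the same journal never need to talk to each other:
node1, node2 = replay(journal), replay(journal)
assert node1 == node2 == {"A": 0, "B": 50}
```

The key property is that `replay` has no hidden inputs: give any node the same journal and it reaches the same state, which is exactly why the journal can be spread across nodes that never communicate.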

The blockchain is a famous example of a ledger — a journal. Every generation picks a new fancy word for it. Blockchain is famously decentralized, but at its core it's actually centralized, and that centralization is what allows the decentralized participants to do their thing. It's far more centralized than a financial system consisting of just two banks, because those banks operate independently: as participant B, you can never know participant A's balance, because A is executing its own transactions without notifying you.

But in Bitcoin, every single transaction has to go through the same place. So it is in fact centralized — and that's a good property. Events from all over the world are brought into a single place, put into a sequence, and then everything else happens. That's layer number one: sequence things and produce a deterministic order.

Here's another way to look at it: the number of ways to interleave n events is n factorial. Even 52! — the number of shuffles of a deck of cards — is about 8 × 10^67, far more than the number of atoms on Earth. For n = a billion (which is a very small number in streaming — a billion messages is nothing), the benefits of a single ordering far, far outweigh any drawbacks like bottlenecks.
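The arithmetic is easy to check with Python's standard library:

```python
import math

# 52! is the number of distinct orderings of a 52-card deck.
shuffles = math.factorial(52)
print(f"{shuffles:.2e}")   # ~8.07e+67

# Interleavings explode even for tiny event counts:
for n in (10, 20, 52):
    print(n, math.factorial(n))
```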

Layer Two: A Control Plane for Your Servers

Alexei: Layer two is where you start executing functions as a result of that ordered stream. A data streaming platform is a great control plane — you can send commands to a stream, have subscribers subscribe and execute the commands as they're read. That's essentially a way to control a group of computers. You think of the data streaming platform as a network overlay for a number of servers, turning them into one with a single tool.
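As a sketch of the control-plane idea (all names here are hypothetical, not AlgoX2's interface), servers that apply a shared command stream in order converge to identical configuration without ever coordinating with each other:

```python
command_stream = []                      # stands in for the shared stream

def publish_command(cmd):
    """Publish a (key, value) command once; every subscriber will see it."""
    command_stream.append(cmd)

class Server:
    def __init__(self, name):
        self.name, self.offset, self.config = name, 0, {}

    def catch_up(self):
        # Each server applies commands in stream order, independently.
        while self.offset < len(command_stream):
            key, value = command_stream[self.offset]
            self.config[key] = value
            self.offset += 1

fleet = [Server(f"node-{i}") for i in range(3)]
publish_command(("max_rate", 1_000_000))
publish_command(("log_level", "info"))

for server in fleet:
    server.catch_up()

# All servers converge on identical state without talking to each other.
assert all(s.config == fleet[0].config for s in fleet)
```

Because every server reads the same ordered stream, the fleet behaves like one machine: publish a command once, and the whole cluster ends up in the same state.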

Today, with so much AI going on, nothing fits one computer anymore. Everything is a cluster. Data flows are very fat, with lots of events. Even inference — when you're talking to a chatbot, you're talking to a whole rack of servers, because it doesn't fit a single server.

It is really fundamental to be able to turn that collection of servers — whether it's 72 or 300 or even two — into a single logical entity and to talk to it in a simple way. Believe it or not, we really don't have a lot of tools in this area. The streaming platforms of today are largely concerned with just getting messages into some log form, into some journal form, and serving them out eventually — we're talking latencies of thousands of milliseconds and more.

To be usable as a control plane, you want the system to be as interactive as a computer. It has to operate at the sub-microsecond level, it has to do millions of messages per second, and the latency tails have to be very predictably low.

Wrap-Up

George: Thanks a lot. That was our first episode — we shared a bigger picture, and in the next episode we'll talk about the details of our architecture: how it works, and what's special about it. Thanks, Alexei, and see you soon.

Alexei: My pleasure.

Want more context on the team and the technology? Read our company story, or get in touch with us on the contact page.