Engineering Principles: SESE, Single Static Action, and Code Generation
A long-form interview with Alexei Lebedev on the principles behind AlgoX2: single entry / single exit, single static action, single-threaded processes connected by streams, the OpenACR code generator, and what it is like to build serious systems with Claude.
Interviewer: My first question. You once described a concept called single entry, single exit. Tell me what it is, where it came from, and why you think it still matters in 2026. Maybe it doesn't matter anymore: a computer can probably write any program for you now. But you can also tell it what style to write in, so it follows your preferences.
Alexei: Let's set aside the idea of a computer generating code for a moment. It doesn't care what it writes, in what language, or in what shape. Let's go back to the old-fashioned idea that a human writes code. If a human writes and reads and tries to understand code, then understandability, readability, and a certain aesthetic of the code do matter.
We should consciously remove everything having to do with agentic development, with Claude, with code generation, and so on. We acknowledge where we are, but for the discussion to be coherent, we're not talking about that. We're considering the purely human approach to development. The way we write code as people, the way we read it and perceive it. So: what is single entry, single exit, and why did we focus on it?
It's an old idea that originally appeared with the arrival of structured programming, as opposed to a flat list of instructions with goto. It was an attempt to lay out instructions as a hierarchical tree that visually reflects the structure of the task.
Interviewer: So we picture code as a hierarchical, tree-like, graph-like, directed structure.
Alexei: Yes, but it's actually simpler than that. A program is just some number of nested loops. Even an if can be seen as a loop from zero to one. A loop over an empty set or a single element is an if. An if is a special case of a loop.
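A minimal illustration of that claim in C++ (toy code, not from any real codebase):

```cpp
#include <cstdio>

int main() {
    int x = 5;
    // The ordinary conditional:
    if (x > 3) { printf("big\n"); }
    // The same conditional as a loop from zero to one: the body runs
    // over a set that is either empty or has exactly one element.
    for (int i = 0; i < (x > 3 ? 1 : 0); i++) { printf("big\n"); }
    return 0;
}
```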
Interviewer: That immediately reminds me of a more functional way of thinking, either recursion or running a function. If the entire program is some number of nested loops, then the program is a search over some space. Because any loop is a traversal. We walk along an axis of some space, up to 10, or over the elements of a set.
Alexei: Right. Sometimes the set you walk is implicit, as in a while loop. Those are the most interesting loops, because it's not entirely clear why we're iterating. But they can be handled too. We don't have to go there. Either way, a program is a collection of loops. It is searching for something. It terminates when it has found or constructed it.
We conceptually move away from the calculator, where you say 3, 2, plus, you pull the handle, and Odhner's mechanical adder gives you 5, toward a more visual kind of programming, where we say: my program is a search. First I consider this direction. Inside it I recursively descend and iterate along that direction. I now have a double loop. I can sum over those loops, work out the algorithmic complexity, look for shortcuts. Can I swap the loops? Is there some trick to dramatically reduce the work? You can reason about the program: how densely it covers the space, how many steps it takes.
A program is akin to a mathematical proof. You go top-to-bottom, and at each point (and this is Dijkstra's classical work on program correctness) you can state, as concisely as possible, what is true. Dijkstra called these preconditions. If you can't say what is established at a given point, the program is poorly written, poorly designed. You walked through half of the program and it's unclear what you've achieved. The right formulation is: by this point in the program, the following facts are guaranteed. To a place in the program we attach a stage of our proof.
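A toy sketch of what attaching those annotations to code can look like; the comments and the assert play the role of Dijkstra's preconditions (illustrative code, assuming a simple minimum search):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int Minimum(const std::vector<int>& v) {
    assert(!v.empty());  // precondition: there is something to search
    int best = v[0];
    // Established here: best is the minimum of v[0..0].
    for (size_t i = 1; i < v.size(); i++) {
        if (v[i] < best) best = v[i];
        // Established here: best is the minimum of v[0..i].
    }
    // Established here: best is the minimum of the whole array.
    return best;
}
```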
Interviewer: So it's a path with branches and possibly intersections, and at every moment of execution we're at some point along the path. Each is like a station: we arrive at it, we know where we are, we can go this way or that way. Is that just how you visualize it, or does it show up in the code?
Alexei: Both. When I write a program, I picture it. I have a very good sense of the space it operates in. Unless your program is a homework solution, it's almost always one cog in something larger. I'm used to working in big systems with dozens or hundreds of programs participating.
If we take the simplest possible program, a REST client, say, there's an enormous amount of side knowledge you have to bring. Things about the server, about the schema, about request rates, what you can and cannot ask. None of that is in the program itself. It lives off to the side.
The same is true of algorithms, the hardest kind of program. There's a deeper structure that the program has to fit, which is just another way of saying: when I execute these nested loops, I'm searching in some other space for a value.
So picture a program as a number of nested blocks. Stop it at some point. How do you describe where you are? It's not hard. We did ten steps, we entered a loop, we're on its 11th iteration. That position can already be labeled "11th iteration of this loop." If the loops are nested, it's the 13th iteration inside the 11th iteration. Sometimes you can label it with numbers, sometimes with strings.
If we look at a program as a nested search, and at program transformations as just moving complete blocks around, then if your program always enters at the top and exits at the bottom, it becomes self-similar. You can take any program, cut it out, and paste it as a line into another program. Self-similarity is the property that you can paste a program into itself. At every zoom level, it looks the same: step-step-step, or many steps as a step, or one step containing a loop of steps, or no steps. Empty step counts too.
A function is self-similar. A function is single entry, single exit. You enter at the top, you exit at the bottom.
Interviewer: Interesting how you visualize it. When you said "top in, bottom out," I thought of left for entry, right for exit.
Alexei: You can't visualize it that way, because if you've ever stepped through a program in a debugger, the cursor always moves down, down, down.
Interviewer: Like the meme about how people picture the seasons. Some see an oval with winter at the bottom, some a square, some a circle. I learned programming from a notebook, by writing it out. I picture program execution like the motion of a pen across a page. You have a debugger model. Different teachers, different visualizations.
Alexei: For me a program absolutely executes top to bottom. Each line expands into some number of smaller steps. In assembly each step is much simpler, but a program still runs top to bottom. You could say left to right, because the instruction pointer increments by the length of the instruction. If the X axis goes right and you place instructions at memory addresses, then yes, the processor scans them left to right. Both are true. For me a program is text, and in the western world we read text left to right, top to bottom.
The one thing I never understood is breakpoint. Why is it called a breakpoint and not a breakline? Where is the point? Between which letters? I've never seen a debugger that lets me set a breakpoint at an actual point. I set it on a line, I get there, the line has all kinds of nested calls, and I have to step through curly braces until I get into the function I actually wanted. If I could put a cursor on a point on the screen and stop the program there, that would be cool. A feature I'd like.
Interviewer: Reminding you that "cursor" originally meant the blinking thing, not the editor.
Alexei: You're talking to someone who learned to program in 1990.
Let's bring this back. Self-similarity, single entry, single exit. There's a historical detour worth mentioning: flowcharts. Those diagrams with diamonds, lines, yes/no branches. Structured programming in the classical sense was invented as a textual representation of a flowchart, laid out with indentation. If you take a flowchart and write its boxes in a specific way, you get a structured program.
So we have two properties. The first is self-similarity. One program embeds into another, step equals steps, beautifully recursive. If you ask Claude to refactor something, refactoring is much easier when the code is self-similar. He can cut and paste blocks with fewer surprises.
The second is that at every place in the program, you can attach a mental annotation: by this point, this has been accomplished. Now imagine that ten lines back, there's an odd return that jumps over this place, and after that return more text is executed. You can no longer point a finger at this place in the program and say what's true. You've tangled the flow graph.
Interviewer: So if we program in single-entry, single-exit style, mid-function returns shouldn't exist?
Alexei: They shouldn't.
Interviewer: I want to understand why. Because I have written code with early returns. Picture the self-similar program: nested levels, one entry, one exit, loops and flowcharts inside. Why does inserting a return inside, say, a loop break the concept? "I found my element, return." Why can I no longer say what's happened in the program?
Alexei: You picked exactly the example where I would also use a return. Functions split into those that change something and those that change nothing. A pure search function can legitimately stop the search the moment it finds the element and exit through any number of nested levels. Conceptually nothing changes, because it opened nothing it must close, allocated nothing it must free. A simple double loop with an early return is fine.
Now look at a function that mutates. If the loop is single, I wouldn't write a return. I'd introduce a result variable, initialize it to a sensible default, and inside the loop, if I find what I'm looking for, I assign and break. At the end I return the variable. The function looks almost like a Pascal function: the return variable is declared at the top and initialized there. Right after that declaration I can stop and say: at this point the program is already correct. There is a default. I'm about to look for something better. I've already closed a number of questions with this systematic approach.
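A sketch of the two cases he describes, in C++ (illustrative code): an early return in a pure search, and the result-variable pattern for everything else.

```cpp
#include <cstddef>
#include <vector>

// Pure search: an early return through nested loops is acceptable,
// because the function opened nothing it must close and allocated
// nothing it must free.
bool Contains(const std::vector<std::vector<int>>& grid, int needle) {
    for (const auto& row : grid)
        for (int x : row)
            if (x == needle) return true;
    return false;
}

// SESE style for the general case: declare the result at the top,
// initialize it to a sensible default, assign-and-break when a better
// answer is found, and exit once, at the bottom.
int FindIndex(const std::vector<int>& v, int needle) {
    int result = -1;  // at this point the function is already correct
    for (size_t i = 0; i < v.size(); i++) {
        if (v[i] == needle) {
            result = (int)i;
            break;
        }
    }
    return result;  // single exit
}
```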
Interviewer: Practical question. I read this as your personal authorial style. If I bring you a code review with a mid-function return, will you ask me to fix it?
Alexei: I'll fail your commit. The rule is declared. It's in CLAUDE.md. You can read it yourself. It's one of the few rules we follow.
When many people work on a big system, it is very important that they all follow one convention. If you allow democracy of chaos, no one will thank you. You won't get a skyscraper, you'll get a favela. We want a skyscraper. The beams must hit exactly the same spots. The structure has to be deliberate so the building grows up, not sideways. Of course people solve sub-tasks differently, but conceptually, when I read your code, I shouldn't be thinking "this is your code." It should just be correct. Correct, concise, understandable. If you've developed some baroque style, a signature pattern of comments, say, I'll reject it. You have to fit into the larger system and its prevailing style.
This rule applies to me too. If I'm in a playful mood in some local file, I will not introduce a new convention there. I override myself and follow the project conventions.
Interviewer: Inside me there is this engineer-aesthete who loves to watch the skyscraper rise, brick by brick, by the rules. And there is also a rebel who says, "what does it matter how I wrote the function? The task is solved. So I returned earlier. Who am I to grade someone's handwriting?"
Alexei: Let me put it this way. When Claude reads your repo, it picks up patterns. The fewer patterns it has to learn, the easier it is to produce the next one correctly. Agreed? Code might have been correct, but we're all meat computers. And Claude is approaching the power of a meat computer, with Iron Felix becoming Meat Vasya. It will also make mistakes if you feed it ambiguous patterns.
So patterns should be strict. They should suggest, they should project that strictness of form. They should be saying, "we look like this and not otherwise." Because there is a deliberate scheme there. You can be repetitive, even tedious, and there is nothing wrong with that. In the end what we want is for the 45th floor to look like the 46th. We don't want to re-prove little lemmas on the 46th floor. We want it to look identical, so induction works.
Imagine you're writing a project with very high stakes. If anything crashes, everyone gets fired and the company shuts down, possibly goes bankrupt. What principles would you reach for to survive?
Interviewer: Honestly, I'd start with testing. Incredible amounts of testing: performance testing, controlling every module, every program's resource use, file handle counts at every moment, CPU at every moment, behavior at every moment.
Alexei: Before this conversation you probably wouldn't have reached for SESE. Now you might. I won't try to convince anyone. If they have another style, maybe we'll make them think, but I'm not hoping to convert anyone.
There's a related principle that is even less known than SESE, possibly entirely unknown. It's called Single Static Action. Briefly, on SESE: if you want to read about it, there is Bertrand Meyer, practically the only person who really talked about it. The topic was last raised seriously by Niklaus Wirth, who created Pascal. After that everyone dropped it. Bertrand Meyer, with his Eiffel, tried to develop programming patterns around it. I've never met a working programmer building real systems who had read his books. I've never met anyone who programmed in Eiffel. That doesn't stop people from reading Meyer and thinking about what he wrote, although his object-oriented ideas, in my view, are dated. Object orientation, I think, was a false direction.
Anyway. Single Static Action. It's similar in spirit to SESE. A program is some number of nested loops. Each microstep of the program either reads one field of some structure, or writes one. So somewhere in memory you have a database. Each type of structure is a record type. All living instances of that structure form a table, never mind whether it's iterable, indexed, whatever. The instances form a table conceptually. There's a type X and there are records of it living somewhere on the stack, in the heap.
Single Static Action says: ideally, if I name a field, it should be written in one place. This file, this line.
Interviewer: Let me retell it, so it lands for the listeners. The program is nested loops, that's logic. The table is state. The loop says: now the state is this, now this, now this, now this.
Alexei: I'm not sure what you mean by "state." State can mean all of RAM, the whole program state. In the older sense of the word, state meant just the instruction pointer. In state machines, "state" is the number of the instruction about to be executed. But since nobody tracks that cursor anymore, "state" is often used to mean all of memory, all of the program's RAM.
You meant memory: a table with fields, with values, that we read and write and update.
Let's take a concrete example. A function searching for a number in an array. A simple single loop in C++. Elements of the array are instances of a struct. We walk them and read them. Query, read-only. That's fine.
Now consider a function that mutates. Say one that just searches for the minimum. The minimum gets written every time we find a new candidate. Or a program that, on startup, rewrites parameters passed on the command line, with chosen defaults overriding what the user supplied because the values made no sense. If you're going to rewrite, do it once. There should be one place in the program where this variable is set.
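A minimal sketch of that rule, with hypothetical names (nothing here is from a real codebase): the field has exactly one write site.

```cpp
// Single Static Action, sketched: args.n_workers is written on exactly
// one line of the whole program. Everyone else only reads it.
struct Args {
    int n_workers = 0;  // 0 means "not yet finalized"
};

// The one write site: take the user's value, override nonsense with the
// default. If n_workers ever looks wrong at runtime, there is exactly
// one file and line to inspect.
void FinalizeArgs(Args& args, int user_value) {
    args.n_workers = (user_value >= 1 && user_value <= 64) ? user_value : 4;
}
```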
There's a rule: "don't use global variables." Where does it come from? Nowhere, really. People just follow it. I don't have global variables in my programs, unless there is no other path. In Rust lately I use statics sometimes, but read-only, initialized once. No mutable globals.
Interviewer: The reasoning was: globals are problematic in two ways. One is multi-threading, races, volatility. The other is that if access is global, mutations come from anywhere, and you can't reason about the variable's lifecycle.
Alexei: Exactly. The rule was actually invented against the second one, against changing them from many places, not against them being globally reachable.
I'll confess: I have lost the ability to write programs without global variables. I used to follow all those principles. Gradually I moved to another set. I now write programs only with global variables, and they all grow from one place, the same way the program itself grows from main and branches out. All my variables grow out of one structure called _db, the database, and they branch out too. Every variable that has cardinality one becomes a field of _db. If it has other cardinality, it's reachable via _db, in an array, or off a linked list, or through some pointer. They are all global. But all writes are controlled.
Interviewer: So functions reach into _db, read what they need, and update in place?
Alexei: They update, but a rule is introduced: you cannot, from file A, go and modify variables that are modified in file B. Because that violates Single Static Action. Call a function in B and reach the change through a standard channel.
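A hedged sketch of the _db pattern under those rules; the names are illustrative, not the actual AlgoX2 definitions:

```cpp
#include <string>
#include <vector>

struct Conn { std::string key; int fd; };

// All data grows out of one structure, the way control grows out of
// main. Every cardinality-one variable is a field; everything else is
// reachable through it.
struct Db {
    int               n_requests = 0;  // cardinality one
    std::vector<Conn> conn;            // cardinality many, reachable via _db
};

Db _db;  // global by design; reads are free, writes are controlled

// This file owns all writes to _db.conn. Code in other files calls
// conn_Alloc instead of touching the table directly.
Conn& conn_Alloc(const std::string& key, int fd) {
    _db.conn.push_back(Conn{key, fd});
    return _db.conn.back();
}
```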
Let me tell you a secret. You've written code in the browser, in JavaScript? document, isn't that a global variable? It absolutely is. Beneath it sits the tree of all elements on the page. And it is the best way to live. Imagine if it didn't exist. You wouldn't know how to find any element on the page. The idea that the program operates on records, that records live in a database, and that the database is global, that's what makes all variables global.
We were sold a wrong idea by people who wanted programming to be context-free, a language parseable from any point as a recursive expression, with no context. Context-free grammars, if I remember right. And from that grew this idea that "for our functions to be correct, they will just perform transformations." That doesn't survive contact with real life. You have one window on the screen and you have to replace one picture with the next without flickering. How do you follow this absurd notion that nothing is reachable, just transforms?
Your view is the right one, and it's also self-similar. You have a computer, there are bytes in memory, the bytes are absolutely global, each with a unique address. The program transforms them. Your little program is isolated in a virtual address space so it can't trash anything else, but it still has bytes and it still mutates them.
We structure those bytes because we want to operate on more abstract, more compact concepts. We define structs, arrays of structs, indices. We transform them. A function walks the DOM, replaces every button with a checkbox. That's a transformation. Not a returned array you have to apply somewhere, but a substitution. A transaction against this database. Easy to grasp, easy to test, easy to implement, and the program is over.
All variables are global. All actions on a given variable happen in one place, so you can think about whether they're correct. All control flows from one point, main. All data flows from one point, _db. That's the right factorization of programming ideas. Two completely different worlds. The world of data is structured as indexed tables. The world of instructions is structured as nested tracks for the CPU. The processor transforms the data.
Interviewer: And reads can be from anywhere?
Alexei: Any number. No private, no nothing. I don't use any object-oriented machinery in any of my programs, beyond the most trivial. Nothing private, no virtuals, no templates.
These two principles, SESE and Single Static Action, set the authorial style of how a program is built. From this clay, the whole structure is shaped.
Interviewer: This is what I find interesting. Most listeners, the majority of programmers, do not write code this way. Most languages have object models: encapsulation, inheritance, all of that. And what hits me is that this is some kind of genuine engineering freedom. We were taught by books, Clean Code, Code Complete, refactoring, and at the start of your career you're tested on this stuff and you absorb it as gospel.
Alexei: Robert Martin, Steve McConnell, Martin Fowler. Birds of a feather.
Interviewer: And here you are, having built a lot of systems, building a huge skyscraper now by a completely different set of rules, rarely used in the industry. That is engineering freedom.
Alexei: Maybe. Or it's a non-freedom. As they say, freedom is recognized necessity. I have a recognized necessity. What I want is for the skyscraper to stand, and for me not to be running between floors patching pipes and rewiring outlets. I don't want to live next to the project doing maintenance. It just has to work. Period. So I can close the lid and walk away.
Look at how people love to ship code: "I'm not sure this function is right, so I'll insert a print. Let's ship it." Because when that print fires, somebody, somewhere, will complain, the message will reach me eventually, I'll look at it and say, "ah, yes, now I know I was wrong. Before, I was just unsure." This is every programmer. Anyone who leaves a print in code is counting on exactly this. They haven't let it go. They are still in a relationship with that piece of code. If you wrote correct code, there should be no prints. The program should just run.
You can add the dreadful topic of exceptions. They are seductive and very useful. In a certain class of programs, RPC programs where you make a query and return, fine, use exceptions. But generally, no. The case against them follows from SESE. An exception is nothing other than a multi-level return to an unknown number of levels. It's worse than a return. It could be 150 levels. If you are such a genius that you can prove that across every possible return path your program satisfies every invariant, including ones you'd have to name, then by all means, put your throw there. But if you can't say that, neither can I. No programmer can predict anything about their own program.
Proof by thought experiment: ask anyone to write the most elementary program and sign their name to a guarantee that it will compile and run. Nobody will sign it, even for two lines, because somewhere a semicolon will be missing. So practically nobody can write a working program on the first try. Which means: you look at the three lines you wrote and you don't really see them. You only approximately understand what you wrote. You need the program to run for the compiler error or the unexpected output to tell you where the typo is. You can't write code without that feedback loop with the computer. That means you don't know what you wrote.
So vibe-coding is not new. The original vibe-coding was: run, run, run, debug, run. Print-driven development. And because we can't write a correct program on the first try, we invented the word bug. There's no bug in the program. There's a missing understanding in your head of what you wrote. That would be the honest thing to say. No offense. It offends me that I don't understand my own program. For me it's an axiom.
From that axiom I draw the conclusion that my program must be as close as possible to one I can still understand. So I exclude complex ways of writing in favor of simple ones. A non-local return is complex. I look at a screen of code, and somewhere above the first line there's a return that jumps over the whole screen and lands below. I can't keep that program in my head. I don't want it. I want it simple, close to what I can understand, with the smallest number of surprises. That's why exceptions are excluded. An exception is the hiding place for everything I didn't understand about what I wrote. I don't know what value to return, so I'll just throw.
Interviewer: It's an abdication of responsibility. Somebody will catch it.
Alexei: That's literally what throw means. You toss it over your shoulder. The important thing is not to turn around.
Interviewer: Now, the listeners are engineers. With this single global database, the obvious question is multi-threaded access. How do you handle contention?
Alexei: I don't. The programs are single-threaded.
Interviewer: And scale?
Alexei: There's a semantic trick. If I run 128 single-threaded programs on your machine and use all the cores, is that still a single-threaded program? Each one is. You utilize the machine's resources optimally, just through processes rather than threads.
For the last few years I've been working with the notion of a tree of programs. I create sub-programs from one program, all single-threaded, and they communicate only by message passing. That is the only way they communicate. Actor model, message passing, whatever you want to call it. There's a computer scientist named C.A.R. Hoare who wrote Communicating Sequential Processes, an enormously influential book, even though his model is slightly different (messages are delivered synchronously, at a rendezvous). Go's channels are a direct descendant. Plan 9 and 9P are a direct descendant. The idea is: we are so bad at understanding even a single-threaded program, let's not write multi-threaded ones.
I wrote my first multi-threaded program in 1993. I happened to spend a summer at an institute in New Jersey where someone had shipped an SGI Onyx with 16 processors, 64-bit memory, the size of a wardrobe. Nobody used it. Nobody knew how. I got root access because the security was nonexistent and nobody was watching. I wrote programs for myself, played with multi-threading. Then I lost access, and always wanted to come back to it.
Dual-core machines arrived. Everyone tried to use the second core. Create a critical section, create a mutex, hot-spin on something. I played with this for ten years. And then I realized: I have forgotten how to program. I look at text on the screen and I no longer understand what it does. I can't say anything about it. I don't even understand in what order it executes. Complete dead end.
I decided to close this topic forever and only write single-threaded programs. I couldn't have closed it by willpower alone, because it's hard to give up the performance argument. If my program is twice as fast, that might not matter if you only run it once. But if it doesn't fit in one computer, then there's a difference between one computer and two. The difference between a data center and a single box doesn't need explaining. Performance is real.
Around 2008 or 2009, my high-frequency trading company was putting servers next to exchanges. They sent market data over multicast on their own networks. The bottleneck was always Ethernet, gigabit at the time. On eBay there were terrifying devices: 20-gigabit Myricom, 40-gigabit InfiniBand. I bought them, learned them, and discovered an entire topic: scale-out computing. You execute a mov instruction, you write a byte to your own memory, and that byte ends up on a different machine. Memory is just a device on the bus. There is a router there that routes that byte. Put a different device on the bus, a card, and the byte arrives on the card. The card knows: "ah, a byte was written here, let's forward it to another computer." That's almost how it works. It's used in supercomputers. A scientist writes a Fortran weather simulation. He doesn't want to know about threading, he writes a formula. He has an array of a trillion by a trillion, and somehow this has to be spread across the supercomputer. Wherever he reads a cell that should live on the first machine, or writes one that should live on the second, the bytes get forwarded. OpenMPI annotations get compiled into a program that uses verbs and runs in parallel.
There are special cards you can drive directly. They forward bytes in roughly 600 nanoseconds host-to-host. You write, and 600 ns later the value is in the other machine's memory. Faster than a Linux system call to the kernel. And you can send a message that arrives on five machines at once via multicast over InfiniBand, something the kernel can't even offer, because you're communicating one-to-many.
When I saw this, I saw the future. I realized I'd been suffering for nothing. Write single-threaded programs. The hardware will catch up. It already exists. By the way, all the AI today runs on NVIDIA cards. NVIDIA's cards are Mellanox cards. They bought Mellanox, the Israeli company that invented InfiniBand. They all do kernel bypass. 600 ns is long gone. They push 800 Gbit/s in every direction.
Interviewer: Let's describe the system. What does AlgoX2 look like?
Alexei: I used to call it a data streaming platform. Then I understood it's a distributed OS. It's a file system where the unit of storage is a stream, and there are two operations: append a message to a stream, and read an already-written message from it. Any process can append to any stream. Forget ACLs for a moment. It just appends. That point of appending is one place. It's called the transaction module or sequencer. The only transactionality in the system lives there. Messages are appended to the stream: 0, 1, 2, … 147. If you unplug the cluster and look at the disks with a magnifying glass, you'll see some number of streams, each containing some number of messages. That is the entire state of the system. Nothing else.
Management of such a system: I introduce a special stream called /sys/cmd. The path looks Unix-like on purpose. /sys is the system directory, and cmd is commands. When any process starts up, it begins reading /sys/cmd from offset zero. It has no Unix-style file system. That doesn't exist. There is only the world of streams and the two operations.
By reading /sys/cmd, every process learns how many nodes are in the cluster, what streams exist, who has joined, who has disconnected. To talk to the system, you authenticate at a gateway and append a message to /sys/cmd. Every process sees it, every process interprets it. In Linux everything is a file. In X2, everything is a stream: append-only, 2⁶⁴ messages, and that's it.
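A hypothetical sketch of that world in C++; the real X2 interface surely differs, but the surface area is about this small:

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Msg = std::vector<uint8_t>;

// The entire storage model: append a message to a stream, or read an
// already-written one by number. Nothing else exists.
struct StreamSystem {
    virtual uint64_t Append(const std::string& stream, const Msg& msg) = 0;
    virtual bool     Read(const std::string& stream, uint64_t seqno, Msg& out) = 0;
    virtual ~StreamSystem() = default;
};

// Startup, as described: replay /sys/cmd from offset zero to learn the
// cluster topology before doing anything else.
void Bootstrap(StreamSystem& sys) {
    Msg m;
    for (uint64_t i = 0; sys.Read("/sys/cmd", i, m); i++) {
        // interpret: node joined, stream created, node disconnected, ...
    }
}
```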
Interviewer: So if a node restarts, does it have to replay the whole history?
Alexei: There is a snapshot mechanism. You read a different stream that contains "continue from here." But let's skip it, it conceptually pollutes the description. The startup problem is solved without any node replaying the history of the universe; each one replays only what matters to it. Snapshots are how we shrink a trillion records down to a million. Conceptually we can describe the system without them.
The point is: you control the system using the single operation present in it, streaming. You write to a stream, and your modules see those messages. Modules are themselves producers and consumers. All the way down. Self-similar.
To handle external producers and consumers, we essentially add more internal producers and consumers. Which means the hottest, most loaded part of the program had better be very fast. Dogfooding. You can't take a shortcut on the path that serves your own internals. Speeds are on the order of microseconds. We currently support three communication methods: InfiniBand, Ethernet multicast, and Ethernet unicast.
Interviewer: Kafka's good thing is throughput by parallelism, many producers and consumers in parallel. With single-threaded processes, how do you absorb load?
Alexei: First, single-threaded, but many of them. Many means many resources. But let's compare directly. First, how Kafka works.
There's a server, on the server there's a broker, and that is Kafka. You connect and say, "I want to append a message here." That "here" is a topic plus a partition number. The broker says: "I don't write here, go to another broker. Bye." Fine. "Give me metadata." "Okay, brokers serve these subsets, go there." You go to the right broker and append. It says: "Done, here is the base offset." You read by going to a broker, asking for messages from orders/47 starting at offset X. It says "I'm not the leader replica, go there." Right.
Kafka is, essentially, a single-host system. The broker writes to a file on disk. It opens a second file (the index) and appends 4 or 8 bytes per message, the position of that message in the first file. The index. To read message 1,131,131 you don't know its position (variable length), so you open the index, jump to 1,131,131 × 4 bytes, read 4 bytes, that's the offset in the first file, you read the message and stream it. That's all of Kafka.
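The lookup he describes, as a simplified sketch (real Kafka segments use relative offsets, but the mechanism is the same):

```cpp
#include <cstdint>
#include <fstream>

// The index file is an array of fixed-width entries, so locating
// message N is one seek into the index, then one seek into the log.
uint32_t LookupPosition(std::ifstream& index, uint64_t msg_no) {
    index.seekg(static_cast<std::streamoff>(msg_no * sizeof(uint32_t)));
    uint32_t pos = 0;  // byte position of message msg_no in the log file
    index.read(reinterpret_cast<char*>(&pos), sizeof(pos));
    return pos;
}
```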
The trick before serving: Kafka has to be sure that the new messages were synced to a replica and that the replica returned OK, meaning it called write(), not that the bytes are physically on disk. Nobody fsyncs after every message. Performance would die.
Now imagine a producer writing 100 Gbit/s, the network is 100 Gbit/s. Theoretically the broker can take all of it. What if it serves it to 10 consumers? Each one gets 10 Gbit/s, since the broker can't emit more than 100 in total. That means the effective throughput, as any one consumer sees it, is 10 Gbit/s. Goldratt's theory of constraints: throughput equals the bottleneck. How do you fix it? Have the broker multicast onto 10 servers, each serving at 100 Gbit/s. Ten consumers connect to ten gateways. Head-of-line blocking, gone. With unicast you can get each consumer to roughly 50 Gbit/s: your node sends to two, each of those sends to two, and so on. In three hops you have eight nodes, each at 100 Gbit/s, supporting sixteen consumers at 50 each. You traded three hops of latency for throughput.
If ten consumers all want 100 Gbit/s, "the latency got worse" is the wrong framing. The alternative is infinite latency, since you can't deliver. Three hops is much better than infinity.
A node in X2 is a tree of processes, in the base configuration around 8, sometimes 12 or 16. The main parameter is the number of gateways and the number of commits. More gateways means more clients (TCP overhead, many connections per gateway). More commits means better-amortized disk access, since not all disks are fast. Our optimal configuration uses NVMe, and we have a special SPDK driver that talks to NVMe in the most native way.
A byte arrives at a gateway. Path is open, ACLs done, authentication done. The gateway hands the message to a sequencer, always. Single Static Action: it always goes to the sequencer. If the stream is served on this node, the sequencer publishes it and assigns it the next sequence number. If the stream is served elsewhere, the sequencer publishes it into an auxiliary stream that the other sequencer is subscribed to. Publication always happens, into the right place. The other sequencer reads the auxiliary stream, sees a publish request, publishes, and sends an acknowledgement.
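In rough pseudocode-style C++ (hypothetical names, not the actual X2 modules), the invariant is that publication always happens, and always through the sequencer:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

struct Sequencer {
    std::string node;                                    // this node's id
    std::unordered_map<std::string, std::string> owner;  // stream -> owning node
    std::unordered_map<std::string, uint64_t> next;      // per-stream next seqno

    void Publish(const std::string& stream, const std::string& bytes) {
        uint64_t seqno = next[stream]++;  // assign the next number
        // ... append (seqno, bytes) to the stream's storage/replicas ...
        (void)seqno; (void)bytes;
    }

    // The gateway always hands the message here. If the stream is served
    // locally, publish it; otherwise publish into the auxiliary stream
    // the owning sequencer subscribes to. Either way, publication
    // happens, into the right place.
    void Sequence(const std::string& stream, const std::string& bytes) {
        if (owner[stream] == node) {
            Publish(stream, bytes);
        } else {
            Publish("/aux/" + owner[stream], bytes);
        }
    }
};
```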
A message is served to a client once we have evidence that it has been received by N nodes, by some set of commits. Internally, our components operate at ack level 1: they read messages that exist in at least one copy, so the system can make progress. You, as an external consumer, want messages that exist in two or three copies, durably replicated. Same mechanism, different parameter. Lower ack inside, higher ack outside. To realize a higher ack level we use the lower one.
The subscription mechanism is uniform. A module says, "I want to see the first message of /sys/cmd." It sends a request into the ether. When it receives it, it asks for the second. And so on. The whole cluster is a humming network of these little requests.
Interviewer: How do you debug something like that?
Alexei: When many processes interact, patterns emerge that you can't see by testing components individually. That's exactly what makes this domain interesting: emergent phenomena. You test one component at a time. Take a commit. Conceptually simple, it writes streams to disk. It is a subscriber like everyone else. The subscription API is small: one call. You subscribe to a stream and a range of messages, say 0 to infinity (i.e. 2⁶⁴ − 1). The library guarantees you receive messages in published order at the requested ack level. You handle them. A commit writes them to disk.
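A hypothetical shape for that one call (the real signature is certainly different):

```cpp
#include <cstdint>
#include <functional>
#include <string>

constexpr uint64_t kEnd = ~0ULL;  // "to infinity": 2^64 - 1

// Subscribe to one stream over a range of message numbers at a given ack
// level; the library delivers messages in published order.
void Subscribe(const std::string& stream, uint64_t from, uint64_t to,
               int ack_level,
               std::function<void(uint64_t seqno, const std::string& bytes)> on_msg);

// A commit process is just another subscriber:
//   Subscribe("/some/stream", 0, kEnd, /*ack_level=*/1,
//             [](uint64_t seqno, const std::string& bytes) { /* write to disk */ });
```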
But commit is non-trivial. Imagine 10,000 streams, 200 topics × 50 partitions. If you open a file per stream and an index file per stream, you need 20,000 file descriptors. The system creaks. Worse, bytes arrive interleaved across all of them, and from the disk's point of view, that's random-access writing. Disks hate random access. Especially NVMe.
Why? NVMe, Non-Volatile Memory Express, is a device on PCIe that writes at ~20 Gbit/s and reads even faster. The data persists. It's flash. Flash has the property that blocks (about 8 MB) can be written but not overwritten. To rewrite, you call "erase block." Each block can be erased roughly 2,000 times before it dies. And you'd say: why am I paying for this device that breaks after 2,000 rewrites? Because inside the device there's a controller doing wear-leveling, distributing writes evenly across blocks. You write to block 345 twice, the second write doesn't go where you think, and the controller remaps. When you do random-access appends, at some point the controller can't find a free block. It has to take live data from a half-used block, move it elsewhere, and erase. Garbage collection. Not fast. Throughput collapses by a factor of one hundred.
So with many small files, GC kicks in and performance dies. You need a storage engine that writes everything append-only. And if you're a streaming data system that appends, then recursively, you should be appending into something that appends, not into individual files.
This is what we do. We have our own engine for this. It's a big research area. The best the industry has come up with is log-structured merge trees, LSM trees, used in LevelDB, RocksDB, and many databases built on top of those. The previous version of our engine used LSM. The new engine, written by us, is different.
I looked at RocksDB, several hundred thousand lines of code. Everyone said "use Rocks, why are you dragging in LevelDB?" I looked at LevelDB, around 20,000 lines, written by Jeff Dean. I thought: that's like getting a library from Donald Knuth. I'll take it without thinking. RocksDB had "the team at Meta, half a million lines" on it. I looked at their commit log and bugs were not disappearing. Constant churn. I looked at LevelDB's commit log and the bugs had simply disappeared. It was written and it worked. Life hack: judge a library by its commit history.
Our new engine does what we previously did with LevelDB in about a thousand hand-written lines. The trick, of course, is that we generate about 20,000 lines under it. The complexity, to me, is 20× smaller because I don't read generated code. I specify the data structures and the code is generated for me.
Interviewer: Let's pause here, because in two hours you said "code generation" for the first time. Before we go there, give us numbers. How much faster than Kafka?
Alexei: At different loads, latency is roughly 100× lower. Throughput is roughly 10× higher under many loads, though not all. There are loads where Kafka fully saturates its resources and we cannot beat physics. But where data is read, written, and served simultaneously to some number of consumers, we win significantly. Where Kafka delivers a message in 5 ms, we deliver in 50 µs. If you ask that 99% of publishes complete within 200 µs, we can sustain throughput tens of times higher than Kafka, because Kafka can't fit a single message in that window. When you saturate hard, we still win because we scale out. Data isn't fed from one point but from many, propagated by multicast.
Depending on profile and saturation: 10× to 100× on both latency and throughput.
Interviewer: Now, code generation. You said the new engine is 1000 hand-written lines. What gives?
Alexei: It's an open-source engine called OpenACR, at openacr.org. The generator is called amc, the Algo Model Compiler.
You don't write C++ classes. You write records in a relational plain-text format called ssim, for self-similar or super-simple, take your pick. Each line is one tuple in some table, and a tag at the start says which table. We don't say "class," "subclass." None of that exists. There is a ctype, a structure. It has fields. The names are not coded. We don't try to hide what they are. I tell the generator: I want a pool of instances of this type, I want to index it by this field with a hash, I want to group-by these records using a binary heap.
Say I have connections, and I have packets that need to be sent. A table for connections describing the connection, with no functions in it, because the processor executes instructions, not objects. A table for packet bytes. A set of those packets. A group-by: bytes-to-send collected under each connection via a binary heap, ordered by some priority key. A cross-reference: connections looked up by a key in a hash. I describe the data structure, and the generator generates code. From C++ I just call allocate_packet, fill the fields, call XrefMaybe, and the record is inserted into all relevant indices. If I delete somewhere, cascade-delete propagates: pointers are cleared, records are removed from indices. Simple, tedious, repetitive, predictable.
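A hedged sketch of the calling side; the generated names below are invented to show the shape, not copied from real amc output:

```cpp
struct Conn;  // generated: connection record, plain data, no methods
struct Pkt;   // generated: packet record

Pkt* pkt_Alloc();             // generated pool allocator
bool pkt_XrefMaybe(Pkt& pkt); // generated: insert into every index and
                              // group-by (hash, heap, ...) in one call
void pkt_Delete(Pkt& pkt);    // generated: cascade-delete from indices

// Hand-written application code: allocate, fill fields, cross-reference.
void EnqueuePacket() {
    Pkt* pkt = pkt_Alloc();
    // ... fill pkt's fields, point it at its Conn ...
    if (!pkt_XrefMaybe(*pkt)) {
        pkt_Delete(*pkt);  // cross-reference failed; unwind cleanly
    }
}
```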
You optimize the schema for your task, and that is your performance. The generated header contains your functions and structs. The generator never enters human code, and the human never edits generated code. But the generated code is formatted for humans. Every function is documented, the implementation is right there, and you can step through it with a debugger. Single entry, single exit. Set a breakpoint and it resolves to one place, not to 150 places like with templates. With templates, what you see in the header is not code. It's a template for code. The sources aren't shown to you. Back in 1997 some C++ compilers would actually instantiate templates as C++ snippets in separate files, and you could debug them. They don't do that anymore. They rewrite the AST in memory and the code is nowhere. You can't even disassemble it. Distasteful.
The generator has no templates. For each concrete task, the generator emits the concrete code, with all symbols unique, no overload resolution, nothing hidden. You step through with the debugger, in your code, generated for your task.
Interviewer: What does ssim look like?
Alexei: Plain-text key-value records, one record per line. Any union of ssim files is an ssim file. Any subset of lines of an ssim file is an ssim file. Elements of a set. Like rows in a table, but with a type tag on each line, so you can mix rows of different tables in one file. SQL doesn't have a single universal format that annotates table name, column names, and values in one self-contained row. ssim does. The maximally simple format for describing rows of a database such that it commits cleanly to git, has no spurious merge conflicts, is read and written by humans and machines, and looks like a command line.
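A few illustrative lines in the spirit of ssim (the type and field names here are made up to show the shape, not taken from a real schema):

```
dmmeta.ctype  ctype:app.Conn  comment:"One network connection"
dmmeta.field  field:app.Conn.key  arg:algo.Smallstr50  reftype:Val  comment:"Lookup key"
dmmeta.field  field:app.Pkt.priority  arg:u32  reftype:Val  comment:"Send priority"
```

The first token on each line is the tag naming the table; everything after it is key-value attributes, so every line is a self-contained row.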
I designed the format. It's about 15 years old. It's open. Trivial to support in any language. Claude writes ssim flawlessly. And it's economical: every line of ssim generates about 20–25 lines of code. Counting hand-written versus generated, the generator emits about 95% of the code.
The 1000 lines of the new engine are hand-written lines. The ssim is on top of that. I just counted, about 100 ssim records for the engine. Those describe the in-memory data structures used for the log-structured trie, the mem-table, the on-disk tables.
Interviewer: With this approach, you can build a custom library on the fly?
Alexei: Yes. And you do, as needed. Libraries are slightly cursed. Prospective users want new features the library doesn't have. The existing users by definition don't want those features. They haven't read them, they don't use them. You constantly lower the library's efficiency by adding features for other people's use cases. You potentially change the API. Breaking changes, feature creep.
A large class of libraries is essentially some pre-arranged data structures (a few tables, particular cross-references) plus algorithms. The algorithms are almost never universal. They depend on the application. If you can specify the data structure as a few elementary records, with no syntax to speak of, since they are just rows in a database, and then write your specific algorithm against it, the glue you used to write against the library often exceeds the size of the library. The library doesn't quite fit, and you adapt it. If instead you generate the exact data structure straight into your module, you get a clean criterion for adding or removing things. You can't apply this criterion to a library, because adding is a breaking change, and removing is a breaking change. The tragedy of the commons.
Middleware can't be inlined. That has to be one shared implementation everywhere. But pure data structures should be linked into the program, generated, not used via libraries. The result is very compact.
Interviewer: Recursive code generation, is that something compilers do?
Alexei: Compilers also generate code, the assembly we don't see. So yes, code generation is everywhere. But this is different. You know the classic Lisp REPL, read-eval-print loop, and you've probably read the famous Lisp REPL written in Lisp. It's a mind-bender. You understand it implements Lisp, in Lisp. Lisp is self-similar. It can interpret itself. But there's a serious problem: each level of interpretation costs you a factor of ~100 in performance. Do it twice and it's already impractical. Beautiful, unusable for a financial exchange.
A second example of self-similarity: a compiler that compiles itself. Run the object code on its own source and you get the same object code. This is a milestone in any compiler's life. After that, every subsequent compiled version is built by itself. You can throw away the bootstrap ladder. GCC compiles GCC.
What we have at OpenACR is a third example, less known: sources that generate their own sources. Not interpretation, not a Lisp REPL, and not just object code generating itself. Sources generating sources. Without performance loss, because everything ends up as compiled C++. About 95% of the generator's own source is generated by the generator.
You start writing the generator the normal way. Then you notice a data structure inside it that the generator could itself produce. You replace it with ssim. You regenerate. Iterate. What remains is a small core that walks tables and emits text. Everything else lives as ssim.
If you open amc and read it, you see straight transformations: walk fields of a struct, write into the C++ file we're emitting. The boilerplate that every library otherwise rebuilds is simply not there. It's factored out. That's why the engine fits in a thousand lines.
Interviewer: And if the generated code is not optimal for performance somewhere, what do you do? Inline assembly? Hot-patch?
Alexei: It does happen. I see a default generation that could be a little better. My answer is almost always: leave it. Why? My algorithm doesn't change. My algorithm needs the data structure and certain functions. The fact that those functions are slightly slower than ideal is not a property of my algorithm. I just got run on a slower computer. The computer can be upgraded without changing my program.
So I open a session, add a new ssim field that controls generation, defaulting to 0, and define mode 1 that emits a faster variant. I flip the bit on one record. In 90–100% of cases, I never end up doing it. It was good enough. The actual bottleneck was elsewhere. In a large system the bottleneck floats and is almost never where you think it is.
Take this file engine. Lots of files, lots of indices, parallelized. It's slow. Because parallelism creates many locations, and many locations imply switching cost. You touch one place, you touch another, that's switching. In memory, bytes in the same cache line (addresses equal after dividing by 64) are very cheap to access together. Different cache line, switching cost. One level up: every 4 KB is a page. First touch faults, the kernel allocates real memory, returns. You don't notice it, but it's switching cost. Having many open files is switching cost. Somewhere in the kernel there's a list of dirty files and somebody walks it. In our case the best improvement was: collapse it all into one file. And in our new engine the index goes into the same file. We don't have an index file. You just keep streaming. Conceptually trivial: you write messages, and periodically you write a message that lets you find the earlier ones via a formula.
Interviewer: Backwards-readable?
Alexei: Yes. Append-only means information lives at the end. You write 1000 messages, then write an index record that indexes those 1000. Another 1000, another index record. After a thousand of those, a higher-level record indexes the previous thousand index records. You get a structure we call a log-structured trie, which lets us keep all streams in one file. Two files would have been fine. Ten thousand wouldn't. Kafka with 100,000 topics simply can't be created. It would take many minutes for 10,000, because each partition becomes a file, and the cost grows quadratically.
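A hedged sketch of the idea, not the real engine: everything goes into one append-only file; every kFanout messages are followed by a level-1 index record covering them, every kFanout level-1 records by a level-2 record, and so on. A reader starts at the end of the file and walks down the levels to find any message by number.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr uint64_t kFanout = 1000;

struct Record { int level; uint64_t pos; };  // level 0 = message, >0 = index

int main() {
    std::vector<Record> file;         // stand-in for the single on-disk file
    std::vector<uint64_t> filled{0};  // how full each level's current group is
    for (uint64_t msg = 0; msg < 2'000'000; msg++) {
        file.push_back({0, msg});
        // Bubble up: each time a level's group fills, append one index
        // record at the next level, which may fill that level in turn.
        for (size_t level = 0; ; level++) {
            if (level + 1 >= filled.size()) filled.push_back(0);
            if (++filled[level] < kFanout) break;
            filled[level] = 0;
            // Index record points at the last record of the group it covers.
            file.push_back({(int)level + 1, (uint64_t)(file.size() - 1)});
        }
    }
    printf("records written: %zu\n", file.size());  // 2,002,002
}
```

For 2,000,000 messages this appends 2,000 level-1 records and 2 level-2 records, so the index overhead is about 0.1% and never requires a second file or an overwrite.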
Interviewer: Time to bring this back. We set out to discuss what writing code as humans looks like in 2026. Now let's teleport to the reality we actually live in: programming with Claude Code, with Opus 4.6.
Alexei: I'll just say: I think Claude will be excellent at building this kind of system inside this set of constraints. He follows them well. By laying down the boundaries, you're more confident that you get what you wanted, because he conceptually couldn't have done otherwise. And it's better that the code be generated by a tested generator than dumped into the codebase by Claude himself, polluting context for no reason. The amplification is real. With the generator on top, you go a level higher and ship better, without sacrificing quality or system complexity. The big problem most people have today is: we can generate code, but what do we do with the result?
If I offered you a 20× larger context window, you'd say "twenty million tokens, that's the model of the future." Letting Claude work with much less data is the same as giving him a much bigger context window. He fits a lot more, because he doesn't need to hold auxiliary things that could have been generated.
I admit: I've used Claude Code for just over a month. The moment I downloaded it, it stayed pinned on my screen. Most of my work goes through it: documentation, git commits, the new engine. There were things I knew exactly how to write by hand, and I wrote them with Claude anyway, partly to train myself in how to specify what I want. To feel how the system reasons. The system is shockingly powerful. I hear Anthropic has an even stronger thinking model they aren't releasing; Black Hat would have a field day if they did. Until the industry pays Anthropic a few more billion to fix the bugs, the hackers don't get it. It's going to get stronger.
Could I have written a LevelDB-class engine with Claude if I didn't understand what I was writing? No. So I formed an understanding, and with Claude I rode a bicycle down the path I would otherwise have walked on foot with a backpack. Same destination, faster.
Everyone in the company, I push them. I have a window where I see who is using how much. If you're not using Claude, use it more. I'm not worried about your brain atrophying. If it wasn't there, it won't appear. If it was, it isn't going anywhere. This is purely a bicycle. You sit on it and ride. You don't lose the ability to walk.
Interviewer: How do you use it?
Alexei: I go to it and say, "let's discuss this topic." I watch whether it surfaces something that shifts my understanding. Sometimes I have a name in mind for something. I say, "how about this word?" It says: "good, but it can be confused with this other thing where it already means something else." I think, "you're right, I like your version better." But I have to go to it. It won't open my repo on its own and say "rename this." Although having Claude with full access to the repo for the past month has fundamentally changed the relationship. Before, I pasted snippets into a chat. That's completely different.
It helps you think because you can't always go deep on your own. You have some concentration ceiling. Noises pull you back, chores pull you back. You can't drop down to level 10. But from level 3 you can hand Claude an instruction, and it brings you to level 6, maybe level 10. You can work on the move. Aggregate efficiency rises sharply. I recently drove a long way to the mountains and managed to write something with Claude. He thought while I drove.
Interviewer: I had a phase I called "claudophoria." I realized I could write all my software myself. I opened eight tmux panes and started building five products in parallel. My brain ran out roughly two weeks in. I had a working podcast platform, a podcast player, a VPN, half a dozen others. And then I sat down and asked: now what? I burned out. Now I understand: the tool is great, you have to know your limits. My thinking muscle gets more strained on a normal workday with Claude than before. Time was taken away. Before, to understand a piece of code and solve a task, I had time, a day, two days. The brain chews. With Claude I'll be done in an hour. Now I don't have those two days. My brain has an hour to ride a bicycle, but I still have to filter. Where to dig in, where not, what to delegate.
Alexei: If there's little code and it does a lot, you're fine. The code has to fit in your head while you work on it. All of Linux doesn't fit in your head. Not in mine either. But the kernel of the kernel does. Linux is rich in data structures, a bottomless well of inventiveness. The core idea costs nothing to understand: a process writes into a buffer, another reads from it, blocks when there are no bytes, blocks when the buffer is full, and the kernel in the middle switches them. That's half of Linux. Disk, page faults, and so on are stuck onto the sides.
Open the Linux kernel with Claude, study it. Don't read it top to bottom. Ask: "what does this do, show me, explain it at the interface level." For understanding an existing codebase, this tool is brilliant. He becomes a study buddy.
Interviewer: I learned Rust about a year and a half ago, full time, for myself. I jumped on the last carriage of the train of people for whom learning Rust still made any sense. The borrow checker, lexers, parsers. I sat with it, suffered, because I wanted to write systems code. Right after I learned it, Sonnet 4 came out and started writing decent Rust, then Opus 4 and 5. Now I find it hard to motivate any single programmer to sit down and learn Rust today. I jumped on the last carriage.
Alexei: I have "learn Rust" on a todo list somewhere. I want to feel what the creator wanted. I think it's a particular lexical structuring of code that excludes the wrong access patterns. But it probably won't become my language, because I don't think that way. Excluding wrong access patterns lexically is possible but undesirable, because I want full freedom: circular data structures rooted in no single place, arbitrary substitutions and transformations on them. If it's been proven correct, you can use it. You shouldn't constantly hold a bare circular saw in your hands. You wear goggles, hold the board correctly, clamp it down properly. A powerful tool grows a safety culture around it. It doesn't shut down possibilities. I want to use the laws of physics. I want everything the computer can offer me to be available. So I don't want a "safe" language. Maybe I'm wrong, and I should actually study it instead of being lazy.
Modern C++ is some kind of baroque. They went somewhere I don't want to follow. Computing primes at compile time with exponential time and space, in template language. Not for me. For me C++ is a way to turn a line of code into assembly, with a few guards against the saw. Assigning structs, returning structs, those are good features. The rest of C is essentially ideal. Using-namespace, no. Overloading is useful but I avoid it. I want unique function names. Even Claude finds it easier when functions don't repeat. He doesn't have to resolve.
Interviewer: With this bicycle, what skills should engineers cultivate, and what can they drop?
Alexei: I'm probably the wrong person, because I work at a fundamental level, on infrastructure others build on top of. My output is a data plane. Users do whatever they do: monitoring factory temperatures, or whatever else. My space is well defined. But since you're asking, here's what I do. I go to Claude and say: "let's discuss this topic." I see if he shifts my understanding. I pitch a name for something. He says "it collides with X." I get a better name. He helps me think.
You have a concentration ceiling. You can't always go deep on your own. From a shallower level, you can hand Claude an instruction and end up deeper than you started. That is the muscle. Filtering. Choosing where to go deep, what to delegate. That's the work, harder than before in some ways. If code is small and does a lot, the relationship is healthy. If code is large and does a little, you're burying yourself.
Interviewer: Final word.
Alexei: If any of the topics interest you, whether SESE, Single Static Action, the streaming OS, the code generator, or working with Claude, write to me. Happy to keep talking.