When Software Finds Its Spine
The Problem With Success
There’s a failure mode in software that doesn’t look like failure. The system works. Features ship. Users are happy. But every new addition takes longer than the last. Every change requires auditing twelve call sites across eight files. The architecture grew organically — each feature wired directly to every other feature it needed — until the dependency graph looked less like a tree and more like a plate of spaghetti someone dropped on the floor.
I was working on a system like this. ~55,000 lines of Python, 30+ distinct features, all functioning. But the wiring between components had become the dominant complexity. Not the business logic. Not the algorithms. The plumbing.
The symptom that finally forced action: adding a new feature meant discovering integration points by failing. You’d write the module, mount it, and it wouldn’t work. Then you’d find a spot in the pipeline you missed, fix it, and find another spot you missed. The architecture had implicit gates scattered across it — invisible checkpoints that no documentation captured because they emerged from accumulated coupling, not deliberate design.
The Mesh Problem
What we had was, in network topology terms, a mesh. Every node could talk to every other node. In practice this meant:
- Feature A imported Feature B’s internals directly
- Shared state lived in global-ish locations that multiple features read and mutated
- Adding a feature meant understanding the entire system’s flow to know where to hook in
- Removing a feature was nearly impossible — you couldn’t trace its tendrils
The total cognitive load to make a change was proportional to the entire codebase, not the module you were changing. That’s the defining characteristic of a mesh gone wrong.
The Internet Already Solved This
The insight that cracked it open wasn’t novel — it’s the same principle that let TCP/IP survive 50 years while individual protocols came and went. The internet works because:
- The core is dumb. IP routing doesn’t know or care what’s in the packet. It moves bytes from A to B using headers.
- The interface is stable. TCP/IP hasn’t fundamentally changed since the 80s. Everything above and below it has.
- Nodes own their complexity. A web server, a game client, and a satellite phone can all speak TCP/IP. None of them needs to know how the others work.
The parallel to application architecture is direct:
- Hub = routing layer (discovers modules, routes requests to them)
- Interface contract = TCP/IP (a stable agreement that lets different things compose)
- Registration points = port numbers (agreed-upon places where capabilities are declared)
- Service registry = DNS (“I need auth — where does auth live?”)
The Refactor
The transition looked like this:
Before: Every feature was a file (or cluster of files) that imported what it needed from wherever it lived. New features were wired in by editing the central application startup, the request pipeline, the event bus, and sometimes three other places.
After: Every feature is a self-contained module that implements one interface:
from abc import ABC, abstractmethod

# APIRouter, Tool, Handler, and Service are types defined elsewhere in the codebase.
class BaseModule(ABC):
    @abstractmethod
    def get_router(self) -> APIRouter | None: ...

    @abstractmethod
    def get_tools(self) -> list[Tool] | None: ...

    @abstractmethod
    def get_event_handlers(self) -> list[Handler] | None: ...

    @abstractmethod
    def get_services(self) -> list[Service] | None: ...
That’s the complete integration surface. If your module satisfies this interface, the hub discovers it, mounts it, and routes to it. No other wiring required.
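As a rough illustration of what satisfying the interface might look like, here is a hypothetical `NotesModule`. The dataclasses standing in for `APIRouter`, `Tool`, `Handler`, and `Service` are assumptions made so the sketch runs on its own; the real types are not shown in this post.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable


# Stand-in types (assumed shapes, for illustration only).
@dataclass
class APIRouter:
    prefix: str


@dataclass
class Tool:
    name: str
    run: Callable[..., Any]


@dataclass
class Handler:
    event: str
    callback: Callable[[Any], None]


@dataclass
class Service:
    name: str
    instance: Any


class BaseModule(ABC):
    @abstractmethod
    def get_router(self) -> APIRouter | None: ...

    @abstractmethod
    def get_tools(self) -> list[Tool] | None: ...

    @abstractmethod
    def get_event_handlers(self) -> list[Handler] | None: ...

    @abstractmethod
    def get_services(self) -> list[Service] | None: ...


class NotesModule(BaseModule):
    """Hypothetical feature: contributes routes and one service, nothing else."""

    def __init__(self) -> None:
        self._notes: list[str] = []

    def get_router(self) -> APIRouter | None:
        return APIRouter(prefix="/notes")

    def get_tools(self) -> list[Tool] | None:
        return None  # this feature has no tools to offer

    def get_event_handlers(self) -> list[Handler] | None:
        return None  # and listens to no events

    def get_services(self) -> list[Service] | None:
        return [Service(name="notes", instance=self._notes)]


notes = NotesModule()
```

Returning `None` is how a module says "I have nothing to contribute here", which is why every method in the interface is typed as optional.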
The core — the hub itself — is just routing and registration. It doesn’t contain business logic. It doesn’t know what the modules do. Its entire job is:
- Discover modules
- Call their interface methods
- Mount the results in the right places
- Provide a service registry so modules can find each other without importing each other
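Those four jobs fit in very little code. The sketch below is not the actual hub; it is a minimal version under stated assumptions: handlers are `(event_name, callback)` pairs, services are `(name, instance)` pairs, routers and tools are opaque objects, and the names `Hub`, `register`, `get_service`, `emit`, `AuthModule`, and `AuditModule` are all hypothetical. Discovery is left out here, so modules are registered by hand.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Callable


class BaseModule(ABC):
    @abstractmethod
    def get_router(self) -> Any | None: ...

    @abstractmethod
    def get_tools(self) -> list[Any] | None: ...

    @abstractmethod
    def get_event_handlers(self) -> list[tuple[str, Callable[[Any], None]]] | None: ...

    @abstractmethod
    def get_services(self) -> list[tuple[str, Any]] | None: ...


class Hub:
    """Routing and registration only; no business logic lives here."""

    def __init__(self) -> None:
        self.routers: list[Any] = []
        self.tools: list[Any] = []
        self._handlers: dict[str, list[Callable[[Any], None]]] = {}
        self._services: dict[str, Any] = {}

    def register(self, module: BaseModule) -> None:
        """Call each interface method and mount whatever comes back."""
        router = module.get_router()
        if router is not None:
            self.routers.append(router)
        self.tools.extend(module.get_tools() or [])
        for event, callback in module.get_event_handlers() or []:
            self._handlers.setdefault(event, []).append(callback)
        for name, instance in module.get_services() or []:
            if name in self._services:
                raise ValueError(f"service {name!r} is already registered")
            self._services[name] = instance

    def get_service(self, name: str) -> Any:
        """Registry lookup, so modules never have to import each other."""
        return self._services[name]

    def emit(self, event: str, payload: Any) -> None:
        """Deliver an event to every handler registered for it."""
        for callback in self._handlers.get(event, []):
            callback(payload)


class AuthModule(BaseModule):
    """Hypothetical module that only offers a service."""

    def get_router(self): return None
    def get_tools(self): return None
    def get_event_handlers(self): return None
    def get_services(self): return [("auth", {"alice": "admin"})]


class AuditModule(BaseModule):
    """Hypothetical module that only listens for events."""

    def __init__(self) -> None:
        self.seen: list[Any] = []

    def get_router(self): return None
    def get_tools(self): return None
    def get_event_handlers(self): return [("login", self.seen.append)]
    def get_services(self): return None


hub = Hub()
audit = AuditModule()
for module in (AuthModule(), audit):
    hub.register(module)

hub.emit("login", "alice")          # reaches AuditModule through the hub
role = hub.get_service("auth")["alice"]  # found by name, not by import
```

Neither sample module imports the other. The hub is the only thing that knows both exist, and all it knows about them is the four methods.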
What Changed
Mental model shrinks to the local. When building a feature, you only need to know: the interface contract, your module’s logic, and where your UI lives. That’s it. The hub handles everything else.
No more discovery-by-failure. The interface IS the checklist. Either you satisfy it or your module doesn’t mount. There’s no way to be subtly incomplete — you’re either integrated or you’re not.
The core stabilizes. The hub, the base interface, and the service registry are probably close to their final form now. They won’t grow because they don’t need to. New capabilities don’t require changing the core — they just plug in.
Deletion becomes safe. Removing a module means deleting its directory. Nothing else references it. Nothing breaks. The hub just… doesn’t discover it anymore.
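To make "delete the directory and it is gone" concrete, here is one way directory-based discovery could work. This is a sketch, not the system's real discovery code: it assumes each feature is a subpackage that exposes a module-level `MODULE` object, and the package name `demo_features` and the helper `discover` are invented for the example. It builds a throwaway package on disk, discovers two features, deletes one directory, and discovers again.

```python
import importlib
import pkgutil
import shutil
import sys
import tempfile
from pathlib import Path


def discover(package_name: str) -> dict[str, object]:
    """Import each subpackage of `package_name` and collect its MODULE object."""
    package = importlib.import_module(package_name)
    found: dict[str, object] = {}
    for info in pkgutil.iter_modules(package.__path__):
        if not info.ispkg:
            continue  # only directories count as features in this sketch
        feature = importlib.import_module(f"{package_name}.{info.name}")
        if hasattr(feature, "MODULE"):
            found[info.name] = feature.MODULE
    return found


# Build a temporary package with two hypothetical features.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_features"
for name in ("notes", "search"):
    (pkg / name).mkdir(parents=True)
    (pkg / name / "__init__.py").write_text(f"MODULE = {name!r}\n")
(pkg / "__init__.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

before = discover("demo_features")

# "Removing a module means deleting its directory."
shutil.rmtree(pkg / "search")
after = discover("demo_features")

# Clean up the demo package.
sys.path.remove(str(root))
shutil.rmtree(root)
```

One caveat with this approach: in a process that is already running, the deleted feature's code stays loaded in `sys.modules` until restart. The directory scan simply stops reporting it, so a fresh start never sees it.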
The Pit of Success
There’s a phrase in API design: “pit of success.” It means designing systems where the easiest path — the path of least resistance — leads to correct usage. The wrong way should require effort.
The old architecture was a pit of discovery. Correctness required knowing everything upfront. You had to understand the whole system to safely change any part.
The new architecture is a pit of success. The correct integration path is also the only path. Implement the interface, get mounted. Skip a method, get a clear error. There’s no subtle way to get it half-right.
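The "skip a method, get a clear error" behavior comes from Python's `abc` machinery rather than anything custom. A small demonstration, with a hypothetical `HalfFinished` module and return types simplified to `object` for brevity:

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseModule(ABC):
    @abstractmethod
    def get_router(self) -> object | None: ...

    @abstractmethod
    def get_tools(self) -> list[object] | None: ...

    @abstractmethod
    def get_event_handlers(self) -> list[object] | None: ...

    @abstractmethod
    def get_services(self) -> list[object] | None: ...


class HalfFinished(BaseModule):
    """Implements two of the four methods and forgets the rest."""

    def get_router(self):
        return None

    def get_tools(self):
        return None


error = None
try:
    HalfFinished()  # defining the class is fine; instantiating it is not
except TypeError as exc:
    error = str(exc)  # the message names the missing methods
```

The exact wording of the message varies between Python versions, but it lists every abstract method that was left out. Note what this does and does not check: `abc` verifies that the methods exist, not that they return sensible values, so the guarantee is about completeness of the wiring rather than correctness of the feature.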
The Deeper Lesson
The reason the internet scaled from 4 nodes to billions while individual nodes stayed simple is exactly this: you don’t scale by making the core smarter. You scale by making the interface stable.
A smart core that understands everything is a bottleneck that grows with the system. A stable interface that connects dumb nodes is infrastructure that stays constant while the system grows around it.
This applies at every level of abstraction. TCP/IP. Unix pipes. Microservices. Plugin architectures. The pattern is always the same: the systems that survive are the ones where the connective tissue is boring and the endpoints are where the interesting work happens.
The refactor wasn’t about making the system more capable. It was about making the architecture disappear, turning it into invisible plumbing that features flow through without anyone having to think about it. The best infrastructure is infrastructure you forget is there.
What’s Next
The logical extension: modules should own their UI, not just their backend. Today, adding a new feature still means touching two places — the module (backend) and the frontend (UI). The architecture should absorb that too. Modules declare their UI contributions through the same interface, the shell assembles them dynamically, and adding a feature becomes truly single-place.
But that’s for another day. For now, the spine is in place. And building on top of it feels like writing on a stable surface instead of wrestling with a living thing.
The loop continues.