Context is becoming expensive real estate

Token cost used to be a developer footnote. As agents get deployed for real, it turns into infrastructure cost, latency, and margin. Notes on a category that is starting to form.

Someone found Conduit on its launch day and described it back to me better than I have managed myself. Vivi put it roughly like this: token cost is no longer just a developer concern. As AI agents get deployed widely, it becomes infrastructure cost, latency cost, and eventually enterprise margin. And then the line that stuck: context is becoming expensive real estate.

I had been circling that idea for weeks without saying it that cleanly, so I want to write it down.

The footnote that grew up

If you have read the other post, you know the mechanism. Every MCP server you connect dumps its entire tool list into your agent’s context on every single request. Three servers can cost around 24,000 tokens of definitions before you have asked anything. It is an invisible tax: you pay it on every turn, before the agent does any real work, which is exactly why it stays hidden until it surfaces somewhere you can feel it.

For one person running one agent, that tax is a slightly slower, slightly dumber assistant. It is annoying, and easy to ignore.

Now change the scale. Picture a company running agents in production: thousands of sessions a day, each one re-sending the same wall of tool schemas on every turn, multiplied across users and multiplied again across every model call inside a single task. The waste did not change. The bill did.

The same overhead that reads as “my agent feels sluggish” for one person reads as a line item for a business: more input tokens on every call, higher latency on every turn, and a context window filling up with plumbing instead of the actual problem. Past a certain volume, how efficiently you use context stops being a preference and becomes a cost center.
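To make the scaling concrete, here is a back-of-the-envelope sketch. Only the 24,000-token figure comes from the text above. The turns per session, sessions per day, and price per million input tokens are made-up placeholders, and the calculation ignores any provider-side caching or discounts, so treat the output as an illustration of the shape rather than a real bill.

```python
def overhead_tokens(tokens_per_turn: int, turns_per_session: int, sessions_per_day: int) -> int:
    """Tokens spent per day just re-sending tool definitions."""
    return tokens_per_turn * turns_per_session * sessions_per_day


def overhead_cost_usd(tokens: int, usd_per_million_input_tokens: float) -> float:
    """Convert a token count into dollars at a given input-token price."""
    return tokens / 1_000_000 * usd_per_million_input_tokens


DEFINITION_TOKENS = 24_000   # from the post: roughly three servers' worth of schemas
TURNS_PER_SESSION = 10       # assumption, for illustration only
PRICE_PER_MILLION = 3.0      # assumption, not any provider's actual price

# One person, a handful of sessions a day (assumed: 5).
solo = overhead_tokens(DEFINITION_TOKENS, TURNS_PER_SESSION, sessions_per_day=5)

# A production deployment (assumed: 5,000 sessions a day).
fleet = overhead_tokens(DEFINITION_TOKENS, TURNS_PER_SESSION, sessions_per_day=5_000)

print(f"solo:  {solo:,} tokens/day, ${overhead_cost_usd(solo, PRICE_PER_MILLION):,.2f}/day")
print(f"fleet: {fleet:,} tokens/day, ${overhead_cost_usd(fleet, PRICE_PER_MILLION):,.2f}/day")
```

The per-turn waste is identical in both cases. Only the multiplier changes, which is the whole point.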

Context is a budget

The mental shift underneath all of this is treating context as real estate rather than a junk drawer.

The default behavior of most tooling today is load everything, just in case. Every server, every tool, every schema, sitting in front of the model at all times, on the theory that the agent might need any of it. That made sense when agents had small toolsets. It does not scale, because the cost of carrying a capability you might use is paid constantly, while the value of having it shows up rarely.

The more interesting default is the opposite: context is a budget you spend on purpose, and a tool has to earn its place in it. Pull in what is relevant, when it is relevant, and leave the rest in a catalog the agent can reach for. That is what lazy discovery does for tools specifically, but the principle is bigger than tools. It applies to memory, to retrieved documents, to history, to anything competing for the same finite and increasingly valuable space.
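One way to picture "a tool has to earn its place" is a selector that ranks catalog entries by relevance and admits them only while a token budget lasts. This is a minimal sketch under loud assumptions: `ToolEntry`, `relevance`, `select_tools`, and the word-overlap scoring are all hypothetical names and choices made up for illustration, and none of this is a description of how Conduit actually ranks anything.

```python
from dataclasses import dataclass

# Tiny stopword list so filler words do not count as relevance (assumption).
STOPWORDS = {"a", "an", "the", "in", "for", "to", "of"}


@dataclass(frozen=True)
class ToolEntry:
    name: str
    description: str
    schema_tokens: int  # estimated cost of putting this definition in context


def relevance(task: str, tool: ToolEntry) -> int:
    """Naive score: count of meaningful task words found in the description."""
    task_words = set(task.lower().split()) - STOPWORDS
    desc_words = set(tool.description.lower().split()) - STOPWORDS
    return len(task_words & desc_words)


def select_tools(task: str, catalog: list[ToolEntry], budget_tokens: int) -> list[ToolEntry]:
    """Greedily admit the most relevant tools that still fit in the budget."""
    ranked = sorted(catalog, key=lambda t: relevance(task, t), reverse=True)
    chosen: list[ToolEntry] = []
    spent = 0
    for tool in ranked:
        if relevance(task, tool) == 0:
            break  # everything after this point is irrelevant to the task
        if spent + tool.schema_tokens <= budget_tokens:
            chosen.append(tool)
            spent += tool.schema_tokens
    return chosen


catalog = [
    ToolEntry("create_issue", "create a new issue in the tracker", 800),
    ToolEntry("send_email", "send an email message", 600),
    ToolEntry("query_db", "run a sql query against the database", 1200),
]

picked = select_tools("create an issue for the login bug", catalog, budget_tokens=1000)
print([t.name for t in picked])
```

Word overlap is a stand-in for whatever relevance signal you trust. The part that matters is the shape: everything stays in the catalog, and only what scores and fits gets loaded.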

What the category actually is

It is tempting to call this token compression and move on, but I do not think that is quite it. Compression is about squeezing the same payload smaller. The real shift is about relevance: getting the right things in front of the model at the right moment, and being honest about what does not earn a seat.

Let me be straight about where the hard part is. Deciding what is relevant, on demand, without the agent losing track of what is even possible, is not a solved problem. It is easy when a task obviously names the tool it needs. It gets genuinely hard when an agent has to plan across tools it did not know it would reach for. I have measured the easy case, and I am still working out the hard one. Anyone telling you this part is finished is selling something.

But the direction feels right, and Vivi's framing is why I think this is a category rather than a feature. As agents move from demos to deployed systems, the layer that decides what occupies context, and does it well at scale, starts to look less like tooling and more like infrastructure. “Token optimization infrastructure” is an ugly phrase for a real thing.

The bet

Conduit is one early, narrow bet on this. It sits at the gateway layer, the single chokepoint every tool call already passes through, and does the relevance work there so each agent does not have to. Today that means lazy discovery and a flat, three-tool surface no matter how many servers you connect. The honest reason it lives at the gateway and not inside one agent is that this is the kind of problem you want to solve once, in the place everything routes through, and then stop thinking about.
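As a rough picture of what a flat, fixed-size surface at a gateway can look like, here is a toy sketch. Everything in it is hypothetical: the `Gateway` class, the three tool names (`search_tools`, `describe_tool`, `call_tool`), and the substring search are assumptions chosen for illustration, not Conduit's actual tool names, API, or implementation.

```python
from typing import Any, Callable


class Gateway:
    """Toy gateway: many registered tools behind a surface of three."""

    def __init__(self) -> None:
        self._tools: dict[str, dict[str, Any]] = {}

    def register(self, server: str, name: str, description: str,
                 schema: dict[str, Any], handler: Callable[..., Any]) -> None:
        """Add a tool to the catalog without exposing it to the agent."""
        self._tools[f"{server}.{name}"] = {
            "description": description,
            "schema": schema,
            "handler": handler,
        }

    def surface(self) -> list[str]:
        """What the agent sees in context, regardless of catalog size."""
        return ["search_tools", "describe_tool", "call_tool"]

    def search_tools(self, query: str) -> list[str]:
        """Return ids of tools whose id or description mentions the query."""
        q = query.lower()
        return sorted(
            tool_id for tool_id, tool in self._tools.items()
            if q in tool_id.lower() or q in tool["description"].lower()
        )

    def describe_tool(self, tool_id: str) -> dict[str, Any]:
        """Fetch one schema on demand instead of loading all of them."""
        return self._tools[tool_id]["schema"]

    def call_tool(self, tool_id: str, arguments: dict[str, Any]) -> Any:
        """Route the call to whichever server owns the tool."""
        return self._tools[tool_id]["handler"](**arguments)


gateway = Gateway()
gateway.register("math", "add", "add two numbers", {"a": "int", "b": "int"},
                 lambda a, b: a + b)
gateway.register("text", "shout", "uppercase a string", {"s": "str"},
                 lambda s: s.upper())

matches = gateway.search_tools("numbers")
result = gateway.call_tool(matches[0], {"a": 2, "b": 3})
print(gateway.surface(), matches, result)
```

Registering a hundred more tools would grow the catalog but leave `surface()` unchanged, which is the property the gateway placement is meant to buy.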

If context really is becoming expensive real estate, then deciding what gets to live there is worth doing well. That is the thing I am trying to build.

github.com/tsouth89/conduit