Every conversation about agents starts with the same question: which model to use. It's the wrong question, or at least the least important one. The model is the engine. The harness - the whole system wrapped around the model - is the car. And I've watched plenty of Formula 1 engines spin out inside a badly built car.
Harness engineering is the discipline of building everything AROUND the model: the system prompt and instructions, the set and quality of the tools, the skills, the hooks that run before and after each tool call, the permissions, context and memory management, and the feedback loops - tests, verification, retries. It's where reliability in production is won or lost.
The engine is the easy part
You swap a frontier model in one config line. What you don't swap in one line is the harness: the tools you designed, the guardrails you wrote, the context you chose to show. A good harness turns a mid-tier model into something dependable. A weak harness wastes the best model on the market - it has the power but nowhere to plant its feet.
"Swapping the model is tuning the fuel injection. If the tires are bald, the car still slides off the track on the same corner."
Where the harness decides the game
Three harness decisions that changed how our agents behaved in production, in the order they hurt the most.
First, tool design. We had a generic "run query" tool the agent got right about half the time. We broke it into narrow tools, with names that say exactly what they do and arguments that don't let the agent improvise. The hit rate went up without touching a single comma of the model. A tool is the agent's interface to the world: a bad name and an ambiguous argument are a product bug, not an AI bug.
Second, a hook that runs AFTER every file edit and formats the code automatically. The agent doesn't have to remember to run the formatter, doesn't burn a turn on it, and the diff comes out clean every time. It's deterministic - it doesn't depend on the model being in a good mood. On this very site formatting runs that way, in a post-edit hook, not in the agent's head.
Third, a hook that runs BEFORE every shell command and blocks dangerous patterns. It exits with code 2, which actually denies the call and hands the reason back to the agent to read - unlike a warning that just complains and lets it through. It's the difference between "please don't do that" and "the system won't let you do that".
Before asking for a better model, ask whether the agent has the right tools, the right guardrails, and the right context. Almost always the problem is in the harness, not in the engine.
The best thing about the harness is that it's yours. The model is another company's product, updated and deprecated outside your control. The tools, the hooks, the instructions, and the feedback loops are code you write, version, and test (mine lives in the Harness Engineering Playbook). That's where the durable advantage lives - and it's why I spend far more time on the car than on picking the engine.