* CASE STUDY Nº 06 · PERSONAL · ONGOING

Harness Engineering Playbook

Collection of skills, hooks, and harness configurations I use and continuously tune in production - living material from AI Committee research.

WIP LIVING DOC

12+ SKILLS TRACKED

8+ HOOKS

∞ STILL TUNING

i THE PROBLEM

The model is a commodity. The harness is where reliability lives.

Swapping models is trivial and gets cheaper every month. What separates an agent that demos well from one that runs reliably day to day is not the model - it is everything around it: the system prompt, the available tools, the skills, the hooks, the permissions, and how context and memory are managed.

That "everything around it" is the harness, and it is bespoke. By default I kept reconfiguring the same things in each repo, redoing the same refactors by hand, and hoping the agent would remember the rules. Manual repetition is debt - it needed to become versioned configuration.

ii THE APPROACH

Treat the harness as a personal product: versioned, reviewed, and continuously tuned.

- Thesis: the model is a commodity; the harness is where the leverage and real reliability live.
- Reusable skills in SKILL.md for recurring tasks - the agent loads them only when the description matches the context.
- Hooks as deterministic guardrails: formatting on PostToolUse, blocking destructive commands on PreToolUse.
- Permissions and context/memory management tuned from Bamse's AI Committee applied research.

iii STACK & ARCHITECTURE

AGENT

Claude CodeSkills (SKILL.md)SubagentsMCP

GUARDRAILS

HooksPreToolUsePostToolUsePermissions

CONFIG & CONTEXT

settings.jsonCLAUDE.mdMemoryContext compaction

iv PROCESS

Capture what already works before generalizing it.

Every skill or hook starts from a concrete pain that showed up more than once. If I redid the same refactor by hand three times, it becomes playbook material.

Write the guardrail as code, not as a polite request.

Auto-formatting, blocking dangerous commands, and lint checks shouldn't depend on the model remembering. They live in PreToolUse/PostToolUse hooks that the harness runs, not the agent.

Version everything and treat it as production code.

Skills, hooks, and settings.json live in Git, go through review, and evolve as diffs. A change to the agent behavior is a code change - traceable and reversible.

Take findings to the AI Committee and bring them back.

What survives daily use on the OMO project becomes a committee recommendation; what the committee validates comes back into my personal harness. Closed loop.

v CURRENT STATE

The playbook has no final version - it is the living record of what I actually run every day, and it changes every time a new case shows up.

- Skills now cover the refactors and scaffolds I repeated most - less retyping, more consistency across PRs.
- PostToolUse hooks run lint and format without me asking; PreToolUse blocks `rm -rf` and pushes to the wrong branch before they happen.
- Tuned settings.json and permissions cut the number of permission prompts without giving up the guardrails.
- It is living material: nothing here is "done". Every week of real use exposes a new case that becomes a hook, a skill, or a line in CLAUDE.md.

NEXT CASE STUDY

Harness Engineering Playbook

Capture what already works before generalizing it.

Write the guardrail as code, not as a polite request.

Version everything and treat it as production code.

Take findings to the AI Committee and bring them back.

OMO Laundromats