tylergray.dev

Give the Agent What It Wants

The smartest software ever made does its work through terminals, plain text, and diffs. A field guide to the parts of the 2026 stack that survived contact with production, and the one rule that explains why.

June 7, 2026

The most capable software ever made starts its day like a programmer from 1978: a terminal, a pile of plain text files, grep, and a diff. Claude Code and Cowork are where my work actually happens now; the dashboards and web apps I used to live in are still running underneath, but I rarely visit them. For a while the mismatch bothered me. Frontier intelligence, boring tools. Then I noticed that nobody imposed the tools on the agent. It picked them. Plain text is what it was raised on, what fits its context budget, and what it can verify. Everything that follows is one rule, applied: give the agent what it wants.

Agents: the word finally means something

For two years “agent” meant whatever the person saying it needed it to mean. The definition has since settled into something you can build against: a model running in a loop, choosing its own next step, with tools it can act on and an environment that pushes back. Andrej Karpathy calls the larger arrangement “a new kind of computer,” one you program in English, and dates it to the 1960s of its own history: real but early, expensive to run, reached through a terminal. At that age, what the agent wants from you is modesty. Anthropic’s Building Effective Agents is still the best writing on the subject, and its central advice is exactly that: simple, composable patterns, not complex frameworks. Start with one model call plus retrieval. When you need structure, prefer a workflow, a fixed and debuggable path, over a true agent. The multi-agent swarm looks wonderful on an architecture slide; in production it costs ten to fifteen times the tokens and fails in ways you get to debug across processes.
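The difference between a workflow and an agent is easier to see in code than in prose. What follows is a minimal sketch, not anyone's real SDK: `fake_model`, the `TOOLS` table, and the `max_steps` budget are all hypothetical names I made up for illustration, and the "model" is a stub that returns canned decisions.

```python
from typing import Callable

# Hypothetical tools: plain functions the loop can call by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "grep": lambda arg: f"3 matches for {arg!r}",
    "diff": lambda arg: f"1 file changed in {arg}",
}

def fake_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call: returns (action, argument)."""
    if not any(h.startswith("grep") for h in history):
        return ("grep", "TODO")
    if not any(h.startswith("diff") for h in history):
        return ("diff", "src/")
    return ("done", "summary of what changed")

def run_agent(model, max_steps: int = 5) -> list[str]:
    """Agent: the model picks each next step; the loop only enforces a budget."""
    history: list[str] = []
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "done":
            history.append(f"done: {arg}")
            break
        # The environment pushes back: the tool result feeds the next decision.
        history.append(f"{action}: {TOOLS[action](arg)}")
    return history

def run_workflow() -> list[str]:
    """Workflow: same tools, but the path is fixed in code and easy to debug."""
    return [f"grep: {TOOLS['grep']('TODO')}", f"diff: {TOOLS['diff']('src/')}"]
```

In this toy case both paths call the same two tools in the same order. The only thing that changes is who decides the order, which is exactly the part that gets expensive when the decider is a model.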

The autonomy that makes an agent useful is the same autonomy that compounds its errors and its bill.

After Anthropic Engineering, Building Effective Agents

MCP: the boring standard that won

Every computer eventually needs a bus: a standard way to attach things to it. For this one it is the Model Context Protocol, the rare case of hype converging on something real. Anthropic open-sourced it in November 2024 to fix an unglamorous problem: every model-to-tool connection was bespoke and fragile. Within months OpenAI, Google, and Microsoft shipped support, and in December 2025 Anthropic handed the protocol to a vendor-neutral foundation under the Linux Foundation. That is the real signal, more than any download count: a standard becomes infrastructure when the companies competing on top of it stop trying to own it. What the vendor decks leave out is that the agent does not want your connector count. Most public servers are thin wrappers that dump verbose JSON into the context window, and Stacklok’s 2026 survey puts production use near forty percent: substantial, not universal. A model handed forty mediocre tools reliably performs worse than one handed six good ones.
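Context discipline can be as plain as trimming a tool result before the model sees it. This is a sketch under assumptions: the record shape and field names (`id`, `title`, `state`, `node_id`) are invented for the example and are not part of the MCP specification, and `compact_tool_result` is a hypothetical helper, not a library function.

```python
import json

def compact_tool_result(raw_json: str, keep: tuple[str, ...], limit: int = 5) -> str:
    """Project a JSON list of records onto a few fields, as short text lines."""
    records = json.loads(raw_json)
    lines = []
    for rec in records[:limit]:
        lines.append(", ".join(f"{k}={rec[k]}" for k in keep if k in rec))
    if len(records) > limit:
        # Tell the model that results were cut, so it can ask for more.
        lines.append(f"({len(records) - limit} more not shown)")
    return "\n".join(lines)

# Hypothetical verbose payload of the kind a thin wrapper might return.
raw = json.dumps([
    {"id": 1, "title": "Fix login", "state": "open",
     "url": "https://example.invalid/1", "node_id": "abc"},
    {"id": 2, "title": "Update docs", "state": "closed",
     "url": "https://example.invalid/2", "node_id": "def"},
])
compact = compact_tool_result(raw, keep=("id", "title", "state"))
```

The design choice is to decide, per tool, which fields the model actually needs, and to say so explicitly when a result has been truncated.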

Markdown: the format the models were raised on

Of everything the agent wants, the cheapest to grant is format. Markdown won for unsentimental reasons: it is cheap, and the models know it cold. The same content rendered as production HTML can cost roughly three times the tokens, because every <div>, class name, and inline style is overhead the model pays for and then ignores. The training corpora are full of markdown, so models read it the way you read your native language. Strip whatever you feed a model down to markdown first. The caveat is that markdown is not always what it wants: deeply nested or interdependent structure comes through more cleanly in XML-style tags, and complex tables often survive better as HTML.
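Here is a rough sketch of the stripping step using only the Python standard library. It is deliberately tiny: it handles headings and text and nothing else, the sample `page` is invented, and it compares character counts, which are only a crude proxy for tokens. Treat the ratio it produces as an illustration of the overhead, not a measurement.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")

class ToMarkdown(HTMLParser):
    """Keeps text, turns <h1>..<h6> into '#' headings, drops everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs) -> None:
        # Attributes (class names, inline styles) are ignored entirely.
        if tag in HEADINGS:
            self.prefix = "#" * int(tag[1]) + " "

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.parts.append(self.prefix + text)
            self.prefix = ""

def html_to_markdown(html: str) -> str:
    parser = ToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.parts)

page = (
    '<div class="card card--wide" style="padding:16px">'
    '<h2 class="card__title">Status</h2>'
    '<p class="card__body text-muted">All checks passed.</p>'
    '</div>'
)
markdown = html_to_markdown(page)
ratio = len(page) / len(markdown)  # characters, not tokens
```

A real pipeline would also need lists, links, and tables, and would count tokens with the tokenizer of the model it is feeding.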

HTML: the interface comes back through the side door

The agent does not want your dashboard; you do, sometimes, and that is where the interface comes back. In January 2026 the MCP ecosystem shipped MCP Apps, which lets a tool return an interactive HTML surface that renders inside the conversation in a sandboxed iframe. Figma used it for inline component editing; Hex put a filterable dashboard directly in the chat. Thariq Shihipar, who leads engineering on Claude Code at Anthropic, pushed the same logic further this May: he has nearly stopped writing markdown documents for people and has Claude generate disposable HTML instead. Markdown is for what the model reads; rendered surfaces are for what humans read. What dies is the permanent dashboard, the one you navigate to and maintain; the interface survives by becoming disposable, generated in the moment, scoped to the question you actually asked, and thrown away afterward. Screens stop being destinations and become output. The open problem is trust: you are now rendering model-influenced HTML, so sandboxing and provenance matter more here, not less.
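The browser mechanism underneath is ordinary and worth seeing once. This sketch is not how MCP Apps is implemented; it only shows the generic `sandbox` and `srcdoc` attributes of an HTML iframe, and `sandboxed_frame` is a hypothetical helper name.

```python
from html import escape

def sandboxed_frame(untrusted_html: str, allow_scripts: bool = False) -> str:
    """Wrap generated HTML in an iframe that withholds privileges by default.

    An empty sandbox attribute applies every restriction. Scripts are opt-in,
    and allow-same-origin is never granted here.
    """
    sandbox = "allow-scripts" if allow_scripts else ""
    # srcdoc is an attribute value, so the payload must be escaped;
    # the browser unescapes it when it renders the frame.
    payload = escape(untrusted_html, quote=True)
    return f'<iframe sandbox="{sandbox}" srcdoc="{payload}"></iframe>'

frame = sandboxed_frame('<b onclick="x()">hi</b>')
```

Escaping keeps the generated markup from breaking out of the attribute, and the sandbox keeps whatever renders inside the frame away from the host page. Provenance, knowing which tool produced the HTML, is a separate problem this does not touch.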

Cowork: delegation, not conversation

Claude Cowork, released as a research preview in January 2026, is the moment the new layer left the engineering department. It is Claude Code for the rest of your work: point it at a folder, describe an outcome, and it executes the multi-step job. Ask what the agent wants here and the architecture answers plainly. Simon Willison’s read is the one to internalize: it boots a Linux sandbox and mounts only the files you hand it, so it cannot touch anything you did not explicitly grant. A scoped folder, a clear outcome, a place to work. That is the entire wish list. The deeper shift is in the verbs. You stop writing prompts and start assigning tasks, which is to say the batch job has come back: describe the work, submit it, inspect the output. It still has the rough edges of a research preview, so the right mental model is a fast junior colleague whose work you sign off on before it ships.
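The scoped-folder idea has a much weaker application-level cousin that is still worth having in your own tools. This sketch is a path check, not a sandbox, and it says nothing about how Cowork itself enforces its boundary; `resolve_in_scope` is a hypothetical helper and it assumes Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path

def resolve_in_scope(granted: Path, requested: str) -> Path:
    """Resolve a requested path and refuse anything outside the granted folder."""
    root = granted.resolve()
    # Resolving first collapses '..' segments and follows symlinks,
    # so the comparison is made on the real location.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the granted folder")
    return target
```

A request for `notes/a.md` resolves inside the folder; a request for `../secrets.txt` or an absolute path such as `/etc/passwd` raises `PermissionError`. A real sandbox enforces this at the operating-system level, where a bug in your own code cannot undo it.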

And nothing more

Every rule this clean needs its amendment. The failure pattern is measured, not anecdotal: METR found that models complete few-minute tasks nearly every time and multi-hour tasks less than ten percent of the time, with reliability decaying exponentially in between. Every additional step is another roll of the same dice. Unlike a compiler, this layer is not deterministic; when the abstraction leaks, someone still has to understand what is underneath and fix it there. And there is a difference between what the agent wants and what it will ask for. It will ask for access: your files, your accounts, the open web, an outbound channel. Grant all of it at once and you have built what Willison calls the lethal trifecta: private data, untrusted content, and a way out. That combination is an exfiltration engine waiting for instructions. None of this refutes the rule; it completes it. Give the agent what it wants, and nothing more than you can verify. The layer holds where failure is cheap and checkable, which is why code went first: code ships with its own verifier in the compiler, the test suite, the diff. The interesting work of the next few years is making everything else look more like that.
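The dice metaphor can be made concrete with a toy model. The assumption here is mine, not METR's: every step succeeds independently with the same probability, so end-to-end success is that probability raised to the number of steps. Real tasks are messier, but the shape of the curve is the point.

```python
import math

def chain_success(p_step: float, steps: int) -> float:
    """End-to-end success if every step must succeed independently."""
    return p_step ** steps

def steps_until(p_step: float, target: float) -> int:
    """Largest step count whose end-to-end success stays at or above target."""
    return math.floor(math.log(target) / math.log(p_step))

for n in (10, 100, 500):
    print(n, round(chain_success(0.99, n), 3))
```

Under this assumption a step that works 99 percent of the time gets you through about 68 steps before the whole run drops below even odds, and a 500-step run succeeds less than one time in a hundred. That is the arithmetic behind checking the work at short intervals instead of at the end.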

The Monday version

If you skipped to the end, here is the whole argument as things you can do this week.

  1. Build the smallest thing that works. One model call beats a workflow beats an agent beats a swarm, in that order.
  2. Adopt MCP, then spend your effort on context discipline rather than connector count. Six good tools beat forty.
  3. Convert everything to markdown before it reaches a model. Fence structured context in tags. Count tokens.
  4. Stop maintaining dashboards for one-off questions; let the agent render disposable HTML scoped to the task.
  5. Move from prompting to delegating. Assign outcomes, sandbox aggressively, review before shipping.
  6. Give the agent what it wants, and nothing more than you can verify. Never combine private data, untrusted input, and a way out.
  7. Be suspicious of any pitch that leans on the word autonomous. The tools that hold up describe themselves as boring.
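Item 6 is the one rule on this list that can be checked mechanically before a task starts. A minimal sketch, with hypothetical names throughout (`Grants`, `is_lethal_trifecta`, `check_grants`); how you decide whether a given tool counts as untrusted content or an outbound channel is the hard part and is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grants:
    private_data: bool       # files, accounts, internal documents
    untrusted_content: bool  # web pages, inbound email, third-party text
    outbound_channel: bool   # anything that can send data out

def is_lethal_trifecta(g: Grants) -> bool:
    """True only when all three capabilities are granted at once."""
    return g.private_data and g.untrusted_content and g.outbound_channel

def check_grants(g: Grants) -> Grants:
    """Refuse to start a task whose grants combine all three."""
    if is_lethal_trifecta(g):
        raise PermissionError("private data + untrusted content + a way out")
    return g
```

Any two of the three pass the check; the rule only bites on the full combination, so the usual fix is to drop one leg for the duration of the task.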

In 1991 Mark Weiser wrote that the most profound technologies are the ones that disappear: they weave themselves into everyday life until they are indistinguishable from it. This one is disappearing into the dullest tools we have. Terminals, text files, diffs. The through-line is restraint. From Anthropic’s own engineers to the protocol authors to the independent builders, everyone keeps arriving at the same place: do less, hand the model real tools, keep a human in the loop. The computer has a new user, and it is easy to please. The future arrived after all. It just turned out to be boring.
