Field Notes/2026

Less Magic, More Plumbing

MCP, Markdown, HTML, Agents, and Cowork. A field guide to the parts of the 2026 stack that survived contact with production, and the marketing that did not.

June 7, 2026 · 9 min read

here is a tell that separates the people who ship AI systems from the people who sell them. The sellers reach for the word autonomous. The shippers reach for the word boring. If you read the people whose advice has actually aged well over the past eighteen months, you notice they keep arriving at the same unglamorous conclusion: the systems that work are the ones with the least cleverness in them. What follows is a tour of five things everyone is talking about, sorted into what holds up and what is theater.

Agents: the word finally means something

For two years “agent” was a Rorschach test. By late 2025 the definition had quietly stabilized into something usable: a model running in a loop, deciding its own next step, with access to tools and an environment it can act on. Simon Willison spent the year refusing to use the term until it earned its keep, and even then his framing is deflationary. A function that loops and calls a model is an agent. Nothing mystical happened. We just got models good enough that the non-deterministic loop is worth running.

The most quoted piece of practical guidance on this is still Anthropic’s Building Effective Agents, and its central claim is almost rude in its plainness: the teams that succeed use simple, composable patterns rather than complex frameworks. Start with a single model call plus retrieval and a few examples. Most problems never need more than that. When you do need structure, prefer a workflow, a fixed path you can predict and debug, over a true agent. Reserve the open-ended agent for cases where you genuinely cannot hardcode the steps, and where you trust the model enough to let it run for many turns.

The autonomy that makes an agent useful is the same autonomy that compounds its errors and its bill.
Paraphrasing Anthropic Engineering, Building Effective Agents

The honest weakness lives in the multi-agent fantasy. Orchestrating swarms of specialized sub-agents photographs beautifully on a slide. In practice a multi-agent setup burns roughly ten to fifteen times the tokens of a single agent and adds coordination failure modes you now have to debug across processes. The “intelligence” rarely justifies the cost. Most teams that reach for it would have been better served by one good agent and a tighter tool set.

MCP: the boring standard that won

The Model Context Protocol is the clearest example of hype converging on something real. Anthropic open-sourced it in November 2024 to solve a dull problem: every model-to-tool connection was bespoke, fragile plumbing rebuilt from scratch. Within months OpenAI, Google, and Microsoft shipped support. By early 2026 the official SDKs were pulling tens of millions of monthly downloads, a curve that took React roughly three years and MCP roughly sixteen months. In December 2025 Anthropic handed the protocol to a vendor-neutral foundation under the Linux Foundation, co-founded with Block and OpenAI. That governance move, not the download count, is the signal. A standard only becomes infrastructure when the people competing on top of it stop trying to own it.

The useful frame here is the one that treats MCP as the data-and-tools layer, the Westlaw or LexisNexis of agents, rather than another UI skin. Agents do not need prettier dashboards. They need a uniform way to reach authoritative data and take action, and MCP is that uniform way. If your conviction is that interfaces collapse into intent and the surface area moves to the protocol, the last year reads as confirmation.

Now the part the vendors skip. Most public MCP servers are thin wrappers that dump verbose JSON straight into your context, and tool results are where context windows quietly die. Adoption is also less universal than the marketing implies. The most credible survey of the year put production use in the software cohort somewhere around forty percent, a strong number for a young protocol and a long way from the “everyone is already in production” line. More tools is not better. A model handed forty mediocre tools performs worse than one handed six good ones, because every tool description and every bloated result competes for the same finite attention.

Markdown: the format the models were raised on

Markdown is not winning because it is elegant. It is winning because of two unglamorous facts. First, it is cheap. The same content rendered as production HTML can cost roughly three times the tokens of clean markdown, because every <div>, class name, and inline style is pure overhead a model has to pay for and then ignore. Second, the models were trained on oceans of it: README files, documentation, forum posts. They read markdown the way you read your native language, without a translation pass.

The discipline that follows is simple. Whatever you feed a model, strip it to markdown first. Scraping a page for context? Convert it. Building a corpus? Store it as markdown. You reclaim context window and you remove ambiguity, because ## means heading and nothing else, whereas HTML can express the same heading a dozen structurally different ways and make the model guess.

The fluff is the claim that markdown is always optimal. It is not. When structure is interdependent or deeply nested, explicit XML-style tags give the model cleaner demarcation than markdown’s flat syntax. Complex tables sometimes survive better as HTML than as pipe-and-dash markdown that collapses on the third nested cell. The rule is not “markdown everywhere.” The rule is: default to markdown, reach for tags when structure carries meaning, and measure tokens before you assume.

HTML: the interface comes back through the side door

Here is the plot twist for anyone who declared dashboards dead. In January 2026 the MCP ecosystem shipped its first official extension, MCP Apps, which lets a tool return an interactive HTML interface that renders directly inside the conversation in a sandboxed iframe. Figma used it for inline component editing. Hex used it to render a filterable dashboard in the chat. Anthropic and OpenAI built the extension together, and it launched with support across multiple clients on day one.

The dashboard did not die. It stopped being a destination and became a thing the agent assembles for you, on demand, then throws away.
On the agents-versus-interfaces debate

This resolves the lazy version of “agents replace dashboards.” The static dashboard you navigate to, click through, and maintain is the thing under pressure. The interface itself is not going anywhere. It is just generated in the moment, scoped to the question you actually asked, and discarded after. HTML is the rendering target for that. The agent does the reasoning, then hands you a small purpose-built surface to act on the result. That is a better deal than either a wall of prose or a permanent dashboard nobody updates.

The caveat is security and trust. You are now rendering model-influenced HTML inside your tools. Sandboxing matters, provenance matters, and “the agent generated a UI” is not a reason to skip the same review you would give any other untrusted markup.

Cowork: delegation, not conversation

Claude Cowork, which arrived in January 2026, is the clearest consumer-facing version of the whole thesis. It is described, accurately, as Claude Code for the rest of your work. You point it at a folder, describe an outcome, and it executes the multi-step job: drafts the memo, runs the analysis, edits the spreadsheet, queries an MCP server, and shows its reasoning as it goes. The architecture is the interesting part. By Willison’s reading it boots a custom Linux environment inside a virtualization sandbox and mounts only the files you hand it, so it literally cannot touch what you did not grant.

The shift it forces is the one worth internalizing. You stop writing prompts and start assigning tasks, and the unit of interaction moves from “answer this question” to “produce this deliverable.” It takes a different muscle, and that is the muscle 2026 keeps rewarding across every tool on this list.

The honest limitations are real and the marketing glides past them. It is a research preview with rough edges. Early testers reported it chewing through gigabytes of files it should not have. Sub-agent coordination is impressive and experimental in the same breath, which means it drifts. The desktop session is not yet cloud-persistent. None of this makes it a toy. It makes it a power tool, and power tools earn respect by being reviewed before the output ships.

The Monday version

If you skipped the prose, here is the whole argument as a checklist you can act on this week.

Build the smallest thing that works. One model call beats a workflow beats an agent beats a swarm, in that order of preference.
Adopt MCP, then spend your effort on context discipline, not connector count.
Convert everything to markdown before it hits a model. Fence structured context in tags. Count tokens.
Stop maintaining dashboards for one-off questions. Let agents render disposable HTML surfaces instead.
Move from prompting to delegating. Assign outcomes, sandbox aggressively, review before shipping.
Be wary when a pitch leans on the word autonomous. The tools that actually hold up tend to describe themselves as boring.

The through-line is restraint. Every credible voice in this space, from Anthropic’s own engineers to the independent people who actually build, keeps landing on the same place: the winning move is to do less, hand it real tools, and keep a human in the loop. The orchestration diagrams are mostly fluff; the plumbing is where the substance lives. In 2026 the unsexy stack is the one that ships.

← All notes