What 466 AGENTS.md files teach about context engineering

Jitpal Kocher · 11 min read

Key takeaway

A study of 466 open source projects, accepted to MSR 2026, finds that the instructions developers write in AGENTS.md files fall into five context modes: descriptive, prescriptive, prohibitive, explanatory, and conditional. Half of all files are never updated after the first commit, and the most common sections are conventions, contribution rules, and architecture. The taxonomy is the first empirical structure for context engineering as a discipline.

A study of 466 open source projects, accepted to the 23rd International Conference on Mining Software Repositories in 2026, gives the first empirical look at how developers write context for AI agents. The researchers scanned 10,000 active repositories, found that 5% had already adopted some kind of AI context file, and analyzed 155 AGENTS.md files in depth. The results are the closest thing the field has to a snapshot of context engineering as it is actually practiced.

Two findings stand out. The first is structural: the instructions developers write for AI agents fall into five distinct modes, and which mode a given statement uses plausibly explains a lot of agent failures. The second is about maintenance: 50% of AGENTS.md files are never updated after the first commit. Together, they describe a discipline that has a working format and almost no practice around it yet.

AGENTS.md is becoming the de facto always-read layer

AGENTS.md is winning a specific job, not every job. Modern coding harnesses stack multiple context layers: instructions ingested in full at the start of every task, reference material retrieved on demand from the codebase or documentation, and state that persists across sessions, such as memory, decisions, and prior conversations. The mistake is treating these layers as substitutes. They have different latency, freshness, and review requirements, and pushing one job into the wrong layer is a common source of the problems the MSR 2026 data surfaces.

The always-read layer is the one that needs to be small, durable, and structurally embedded in the project. That is the layer AGENTS.md (an open standard donated to the Linux Foundation alongside MCP), CLAUDE.md, and .github/copilot-instructions.md are converging on. The MSR 2026 study found AI context files in 466 of 10,000 active repositories, with AGENTS.md, Copilot instructions, and CLAUDE.md adopted at similar rates. A .md file at the root of the repo is the only surface that:

  • Lives in source control, so every change is reviewable.
  • Travels with the codebase, so a fresh clone has the same context the original developer had.
  • Is portable across agent vendors, since all major coding tools now read some variant of the format.
  • Survives the agent’s session, because the file is durable and the chat is not.

Most of the alternatives (chat memory, IDE-specific settings, in-app instructions) fail at least one of these properties for the always-read job. Tightly coupling the always-read layer to the repo as markdown is what lets project-specific guidance get versioned, reviewed, and shipped with the code.

The other layers belong elsewhere. Reference context (API docs, large codebases, third-party libraries, anything the agent needs to consult selectively) does not belong in AGENTS.md and breaks the format the moment someone tries: the file grows, gets stale, and the agent reads it all on every task regardless of whether the section is relevant. That work belongs in a retrieval surface or MCP server. State (prior decisions, session memory, what the agent learned last time) does not belong there either: it changes per session, has no review surface, and is exactly the kind of context that benefits from being stored and queried rather than ingested wholesale.
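A minimal sketch of that separation follows. It assumes a harness that ingests the always-read file in full on every task, pulls reference chunks only when they overlap with the task, and appends a few recent state notes. The names (`ContextLayers`, `retrieve`, `assemble_context`) and the word-overlap scoring are hypothetical stand-ins; real harnesses and MCP servers use their own retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLayers:
    always_read: str  # e.g. the full text of AGENTS.md
    reference: list[str]  # docs or code chunks, consulted selectively
    state: list[str] = field(default_factory=list)  # per-session notes and decisions


def retrieve(task: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks that share words with the task, best match first."""
    task_words = set(task.lower().split())
    scored = [(len(task_words & set(chunk.lower().split())), chunk) for chunk in chunks]
    relevant = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    relevant.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [chunk for _, chunk in relevant[:k]]


def assemble_context(task: str, layers: ContextLayers, max_state: int = 3) -> str:
    parts = [layers.always_read]  # always-read layer: every task, in full
    parts += retrieve(task, layers.reference)  # reference layer: only what this task needs
    if max_state > 0:
        parts += layers.state[-max_state:]  # state layer: the most recent notes only
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)
```

The design choice worth noticing is the asymmetry: only the always-read layer is unconditional, which is why it has to stay small.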

The MSR 2026 paper makes the gap in the always-read layer visible. Copilot instructions average 310 lines per file and CLAUDE.md averages 287 lines, while AGENTS.md averages 142. The shorter average is probably not a sign that AGENTS.md authors are doing more with less. More likely, it reflects a newer format in which most files are still early-stage scaffolding rather than serious operating documents for the always-read tier.

The five-mode taxonomy is the structural finding worth keeping

The study’s most useful result is a taxonomy. Across 50 closely analyzed “Conventions and Best Practices” sections, the researchers identified five stylistic modes developers use to write instructions for agents. The categories describe individual statements, not whole files, and most production AGENTS.md files mix several of them.

| Mode | What it looks like | When it helps the agent |
|---|---|---|
| Descriptive | “This project uses pnpm workspaces.” | Orienting the agent to the shape of the codebase |
| Prescriptive | “Run pnpm typecheck before every commit.” | Defining the workflow the agent should follow |
| Prohibitive | “Never edit files under vendor/.” | Carving out hard boundaries the agent must respect |
| Explanatory | “We use camelCase because the old codebase fought tooling for years.” | Giving the agent the rationale, so it can extrapolate |
| Conditional | “If a test file exists, run it before claiming a fix is done.” | Capturing rules that apply in some situations and not others |

The taxonomy is useful because each mode targets a different failure pattern. Descriptive statements prevent the agent from making wrong assumptions about the codebase. Prescriptive statements prevent it from skipping required steps. Prohibitive statements prevent destructive edits. Explanatory statements give the agent enough rationale to handle adjacent cases without needing a separate rule for each one. Conditional statements encode the kind of “it depends” knowledge that experienced developers carry in their heads.
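To audit an existing file against the taxonomy, a rough keyword heuristic can sort statements into the five modes. This is an illustrative sketch, not the study's method: the keyword lists and the first-match-wins ordering are assumptions, and ambiguous statements still need a human read.

```python
import re

# Ordered rules: the first pattern that matches decides the mode.
# Keyword lists are illustrative guesses, not the study's coding scheme.
RULES = [
    ("conditional", re.compile(r"^\s*(if|when|unless)\b", re.IGNORECASE)),
    ("prohibitive", re.compile(r"\b(never|do not|don't|must not|avoid)\b", re.IGNORECASE)),
    ("explanatory", re.compile(r"\b(because|so that|since|the reason)\b", re.IGNORECASE)),
    ("prescriptive", re.compile(r"^\s*(run|use|always|add|write|prefer)\b|\bmust\b", re.IGNORECASE)),
]


def classify(statement: str) -> str:
    """Assign a statement to one of the five modes; default to descriptive."""
    for mode, pattern in RULES:
        if pattern.search(statement):
            return mode
    return "descriptive"
```

Run over the bullet points of a file, a tally of `classify` results is usually enough to spot a file that is almost entirely descriptive or entirely prescriptive.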

Files that use all five modes deliberately give the agent the most to work with. A file that is entirely descriptive reads like a wiki and gives the agent no behavioral signal. A file that is entirely prescriptive reads like a checklist and breaks the moment the situation deviates from the script. The mix matters. This is the same principle that drives structured context for retrieved data: type information lets a consumer reason about what it is reading instead of treating every chunk the same.

What the most-common sections reveal about what agents actually need

The paper’s second quantitative result is a frequency count of section headings across the 140 files the researchers analyzed structurally. The top categories, in order of how often they appeared:

| Section type | Files containing it (of 140) | What it gives the agent |
|---|---|---|
| Conventions and best practices | 50 | Naming, formatting, idiomatic patterns |
| Contribution guidelines | 48 | Branch flow, commit format, PR expectations |
| Architecture and project structure | 47 | Where code lives, how modules relate |
| Build and run commands | 40 | The exact commands required to verify a change |
| Project description | 32 | What the project is for |
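A similar check can compare a file's headings against the five section types in the table. The sketch below assumes Markdown `#` headings and matches them with hand-picked keywords; `SECTION_KEYWORDS` is hypothetical and is not how the researchers coded sections.

```python
import re

# Hand-picked keywords for the five most common section types in the study.
# These are assumptions for illustration, not the paper's coding scheme.
SECTION_KEYWORDS = {
    "conventions": ("convention", "style", "best practice"),
    "contribution": ("contribut", "commit", "pull request", "branch"),
    "architecture": ("architecture", "structure", "layout"),
    "build": ("build", "test", "command", "lint"),
    "description": ("overview", "about", "description"),
}

# A Markdown ATX heading: 1-6 '#' characters, spaces, then the title.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t#]*$", re.MULTILINE)


def section_types(agents_md: str) -> set[str]:
    """Return the section types whose keywords appear in any heading."""
    found = set()
    for title in HEADING.findall(agents_md):
        lowered = title.lower()
        for section, keywords in SECTION_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                found.add(section)
    return found


def missing_sections(agents_md: str) -> list[str]:
    """Section types from the table that no heading in the file covers."""
    return sorted(set(SECTION_KEYWORDS) - section_types(agents_md))
```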

The ranking is revealing because it inverts the order many teams would choose when first writing context for an agent. The temptation is to start with the project description (humans need this), then architecture, then conventions. In the files developers actually write for agents, conventions and contribution rules come first and the project description comes last of the five. The likely reason is that the agent does not need to know what the project is for in order to fix a bug correctly; it needs to know how this team writes commits and which directories are off limits.

The build-commands result is the one to internalize. Only 40 of the 140 files explicitly document the test, lint, and build commands. The other 100 leave the agent to infer them from package.json, the README, or trial and error. Every team that has watched an agent run npm test in a repository set up for pnpm has met this failure mode. A two-line “build commands” section prevents it.
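The package-manager mismatch is easy to catch mechanically. A minimal sketch, assuming a JavaScript repository where the committed lockfile (`pnpm-lock.yaml`, `yarn.lock`, or `package-lock.json`) identifies the package manager; the function names are hypothetical, and other ecosystems would need their own lockfile table.

```python
import re
from pathlib import Path
from typing import Optional

# Lockfile -> package manager (JavaScript ecosystem only; an assumption).
LOCKFILES = {
    "pnpm-lock.yaml": "pnpm",
    "yarn.lock": "yarn",
    "package-lock.json": "npm",
}


def mentions(text: str, command: str) -> bool:
    """True if command appears as a whole command, so 'npm test' != 'pnpm test'."""
    return re.search(rf"(?<![\w-]){re.escape(command)}\b", text) is not None


def detect_package_manager(filenames: set[str]) -> Optional[str]:
    for lockfile, manager in LOCKFILES.items():
        if lockfile in filenames:
            return manager
    return None


def check_build_commands(agents_md: str, filenames: set[str]) -> list[str]:
    """Warn when AGENTS.md omits the repo's test command or names another tool's."""
    manager = detect_package_manager(filenames)
    if manager is None:
        return []  # no recognized lockfile: nothing to compare against
    warnings = []
    if not mentions(agents_md, f"{manager} test"):
        warnings.append(f"AGENTS.md never mentions `{manager} test`")
    for other in sorted(set(LOCKFILES.values()) - {manager}):
        if mentions(agents_md, f"{other} test"):
            warnings.append(f"AGENTS.md mentions `{other} test`, but the lockfile says {manager}")
    return warnings


def check_repo(root: Path) -> list[str]:
    """Run the check against a checked-out repository root."""
    agents = root / "AGENTS.md"
    text = agents.read_text(encoding="utf-8") if agents.exists() else ""
    return check_build_commands(text, {p.name for p in root.iterdir()})
```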

What 50% staleness means for context engineering

The paper’s evolution numbers are sobering. Half of the AGENTS.md files (50%) were unchanged after the initial commit, 23% had been modified once, and 21% had been modified two to seven times. Only 6% had been touched ten times or more. When changes did happen, the most common were adding a new instruction (78 occurrences across the corpus) or modifying an existing one (59 occurrences).

A 50% staleness rate at this point in the format’s life is what context engineering looks like before practice catches up to format. The same dynamic plays out elsewhere. Most retrieved-knowledge bases for agents are written once, indexed, and rarely re-evaluated against the projects they cover. Most system prompts get tuned during initial development and then frozen. Most MCP tool descriptions are authored when the tool ships and never revised based on how the agent actually uses them. The agents themselves get updated; the context they read does not.

The result is a slow-motion version of context rot. The agent’s behavior remains stable, but the project around it drifts. A naming convention changes, a directory moves, a test runner gets replaced, and the AGENTS.md file still describes the world as it was at the first commit. The agent confidently follows out-of-date instructions, and someone has to debug why a passing local test breaks in CI.
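One cheap guard against this kind of drift is to check that every path the file mentions still exists. A sketch, assuming paths are written as inline code spans containing a slash (for example `src/api/` or `vendor/`); the regular expression is a heuristic, handles only relative paths, and will miss paths written as plain prose.

```python
import re
from pathlib import Path

# Inline code spans that contain at least one '/', e.g. `src/api/` or `vendor/`.
# A heuristic: relative paths only; paths in plain prose are not detected.
PATH_SPAN = re.compile(r"`([\w.\-]+(?:/[\w.\-]*)+)`")


def referenced_paths(agents_md: str) -> list[str]:
    """Relative paths the file mentions inside inline code spans."""
    return PATH_SPAN.findall(agents_md)


def stale_paths(agents_md: str, root: Path) -> list[str]:
    """Paths mentioned in AGENTS.md that no longer exist under the repo root."""
    return [p for p in referenced_paths(agents_md) if not (root / p).exists()]
```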

The teams whose files sit in the high-edit tail (the 6% with ten or more modifications) are the ones treating the file as a living document. Anecdotally, teams that work this way report more satisfaction with coding agents. The paper does not measure this directly, but the relationship is plausible: context that gets revised tracks the codebase; context that does not, decays.

How to apply the taxonomy to a real AGENTS.md

The MSR 2026 paper does not prescribe a structure, but the data points at one. A file that earns its keep covers each of the five modes deliberately and updates as the project changes.

Start from a failure pattern, not a template

A blank or copy-pasted file is worse than no file because it sets the agent’s expectations incorrectly. The teams that get value from AGENTS.md start with a specific failure pattern: the agent keeps committing in the wrong format, the agent edits the wrong directory, the agent uses the wrong build command. Each entry in the file should map to a failure the team has actually seen.

Mix the modes deliberately

Use descriptive statements to orient (where things live, what the project is). Use prescriptive statements for the workflow the agent must follow (commands, commit format, review steps). Use prohibitive statements for hard boundaries (never edit generated files, never push to main). Use explanatory statements when a rule looks arbitrary and a smart agent might second-guess it. Use conditional statements to encode the “it depends” rules that distinguish a contributor from a stranger to the codebase.
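Tagging each entry with its mode while drafting makes the mix visible. A minimal sketch with hypothetical example entries for an imaginary repository, one per mode, following the guidance above; `mode_coverage` only counts the tags the author assigned, so it checks intent rather than wording.

```python
from collections import Counter

MODES = ("descriptive", "prescriptive", "prohibitive", "explanatory", "conditional")

# Hypothetical entries for an imaginary repo, one per mode.
ENTRIES = [
    ("descriptive", "Services live under services/; shared types live under packages/types/."),
    ("prescriptive", "Run pnpm lint and pnpm test before opening a pull request."),
    ("prohibitive", "Never edit generated files under src/gen/."),
    ("explanatory", "Keep commit messages in Conventional Commits format, because release notes are generated from them."),
    ("conditional", "If you change a database migration, also update docs/schema.md."),
]


def mode_coverage(entries: list[tuple[str, str]]) -> tuple[Counter, list[str]]:
    """Count entries per mode and list the modes the file never uses."""
    counts = Counter(mode for mode, _ in entries)
    unknown = set(counts) - set(MODES)
    if unknown:
        raise ValueError(f"unknown modes: {sorted(unknown)}")
    missing = [mode for mode in MODES if counts[mode] == 0]
    return counts, missing


def render(entries: list[tuple[str, str]]) -> str:
    """Render entries as a flat Markdown bullet list, in the order given."""
    return "\n".join(f"- {text}" for _, text in entries)
```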

Treat the file as code

The 50% staleness rate is the gap to close. Every time a workflow changes (a new linter, a new test runner, a moved directory), the AGENTS.md file is a place to update, the same way a README is. The lowest-friction version is a habit: when a code review surfaces a “the agent did this wrong” comment, add a one-line rule to the file before merging.
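The habit can be backed by a small CI check. A sketch, assuming the list of changed files comes from something like `git diff --name-only origin/main...HEAD`; `WORKFLOW_PATTERNS` is a hypothetical, per-repository list of files whose changes usually mean the agent's instructions deserve a look.

```python
from fnmatch import fnmatchcase

# Hypothetical per-repo list: changes to these files often change the
# commands or layout an agent relies on.
WORKFLOW_PATTERNS = [
    "package.json",
    "*/package.json",
    "pnpm-lock.yaml",
    ".github/workflows/*",
    "*.config.js",
    "Makefile",
    "pyproject.toml",
]


def needs_agents_md_review(changed: list[str]) -> list[str]:
    """Workflow files changed in this diff, or [] if an AGENTS.md was also edited."""
    if any(path.rsplit("/", 1)[-1] == "AGENTS.md" for path in changed):
        return []
    return [
        path
        for path in changed
        if any(fnmatchcase(path, pattern) for pattern in WORKFLOW_PATTERNS)
    ]
```

Wired into CI as a warning rather than a hard failure, a check like this nudges reviewers without blocking changes that genuinely do not affect the agent.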

Pair repository context with retrieval

AGENTS.md is durable, scoped, and version-controlled, which makes it the right surface for stable project facts. It is also bounded: the average AGENTS.md file is 142 lines, and even Copilot instructions, the longest format on average, come in around 310. Most coding tasks need more context than this, particularly when the agent has to reason across multiple files, prior decisions, or external knowledge. An agent connected to a Wire container reads AGENTS.md for stable repo facts and calls wire_search, wire_navigate, and wire_explore for the cross-session, per-project context that should not be hand-maintained in a markdown file. The five-mode taxonomy still applies to both surfaces: descriptive entries can live in either layer; prescriptive and prohibitive rules belong in the repo file, where review applies; and conditional and explanatory context is where stored memory pays off most.

The taxonomy is the start of a discipline

The five-mode taxonomy is the first piece of an empirical vocabulary for context engineering. Before this paper, the field’s working categories were “system prompt,” “retrieved knowledge,” and “tool descriptions,” all of which describe where context lives, not what kind of context it is. Descriptive, prescriptive, prohibitive, explanatory, and conditional describe the work each statement is doing. That is the level at which context engineering becomes teachable and reviewable.

The staleness number is a separate observation. A discipline with a format but no practice around it will produce exactly the result the paper documents: files written once, frozen, and quietly drifting against the codebases they describe. The path forward is the same as for any code artifact. Treat the file as a living document, review it the way you review tests, and update it when the world changes.

If you take one thing from the study, take the taxonomy and apply it to whatever context layer you already maintain (AGENTS.md, CLAUDE.md, system prompts, retrieved knowledge). Many of the failures attributed to “the model is bad at instruction following” are statements that landed in the wrong mode. Prescriptive rules written as descriptions get ignored. Conditional rules written as prescriptions break the moment the condition does not apply. The categories are coarse enough to be useful and specific enough to change behavior.


Sources: Context Engineering for AI Agents in Open-Source Software (arXiv 2510.21413) · MSR 2026 conference · Donating the Model Context Protocol (Anthropic)

Frequently asked questions

Why does AGENTS.md matter for context engineering practice?
AGENTS.md is the first widely adopted format for storing agent context inside a software project's repository. It puts context where the code lives, which means version control, code review, and team ownership all apply. That makes it the closest thing the field has to a standard surface for studying how developers actually write context for AI agents in production conditions.
What are the five AGENTS.md writing modes from the study?
Descriptive (informational, describing the project), prescriptive (mandatory instructions the agent must follow), prohibitive (explicit don'ts), explanatory (rationale and reasoning behind a rule), and conditional (context-dependent guidance that applies only in certain situations). Most production files mix several modes; the categories describe individual statements, not whole files.
Why do half of AGENTS.md files never get updated?
The likeliest explanation is that most repositories treat the file as a one-time setup artifact rather than a living document: once an agent handles a sample task correctly, the file is forgotten and never reopened. The 50% staleness rate is the most concrete evidence available that context engineering is not yet a default practice, even among teams that adopt the format.
How does AGENTS.md compare to CLAUDE.md and Copilot instructions?
All three serve the same role: a markdown file in the repo that ships context to coding agents. CLAUDE.md averages 287 lines and Copilot instructions average 310 lines, while AGENTS.md averages 142 lines. The shorter length is more likely a side effect of the format being newer and the tooling enforcing fewer conventions than a sign that less context is needed.
Should every project have an AGENTS.md file?
Only if a coding agent is actually working in the repo and getting tasks wrong without it. A blank or copy-pasted file is worse than no file because it sets false expectations. The teams that benefit have a specific failure pattern in mind (wrong test command, wrong commit format, wrong directory layout) and write the file to address it.
