Why Agent Tool Standardization Fails in Practice

This post is part of ongoing research into AI agents and skills by Tianjie Shen, a software engineering intern at Wavether. Tianjie studies Mathematics and Economics at the University of Toronto.

Standardization bodies exist to prevent fragmentation. The Agent Skills standard was designed to enable portable skill distribution across multiple AI agents. The premise is sound: write a skill once, deploy it everywhere. However, while the standard establishes scanning of the .agents/skills directory or an agent-specific directory .<agent>/skills as a convention, not all agents respect it. The five agents we tested (Claude Code CLI, Copilot CLI, Copilot in VS Code, Cursor, and Codex) all scan different paths when looking for skills and behave differently when skills with the same name appear in multiple locations. As a result, skills organized in the filesystem according to the behaviour of one agent are unlikely to work for the others, and the agents themselves offer no configuration options that would let users restore interoperability manually.

Why Multiple Agents Matter

Organizations increasingly deploy multiple AI agents across their developer workflows. A team may use Claude for architectural reasoning, Copilot for IDE integration, Cursor for coding speed, and Codex for specific domain work. This allows for optimization: a task can be given to several agents to determine which one is best suited to it, and similar tasks can then be directed to that agent in the future.

Developers also face vendor lock-in risk; committing to a single agent ecosystem means accepting that agent's design choices, performance characteristics, and pricing model as fixed constraints. Maintaining multiple agent configurations, on the other hand, allows the team to shift work between agents for the best possible performance or the lowest cost, especially as the performance or pricing of agents changes. This advantage is often significant enough to justify the added operational complexity.

Understanding the Agent Skills Standard

A Skill is a self-contained, discoverable capability that an agent can invoke at runtime. It is packaged as a directory containing a mandatory SKILL.md metadata file (which declares the skill's name, description, and operational parameters) alongside optional subdirectories for scripts, references, and assets. Skills allow for automating specific workflows; for example, a developer may write a custom linter skill that enforces team coding standards.
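To make the packaging concrete, here is a minimal sketch of how a tool might enumerate the skills inside one skills directory. It relies only on the rule stated above (a skill is a directory containing SKILL.md). The function name find_skills is hypothetical, and using the directory name as the skill name is a simplifying assumption; a real agent would read the name declared inside SKILL.md.

```python
from pathlib import Path


def find_skills(skills_dir: Path) -> dict[str, Path]:
    """Return {directory name: skill directory} for each child holding a SKILL.md.

    Assumption: the directory name stands in for the skill name. The name
    declared inside SKILL.md is not parsed here.
    """
    if not skills_dir.is_dir():
        return {}
    return {
        child.name: child
        for child in sorted(skills_dir.iterdir())
        if child.is_dir() and (child / "SKILL.md").is_file()
    }
```

Calling find_skills(Path.home() / ".agents" / "skills") would list whatever is installed at the generic user-level location, and return an empty mapping if that directory does not exist.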

The Agent Skills open standard (agentskills.io) was designed to make skills portable across agent ecosystems and to eliminate reimplementation overhead. A developer writes a skill once and then deploys it to a standard location where all agents can discover and invoke it. Every agent that supports the standard (full list here) is able to add the contents of the skill to its context when working on a relevant task.

Gaps in the Standard

The Agent Skills standard establishes a convention for skill paths at user and project levels:

User level: ~/.agents/skills, ~/.<agent>/skills (e.g., ~/.claude/skills)
Project level: .agents/skills, .<agent>/skills (e.g., .github/skills)

In addition, when two skills with the same name exist in different locations (i.e. there is a name collision), the convention is that project-level skills override user-level skills. If these conventions were applied consistently by all agents, deploying skills to multiple agents would be trivial: publish once to the standard user-level location ~/.agents/skills and once to .agents/skills in each project.
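The convention can be sketched as a search order in which project-level directories come before user-level ones. This is an illustration of the convention as described here, not code from the standard or from any agent. The names conventional_dirs and resolve_skill are hypothetical, agent_dir stands for the agent-specific directory (for example ".claude" or ".github"), and the ordering of the generic and agent-specific directories within one level is an arbitrary assumption, since the convention described above does not rank them.

```python
from pathlib import Path
from typing import Optional


def conventional_dirs(home: Path, project_root: Path, agent_dir: str) -> list[Path]:
    """Conventional skill directories, project level first so it overrides user level.

    Assumption: within a level, the generic directory is listed before the
    agent-specific one. The convention itself does not specify this.
    """
    return [
        project_root / ".agents" / "skills",
        project_root / agent_dir / "skills",
        home / ".agents" / "skills",
        home / agent_dir / "skills",
    ]


def resolve_skill(name: str, search_dirs: list[Path]) -> Optional[Path]:
    """Return the first directory in search order that holds the named skill."""
    for base in search_dirs:
        candidate = base / name
        if (candidate / "SKILL.md").is_file():
            return candidate
    return None
```

With a copy of the same skill at both levels, resolve_skill returns the project-level copy, which is the outcome the convention calls for.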

However, neither the scanning location nor the collision handling behaviour is a requirement. In our testing of the five agents Claude Code CLI, Copilot CLI, Copilot in VS Code, Cursor, and Codex, only the behaviour of Copilot CLI was found to be fully in line with the conventions. For example, Claude Code CLI does not read skills placed at ~/.agents/skills. In case of a collision, Claude Code CLI and Codex show all copies, while in Cursor user-level skills override project-level skills.

As a result of this lack of standardization, developers need to maintain separate skill implementations and deployment procedures for each agent, which makes it far more complex to offer the same skill across multiple agents. The aforementioned approach of placing one copy in ~/.agents/skills and another in .agents/skills for each project will result in Codex showing both versions, Copilot CLI accepting only the project version, and Cursor accepting only the user version, while Claude Code CLI will not see the ~/.agents/skills copy at all.

With no enforcement of the conventions with respect to where skills should be discovered, these inconsistencies are a result of architectural choices made independently by each team. Those choices express different answers to a question the standard does not address: what should a skill be?

Three Incompatible Architectures

Testing revealed three distinct architectural patterns across the five agents.

Pattern 1: Curated Library (Claude Code CLI, Codex)

Claude and Codex treat skills as a curated collection. When a developer creates skills at multiple locations with the same name, both agents display all versions to the user. The user can then choose which version to invoke. This pattern assumes skill name collisions are rare and that transparency is preferable to automatic resolution.

For Claude, the user-level path is ~/.claude/skills and the project-level path is .claude/skills. Both are read.

For Codex, the generic user-level path is ~/.agents/skills, and the Codex-specific user-level paths are ~/.codex/skills and ~/.codex/skills/.system (reserved for built-ins). At project level, Codex reads .agents/skills and .codex/skills. All are read.
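The curated-library behaviour can be sketched as a lookup that returns every copy instead of picking one. The directory list below is taken from the Codex paths reported above; the function names codex_skill_dirs and all_copies are hypothetical, and the order in which copies are listed is an assumption, since our testing only established that all copies are shown.

```python
from pathlib import Path


def codex_skill_dirs(home: Path, project_root: Path) -> list[Path]:
    """Directories Codex was observed to read, per the paths listed above."""
    return [
        home / ".agents" / "skills",
        home / ".codex" / "skills",
        home / ".codex" / "skills" / ".system",
        project_root / ".agents" / "skills",
        project_root / ".codex" / "skills",
    ]


def all_copies(name: str, search_dirs: list[Path]) -> list[Path]:
    """Curated-library behaviour: report every copy and resolve nothing."""
    return [d / name for d in search_dirs if (d / name / "SKILL.md").is_file()]
```

When all_copies returns more than one path, the choice between them is left to the user, which is the defining trait of this pattern.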

Pattern 2: Layered Search with Fallback (Copilot CLI)

Copilot CLI implements a strict scope hierarchy. When invoked from a subdirectory of a project, it searches for skills in this order: subdirectory scope, then project root scope, then user scope. It stops at the first scope where the skill is found. This pattern assumes that closer scopes should override broader scopes. A developer can override a user-level skill by creating a project-level version with the same name; within a project, a subdirectory can override both.
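The first-match-wins search described above can be sketched in a few lines. This is a model of the observed behaviour, not Copilot CLI's implementation. The function name layered_lookup is hypothetical, and the single relative directory .agents/skills is an assumption made for brevity; the exact set of directories Copilot CLI reads in each scope is not listed in this post.

```python
from pathlib import Path
from typing import Optional


def layered_lookup(
    name: str,
    cwd: Path,
    project_root: Path,
    home: Path,
    rel_dir: str = ".agents/skills",  # assumption: one skills directory per scope
) -> Optional[Path]:
    """Search subdirectory, then project root, then user scope; stop at the first hit."""
    for scope in (cwd, project_root, home):
        candidate = scope / rel_dir / name
        if (candidate / "SKILL.md").is_file():
            return candidate
    return None
```

Because the loop returns on the first hit, a copy in a narrower scope hides any copy in a broader one, which is what lets a subdirectory override both the project and the user level.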

Pattern 3: Scope-Limited Search (Copilot in VS Code, Cursor)

These agents apply hard boundaries on scope.

Copilot in VS Code reads user-level paths (~/.agents/skills, ~/.copilot/skills) and project root paths (.agents/skills, .claude/skills, .github/skills). It does not read subdirectories at all, even when invoked from a subdirectory. The project is treated as a fixed boundary.

Cursor is the most permissive at user level, reading ~/.agents/skills, ~/.claude/skills, ~/.codex/skills, and ~/.cursor/skills. At project level, it reads .agents/skills, .claude/skills, and .codex/skills. However, when the same skill exists at both user and project levels, Cursor prefers the user version and ignores the project version.
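Cursor's user-first behaviour can be modelled by searching every user-level directory before any project-level one. The directory names come from the paths reported above; the function name cursor_style_lookup is hypothetical, and the order of directories within each level is an assumption, as our testing only established that user level wins over project level.

```python
from pathlib import Path
from typing import Optional

# Parent directories of "skills" that Cursor was observed to read.
CURSOR_USER_DIRS = (".agents", ".claude", ".codex", ".cursor")
CURSOR_PROJECT_DIRS = (".agents", ".claude", ".codex")


def cursor_style_lookup(name: str, home: Path, project_root: Path) -> Optional[Path]:
    """User-level copies are checked first, so they shadow project-level copies."""
    user = [home / d / "skills" for d in CURSOR_USER_DIRS]
    project = [project_root / d / "skills" for d in CURSOR_PROJECT_DIRS]
    for base in user + project:
        candidate = base / name
        if (candidate / "SKILL.md").is_file():
            return candidate
    return None
```

Note that the current working directory is not a parameter at all: in this pattern, where the agent is invoked from has no effect on which skill is found.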

Why These Differences Exist

Each architectural choice reflects a different assumption about the purpose and lifecycle of skills.

Claude and Codex assume skills are first-class library resources, like packages in a registry. Multiple versions can coexist, and the user should see them. This design prioritizes transparency and choice over automatic conflict resolution.

Copilot CLI assumes skills are scoped resources configured at multiple levels. The hierarchy (subdirectory > project > user) reflects the principle that more specific scopes should take precedence. This design supports the use case where a developer sets up user-level defaults but then refines them for specific projects or directories. It is the pattern familiar from layered configuration systems such as Git, where repository-level settings in .git/config override user-level settings in ~/.gitconfig.

Copilot in VS Code and Cursor assume skills have a bounded scope. Copilot in VS Code treats the project root as the boundary, reflecting the model that skills are project-level infrastructure. Cursor prioritizes user-level skills, treating skills as user-level configuration that persists across projects. These choices limit scope overlap and reduce ambiguity at the cost of restricting flexibility.

The Cost of Divergence

The architectural diversity creates four deployment risks for developers targeting multiple agents.

Silent Non-Detection

A developer places a skill at .agents/skills in a project subdirectory, expecting all agents to find it. Copilot in VS Code silently fails to detect it because it reads only the project root, not subdirectories. There is no error message or diagnostic, so developers may assume the skill is broken rather than misplaced and waste time on unproductive debugging.
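Since the agents give no diagnostic, a small pre-flight check can flag skills that sit in a nested skills directory and would therefore be invisible to an agent that reads only the project root. The sketch below is one way to do that; the function name nested_skills is hypothetical, and it checks a single relative directory (.agents/skills by default) as a simplifying assumption.

```python
import os
from pathlib import Path


def nested_skills(project_root: Path, rel_dir: str = ".agents/skills") -> list[Path]:
    """List skill directories under a nested rel_dir, i.e. not the one at the project root."""
    root_level = project_root / rel_dir
    marker = "/" + rel_dir.strip("/") + "/"
    found = []
    for dirpath, _dirnames, filenames in os.walk(project_root):
        if "SKILL.md" not in filenames:
            continue
        skill_dir = Path(dirpath)
        if skill_dir == root_level or root_level in skill_dir.parents:
            continue  # lives under the project-root skills directory
        # Leading and trailing slashes make the match respect path components.
        rel = "/" + skill_dir.relative_to(project_root).as_posix() + "/"
        if marker in rel:
            found.append(skill_dir)
    return sorted(found)
```

Any path this returns is a candidate for silent non-detection and is worth either moving to the project root or documenting as intentionally agent-specific.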

Skill Shadowing

A developer maintains a user-level skill and a project-level skill with the same name. On Copilot CLI, the project version takes precedence; on Cursor, the user version does. If the two copies differ, whether from the start or after a later modification, the agents will silently run different versions of the skill.
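One way to catch this before it bites is to compare same-named skills at the two levels by content hash. The sketch below is an illustration under simple assumptions: skills are identified by directory name, and two copies count as identical only if every file matches byte for byte. The function names skill_digest and diverged_skills are hypothetical.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_dir: Path) -> str:
    """Hash the relative path and bytes of every file in a skill directory."""
    h = hashlib.sha256()
    for f in sorted(p for p in skill_dir.rglob("*") if p.is_file()):
        h.update(f.relative_to(skill_dir).as_posix().encode())
        h.update(b"\0")
        h.update(f.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def skill_names(skills_dir: Path) -> set[str]:
    """Names of child directories that contain a SKILL.md."""
    if not skills_dir.is_dir():
        return set()
    return {c.name for c in skills_dir.iterdir() if (c / "SKILL.md").is_file()}


def diverged_skills(user_dir: Path, project_dir: Path) -> list[str]:
    """Skills present at both levels whose contents differ."""
    shared = skill_names(user_dir) & skill_names(project_dir)
    return sorted(
        n for n in shared if skill_digest(user_dir / n) != skill_digest(project_dir / n)
    )
```

An empty result means that whichever copy an agent picks, it runs the same content; a non-empty result names the skills whose behaviour will depend on the agent.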

Cross-Project Contamination

User-level skills are globally visible across all projects on the machine. A developer creating a skill at ~/.agents/skills for one project may find it bleeding into other projects where it causes unexpected behavior. The standard does not establish conventions for isolating user-level skills or managing their lifecycle, leaving developers to manage contamination manually.

Precedence Unpredictability

Developers targeting multiple agents cannot rely on a single precedence rule: Copilot CLI uses subdirectory > project > user, Copilot in VS Code reads user and project root but ignores subdirectories, and Cursor lets the user-level copy win over the project-level copy. A developer who understands Copilot CLI precedence will be surprised to find that the same folder structure produces different results in Cursor.
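The collision behaviours reported in this post can be written down as a small lookup table, which makes the disagreement easy to see. The table below records only what our testing found for a user-level versus project-level collision; Copilot in VS Code is omitted because its collision behaviour is not stated here. The names COLLISION_BEHAVIOUR, visible_levels, and agents_by_outcome are hypothetical, and the model assumes the agent reads both locations that hold a copy.

```python
# Which copy survives a user-vs-project name collision, per the testing above.
# "all" means every copy is shown to the user.
COLLISION_BEHAVIOUR = {
    "Claude Code CLI": "all",
    "Codex": "all",
    "Copilot CLI": "project",
    "Cursor": "user",
}


def visible_levels(agent: str) -> list[str]:
    """Levels whose copy the agent exposes when both levels hold the same skill."""
    behaviour = COLLISION_BEHAVIOUR[agent]
    return ["user", "project"] if behaviour == "all" else [behaviour]


def agents_by_outcome() -> dict[tuple[str, ...], list[str]]:
    """Group agents by the outcome they produce for the same folder structure."""
    groups: dict[tuple[str, ...], list[str]] = {}
    for agent in COLLISION_BEHAVIOUR:
        groups.setdefault(tuple(visible_levels(agent)), []).append(agent)
    return groups
```

Four agents fall into three different outcome groups, so no single folder layout yields the same result everywhere.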

Why Standards Fail Without Enforcement

The Agent Skills standard documents these conventional paths, but it does not mandate that agents read all of them, nor does it define a required precedence hierarchy. It prescribes conventions that each agent implementation can choose to follow, partially follow, or reinterpret.

This approach assumes convergence through voluntary adoption. In practice, each agent team makes independent architectural choices that reflect their product's design philosophy and their users' expectations. Without enforcement mechanisms, the standard becomes a reference rather than a contract. Developers read the standard, assume it describes reality, and then encounter divergent behavior at deployment time.

Enforceable standards require three components:

Mandatory coverage: Every agent must read at least the generic paths (~/.agents/skills, .agents/skills) to ensure baseline portability.
Documented precedence: The standard should define a precedence hierarchy (e.g., subdirectory > project > user) and require all agents to conform or explicitly document their deviation in prominent developer documentation.
Diagnostic infrastructure: Each agent should expose commands to help developers verify skill visibility at each path. Without diagnostics, developers have no way to validate deployment without trial-and-error testing.

The Agent Skills standard currently specifies none of these. As a result, it functions as aspirational documentation rather than a binding contract between agents and developers.

Actionable Takeaways

If you are developing skills for cross-agent deployment:

Test each target agent individually with your skill at each location you plan to deploy. Treat vendor documentation and standard claims as hypotheses to verify, not facts. Build a deployment checklist specific to your target agent set.
Establish a per-project convention for skill placement and document it prominently. Use one location across all agents where possible (e.g., .agents/skills at project root), and accept that some agents may not find project-level skills. Avoid duplicating skills across locations; if you must maintain them in multiple places, use a build tool to ensure they stay in sync.
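For user-level skills, use agent-specific paths rather than the generic path. Place Claude skills at ~/.claude/skills, Copilot skills at ~/.copilot/skills, and so on. This keeps a skill from being picked up by every agent that reads the generic path. It does not isolate the skill completely: a user-level skill is still visible in every project on the machine, and in our testing Cursor also reads ~/.claude/skills and ~/.codex/skills.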
Add a validation step to your skill release process that confirms detection across all target agents before marking a release as complete. Automate this if possible using agent CLIs or APIs.
Engage with agent teams to advocate for enforcement mechanisms in the standard. The conversation between standard maintainers and implementation teams needs to shift from "what should agents do?" to "what must agents do?" and "how do we verify it?" Until that shift happens, cross-agent skill distribution will remain error-prone.
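For the case where a skill has to live in several locations, the "build tool" mentioned above can be as small as a script that overwrites every target copy from one canonical source. The sketch below is a minimal version under stated assumptions: the function name sync_skill is hypothetical, the caller supplies the list of target skills directories for the agents they care about, and each target copy is deleted and recreated so that stale files cannot linger.

```python
import shutil
from pathlib import Path


def sync_skill(canonical: Path, target_dirs: list[Path]) -> list[Path]:
    """Replace each target copy with an exact copy of the canonical skill directory."""
    if not (canonical / "SKILL.md").is_file():
        raise FileNotFoundError(f"{canonical} has no SKILL.md")
    written = []
    for base in target_dirs:
        dest = base / canonical.name
        if dest.resolve() == canonical.resolve():
            continue  # never delete the source itself
        if dest.exists():
            shutil.rmtree(dest)  # drop stale files so copies cannot drift
        shutil.copytree(canonical, dest)  # creates missing parent directories
        written.append(dest)
    return written
```

A call such as sync_skill(Path("skills-src/team-linter"), [Path(".agents/skills"), Path(".claude/skills")]) would refresh both project-level copies from a single source of truth; the skills-src location is only an example.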