feat: SDK tier assessment CLI and skill by felixweinberger · Pull Request #142 · modelcontextprotocol/conformance

felixweinberger · 2026-02-11T16:24:30Z

Summary

Adds a tier-check CLI subcommand and Claude Code skill for assessing MCP SDK repositories against SEP-1730 (SDK Tiering System).

Motivation and Context

SEP-1730 defines tier requirements for MCP SDKs (conformance pass rates, issue triage, P0 resolution, docs, policies). This PR provides tooling to evaluate SDKs against those requirements — both deterministically (CLI) and with AI-assisted judgment (skill).

Components

tier-check CLI — deterministic checks:

Server + client conformance test pass rates
GitHub issue triage compliance and P0 resolution times
Label taxonomy (supports GitHub native issue types)
Stable release detection
Policy file existence (ROADMAP.md, DEPENDENCY_POLICY.md, VERSIONING.md, dependabot.yml, etc.)
Spec tracking gap

Claude Code skill (.claude/skills/mcp-sdk-tier-audit/) — AI-assisted evaluation:

Documentation coverage against canonical 48-feature checklist
Policy content evaluation (reads only files the CLI found — no searching)
Produces tier classification with evidence tables and remediation guide

Clean separation: CLI handles all deterministic file-existence and metric checks. AI evaluates content quality of files the CLI identified. No duplication.

How Has This Been Tested?

Tested against TypeScript SDK v1.x and Python SDK v1.x
Server + client conformance tests running locally
Reports generated to results/ directory

Types of changes

New feature (non-breaking change which adds functionality)

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added or updated documentation as needed

Additional context

See .claude/skills/mcp-sdk-tier-audit/README.md for full usage documentation covering CLI, Claude Code, other AI agents, and manual review workflows.

Adds a 'tier-check' subcommand to the conformance tool that automates SDK tier assessment against SEP-1730 criteria.

Checks performed:
- Conformance test pass rate (via everything-server)
- GitHub label taxonomy (priority/status/area labels)
- Issue triage SLA compliance
- P0 bug resolution tracking
- Stable release detection
- Required file existence (CHANGELOG, SECURITY, etc.)
- Spec tracking (SDK release within 30d of spec release)

Also includes a Claude Code skill (skills/mcp-sdk-tier-audit/) for judgment-based checks that require codebase analysis (feature coverage, docs quality, policy evaluation).

Usage:
  npx tsx src/index.ts tier-check --repo modelcontextprotocol/typescript-sdk
  npx tsx src/index.ts tier-check --repo ... --conformance-server-cmd '...' \
    --conformance-server-cwd ... --conformance-server-url ... --output json
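The spec tracking check above (SDK release within 30 days of a spec release) comes down to date arithmetic. The sketch below is illustrative only: the function names, the treatment of an SDK release that predates the spec release, and the example dates are assumptions, not taken from the PR.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days between a spec release and the first SDK release that supports it.
// Negative when the SDK release predates the spec release.
function specTrackingGapDays(specRelease: Date, sdkRelease: Date): number {
  return Math.ceil((sdkRelease.getTime() - specRelease.getTime()) / DAY_MS);
}

// Assumption: an SDK release before the spec release does not count as tracking it.
function tracksSpec(specRelease: Date, sdkRelease: Date, windowDays = 30): boolean {
  const gap = specTrackingGapDays(specRelease, sdkRelease);
  return gap >= 0 && gap <= windowDays;
}

// Hypothetical dates, for illustration only.
const spec = new Date("2026-01-01T00:00:00Z");
const onTime = new Date("2026-01-20T00:00:00Z");
const late = new Date("2026-03-01T00:00:00Z");

console.log(specTrackingGapDays(spec, onTime), tracksSpec(spec, onTime));
console.log(specTrackingGapDays(spec, late), tracksSpec(spec, late));
```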

- Move skill to .claude/skills/ so it's auto-available in Claude Code
- Remove feature-coverage subagent (redundant with conformance tests)
- Remove hardcoded ~/src/mcp paths from all skill files
- Trim conformance server table to TS + Python only
- Rename file_existence check to policy_signals (informational, not blocking)
- Add GitHub native issue types detection to labels check
- Add missing features to docs-coverage checklist (tasks, elicitation URL mode, JSON Schema 2020-12)
- Add README with CLI quick start and escape hatch for non-Claude-Code users
- Use --limit 500 instead of --limit 100 for gh issue list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Use npx @modelcontextprotocol/conformance instead of node dist/index.js
- Add full GitHub auth instructions (gh auth login, GITHUB_TOKEN, --token)
- Point TS SDK conformance server to typescript-sdk/test/conformance/
- Fix Python SDK URL to localhost:3001/mcp (not TBD)
- Remove manual gh issue list / gh release list from SKILL.md (CLI handles it)
- Remove Claude Code-specific subagent_type references
- Assume user is already in conformance repo
- Clean up policy-evaluation-prompt.md: remove redundant grep commands, focus on content evaluation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove unused variable assignments flagged by eslint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add "tier-check" npm script so users can run `npm run tier-check --` instead of `node dist/index.js tier-check`
- Update SKILL.md, README, and skill README to use npm run tier-check
- Add full conformance examples with --conformance-server-cmd/cwd/url flags and realistic paths (~/src/mcp/typescript-sdk)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ut pollution

- Add 30s per-scenario timeout to prevent tier-check from hanging
- Allow --conformance-server-url without --conformance-server-cmd (server already running)
- Move runner status logs to stderr so --output json produces clean JSON
- Update SKILL.md and README with --silent flag and pre-start server workflow
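The per-scenario timeout and the stderr logging described above could look roughly like this. It is a minimal sketch, not the PR's implementation: `withTimeout`, `logStatus`, and the `[tier-check]` prefix are hypothetical names, and only the 30 second budget and the stdout/stderr split come from the commit message.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so a finished scenario does not keep the process alive.
    clearTimeout(timer);
  }
}

// Status messages go to stderr so that stdout carries only the JSON report.
function logStatus(message: string): void {
  console.error(`[tier-check] ${message}`);
}

const SCENARIO_TIMEOUT_MS = 30_000; // 30s per scenario, as stated in the commit message
```

A caller would wrap each scenario run, for example `await withTimeout(runScenario(name), SCENARIO_TIMEOUT_MS, name)`, where `runScenario` stands in for whatever actually executes the scenario.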

The skill now requires two arguments:
1. Local path to the SDK checkout (for direct file inspection)
2. URL where the everything server is already running

The GitHub owner/repo is derived from git remote. This eliminates cloning, server startup complexity, and branch confusion (v1.x vs main).
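Deriving owner/repo from a git remote is mostly URL parsing. The sketch below assumes the remote URL was already obtained (for example from `git remote get-url origin`) and handles the common SSH and HTTPS forms. `parseGitHubRemote` is a hypothetical helper, not a function from this PR.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

// Accepts e.g. git@github.com:owner/repo.git or https://github.com/owner/repo
// and returns null for anything that does not look like a GitHub remote.
function parseGitHubRemote(remoteUrl: string): RepoRef | null {
  const match = remoteUrl
    .trim()
    .match(/github\.com[:/]([^/]+)\/([^/]+?)(?:\.git)?\/?$/);
  return match ? { owner: match[1], repo: match[2] } : null;
}

console.log(parseGitHubRemote("git@github.com:modelcontextprotocol/typescript-sdk.git"));
console.log(parseGitHubRemote("https://github.com/modelcontextprotocol/go-sdk"));
```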

Reports go to results/tier-audits/<sdk>-<date>/ (already gitignored). Claude's console output is now just the tier classification, pass/fail summary line, top 3 actions, and file paths.

- Add checkClientConformance() that runs core client scenarios (initialize, tools_call, elicitation-defaults, sse-retry, auth) by spawning the SDK's conformance client via --client-cmd
- Add client_conformance to TierScorecard type
- Wire --client-cmd option into the CLI
- Update tier logic: both server + client conformance feed into Tier 1 (100%) and Tier 2 (>=80%) requirements
- Update terminal and markdown output to show both conformance types
- Update skill to auto-detect conformance client or accept explicit client-cmd argument
- Update README with new option and examples
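The conformance part of the tier logic above can be read as a threshold on the weaker of the two pass rates. The following is a sketch under assumptions: the field names are hypothetical (the PR names a TierScorecard type but does not show its shape), falling back to Tier 3 is assumed, and the other SEP-1730 requirements (triage, docs, policies) are ignored here.

```typescript
// Hypothetical shape for one conformance suite's result.
interface ConformanceRate {
  passed: number;
  total: number;
}

type ConformanceTier = 1 | 2 | 3;

// Fraction of scenarios that passed; an empty run counts as 0, not 100%.
function passRate(rate: ConformanceRate): number {
  return rate.total === 0 ? 0 : rate.passed / rate.total;
}

// Tier 1 needs 100% on both server and client suites, Tier 2 needs >= 80% on both.
function conformanceTier(server: ConformanceRate, client: ConformanceRate): ConformanceTier {
  const worst = Math.min(passRate(server), passRate(client));
  if (worst === 1) return 1;
  if (worst >= 0.8) return 2;
  return 3;
}

console.log(conformanceTier({ passed: 23, total: 23 }, { passed: 5, total: 5 }));
console.log(conformanceTier({ passed: 20, total: 23 }, { passed: 5, total: 5 }));
```

Taking the minimum means a perfect server suite cannot compensate for a weak client suite, which matches "both server + client conformance feed into" each tier.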

- Change executive summary from pipe-delimited line to a readable table with T2/T1 columns
- Move assessment and remediation file writing into parallel subagents to keep the main conversation thread clean

… numbered gaps

Fail fast with clear error messages if GitHub CLI is not authenticated or if the conformance server URL is not reachable, rather than failing deep into the scorecard run.
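A pre-flight step of this kind might be sketched as follows. The structure and names (`preflight`, `PreflightDeps`) are assumptions rather than the PR's code. It relies on `gh auth status` exiting non-zero when the GitHub CLI is not logged in, and treats any HTTP response from the server URL as "reachable". It does not cover the GITHUB_TOKEN or --token paths mentioned elsewhere in this PR.

```typescript
import { execFileSync } from "node:child_process";

// Checks are injectable so the logic can be exercised without gh or a live server.
interface PreflightDeps {
  ghAuthenticated: () => boolean;
  serverReachable: (url: string) => Promise<boolean>;
}

const defaultDeps: PreflightDeps = {
  ghAuthenticated: () => {
    try {
      // Throws if gh is missing or exits non-zero (not authenticated).
      execFileSync("gh", ["auth", "status"], { stdio: "ignore" });
      return true;
    } catch {
      return false;
    }
  },
  serverReachable: async (url) => {
    try {
      // Any response, even an error status, proves something is listening.
      await fetch(url, { signal: AbortSignal.timeout(5000) });
      return true;
    } catch {
      return false;
    }
  },
};

// Returns a list of problems; an empty list means the scorecard run can start.
async function preflight(serverUrl: string, deps: PreflightDeps = defaultDeps): Promise<string[]> {
  const problems: string[] = [];
  if (!deps.ghAuthenticated()) {
    problems.push("GitHub CLI is not authenticated; run `gh auth login`.");
  }
  if (!(await deps.serverReachable(serverUrl))) {
    problems.push(`Conformance server is not reachable at ${serverUrl}.`);
  }
  return problems;
}
```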

- Claude Code section: explain client-cmd auto-detection for TS/Python, show explicit 3-arg form for other SDKs, add examples for all three
- Fix TypeScript build command (npm run build, not pnpm build:all)
- Fix Python server command (add --port, use uv sync --package)
- Fix Python client path (.github/actions/conformance/client.py)
- Expand 'Other SDKs' section with guidance on everything server
- Add gh auth login prerequisite to Claude Code steps

Client command is now always passed as the third argument. If omitted, client conformance is skipped and noted as a gap. No more magic path detection — clearer and more predictable.

…icy eval

Docs coverage:
- Table now has numbered rows matching all 48 non-experimental features from the canonical list (was missing 7: tools text/image/audio/embedded/error/notifications, protocol version negotiation)
- Hardcode total as 48 in summary so agents don't miscount

Policy evaluation:
- Simplified from deep content analysis to file-existence checks
- Dependency policy: DEPENDENCY_POLICY.md, dependabot.yml, or CONTRIBUTING.md section
- Roadmap: ROADMAP.md must exist (GitHub milestones alone not sufficient)
- Versioning: VERSIONING.md or CONTRIBUTING.md section
- Removed GitHub API calls for milestones and releases from policy eval

Create references/feature-list.md with all 48 non-experimental + 5 experimental features. The docs-coverage prompt now references this file instead of duplicating the list. One place to update when features change.

CLI (files.ts) now checks all policy files deterministically: DEPENDENCY_POLICY.md, docs/dependency-policy.md, dependabot.yml, renovate.json, ROADMAP.md, docs/roadmap.md, VERSIONING.md, docs/versioning.md, BREAKING_CHANGES.md (in addition to existing CHANGELOG.md, SECURITY.md, CONTRIBUTING.md).

AI policy eval: receives CLI output showing which files exist, then reads ONLY those files to judge content quality. It no longer searches the repo for files, which keeps a clean separation of concerns.
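A deterministic existence check over that file list could be as small as the sketch below. The paths are copied as written in the commit message and resolved relative to the repository root; where files.ts actually looks for each file is not shown here, so treat the lookup locations and the `findPolicyFiles` name as assumptions.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Candidate policy files, as listed in the commit message.
const POLICY_FILES: string[] = [
  "DEPENDENCY_POLICY.md",
  "docs/dependency-policy.md",
  "dependabot.yml",
  "renovate.json",
  "ROADMAP.md",
  "docs/roadmap.md",
  "VERSIONING.md",
  "docs/versioning.md",
  "BREAKING_CHANGES.md",
  "CHANGELOG.md",
  "SECURITY.md",
  "CONTRIBUTING.md",
];

// Returns the subset of candidates that exist under repoRoot.
// `exists` is injectable so the function can be tested without touching disk.
function findPolicyFiles(
  repoRoot: string,
  exists: (path: string) => boolean = existsSync,
): string[] {
  return POLICY_FILES.filter((relative) => exists(join(repoRoot, relative)));
}
```

The returned list is the kind of output the AI policy evaluation would receive: it reads only those files and never searches for others.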

pkg-pr-new · 2026-02-11T16:24:58Z

npx https://pkg.pr.new/modelcontextprotocol/conformance/@modelcontextprotocol/conformance@142

commit: abed710

pcarleton

can we test on go-sdk and csharp-sdk before merging to see what it displays?

src/tier-check/checks/test-conformance-results.ts

Address PR feedback: conformance.ts was duplicating the normal conformance running code. Now shells out to 'node dist/index.js server/client' with -o to save results to a temp dir, then parses the checks.json files. Also removes --conformance-server-cmd and --conformance-server-cwd options since the server must be pre-started.

felixweinberger · 2026-02-12T15:13:21Z

Outputs from running on latest main of go-sdk:

Outputs from csharp-sdk:

Avoids confusion with src/runner/ (the actual conformance runner). This file just invokes the CLI and parses output.

The tier-check CLI was only counting scenarios that produced a checks.json file. Scenarios that crashed or failed to run (e.g., auth scenarios when OAuth is not implemented) were invisible, making the denominator artificially small (e.g., 4/4 instead of 4/23).

Now both checkConformance and checkClientConformance reconcile their parsed results against the known scenario lists, adding failure entries for any expected scenario that didn't produce results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
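The reconciliation step amounts to a set difference between expected and observed scenario names. This is a minimal sketch with hypothetical types and names (`ScenarioResult`, `reconcile`); the real functions work on the repository's own result types, and the scenario names below are placeholders.

```typescript
interface ScenarioResult {
  scenario: string;
  passed: boolean;
  note?: string;
}

// Keep every parsed result and add a failing entry for each expected
// scenario that produced no result at all (crashed, skipped, never ran).
function reconcile(expected: string[], parsed: ScenarioResult[]): ScenarioResult[] {
  const seen = new Set(parsed.map((result) => result.scenario));
  const missing = expected
    .filter((name) => !seen.has(name))
    .map((name): ScenarioResult => ({
      scenario: name,
      passed: false,
      note: "no results produced",
    }));
  return [...parsed, ...missing];
}

// Two of five expected scenarios reported results.
const expectedScenarios = ["a", "b", "c", "d", "e"];
const parsedResults: ScenarioResult[] = [
  { scenario: "a", passed: true },
  { scenario: "b", passed: true },
];
const reconciled = reconcile(expectedScenarios, parsedResults);
const passedCount = reconciled.filter((result) => result.passed).length;
console.log(`${passedCount}/${reconciled.length}`);
```

Without the reconciliation the same run would be reported as 2/2, which is the "4/4 instead of 4/23" distortion the commit describes.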

Clarify what counts as documented vs just having code:
- Conformance test servers don't count as docs or examples
- Examples without prose = PARTIAL, not PASS
- Go Example* test functions explicitly allowed
- Clear PASS/PARTIAL/FAIL verdict definitions

The executive summary and assessment report were missing two SEP-1730 requirements: label taxonomy compliance and spec tracking (new protocol features timeline).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pcarleton

lgtm, a typing nit

pcarleton · 2026-02-12T17:36:56Z

src/tier-check/checks/test-conformance-results.ts

+import { ConformanceResult } from '../types';
+import { listScenarios, listActiveClientScenarios } from '../../scenarios';
+
+interface ConformanceCheck {


i think this can re-use the ConformanceCheck type?

conformance/src/types.ts

Lines 13 to 24 in 37225ce

export interface ConformanceCheck {
  id: string;
  name: string;
  description: string;
  status: CheckStatus;
  timestamp: string;
  specReferences?: SpecReference[];
  details?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
  errorMessage?: string;
  logs?: string[];
}

felixweinberger and others added 24 commits February 10, 2026 16:54

fix: resolve lint errors in conformance.ts

cdc7402

style: apply prettier formatting

5e62b5f

style: fix prettier formatting in SKILL.md

0a91672

refactor: write detailed reports to files, show concise summary

394b6f4

docs: update READMEs for new skill interface and pre-start workflow

b6672f0

simplify: flat file output instead of nested directory

c035764

fix: remediation always shows path to Tier 2 and Tier 1

efae415

improve: table summary output, write reports via subagents

e198a24

improve: list tier gaps as numbered items instead of one-line blob

5307492

improve: finalize summary format with separator, high-priority fixes,…

ffa04c0

… numbered gaps

improve: add pre-flight checks for gh auth and server reachability

6e662a7

simplify: remove client-cmd auto-detection, require explicit argument

12eb095

refactor: extract canonical feature list into single source of truth

6f3ec84

style: apply prettier formatting

f5beda8

Merge branch 'main' into fweinberger/tier-check-cli

de6cfd8

felixweinberger marked this pull request as ready for review February 11, 2026 16:27

revert: undo unrelated console.log change in runner/server.ts

d55be40

felixweinberger requested a review from pcarleton February 11, 2026 17:53

pcarleton requested changes Feb 12, 2026

src/tier-check/checks/test-conformance-results.ts

felixweinberger added 2 commits February 12, 2026 14:48

docs: add Go and C# SDK examples to README and SKILL.md

4606e0d

felixweinberger and others added 6 commits February 12, 2026 15:48

fix: add --framework net9.0 to C# server command

4ac0834

rename: conformance.ts -> test-conformance-results.ts

d78d689

style: prettier formatting

b708334

docs: add Labels and Spec Tracking rows to audit report templates

ffde2d9

pcarleton previously approved these changes Feb 12, 2026

felixweinberger mentioned this pull request Feb 12, 2026

SDK Working Group Meeting Notes - Feb 11, 2026 modelcontextprotocol/modelcontextprotocol#2237

Closed

fix: reuse ConformanceCheck type from src/types.ts instead of redefining

abed710

felixweinberger dismissed pcarleton’s stale review via abed710 February 12, 2026 17:58

felixweinberger enabled auto-merge (squash) February 12, 2026 17:58

pcarleton approved these changes Feb 12, 2026

felixweinberger merged commit 3b4a92b into main Feb 12, 2026
8 checks passed

felixweinberger deleted the fweinberger/tier-check-cli branch February 12, 2026 17:59

felixweinberger merged 35 commits into main from fweinberger/tier-check-cli

2 participants
