Add symbol extraction to get_file_contents #1983
base: sammorrowdrums/tree-sitter-semantic-diff
Adds an optional 'symbol' parameter to get_file_contents that uses tree-sitter to extract a specific named symbol (function, class, type, method, etc.) from a file. Instead of returning the entire file, only the matching symbol's source code is returned. Supports all languages from the structural diff engine: Go, Python, JavaScript, TypeScript, Ruby, Rust, Java, C/C++. For unsupported file types, returns an error suggesting the feature is not available. If the symbol is not found, the error message includes a list of available symbols in the file to help the model self-correct. This pairs well with the structural diff tool — a model can see which symbols changed via compare_file_contents, then fetch specific symbols via get_file_contents to examine them in detail.
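The unsupported-file-type behavior described above can be pictured with a small sketch. The extension table, the `languageForPath` name, and the error wording are all assumptions made for illustration; the structural diff engine's real mapping is not shown in this PR and may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// languageByExt is a hypothetical extension table covering the languages the
// PR lists. The real engine's mapping may include other extensions.
var languageByExt = map[string]string{
	".go": "go", ".py": "python", ".js": "javascript", ".ts": "typescript",
	".rb": "ruby", ".rs": "rust", ".java": "java",
	".c": "c", ".h": "c", ".cpp": "cpp",
}

// languageForPath returns the language for a file path, or an error telling
// the caller that symbol extraction is not available for that file type.
func languageForPath(path string) (string, error) {
	ext := strings.ToLower(filepath.Ext(path))
	lang, ok := languageByExt[ext]
	if !ok {
		return "", fmt.Errorf("symbol extraction is not available for %q files", ext)
	}
	return lang, nil
}

func main() {
	for _, p := range []string{"pkg/github/repositories.go", "README.md"} {
		lang, err := languageForPath(p)
		if err != nil {
			fmt.Println(p, "->", err)
			continue
		}
		fmt.Println(p, "->", lang)
	}
}
```

Returning an error rather than silently falling back to the full file keeps the caller aware that the `symbol` argument was not honored.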
…n docs Documents the tree-sitter structural diff engine, compare_file_contents tool, symbol extraction via get_file_contents, CGO requirement, and how to add new language support. Also updates build commands to include CGO_ENABLED=1.
Symbol text is just code — a ResourceContents wrapper with URI/MIME type adds no value. Use NewToolResultText for a simpler, more natural response.
Pull request overview
This PR extends the get_file_contents MCP tool to optionally return only the source code for a named symbol (function/class/type/etc.) by reusing the repo’s existing tree-sitter language configs.
Changes:
- Adds a new optional symbol input parameter to get_file_contents and performs symbol extraction for text/code files.
- Introduces ExtractSymbol helper implementation for symbol lookup using tree-sitter declarations.
- Adds unit tests for symbol extraction across several languages and updates the tool schema snapshot.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/github/repositories.go | Adds symbol parameter parsing and returns extracted symbol source instead of full file content when requested. |
| pkg/github/symbol_extraction.go | New symbol extraction helper built on top of the existing tree-sitter declaration extraction utilities. |
| pkg/github/symbol_extraction_test.go | New unit tests covering symbol extraction behavior across multiple languages and error cases. |
| pkg/github/__toolsnaps__/get_file_contents.snap | Updates tool schema snapshot to include the new symbol parameter. |
Comments suppressed due to low confidence (1)
pkg/github/repositories.go:658
- Repository docs list tool parameters in README, and get_file_contents currently documents params only up through sha. Since this PR adds a new symbol input, regenerate and commit the README/docs output so the published tool docs match the updated schema.
"symbol": {
Type: "string",
Description: "Optional: extract a specific symbol (function, class, type, etc.) from the file. For supported languages, returns only the symbol's source code instead of the entire file. If the symbol is not found, returns a list of available symbols.",
},
"symbol": {
Type: "string",
Description: "Optional: extract a specific symbol (function, class, type, etc.) from the file. For supported languages, returns only the symbol's source code instead of the entire file. If the symbol is not found, returns a list of available symbols.",
},
},
Required: []string{"owner", "repo"},
Copilot
AI
Feb 9, 2026
Adding the symbol property changes the tool input schema, but Test_GetFileContents in repositories_test.go currently asserts the old schema keys (it checks for sha but not symbol). That test will fail once this PR is merged; update the schema assertions and consider adding a tool-level test case that passes symbol and verifies the extracted symbol is returned.
This issue also appears on line 655 of the same file.
// ExtractSymbol searches source code for a named symbol and returns its text.
// It searches top-level declarations first, then recursively searches nested
// declarations (e.g. methods inside classes). Returns the symbol text and its
// kind, or an error if the symbol is not found or the language is unsupported.
Copilot
AI
Feb 9, 2026
The doc comment says nested declarations are searched "recursively", but this implementation only checks top-level declarations and then the declarations nested within each top-level declaration. Either adjust the comment to match the actual depth supported, or extend the search to recurse into nested declarations-of-declarations.
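A minimal sketch of the second option, searching at any nesting depth, is shown here. It does not use tree-sitter: `Decl` and its `Children` field are hypothetical stand-ins for already-parsed declarations, and `findSymbolRecursive` is an illustrative name, not a function from this PR. Each level is checked before descending so that a top-level match still wins over a nested one.

```go
package main

import "fmt"

// Decl is a hypothetical, already-parsed declaration. In the PR the nested
// declarations come from re-parsing decl.Text; here they are precomputed.
type Decl struct {
	Name, Kind, Text string
	Children         []Decl
}

// findSymbolRecursive checks every declaration at the current level first,
// then descends into each declaration's children, to arbitrary depth.
func findSymbolRecursive(decls []Decl, name string) (Decl, bool) {
	for _, d := range decls {
		if d.Name == name {
			return d, true
		}
	}
	for _, d := range decls {
		if found, ok := findSymbolRecursive(d.Children, name); ok {
			return found, true
		}
	}
	return Decl{}, false
}

func main() {
	tree := []Decl{{
		Name: "Outer", Kind: "class",
		Children: []Decl{{
			Name: "Inner", Kind: "class",
			Children: []Decl{{Name: "deep", Kind: "method", Text: "def deep(self): ..."}},
		}},
	}}
	if d, ok := findSymbolRecursive(tree, "deep"); ok {
		fmt.Println(d.Kind, d.Name)
	}
}
```

With the two-level search in the diff, a method such as `deep` inside `Outer.Inner` would not be reached; the recursive form finds it at the cost of visiting more nodes.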
// Search nested declarations (methods inside classes, etc.)
for _, decl := range decls {
    nested := extractChildDeclarationsFromText(config, decl.Text)
    if text, kind, found := findSymbol(nested, symbolName); found {
        return text, kind, nil
    }
}

// Build list of available symbols for the error message
available := listSymbolNames(config, decls)
return "", "", fmt.Errorf("symbol %q not found. Available symbols: %s", symbolName, strings.Join(available, ", "))
Copilot
AI
Feb 9, 2026
If the symbol is not found, this code re-parses declaration bodies multiple times (once during the nested search loop, and again while building the available-symbol list). Consider collecting nested declarations/names during the first pass (or caching per top-level decl) to avoid redundant tree-sitter parses, especially on large files.
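One way to picture the single-pass suggestion is the sketch below. It is not the PR's code: `Decl`, `searchOnce`, and the `parseNested` callback are hypothetical, with the callback standing in for the tree-sitter parse of a declaration body. Names are collected while searching, so a miss needs no second round of parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// Decl is a hypothetical declaration record (name, kind, source text).
type Decl struct {
	Name, Kind, Text string
}

// searchOnce looks for name among the top-level declarations, then among the
// declarations nested in each top-level body. parseNested is called at most
// once per top-level declaration. On a miss, every name seen is returned.
func searchOnce(top []Decl, parseNested func(body string) []Decl, name string) (Decl, []string, bool) {
	names := make([]string, 0, len(top))
	for _, d := range top {
		if d.Name == name {
			return d, nil, true
		}
		names = append(names, d.Name)
	}
	for _, d := range top {
		for _, n := range parseNested(d.Text) {
			if n.Name == name {
				return n, nil, true
			}
			names = append(names, n.Name)
		}
	}
	return Decl{}, names, false
}

func main() {
	parses := 0
	// fakeParse pretends each body holds one method named after the body text.
	fakeParse := func(body string) []Decl {
		parses++
		return []Decl{{Name: body + "Method", Kind: "method"}}
	}
	top := []Decl{
		{Name: "A", Kind: "class", Text: "a"},
		{Name: "B", Kind: "class", Text: "b"},
	}
	_, names, found := searchOnce(top, fakeParse, "missing")
	fmt.Println(found, parses, strings.Join(names, ", "))
	// prints: false 2 A, B, aMethod, bMethod
}
```

In this sketch a miss costs one parse per top-level declaration, where parsing once for the search and again for the name list would cost two.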
// Build list of available symbols for the error message
available := listSymbolNames(config, decls)
return "", "", fmt.Errorf("symbol %q not found. Available symbols: %s", symbolName, strings.Join(available, ", "))
Copilot
AI
Feb 9, 2026
The "Available symbols" error string is unbounded and can become very large for files with many declarations, which can bloat tool responses and logs. Consider sorting + de-duping, and/or truncating the list (e.g., first N symbols plus a count of remaining) to keep the error response size predictable.
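A bounded list along these lines could look like the following sketch. The helper name `summarizeSymbols`, the limit value, and the "(and N more)" wording are assumptions for illustration, not part of the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// summarizeSymbols sorts and de-duplicates names, then keeps at most limit of
// them and appends a count of the rest. A limit below 1 is treated as 1.
func summarizeSymbols(names []string, limit int) string {
	if limit < 1 {
		limit = 1
	}
	sorted := append([]string(nil), names...) // copy so the caller's slice is untouched
	sort.Strings(sorted)

	unique := make([]string, 0, len(sorted))
	for _, n := range sorted {
		if len(unique) == 0 || unique[len(unique)-1] != n {
			unique = append(unique, n)
		}
	}
	if len(unique) <= limit {
		return strings.Join(unique, ", ")
	}
	return fmt.Sprintf("%s (and %d more)", strings.Join(unique[:limit], ", "), len(unique)-limit)
}

func main() {
	names := []string{"b", "a", "b", "c", "d"}
	fmt.Printf("symbol %q not found. Available symbols: %s\n", "x", summarizeSymbols(names, 3))
	// prints: symbol "x" not found. Available symbols: a, b, c (and 1 more)
}
```

Sorting also makes the error text deterministic, which helps when it is asserted in tests or compared across runs.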
Adds InstructionsFunc to the repos toolset describing how to combine compare_file_contents (structural diff) with get_file_contents symbol extraction for efficient code review. Server instructions focus on multi-tool flows only — single-tool features are already documented in each tool's own description.
Summary
Adds an optional symbol parameter to the get_file_contents tool that uses tree-sitter to extract a specific named symbol from a file. Instead of returning the entire file contents, only the matching symbol's source code is returned.
Example usage
{ "owner": "github", "repo": "github-mcp-server", "path": "pkg/github/repositories.go", "symbol": "GetFileContents" }
Returns just the GetFileContents function definition instead of the entire 900+ line file.
How it works
- When the symbol parameter is provided and the file is a supported language, tree-sitter parses the source
- … function_declaration, method_definition)
Supported languages
Go, Python, JavaScript, TypeScript, Ruby, Rust, Java, C/C++ — reuses the tree-sitter configs from the structural diff engine.
Pairs with structural diffs
This creates a powerful workflow:
- compare_file_contents shows which symbols changed (structural diff)
- get_file_contents with symbol fetches just the specific symbol to examine
Testing
- Existing get_file_contents tests still pass (parameter is optional)
Dependencies
Stacked on #1982 (tree-sitter structural diff) → #1981 (semantic data diffs)
Part of #1973
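The two-step workflow under "Pairs with structural diffs" can be sketched from the caller's side. The argument names (owner, repo, path, symbol) come from the example usage in the summary; the `symbolRequest` type, the `buildSymbolRequests` helper, and the list of changed symbols are hypothetical, since the output format of compare_file_contents is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// symbolRequest mirrors the argument names used in the PR's example usage.
type symbolRequest struct {
	Owner  string `json:"owner"`
	Repo   string `json:"repo"`
	Path   string `json:"path"`
	Symbol string `json:"symbol,omitempty"`
}

// buildSymbolRequests returns one get_file_contents argument object (as JSON)
// per changed symbol, so each symbol can be fetched without the whole file.
func buildSymbolRequests(owner, repo, path string, changed []string) ([]string, error) {
	out := make([]string, 0, len(changed))
	for _, sym := range changed {
		b, err := json.Marshal(symbolRequest{Owner: owner, Repo: repo, Path: path, Symbol: sym})
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	return out, nil
}

func main() {
	// Assumed: these names were reported as changed by compare_file_contents.
	changed := []string{"GetFileContents"}
	reqs, err := buildSymbolRequests("github", "github-mcp-server", "pkg/github/repositories.go", changed)
	if err != nil {
		panic(err)
	}
	for _, r := range reqs {
		fmt.Println(r)
	}
}
```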