-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Apply semantic diffs to get_commit, get_diff, and get_files #1984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sammorrowdrums/symbol-extraction
Are you sure you want to change the base?
Apply semantic diffs to get_commit, get_diff, and get_files #1984
Conversation
Integrates the semantic diff engine into existing tools that return file patches: - get_commit: Adds Patch field to MinimalCommitFile and applies semantic diff for supported formats (JSON, YAML, CSV, TOML) and structural diff for code files - pull_request_read (get_diff): Splits multi-file raw diff by file and applies semantic diff per-file where beneficial - pull_request_read (get_files): Applies semantic diff to each file's Patch field before returning Unsupported file types keep their original unified diff patches unchanged. Patches are reconstructed from hunks to feed the semantic diff engine — this works well for structured data where the full content is typically in the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR integrates the existing semantic/structural diff engine into the remaining diff-producing GitHub MCP tools so they return more token-efficient patches (not just compare_file_contents). It post-processes per-file patches and multi-file PR diffs to apply semantic diffs where supported, while falling back to the original unified diff when not.
Changes:
pull_request_read.get_diff: post-processes the raw multi-file diff, splitting by file and applying semantic/structural diffs per file when beneficial.pull_request_read.get_files: applies semantic/structural diffs to each file’spatchfield when possible.get_commit: adds apatchfield toMinimalCommitFileand applies semantic/structural diffing to commit file patches.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/github/pullrequests.go | Applies semantic diff processing to PR raw diffs (get_diff) and file patches (get_files). |
| pkg/github/minimal_types.go | Adds patch to MinimalCommitFile and applies semantic diffing during minimal commit conversion. |
| pkg/github/diff_integration.go | Introduces helpers to reconstruct base/head from hunks, split multi-file diffs, and apply semantic diffs per file. |
| pkg/github/diff_integration_test.go | Adds unit tests for patch reconstruction, diff splitting, and semantic application behavior. |
Comments suppressed due to low confidence (1)
pkg/github/pullrequests.go:300
- This mutates each file's
Patchto a semantic/structural diff when possible, but currentget_filestests don't assert anything about thepatchfield or semantic transformation. Add coverage that includes a realistic patch body for a supported format and assert the marshaled response has the transformed patch for that file (and leaves unsupported formats unchanged).
// Apply semantic diffs to file patches where beneficial
for _, file := range files {
if file.Patch != nil && file.Filename != nil {
semanticPatch := applySemanticDiffToUnifiedPatch(file.GetFilename(), file.GetPatch())
file.Patch = &semanticPatch
| } | ||
|
|
||
| semanticPatch := applySemanticDiffToUnifiedPatch(section.filename, section.patch) | ||
| if semanticPatch != section.patch { |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a per-file patch is converted to a semantic diff, the output drops the original per-file diff header (e.g., diff --git, index, new file mode, rename from/to, deleted file mode). This loses important metadata (notably renames and mode changes) and can make the result misleading. Preserve the original header information even when emitting a semantic/structural diff (or explicitly include any rename/mode metadata alongside the semantic diff).
| if semanticPatch != section.patch { | |
| if semanticPatch != section.patch { | |
| // Preserve the original per-file header (including metadata such as renames/mode changes) | |
| result.WriteString(section.header) | |
| if section.header != "" { | |
| result.WriteString("\n") | |
| } |
| // Return the raw response, with semantic diffs applied where beneficial | ||
| return utils.NewToolResultText(processMultiFileDiff(string(raw))), nil |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes get_diff output by post-processing the raw multi-file diff, but existing pull request tests only cover a Markdown diff (which is expected to remain unchanged). Add a test case that includes at least one supported semantic/structural file type (e.g., JSON or Go) and assert that the returned diff is transformed accordingly, so regressions in semantic application/splitting are caught.
This issue also appears on line 296 of the same file.
| for _, file := range commit.Files { | ||
| patch := file.GetPatch() | ||
| if patch != "" { | ||
| patch = applySemanticDiffToUnifiedPatch(file.GetFilename(), patch) | ||
| } |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MinimalCommitFile now includes a patch field and convertToMinimalCommit applies semantic/structural diffing to it. There doesn't appear to be test coverage asserting that get_commit returns the new patch field and that semantic diffing is applied (or that include_diff=false still omits diffs). Adding a focused test will help prevent accidental regressions in commit output.
Import declarations now show package names instead of opaque _import_declaration_N identifiers. Go imports extract short package names (e.g., 'fmt, http, context'). For large import blocks, shows first two and count (e.g., 'fmt, http, ... (5 packages)'). Import blocks are keyed by position rather than name, so changing the imported packages shows as 'modified' with inline diff instead of separate remove+add entries. Added declarations now show their signature (first line) to give context about what was added, rather than just the word 'added'.
…ed files Package clauses now extract the package name from the package_identifier child node, showing 'package handler' instead of '_package_clause_0'. New and deleted code files now show a declaration summary listing all top-level symbols with their signatures, rather than just 'file added' or 'file deleted'. This gives the model a table-of-contents view of what was added/removed.
Multi-var/const blocks now extract all declared names from var_spec_list children (e.g., 'globalMap, oldRegexp' instead of '_var_declaration_19'). For >3 names, shows first two plus count. Var/const blocks are keyed by position (like imports) so adding/removing a variable shows as 'modified' with inline diff rather than remove+add. When name extraction returns empty, falls back to showing the first line of the declaration text instead of opaque '_kind_N' indices. This ensures no tree-sitter internal names appear in output.
When searching for a symbol like 'RegisterRoutes', if no exact match is found, try suffix matching against receiver-qualified names like '(*Handler).RegisterRoutes'. Returns the match only when unambiguous (exactly one suffix match). Error messages now suggest the closest match when a suffix match exists but wasn't used (e.g., when there are multiple types with the same method name). Updated the symbol parameter description to document the receiver prefix format and note that bare names work when unambiguous.
Summary
Integrates the semantic diff engine into the existing tools that return file patches, so models get more token-efficient diffs across all diff-producing tools — not just
compare_file_contents.Tools updated
get_commitpatchfield to each file in the response, with semantic diff appliedpull_request_readget_diffpull_request_readget_filespatchfieldHow it works
Patch reconstruction
The
reconstructFromPatchfunction extracts base and head content from unified diff hunks. This works well because:Testing
Dependencies
Stacked on #1983 (symbol extraction) → #1982 (tree-sitter) → #1981 (semantic data diffs)
Part of #1973