
refactor: move Python static analysis modules to languages/python/static_analysis/ #1546

Merged: KRRT7 merged 22 commits into main from follow-up-reference-graph on Feb 19, 2026


KRRT7 (Collaborator) commented Feb 19, 2026

Summary

  • Moves 7 Python-specific static analysis files (static_analysis.py, concolic_utils.py, coverage_utils.py, line_profile_utils.py, edit_generated_tests.py, code_extractor.py, code_replacer.py) from codeflash/code_utils/ to codeflash/languages/python/static_analysis/
  • Updates all ~35 consumer import sites across source and test files
  • Aligns with the multi-language architecture where Python-specific code lives under languages/python/
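The import-site update can be pictured as a small codemod that rewrites dotted module paths for the seven moved modules. This is a hypothetical sketch, not the script used in this PR. `rewrite_imports` is an illustrative name. It only handles fully dotted paths such as `from codeflash.code_utils.code_replacer import ...`; it does not handle forms like `from codeflash.code_utils import code_replacer`.

```python
import re

# Module names taken from the PR summary; everything else under code_utils stays put.
MOVED_MODULES = (
    "static_analysis",
    "concolic_utils",
    "coverage_utils",
    "line_profile_utils",
    "edit_generated_tests",
    "code_extractor",
    "code_replacer",
)
OLD_PKG = "codeflash.code_utils"
NEW_PKG = "codeflash.languages.python.static_analysis"

# Matches e.g. "codeflash.code_utils.code_replacer" but not
# "codeflash.code_utils.code_utils" or "codeflash.verification.coverage_utils".
_MOVED_PATH = re.compile(
    r"\b" + re.escape(OLD_PKG) + r"\.(" + "|".join(MOVED_MODULES) + r")\b"
)


def rewrite_imports(source: str) -> str:
    """Point dotted paths to moved modules at NEW_PKG (hypothetical helper)."""
    return _MOVED_PATH.sub(lambda m: f"{NEW_PKG}.{m.group(1)}", source)


print(rewrite_imports("from codeflash.code_utils.code_replacer import foo"))
# from codeflash.languages.python.static_analysis.code_replacer import foo
```

The `\b` anchors keep the rewrite from touching modules that merely share a name, such as `verification/coverage_utils.py`, which did not move.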

Test plan

  • uv run python -c "from codeflash.languages.python.static_analysis import ..." — all 7 modules import
  • uv run pytest tests/test_static_analysis.py tests/test_code_replacement.py tests/test_add_needed_imports_from_module.py -x — 74 tests pass
  • uv run ruff check — clean
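The first test-plan step could also be written as a small script that imports each moved module individually and reports which ones fail. This is a sketch under assumptions. `failed_imports` is a hypothetical helper, and the module list comes from the PR summary, not from the elided command above.

```python
import importlib

# Module names from the PR summary.
MOVED_MODULES = [
    "static_analysis",
    "concolic_utils",
    "coverage_utils",
    "line_profile_utils",
    "edit_generated_tests",
    "code_extractor",
    "code_replacer",
]


def failed_imports(package: str, modules: list[str]) -> dict[str, str]:
    """Import `package.<name>` for each name and map every failure to its error text."""
    failures: dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(f"{package}.{name}")
        except Exception as exc:  # ImportError, or any error raised while the module executes
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Run inside the project environment (for example via `uv run python`), `failed_imports("codeflash.languages.python.static_analysis", MOVED_MODULES)` should return an empty dict if all seven modules import cleanly.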

KRRT7 and others added 2 commits February 19, 2026 04:45
The optimization achieves a **523% speedup** (from 2.29s to 367ms) by eliminating expensive libcst metadata operations and replacing the visitor/transformer pattern with direct AST manipulation.

## Key Performance Improvements

**1. Removed MetadataWrapper (~430ms saved, ~9% of total time)**
- Original: `cst.metadata.MetadataWrapper(cst.parse_module(optimized_code))` then `optimized_module.visit(visitor)` took 5.45s combined
- Optimized: Direct `cst.parse_module(optimized_code)` takes only 183ms
- The metadata infrastructure was unnecessary for this use case since we only need to identify and extract function definitions, not track parent-child relationships

**2. Replaced Visitor Pattern with Direct Iteration (~5.3s saved, ~78% of total time)**
- Original: Used `OptimFunctionCollector` visitor class with metadata dependencies, requiring full tree traversal and metadata resolution
- Optimized: Simple for-loop over `optimized_module.body` to collect functions and classes
- Direct iteration avoids the overhead of visitor callback infrastructure and metadata lookups

**3. Eliminated Transformer Pattern (~87ms saved, ~1.6% of total time)**
- Original: Used `OptimFunctionReplacer` transformer to traverse and rebuild the entire AST
- Optimized: Manual list building with targeted `with_changes()` calls only where needed
- Reduces redundant tree traversals and object creation

**4. Improved Memory Efficiency**
- Pre-allocated data structures instead of using visitor state
- Single-pass collection instead of multiple tree traversals
- Direct list manipulation instead of transformer's recursive rebuilding
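Items 2 and 3 describe the visitor-free approach: collect definitions in one loop over the module's top-level body, then build a new body list. A minimal sketch of that idea follows. It assumes a few things:

- It swaps in the standard-library `ast` module for libcst so it runs without third-party dependencies. As a result, comments and original formatting are not preserved; the described implementation keeps them via `cst.parse_module` and `with_changes()`.
- `collect_definitions` and `replace_function` are illustrative names, not the actual `PythonSupport.replace_function`.

```python
import ast
from typing import Dict

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def collect_definitions(module: ast.Module) -> Dict[str, ast.stmt]:
    """One loop over the top-level body (no visitor): map names, incl. Class.method, to nodes."""
    defs: Dict[str, ast.stmt] = {}
    for node in module.body:
        if isinstance(node, FUNC_TYPES):
            defs[node.name] = node
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, FUNC_TYPES):
                    defs[f"{node.name}.{item.name}"] = item
    return defs


def replace_function(original_code: str, optimized_code: str, qualified_name: str) -> str:
    """Swap one function or method in original_code for its version in optimized_code."""
    new_def = collect_definitions(ast.parse(optimized_code)).get(qualified_name)
    if new_def is None:
        return original_code  # nothing to replace
    module = ast.parse(original_code)
    class_name, _, func_name = qualified_name.rpartition(".")
    new_body = []
    for node in module.body:
        if not class_name and isinstance(node, FUNC_TYPES) and node.name == func_name:
            new_body.append(new_def)  # top-level function: swap the node itself
        elif class_name and isinstance(node, ast.ClassDef) and node.name == class_name:
            # Method: rebuild only this class's body, leaving other members untouched.
            node.body = [
                new_def if isinstance(m, FUNC_TYPES) and m.name == func_name else m
                for m in node.body
            ]
            new_body.append(node)
        else:
            new_body.append(node)
    module.body = new_body
    return ast.unparse(module)  # requires Python 3.9+
```

Only the top-level statements and, for methods, one class body are touched. No full-tree traversal or metadata resolution is needed, which is the property the optimization relies on.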

## Test Performance Pattern

The optimization excels across all test cases:
- **Simple functions**: 587-696% faster (e.g., `test_replace_simple_function`: 2.62ms → 459μs)
- **Class methods**: 509-549% faster (e.g., `test_replace_function_in_class`: 2.24ms → 367μs)
- **Large files**: Still shows gains even with parsing overhead (e.g., `test_replace_function_in_large_file`: 9.37ms → 7.32ms, 28% faster)
- **Batch operations**: Dramatic improvement in loops (e.g., 1000 iterations: 1.91s → 201ms, 850% faster)
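The percentages above appear to follow the convention percent faster = (old / new - 1) × 100. Under it, a 9.5x reduction in runtime reads as roughly 850%, and the headline 523% matches 2.29 s → 367 ms up to rounding of the displayed runtimes. A minimal check, assuming that convention (`percent_faster` is a hypothetical helper):

```python
def percent_faster(before: float, after: float) -> float:
    """Relative speedup in percent: (before / after - 1) * 100; both runtimes in the same unit."""
    if after <= 0:
        raise ValueError("after must be a positive runtime")
    return (before / after - 1.0) * 100.0


# Runtimes from the list above, in seconds.
print(round(percent_faster(1.91, 0.201)))       # 850
print(round(percent_faster(9.37e-3, 7.32e-3)))  # 28
```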

## Impact on Workloads

Based on `function_references`, this optimization benefits:
- **Test suites** that perform multiple function replacements during test execution
- **Code refactoring tools** that need to replace functions while preserving surrounding code
- **Language parity testing** where consistent performance across language support implementations matters

The optimization is particularly valuable for batch processing scenarios (as shown by the 850% improvement in the loop test), making it highly effective for CI/CD pipelines and automated code transformation workflows.
codeflash-ai bot (Contributor) commented Feb 19, 2026

⚡️ Codeflash found optimizations for this PR

📄 523% (5.23x) speedup for PythonSupport.replace_function in codeflash/languages/python/support.py

⏱️ Runtime: 2.29 seconds → 367 milliseconds (best of 16 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch follow-up-reference-graph).


claude bot (Contributor) commented Feb 19, 2026

PR Review Summary

Prek Checks

Fixed and verified. Auto-fixed 4 ruff errors and 6 formatting issues:

  • codeflash/languages/base.py: unsorted imports (I001)
  • codeflash/languages/python/static_analysis/code_replacer.py: unused import (F401 - Language)
  • codeflash/languages/python/support.py: unsorted imports (I001, 2x)
  • 6 files reformatted (line length, import wrapping)

All fixes committed and pushed. Prek now passes cleanly.

Mypy

Pre-existing mypy errors in moved files (libcst union-attr, missing generic type params). These are not introduced by this PR — the refactoring moved files without changing logic.

Code Review

No new critical issues found. Two previously flagged issues still apply:

  1. Index offset bug (code_replacer.py:491-503): When both new classes and new functions are inserted, max_function_index becomes stale after class insertion grows new_body (see existing review comment).

  2. Self-import nit (treesitter.py:1807): extract_calling_function_source imports TreeSitterAnalyzer from its own module (see existing review comment).

Test Coverage

| File (PR path) | Main path | Main | PR | Delta |
|---|---|---|---|---|
| languages/base.py | same | 98% | 98% | 0 |
| languages/javascript/code_replacer.py | same | n/a | 77% | new |
| languages/javascript/support.py | same | 74% | 71% | -3 |
| languages/javascript/treesitter.py | same | 92% | 92% | 0 |
| languages/python/support.py | same | 51% | 49% | -2 |
| python/static_analysis/code_extractor.py | code_utils/code_extractor.py | 68% | 69% | +1 |
| python/static_analysis/code_replacer.py | code_utils/code_replacer.py | 83% | 76% | -7 |
| python/static_analysis/concolic_utils.py | code_utils/concolic_utils.py | 88% | 88% | 0 |
| python/static_analysis/coverage_utils.py | code_utils/coverage_utils.py | 98% | 98% | 0 |
| python/static_analysis/edit_generated_tests.py | code_utils/edit_generated_tests.py | 78% | 95% | +17 |
| python/static_analysis/line_profile_utils.py | code_utils/line_profile_utils.py | 87% | 87% | 0 |
| python/static_analysis/static_analysis.py | code_utils/static_analysis.py | 88% | 88% | 0 |
| optimization/function_optimizer.py | same | 19% | 19% | 0 |
| optimization/optimizer.py | same | 19% | 19% | 0 |
| result/create_pr.py | same | 67% | 67% | 0 |
| verification/concolic_testing.py | same | 33% | 33% | 0 |
| verification/coverage_utils.py | same | 22% | 22% | 0 |
| verification/test_runner.py | same | 62% | 62% | 0 |
| Overall (changed files) | | 60% | 65% | +5 |

Notable changes:

  • edit_generated_tests.py: +17% — functions were split out and only Python-relevant ones remain, improving coverage of the remaining code
  • code_replacer.py: -7% — the replace_functions_in_file rewrite added new code paths (new classes, new functions insertion) that are not yet fully exercised by tests
  • javascript/support.py: -3% — new language support methods added with limited test coverage
  • Overall coverage improved by +5 percentage points across changed files

Codeflash Optimization PRs

2 open PRs targeting main (#1389, #1291) — both have CI failures (type-check-cli, pre-commit hooks). Not merging.


Last updated: 2026-02-19T12:00:00Z

github-actions bot and others added 3 commits February 19, 2026 09:54
…2026-02-19T09.51.40

⚡️ Speed up method `PythonSupport.replace_function` by 523% in PR #1546 (`follow-up-reference-graph`)
codeflash-ai bot (Contributor) commented Feb 19, 2026

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

codeflash-ai bot (Contributor) commented Feb 19, 2026

⚡️ Codeflash found optimizations for this PR

📄 12% (0.12x) speedup for _filter_new_declarations in codeflash/languages/javascript/code_replacer.py

⏱️ Runtime: 783 microseconds → 698 microseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch follow-up-reference-graph).


codeflash-ai bot (Contributor) commented Feb 19, 2026

⚡️ Codeflash found optimizations for this PR

📄 1,230% (12.30x) speedup for _insert_declaration_after_dependencies in codeflash/languages/javascript/code_replacer.py

⏱️ Runtime: 4.61 milliseconds → 347 microseconds (best of 8 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch follow-up-reference-graph).


Comment on lines +501 to +503
if new_functions:
    if max_function_index is not None:
        new_body = [*new_body[: max_function_index + 1], *new_functions, *new_body[max_function_index + 1 :]]
Bug: Index offset when both new classes and new functions are inserted.

max_function_index is computed from the original module body, but after new classes are inserted at line 497, new_body has grown by len(unique_classes) elements. If new_classes_insertion_idx <= max_function_index, the function insertion will use a stale index and place new functions at the wrong position.

Example: original body = [import, ClassA, func_B, func_C], so max_class_index=1 and max_function_index=3. After inserting NewClass at index 1, new_body becomes [import, NewClass, ClassA, func_B, func_C], and func_C now sits at index 4. The new functions would be inserted at index 4 (before func_C) instead of 5.

Consider adjusting the index:

if new_functions:
    offset = len(unique_classes) if new_classes and unique_classes and (new_classes_insertion_idx or 0) <= (max_function_index or 0) else 0
    idx = max_function_index + 1 + offset if max_function_index is not None else ...
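The indexing problem can be reproduced with plain lists. This hypothetical sketch makes the following assumptions:

- Strings stand in for CST statements.
- `insert_declarations` and its parameters are illustrative names that only loosely mirror `code_replacer.py`.
- The fallback for `max_function_index is None` is assumed, because the suggestion above leaves that branch elided.

With `fix_offset=False` it reproduces the stale-index placement from the example. With the offset applied, the new function lands after `func_C`.

```python
from typing import List, Optional


def insert_declarations(
    body: List[str],
    class_insertion_idx: int,
    max_function_index: Optional[int],
    new_classes: List[str],
    new_functions: List[str],
    fix_offset: bool = True,
) -> List[str]:
    """Insert new classes, then new functions after the last original function.

    Strings stand in for libcst statement nodes; this models the index logic only.
    """
    # Step 1: inserting classes shifts every statement at or after class_insertion_idx.
    new_body = [*body[:class_insertion_idx], *new_classes, *body[class_insertion_idx:]]
    if not new_functions:
        return new_body
    if max_function_index is None:
        # Assumed fallback for modules with no functions: append at the end.
        return [*new_body, *new_functions]
    idx = max_function_index + 1
    if fix_offset and class_insertion_idx <= max_function_index:
        # The last original function moved right by len(new_classes).
        idx += len(new_classes)
    return [*new_body[:idx], *new_functions, *new_body[idx:]]


body = ["import", "ClassA", "func_B", "func_C"]
print(insert_declarations(body, 1, 3, ["NewClass"], ["new_func"], fix_offset=False))
# ['import', 'NewClass', 'ClassA', 'func_B', 'new_func', 'func_C']
print(insert_declarations(body, 1, 3, ["NewClass"], ["new_func"]))
# ['import', 'NewClass', 'ClassA', 'func_B', 'func_C', 'new_func']
```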


"""
try:
    from codeflash.languages.javascript.treesitter import TreeSitterAnalyzer, TreeSitterLanguage
Nit: This is a self-import — extract_calling_function_source is defined in codeflash.languages.javascript.treesitter, so importing TreeSitterAnalyzer and TreeSitterLanguage from the same module is unnecessary. This was carried over from when the function lived in code_extractor.py. Consider removing the import and using the classes directly.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

KRRT7 merged commit 465316a into main on Feb 19, 2026 (32 of 36 checks passed)
KRRT7 deleted the follow-up-reference-graph branch February 19, 2026 12:45
KRRT7 added a commit that referenced this pull request Feb 19, 2026
refactor: move Python static analysis modules to languages/python/static_analysis/
KRRT7 restored the follow-up-reference-graph branch February 19, 2026 16:33
KRRT7 added a commit that referenced this pull request Feb 19, 2026
…java-support

Merge main's lang_support refactor (PR #1546) with Java support branch.
Key decisions: grouped Python-specific imports (main's style), kept Java
routing in parse_test_output and verification_utils, added tree-sitter-java
dep and B009 ruff ignore.