Conversation
Store only the type string instead of the full Jedi Name object, removing the need for arbitrary_types_allowed and the runtime dependency on jedi in the model layer.
Introduces CallGraph that uses Jedi infer()+goto() to build call edges, stores them in codeflash_cache.db with content-hash invalidation, and serves as a drop-in replacement for get_function_sources_from_jedi().
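The content-hash invalidation can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `open_cache` and `file_needs_reindex` are hypothetical names, and the single-column schema is an assumption (the real `codeflash_cache.db` stores more).

```python
import hashlib
import sqlite3
from pathlib import Path


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a minimal hash-cache table (sketch schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files (path TEXT PRIMARY KEY, content_hash TEXT)"
    )
    return conn


def file_needs_reindex(conn: sqlite3.Connection, path: Path) -> bool:
    """Return True when the file's content hash differs from the cached one,
    updating the cache entry as a side effect."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM indexed_files WHERE path = ?", (str(path),)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # cache hit: content unchanged since last indexing
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, content_hash) VALUES (?, ?)",
        (str(path), digest),
    )
    conn.commit()
    return True
```

Keying invalidation on content rather than mtime means a file touched without edits still counts as a cache hit.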
Create CallGraph in Optimizer.run() for Python runs, pass it through FunctionOptimizer to code_context_extractor where it replaces get_function_sources_from_jedi() calls when available.
Covers same-file calls, cross-file calls, class instantiation, nested function exclusion, module-level exclusion, site-packages exclusion, empty/syntax-error files, and cache persistence.
Replace the simple progress bar with a Live + Tree + Panel display that shows files being analyzed, call edges discovered, cache hits, and summary stats during call graph indexing.
…cy summary Add cross-file edge detection to IndexResult, replace tree sub-entries with flat per-file dependency labels using plain language, and add a post-indexing summary panel showing per-function dependency stats.
Use the call graph to sort functions by callee count (most dependencies first) in --all mode without benchmarks, replacing arbitrary ordering.
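The ordering itself is a one-line sort once callee counts are available; a minimal sketch (function and parameter names are assumptions, not the project's API):

```python
def order_functions(functions: list[str], callee_counts: dict[str, int]) -> list[str]:
    """Sort candidate functions so those with the most callees (dependencies)
    come first, as described for --all mode without benchmarks."""
    return sorted(functions, key=lambda f: callee_counts.get(f, 0), reverse=True)
```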
Separate Jedi analysis (CPU-bound) from DB persistence so uncached files can be analyzed across multiple worker processes. Files are dispatched to a pool of up to 8 workers when >= 8 need indexing, with sequential fallback for small batches or on pool failure.
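The dispatch-with-fallback shape can be sketched like this. `analyze_file` here is a placeholder for the CPU-bound Jedi analysis, and the constants merely mirror the thresholds described above; none of these names are the project's actual API.

```python
from concurrent.futures import ProcessPoolExecutor

MIN_FILES_FOR_POOL = 8  # assumption: mirrors the ">= 8 files" threshold above
MAX_WORKERS = 8


def analyze_file(path: str) -> list[tuple[str, str]]:
    # Placeholder for the CPU-bound per-file analysis; the real version
    # would run Jedi and return the call edges discovered in `path`.
    return [(path, path)]


def analyze_files(paths: list[str]) -> list[tuple[str, str]]:
    """Dispatch large batches to a worker pool; fall back to sequential
    analysis for small batches or when the pool fails."""
    if len(paths) >= MIN_FILES_FOR_POOL:
        try:
            with ProcessPoolExecutor(max_workers=MAX_WORKERS) as pool:
                per_file = list(pool.map(analyze_file, paths))
            return [edge for edges in per_file for edge in edges]
        except Exception:
            pass  # pool failure: fall through to the sequential path
    return [edge for path in paths for edge in analyze_file(path)]
```

Keeping DB persistence out of the workers means only picklable analysis results cross process boundaries.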
Use bounded deque for results, batch updates every 8 results with manual refresh to reduce flicker, and filter source_files to Python-only before passing to the call graph indexer.
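A bounded buffer with batched refreshes might look like the following sketch (class name, batch size, and buffer length are assumptions chosen to match the description above):

```python
from collections import deque
from typing import Any, Callable


class BatchedResults:
    """Bounded result buffer that triggers a manual refresh every
    `batch_size` appends, reducing display flicker."""

    def __init__(
        self,
        refresh: Callable[[list[Any]], None],
        batch_size: int = 8,   # assumption: "batch updates every 8 results"
        maxlen: int = 200,     # assumption: bound so memory stays flat
    ) -> None:
        self._buf: deque[Any] = deque(maxlen=maxlen)
        self._refresh = refresh
        self._batch_size = batch_size
        self._since_refresh = 0

    def append(self, item: Any) -> None:
        self._buf.append(item)
        self._since_refresh += 1
        if self._since_refresh >= self._batch_size:
            self._refresh(list(self._buf))  # one repaint per batch, not per item
            self._since_refresh = 0
```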
Add DependencyResolver protocol and IndexResult to base.py, move call_graph.py to languages/python/, and use factory method in optimizer instead of is_python() gating.
…ve paths in call graph Display file paths relative to project root in the call graph live display for easier navigation. Filter indexed files by the language support's file extensions to avoid processing irrelevant file types.
…h sections Split the runtime estimate and PR message into separate log lines to avoid awkward line wrapping. Add console rules between sections for clearer visual separation.
…bles Replace the normalized relational hierarchy (cg_projects → cg_languages → cg_indexed_files/cg_call_edges) with two self-describing tables (indexed_files, call_edges) where every row includes project_root and language as text columns.
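The flattened schema might look like this sketch; the column names beyond `project_root` and `language` are assumptions, and the real DDL in `codeflash_cache.db` may differ.

```python
import sqlite3

# Two self-describing tables: every row carries project_root and language
# as plain text, so no joins through cg_projects/cg_languages are needed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS indexed_files (
    project_root TEXT NOT NULL,
    language     TEXT NOT NULL,
    path         TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    PRIMARY KEY (project_root, language, path)
);
CREATE TABLE IF NOT EXISTS call_edges (
    project_root TEXT NOT NULL,
    language     TEXT NOT NULL,
    caller       TEXT NOT NULL,
    callee       TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

A row can now be interpreted (or deleted per project) without consulting any parent table.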
Skip dependency resolver creation in CI environments where the cache DB doesn't persist between runs. Also apply ruff formatting to call_graph.py.
PR Review Summary

Prek Checks: ✅ All checks pass.
Mypy: ✅ No new mypy errors introduced by this PR.

Code Review
Still-open bug (from prior review): …
Resolved from prior reviews: …
No new critical issues found in the latest changes. The reference graph feature is currently disabled in …

Test Coverage
Overall project coverage: 79%. Coverage notes: …
Test results: 2411 passed, 8 failed (all in …)

Last updated: 2026-02-19T07:50:00Z
Deeply nested expression trees (e.g. large dict/list literals) at module or class level caused the recursive ast.NodeVisitor to exceed Python's default recursion limit. Replace the FunctionWithReturnStatement visitor class with an iterative stack-based traversal.
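The recursive-to-iterative change can be sketched as follows. This is an illustration of the technique, not the project's code: the function name comes from the commit titles below, but the exact signature and the handling of nested functions are assumptions.

```python
import ast


def find_functions_with_return_statement(source: str) -> list[str]:
    """Find functions containing a `return`, using an explicit stack
    instead of recursive ast.NodeVisitor calls, so deeply nested
    module-level literals cannot blow Python's recursion limit."""
    tree = ast.parse(source)
    found: list[str] = []
    stack: list[ast.AST] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.walk is itself iterative, so this stays recursion-free.
            if any(isinstance(n, ast.Return) for n in ast.walk(node)):
                found.append(node.name)
        stack.extend(ast.iter_child_nodes(node))
    return found
```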
The optimized code achieves a **26% runtime improvement** by making the AST traversal in `function_has_return_statement` more targeted and efficient.
**Key Optimization:**
The critical change is in how `function_has_return_statement` traverses the AST when searching for `Return` nodes:
**Original approach:**
```python
stack.extend(ast.iter_child_nodes(node))
```
This visits *all* child nodes including expressions, names, constants, and other non-statement nodes.
**Optimized approach:**
```python
for child in ast.iter_child_nodes(node):
if isinstance(child, ast.stmt):
stack.append(child)
```
This only pushes statement nodes onto the stack, since `Return` is a statement type (`ast.stmt`).
**Why This Is Faster:**
1. **Reduced Node Traversal**: In typical Python functions, there are many more expression nodes (variable references, literals, operators, etc.) than statement nodes. For example, a simple `return x + y` has 1 Return statement but multiple Name and BinOp expression nodes underneath. The optimization skips all the expression-level nodes.
2. **Lower Python Overhead**: Fewer nodes in the stack means fewer loop iterations, fewer `isinstance` checks on non-Return nodes, and less list manipulation overhead.
3. **Preserved Correctness**: Since `Return` nodes are always statements in Python's AST (they inherit from `ast.stmt`), filtering to only statement nodes cannot miss any Return nodes.
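Putting the snippets above together, the whole statement-filtered traversal can be sketched as a single runnable function (the signature is an assumption; note this sketch, like the description above, also descends into nested function definitions, since they are statements):

```python
import ast


def function_has_return_statement(function_node: ast.AST) -> bool:
    """Iteratively search for an explicit `return`, pushing only
    statement nodes so expression subtrees are never traversed."""
    stack: list[ast.AST] = [function_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                stack.append(child)
    return False
```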
**Performance Impact by Test Case:**
The optimization shows particularly strong gains for:
- **Functions without returns** (up to 91% faster): Early termination without traversing deep expression trees
- **Large codebases** (34-41% faster on tests with 1000+ functions): The cumulative effect across many function bodies
- **Functions with complex expressions but no returns** (82% faster): Avoiding expensive traversal of unused expression subtrees
- **Generator functions without explicit returns** (64% faster): Skipping yield expression internals
The optimization maintains correctness across all test cases including nested classes, async functions, properties, and various control structures, while delivering consistent runtime improvements.
⚡️ Codeflash found optimizations for this PR: 📄 26% (0.26x) speedup for …
Replace per-function SQL loops in get_callees() and count_callees_per_function() with temp table JOINs, and thread resolved path strings through to avoid redundant resolve() calls.
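The temp-table JOIN pattern can be sketched like this. It shows one batched query replacing a per-function SELECT loop; the function name `count_callees_per_function` comes from the commit above, but the schema and column names are assumptions.

```python
import sqlite3


def count_callees_per_function(
    conn: sqlite3.Connection, callers: list[str]
) -> dict[str, int]:
    """Count callees for many functions with a single JOIN against a temp
    table of requested callers, instead of one SELECT per function."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (caller TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted (caller) VALUES (?)", [(c,) for c in callers])
    rows = conn.execute(
        """
        SELECT w.caller, COUNT(e.callee)
        FROM wanted w
        LEFT JOIN call_edges e ON e.caller = w.caller
        GROUP BY w.caller
        """
    ).fetchall()
    conn.execute("DROP TABLE wanted")
    return dict(rows)
```

The LEFT JOIN keeps callers with zero callees in the result (COUNT over a column skips the NULLs from unmatched rows).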
…2026-02-18T22.22.36 ⚡️ Speed up function `find_functions_with_return_statement` by 26% in PR #1460 (`call-graphee`)
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from: …
The optimized code achieves a **146% speedup** (from 1.47ms to 595μs) by eliminating the overhead of `ast.iter_child_nodes()` and replacing it with direct field access on AST nodes.

**Key optimizations:**

1. **Direct stack initialization**: Instead of starting with `[function_node]` and then traversing into its body, the stack is initialized directly with `list(function_node.body)`. This skips one iteration and avoids processing the function definition wrapper itself.
2. **Manual field traversal**: Rather than calling `ast.iter_child_nodes(node)`, a generator that yields all child nodes, the code directly accesses `node._fields` and uses `getattr()` to inspect each field. This eliminates the generator overhead and function-call costs associated with `ast.iter_child_nodes()`.
3. **Targeted statement filtering**: By checking `isinstance(child, ast.stmt)` or `isinstance(item, ast.stmt)` only on relevant fields (handling both single statements and lists of statements), the traversal focuses on statement nodes where `ast.Return` can appear, avoiding unnecessary checks on expression nodes.

**Why this is faster:**

- **Reduced function call overhead**: `ast.iter_child_nodes()` is a generator function that incurs call/yield overhead on every iteration; direct attribute access via `getattr()` is faster for small numbers of fields.
- **Fewer iterations**: The line profiler shows the original code's `ast.iter_child_nodes()` line hit 5,453 times (69% of runtime), while the optimized version's field iteration hits only 3,290 times (17.4% of runtime).
- **Better cache locality**: Direct field access patterns may benefit from better CPU cache utilization compared to generator state management.
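A minimal sketch of the field-based traversal described above (the signature and the decision to descend into nested function definitions are assumptions):

```python
import ast


def function_has_return_statement(function_node: ast.FunctionDef) -> bool:
    """Search for `return` by walking node._fields directly, pushing only
    statement nodes, instead of calling ast.iter_child_nodes()."""
    stack: list[ast.stmt] = list(function_node.body)  # skip the def wrapper itself
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        for field in node._fields:
            child = getattr(node, field, None)
            if isinstance(child, ast.stmt):          # single-statement field
                stack.append(child)
            elif isinstance(child, list):            # list-of-statements field
                for item in child:
                    if isinstance(item, ast.stmt):
                        stack.append(item)
    return False
```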
**Test case performance:** The optimization shows dramatic improvements particularly for:

- **Functions with many sequential statements** (2365% faster for 1000 statements, 1430% faster for 1000 nested functions)
- **Simple functions** (234-354% faster for basic return detection)
- **Moderately complex control flow** (80-125% faster for nested conditionals/loops)

The speedup is consistent across all test cases, with early-return scenarios benefiting the most, since the optimization discovers the return statement before processing unnecessary nodes.
⚡️ Codeflash found optimizations for this PR: 📄 147% (1.47x) speedup for …
…2026-02-18T22.34.56 ⚡️ Speed up function `function_has_return_statement` by 147% in PR #1460 (`call-graphee`)
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from: …
# Conflicts:
#	.codex/skills/.gitignore
#	.gemini/skills/.gitignore
#	codeflash/languages/python/context/code_context_extractor.py
Add DependencyResolver parameter back to get_code_optimization_context() that was lost during file move from codeflash/context/ to codeflash/languages/python/context/. When call_graph is available, use it for helper discovery instead of Jedi-based fallback.
feat: add reference graph for Python
Summary
- `ReferenceGraph` in `codeflash/languages/python/` behind a `DependencyResolver` protocol, removing `is_python()` gating from the optimizer
- Two self-describing cache tables (`indexed_files`, `call_edges`) with full text keys
- `compat.py` simplified by removing an unnecessary class wrapper

Test plan
- `tests/test_reference_graph.py` covering indexing, caching, cross-file edges, and persistence
- `uv run prek run --from-ref origin/main` passes